As of 6:00 pm this evening, Oracle engineers have successfully reconstructed all of the failed components of the disk storage array that has been out of service since early Tuesday morning. They are now reconnecting these components so that all of them are available to the storage array’s controllers. This process will continue for the next few hours, after which NJIT staff will begin re-starting NJIT applications and making them available to the campus community. Because more than 100 applications need to be re-started, it will take 8-12 hours before all of them are running again, and some may take up to an additional 12 hours to reach normal operating capacity.
Priority is being given to Highlander Pipeline, ADM e-mail, and AFS student file systems, which should begin operations later this evening. All ADM e-mail received since Tuesday morning has been held and will be gradually delivered to ADM mailboxes over the next 24 hours.
Status updates will continue to be posted on NJIT SOS as major applications are re-started and reach normal operating capacity.
Thank you for your patience.
Do I get to shake the hand of the person that came up with “NJIT S.O.S”?
@Ken: NJIT’s Office of Strategic Communications in conjunction with some of NJIT’s IT staff.
Why does the school have one enterprise array?
@Fez: We have more than one disk array. This hardware failure happened to occur in a large array that has many important systems distributed inside it.
What’s wrong with my NJIT website?
Student websites on AFS are up & working as of Friday evening.
When is the system going to be up and running again?
Keep watching this space for updates.
System status is posted on http://ist.njit.edu as well as here.
Can you please explain on the blog how an outage such as this can occur? I thought you had enterprise-class technology in the backrooms of NJIT.
@Mark: NJIT does have enterprise-class hardware, with redundancies. But no hardware is infallible, and sometimes multiple hardware failures happen, even to redundant, constantly monitored parts. The hardware failure isn’t what took the most time; because of the importance of the data involved, we were extremely careful to run the proper integrity checks on the data and drives, which, across 220+ hard disks and 54 terabytes of data, is not a quick or simple process.
The alternative — restoring the data from off-site backups — would’ve taken an order of magnitude longer.
The Admission portal is now available.
The online Admission application is operational.