ERROR–When Servers Get You Down

If you use the library website or Canvas, you may have noticed some issues on February 28. If you were doing schoolwork, Canvas probably would not load, or you may not have been able to open an article from one of our library’s popular databases, like ProQuest or ScienceDirect. While everything is back up and working now, the break in service shows that even major organizations have their hiccups.

What happened was that Amazon Web Services’ Simple Storage Service (S3), a system that provides cloud storage for many online services and companies, would not let its users access existing files or add new ones. The online services that rely on S3 for storage could not reach the files they needed to run.

Amazon released a statement at the end of the week to give additional information on the break: during a debugging process, a member of their S3 service team entered a command incorrectly, which removed a larger set of servers than intended. Those servers supported two other subsystems, and the damage was done. It took three hours to fix the problem and four hours to get the system back to normal operation.

Outages like this are rare, but because so many online services use this storage, any outage has far-reaching effects and leaves many of us wondering what to do when we can’t get to what we need.

In this case, the solution was just to wait it out. Once Amazon Web Services fixed its servers, the online services that depend on them restored their sites. By Wednesday, almost all affected services had returned to business as usual.

The other thing you can do is try to get in touch with a live person, like one of our librarians, to see what your options are. If you can’t get a resource you need because of a much bigger problem, we can check whether there’s another way to access it, find out what other resources are available, and help you evaluate those resources for credibility and for fit with your professor’s requirements.

Not even the biggest companies are immune to problems, and not every problem has a simple fix. But Amazon is making changes because of this event that should help prevent it from happening again. The server outage was a learning experience for many, and it’s encouraging to know that there are always people willing to help, research, and try to find solutions.