Saturday, August 23, 2014

This Is What Happens When You Daisy-Chain 200,000 Hard Drives





120 petabytes, 120 million gigabytes, 24 billion MP3s, 1 trillion files. That's how much IBM's new storage array holds: nearly an order of magnitude more data than the largest current system, which makes it the biggest hard drive array ever built.
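
For a sense of scale, the figures above can be cross-checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 PB = 10^15 bytes), takes the 200,000 drive count from the headline, and treats 120 PB as the total spread evenly across the drives, none of which IBM has spelled out. The variable names are purely illustrative.

```python
# Back-of-the-envelope check of the capacity figures (decimal units assumed).
PB = 10**15
GB = 10**9

total_bytes = 120 * PB

# 120 PB expressed in gigabytes: should come out to 120 million.
gigabytes = total_bytes // GB

# Implied average sizes if the array held 24 billion MP3s or 1 trillion files.
bytes_per_mp3 = total_bytes // 24_000_000_000
bytes_per_file = total_bytes // 1_000_000_000_000

# Implied capacity per drive, assuming an even split over 200,000 drives.
bytes_per_drive = total_bytes // 200_000

print(f"{gigabytes:,} GB total")
print(f"{bytes_per_mp3 / 10**6:.0f} MB per MP3")
print(f"{bytes_per_file / 10**3:.0f} KB per file")
print(f"{bytes_per_drive / GB:.0f} GB per drive")
```

In other words, the numbers imply roughly 5 MB per MP3, 120 KB per file and about 600 GB per drive, which are all plausible for ordinary hardware and everyday files.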



The array is composed of 200,000 conventional hard drives working together. To avoid bottlenecks when reading or writing data, the array uses a proprietary IBM file system known as GPFS (General Parallel File System). GPFS spreads each file across multiple drives, so the computer can read or write multiple parts of the file simultaneously. That also drastically cuts the time spent indexing, allowing the system to blow through 10 billion files in just 43 minutes, about four times faster than the previous three-hour record. And to keep from losing data when a drive fails, the system automatically migrates files from the dead drive to its replacement, minimizing downtime and providing a fail-proof backup without affecting performance.
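
The idea of spreading a file across drives (striping) can be sketched in a few lines. To be clear, this is not GPFS code or its API, just a toy illustration of the general technique: the "drives" are in-memory lists, the round-robin placement and the tiny 4-byte stripe size are assumptions chosen for readability, and the names `stripe` and `read_striped` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per block; unrealistically small, for illustration only


def stripe(data: bytes, n_drives: int, stripe_size: int = STRIPE_SIZE) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them out round-robin."""
    drives: list[list[bytes]] = [[] for _ in range(n_drives)]
    for offset in range(0, len(data), stripe_size):
        block_no = offset // stripe_size
        drives[block_no % n_drives].append(data[offset:offset + stripe_size])
    return drives


def read_striped(drives: list[list[bytes]]) -> bytes:
    """Fetch every drive's blocks concurrently, then interleave them back in order."""
    # Each worker stands in for one drive being read independently of the others.
    with ThreadPoolExecutor(max_workers=len(drives)) as pool:
        per_drive = list(pool.map(list, drives))

    # Block k lives on drive k % n at position k // n, so walk row by row.
    out = []
    for row in range(max(len(blocks) for blocks in per_drive)):
        for blocks in per_drive:
            if row < len(blocks):
                out.append(blocks[row])
    return b"".join(out)


data = b"abcdefghijklmnopqrstuvwxyz"
drives = stripe(data, n_drives=3)
restored = read_striped(drives)
print(drives[0], restored == data)
```

Because consecutive blocks sit on different drives, a large read can keep several drives busy at once instead of waiting on a single one, which is the effect the paragraph above describes.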



While IBM built this specific array for an unnamed client, the technology could one day allow for detailed weather system simulations, seismic activity monitoring, cloud storage infrastructure and other data-intensive projects.

