RAID explained and how it safeguards data - manzur fahim

18th September, 2014

This is one subject that gets discussed very often. When it comes to photography, many of us simply store our photos and videos on our computer's hard drive, thinking that is good enough. I have done this myself, and I learned the hard way when the hard drive crashed or the file system got corrupted and I lost many files, including photos, music and videos. A simple backup could have protected all the photos and videos I lost, and I deeply regret not having one in place. But the disaster did some good too: it taught me to take steps and put redundancy and backup plans in place, so that it does not happen so easily again.

A good backup routine and a sound disaster recovery plan are vital for safekeeping photos and videos. With the growing storage needs of both professional and amateur photographers, they have never been more important. The most common thing I hear when the subject of backups comes up is: “I moved all my photos to the new USB drive I purchased, so they are safe”. They are not. Having your photos on a USB drive does not change the fact that you have only one copy of them, and USB drives break too, sometimes more often than internal computer drives. The simplest form of backup is to keep a duplicate copy of the photos on another hard drive.

RAID is a popular term when it comes to setting up a fault-tolerant storage subsystem. Once set up, a RAID array shows up on your computer as a normal hard drive with a partition, and it can be used just like one. The difference is that it uses more than one hard drive behind the scenes to give you some degree of protection against drive failures. RAID is an acronym for Redundant Array of Inexpensive Disks (nowadays often expanded as Redundant Array of Independent Disks). Almost all modern computers and operating systems can set up a simple RAID system using more than one hard drive.

Different types of RAID array:

There are several types of RAID array, and you can choose one according to what you need. I will describe the most common ones here, and hopefully this will help you.

RAID 0 :

RAID 0 is basically no RAID at all, in the sense that it offers no protection against drive failures. What it does is combine two or more drives into a single, bigger storage space. Because it reads and writes data on more than one drive simultaneously, it can do so at a much faster rate than a single hard drive. But if a single hard drive in a RAID 0 array fails, the whole array fails and all data is corrupted and lost.

Example: 2 x 2TB hard drives in RAID 0 will give you a single partition of 4TB usable space, 100% of the total capacity of the drives in the array, with no redundancy.
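To make the striping idea concrete, here is a minimal Python sketch (not how any real controller is implemented) that deals fixed-size blocks round-robin across the drives. The function names `stripe` and `unstripe` and the tiny 2-byte block size are made up for illustration; real arrays use much larger blocks.

```python
def stripe(data: bytes, num_drives: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across drives (RAID 0 style)."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Block k goes to drive k mod num_drives, so consecutive blocks land on different drives
        drives[(offset // block_size) % num_drives].append(block)
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Read the blocks back in the same round-robin order to rebuild the original data."""
    out = bytearray()
    longest = max(len(d) for d in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                out += drive[row]
    return bytes(out)


drives = stripe(b"ABCDEFGH", num_drives=2, block_size=2)
print(drives)                        # [[b'AB', b'EF'], [b'CD', b'GH']]
print(unstripe(drives))              # b'ABCDEFGH'
print(unstripe([drives[0], []]))     # b'ABEF': the CD and GH blocks are gone for good
```

Losing either drive leaves only every other block, which is why a single failure takes the whole RAID 0 array down.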

As you can see in the picture above, the data is split into blocks spread across the two drives that make up the RAID 0 array. None of the blocks are duplicated, so there is no redundancy.

Advantage of RAID 0: Very high read and write speed because different data blocks can be simultaneously accessed from different hard drives.

Disadvantage of RAID 0: No fault-tolerance. One failed drive will corrupt the whole array.

Usage: It can be used for Lightroom or Photoshop cache storage, where high read speed is necessary. But you could just get an SSD instead, which will give you very fast read performance.

RAID 1 :

RAID 1, also known as mirroring, is usually formed of two drives, where one drive is a duplicate copy of the other. A RAID 1 array shows up on your computer just like a normal hard drive, but uses two or more hard drives in the background. So if one hard drive fails, the array stays online with all the data, and when you replace the failed drive, it starts rebuilding the array in the background. Once the rebuild is complete, your RAID 1 array is fault-tolerant again.

Example: 2 x 2TB hard drives will give you a single partition of 2TB, 50% of the total capacity of the hard drives in the array, with a fault tolerance of a single drive failure.

From the photo, you can see that every data block is duplicated in both hard drives. This way, one hard drive can fail, and you will still have your data available. It has a 50% overhead, as it is using 50% of the total hard drive capacity to mirror your data.
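Mirroring can be sketched in a few lines of Python. The `Mirror` class below is a hypothetical toy model, with each drive represented as a dictionary of blocks: writes go to every healthy drive, reads are served by any survivor, and a rebuild copies a surviving drive onto the replacement.

```python
from typing import Optional


class Mirror:
    """Toy RAID 1: each write goes to every member drive; reads come from any drive still alive."""

    def __init__(self, num_drives: int = 2) -> None:
        # Each drive is a dict of block number -> data; None marks a failed drive.
        self.drives: list[Optional[dict[int, bytes]]] = [{} for _ in range(num_drives)]

    def write(self, block: int, data: bytes) -> None:
        for drive in self.drives:
            if drive is not None:
                drive[block] = data  # the same data is duplicated on every healthy drive

    def read(self, block: int) -> bytes:
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise OSError("all mirror members have failed")

    def fail(self, index: int) -> None:
        self.drives[index] = None

    def replace_and_rebuild(self, index: int) -> None:
        # Copy every block from a surviving member onto the new, empty drive.
        source = next(d for d in self.drives if d is not None)
        self.drives[index] = dict(source)


array = Mirror()
array.write(0, b"photo-0001.raw")
array.fail(0)
print(array.read(0))  # b'photo-0001.raw', served by the surviving drive
array.replace_and_rebuild(0)
```

After the rebuild both members hold the same blocks again, so the array can once more survive a single drive failure.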

Advantages of RAID 1: Duplicate data in different drives (available as one copy), protects against a single drive failure.

Disadvantages of RAID 1: Requires two hard drives, with only 50% of the total capacity usable. Read and write speeds are equal to or lower than a single drive with software and hybrid RAID controllers; hardware RAID controllers can be faster, mainly due to caching.

RAID 5 :

RAID 5 is the most common parity-based RAID array. It uses three or more drives and offers fault tolerance with less overhead than a RAID 1 array. In a RAID 5 array, data is protected by means of parity, and I believe a picture is worth a thousand words here :)

As you can see in the photo, four hard drives of the same capacity are being used to create a RAID 5 array. Numbered data blocks (A1, B1 and so on) are spread across all four hard drives. You can also see blocks named Ap, Bp, Cp and Dp, one on each of the hard drives.

These Ap, Bp, Cp and Dp blocks are parity blocks, calculated with a bitwise XOR across the data blocks of the same stripe (row), so that any one missing block in a stripe can be recomputed from the others. When one hard drive fails (let's assume the first hard drive has failed here), the RAID 5 array becomes degraded. Once we replace the failed drive, the parity blocks Ap, Bp and Cp on the other three hard drives, together with the surviving data blocks in each stripe, are used to reconstruct data blocks A1, B1 and C1 onto the new drive, and then the parity block Dp is recalculated from the D data blocks. This rebuild takes longer than rebuilding a RAID 1 or RAID 10 array, because every surviving drive has to be read in full and the parity calculation repeated for every stripe.
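The XOR trick behind RAID 5 parity is easy to try out. In this sketch, `xor_blocks` is a hypothetical helper and the byte values are invented: it computes Ap from A1, A2 and A3, then pretends the first drive has failed and rebuilds A1 from the remaining blocks. It works because XOR-ing a value with itself gives zero, so XOR-ing the parity with the surviving blocks cancels them out and leaves only the missing one.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bitwise XOR of equal-length blocks, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a four-drive RAID 5: three data blocks plus their parity block.
a1, a2, a3 = b"\x10\x20\x30", b"\x0f\x0e\x0d", b"\xaa\xbb\xcc"
ap = xor_blocks(a1, a2, a3)

# The drive holding A1 fails: XOR the surviving data blocks with the parity block.
rebuilt_a1 = xor_blocks(a2, a3, ap)
print(rebuilt_a1 == a1)  # True
```

Recalculating Dp on the new drive works the same way: it is just the XOR of the D data blocks in that stripe.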

Regardless of the number of drives, exactly one drive's worth of capacity is used for parity data, spread across all the drives.

Example 1: 3 x 2TB hard drives will give you a single partition of 4TB, 66.7% of the total capacity of the hard drives in the array, with a fault tolerance of a single drive failure.

Example 2: 4 x 2TB hard drives will give you a single partition of 6TB, 75% of the total capacity of the hard drives in the array, with a fault tolerance of a single drive failure.

Advantages of RAID 5: Less overhead than RAID 1; even the smallest RAID 5 gives you 66.7% usable storage. Higher read speed, as data can be read from different hard drives at the same time.

Disadvantages of RAID 5: Slower write speed, because parity has to be recalculated and written whenever data changes. A rebuild takes a lot longer than with RAID 1 or RAID 10, due to the parity calculations and reconstruction of data.

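Because RAID 5 relies on parity calculations to reconstruct a failed drive, it is more susceptible to drive read errors than RAID 1 or RAID 10: during a rebuild every surviving drive must be read, and with no redundancy left, a read error on any of them can leave that stripe unrecoverable. I will include a link explaining read errors and their effects once that article is done.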
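To get a rough feel for why the amount of data read during a rebuild matters, here is a naive estimate of the chance of hitting at least one unrecoverable read error (URE). It assumes errors are independent and occur at exactly one per 10^14 bits read, a figure commonly quoted on consumer drive datasheets. That rate, the function name `chance_of_read_error` and the model itself are assumptions for illustration; actual error rates can differ a lot from the datasheet figure, so please do not read the result as a prediction.

```python
import math


def chance_of_read_error(bytes_read: float, bits_per_error: float = 1e14) -> float:
    """Naive probability of at least one URE while reading bytes_read bytes.

    Assumes independent errors at exactly the rated rate of one per bits_per_error
    bits. This is an illustration of scale, not a model of real drives.
    """
    bits = bytes_read * 8
    # P(no error) = (1 - 1/N) ** bits; log1p keeps this accurate for tiny per-bit rates.
    return 1.0 - math.exp(bits * math.log1p(-1.0 / bits_per_error))


# Rebuilding a 4 x 2TB RAID 5 means reading the three surviving 2TB drives in full.
print(f"{chance_of_read_error(3 * 2e12):.0%}")        # 38% at an assumed 1 in 10^14 bits
print(f"{chance_of_read_error(3 * 2e12, 1e15):.0%}")  # 5% at an assumed 1 in 10^15 bits
```

Whatever the real rate is, the estimate grows with the amount of data that has to be read, and a parity rebuild has to read every surviving drive.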

RAID 6 :

A RAID 6 array works like a RAID 5 array, but instead of using the equivalent capacity of a single hard drive for parity like RAID 5 does, it uses the equivalent capacity of two hard drives. It offers an additional layer of protection over RAID 5 by using double parity, at the expense of one extra hard drive.

As you can see in the photo, five hard drives have been used to create a RAID 6 array, and each stripe has two separate parity blocks stored on different hard drives. This gives an additional layer of protection over RAID 5: should a read error occur during a rebuild, the rebuild can still continue.
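The two parity blocks in RAID 6 are normally not two copies of the same thing. A common scheme, and the one documented for Linux software RAID 6, keeps a plain XOR parity P plus a second parity Q computed with Galois-field arithmetic, which is what allows any two lost data blocks in a stripe to be rebuilt. Below is a simplified Python illustration of that P+Q idea, assuming the GF(2^8) polynomial 0x11D and generator 2; the function names are made up, and a given hardware controller may use a different code altogether.

```python
from functools import reduce
from typing import Optional


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return product


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every non-zero element satisfies a^255 = 1, so a^254 is its inverse.
    return gf_pow(a, 254)


def pq_parity(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is the plain XOR of the data blocks; Q weights block i by 2^i before XOR-ing."""
    p = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data))
    q = bytes(
        reduce(lambda a, b: a ^ b, (gf_mul(gf_pow(2, i), byte) for i, byte in enumerate(column)))
        for column in zip(*data)
    )
    return p, q


def recover_two(data: list[Optional[bytes]], x: int, y: int, p: bytes, q: bytes) -> tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y (x != y) from the surviving blocks plus P and Q."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    survivors = [i for i in range(len(data)) if i not in (x, y)]
    dx, dy = bytearray(), bytearray()
    for k in range(len(p)):
        pxy, qxy = p[k], q[k]
        for i in survivors:
            pxy ^= data[i][k]                        # leaves D_x ^ D_y
            qxy ^= gf_mul(gf_pow(2, i), data[i][k])  # leaves 2^x*D_x ^ 2^y*D_y
        bx = gf_mul(qxy ^ gf_mul(gy, pxy), denom_inv)
        dx.append(bx)
        dy.append(pxy ^ bx)
    return bytes(dx), bytes(dy)


blocks = [b"\x01\x02", b"\x33\x44", b"\xf0\x0f"]
p, q = pq_parity(blocks)
damaged = [blocks[0], None, None]  # the drives holding blocks 1 and 2 have failed
print(recover_two(damaged, 1, 2, p, q) == (blocks[1], blocks[2]))  # True
```

The extra Galois-field work for Q is also why RAID 6 writes and rebuilds need more computation than RAID 5.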

RAID 6 overhead is highest, 50%, with the minimum of four hard drives. The overhead decreases as you add more hard drives, since the parity always takes up the equivalent space of two hard drives.

Example: A 6 x 2TB RAID 6 array will give you a usable space of 8TB, with an overhead of 33.3% of the total storage space.

Advantages of RAID 6: An additional layer of protection; the array survives up to two drive failures before any data is lost.

Disadvantages of RAID 6: Read speed is good, but write speed is slow due to twice the amount of parity calculations. Rebuilding takes a lot longer due to the complex calculations and reconstruction.

RAID 10 :

A RAID 10 array is a combination of RAID 1 and RAID 0 (1 + 0), hence the name RAID 10. It needs a minimum of four hard drives, and always an even number of them. Each pair of hard drives forms a RAID 1 array, and the RAID 1 arrays then form a RAID 0 array.

As you can see from the photo, each pair of hard drives forms a RAID 1, and the RAID 1s then form a RAID 0 array. Data blocks are spread across the different RAID 1 arrays, which are striped together to form the RAID 0 upper layer.

A RAID 10 array can sustain up to one drive failure in each RAID 1, a total of two in this case. Because each RAID 1 is mirrored, it can survive the loss of one drive and rebuild once the drive is replaced. But because the RAID 1s are striped together in RAID 0 mode, all data will be lost if both hard drives in a single RAID 1 fail.
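Which two-drive failures a four-drive RAID 10 can survive is easy to enumerate. This sketch assumes drives 0 and 1 form the first mirror pair and drives 2 and 3 the second (a hypothetical numbering; controllers may pair drives differently), and checks every possible pair of failed drives.

```python
from itertools import combinations


def raid10_survives(failed: set[int], num_drives: int = 4) -> bool:
    """Assumes drives 2k and 2k+1 form mirror pair k; data survives unless a pair loses both members."""
    return all(not {2 * k, 2 * k + 1} <= failed for k in range(num_drives // 2))


for pair in combinations(range(4), 2):
    status = "survives" if raid10_survives(set(pair)) else "DATA LOST"
    print(pair, status)
```

Four of the six possible two-drive failures are survivable; the two fatal ones, (0, 1) and (2, 3), are exactly the cases where both drives of the same mirror pair fail.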

RAID 10 has a 50% overhead regardless of the number of drives used. For example, a six-drive RAID 10 will have three RAID 1 arrays combined in a RAID 0, providing usable space equal to three hard drives.
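The capacity rules for the RAID levels in this article can be summed up in one small function. This is a simplified sketch that assumes equally sized drives and ignores formatting overhead and controller metadata; `usable_capacity` is a hypothetical name, and RAID 1 is treated as a single mirror set. The loop reproduces the examples given for each level.

```python
def usable_capacity(level: str, drives: int, size_tb: float) -> float:
    """Usable space in TB for an array of equally sized drives (simplified rules)."""
    if level == "0":
        minimum, data_drives = 2, drives          # striping only, no redundancy
    elif level == "1":
        minimum, data_drives = 2, 1               # all members hold the same copy
    elif level == "5":
        minimum, data_drives = 3, drives - 1      # one drive's worth of parity
    elif level == "6":
        minimum, data_drives = 4, drives - 2      # two drives' worth of parity
    elif level == "10":
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        minimum, data_drives = 4, drives // 2     # one usable drive per mirror pair
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return data_drives * size_tb


# The examples from this article, all using 2TB drives.
for level, count in [("0", 2), ("1", 2), ("5", 3), ("5", 4), ("6", 6), ("10", 6)]:
    usable = usable_capacity(level, count, 2.0)
    print(f"RAID {level}: {count} x 2TB -> {usable:g}TB usable ({usable / (count * 2.0):.1%})")
```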

Advantages of RAID 10: Very good read and write performance, because the RAID 0 layer accesses data from all four (or more) hard drives. Rebuilds are also fast, as there is no parity to calculate: data is simply copied from the surviving drive in the same mirror pair.

Disadvantages of RAID 10: It uses 50% of the total drive capacity for redundancy, so it can be expensive. Also, two failed hard drives in the same RAID 1 pair are catastrophic.

Although RAID 6 offers double protection, it does so by means of parity, which is inherently risky because a rebuild depends heavily on the integrity of the other hard drives in the array. Please check this article where I explain the hidden risks in RAID 5 and RAID 6 arrays.

I am currently using RAID 10, made up of 4 x 4TB WD RE4 enterprise hard drives, on an LSI 9271-8i enterprise-grade hardware RAID controller, which has a lot of protective features that keep my RAIDs alive and secure. I have decided to use a hardware controller over software or hybrid ones, and I will explain why in another article.

I hope this article helps you, as it helped me when I was deciding on setting up RAID to protect my precious photos. If you have any questions or feedback, please contact me using the link above.

Manzur Fahim

All photos used on this page have been collected using Google search and were marked for "Non-commercial reuse"