Thursday, December 09, 2010

RAID setup

RAID stands for Redundant Array of Independent Disks. There are many forms and flavors of RAID, but the goal is the same: data storage and rapid retrieval. RAID [disk arrays] typically provide two or more linked hard drives that either back up or speed up data storage and delivery. The simplest form, RAID 0, provides disk striping, which uses two or more hard drives to speed up data storage and retrieval. RAID 1 provides mirroring, which means the data is duplicated across two or more hard drives. That way if one drive fails, the other can still provide the information, while the failed drive is being replaced. Other RAID levels, such as RAID 6 provide even more redundancy, allowing two or more disks to fail before data is lost.

I recently designed a web server for a local translation agency, and I was asked to provide a RAID solution for worldwide data delivery at a reasonable cost. The server is running Linux CentOS, a Red Hat clone. 

A RAID card can provide a hardware-based RAID solution, but a good RAID card is expensive, ranging from $150 to $650. The RAID card determines which drive receives data in which phase of the write process, and where that data is stored. However there are two other methods of creating a RAID that do not require an additional card.

The server's motherboard has a SoftRAID on chip, allowing me to set up a RAID in the BIOS, but Cent OS requires that its boot partition be separate from the RAID, and with Intel SoftRAID it is not easy to create a small, non-RAIDed partition. The SoftRAID wants to put the whole disk into the RAID, while we only need a small space on one disk for the boot partition.

I chose to use an open source tool called mdadm, multiple device administrator, to create a software RAID called an md (multiple device). Md0 uses four 1Tb hard drives to stripe and mirror data for redundancy and speed. I chose a RAID 1+0 array, because it increases the speed of disk access (RAID 0) and mirrors the data (RAID 1), simultaneously. That means that any one drive can fail without data loss, and up to two drives can fail if they are in separate mirrors.

The mdadm configuration tool also allows the administrator to configure where on the disk to write. I chose to use an f2 layout, or "far" layout. In this case, "far" means that data is written to two locations on the drive, one that is quicker for the drive to access, and one that is "farther" for the drive to access. While this means that write speed is slower (the drive spindle has to move farther to write), read speed (from the closer block) is much faster, coming close to, or equaling RAID 0 speeds. As users worldwide hit the database, I am assuring optimal availability and redundancy.

No comments: