
4. EtherDrive storage and Linux Software RAID

Some AoE devices are internally redundant. A Coraid SR1521, for example, might export a 14-disk RAID 5 as a single 9.75-terabyte LUN. In that case, the AoE target itself performs the RAID, enhancing performance and reliability.

You can also perform RAID on the AoE initiator. Linux Software RAID can increase performance by striping over multiple AoE targets and reliability by using data redundancy. Reading the Linux Software RAID HOWTO before you start to work with RAID will likely save time in the long run. The Linux kernel has an "md" driver that performs the Software RAID, and there are several tool sets that allow you to use this kernel feature.

The main software package for using the md driver is mdadm. Less popular alternatives include the older raidtools package (discussed in the Archives below) and EVMS.

4.1 Example: RAID 5 with mdadm

In this example we have five AoE targets in shelves 0-4, with each shelf exporting a single LUN 0. The following mdadm command uses these five AoE devices as RAID components, creating a level-5 RAID array. The md configuration information is stored on the components themselves in "md superblocks", which can be examined with another mdadm command.

# mdadm -C -n 5 --level=raid5 --auto=md /dev/md0 /dev/etherd/e[0-4].0
mdadm: array /dev/md0 started.
# mdadm --examine /dev/etherd/e0.0
/dev/etherd/e0.0:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 46079e2f:a285bc60:743438c8:144532aa (local to host ellijay)
...
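Because e[0-4].0 in the create command is a shell glob, it only expands to device nodes that already exist. If the aoe driver has not yet discovered one of the shelves, the pattern expands to fewer names than "-n 5" expects, and if none exist it is passed to mdadm unexpanded. The following is a minimal sketch, assuming a POSIX shell and the /dev/etherd/eN.0 naming used in this example, that checks every expected component before the array is created. The function name check_components and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the AoE block devices e0.0 .. e<N-1>.0
# exist before they are handed to "mdadm -C". On success it prints the
# device names on one line; otherwise it names the first missing device
# on stderr and returns 1.
# Usage: check_components NSHELVES [DEVDIR]
check_components() {
    n=$1
    devdir=${2:-/dev/etherd}    # default matches the naming used above
    list=
    i=0
    while [ "$i" -lt "$n" ]; do
        dev="$devdir/e$i.0"     # shelf i, LUN 0
        if [ ! -b "$dev" ]; then
            echo "missing AoE device: $dev" >&2
            return 1
        fi
        list="$list $dev"
        i=$((i + 1))
    done
    echo "${list# }"            # drop the leading space
}
```

It could be used ahead of the create command, for example ($devs is deliberately left unquoted so that it splits into separate arguments):

# devs=$(check_components 5) && mdadm -C -n 5 --level=raid5 --auto=md /dev/md0 $devs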

The /proc/mdstat file contains summary information about the RAID as reported by the kernel itself.

# cat /proc/mdstat 
Personalities : [raid5] [raid4] 
md0 : active raid5 etherd/e4.0[5] etherd/e3.0[3] etherd/e2.0[2] etherd/e1.0[1] etherd/e0.0[0]
      5860638208 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [>....................]  recovery =  0.0% (150272/1465159552) finish=23605.3min speed=1032K/sec
      
unused devices: <none>
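The recovery line in this output can be read from a script. Below is a minimal sketch, assuming the /proc/mdstat layout shown above: an "mdN :" line starts each array's stanza, and a "recovery =" or "resync =" line appears while the array is rebuilding. The function name md_progress is hypothetical, and /proc/mdstat is a human-readable report rather than a stable interface, so treat this as illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the recovery/resync percentage of one md array
# (e.g. "0.0" for the /proc/mdstat output above). Prints nothing if the
# array is not rebuilding or is not listed.
# Usage: md_progress ARRAY [MDSTAT_FILE]
md_progress() {
    awk -v md="$1" '
        # "mdN : ..." starts a new array stanza; note whether it is ours.
        /^md[0-9]+ :/ { inside = ($1 == md); next }
        # A blank line ends the stanza.
        /^[ \t]*$/ { inside = 0; next }
        # In "...  recovery =  0.0% (...)" the value follows the "=" field.
        inside && /(recovery|resync) =/ {
            for (i = 1; i < NF; i++)
                if ($i == "=") {
                    pct = $(i + 1)
                    sub(/%$/, "", pct)
                    print pct
                    exit
                }
        }
    ' "${2:-/proc/mdstat}"
}
```

For example, "md_progress md0" prints the percentage for /dev/md0 from the live /proc/mdstat; the optional second argument lets you point it at a saved copy instead.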

Until md finishes initializing the parity of the RAID, performance is sub-optimal, and the RAID will not be usable if one of the components fails during initialization. After initialization is complete, the md device can continue to be used even if one component fails.
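A script that should not proceed until initialization has finished (for example, before a benchmark) can poll /proc/mdstat until the array's rebuild line disappears. This is a minimal sketch, assuming the mdstat layout shown above; the function names md_rebuilding and md_wait_idle and the default 60-second polling interval are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if ARRAY's stanza in the mdstat file has a
# "recovery" or "resync" line (including "resync=DELAYED"), 1 otherwise.
# Usage: md_rebuilding ARRAY [MDSTAT_FILE]
md_rebuilding() {
    awk -v md="$1" '
        /^md[0-9]+ :/ { inside = ($1 == md); next }
        /^[ \t]*$/    { inside = 0; next }
        inside && /(recovery|resync)[ ]*=/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mdstat}"
}

# Hypothetical helper: sleep until ARRAY is no longer rebuilding.
# Usage: md_wait_idle ARRAY [INTERVAL_SECONDS] [MDSTAT_FILE]
md_wait_idle() {
    while md_rebuilding "$1" "${3:-/proc/mdstat}"; do
        sleep "${2:-60}"
    done
}
```

Recent mdadm releases also provide "mdadm --wait /dev/md0", which waits for resync or recovery activity to finish and may be preferable where it is available.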

Later the array can be stopped in order to shut it down cleanly in preparation for a system reboot or halt.

# mdadm -S /dev/md0

In a system init script (see the aoe-init example in the FAQ), an mdadm command can assemble the RAID components using the configuration information that was stored on them when the RAID was created.

# mdadm -A /dev/md0 /dev/etherd/e[0-4].0
mdadm: /dev/md0 has been started with 5 drives.
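The assemble and stop commands fit naturally into an init script. The following is a minimal sketch of just that part, assuming the five-shelf layout of this example; it is not the aoe-init script from the FAQ. The function name raid_ctl and the MDADM, MD_DEV, and MD_COMPONENTS variables are hypothetical. MDADM can be set to "echo" to preview the commands without running them.

```shell
#!/bin/sh
# Hypothetical init-script fragment for the RAID 5 example.
# Set MDADM=echo to print the mdadm commands instead of running them.
MDADM=${MDADM:-mdadm}
MD_DEV=/dev/md0
# Components are listed explicitly rather than with a glob.
MD_COMPONENTS="/dev/etherd/e0.0 /dev/etherd/e1.0 /dev/etherd/e2.0 /dev/etherd/e3.0 /dev/etherd/e4.0"

raid_ctl() {
    case "$1" in
        start)
            # Assemble from the md superblocks written at creation time.
            # MD_COMPONENTS is unquoted so it splits into separate arguments.
            $MDADM -A "$MD_DEV" $MD_COMPONENTS
            ;;
        stop)
            # Any filesystem on the array must be unmounted first.
            $MDADM -S "$MD_DEV"
            ;;
        *)
            echo "usage: raid_ctl {start|stop}" >&2
            return 2
            ;;
    esac
}
```

An init script would typically call raid_ctl "$1" from its own start and stop handling, after the aoe driver has been loaded and the AoE devices have been discovered.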

To make an XFS filesystem on the RAID array and mount it, issue the following commands:

# mkfs -t xfs /dev/md0
# mkdir /mnt/raid
# mount /dev/md0 /mnt/raid

Once md has finished initializing the RAID, the storage is single-fault tolerant: any of the components can fail without making the storage unavailable. After a single component has failed, the md device is said to be in a "degraded" state. A degraded array remains usable, but it cannot survive the failure of a second component.
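A degraded array shows up in /proc/mdstat as an [n/m] pair with fewer working components (m) than configured slots (n), together with an underscore in the [UUUU_] string. The following is a minimal sketch that checks this, assuming the mdstat layout shown earlier. The function name md_is_degraded and its exit codes are hypothetical. Note that during the initial parity build shown earlier, the array appears as [5/4], so this check reports it as degraded until initialization completes.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if ARRAY is degraded (fewer working components
# than configured), 1 if it is complete, 2 if ARRAY is not listed.
# Usage: md_is_degraded ARRAY [MDSTAT_FILE]
md_is_degraded() {
    awk -v md="$1" '
        /^md[0-9]+ :/ { inside = ($1 == md); if (inside) seen = 1; next }
        /^[ \t]*$/    { inside = 0; next }
        # The status line carries "[n/m]": n configured slots, m working.
        inside && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), f, "/")
            found = 1
            degraded = (f[2] + 0 < f[1] + 0)
        }
        END {
            if (!seen) exit 2       # array not listed
            if (!found) exit 1      # no [n/m] field found for this array
            exit (degraded ? 0 : 1)
        }
    ' "${2:-/proc/mdstat}"
}
```

A cron job could, for instance, run "md_is_degraded md0 && echo md0 is degraded" and mail the result.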

Adding hot spares makes the array even more robust. With a hot spare available, md can bring a new component into the RAID as soon as one of its components fails, so the array returns to its normal, non-degraded state as quickly as possible. You can follow the progress of this rebuild in /proc/mdstat.
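mdadm offers two common ways to provide a hot spare: pass -x (--spare-devices) when creating the array, or use --add on a running array, which keeps the new device as a spare when no component is missing. This is a minimal sketch of both approaches, assuming a hypothetical sixth AoE target at /dev/etherd/e5.0 that is not part of the example above. The function names are also hypothetical. Set MDADM=echo to preview the commands.

```shell
#!/bin/sh
# Hypothetical sketch: two ways to give /dev/md0 a hot spare.
# /dev/etherd/e5.0 is an assumed sixth AoE target, not part of the example.
MDADM=${MDADM:-mdadm}

# Option 1: create the array from six devices, one of them a spare (-x 1).
create_with_spare() {
    $MDADM -C -n 5 -x 1 --level=raid5 --auto=md /dev/md0 \
        /dev/etherd/e0.0 /dev/etherd/e1.0 /dev/etherd/e2.0 \
        /dev/etherd/e3.0 /dev/etherd/e4.0 /dev/etherd/e5.0
}

# Option 2: add a device to an existing array; on a healthy array
# md keeps the added device as a spare.
add_spare() {
    $MDADM /dev/md0 --add "${1:-/dev/etherd/e5.0}"
}
```

After either command, the spare appears in /proc/mdstat marked with "(S)".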

The write-intent bitmap feature can dramatically reduce the time needed to resynchronize a component that fails and is later added back to the array, because md only has to rebuild the regions that were written while the component was missing. Reducing the time the RAID spends in degraded mode makes a double fault less likely. Please see the mdadm man page for details.
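This is a minimal sketch of the bitmap workflow, assuming an mdadm and kernel recent enough to support internal bitmaps. "--grow --bitmap=internal" adds a bitmap to an existing array. After md has marked a component faulty, "--remove" followed by "--re-add" returns it to the array. The bitmap only helps if it was already in place before the component dropped out. The device /dev/etherd/e2.0 and the function names are hypothetical. Set MDADM=echo to preview the commands.

```shell
#!/bin/sh
# Hypothetical sketch of the write-intent bitmap workflow for /dev/md0.
MDADM=${MDADM:-mdadm}

# Add an internal write-intent bitmap to an existing array.
enable_bitmap() {
    $MDADM --grow --bitmap=internal /dev/md0
}

# Return a component that dropped out, e.g. a shelf that was briefly
# unreachable. --remove only succeeds once md has marked the device
# faulty; with a bitmap, the re-added device needs only a partial resync.
readd_component() {
    if [ -z "$1" ]; then
        echo "usage: readd_component DEVICE" >&2
        return 2
    fi
    $MDADM /dev/md0 --remove "$1" &&
        $MDADM /dev/md0 --re-add "$1"
}
```

For example, "readd_component /dev/etherd/e2.0" once shelf 2 is reachable again.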

4.2 Important notes

  1. Some Linux distributions come with an mdmonitor service running by default. Unless you have configured mdmonitor to do what you want, consider turning this service off with "chkconfig mdmonitor off" and "/etc/init.d/mdmonitor stop", or your system's equivalent commands. If mdadm is running in its "monitor" mode without being properly configured, it may interfere with failover to hot spares, the stopping of the RAID, and other actions.
  2. Some 2.6 kernels have a problem determining whether an I/O device is idle, and on these kernels RAID initialization is about five times slower than it needs to be. You can work around the problem by raising the md resync speed limits (the values are in kibibytes per second):
    echo 100000 > /proc/sys/dev/raid/speed_limit_max
    echo 100000 > /proc/sys/dev/raid/speed_limit_min
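The speed_limit_min and speed_limit_max settings are system-wide, and a raised speed_limit_min lets resync traffic compete with normal I/O on every md array. It can therefore be worth putting the old values back once initialization is done. Values written this way also do not survive a reboot. The following is a minimal sketch, assuming a POSIX shell and root privileges, that records the current limits before raising them. The function names and the RAID_SYSCTL_DIR override, which exists only so that the sketch can be tried against a scratch directory, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: raise the md resync speed limits, then restore the
# previous values. RAID_SYSCTL_DIR defaults to the real location.
RAID_SYSCTL_DIR=${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}
SAVED_MIN=
SAVED_MAX=

raise_resync_limits() {
    rate=${1:-100000}   # KiB/s, the value used in the workaround above
    SAVED_MIN=$(cat "$RAID_SYSCTL_DIR/speed_limit_min") || return 1
    SAVED_MAX=$(cat "$RAID_SYSCTL_DIR/speed_limit_max") || return 1
    echo "$rate" > "$RAID_SYSCTL_DIR/speed_limit_max" &&
        echo "$rate" > "$RAID_SYSCTL_DIR/speed_limit_min"
}

restore_resync_limits() {
    # Refuse to restore if nothing was saved by raise_resync_limits.
    [ -n "$SAVED_MIN" ] && [ -n "$SAVED_MAX" ] || return 1
    echo "$SAVED_MIN" > "$RAID_SYSCTL_DIR/speed_limit_min" &&
        echo "$SAVED_MAX" > "$RAID_SYSCTL_DIR/speed_limit_max"
}
```

Call raise_resync_limits before creating the array, and call restore_resync_limits from the same shell once /proc/mdstat no longer shows a recovery line, since the saved values live only in that shell's variables.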
    

