Coraid

SATA+RAID FAQ List


  • Q 1: Can my Linux system mount a filesystem created on a SR Appliance jbod disk?

    Yes. You can use the "losetup" command with a byte offset to access the data.

    Here's an example:

    losetup -o 3072 /dev/loop0 /dev/sdb
    mount /dev/loop0 /mnt
    ls /mnt
    

  • Q 2: How can I connect the console port to my server and remotely access the Coraid appliance?

    The low-level settings for the serial connection are documented in the "EtherDrive Storage Installation Guide".

    Here is a simple example. You can put a null-modem cable with female db9 connectors between the RAIDBlade and a Linux host's serial port. Then you can run kermit on the Linux host and, at kermit's prompt, do something like this:

    set line /dev/ttyS0
    set carrier-watch off
    log session
    connect
    

    To save time, you can put those lines in a file and run kermit with the name of the file as its argument.
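    As a minimal sketch of that approach, the following writes the four kermit commands shown above to a command file. The file name sr-console.kermit is a hypothetical choice, and /dev/ttyS0 is simply carried over from the example, so adjust both to your host.

```shell
#!/bin/sh
# Write the kermit commands from the example above to a command file.
# The default file name "sr-console.kermit" is an arbitrary, hypothetical choice.
write_kermit_script() {
    cat > "$1" <<'EOF'
set line /dev/ttyS0
set carrier-watch off
log session
connect
EOF
}

write_kermit_script "${KERMIT_SCRIPT:-sr-console.kermit}"
# Then start the console session with:  kermit sr-console.kermit
```

    Keeping the commands in a file also makes it easy to keep one file per serial port if the host is cabled to more than one appliance.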

  • Q 3: Can I read and write to the storage while the RAID is initializing/rebuilding?

    Yes, but keep in mind that RAID initialization creates the redundancy that protects you from disk failure. Before initialization is complete, your LUN will fail if one disk fails.

    Also, performance is lower during RAID initialization than it is after initialization is complete.

  • Q 4: Does the spare automatically become part of a RAID if a disk fails?

    Yes, a device from the spare pool can be used to replace any failed device. The spare will be recruited to replace the failed device, and then it will be "recovered", after which time the RAID will be fully redundant once again.

  • Q 5: Can I use host RAID software on top of two Coraid storage appliances to mirror the data and improve reliability?

    Yes, a LUN is just an AoE device as far as other hosts on the LAN are concerned. A Linux host could, for example, use Linux Software RAID 1 to mirror two LUNs.
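    A sketch of what that might look like with mdadm follows. The device names /dev/etherd/e0.0 and /dev/etherd/e1.0 (LUN 0 on shelves 0 and 1) and the array name /dev/md0 are assumptions for illustration; substitute the LUNs your host actually sees. The script only prints the command unless APPLY=1 is set, since creating an array overwrites metadata on the member devices.

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to execute it instead.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Mirror two AoE LUNs with Linux Software RAID 1.
# The device paths passed in are hypothetical examples.
mirror_luns() {
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$1" "$2"
}

mirror_luns /dev/etherd/e0.0 /dev/etherd/e1.0
```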

  • Q 6: How do I shut down the system without causing the RAID sets to re-initialize when I reboot?

    You can use the "halt" or "reboot" commands to cleanly shut down the system, including the RAIDs.

  • Q 7: What alarms and status messages do I get from the appliance? Where do they go or where should they go?

    The appliance sends syslog messages out from the first network interface. On another host on the LAN, you can configure the syslog daemon to receive messages from other hosts.

    These are some of the types of messages the appliance sends:

    • begin RAID parity building
    • finish RAID parity building
    • stop RAID parity building
    • begin RAID device recovery
    • abort RAID device recovery
    • complete RAID device recovery
    • no spare found
    • failed device

    The AoE appliances do not, in general, perform IP networking. The source IP in the syslog datagrams from any appliance is always the same: 205.185.197.30. You can send test messages from the appliance by using the syslog command after configuring it with "syslog -c".

    Some Linux hosts will not pass incoming packets to userspace if the source IP conflicts with the kernel's expectations. Those expectations are built from the receiving NIC's IP address and the Linux host's routing table, so you can prepare the kernel to receive the syslog messages from the appliance by setting the Linux host's AoE-side NIC to 205.185.197.1 or a similar IP (any address other than 205.185.197.30, which the appliance itself uses). The IP you choose should be unique on your network.
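    The host-side preparation could be sketched as below. The interface name eth1, the /24 prefix length, the use of rsyslog, and UDP port 514 are all assumptions rather than documented appliance settings; other syslog daemons are configured differently. Commands are only printed unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to execute it instead.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Give the AoE-side NIC an address in the same network as the
# appliance's syslog source IP (205.185.197.30).
# The interface name and the /24 prefix length are assumptions.
prepare_syslog_nic() {
    run ip addr add 205.185.197.1/24 dev "$1"
}

# Print an rsyslog configuration fragment that listens for UDP syslog.
# Assumes the receiving host runs rsyslog and the default port 514.
rsyslog_udp_fragment() {
    cat <<'EOF'
module(load="imudp")
input(type="imudp" port="514")
EOF
}

prepare_syslog_nic eth1
rsyslog_udp_fragment
```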

  • Q 8: Do all the disks in the appliance need to be the same size? Same vendor?

    No, you can use any combination of disks you like. However, you will maximize the amount of usable space by using disks of approximately the same size in each raid0, raid1, raid10, or raid5.

  • Q 9: Can I move my disks from an active SR to a spare SR and have them recognized?

    Yes. If you just want to move a LUN from one SR to another, you can use the eject and restore commands. They are documented in the SR Software User Manual, found on the SR Support Page.

    For a spare shelf, however, you will probably want to pre-configure the spare so that it matches the active shelf. The rest of this answer provides the details that will help you to configure your spare shelf so that it matches the active one.

    The configuration of an SR is stored in two places, on the flash disk and on the SATA disk(s).

    The configuration stored on the flash includes:

    • shelf address
    • syslog source/destination IP address
    • CEC configuration
    • password

    The configuration of the actual LUN/RAID is stored on the component disks. When an SR boots, the startup script reads this configuration from each disk and assembles the LUN/RAID elements belonging to the SR's assigned shelf address.

    If the standby and primary SR units have matching configuration stored on their flash disks, then the standby is ready to take the place of the primary unit. The above list may be followed to prepare the standby by setting the shelf address, etc.

    When the standby SR is needed, shut down the primary cleanly (if possible), move the disks to the standby shelf, and power it on. On boot the LUN/RAID elements will be started and the standby will behave just like the primary.

  • Q 10: Do you have any example configurations showing transfer rates?

    Yes, please see SR Performance Analysis for a performance comparison of all the RAID types.

  • Q 11: Can the SR Appliance use jumbo frames?

    Yes, jumbo frames are enabled in the 20060316 firmware release. An aoe6-25 or newer Linux driver is necessary to take advantage of this feature. No configuration is necessary on the SR Appliance to enable jumbo frames.

    The Linux client will need the MTU of its interface increased, and any switches between the Linux client and the SR Appliance will need jumbo frames enabled. Switches that support jumbo frames do not usually ship with them enabled by default.
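    On the Linux client, raising the MTU might look like the sketch below. The interface name eth1 and the MTU value 9000 are assumptions; use the interface that faces the SR Appliance and a value that your NIC and switches support. Commands are only printed unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to execute it instead.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Raise the MTU on the AoE-side interface, then show the link so the
# new value can be confirmed. Interface and MTU are example values.
enable_jumbo_frames() {
    run ip link set dev "$1" mtu "$2"
    run ip link show dev "$1"
}

enable_jumbo_frames eth1 9000
```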

  • Q 12: Can I use both ports of the SR Appliance to achieve redundant network paths?

    The SR Appliance does not currently support bonding. Any redundancy achieved by making both interfaces accessible from a client must be provided by the aoe initiator driver.

    The aoe6-33 and later aoe drivers contain a feature called multipathing that performs load balancing over multiple targets with the same shelf and slot. For more information on this feature and its implications for the SR1520, please see SR Redundancy and Throughput in Linux.

    The aoe6-32 and earlier aoe drivers handled AoE targets on multiple paths differently. The aoe driver broadcasts a discovery beacon once a minute. In these earlier drivers, if multiple MAC addresses respond for a particular shelf.slot address, the last one to respond becomes the new access destination. When a network path fails, it takes 30 seconds on average before the driver switches to the other path. This approach has the side effect that the access path can change every minute. For the SR1520 this can be undesirable, as the second port does not perform as well as the first port due to its position on the PCI bus.

  • Q 13: How can I verify the throughput numbers in your performance papers with my Linux installation?

    When doing initial performance testing, it is important to start with the simplest configuration possible, then add components. Begin by plugging the client Linux system directly into the SR to determine a baseline, then introduce any desired switches to see how they affect throughput.

    There are a few things to check to ensure optimal performance.

    1. When plugged directly into the client system, ensure all links negotiate at 1GbE Full Duplex. This can be checked on the Linux system using the ethtool program.
    2. Use cat-5e cabling at a minimum, and cat-6 cabling if possible. Cat-6 offers little benefit over cat-5e, but it's an inexpensive upgrade to ensure the best quality transport.
    3. Use the latest standalone aoe driver available from http://support.coraid.com/support/linux/.
    4. Ensure all links are using jumbo frames.
    5. Ensure the aoe driver knows it can use the larger MTU. The aoe driver will log a message indicating when it changes its usable MTU for each device discovered on each link. It is usually sufficient to check dmesg | tail.
    6. If you're testing with ddt, give it enough work to produce accurate throughput numbers. By default, ddt performs 4GiB of I/O. If your client system has 4GiB (or more) of RAM, you will need to increase the amount of I/O ddt performs so that you are actually going through the filesystem to the block device and not just working out of the buffer cache. A good rule of thumb is to perform I/O totaling at least 1.5x the amount of RAM; e.g. if you have 4GiB of RAM, you can force ddt to perform a total of 6GiB of I/O to mountpoint /mnt/foo as follows:

      ddt -t 6g /mnt/foo
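    The 1.5x rule of thumb can be worked out from the client's memory size. The sketch below is one way to do it, assuming a Linux client where /proc/meminfo reports MemTotal in kB; the helper name ddt_size_gib is made up for this example, and /mnt/foo is the mountpoint from the example above. It prints a suggested ddt command rather than running it.

```shell
#!/bin/sh
# Print 1.5x the given RAM size (in kB) as whole GiB, rounded up.
# 1.5 * kB / 1048576 is computed as (3 * kB) / 2097152 in integer math.
ddt_size_gib() {
    echo $(( ($1 * 3 + 2097151) / 2097152 ))
}

# On Linux, read the installed RAM and print a suggested ddt command.
if [ -r /proc/meminfo ]; then
    ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "ddt -t $(ddt_size_gib "$ram_kb")g /mnt/foo"
fi
```

    Rounding up errs on the side of more I/O, which keeps the test well past what the buffer cache can hold.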

    If everything appears normal and you still can't get throughput numbers comparable to those in our analysis papers, you might have packet loss causing excessive retransmissions. To check for this you can run:

    cat /dev/etherd/err

    ... while running ddt to watch for any retransmits that occur. Occasional retransmission is OK, but a steady stream indicates a problem.

  • Q 14: Why do some of my SATA 3.0Gbps disks connect at "sata1"? Should I jumper my SATA disks for 3.0 Gbps?

    When the SR Appliance autonegotiates the uplink speed, it may sometimes connect to a drive at 1.5 Gbps even though the disk may be capable of linking at 3.0 Gbps. This is normal and not indicative of a problem. Even if a drive can be forced to a certain uplink speed instead of autonegotiating, we've never found this setting to make a disk more compatible with the SR Appliance. Barring any bugs internal to the drive firmware, there is no speed advantage to using 3.0 Gbps unless you are using an SSD.

    This is because a spinning disk is only capable of reading data off the platters at around 80-110MB/s in the fastest zone. Changing the uplink speed from the drive to the system doesn't have any effect because the link is not the bottleneck. The "faster" 3.0 Gbps drive is largely marketing hype.

  • Q 15: Is there a command that will send me an email if there is a disk failure?

    The SR can be configured with the syslog command to send out syslog messages to any host that is listening for syslog messages.

    Many of the more recent versions of syslog daemon software can be configured to email you on receiving specific kinds of log messages.

  • Q 16: How do I cleanly shut down the SR when power failure is imminent?

    Using CEC, the Coraid Ethernet Console, you can run the "halt" command on the SR.

    Here is an example expect script that shows how CEC can be used in this way.

  • Q 17: How do I update the SR from Windows?

    1. Download the SR Software User Manual from the SR Support Page and read 'Appendix C. Updating'. All steps relating to the SR are still valid.
    2. Download a free dd utility implementation for Windows, for example: http://www.chrysocome.net/dd
    3. Log in to the SR console and prepare the update LUN: make <lun num> update.
    4. Open the Command Prompt and use dd.exe --list to show all disk devices and partitions available. We have had reports that some versions of Windows (e.g. Vista, Server 2008) may require Command Prompt to be launched using "Run as Administrator" for dd.exe to function properly.
    5. Mount the LUN from the client Windows machine by claiming it with the EtherDrive Tool.
    6. Run dd.exe --list again and compare the output to the previous listing. The new HarddiskX device should be the update LUN, but be absolutely certain. Using the wrong disk device can lead to data loss or a complete system crash.
    7. Use the following command to write the firmware image to the device. <disk> should be a string similar to '\\?\Device\Harddisk4\Partition0':
      dd.exe if=<firmware_update> of=<disk>
    8. Unmount the device from the EtherDrive Tool on your Windows machine with the Release button.
    9. Start update from the SR Console according to the manual.

  • Q 18: Is there a command I should run before physically removing a disk?

    If a disk is a spare, you should run "rmspare" to remove it from the spare pool before physically removing it.

    If a disk is in a RAID in the role of a "failed" component, then there is no need to run a command before physically removing it.

    If a disk is a "normal" or "clean" component in a RAID, then in order to preserve the RAID data, the "eject" or "halt" command should be used before the disk is physically removed.

    If a disk is not in a RAID and is not a spare, then no command needs to be run before physically removing the disk.

  • Q 19: I still need help. What should I do next?

    We have hints that will help you get the fastest and best support we can provide at this page: http://support.coraid.com/support/howto.html.