
5. FAQ (contains important info)

5.1 Q: How does the system know about the AoE targets on the network?

A: When an AoE target comes online, it emits a broadcast frame indicating its presence. In addition to this mechanism, the AoE initiator may send out a query frame to discover any new AoE targets.

The Linux aoe driver, for example, sends an AoE query once per minute. The discovery can be triggered manually with the "aoe-discover" tool, one of the aoetools.

5.2 Q: How do I see what AoE devices the system knows about?

A: The /usr/sbin/aoe-stat program (from the aoetools) lists the devices the system considers valid. It also displays the status of the device (up or down). For example:

root@makki root# aoe-stat
      e0.0     10995.116GB   eth0 up            
      e0.1     10995.116GB   eth0 up            
      e0.2     10995.116GB   eth0 up            
      e1.0      1152.874GB   eth0 up            
      e7.0       370.566GB   eth0 up

5.3 Q: What is the "closewait" state?

A: The "down,closewait" status means that the device went down but at least one process still has it open. After all processes close the device, it will become "up" again if the remote AoE device is available and ready.

The user can also use the "aoe-revalidate" command to manually cause the aoe driver to query the AoE device. If the AoE device is available and ready, the device state on the Linux host will change from "down,closewait" to "up".
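As a rough sketch of how the two tools can be combined, the helper below picks out the devices that aoe-stat does not report as "up". The function name is made up for illustration, and it assumes the four-column aoe-stat layout shown in the earlier example (name, size, interface, status).

```shell
# List devices that aoe-stat does not report as "up".
# Assumes four columns per line: name, size, interface, status.
aoe_down_devices() {
    awk 'NF >= 4 && $4 != "up" { print $1 }'
}

# Usage sketch (needs the aoetools and root privileges):
#   aoe-stat | aoe_down_devices | while read -r dev; do
#       aoe-revalidate "$dev"
#   done
```

Revalidating only helps once the AoE target itself is reachable again, so check the storage and the network first.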

5.4 Q: How does the system know an AoE device has failed?

A: When an AoE target cannot complete a requested command it will indicate so in the response to the failed request. The Linux aoe driver will mark the AoE device as failed upon reception of such a response. In addition, if an AoE target has not responded to a prior request within a default timeout (currently three minutes) the aoe driver will fail the device.

5.5 Q: How do I take an AoE device out of the failed state?

A: If the aoe driver shows the device state to be "down", first check the EtherDrive storage itself and the AoE network. Once any problem has been rectified, you can use the "aoe-revalidate" command from the aoetools to ask the aoe driver to change the state back to "up".

If the Linux Software RAID driver has marked the device as "failed" (so that an "F" shows up in the output of "cat /proc/mdstat"), then you first need to remove the device from the RAID using mdadm. Next you add the device back to the array with mdadm.

An example follows, showing how (after manually failing e10.0) the device is removed from the array and then added back. After adding it back to the RAID, the md driver begins rebuilding the redundancy of the array.

root@kokone ~# cat /proc/mdstat
Personalities : [raid1] [raid5] 
md0 : active raid1 etherd/e10.1[1] etherd/e10.0[0]
      524224 blocks [2/2] [UU]
      
unused devices: <none>
root@kokone ~# mdadm --fail /dev/md0 /dev/etherd/e10.0
mdadm: set /dev/etherd/e10.0 faulty in /dev/md0
root@kokone ~# cat /proc/mdstat
Personalities : [raid1] [raid5] 
md0 : active raid1 etherd/e10.1[1] etherd/e10.0[2](F)
      524224 blocks [2/1] [_U]
      
unused devices: <none>
root@kokone ~# mdadm --remove /dev/md0 /dev/etherd/e10.0
mdadm: hot removed /dev/etherd/e10.0
root@kokone ~# mdadm --add /dev/md0 /dev/etherd/e10.0
mdadm: hot added /dev/etherd/e10.0
root@kokone ~# cat /proc/mdstat
Personalities : [raid1] [raid5] 
md0 : active raid1 etherd/e10.0[2] etherd/e10.1[1]
      524224 blocks [2/1] [_U]
      [=>...................]  recovery =  5.0% (26944/524224) finish=0.6min speed=13472K/sec
unused devices: <none>
root@kokone ~# 
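To find failed components without reading /proc/mdstat by eye, something like the following sketch may help. The function name is hypothetical, and it assumes the mdstat layout shown in the transcript above, where a failed component carries an "(F)" suffix.

```shell
# Print "<array> <component>" for every component marked (F) in mdstat.
# $1: file to read (defaults to /proc/mdstat)
md_failed_components() {
    awk '$1 ~ /^md/ && $2 == ":" {
        for (i = 4; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                # strip the "[n](F)" suffix, leaving e.g. etherd/e10.0
                sub(/\[[0-9]+\]\(F\)$/, "", dev)
                print $1, dev
            }
    }' "${1:-/proc/mdstat}"
}
```

Each line of output names an array and a component (relative to /dev) that can then be removed and re-added with mdadm as shown above.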

5.6 Q: How can I use LVM with my EtherDrive storage?

A: With older LVM2 releases, you may need to edit lvm.conf, but the current version of LVM2 supports AoE devices "out of the box".

You can also create md devices from your aoe devices and tell LVM to use the md devices.

You need to understand LVM itself in order to use AoE devices with LVM. Besides the manpages for the LVM commands, the LVM HOWTO is a big help if you are starting out with LVM.

If you have an old LVM2 that does not already detect and work with AoE devices, you can add this line to the "devices" block of your lvm.conf.

types = [ "aoe", 16 ]

If you are creating physical volumes out of RAIDs over EtherDrive storage, make sure to turn on md component detection so that LVM2 doesn't go snooping around on the underlying EtherDrive disks.

md_component_detection = 1

The snapshots feature in LVM2 did not work in early 2.6 kernels. Lately, Coraid customers have reported success using snapshots on AoE-backed logical volumes when using a recent kernel and aoe driver. Older aoe drivers, like version 22, may need a fix to work correctly with snapshots.

Customers have reported data corruption and kernel panics with striped logical volumes (created with the "-i" option to lvcreate) on aoe driver versions prior to aoe6-48. No such problems occur with normal logical volumes or with Software RAID's striping (RAID 0).

Most systems have boot scripts that try to detect LVM physical volumes early in the boot process, before AoE devices are available. In playing with LVM, you may need to help LVM to recognize AoE devices that are physical devices by running vgscan after loading the aoe module.

There have been reports that partitions can interfere with LVM's ability to use an AoE device as a physical volume. For example, with partitions e0.1p1 and e0.1p2 residing on e0.1, pvcreate /dev/etherd/e0.1 might complain,

Device /dev/etherd/e0.1 not found.

Removing the partitions allows LVM to create a physical volume from e0.1.

5.7 Q: I get an "invalid module format" error on modprobe. Why?

A: The aoe module and the kernel must be built to match one another. On module load, the kernel version, SMP support (yes or no), the compiler version, and the target processor recorded in the module must be the same as those used when building the kernel.
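One quick check is to compare the module's "vermagic" string with the running kernel release. The sketch below uses a made-up function name and only compares the first field (the kernel release); the other fields mentioned above still have to match as well.

```shell
# Compare the kernel release at the start of a vermagic string
# with a given kernel release.
# $1: vermagic string, $2: kernel release
vermagic_release_matches() {
    [ "${1%% *}" = "$2" ]
}

# Usage sketch on a live system:
#   vermagic_release_matches "$(modinfo -F vermagic aoe)" "$(uname -r)" ||
#       echo "aoe module was built for a different kernel" 1>&2
```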

5.8 Q: Can I allow multiple Linux hosts to use a filesystem that is on my EtherDrive storage?

A: Yes, but you're now taking advantage of the flexibility of EtherDrive storage, using it like a SAN. Your software must be "cluster aware", like GFS. Otherwise, each host will assume it is the sole user of the filesystem and data corruption will result.

5.9 Q: Can you give me an overview of GFS and related software?

A: Yes, here's a brief overview.

Background

GFS is a scalable, journaled filesystem designed to be used by more than one computer at a time. There is a separate journal for each host using the filesystem. All the hosts working together are called a cluster, and each member of the cluster is called a cluster node.

To achieve acceptable performance, each cluster node remembers what was on the block device the last time it looked. This is caching, where data from copies in RAM are used temporarily instead of data directly from the block device.

To avoid chaos, the data in the RAM cache of every cluster node has to match what's on the block device. The members of the cluster (called "cluster nodes") communicate over TCP/IP to agree on who is in the cluster and who has the right to use a particular part of the shared block device.

Hardware

To allow the cluster nodes to control membership in the cluster and to control access to the shared block storage, "fencing" hardware can be used.

Some network switches can be dynamically configured to turn single ports on and off, effectively fencing a node off from the rest of the network.

Remote power switches can be told to turn an outlet off, powering a cluster node down, so that it is certainly not accessing the shared storage.

Software

The RedHat Cluster Suite developers have created several pieces of software besides the GFS filesystem itself to allow the cluster nodes to coordinate cluster membership and to control access to the shared block device.

These parts are listed here, on the GFS Project Page.

http://sources.redhat.com/cluster/gfs/

GFS and its related software are undergoing continuous heavy development and are maturing slowly but steadily.

As might be expected, the developers working for RedHat target RedHat Enterprise Linux as the ultimate platform for GFS and its related software. They also use Fedora Core as a platform for testing and innovation.

That means that when choosing a distribution for running GFS, recent versions of Fedora Core, RedHat Enterprise Linux (RHEL), and RHEL clones like CentOS should be considered. On these platforms, RPMs are available that have a good chance of working "out of the box."

With a RedHat-based distro like Fedora Core, using GFS means seeking out the appropriate documentation, installing the necessary RPMs, and creating a few text files for configuring the software.

Here is a good overview of what the process is generally like. Note that if you're using RPMs, then building and installing the software will not be necessary.

http://sources.redhat.com/cluster/doc/usage.txt

Use

Once you have things ready, using the GFS is like using any other filesystem.

Performance will be greatest when the filesystem operations of the different nodes do not interfere with one another. For instance, if all the nodes try to write to the same place in a directory or file, much time will be spent in coordinating access (locking).

An easy way to eliminate a large amount of locking is to use the "noatime" (no access time update) mount option. Even in traditional filesystems the use of this option often results in a dramatic performance benefit, because it eliminates the need to write to the block storage just to record the time that the file was last accessed.

Fencing

There are several ways to keep a cluster node from accessing shared storage when that node might have outdated assumptions about the state of the cluster or the storage. Preventing the node from accessing the storage is called "fencing", and it can be accomplished in several ways.

One popular way is to simply kill the power to the fenced node by using a remote power switch. Another is to use a network switch that has ports that can be turned on and off remotely.

When the shared storage resource is a LUN on an SR, it is possible to manipulate the LUN's mask list in order to accomplish fencing. You can read about this technique in the Contributions area.

5.10 Q: How can I make a RAID of more than 27 components?

A: For Linux Software RAID, the kernel limits the number of disks in one RAID to 27. However, you can easily overcome this limitation by creating another level of RAID.

For example, to create a RAID 0 of thirty block devices, you may create three ten-disk RAIDs (md1, md2, and md3) and then stripe across them (md0 is a stripe over md1, md2, and md3).

Here is an example raidtools configuration file that implements the above scenario for shelves 5, 6, and 7: multi-level RAID 0 configuration file. Non-trivial raidtab configuration files are easier to generate from a script than to create by hand.
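For those using mdadm instead of raidtools, the same layout can be generated from a script. The sketch below only prints the mdadm commands rather than running them. The function name is hypothetical, and it assumes ten slots (0 through 9) per shelf and md1 onward for the inner arrays.

```shell
# Print (do not run) mdadm commands for a two-level RAID 0.
# Arguments: shelf numbers. Each shelf is assumed to hold slots 0-9.
print_nested_raid0() {
    md=1
    inner=""
    for shelf in "$@"; do
        devs=""
        slot=0
        while [ "$slot" -le 9 ]; do
            devs="$devs /dev/etherd/e$shelf.$slot"
            slot=$((slot + 1))
        done
        # one inner stripe per shelf
        echo "mdadm --create /dev/md$md --level=0 --raid-devices=10$devs"
        inner="$inner /dev/md$md"
        md=$((md + 1))
    done
    # outer stripe over all the inner arrays
    echo "mdadm --create /dev/md0 --level=0 --raid-devices=$#$inner"
}

print_nested_raid0 5 6 7
```

Review the printed commands before running any of them, since creating an array destroys existing data on the component devices.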

EtherDrive storage gives you a lot of freedom, so be creative.

5.11 Q: Why do my device nodes disappear after a reboot?

A: Some Linux distributions create device nodes dynamically. The upcoming method of choice is called "udev". The aoe driver and udev work together when the following rules are installed.

These rules go into a file with a name like 60-aoe.rules. Look in your udev.conf file (usually /etc/udev/udev.conf) for the line starting with udev_rules= to find out where rules go (usually /etc/udev/rules.d).

# These rules tell udev what device nodes to create for aoe support.
# They may be installed along the following lines.  Check the section
# 8 udev manpage to see whether your udev supports SUBSYSTEM, and 
# whether it uses one or two equal signs for SUBSYSTEM and KERNEL.

# aoe char devices
SUBSYSTEM=="aoe", KERNEL=="discover",   NAME="etherd/%k", GROUP="disk", MODE="0220"
SUBSYSTEM=="aoe", KERNEL=="err",        NAME="etherd/%k", GROUP="disk", MODE="0440"
SUBSYSTEM=="aoe", KERNEL=="interfaces", NAME="etherd/%k", GROUP="disk", MODE="0220"
SUBSYSTEM=="aoe", KERNEL=="revalidate", NAME="etherd/%k", GROUP="disk", MODE="0220"
SUBSYSTEM=="aoe", KERNEL=="flush",      NAME="etherd/%k", GROUP="disk", MODE="0220"

# aoe block devices     
KERNEL=="etherd*",       NAME="%k", GROUP="disk"

Unfortunately the syntax for the udev rules file has changed several times as new versions of udev appear. You will probably have to modify the example above for your system, but the existing rules and the udev documentation should help you.

There is an example script in the aoe driver, linux/Documentation/aoe/udev-install.sh, that can install the rules on most systems.

The udev system can only work with the aoe driver if the aoe driver is loaded. To avoid confusion, make sure that you load the aoe driver at boot time.

5.12 Q: Why does RAID initialization seem slow?

A: The 2.6 Linux kernel has a problem with its RAID initialization rate limiting feature. You can override this feature and speed up RAID initialization by using the following commands. Note that these commands change kernel memory, so the commands must be re-run after a reboot.

echo 100000 > /proc/sys/dev/raid/speed_limit_max
echo 100000 > /proc/sys/dev/raid/speed_limit_min

5.13 Q: I can only use shelf zero! Why won't e1.9 work?

A: Every block device has a device file, usually in /dev, that has a major and minor number. You can see these numbers using ls. Note the high minor numbers (1744, 2400, and 2401) in the example below.

ecashin@makki ~$ ls -l /dev/etherd/
total 0
brw-------  1 root disk 152, 1744 Mar  1 14:35 e10.9
brw-------  1 root disk 152, 2400 Feb 28 12:21 e15.0
brw-------  1 root disk 152, 2401 Feb 28 12:21 e15.0p1

The 2.6 Linux kernel allows high minor device numbers like this, but until recently, 255 was the highest minor number one could use. Some distributions contain userland software that cannot understand the high minor numbers that 2.6 makes possible.

Here's a crude but reliable test that can determine whether your system is ready to use devices with high minor numbers. In the example below, we tried to create a device node with a minor number of 1744, but ls shows it as 208.

root@kokone ~# mknod e10.9 b 152 1744
root@kokone ~# ls -l e10.9
brw-r--r--  1 root root 158, 208 Mar  2 15:13 e10.9
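The numbers in the two listings above can be reproduced with a little arithmetic. This is only an illustration: the constants (10 slots per shelf, 16 minors per device) are inferred from the example listings rather than taken from the driver source, the function names are made up, and the second function assumes the old 16-bit device number layout with an 8-bit major and an 8-bit minor.

```shell
# Static minor number implied by the listing of /dev/etherd above.
# $1: shelf, $2: slot, $3: partition (0 for the whole device)
aoe_minor() {
    echo $(( ($1 * 10 + $2) * 16 + $3 ))
}

# What a tool limited to 8-bit minor numbers makes of a major/minor pair:
# the excess minor bits spill over into the major number.
# $1: major, $2: minor
legacy_major_minor() {
    dev=$(( ($1 << 8) | $2 ))
    echo "$(( dev >> 8 )) $(( dev & 255 ))"
}

aoe_minor 10 9 0             # prints 1744
legacy_major_minor 152 1744  # prints 158 208
```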

On systems like this, you can still use the aoe driver to use up to 256 disks if you're willing to live without support for partitions. Just make sure that the device nodes and the aoe driver are both created with one partition per device.

The commands below show how to create a driver without partition support and then to create compatible device nodes for shelf 10.

make install AOE_PARTITIONS=1
rm -rf /dev/etherd
env n_partitions=1 aoe-mkshelf /dev/etherd 10

As of version 1.9.0, the mdadm command supports large minor device numbers. The mdadm versions before 1.9.0 do not. If you would like to use versions of mdadm older than 1.9.0, you can configure your driver and device nodes as outlined above. Be aware that it's easy to confuse yourself by creating a driver that doesn't match the device nodes.

5.14 Q: How can I start my AoE storage on boot and shut it down when the system shuts down?

A: That is really a question about your own system, so it's a question you, as the system administrator, are in the best position to answer.

In general, though, many Linux distributions follow the same patterns when it comes to system "init scripts". Most use a System V style.

The example below should help get you started if you have never created and installed an init script. Start by reading the comments at the top. Make sure you understand how your system works and what the script does, because every system is different.

Here is an overview of what happens when the aoe module is loaded and the aoe module begins AoE device discovery. It should help you to understand the example script below. Starting up the aoe module on boot can be tricky if necessary parts of the system are not ready when you want to use AoE.

To discover an AoE device, the aoe driver must receive a Query Config response packet that indicates the device is available. A Coraid SR broadcasts this response unsolicited when you run the online SR command, but it is usually sent in response to an AoE initiator broadcasting a Query Config command to discover devices on the network. Once an AoE device has been discovered, the aoe driver sends an ATA Device Identify command to get information about the disk drive. When the disk size is known, the aoe driver will install the new block device in the system.

The aoe driver will broadcast this AoE discovery command when loaded, and then once a minute thereafter.

The AoE discovery that takes place on loading the aoe driver does not take long, but it does take some time. That's why you'll see "sleep" commands in the example aoe-init script below. If AoE discovery is failing, try unloading the aoe module and tuning your init script by invoking it at the command line.

You will often find that a delay is necessary after loading your network drivers (and before loading the aoe driver). This delay allows the network interface to initialize and to become usable. An additional delay is necessary after loading the aoe driver, so that AoE discovery has time to take place before any AoE storage is used.

Without such a delay, the initial AoE Config Query broadcast packet might never go out onto the AoE network, and then the AoE initiator will not know about any AoE targets until the next periodic Config Query broadcast occurs, usually one minute later.

#! /bin/sh
# aoe-init - example init script for ATA over Ethernet storage
# 
#   Edit this script for your purposes.  (Changing "eth1" to the
#   appropriate interface name, adding commands, etc.)  You might
#   need to tune the sleep times.
#
#   Install this script in /etc/init.d with the other init scripts.
#
#   Make it executable:
#     chmod 755 /etc/init.d/aoe-init
#
#   Install symlinks for boot time:
#     cd /etc/rc3.d && ln -s ../init.d/aoe-init S99aoe-init
#     cd /etc/rc5.d && ln -s ../init.d/aoe-init S99aoe-init
#
#   Install symlinks for shutdown time:
#     cd /etc/rc0.d && ln -s ../init.d/aoe-init K01aoe-init
#     cd /etc/rc1.d && ln -s ../init.d/aoe-init K01aoe-init
#     cd /etc/rc2.d && ln -s ../init.d/aoe-init K01aoe-init
#     cd /etc/rc6.d && ln -s ../init.d/aoe-init K01aoe-init
#

case "$1" in
        "start")
                # load any needed network drivers here

                # replace "eth1" with your aoe network interface
                ifconfig eth1 up

                # time for network interface to come up
                sleep 4

                modprobe aoe

                # time for AoE discovery and udev
                sleep 7

                # add your raid assemble commands here
                # add any LVM commands if needed (e.g. vgchange)
                # add your filesystem mount commands here

                test -d /var/lock/subsys && touch /var/lock/subsys/aoe-init
                ;;
        "stop")
                # add your filesystem umount commands here
                # deactivate LVM volume groups if needed
                # add your raid stop commands here
                rmmod aoe
                rm -f /var/lock/subsys/aoe-init
                ;;
        *)
                echo "usage: `basename $0` {start|stop}" 1>&2
                ;;
esac
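As an alternative to tuning the fixed sleep after "modprobe aoe" in the script above, one could poll for a device node that is expected to appear. The helper below is a sketch with a made-up name; the device path and the 30 second limit in the usage comment are placeholders to adapt to your own system.

```shell
# Wait until a path exists, checking once per second.
# $1: path to wait for, $2: maximum number of seconds to wait
# Returns 0 as soon as the path exists, 1 if the time runs out.
wait_for_path() {
    n=0
    while [ ! -e "$1" ]; do
        [ "$n" -ge "$2" ] && return 1
        sleep 1
        n=$((n + 1))
    done
    return 0
}

# Usage sketch in the "start" branch, after "modprobe aoe":
#   wait_for_path /dev/etherd/e0.0 30 || echo "e0.0 did not appear" 1>&2
```

Polling returns as soon as discovery and udev have finished, so boot is not delayed longer than necessary, while the time limit keeps a missing target from hanging the boot.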

5.15 Q: Why do I get "permission denied" when I'm root?

A: Some newer systems come with SELinux (Security-Enhanced Linux), which can limit what the root user can do.

SELinux is usually good about creating entries in the system logs when it prevents root from doing something, so examine your logs for such messages.

Check the SELinux documentation for information on how to configure or disable SELinux according to your needs.

5.16 Q: Why does fdisk ask me for the number of cylinders?

A: Your fdisk is probably asking the kernel for the size of the disk with a BLKGETSIZE block device ioctl, which returns the sector count of the disk in a 32-bit number. If the sector count does not fit in this 32-bit number (2 TB is the limit), the ioctl fails with the error EFBIG. This error indicates that the program should try the 64-bit ioctl (BLKGETSIZE64), but when fdisk doesn't do that, it just asks the user to supply the number of cylinders.

You can tell fdisk the number of cylinders yourself. The number to use (sectors / (255 * 63)) is printed by the following commands. Use the appropriate device instead of "e0.0".

sectors=`cat /sys/block/etherd\!e0.0/size`
echo $sectors 255 63 '*' / p | dc

But no MSDOS partition table can ever work with more than 2TB. The reason is that the numbers in the partition table itself are only 32 bits in size. That means you can't have a partition larger than 2TB in size or starting further than 2TB from the beginning of the device.
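The limit follows from the arithmetic: 2^32 sectors of 512 bytes each is 2 TiB. A small check along these lines can tell whether a device is too large for an MSDOS partition table. The variable and function names are made up for illustration, and 512-byte sectors are assumed.

```shell
# Largest sector count an MSDOS partition table can describe:
# 2^32 sectors of 512 bytes each, which is 2 TiB.
MSDOS_MAX_SECTORS=4294967296

# $1: device size in 512-byte sectors, e.g. from /sys/block/<dev>/size
exceeds_msdos_limit() {
    [ "$1" -gt "$MSDOS_MAX_SECTORS" ]
}

# Usage sketch:
#   exceeds_msdos_limit "$(cat /sys/block/etherd\!e0.0/size)" &&
#       echo "too large for an MSDOS partition table"
```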

Some options for multi-terabyte volumes are:

  1. By doing without partitions, the filesystem can be created directly on the AoE device itself (e.g., /dev/etherd/e1.0),
  2. LVM2, the Logical Volume Manager, is a sophisticated way of allocating storage to create logical volumes of desired sizes, and
  3. GPT partition tables.

The last item in the list above is a new kind of partition table that overcomes the limitations of the older MSDOS-style partition table. Andrew Chernow has related his successful experiences using GPT partition tables on large AoE devices in this contributed document.

Please note that some versions of the GNU parted tool, such as version 1.8.6, have a bug. This bug allows the user to create an MSDOS-style partition table with partitions larger than two terabytes even though these partitions are too large for an MSDOS partition table. The result is that the filesystems on these partitions will only be usable until the next reboot.

5.17 Q: Can I use AoE equipment with Oracle software?

A: Oracle used to have an Oracle Storage Compatibility Program, but simple block-level storage technologies do not require Oracle validation. ATA over Ethernet provides simple, block-level storage.

Oracle used to have a list of frequently asked questions about running Oracle on Linux, but they have replaced it with documentation about their own Linux distribution. A third party site continues to maintain a FAQ about running Oracle on Linux.

5.18 Q: Why do I have intermittent problems?

A: Make sure your network is in good shape. Having good patch cables, reliable network switches with good flow control, and good network cards will keep your network storage happy.

5.19 Q: How can I avoid running out of memory when copying large files?

A: You can tell the Linux kernel not to wait so long before writing data out to backing storage.

echo 3 > /proc/sys/vm/dirty_ratio 
echo 4 > /proc/sys/vm/dirty_background_ratio 
echo 32768 > /proc/sys/vm/min_free_kbytes

When a large MTU, like 9000, is being used on the AoE-side network interfaces, a larger min_free_kbytes setting could be helpful. The more RAM you have, the larger the number you might have to use.

There are also alternative settings to the above "ratio" settings, available as of kernel version 2.6.29. They are dirty_bytes and dirty_background_bytes, and they provide finer control for systems with large amounts of RAM.

If you find the /proc settings to be helpful, you can make them permanent by editing /etc/sysctl.conf or by creating an init script that performs the settings at boot time.

The Documentation/sysctl/vm.txt file for your kernel has details on the settings available for your particular kernel, but some guiding principles are...

5.20 Q: Why doesn't the aoe driver notice that an AoE device has disappeared or changed size?

A: Prior to the aoe6-15 driver, aoe drivers only learned an AoE device's characteristics once, and the only way to use an AoE device that had grown or to get rid of "phantom" AoE devices that were no longer present was to re-load the aoe module completely.

rmmod aoe
modprobe aoe

Since aoe6-15, aoe drivers have supported the aoe-revalidate command. See the aoe-revalidate manpage for more information.

5.21 Q: My NFS client hangs when I export a filesystem on an AoE device.

A: If you are exporting a filesystem over NFS, then that filesystem resides on a block device. Every block device has a major and minor device number that you can see by running "ls -l".

If the block device has a "high" minor number, over 255, and you're trying to export a filesystem on that device, then NFS will have trouble using the minor number to identify the filesystem. You can tell the NFS server to use a different number by using the "fsid" option in your /etc/exports file.

The fsid option is documented in the "exports" manpage. Here's an example of how its use might look in /etc/exports.

/mnt/alpha 205.185.197.207(rw,sync,no_root_squash,fsid=20)

As the manpage says, each filesystem needs its own unique fsid.

5.22 Q: Why do I see "unknown partition table" errors in my logs?

A: Those are probably not errors. Usually this message means that your disk doesn't have a partition table. With AoE devices, that's the common case.

When a new block device is detected by the kernel, the kernel tries to read the part of the block device where a partition table is conventionally stored.

The kernel checks to see whether the data there looks like any kind of partition table that it knows about. It can't tell the difference between a disk with a kind of partition table it doesn't know about and a disk with no partition table at all.

5.23 Q: Why do I get better throughput to a file on an AoE device than to the device itself?

A: Most of the time a filesystem resides on a block device, so that the filesystem can be mounted and the storage is used by reading and writing files and directories. When you are not using a filesystem at all, you might see somewhat degraded performance. Sometimes this degradation comes as a surprise to new AoE users when they first try out an AoE device with the dd command, for example, before creating a filesystem on the device.

If the AoE device has an odd number of sectors, the block layer of the Linux kernel presents the aoe driver with 512-byte I/O jobs. Each AoE packet winds up with only one sector of data, doubling the number of AoE packets when normal ethernet frames are in use.

The Linux kernel's block layer gives special treatment to filesystem I/O, giving the aoe driver I/O jobs in the filesystem block size, so there is no performance penalty to using a filesystem on an AoE device that has an odd number of sectors. Since there isn't a large demand for non-filesystem I/O, the complexity associated with coalescing multiple I/O jobs in the aoe driver is probably not worth the potential driver instability it could introduce.

One way to work around this issue is to use the O_DIRECT flag to the "open" system call. For recent versions of dd, you can use the option "oflag=direct" to tell dd to use this O_DIRECT flag. You should combine this option with a large blocksize, such as "bs=4M", in order to take advantage of the largest possible I/O batch size.

Another way to work around this issue is to use a trivial md device as a wrapper. (Almost everyone uses a filesystem. This technique is only interesting to those who are not using a filesystem, so most people should ignore this idea.) In the example below, a single-disk RAID 0 is created for the AoE device e0.3. Although e0.3 has an odd number of sectors, the md1 device does not, and tcpdump confirms that each AoE packet has 1 KiB of data as we would like.

makki:~# mdadm -C -l 0 -n 1 --auto=md  /dev/md1 /dev/etherd/e0.3
mdadm: '1' is an unusual number of drives for an array, so it is probably
     a mistake.  If you really mean it you will need to specify --force before
     setting the number of drives.
makki:~# mdadm -C -l 0 --force -n 1 --auto=md  /dev/md1 /dev/etherd/e0.3
mdadm: array /dev/md1 started.
makki:~# cat /sys/block/etherd\!e0.3/size
209715201
makki:~# cat /sys/block/md1/size
209715072
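To see whether a given device is affected, its sector count can be tested for oddness. This is a trivial sketch with a made-up function name; it takes the sector count as printed by the size files shown above.

```shell
# Succeeds when the given sector count is odd.
# $1: size in sectors, e.g. the contents of /sys/block/<dev>/size
has_odd_sector_count() {
    [ $(( $1 % 2 )) -eq 1 ]
}

# Usage sketch:
#   has_odd_sector_count "$(cat /sys/block/etherd\!e0.3/size)" &&
#       echo "e0.3 has an odd number of sectors"
```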

5.24 Q: How can I boot diskless systems from my Coraid EtherDrive devices?

A: Booting from AoE devices is similar to other kinds of network booting. Customers have contributed examples of successful strategies in the Contributions Area of the Coraid website.

Jayson Vantuyl: Making A Flexible Initial Ramdisk

Jason McMullan: Add root filesystem on AoE support to aoe driver

Keep in mind that if you intend to use AoE devices before udev is running, you must use static minor numbers for the device nodes. An aoe6 driver version 50 or above can be instructed to use static minor numbers by being loaded with the aoe_dyndevs=0 module parameter. (Previous aoe drivers only used static minor device numbers.)

5.25 Q: What filesystems do you recommend for very large block devices?

A: The filesystem you choose will depend on how you want to use the storage. Here are some generalizations that may serve as a starting point.

There are two major classes of filesystems: cluster filesystems and traditional filesystems. Cluster filesystems are more complex and support simultaneous access from multiple independent computers to a single filesystem stored on a shared block device.

Traditional filesystems are only mounted by one host at a time. Some traditional filesystems that scale to sizes larger than those supported by ext3 include the following journalling filesystems.

XFS, developed at SGI, specializes in high throughput to large files.

Reiserfs, an often experimental filesystem, can perform well with many small files.

JFS, developed at IBM, is a general purpose filesystem.

5.26 Q: Why does umount say, "device is busy"?

A: That just means you're still using the filesystem on that device.

Unless something has gone very wrong, you should be able to unmount after you stop using the filesystem. Here are a few ways you might be using the filesystem without knowing it:

The lsof command can be helpful in finding processes that are using files.

5.27 Q: How do I use the multiple network path support in driver versions 33 and up?

A: You don't have to do anything to benefit from the aoe driver's ability to use multiple network paths to the same AoE target.

The aoe driver will automatically use each end-to-end path in an essentially round-robin fashion. If one network path becomes unusable, the aoe driver will attempt to use the remaining network paths to reach the AoE target, even retransmitting any lost packets through one of the remaining paths.

5.28 Q: Why does "xfs_check" say "out of memory"?

A: The xfstools use a huge amount of virtual memory when operating on large filesystems. The CLN HOWTO has some helpful information about using temporary swap space when necessary for accommodating the xfstools' virtual memory requirements.

CLN HOWTO: Repairing a Filesystem

The 32-bit xfstools are limited in the size of the filesystem they can operate on, but 64-bit systems overcome this limitation. This limit is likely to be encountered with 32-bit xfstools for filesystems over 2 TiB in size.

5.29 Q: Can virtual machines running on VMware ESX use AoE over jumbo frames?

A: It is somewhat difficult to find public information about the ESX configuration necessary to use jumbo frames, but there is information in the public forum at the URL below.

How to setup TCP/IP Jumbo packet support in VMware ESX 3.5 on W2K3 VMs

5.30 Q: Can I use SMART with my AoE devices?

A: The early Coraid products like the EtherDrive PATA blades simply passed ATA commands through to the attached PATA disk, including SMART commands. While there was no way to ask the aoe driver to send SMART commands, one could ask aoeping to send SMART commands. The aoeping manpage has more information.

The Coraid SR and VS storage appliances present AoE targets that are LUNs, not corresponding to a specific disk. The SR supports SMART internally, on its command line, but the AoE LUNs do not support SMART.

