The Coraid Linux NAS HOWTO

Ed L. Cashin

2009-04-22

Table of Contents

Initial Configuration
Setting the Root User Password
Setting the Timezone
Limiting Login Access
Configure Ethernet MTU
Configure IP Networking
Setting the Hostname
Setting the Syslogging Receiver IP
Outgoing Mail
Email Alerts for SR Events
User and Group Information
Setting Up ATA over Ethernet
Setting Up Linux Software RAID
Using LVM to Manage Storage
Growing a Logical Volume
LVM Jargon
Creating a Filesystem
Growing a Filesystem
Repairing a Filesystem
Using NFS to Export Filesystems
Samba and CIFS for FS Export
Using Filesystems Quotas
Using Netconsole to See Kernel Messages
Sending Console Messages from Every CLN
Sending Console Messages outside the LAN
Using APT to Manage the CLN Software
Searching for Packages
Merging Configuration Files
Extra Virtual Memory for APT
Sweeping Upgrades
Getting Secure SSH Host Keys
Staying Informed
Mailing List
Changelog

Initial Configuration

Before using the Coraid Linux NAS (CLN), there is some site-specific information that needs to be configured. You can quickly store this configuration information on the CLN by using the commands and text editing steps described below.

For text editing, the CLN provides a traditional UNIX text editor, vi, as well as a "user friendly" editor called pico.

It's prudent to make a backup of an important file before editing it in place. You can use the convenient bkp script the CLN provides for this purpose.

makki:~# bkp /etc/hostname
bkp "/etc/hostname" --> "/etc/hostname.20051108"
makki:~# vi /etc/hostname

If you're really being careful you can examine your changes using the diff command. The old lines have a "-" at the left, and the new lines have a "+" at the left. Unchanged lines are shown with a space at the left in order to provide context.

makki:~# diff -u /etc/hosts.20051108 /etc/hosts
--- /etc/hosts.20051108 2005-11-08 10:31:46.000000000 -0500
+++ /etc/hosts  2005-11-08 10:32:04.000000000 -0500
@@ -1,5 +1,5 @@
 127.0.0.1      localhost.localdomain   localhost
-205.185.197.212        makki.coraid.com        makki
+192.168.1.20   foo.example.com foo
 # The following lines are desirable for IPv6 capable hosts
 ::1     ip6-localhost ip6-loopback
makki:~#
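
The bkp script ships with the CLN, and its implementation is not shown in this HOWTO. For hosts that lack it, the same idea can be sketched with a small shell function. The name backup_file and the refusal to overwrite an existing backup are assumptions made for this sketch, not a description of the real bkp.

```shell
# backup_file: hypothetical stand-in for the CLN's bkp script.
# Copies FILE to FILE.YYYYMMDD and reports what it did.
backup_file() {
    bf_src=$1
    bf_dst="$bf_src.$(date +%Y%m%d)"
    if [ ! -f "$bf_src" ]; then
        echo "backup_file: $bf_src: no such file" >&2
        return 1
    fi
    if [ -e "$bf_dst" ]; then
        # Assumption: keep the first backup of the day rather than clobber it.
        echo "backup_file: $bf_dst already exists, not overwriting" >&2
        return 1
    fi
    cp -p "$bf_src" "$bf_dst" &&
        echo "backup_file \"$bf_src\" --> \"$bf_dst\""
}
```

Calling backup_file /etc/hosts before editing would leave a dated copy next to the original, ready to be compared with diff -u afterwards.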

Setting the Root User Password

Every CLN comes with the root user account's password set to the word, "changeme". Unless only legitimate system administrators have physical or network access to the CLN, it's important to give the root user account a new password right away.

Log in to the CLN as the root user using the default password, "changeme", and then run the passwd command to change the root user's password to one you have chosen.

Type your new password, and then type the same new password again to confirm it. Nothing is displayed on the screen while you type, so you are typing blind when entering the new password.

makki:~# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
makki:~#

Setting the Timezone

The CLN uses NTP to set the system time when it boots. By setting the timezone for your location, your time will display correctly, and software will work more smoothly.

To find out the name for your timezone, you can use the interactive tzconfig tool. When prompted, just type in the appropriate answers for your location and hit enter.

Limiting Login Access

Before networking is configured, you can limit the IP addresses that are allowed to connect to the CLN by ssh. The only way to log in to the CLN over the network is via the secure shell, ssh.

By editing the /etc/hosts.allow file, you can selectively provide access. The /etc/hosts.allow lines below only allow ssh connections from 205.185.197.207.

sshd: 205.185.197.207: allow
sshd: ALL: deny

The hosts.allow syntax is flexible enough to accommodate more complex access specifications. Run man hosts.allow to find out more.
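
If several administrative hosts need ssh access, the rules can be generated instead of typed. The sketch below is only an illustration: the function name ssh_allow_rules is made up, and the 192.0.2.x addresses are placeholders for your own.

```shell
# ssh_allow_rules: print hosts.allow rules that admit ssh only from the
# addresses given as arguments and deny everyone else.
ssh_allow_rules() {
    for ip in "$@"; do
        printf 'sshd: %s: allow\n' "$ip"
    done
    # The deny rule goes last because the first matching rule wins.
    printf 'sshd: ALL: deny\n'
}

# Example with two placeholder addresses:
ssh_allow_rules 192.0.2.10 192.0.2.11
```

Review the output before copying it into /etc/hosts.allow, and keep an existing login session open while testing so that a mistake does not lock you out.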

Configure Ethernet MTU

The CLN will attempt to use a "jumbo" MTU (Maximum Transmission Unit) on all of its ethernet interfaces. If you are using a network switch that does not support jumbo frames (an MTU above 1500), you can override this behavior by editing the mtu.conf file. The simple configuration file syntax is explained in the comments at the top of the file.

bkp /etc/aoe/mtu.conf
vi /etc/aoe/mtu.conf

It is helpful to configure eth2 and eth3, the onboard ports, so that they do not use an MTU above 4200. Above this size the onboard ports may drop some packets, decreasing performance.

eth2 4200
eth3 4200

If you do not have that file, you are probably using an older version of the coraid-init package, and your CLN will not attempt to use jumbo frames. You can upgrade the package with apt-get, in the same way that coraid-scripts is upgraded in the section Email Alerts for SR Events, after you have completed the rest of the initial configuration steps listed in the following sections.

Even if your network switches support jumbo frames, they might need to be configured before this feature is available.
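
One way to check whether an IP path really carries jumbo frames is to send a ping that is exactly one full frame and forbid fragmentation. The helper below only does the arithmetic. The ping options in the comment are those of the Linux iputils ping and are assumed, not confirmed, to be available on your hosts. The peer name is a placeholder. Note that this exercises IP traffic, not AoE itself.

```shell
# ping_payload_for_mtu: largest ICMP echo payload that fits in one frame
# of the given MTU (20-byte IPv4 header plus 8-byte ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

ping_payload_for_mtu 9000   # prints 8972
ping_payload_for_mtu 4200   # prints 4172

# Assumed usage ("-M do" sets the don't-fragment flag):
#   ping -M do -c 3 -s "$(ping_payload_for_mtu 9000)" peer.example.com
```

If the full-size ping fails while a default-size ping succeeds, some device on the path is probably not passing frames of that size.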

Configure IP Networking

The "front side" network of the CLN will be connected to your LAN, through which all the storage clients will contact the CLN. The "back" network will be used for ATA over Ethernet (AoE). The CLN is like a storage doorway, with NFS clients on the front side of the CLN and AoE storage devices behind the door.

For networking on the front side, you need an IP address that the clients will use to reach the CLN.

The CLN has onboard network interfaces that are free for administrative uses. When you view the back of the CLN, the PCI-X extension card appearing on the right contains eth0 on the right, and eth1 on the left. Figure 1 shows this view of the CLN's network ports.

Figure 1. CLN Network Ports

The port for the front network is eth0, and AoE storage will be accessed through eth1.

The examples in the following sections use this configuration.

(An alternative configuration would be to use eth0 and eth1 for the back network, with eth2 serving the front. This alternative provides higher AoE throughput and somewhat lower TCP/IP throughput, since the onboard ports cannot handle the 9000-MTU frames that the extension card ports can use. The important thing is not to mix ports of dissimilar performance characteristics.)

Setting the IP Address

To set the IP address, you use a text editor on two files. One is /etc/network/interfaces and the other is /etc/hosts. Replace the IP 205.185.197.212 with the IP address that you want your clients to use for accessing your NAS box.

You should not use 205.185.197.x addresses for your front side network. These are Coraid IP addresses, and you will need to use them on the back network if you are going to forward syslog messages from any SATA+RAID units. Syslog messages will be discussed below.

Also, edit /etc/hosts, again replacing 205.185.197.212 with your IP.

Setting the Network Information

Edit /etc/network/interfaces to reflect your IP network.

  • Replace 205.185.197.0 with the address of your network.
  • Replace 205.185.197.254 with the gateway you use to reach networks outside your LAN.
  • Replace 205.185.197.2 with the IP address of your primary name server.
  • Replace coraid.com with your local domain name.

Then restart networking with the command below.

/etc/init.d/networking restart

Edit /etc/resolv.conf, supplying your name server's address and the default domain.

  • Replace 205.185.197.2 with the IP address of your primary name server.
  • Replace coraid.com with your local domain name.

Nameless Hosts

For internal networks, it is sometimes the case that the name server specified in /etc/resolv.conf and /etc/network/interfaces doesn't know any names for certain hosts that will be connecting to the CLN.

Those hosts may experience delays when connecting to the CLN's services. You can put names in the /etc/hosts file for each such "nameless host" so that the CLN does not look up the name.

You can even generate names with a simple shell loop.

makki:~# for n in `seq 101 105`; do echo 192.168.2.$n h$n.coraid.com h$n; done
192.168.2.101 h101.coraid.com h101
192.168.2.102 h102.coraid.com h102
192.168.2.103 h103.coraid.com h103
192.168.2.104 h104.coraid.com h104
192.168.2.105 h105.coraid.com h105
makki:~# for n in `seq 101 105`; do echo 192.168.2.$n h$n.coraid.com h$n; done >> /etc/hosts

Setting the Hostname

Because the CLN is a server, we store the hostname on the CLN itself instead of looking it up via DNS.

  • Edit /etc/hostname, replacing "makki" with the hostname you have chosen for this CLN unit.
  • Edit /etc/hosts and /etc/mailname, replacing "makki" with your chosen hostname and "coraid.com" with your domain name.
  • Run the command hostname makki, using your own host name instead of "makki".

The hostname and domain name, joined by a period, form the "fully qualified domain name" (FQDN).

Setting the Syslogging Receiver IP

Note: If you decide to use syslog-ng, as described below in the section, Email Alerts for SR Events, you will not need to configure /etc/syslog.conf as described in this section. It is, however, instructive to read this section in any case.

The CLN uses flash storage locally, so to minimize writes to the flash medium, we can send its system log information to a remote host. The syslog daemon on the remote host should be run in a mode such that it will accept the syslog messages coming from the CLN.

Configuring the CLN to send its syslog information to a remote host is easy. In the echo command below, replace "kokone.coraid.com" with the host that will receive the syslog messages.

makki:~# bkp /etc/syslog.conf
bkp "/etc/syslog.conf" --> "/etc/syslog.conf.20051108"
makki:~# echo '*.* @kokone.coraid.com' > /etc/syslog.conf
makki:~# /etc/init.d/sysklogd restart
Restarting system log daemon: syslogd.
makki:~#

You can test this setup using logger.

makki:~# logger "testing"

Lines like this should show up on the remote host running syslogd -r or the equivalent.

Nov  8 16:26:51 makki.coraid.com syslogd 1.4.1#17: restart.
Nov  8 16:26:57 makki.coraid.com ecashin: testing

(It says "ecashin" because during testing I logged into the CLN as that user before using the su command to become root.)

Forwarding SR Messages via Syslog

You can skip this section for now if you're just getting started.

You can configure the CLN to forward messages from one or more SATA+RAID (SR) units.

The SR appliances do not, in general, perform IP networking, but they do generate syslog messages in the form of UDP packets. These UDP packets have a source IP of 205.185.197.30 by default. Even if all the SR units you have use this default, you can still tell which SR unit on a given AoE network generated a message, because the shelf address is contained in the syslog message.

If your CLN is forwarding its own syslog messages to a remote host, it can forward messages from SR units as well. Here are the changes to make.

First, tell the syslog daemon on the CLN to listen to the network for messages from other hosts, and to forward those messages.

You can change the SYSLOGD options in /etc/init.d/sysklogd from an empty string to this:

SYSLOGD="-h -r"

Remember to restart the syslog daemon after editing the startup script.

/etc/init.d/sysklogd restart

Next, assign an IP to the CLN's network interface on the back network by editing /etc/network/interfaces. The purpose of this IP is only to help the kernel receive syslog UDP packets from the SR units, so it should be an IP on the same network as the source IP for the SR syslog messages. An example would be to make the CLN's eth1 be 205.185.197.1, so that it can receive the syslog messages with the default source IP 205.185.197.30.

# the back network interface
auto eth1
iface eth1 inet static
        address 205.185.197.1
        netmask 255.255.255.0
        network 205.185.197.0
        broadcast 205.185.197.255

In this example you are using 205.185.197.1 (or another IP in that network) as your own CLN's eth1 IP, so that it can receive packets with a source IP of 205.185.197.30.
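
The network and broadcast lines in a stanza like the one above follow from the address and the netmask, so they can be computed rather than worked out by hand. This is a minimal sketch using shell arithmetic. The function names are invented for the example, and it assumes plain dotted-quad input without leading zeros.

```shell
# ip_to_int / int_to_ip: convert between dotted-quad and integer form.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# static_stanza IFACE ADDRESS NETMASK: print an interfaces stanza with
# the network and broadcast lines computed from the address and netmask.
static_stanza() {
    addr=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    net=$(( addr & mask ))
    bcast=$(( net | (~mask & 0xFFFFFFFF) ))
    printf 'auto %s\niface %s inet static\n' "$1" "$1"
    printf '        address %s\n        netmask %s\n' "$2" "$3"
    printf '        network %s\n        broadcast %s\n' \
        "$(int_to_ip "$net")" "$(int_to_ip "$bcast")"
}

static_stanza eth1 205.185.197.1 255.255.255.0
```

The call at the end reproduces the eth1 stanza from the example, minus the comment line.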

Outgoing Mail

The CLN has only a stripped-down mail transport agent (MTA), enough to get mail off the CLN and onto a mail server. The primary reason for performing this easy step now is that the package management system, APT, sometimes sends email about installed packages.

To configure outgoing mail, edit the /etc/ssmtp/ssmtp.conf file, making the following changes:

  • Replace makki.coraid.com with the FQDN (hostname + dot + domain name) of your CLN.
  • In mailhub=kokone, replace kokone with the hostname of your mail server.

Email Alerts for SR Events

The CLN comes with a traditional syslog daemon (discussed in Setting the Syslogging Receiver IP), but if you would like to receive email alerts based on the syslog messages that the CLN receives from your SR units, you can install and configure a more advanced syslog daemon. Here we discuss the use of syslog-ng.

It is safe to skip this section if you are getting started with your CLN, or if you have no interest in receiving emails containing messages from the SR units.

First, ensure that you have a recent enough coraid-scripts package. It should be at least version 1.4. If it isn't recent enough, you can upgrade the package as shown below.

makki:~# dpkg -l | grep coraid-scripts
ii  coraid-scripts         1.2                   Helpful scripts for the CLN
makki:~# apt-get update       # repeat if necessary
makki:~# apt-get install coraid-scripts

Next replace the existing syslog daemon with syslog-ng.

makki:~# apt-get install syslog-ng
Reading package lists... Done
Building dependency tree... Done
The following packages will be REMOVED:
  klogd sysklogd
The following NEW packages will be installed:
  syslog-ng
0 upgraded, 1 newly installed, 2 to remove and 17 not upgraded.
Need to get 199kB of archives.
After unpacking 225kB of additional disk space will be used.
Do you want to continue [Y/n]?

Hit enter to accept the default response and initiate the replacement.

Once syslog-ng is installed, it needs to be configured. You can back up the distributed configuration file first.

makki:~# bkp /etc/syslog-ng/syslog-ng.conf

Below is an example syslog-ng.conf that will send email to "bogus@example.com". To receive email alerts, you should change that address to that of the intended recipient.

You will also need to replace "kokone.coraid.com" with the name or IP address of the host where you want the CLN's syslog messages to wind up.

# example /etc/syslog-ng/syslog-ng.conf for email alerts
options {
      # five seconds
      flush_timeout(5000);
      use_dns(no);
      log_msg_size(1024);
      stats_freq(0);
};
source local {
        internal();
        unix-stream("/dev/log");
        file("/proc/kmsg" log_prefix("kernel: "));
};
source remote {       udp(); };
filter sr_critical {
      match("shelf_") and (
              match("fail") or
              match("offline") or
              match("abort") or
              match("cknowledge by running online")
      );
};
filter emerg { level(emerg); };
destination loghost {
      udp("kokone.coraid.com" port (514));
};
destination users { usertty("*"); };
destination email_alert {
      program("/opt/bin/sendalert bogus@example.com");
};
log {
      source(local);
      source(remote);
      destination(loghost);
};
log {
      source(remote);
      filter(sr_critical);
      destination(email_alert);
};
log { source(local); filter(emerg); destination(users); };
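
Before relying on the sr_critical filter in the configuration above, it can help to try its logic against sample text. The shell function below imitates the filter with plain substring tests: the message must mention "shelf_" and at least one of the alert phrases. It is a sketch for experimenting, not part of syslog-ng, and the sample messages are invented for illustration.

```shell
# is_sr_critical: succeed if MESSAGE contains "shelf_" and at least one
# of the phrases listed in the sr_critical filter.
is_sr_critical() {
    case $1 in
        *shelf_*) ;;
        *) return 1 ;;
    esac
    case $1 in
        *fail*|*offline*|*abort*|*"cknowledge by running online"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Invented sample messages:
is_sr_critical "shelf_7: disk 7.3 has failed" && echo "alert"
is_sr_critical "shelf_7: temperature normal" || echo "no alert"
```

The first call prints "alert" and the second prints "no alert".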

You can modify the configuration to suit your needs by consulting the syslog-ng documentation and examples. You can see the contents of the syslog-ng package (or any package you like) by using the dpkg command.

makki:~# dpkg -L syslog-ng | less

User and Group Information

No special configuration is needed on the CLN for users and groups.

With NFS, all access is provided based on the numerical identifier for the user, called a "UID", and a numerical group identifier ("GID"). The users and groups need only exist on the NFS clients. You don't need to create the users and groups on the CLN.

If you have multiple NFS clients sharing a filesystem, then the UIDs and GIDs for the users and groups on all the NFS clients should match. There are many options for managing users and groups on NFS clients (NIS+, LDAP, etc.), but user and group information does not need to be stored on the CLN.

Setting Up ATA over Ethernet

Configuring ATA over Ethernet (AoE) is the easiest part of CLN administration. The CLN comes with drivers and tools that will help you to work with your AoE devices.

The cln-init script in /etc/init.d is designed to get AoE up and running. This script runs automatically during system startup. The back network is on eth1.

You can see what AoE devices are available by using the aoe-stat command.

makki:~# aoe-stat
      e0.2         5.368GB   eth1 up
     e14.0         7.339GB   eth1 up
makki:~#

The CLN comes with udev installed, and udev creates device nodes in /dev/etherd when AoE devices are detected by the aoe driver.

makki:~# ls /dev/etherd/
discover  e0.2  e0.2p1  e14.0  err  interfaces

The discover, err, and interfaces device nodes are character devices that are used to interact with the aoe driver. The other files listed are block device nodes used to control AoE block devices.

A block device is, briefly put, anything that can be used like a hard disk or floppy drive. It stores data and allows access to any part of the data, unlike a tape drive, where you'd have to fast forward or rewind the tape before being able to read from or write to some random location.

MSDOS-style partitions can't be used on block devices over two terabytes in size, and AoE devices are usually used without first being partitioned. If partitions do appear on an AoE device, however, device nodes will be created for them in addition to the device node for the whole device (like e0.2p1 in the example).
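
The two-terabyte limit comes from the partition table format: an MSDOS partition table stores sector addresses in 32 bits. Assuming the conventional 512-byte sector, the arithmetic looks like this:

```shell
# 2^32 addressable sectors of 512 bytes each.
max_bytes=$(( (1 << 32) * 512 ))
echo "$max_bytes bytes"               # 2199023255552 bytes
echo "$(( max_bytes >> 40 )) TiB"     # 2 TiB
```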

Setting Up Linux Software RAID

You only need to use Linux Software RAID if you want the CLN itself to perform RAID over multiple AoE devices.

The md driver, as well as drivers for each RAID level (raid0, raid5, etc.) are modules in the Linux kernel that allow you to perform RAID using the CLN. The mdadm tool is a utility that you use to control the Software RAID performed by the kernel.

You can see what Software RAID the kernel is doing by looking at the contents of the /proc/mdstat virtual file using cat.

makki:~# cat /proc/mdstat
Personalities :
unused devices: <none>
makki:~#

The mdadm tool provides a way to create a Linux Software RAID block device from AoE devices.

makki:~# aoe-stat
      e1.0       573.749GB   eth1 up
      e1.1       576.437GB   eth1 up
makki:~# mdadm -C --level=raid1 -n 2 --auto=md /dev/md0 /dev/etherd/e1.0 /dev/etherd/e1.1
mdadm: array /dev/md0 started.

You can see the Linux Software RAID driver, md, initializing the new RAID 1 by examining /proc/mdstat.

makki:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 etherd/e1.1[1] etherd/e1.0[0]
      560302272 blocks [2/2] [UU]
      [>....................]  resync =  0.0% (92480/560302272) finish=504.6min speed=18496K/sec
unused devices: <none>
makki:~#
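
If you want to watch the resync from a script, the progress line can be picked out of /proc/mdstat with sed. The function name below is made up for this sketch, and the pattern is based only on the output format shown above, so it may need adjusting for other kernel versions.

```shell
# md_resync_progress: read /proc/mdstat-style text on standard input and
# print "PERCENT MINUTES" for each array that is resyncing.
md_resync_progress() {
    sed -n 's/.*resync *= *\([0-9.]*\)%.*finish=\([0-9.]*\)min.*/\1 \2/p'
}

# On the CLN this would be: md_resync_progress < /proc/mdstat
# Here the sample output from above is fed in directly.
md_resync_progress <<'EOF'
md0 : active raid1 etherd/e1.1[1] etherd/e1.0[0]
      560302272 blocks [2/2] [UU]
      [>....................]  resync =  0.0% (92480/560302272) finish=504.6min speed=18496K/sec
EOF
```

With the sample input this prints "0.0 504.6": the percentage done and the estimated minutes remaining.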

The new block device you have created is md0.

You can now create a filesystem on md0, make it into an LVM physical volume as described below, or even use it as a component in a higher-level RAID.

To have the CLN start your new RAID at boot time and cleanly shut it down when the system is going down, add the RAID to the /etc/aoe/md.conf file.

0 raid1 /dev/etherd/e1.1 /dev/etherd/e1.0

Each line in that file describes a RAID, beginning with the number of the md device. The "0" in the example is for "md0". The next word shows what kind of RAID it is, and the block devices underlying the RAID follow. The CLN will attempt to start each RAID when a sufficient number of components are present, even if some are missing.

For each RAID device, all of its components must be listed individually on its line in /etc/aoe/md.conf. If it is inconvenient to type them all, a timesaving trick is to copy and paste the output of the echo command, using shell globbing.

makki:/home/ecashin# echo /dev/etherd/e10.0 /dev/etherd/e10.[2-9]
/dev/etherd/e10.0 /dev/etherd/e10.2 /dev/etherd/e10.3 /dev/etherd/e10.4 /dev/etherd/e10.5 /dev/etherd/e10.6 /dev/etherd/e10.7 /dev/etherd/e10.8 /dev/etherd/e10.9

Using LVM to Manage Storage

Unless you want the extra flexibility provided by the Logical Volume Manager (LVM), you don't need to use LVM. You can instead just put a filesystem on the AoE or md block device itself.

Using LVM means mastering a few new concepts and the corresponding jargon. We'll cover these concepts and terms below, reviewing them at the end of the section.

The Logical Volume Manager software on systems with a 2.6 Linux kernel is LVM2. In the text below, "LVM" refers to LVM2, because the CLN runs a 2.6 kernel.

LVM uses the kernel's device mapper module to create the block devices you want out of lower level block devices. For example, if you have two SATA+RAID shelves with seven terabytes of storage on each one, you can combine the two 7TB AoE devices into one 14TB block device, where you can create a filesystem for NFS export.

To make a block device ready for use with LVM, we use the pvcreate command. LVM will write its own information onto the block device itself (by default near the beginning of the device). This "LVM metadata" will completely describe your LVM setup, so you won't have to store information about your LVM setup on the CLN.

Continuing the example from the previous section, we'll use pvcreate to create an LVM "physical volume" out of the RAID 1 block device we just made, /dev/md0.

makki:~# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created

You could, of course, use an AoE device like /dev/etherd/e0.0 as a physical volume, too. You can make any block device into an LVM physical volume.

Now we want to create a logical volume that we can grow later. You create logical volumes by allocating "extents" from a collection of LVM physical volumes. The collection is called a "volume group". We now have one such physical volume, the one we just created, so we put it in a volume group called "vg0". You can use a more descriptive name like "accounting" if it's helpful.

makki:~# vgcreate vg0 /dev/md0
  Volume group "vg0" successfully created

Now we can use vgdisplay vg0 to examine the characteristics of the new volume group. Notice the line that says "Free PE / Size"? The number before the slash is the number of free extents in the volume group. We'll use all the free extents to create one logical volume with the mundane name, "lv0".

makki:~# vgdisplay vg0 | grep Free
  Free  PE / Size       136792 / 534.34 GB
makki:~# lvcreate --extents 136792 --name lv0 vg0
  Logical volume "lv0" created
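
Rather than copying the extent count by hand, you can pull it out of the vgdisplay output. This sketch assumes the "Free  PE / Size" line has the layout shown above. The function name free_extents is made up for the example.

```shell
# free_extents: read vgdisplay output on standard input and print the
# number of free physical extents.
free_extents() {
    awk '$1 == "Free" && $2 == "PE" { print $5 }'
}

# On the CLN this would be: vgdisplay vg0 | free_extents
echo '  Free  PE / Size       136792 / 534.34 GB' | free_extents

# Assumed usage when creating the logical volume:
#   lvcreate --extents "$(vgdisplay vg0 | free_extents)" --name lv0 vg0
```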

There is a new block device ready for us to use, /dev/vg0/lv0. We'll create a filesystem on it in the section on filesystem creation.

Growing a Logical Volume

An existing logical volume can grow if there's more space available in its volume group. That space is measured in free extents.

Here is an example, building upon the previous example, where we add a new AoE device to the existing volume group and grow the existing logical volume.

Here's vg0, with no free space available.

makki:~# vgdisplay vg0 | grep Free
  Free  PE / Size       0 / 0

The new AoE device is e0.1. We'll make it into a physical volume and add it to vg0.

makki:~# aoe-stat
      e0.1       536.870GB   eth0 up
      e1.0       573.749GB   eth0 up
      e1.1       576.437GB   eth0 up
makki:~# pvcreate /dev/etherd/e0.1
  Physical volume "/dev/etherd/e0.1" successfully created
makki:~# vgextend vg0 /dev/etherd/e0.1
  Volume group "vg0" successfully extended

Now there's free space, so we can extend the old logical volume. Notice that you have to put a plus before the number of extents to add.

makki:~# vgdisplay vg0 | grep Free
  Free  PE / Size       127999 / 500.00 GB
makki:~# lvextend --extents +127999 /dev/vg0/lv0
  Extending logical volume lv0 to 1.01 TB
  Logical volume lv0 successfully resized
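
The sizes reported in these examples are consistent with an extent size of 4MB, with "GB" meaning 1024MB. Under that assumption (check the "PE Size" line of vgdisplay on your own system), the relation between extent counts and sizes can be sketched as follows. The function name is made up for the example.

```shell
# extents_to_gb COUNT [EXTENT_MB]: size of COUNT extents in gigabytes of
# 1024MB each. EXTENT_MB defaults to 4, an assumption that matches the
# numbers in the examples.
extents_to_gb() {
    awk -v n="$1" -v mb="${2:-4}" 'BEGIN { printf "%.2f\n", n * mb / 1024 }'
}

extents_to_gb 136792                    # lv0 as first created: 534.34
extents_to_gb 127999                    # extents added from e0.1: 500.00
extents_to_gb $(( 136792 + 127999 ))    # lv0 after lvextend: 1034.34
```

The last figure, 1034.34GB, is the 1.01 TB that lvextend reports.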

If you have a filesystem on lv0, note that you have done nothing to it yet. To LVM, the filesystem itself is just data on the logical volume. See the section on Growing a Filesystem for specifics.

LVM Jargon

  • A physical volume is a block device that has been prepared for use with LVM.
  • A volume group is a collection of physical volumes that provide storage space, measured in extents.
  • Extents are the smallest units of space allocatable for logical volumes.
  • Logical volumes are the LVM block devices you use instead of a simpler block device, like a hard disk.

Creating a Filesystem

Once you have selected or created a block device that you would like to use as your storage medium, you can create a filesystem on that block device. For the block device you might use …

In this example we're using an LVM logical volume. The mkfs command is used to create a filesystem. Creating an XFS generates some verbose output.

makki:~# mkfs -t xfs /dev/vg0/lv0
meta-data=/dev/vg0/lv0           isize=256    agcount=32, agsize=8473312 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=271145984, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
makki:~#

To use this filesystem, we just have to mount it on a directory somewhere.

makki:~# mkdir /mnt/alpha
makki:~# mount /dev/vg0/lv0 /mnt/alpha
makki:~# echo testing > /mnt/alpha/test.txt
makki:~# cat /mnt/alpha/test.txt
testing

It's important to make sure this filesystem is mounted on boot and cleanly unmounted when the system goes down. On the CLN we add it to /etc/aoe/fs.conf. Just add a line with the block device and the mountpoint. The line from the above example looks like this:

/dev/vg0/lv0 /mnt/alpha

You can use an editor to do it. If you're careful you can just append (with >>) to the file as shown below.

makki:~# bkp /etc/aoe/fs.conf
bkp "/etc/aoe/fs.conf" --> "/etc/aoe/fs.conf.20051122"
makki:~# echo /dev/vg0/lv0 /mnt/alpha >> /etc/aoe/fs.conf
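
Appending with >> adds a second copy of the line if you run the command twice. A slightly more careful sketch checks for the line first. The function name add_fs_line is invented here, and the demonstration uses a scratch file rather than the real /etc/aoe/fs.conf.

```shell
# add_fs_line CONF DEVICE MOUNTPOINT: append "DEVICE MOUNTPOINT" to CONF
# unless an identical line is already there.
add_fs_line() {
    fs_conf=$1
    fs_line="$2 $3"
    if grep -qxF -- "$fs_line" "$fs_conf" 2>/dev/null; then
        echo "already present: $fs_line"
    else
        echo "$fs_line" >> "$fs_conf"
        echo "added: $fs_line"
    fi
}

# Demonstration on a scratch file:
scratch=$(mktemp)
add_fs_line "$scratch" /dev/vg0/lv0 /mnt/alpha
add_fs_line "$scratch" /dev/vg0/lv0 /mnt/alpha
cat "$scratch"
rm -f "$scratch"
```

The second call reports that the line is already present, and the scratch file ends up with a single entry.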

For more complex mount commands (e.g., ones with special options) the mount command itself may be placed on a single line like this in /etc/aoe/fs.conf, with the mountpoint last.

mount -o uquota /dev/md0 /mnt/md0

The /etc/fstab file is not used because filesystems on AoE devices are not mounted as early in the boot sequence as the filesystems in /etc/fstab.

Growing a Filesystem

The xfs_growfs command is used to expand an existing XFS. Let's say we have just extended the logical volume /dev/vg0/lv0. Now we want to make sure that the XFS can use all the new space.

With the XFS already mounted, we can grow it to fill the empty space. The XFS is still the size of the logical volume where it was created, 1.1 terabytes in this example.

makki:~# df -h | grep alpha
/dev/mapper/vg0-lv0   1.1T  528K  1.1T   1% /mnt/alpha
makki:~#

We've already grown the logical volume by adding another terabyte of space (using vgextend and lvextend as shown in the Growing a Logical Volume section). All we need to do is tell XFS to use the new space. The only argument to xfs_growfs is the directory where the filesystem is mounted.

PLEASE NOTE: There is a bug in the xfs driver in Coraid Linux that causes growth in increments of over two terabytes to fail. We are working to fix this bug, but you can work around the problem by growing repeatedly in increments of one terabyte (268435456 blocks of 4096 bytes each). Using the "-D" option to specify an absolute total size in terms of 4096-byte data blocks, and adding 268435456 for each growth, then omitting the "-D" option to finally grow by the sub-terabyte remainder, you can expand the XFS as needed.

makki:~# xfs_growfs /mnt/alpha
meta-data=/mnt/alpha             isize=256    agcount=32, agsize=8473312 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=271145984, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
data blocks changed from 271145984 to 539580416

Notice that the number of data blocks has about doubled.

makki:~# df -h | grep alpha
/dev/mapper/vg0-lv0   2.1T  1.1M  2.1T   1% /mnt/alpha

Now the extra terabyte is ready to use. No downtime!
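
For the one-terabyte workaround described in the note above, the sequence of "-D" totals can be computed ahead of time. The sketch below prints the commands instead of running them. The function name and the target size are hypothetical. The starting size is the 271145984-block filesystem from the example.

```shell
TB_BLOCKS=268435456    # one terabyte in 4096-byte blocks, as in the note

# growfs_steps CURRENT TARGET: print the "-D" totals to use, one per
# line, when growing from CURRENT to TARGET data blocks in steps of one
# terabyte. Any remainder is left to a final plain xfs_growfs.
growfs_steps() {
    cur=$1
    target=$2
    while [ $(( cur + TB_BLOCKS )) -le "$target" ]; do
        cur=$(( cur + TB_BLOCKS ))
        echo "$cur"
    done
}

# Hypothetical growth of a little over three terabytes:
for total in $(growfs_steps 271145984 1076453352); do
    echo "xfs_growfs -D $total /mnt/alpha"
done
echo "xfs_growfs /mnt/alpha"
```

This prints three xfs_growfs -D commands, with totals of 539581440, 808016896, and 1076452352 blocks, followed by the plain xfs_growfs that takes up the remainder.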

Repairing a Filesystem

Having a single large filesystem is attractive in its convenience. A more conservative approach is to use many small filesystems, so that each independent filesystem has its own metadata, and problems in one filesystem can't cause problems in any other. But many administrators prefer to use one very large filesystem instead.

It takes a lot of memory to repair a huge XFS filesystem. If you ever have to repair an XFS, you can use the xfs_repair tool.

Here is a warning about using xfs_repair: If xfs_repair rescues your files, placing them in a lost+found directory, it is a good idea to rename the directory from "lost+found" to something else. The reason is that running xfs_repair again could result in the deletion and recreation of lost+found.

To run this tool, the XFS experts say that you'll need …

  • about 2GB of memory per terabyte of filesystem space, and
  • 100-200MB of memory per million inodes in the filesystem.
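
Those two rules of thumb can be combined into a rough estimate. The sketch below simply adds them up, assuming 1GB means 1024MB. The function name and the example filesystem are made up.

```shell
# xfs_repair_mem_mb TERABYTES MILLION_INODES: print a low and a high
# estimate, in megabytes, from 2GB per terabyte plus 100 to 200MB per
# million inodes.
xfs_repair_mem_mb() {
    base=$(( $1 * 2 * 1024 ))
    echo "$(( base + $2 * 100 )) $(( base + $2 * 200 ))"
}

# A hypothetical 2TB filesystem holding 5 million inodes:
xfs_repair_mem_mb 2 5    # prints: 4596 5096
```

Compare the result with the memory and swap reported by free -m to judge whether extra virtual memory is needed.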

That memory can be virtual memory, so by adding temporary swap space to the CLN, you can more easily run xfs_repair on large filesystems. In general, it's best not to use AoE devices for swap. It can be an attractive option, though, when the CLN is just performing an xfs_repair.

Using spare disks on Coraid SR appliances helps to make their LUNs more resilient to disk failures. It also provides a potential source of emergency virtual memory for your CLNs.

The strategy is as follows. (Please double check your commands!)

  • On the SR unit, borrow the spare by using the rmspare command.
  • On the SR unit, export the former spare disk using the jbod command.
  • On the CLN, initialize the new LUN as a swap device with the mkswap command.
  • On the CLN, activate the swap device with the swapon command.
  • On the CLN, perform the xfs_repair.
  • On the CLN, finish using the swap device with the swapoff command.
  • On the SR unit, remove the temporary jbod LUN and make the disk a spare again.
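
The CLN-side steps of this strategy can be rehearsed with a dry-run wrapper that prints each command instead of executing it. This is only a sketch: the SR-side commands are left out, the function names are invented, and /dev/etherd/e9.9 is a made-up name for the LUN exported from the borrowed spare.

```shell
# run: print the command when DRY_RUN is 1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repair_with_temp_swap SWAPDEV FSDEV: the CLN-side steps from the list.
# xfs_repair needs the filesystem on FSDEV to be unmounted first.
repair_with_temp_swap() {
    run mkswap "$1" || return 1
    run swapon "$1" || return 1
    status=0
    run xfs_repair "$2" || status=$?
    # Release the swap device even if the repair failed.
    run swapoff "$1"
    return "$status"
}

repair_with_temp_swap /dev/etherd/e9.9 /dev/vg0/lv0
```

Only after checking the printed commands against your own device names would you set DRY_RUN=0 to execute them.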

If you don't have an SR unit with a spare disk, another option would be to use a vblade process to make space from a Linux-based system available to the CLN.

Using NFS to Export Filesystems

The CLN will export any filesystems configured in the /etc/exports file. Here is an example of a line in /etc/exports that tells the CLN to export a filesystem mounted on /mnt/alpha.

/mnt/alpha 205.185.197.207(rw,sync,no_root_squash,fsid=20)

The line tells the CLN to make the filesystem available via NFS to the host with the IP address 205.185.197.207. NFS provides security based on access granted in /etc/exports and based on user identity. Presumably, the network is secure enough that we know this IP corresponds to a particular host, and the users on that host are managed by an authorized system administrator.

If you have a trusted network, you can open up NFS even more by specifying a network instead of one or more IP addresses. The example below uses a 24-bit netmask to allow any IP starting with "205.185.197." to access the exported filesystem.

/mnt/alpha 205.185.197.0/24(rw,sync,no_root_squash,fsid=20)

Between the parentheses, the line includes options to control the way the filesystem is exported. In this example, the CLN is on a trusted storage network, so client access is liberal.

These options are documented in the manpage for the /etc/exports file.

makki:~# man exports

After changing the /etc/exports file, run exportfs -ra. This command tells the running NFS server to re-read /etc/exports.

makki:~# exportfs -ra

On kokone, the host with IP 205.185.197.207, we can now mount and use the exported filesystem using NFS.

root@kokone ~# mount -t nfs makki:/mnt/alpha /mnt/makki
root@kokone ~# rsync -a /usr/share/doc/ruby1.8 /mnt/makki
root@kokone ~# ls /mnt/makki/ruby1.8/
COPYING     NEWS.Debian.gz  README.Debian  changelog.Debian.gz
COPYING.ja  NEWS.gz         README.ja      changelog.gz
LEGAL.gz    README          ToDo.gz        copyright
root@kokone ~#

When the network form of the export line is used, any other host in the 205.185.197.0/24 network can also mount and use the exported filesystem at the same time.

If kokone is connected to makki by a lossy network, doing NFS over TCP may result in better performance. The tcp mount option, given on the client as in "mount -t nfs -o tcp makki:/mnt/alpha /mnt/makki", specifies that the client would like to use TCP for NFS.

Solaris 10 NFS clients might need to be configured not to use NFS v4 when mounting volumes from the CLN. Solaris 10 users report that adding a line like the one below to the /etc/auto_master file on the NFS client eliminated "not owner" errors.

/home           auto_home       -nobrowse,vers=3

An alternative would be to set up NFS v4 on the CLN. NFS v4 is not officially supported on the CLN, but the framework exists for those who would like to be early adopters.

Samba and CIFS for FS Export

A popular way to export filesystems from Linux-based systems to machines running Windows is by using samba. Samba uses the CIFS protocol to make files available to Windows hosts.

It is beyond the scope of this HOWTO to cover Windows administration or Samba configuration, but there are many good online resources and books that cover the basics and more.

If your CLN already has samba installed, you can see it in the output of the dpkg command.

makki:~# dpkg -l | grep samba
ii  samba          3.0.22-1    a LanManager-like file and printer server fo
ii  samba-common   3.0.22-1    Samba common files used by both the server a
makki:~#

If you don't have samba already, it is easy to install using APT.

makki:~# apt-get update && apt-get install samba

If you are new to Samba, the documentation on the Samba project's website, www.samba.org, is a good place to get started.

Using Filesystems Quotas

The CLN supports quotas via the xfs_quota command. The examples below are based on the ones in the xfs_quota man page.

The NFS clients have a user "ecashin" with the user ID number 1000, so we create a quota for ecashin in the example. Some commands take a while to run on a large filesystem.

makki:~# umount /mnt/md0
makki:~# mount -o uquota /dev/md0 /mnt/md0
makki:~# xfs_quota -x -c quot
/dev/md0 (/mnt/md0) User:
125034628   #1000
 2351424    root
  695432    #500

The "quot" command scans the filesystem and takes a while to complete. Its output shows that the user with user ID 1000 is currently using 125034628 blocks.

makki:~# xfs_quota -x -c state
User quota state on /mnt/md0 (/dev/md0)
  Accounting: ON
  Enforcement: ON
  Inode: #3891717 (3 blocks, 3 extents)
Group quota state on /mnt/md0 (/dev/md0)
  Accounting: OFF
  Enforcement: OFF
  Inode: #18446744073709551615 (0 blocks, 0 extents)
Project quota state on /mnt/md0 (/dev/md0)
  Accounting: OFF
  Enforcement: OFF
  Inode: #18446744073709551615 (0 blocks, 0 extents)
Blocks grace time: [7 days]
Inodes grace time: [7 days]
Realtime Blocks grace time: [7 days]
makki:~#

We now set quota limits for user 1000 a bit beyond the current usage using xfs_quota interactively.

makki:/home/ecashin# xfs_quota -x /mnt/md0
xfs_quota> path
      Filesystem          Pathname
[000] /mnt/md0            /dev/md0 (uquota)
xfs_quota> limit bsoft=122g bhard=123g 1000
xfs_quota> quota -h 1000
Disk quotas for User ecashin (1000)
Filesystem   Blocks  Quota  Limit Warn/Time    Mounted on
/dev/md0     119.5G   122G   123G  00 [------] /mnt/md0
xfs_quota>

The -h option, like the one for df, shows amounts in human-readable form.

On an NFS client named kokone, user ecashin with user ID 1000 is using the CLN via NFS.

ecashin@kokone ecashin$ id
uid=1000(ecashin) gid=1000(ecashin) groups=4(adm),24(cdrom),29(audio),1000(ecashin)
ecashin@kokone ecashin$ mount | grep nfs
makki:/mnt/md0 on /mnt/makki type nfs (rw,tcp,addr=205.185.197.212)

User ecashin tries writing files into a test directory.

ecashin@kokone ecashin$ d=/mnt/makki/ecashin/test
ecashin@kokone ecashin$ mkdir $d
ecashin@kokone ecashin$

He makes successive 1MB files until his quota is reached. The loop runs the dd command once per file and stops at the first error. The $mb variable keeps track of which megabyte dd is trying to write.

It takes a while, because ecashin has about 3.5 gigabytes to go.

ecashin@kokone ecashin$ mb=0   # the number of megabytes written
ecashin@kokone ecashin$ while mb=`expr $mb + 1`; do
> dd if=/dev/zero of=$d/$mb bs=1M count=1 2>/dev/null || break
> done; echo $mb were written
3333 were written
ecashin@kokone ecashin$

Now that the quota has been exceeded, ecashin can't use any more space.

ecashin@kokone ecashin$ dd if=/dev/zero of=$d/newfile bs=1M count=1
dd: opening `/mnt/makki/ecashin/test/newfile': Disk quota exceeded
ecashin@kokone ecashin$ mkdir $d/newdir
mkdir: cannot create directory `/mnt/makki/ecashin/test/newdir': Disk quota exceeded

In addition to user quotas, XFS supports group quotas and "project" quotas. The xfs_quota man page has details.

To have the filesystem mounted with options like "-o uquota" when the CLN boots, simply put the entire mount command on a line in /etc/aoe/fs.conf.
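
For the quota example in this section, such a line could simply be the mount command used earlier. This fragment is an illustration based only on the description above, not a copy of a shipped fs.conf, and the device and mount point are the ones from this section's examples.

```
mount -o uquota /dev/md0 /mnt/md0
```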

Using Netconsole to See Kernel Messages

The CLN's kernel has support for a feature called "netconsole". You can use it to send the console messages from the CLN's kernel to another host on your front-side network.

Its configuration in /boot/grub/menu.lst looks complicated, but it isn't too bad once you break up the magical incantation into small pieces.

Because netconsole starts early in the system boot procedure, it needs some low-level information, namely the ethernet address of the host that will receive the netconsole messages. The ethernet address is called the "hardware address" or "MAC address", as opposed to the higher-level "IP address".

The easiest way to find the ethernet address of the receiver host is to ping that host and then run arp.

makki:~# ping -c 1 kokone
PING kokone.coraid.com (205.185.197.207) 56(84) bytes of data.
64 bytes from kokone.coraid.com (205.185.197.207): icmp_seq=1 ttl=64 time=0.275 ms
--- kokone.coraid.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.275/0.275/0.275/0.000 ms
makki:~# arp
Address                  HWtype  HWaddress           Flags Mask            Iface
kokone.coraid.com        ether   00:0D:87:AA:C9:00   C                     eth0
makki:~#

The output of arp says that kokone has the ethernet address 00:0D:87:AA:C9:00. It's convenient that ping shows the IP address 205.185.197.207, because we need that too.

You already know the CLN's IP, but you can look it up again.

makki:~# ifconfig | awk '/^eth0/{getline; print $2;}'
 addr:205.185.197.212

The file /boot/grub/menu.lst is managed by the update-grub program. You can add a netconsole option to the default kernel boot parameters. Look in /boot/grub/menu.lst for the part labeled "Start Default Options".

In that section, lines beginning with a single "#" are used by update-grub. Lines beginning with "##" are just comments. You add the netconsole boot parameter to the existing parameters.

The original line looks like this:

# kopt=root=/dev/hda1 ro acpi=off

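… and you add to the end of that line so that it looks like this instead, substituting your own values. Before the comma come the CLN's source port, IP address, and network interface; after the comma come the receiver's port, IP address, and ethernet address: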

# kopt=root=/dev/hda1 ro acpi=off netconsole=4444@205.185.197.212/eth0,6666@205.185.197.207/00:0D:87:AA:C9:00
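
To avoid typos when assembling the parameter by hand, a small shell function can build it from its parts. This is only a sketch: the function name netconsole_param is made up for this example, and the values passed to it are the ones used in this section.

```shell
#!/bin/sh
# Build a netconsole boot parameter from its six fields.
# Layout: netconsole=SRCPORT@SRCIP/DEV,DSTPORT@DSTIP/DSTMAC
netconsole_param() {
    src_port=$1   # UDP port the CLN sends from
    src_ip=$2     # the CLN's own IP address
    dev=$3        # CLN network interface to send on
    dst_port=$4   # UDP port the receiver listens on
    dst_ip=$5     # receiver's IP address
    dst_mac=$6    # receiver's ethernet address
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$dst_port" "$dst_ip" "$dst_mac"
}

# Values taken from the example in this section.
netconsole_param 4444 205.185.197.212 eth0 6666 205.185.197.207 00:0D:87:AA:C9:00
```

The printed text can be pasted onto the end of the kopt line.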

Now use the update-grub command to propagate your default boot parameters to all the kernels.

makki:~# update-grub
Searching for GRUB installation directory ... found: /boot/grub .
Testing for an existing GRUB menu.list file... found: /boot/grub/menu.lst .
Searching for splash image... none found, skipping...
Found kernel: /boot/vmlinuz-2.6.13.3-c1
Updating /boot/grub/menu.lst ... done

Now, after rebooting the CLN, you can listen for incoming UDP messages on port 6666 on the receiver, e.g., with the netcat tool. Remember that the listener runs not on the CLN but on the remote host receiving the messages.

ecashin@kokone ecashin$ nc -l -p 6666 -u
Bootdata ok (command line is root=/dev/hda1 ro acpi=off netconsole=4444@205.185.197.212/eth0,6666@205.185.197.207/00:0D:87:AA:C9:00)
Linux version 2.6.13.3-c1 (root@makki) (gcc version 4.0.1 (Debian 4.0.1-2)) #7 SMP Fri Nov 4 09:31:27 EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)

(… and so forth.)

If you want to capture the console output instead of just viewing it, you can redirect the output or pipe it to tee. The command below daemonizes the netcat process on the receiver host.

root@kokone /# (sh -c "nc -l -p 6666 -u 2>&1 | tee /var/log/makki-console" &)

Sending Console Messages from Every CLN

It isn't difficult to use netconsole on all of your CLN units, but each CLN should send its netconsole messages to a different port on the receiver host.

In the examples above, the receiver is listening to port 6666. For every CLN host, use a different receiver port, replacing 6666 with a different number for each CLN you have.

On the host that's receiving all the netconsole messages you'll run one netcat listener for each CLN you have, replacing 6666 with the port each CLN is writing to.
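
One way to keep the port assignments straight is to generate the listener commands from a list of CLN names. The sketch below only prints the commands. The host names cln2 and cln3, the log file names, and the function name are placeholders made up for this example; makki is the CLN used throughout this HOWTO.

```shell
#!/bin/sh
# Print one netcat listener command per CLN, each on its own UDP port.
# The host names and the base port are placeholders for illustration.
BASE_PORT=6666

listener_commands() {
    port=$BASE_PORT
    for cln in "$@"; do
        echo "nc -l -p $port -u > /var/log/$cln-console"
        port=$((port + 1))
    done
}

listener_commands makki cln2 cln3
```

Each CLN's netconsole parameter then needs the matching receiver port: 6666 for the first name, 6667 for the second, and so on.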

Sending Console Messages outside the LAN

To send messages to a host that isn't in the same ethernet broadcast domain as your CLN, use the MAC address of the network gateway instead of the MAC address of the receiver. That will allow the netconsole UDP packets to be routed.

If you can't remember the default gateway's IP address, you can find it using the route command. Then just ping the gateway and use arp to see its ethernet address.

makki:~# /sbin/route | awk '/^default/{print $2}'
205.185.197.250
makki:~# ping -c 1 205.185.197.250
PING 205.185.197.250 (205.185.197.250): 56 data bytes
64 bytes from 205.185.197.250: icmp_seq=0 ttl=64 time=0.9 ms
--- 205.185.197.250 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.9/0.9/0.9 ms
makki:~# arp | fgrep 205.185.197.250
205.185.197.250          ether   37:02:F4:00:60:0F   C                     eth0
makki:~#
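
If you would rather not read the address out of the table by eye, a short awk filter can pick it out. This is a sketch that assumes the column layout shown above (address, hardware type, hardware address); the function name mac_for_ip is made up for this example. Because arp may print host names instead of numbers, it is meant to be fed from "arp -n".

```shell
#!/bin/sh
# Print the hardware address that arp reports for the address given in $1.
# Reads arp-style output on standard input, e.g.: arp -n | mac_for_ip IP
mac_for_ip() {
    awk -v ip="$1" '$1 == ip && $2 == "ether" { print $3 }'
}

# Sample line copied from the arp output above.
echo '205.185.197.250          ether   37:02:F4:00:60:0F   C                     eth0' |
    mac_for_ip 205.185.197.250
```

The address it prints is the one to use after the final "/" of the netconsole parameter, while the receiver's own IP address stays in place.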

Using APT to Manage the CLN Software

The CLN comes with a sophisticated package management system made famous by the Debian distribution. It is APT, the Advanced Packaging Tool.

NOTE! The Coraid website recently underwent changes that make it necessary to edit /etc/apt/sources.list. If your sources.list file contains the text "www.coraid.com", replace it wherever it occurs with "support.coraid.com", leaving everything else as it stands.

As a preliminary step to using APT, it is a good idea to check for a new version of coraid-aptcfg. If a new version is installed, be sure to run "apt-get update" a second time afterwards.

makki:~# apt-get update       # repeat if needed
makki:~# apt-get install coraid-aptcfg

If you have an old CLN, you might need to perform the steps listed in the CLN FAQ: http://support.coraid.com/support/cln/faq.html.

The most conservative way to update CLN software is to selectively update specific packages. For example, to take advantage of bugfixes and improvements in the CLN kernel, aoe driver, aoetools, and cec packages, you can do …

makki:~# apt-get update       # repeat if needed
makki:~# apt-get install coraid-{kernel,aoe,aoetools,cec}

The above example works just as if you had typed out "coraid-kernel coraid-aoe coraid-aoetools coraid-cec". Dependencies that also need to be updated will be detected by APT.
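
The expansion is done by the shell before apt-get ever runs, so you can preview it with echo. The sketch below assumes bash is available; it calls bash explicitly because a plain POSIX sh is not required to expand braces.

```shell
# Preview the package names that the braces expand to.
# bash -c is used because brace expansion is a bash feature, not plain POSIX sh.
bash -c 'echo coraid-{kernel,aoe,aoetools,cec}'
```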

The CLN has limited space on its flash disk, and you can recover some free space after upgrading by getting rid of the deb packages after they've been installed.

makki:~# apt-get clean

Searching for Packages

APT makes it easy to search for and manage packages. Here is an example that shows how to find out what package owns the /etc/init.d/cln-init file, display information about the package using apt-cache, and then upgrade only that package (and any dependencies).

apt-get update        # repeat if necessary
dpkg -S /etc/init.d/cln-init
apt-cache show coraid-init
apt-get install coraid-init

Here is an example showing a search for packages that have to do with telnet.

apt-cache search telnet | less

Merging Configuration Files

When you upgrade the CLN software, new packages sometimes include a newer version of a configuration file. The new file might include important changes, but your current configuration files contain critical information.

If you haven't changed a config file, then apt-get upgrade will just install the new one, but if you have made changes, apt-get allows you to control how the config files are updated.

When you're asked about a config file, like /etc/aoe/fs.conf, make sure you keep the configuration changes you have made. A sure way to do that, when you are prompted by apt-get, is to …

  • use "z" to get a shell (or just ssh in again or use a different virtual terminal),
  • use bkp to back up your current config file,
  • use "d" to examine the difference between your current config file and the new one,
  • install the new one with "i",
  • and finally (and most importantly) transfer the important parts from your backup to the new file using an editor.

Extra Virtual Memory for APT

The CLN has no swap area, and it's using its memory for important work, so it is more stingy about virtual memory (VM) than some tools expect. These tools ask for more memory than they really need.

If apt-get complains that it is out of memory, running it again will often succeed, simply because most of the work has already been done. If you still have problems, you can make the CLN's kernel use a more liberal VM policy while you use the software management tools.

sysctl vm.overcommit_memory=0
sysctl vm.overcommit_ratio=50
apt-get update
apt-get upgrade
sysctl vm.overcommit_memory=2
sysctl vm.overcommit_ratio=20

Sweeping Upgrades

On an important server, it is often best to upgrade software when there is a specific need, but if you would like to upgrade all packages on the CLN, you can.

Upgrading all of the software on the CLN is as simple as issuing the following commands.

makki:~# apt-get update       # repeat if needed
makki:~# apt-get upgrade

Those commands cause APT to update its cache of package information from software repositories in /etc/apt/sources.list and then to upgrade all the CLN-specific packages.

Sometimes when important packages have changed, you need to use dist-upgrade to tell APT to figure out how to upgrade them.

makki:~# apt-get dist-upgrade

Remember that after upgrading the kernel, it's best to reboot, so keep an eye out for kernel upgrades.

The APT HOWTO is a good place to learn more.

Getting Secure SSH Host Keys

This step will be unnecessary in future CLN releases. Once networking is set up, you can upgrade openssl and openssh. You will then have a new set of host keys for the CLN. This step works around the security problem CVE-2008-0166.

makki:~# apt-get update        # repeat if needed
makki:~# apt-get install openssh-{client,server} openssl

Staying Informed

Staying informed of important developments will help you to ensure that your CLN continues to provide uninterrupted file services.

Mailing List

There is a mailing list for CLN-related announcements. You can email support@coraid.com to request to be added to the CLN Announcements mailing list.

Changelog

On the Coraid website you will find a changelog for Coraid Linux.

http://support.coraid.com/support/cln/changelog

