Adding Storage
Keeping an Eye on Hard Drives
My server is currently set up with two ZFS pools; a RAIDZ1 pool composed of 4 2TB SATA-3 drives that I use for media storage and docker container storage, and another RAIDZ1 pool composed of 5 1TB SATA-2 drives that are extremely old. I wanted to make sure I was keeping a sharp eye on them, so I set up some monitoring to email me when any errors are found.
Installing smartmontools
The simplest way to keep an eye on your drives is by monitoring their SMART status. SMART is a hardware-level technology that monitors the current status of your hard drives and can, to some degree, predict imminent failures. The vast majority of hard drives out there today support SMART, so all we need to do is check the SMART status of each drive. Even better, let's set it up to run automatically and email us if there are any problems!
The smartmontools package for Ubuntu is a great way to manage SMART drive monitoring and alerts. This package includes the smartctl command line executable for ad-hoc monitoring as well as the smartd service that can monitor and alert automatically. Install it with sudo apt install smartmontools
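Once it's installed, a quick sanity check is to have smartctl enumerate the drives it can see:
sudo apt install smartmontools
# List the drives smartctl can see and the device type it will use for each
sudo smartctl --scan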
Inspecting Drive Health with smartctl
Let’s see what we can monitor with smartctl. I start by running it on one of my drives like so: sudo smartctl -a /dev/sdc. Here’s the first part of the results:
> sudo smartctl -a /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-76-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Hitachi/HGST Ultrastar 7K4000
Device Model: HGST HUS724030ALA640
Serial Number: PN1234P9GNWR2X
LU WWN Device Id: 5 000cca 248c97f73
Firmware Version: MF8OAC50
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Feb 11 12:56:50 2020 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
This shows information about the drive, as pulled from the SMART capabilities of the drive itself. If you have a lot of drives, this is extremely useful in making sure you’re looking at the right one!
Next up is the General section:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 24) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 432) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
And that first line is good news: this drive is healthy! The rest of this section details what SMART capabilities this drive has. Next up are specific SMART attribute values:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 054 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 100 100 024 Pre-fail Always - 0
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 3
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 020 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 4
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 3
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 3
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Min/Max 23/37)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Lots of good data here, but the last five (temperature, reallocation events, pending sectors, offline-uncorrectable sectors, and interface CRC errors) are the particularly important ones. The rest of the information output from smartctl shows the error and self-test logs (empty for this drive, as it’s brand new).
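If you want a quick look at those key attributes across every drive without scrolling through full reports, a small shell loop does the trick. This is only a convenience sketch, and the /dev/sd? glob may need adjusting for your system:
# Print the key SMART attributes for each SATA drive
for dev in /dev/sd?; do
  echo "=== $dev ==="
  sudo smartctl -A "$dev" | grep -E 'Reallocated_Sector_Ct|Temperature_Celsius|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
done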
Monitoring Drive Health with smartd
While I could run this from the command line for each of my 12 drives, that would be really annoying. So instead, let’s set up the SMART daemon, smartd. We do this by first configuring our drives in /etc/smartd.conf. This file is almost entirely filled with examples of how to configure different drives and controllers, and it’s absolutely worth reading through it as well as the smartd man pages. What I want is to get emails whenever something bad happens to my drives, such as excessive temperature, bad sectors, or various other errors that result in a failed self-test. Further, a single bad sector sometimes just happens; I only want to know when the bad sector counts are increasing.
The first step is to comment out the DEVICESCAN directive; best practice is to configure each of your drives individually. And since I have hot-swap bays, I don’t want to use the /dev/sda nomenclature, but rather the /dev/disk/by-id/ drive names. This ensures that any special rules for a particular drive follow that drive even if it gets moved to a different bay.
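To see which by-id names are available on your system (and which /dev/sdX device each currently maps to), list the udev symlinks and filter out the per-partition entries:
# Show stable by-id names and the device each currently points to
ls -l /dev/disk/by-id/ | grep -v part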
Next up is to declare a DEFAULT line that will apply to all my drives. Mine is set up like this:
# Default settings for all drives:
# -f Check for failure of any Usage Attributes. Use -i ID to ignore specific ID (1-255)
# -C + Check if current number of pending sectors has increased
# -U + Check if current number of uncorrectable sectors has increased
# -H Check the SMART overall health status of the drive
# -l error Check if num errors in Summary error log has increased since last check
# -W 5,45,55 Check if temp has increased by at least 5 deg, log if over 45, warn if over 55
# -s (O/../../[1-6]/01|S/../../7/01) Run Offline (Mon-Sat) or Short (Sun) tests at 1am
# -m admin@example.com Address to send alert mail to
# -M daily Re-send warning emails daily while a problem persists
DEFAULT -f -C + -U + -H -l error -W 5,45,55 -s (O/../../[1-6]/01|S/../../7/01) -m admin@example.com -M daily
# OS Drives (SSD)
/dev/disk/by-id/ata-OCZ-AGILITY3_OCZ-RM2UOIAMWW73DI4X
...
I’ve cut out the list of all 12 drives for clarity here. By default, the individual drives don’t have any additional settings. However, old drives can have high, but stable, values for many of the usage attributes, which can lead to false positives. If that starts happening, you can turn off warnings for those attributes by adding the -i ID directive to that drive’s line. For instance, to turn off checks on Spin_Up_Time for just the OCZ drive:
# OS Drives (SSD)
/dev/disk/by-id/ata-OCZ-AGILITY3_OCZ-RM2UOIAMWW73DI4X -i 3
Once you have your drives configured, let’s make sure smartd is up and running.
> sudo systemctl start smartd
> sudo systemctl enable smartd
> sudo systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-02-11 09:17:01 EST; 4h 14min ago
Docs: man:smartd(8)
man:smartd.conf(5)
Main PID: 3603 (smartd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/smartd.service
└─3603 /usr/sbin/smartd -n
Feb 11 09:47:15 sol smartd[3603]: Device: /dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P8JES3YR [SAT], Temperature changed +6 Celsius to 35 Celsius
Feb 11 09:47:16 sol smartd[3603]: Device: /dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P9GMUEVW [SAT], Temperature changed +5 Celsius to 34 Celsius
If smartd doesn’t show that it’s loaded and active, take a look at the logs to determine why with journalctl -xe -u smartd. It’s most likely a syntax error in smartd.conf. For instance, make sure there’s a space between -C and + (that got me the first time).
NOTE: smartd uses mail to send notifications by email, so make sure you have that properly set up and configured first. I have mail configured to send all mail from a single address defined in my Namecheap mail hosting. The smartd.conf mail configuration above identifies who to send the mail to. Getting email notifications from system tools is an essential part of properly managing your server, so make sure this is set up and working. See this DigitalOcean article for a great HOWTO on setting up postfix, mail, and system aliases. I will create a post on this in the future as well.
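A quick way to confirm outbound mail works before relying on smartd's alerts (this assumes the mailutils mail command and your MTA are already installed and configured, and you should swap in your own address):
# Send a one-line test message to yourself
echo "Test message from $(hostname)" | mail -s "mail delivery test" admin@example.com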
You can test the smartd configuration with a test config file. Let’s create a file beside smartd.conf:
sudo nano /etc/smartd.conf.test
And we’re going to add a simple test run on one drive (make sure to replace the email below with yours):
/dev/sda -m user@example.com -M test
Now we can run this test with sudo smartd -c /etc/smartd.conf.test. If everything’s working, we should receive a test email shortly.
SMART Errors!
Once I had this installed, I was quite surprised to immediately get notification of errors on 4 of my drives!
This message was generated by the smartd daemon running on:
host name: sol
DNS domain: flora.family
The following warning/error was logged by the smartd daemon:
Device: /dev/sdg [SAT], 3 Offline uncorrectable sectors
Device info:
SAMSUNG HD103SJ, S/N:S246J90B114526, WWN:5-0024e9-20438de02, FW:1AJ10001, 1.00 TB
Admittedly, they are all very old 1TB drives that I’ve had running hard for at least 5 years now. They were all part of my 5-disk backup zpool, which I’ve been meaning to overhaul anyway. Thanks to this proactive monitoring, I could see that they were having issues that slowly got worse over the 3 weeks I watched them with these tools. Instead of waiting for them to die and scrambling to replace them, I can plan their replacement.
What’s the Plan?
I first looked at maxing out my available hot-swap bays with 2TB drives. I have a total of 12 bays; 2 are for the SSD drives I use for the OS and there are 4 2TB drives that I currently use for my library zpool. smartd reports that these are in great shape, so we’ll keep them. So that leaves the 6 bays currently occupied by the old 1TB drives. After some quick research, I found that refurbished 3TB drives were available at almost the same price as 2TB drives - around $40 each. 4TB drives started at $65 and quickly went up from there. I really try to keep costs as low as possible, and I always try to remember that RAID stands for Redundant Array of Inexpensive Disks! And of course if this were an enterprise environment, I’d never be able to justify refurbished over new, but for a home lab setup it seems like a good deal. The 1TB refurbished Samsungs lasted me at least 5 years, and the 2TB refurbished Hitachis are still going strong after 2 years; hopefully these refurbished HGST drives will last at least as long!
So the new plan is to replace the 5 1TB drives with 6 3TB drives. I’ll move my existing library zpool from the 4 2TB drives to the 6 3TB drives and move the backup zpool from the 5 1TB drives to the 4 2TB drives. We’ll use mirrored sets for both: library gets three mirrored pairs of 3TB drives (9TB usable) and backup gets two mirrored pairs of 2TB drives (4TB usable). Having mirrored sets is a lot faster than RAIDZ2, is much easier for recovery from drive failure, and is much easier to expand. Luckily, we don’t need all of our storage space on library backed up, so the smaller size on backup is perfectly acceptable for my uses. The long-term plan will be to max out all of the drives in this case with 3TB drives (which is simple for mirrored sets and basically impossible for RAIDZ), and create a second, more powerful server with even larger drives. That way we can use the current server just for backups, and have the new server be the primary data store and container server.
But back to the project at hand! We’ll first need to create the new zpool on the 6 3TB drives, copy the datasets from the 4 2TB pool to the new pool, and then delete the pool on the 4 2TB drives. Of course, the deleting will happen only after thorough testing. To make everything as seamless as possible, we’re also going to give the new pool the same name as the old one (meaning we’ll have to give it a different name to start and then rename it after everything’s done).
We could also do the same for moving the backup zpool from one set of drives to the other, but instead I want to implement a new & improved backup strategy. So (after the library move is tested!) we’ll be destroying the backup zpool, creating a new one on the 4 2TB drives, and implementing a new backup strategy.
Clean All The Things!
The first step is to bring everything off-line. Well, there are certainly ways to do this without bringing everything off-line for an extended period (such as using the zfs send and recv commands on snapshots). But in my case, I don’t have anything critical running, so bringing everything off-line will allow me time to do some server spring cleaning and a couple quick upgrades.
You see, while I was installing my new 3TB drives I noticed they were only coming up as SATA-2. A quick look at my motherboard manual confirmed that six of its SATA ports are only SATA-2; there are a total of 5 SATA-3 ports, but I’m using two of those for the OS drives. And the existing library zpool is connected via a 4-port SATA-3 PCIe card. So I may as well go ahead and get a second SATA-3 PCIe card with 6 ports. There are plenty of expansion slots available, and the card I chose was only $36 from Amazon. With this installed, I now have my 2 SSDs (used for the OS) connected to the SATA-3 ports on the motherboard, the existing set of 4 2TB drives on the 4-port SATA-3 expansion card, and the new set of 6 3TB drives on the new 6-port SATA-3 expansion card. This leaves 3 SATA-3 and 4 SATA-2 ports open on the motherboard if I need them down the road; a little unused capacity is good news for uptime! The 6-port SATA expansion card is PCIe x4, but don’t worry if you don’t have any x4 slots; you can put an x4 (or an x1) card in an x16 slot, and I have plenty of those.
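If you want to check what link speed a drive has actually negotiated, the smartctl information section from earlier shows it (the "current" value), or you can grep the kernel log; /dev/sdc here is just an example device:
# The 'current' value shows what the drive negotiated with its port
sudo smartctl -i /dev/sdc | grep 'SATA Version'
# Or check the kernel log for every SATA link (6.0 Gbps = SATA-3, 3.0 Gbps = SATA-2)
sudo dmesg | grep -i 'SATA link up'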

And whenever you have your server off and open, it’s always worthwhile to double-check that all the cables are snugged & plugged, that everything is organized so as not to impede airflow, and that no cables are blocking a fan. The last thing before buttoning it back up is to break out the air can and remove as much dust as possible.
Creating the New zpool
First up is to bring down everything using the ZFS drives. The backup pool has two clients; it hosts a TimeMachine backup point for the Macs in the house, and I have znapzend pointed to it for ZFS snapshot backups. So I removed the TimeMachine network location from each Mac (they all also have local backups and external drive backups). I then removed the znapzend remote configuration from the library zpool. We don’t want to get rid of anything yet; let’s keep these backups safe and sound until we’ve verified that the copy of the library zpool was successful! But we can go ahead and carefully remove the drives and keep them in a safe place. Once they’re all removed, we can replace them with the 6 3TB drives that we’re going to copy library over to.
Next is to bring down everything using the library zpool. With the exception of a few local network media clients, everything using this pool is hosted in docker containers stored on the pool. So I used docker-compose to bring all of these down.
At the end of this process I’ll need to rename the library zpool. Renaming a pool is done by exporting it and importing it back under a new name. The export, sudo zpool export library, unmounts all datasets in the pool and disconnects the pool from the ZFS service; if it fails, that typically means some service is still actively using the zpool, or that your shell’s working directory is inside one of its mountpoints.
Once the pool is exported, you simply import it back in with a different name: sudo zpool import library library-old - easy-peasy! We’ll hold off on actually doing this until the data has been copied and verified.
For now, let’s create the new pool of mirrored pairs on the 6 3TB drive set. Again, best practice here is not to refer to the simple /dev/sdc drive nomenclature, but to use the actual drive IDs. It makes the command a lot harder to parse, but makes managing these disks much easier in the long run.
sudo zpool create library-new \
mirror \
/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN1234P9GNWR2X \
/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P8JES3YR \
mirror \
/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P9GMUEVW \
/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P8JES4LR \
mirror \
/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P8JES38R \
/dev/disk/by-id/ata-HGST_HUS724030ALA640_PN2234P9GXKBTY
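Before putting any data on it, it’s worth a quick check that the pool came up with the layout we intended:
# Should show three mirror vdevs, each with two healthy disks
sudo zpool status library-new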
With ZFS you can also add SLOG and L2ARC devices that might improve performance, or at least lengthen the life of your data drives. They’re really easy to set up, but it’s difficult to tell whether they’re actually improving performance (especially the L2ARC). In fact, many discussions online recommend against an L2ARC for personal use. On the other hand, small SSDs these days are just dirt cheap, and I thought it would be fun to try it out. I’ve maxed out my hot-swap bays already, and I don’t want to partition the OS SSDs to put the log or cache there. So in addition to the new SATA card I was already getting, I got a pair of 120GB SSDs for $19 each. I also found a tray for mounting two 2.5” SSDs in a 3.5” bay; while I don’t have a 3.5” bay, I was easily able to attach the tray to the support bar running through the middle of the case, which conveniently had pre-drilled holes.

What I’m going to do is make two partitions on each drive. The SLOG does not require much space at all, so I’m going to create a 4GB partition on each disk, and add those as a mirrored log. For the L2ARC, I’ll use the remainder of each drive and add those as a plain (not mirrored or RAID’ed) cache.
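Here’s one way to lay out those partitions with parted; this is just a sketch (the device name comes from the zpool add command below, and the partition labels slog/l2arc are arbitrary names I chose), so adjust sizes to taste. Once partitioned, udev exposes the partitions with -part1/-part2 suffixes under /dev/disk/by-id/, which is what we reference next.
# Sketch: 4GB SLOG partition, remainder for L2ARC. Repeat for the second SSD.
sudo parted --script /dev/disk/by-id/ata-SATA_SSD_19111812002747 \
mklabel gpt \
mkpart slog 1MiB 4GiB \
mkpart l2arc 4GiB 100%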
sudo zpool add library-new log \
mirror \
/dev/disk/by-id/ata-SATA_SSD_19111812002747-part1 \
/dev/disk/by-id/ata-SATA_SSD_19111812002750-part1
sudo zpool add library-new cache \
/dev/disk/by-id/ata-SATA_SSD_19111812002747-part2 \
/dev/disk/by-id/ata-SATA_SSD_19111812002750-part2
You can check the utilization of the log and cache by running zpool iostat -v. The cache will grow slowly over the course of days or weeks (depending on utilization) as it caches more and more data. There’s no harm in it filling up; that’s when it’s most effective. But if you get close to the max on your SLOG, you’ll want to give it more room by removing the log and cache devices from the zpool and resizing their partitions. Neither device is required for data integrity, so once they’re removed from the pool you can destroy the partitions and recreate them at more appropriate sizes.
With the zpool created, let’s ensure that it’s set up how we want. The most important thing I want to do is enable compression on the entire pool. ZFS is pretty smart about compression: it doesn’t waste much effort on data that’s already compressed, and the lz4 algorithm has extremely little overhead or performance hit. We can enable it with sudo zfs set compression=lz4 library-new. This setting cascades to all child datasets, but you can override it for specific datasets if a different compression algorithm (or no compression) would work better; for instance, there are multiple gzip levels at varying compression/performance ratios. Whatever compression scheme you choose, make sure to set it now, before the data transfer! The setting only applies to files written after it’s set; any files already on the dataset will be unaffected.
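For example, after setting it you can confirm the property took and will be inherited by any child datasets:
sudo zfs set compression=lz4 library-new
# -r shows the property for the pool's root dataset and everything beneath it
zfs get -r compression library-new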
Moving the zpools
We’re going to use the built-in ZFS commands for this process, which will have to be repeated for each dataset. The first step is to take a snapshot of the filesystem:
sudo zfs snapshot -r library/christopher@dataxfer
Here the -r option indicates that we want to recursively snapshot all nested datasets. I.e., if I had a library/christopher/temp dataset, the -r option would include it in the snapshot. The @ separates the dataset name from the snapshot name, and dataxfer is the name of the snapshot. We’ll need to do this for each parent dataset.
Once we have these snapshots, we can send them to the new zpool. We do this with the zfs send and zfs recv commands. We use zfs send to create the datastream from the snapshots we just made on the old zpool, and use zfs recv to capture that datastream and save the data in the new zpool. Depending on the size of the dataset, this can take some time!
WARNING: If you’re doing these commands through an ssh shell, be warned that your session will likely expire somewhere in the middle of the transfer, terminating the send/receive and leaving your zpool in a bad place! Best practice here is to use a terminal multiplexer app like tmux. You would ssh to your server, open tmux, and then execute the zfs send/receive command. You can then detach tmux with ctrl-b d and then can safely exit your ssh shell. Use tmux attach to reopen that tmux session to see if everything’s complete.
sudo zfs send library/christopher@dataxfer | sudo zfs recv library-new/christopher
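One thing to note: a plain zfs send only streams the named dataset, not its children. If you do have nested datasets and want to move a whole subtree in one go, the -R (replication) flag sends the dataset and all of its descendants; a sketch:
# Replicate the dataset and all of its descendants (and their snapshots) in one stream
sudo zfs send -R library/christopher@dataxfer | sudo zfs recv library-new/christopher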
You can keep an eye on the progress of your transfers, as well as the compression ratios, with the following command:
zfs list -t filesystem -o name,used,refer,compressratio,compression
This will show all of the datasets in both library and library-new, so you can watch the dataset currently in progress.
NOTE: The beauty of copying datasets with ZFS snapshots and send & recv is that the whole process can be done on a live system with minimal downtime! zfs snapshot captures your data at a particular point in time, and send streams that snapshot. So you can start with a snapshot on your live system and transfer it, which will take quite a while. Once that first pass is done, you can take the dataset offline, take a second snapshot, and do an incremental send of just the changes between the two snapshots; that pass will be much quicker.
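As a sketch of that second pass (the snapshot name dataxfer2 is just an example):
# Take a second snapshot once the dataset is offline...
sudo zfs snapshot -r library/christopher@dataxfer2
# ...then send only the changes between the two snapshots; -F on the receive rolls back
# any stray changes on the target so the incremental stream can apply cleanly
sudo zfs send -i library/christopher@dataxfer library/christopher@dataxfer2 | sudo zfs recv -F library-new/christopher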
Once it’s all done, check over the resulting files to be sure everything is there as you expect. Any important files need to be visually inspected to be safe, but we can use the diff command to do a quick check of dataset against dataset:
sudo diff --brief --recursive /library/christopher /library-new/christopher
Once you’re sure everything is where it’s supposed to be, double-check to ensure file permissions are set correctly.
Finally, let’s do the last step: renaming the pools. We’re going to rename library to library-old, and rename library-new to library.
sudo zpool export library
sudo zpool import library library-old
sudo zpool export library-new
sudo zpool import library-new library
NOTE: Between the second and third commands, check your directory structure to ensure all of the old library directories have moved under /library-old. The first time I tried this, I was unable to import library-new as library because the mount points for the library-old datasets hadn’t changed. This was easily corrected by executing sudo zfs set mountpoint=/library-old/christopher library-old/christopher for each dataset that hadn’t moved. Once all directories are in the correct places, you can import library-new as library.
You should now be able to spin up your docker containers or VM’s on the newly imported library and everything should just work. If you’re having issues at this point, feel free to reach out in the comments.
When you’ve confirmed that everything is up and running as it should be, it’s time to bid a tearful adieu to library-old:
sudo zpool destroy library-old
ZFS Maintenance
Now that we have everything set up exactly how we want it, we also want to ensure that we’re keeping our zpools in good shape. The only routine maintenance zpools require is a recurring zpool scrub, which can be scheduled with a systemd timer. To set this up, we’ll need to create two files:
sudo nano /etc/systemd/system/zfs-scrub@.timer
[Unit]
Description=Monthly zpool scrub on %i
[Timer]
OnCalendar=*-*-1,15 2:00:00
AccuracySec=1h
Persistent=true
[Install]
WantedBy=multi-user.target
sudo nano /etc/systemd/system/zfs-scrub@.service
[Unit]
Description=zpool scrub on %i
[Service]
Nice=19
IOSchedulingClass=idle
KillSignal=SIGINT
ExecStart=/sbin/zpool scrub %i
The timer is set to run the scrub on the 1st and 15th of each month at 2am. We’ve set the timer & service up as templates so that we can reuse them for each zpool we create. For library, we want to enable and start the timer unit (note the .timer suffix; we want systemd to schedule the scrubs, not run the service right away):
sudo systemctl daemon-reload
sudo systemctl enable zfs-scrub@library.timer
sudo systemctl start zfs-scrub@library.timer
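You can confirm the timer registered and see when it will fire next:
# Lists next/last trigger times for any zfs-scrub timers
systemctl list-timers 'zfs-scrub@*'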
A zpool scrub runs at low priority, so it won’t interfere much with normal disk access, which is good because scrubs can take hours! To check on the status of your scrubs, run zpool status.
> zpool status
pool: library-old
state: ONLINE
scan: scrub repaired 0B in 3h8m with 0 errors on Sun Feb 9 03:32:36 2020
Ubuntu’s ZFS package comes out-of-the-box with email notifications for scrub-finish events, so if you’ve successfully set up mail for the SMART monitoring at the beginning of this article, you’ll also get emails with the results of your pool scrubs. You can try it out by starting a scrub immediately with sudo zpool scrub library, although be warned that it will take several hours to complete. The script that sends the email is located at /etc/zfs/zed.d/scrub_finish-notify.sh, and you can modify it if you like.
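If the scrub emails don’t arrive, the ZFS event daemon (zed) is the thing to check; to the best of my knowledge its notification settings live in /etc/zfs/zed.d/zed.rc, something like this:
# /etc/zfs/zed.d/zed.rc (excerpt): where zed sends notification emails
ZED_EMAIL_ADDR="admin@example.com"
# Set to 1 to also get mail when an event (like a scrub) finishes with zero errors
ZED_NOTIFY_VERBOSE=1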
What’s Next?
We have to implement a new backup strategy, and we have to do it as soon as possible. We’ll discuss that in the next post. We also want to keep an eye on the SMART monitoring we’ve set up, and on the zpool scrub schedule. And it’s always fun to keep an eye on the results of zpool status and zpool iostat -v to make sure everything’s running smoothly.