date: 2022-02-06 22:43:28
categories: linux
firstPublishDate: 2021-11-07 16:47:13
I need to know the health of my disks because I use my computers until they fail. In general, the power supply fails first, then the hard disks, then the RAM and CPU.
When a disk fails, I restore the data from backup. With the ZFS filesystem, I check the integrity of my data and my backups.
The disk health information is provided by SMART and is displayed with `smartctl`. All commands have to be run as root.
To install `smartctl`, run:
apt-get install smartmontools
Then choose the disk you want to check:
lsblk #or ls /dev/disk/by-id/
Then run:
smartctl -a /dev/sdX
Newer disks provide more information like Form_Factor, Head_Flying_Hours, ...
Two very important parameters to check, among others, are *Reallocated_Sector_Ct* and *Current_Pending_Sector*. Reallocated_Sector_Ct is the count of sectors on the block device which cannot be used correctly. When such a sector is found, it is remapped to one of the available spare sectors of the storage device, and the data contained in it is relocated. Current_Pending_Sector, by contrast, is the count of bad sectors that are still waiting to be remapped. If you want to know more about the S.M.A.R.T attributes and their meaning, you can begin to take a look at the
.
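To watch only these two counters, the attribute table can be filtered. This is a minimal sketch, assuming the usual ATA table layout printed by `smartctl -A`, where the attribute name is the second column and the raw value starts at the tenth. The function name `smart_raw_value` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the raw value of one SMART attribute.
# Reads `smartctl -A` output on stdin; $1 is the attribute name.
# Assumption: ATA attribute table, name in column 2, raw value in column 10.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Hypothetical usage (as root):
#   smartctl -A /dev/sdX | smart_raw_value Reallocated_Sector_Ct
#   smartctl -A /dev/sdX | smart_raw_value Current_Pending_Sector
```

A value other than 0 for either attribute is a reason to look at the disk more closely. NVMe disks report a different set of fields, so this filter would not match anything there.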
`smartctl` can also be used to start the self-tests:
smartctl -t short /dev/sdX
When the test is finished, the result is shown with the command:
smartctl -a /dev/sdX
For more information about the self-tests, read `man smartctl`.
On my Toshiba NVMe SSD, `smartctl` doesn't give a lot of information and it is not possible to run self-tests.
I have ZFS on my disks and to check the health of the file system, I run:
zpool list
zpool scrub myPool
The scrub command returns within a few seconds, even for multiple TB of data, because `zpool scrub` only starts a background process that checks the pool. The status is displayed with the command:
zpool status
I want to run scrub regularly and get an email when my pools are unhealthy as described in this serverfault post:
The configuration for ZED (the ZFS Event Daemon) is located in /etc/zfs/zed.d/zed.rc
I set my email address and my email program (mutt):
ZED_EMAIL_ADDR="myemail@example.com"
ZED_EMAIL_PROG="mutt"
zed sends an email only when the pool is degraded, like this:
ZFS has finished a scrub:

   eid: 23
 class: scrub_finish
  host: nuc
  time: 2022-02-06 18:08:12+0200
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:05:24 with 2 errors on Sun Feb 6 18:08:12 2022
config:

        NAME                                             STATE     READ WRITE CKSUM
        rpool                                            DEGRADED     0     0     0
          ata-WDC_WDS100T1B0A-00H9H0_164710800985-part4  DEGRADED     0     0     2  too many errors

errors: 2 data errors, use '-v' for a list
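To pick the failing device out of a report like this one, the error counters in the config table can be filtered. This is a minimal sketch, assuming the multi-line `zpool status` layout with the NAME, STATE, READ, WRITE and CKSUM columns. The function name `devices_with_errors` is made up for illustration.

```shell
#!/bin/sh
# Sketch: list the pools/devices whose READ, WRITE or CKSUM counter is not 0.
# Reads `zpool status` output (or a ZED report) on stdin.
# Assumption: rows look like "NAME STATE READ WRITE CKSUM [message]".
devices_with_errors() {
    awk '$2 ~ /^[A-Z]+$/ && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ &&
         ($3 != "0" || $4 != "0" || $5 != "0") { print $1 }'
}

# Hypothetical usage (as root):
#   zpool status rpool | devices_with_errors
```

Applied to the report above, this would print only the `ata-WDC_WDS100T1B0A-00H9H0_164710800985-part4` line, because the `rpool` row itself has all three counters at 0.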
And I set up a cron job to scrub my pools regularly:
crontab -e
0 1 * * 4 /root/bin/scrub.sh

# scrub.sh:
zpool scrub rpool
zpool scrub bpool
These jobs are set up in the root crontab.
The first cron job scrubs the pools and the second job checks the string returned by `zpool status -x`. It should be:
pool poolName is healthy.
When this string is not found, a mail is sent.
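A check of this kind could be sketched as follows. It is only an illustration under a few assumptions: mutt is already configured to send mail, the names `status_is_healthy`, `check_pools` and `MAIL_TO` are made up, and the pattern accepts both the per-pool message and the "all pools are healthy" message that `zpool status -x` prints when no pool name is given.

```shell
#!/bin/sh
# Hypothetical recipient, taken from the example address in zed.rc.
MAIL_TO="${MAIL_TO:-myemail@example.com}"

# Return 0 when the `zpool status -x` text on stdin reports healthy pools.
status_is_healthy() {
    grep -Eq '^(all pools are healthy|pool .* is healthy\.?)$'
}

# Mail the full `zpool status` output when the healthy string is not found.
# Not called here; a cron script would end with a call to check_pools.
check_pools() {
    status=$(zpool status -x)
    if ! printf '%s\n' "$status" | status_is_healthy; then
        zpool status | mutt -s "ZFS pool problem on $(hostname)" "$MAIL_TO"
    fi
}
```

Keeping the string test in its own function makes it possible to try it on saved output without touching a real pool.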
hashtags: #zfs