Corrupt file system: Replace hard disk or not?
Eli Billauer
eli at billauer.co.il
Fri Sep 22 12:11:00 IDT 2017
Hello all,
TL;DR: My hard disk's filesystem was corrupt, but the disk's SMART statistics
are perfect. Should I replace the hard disk?
Full version:
It seems that one of my hard disks has received a premature Yom Kippur
verdict. When I rebooted my computer this morning, the disk failed to
mount with "Group descriptor 32768 checksum is invalid", and the boot
dropped me into a shell.
I made the mistake (?) of running fsck and then aborting it with a
(proper, CTRL-ALT-DEL) reboot, because it took ages. This is a 3 TB disk
that isn't needed for booting, so I removed it from /etc/fstab and
brought the computer up without it.
Then I ran fsck on that disk, which generated a log of 125 MB, and
basically threw everything into /lost+found, leaving nothing in the root
directory. Hurray.
It's a Western Digital WDC WD30EZRX-00DC0B0, with one big ext4 over LUKS
over LVM, 4 years in service, containing stuff that doesn't deserve a
backup. So the damage is limited, but I wonder if I should replace the disk.
Despite its age, this disk's SMART status is perfect: no bad sectors, no
reallocated sectors, nothing. No parameter could be better. I know the
common wisdom is "don't trust SMART", but had a sector failed, I would
expect it to show up in the statistics. I do understand that SMART can't
predict a failure, but doesn't it mean anything?
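For reference, "no bad sectors, no reallocated sectors" usually comes down to the raw values of a few attributes in the table printed by smartctl -A. The sketch below is a minimal, hypothetical helper: it assumes the standard ATA attribute table from smartmontools (10 whitespace-separated columns, RAW_VALUE last), and the choice of attributes 5, 197 and 198 is an assumption about what is worth watching, not a complete health check.

```python
from typing import Dict

# Hypothetical watch list: attribute IDs commonly associated with media errors.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text: str) -> Dict[int, int]:
    """Return {attribute_id: raw_value} from a `smartctl -A` attribute table."""
    attrs: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header ("ID#") is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            # fields[9] is the first token of RAW_VALUE; values with extra text
            # such as "35 (Min/Max 20/45)" still start with a plain integer.
            raw = fields[9]
            if raw.isdigit():
                attrs[int(fields[0])] = int(raw)
    return attrs


def worrying_attributes(text: str) -> Dict[str, int]:
    """Return the watched attributes whose raw value is nonzero."""
    attrs = parse_smart_attributes(text)
    return {name: attrs[i] for i, name in WATCHED.items() if attrs.get(i, 0) > 0}


# Made-up sample in the smartctl -A layout, with all watched counters at zero.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(worrying_attributes(SAMPLE))  # prints {}
```

To use it on a real disk, feed it the output of smartctl -A run as root against the device in question (the device name is whatever the disk is on your system). An empty result matches what is described above; it says nothing about failures SMART doesn't track.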
And there's another thing: the reason I rebooted the computer was that I
found the screen frozen, although the mouse pointer still moved. The clock
was stuck at 3:01 (AM). This is highly unusual for my computer, which
usually runs for months with zero issues.
So I connected over ssh and saw nothing suspicious: not in
/var/log/messages, not in dmesg, not in .xsession-errors. No process was
particularly busy. From the remote terminal, I couldn't have guessed that
anything was wrong. So I issued a reboot remotely, which then failed as
described above.
Bottom line: the panic instinct is to replace the disk, even though the
whole computer is due for replacement within a year or so. Money aside,
it's a bit of an effort, and involves a lot of scary commands as root,
which are a risk factor in themselves. I'm not implying that I'm stupid
enough to mke2fs the wrong disk. Not me. I never err. ;)
Insights are welcome.
Shana Tova,
Eli
--
Web: http://www.billauer.co.il