Corrupt file system: Replace hard disk or not?
Eli Billauer
eli at billauer.co.il
Sat Sep 23 12:56:47 IDT 2017
Hello,
Actually, I got the SMART data from Gnome's Disk Utility. It gives me
the number of (reallocated) bad sectors ever found.
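
For anyone who prefers a script over the GUI, here is a minimal sketch of
pulling the same counters out of the attribute table that `smartctl -A`
(from smartmontools) prints. This is an assumption on my part, not what I
actually ran: the sample text below is made up for illustration, and the
function names and the choice of the three watched attributes are mine.

```python
import re

# Illustrative sample in the layout printed by `smartctl -A /dev/sdX`.
# The values are invented for this example, not taken from the real disk.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   119   105   000    Old_age   Always       -       31
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# Attributes that count bad or suspect sectors (my selection, an assumption).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have 10 columns and start with a numeric ID;
        # the header row starts with "ID#" and is skipped here.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
        if raw:
            attrs[fields[1]] = int(raw.group())
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)) or "no bad-sector counters set")
```

An empty result only says the drive has not recorded any such sectors. It
says nothing about data that was written wrongly by something else.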
I do have certain issues with my ATI graphics card, and given the
coincidence, it might have written DMA data where it shouldn't.
But as I mentioned before, I really looked for something wrong before
that crucial reboot, in all possible logs. Nothing. And still nothing
after that.
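
If someone wants to repeat the log check mechanically, a rough sketch
follows. It searches log lines for the strings Boris lists in the quoted
message below. The sample log lines are invented for illustration, the
function name is hypothetical, and matching case-insensitively is my own
assumption.

```python
import re

# Strings taken from Boris' message; matched literally, ignoring case.
PATTERNS = ("link_down", "exception Emask", "failed command", "SError")
_MATCHER = re.compile("|".join(re.escape(p) for p in PATTERNS), re.IGNORECASE)

# Invented sample lines, only to show what a hit looks like.
SAMPLE_LOG = [
    "[    1.234567] EXT4-fs (dm-3): mounted filesystem with ordered data mode",
    "[ 5021.000001] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "[ 5021.000002] ata3.00: failed command: READ DMA EXT",
    "[ 5030.000000] usb 1-1: new high-speed USB device number 4 using ehci-pci",
]


def find_suspect_lines(lines):
    """Return (line_number, line) pairs for lines matching any pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if _MATCHER.search(line)
    ]


if __name__ == "__main__":
    for number, line in find_suspect_lines(SAMPLE_LOG):
        print(f"{number}: {line}")
```

To use it on a real machine, pass the lines of the saved `dmesg` output or
of /var/log/messages instead of SAMPLE_LOG.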
Looks like a strike of bad luck.
Regards,
Eli
On 22/09/17 18:21, Borissh1983 wrote:
> I'm assuming you actually ran a SMART scan to set the counters
> (a few hours per scan). What you describe sounds like something caused
> by an X issue: there have been several different bugs, both in X itself
> and in some DEs, that made the screen "freeze" (the workaround was to
> switch to a different VT and back) while the apps themselves continued
> to run.
>
> Check your dmesg and other logs for any messages containing strings
> such as link_down, exception Emask, failed command or SError. If
> nothing like that exists (and you did run a SMART scan), you should be OK.
>
> If any such message exists, it could be either the drive or the cables.
>
> On 9/22/17, Eli Billauer<eli at billauer.co.il> wrote:
>
>> Hello all,
>>
>> TL;DR: My hard disk's filesystem was corrupt, but the SMART statistics
>> are perfect. Should I replace the hard disk?
>>
>> Full version:
>>
>> It seems like one of my hard disks has passed its own premature Yom
>> Kippur verdict. Rebooting my computer this morning, it failed to mount,
>> saying "Group descriptor 32768 checksum is invalid" and forced me into a
>> shell.
>>
>> I made the mistake (?) of running fsck and then aborting it with a
>> (proper CTRL-ALT-DEL) reboot, as it took ages. This is a 3 TB disk,
>> which isn't necessary for booting, so I removed it from /etc/fstab, and
>> brought up the computer fine.
>>
>> Then I ran fsck on that disk, which generated a log of 125 MB, and
>> basically threw everything into /lost+found, leaving nothing in the root
>> directory. Hurray.
>>
>> It's a Western Digital WDC WD30EZRX-00DC0B0, with one big ext4 over LUKS
>> over LVM, 4 years in service, containing stuff that doesn't deserve a
>> backup. So the damage is limited, but I wonder if I should replace the
>> disk.
>>
>> Despite its age, this disk's SMART status is perfect: No bad sectors, no
>> reallocated sectors, nothing. No parameter can be better. I know there's
>> a "don't trust SMART" word around, but had a sector failed, I would
>> expect that to appear in the statistics. I mean, I do understand that
>> SMART can't predict a failure, but doesn't it mean anything?
>>
>> And there's another thing: The reason I rebooted the computer was that I
>> found the screen frozen, but the mouse pointer moved. The time stood
>> still at 3:01 (AM). This is highly unusual on my computer, which usually
>> runs for months with zero issues.
>>
>> So I connected with ssh, and saw nothing suspicious: Not in
>> /var/log/messages, not in dmesg, not in .xsession-errors. No process was
>> busy in particular. From the remote terminal, I couldn't have guessed
>> something was wrong. So I issued a reboot from remote, which failed as I
>> mentioned above.
>>
>> Bottom line: The panic instinct is to replace the disk, even though the
>> whole computer is due for replacement within a year or so. Money left
>> aside, it's a bit of an effort, and involves a lot of scary commands as
>> root, which are a risk factor by themselves. I'm not implying that I'm
>> stupid enough to mke2fs the wrong disk. Not me. I never err. ;)
>>
>> Insights are welcome.
>>
>> Shana Tova,
>> Eli
>>
>> --
>> Web: http://www.billauer.co.il
>>
>
--
Web: http://www.billauer.co.il