dd(8)-written disk has ~800MB of NULs
Daniel Shahaf
d.s at daniel.shahaf.name
Sat Aug 27 21:19:10 IDT 2022
tl;dr: I dd(8)'d a partition to a HD in an external enclosure, then
cmp(1)'d to verify the copy, and found 800MB of NULs in the target of
the copy; I'm trying to figure out what went wrong and whether I can
trust the enclosure and HD the data was written to.
---
What happened:
I booted a host from a live USB [Debian buster, kernel 4.19.67-2+deb10u1]
and used dd(8) to copy one of the host's partitions to a 3.5" SATA disk
in an external USB3.0 enclosure [vid 2109 pid 0711, quirks US_FL_NO_ATA_1X].
The enclosure is connected via two USB cables (the manufacturer's USB 3
B–A cable and a 3m USB A extension cable) and has its own 220V power
supply.
The partition in question is ~700GB. It wasn't mounted at the time.
The argument to dd(8)'s of= option was a partition on the target disk,
not a regular file.
dd(8) processed data at 26MB/s. (IIRC, I didn't specify any bs= argument.)
dd(8)'s exit code was zero.
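For illustration, here is a minimal Python sketch of a copy that flushes
and fsyncs the target before reporting success, so that an I/O error
reported by the kernel at flush time surfaces as an exception. This is
not what was run above (that was plain dd(8)), and it is only an
assumption that flushing is relevant to this failure. The function name
copy_and_sync and the file names are hypothetical.

```python
import os
import tempfile


def copy_and_sync(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path over the start of dst_path, then fsync the target.

    dst_path must already exist (as a partition or image file would).
    Returns the number of bytes copied; I/O errors raise OSError.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()             # push Python's buffer to the kernel
        os.fsync(dst.fileno())  # ask the kernel to push it to the device
    return copied


# Small demonstration on ordinary files standing in for partitions.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "source.img")
    dst_file = os.path.join(tmp, "target.img")
    with open(src_file, "wb") as f:
        f.write(b"\x01" * 10000)
    with open(dst_file, "wb") as f:
        f.write(b"\x00" * 20000)  # target is larger than the source
    print(copy_and_sync(src_file, dst_file, chunk_size=4096))
```

A successful return still only means the kernel reported no error; it
does not replace reading the data back and comparing it.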
I turned off the host and the next day, in the same live environment,
cmp(1)'d the source and target partitions. cmp(1) found a difference
about 20% of the way in. A closer look revealed that 192774 4096-byte
blocks (789602304 bytes, about 790MB) in the middle of the target
partition contained only NULs. Other than those NULs, the target
partition was identical
to the source partition. I have now re-written those 800MB, which
succeeded. Reading them back succeeded too and they compare equal
to the source partition.
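A minimal Python sketch of that kind of "closer look": compare two
streams block by block and count the blocks that differ, noting which
of them are all NULs on the target side. These are not the commands
actually used; the names find_mismatched_blocks and summarize are
hypothetical, and the demonstration runs on in-memory data rather than
on real partitions.

```python
import io

BLOCK_SIZE = 4096  # block size used in the description above


def find_mismatched_blocks(src, dst, block_size=BLOCK_SIZE):
    """Yield (block_index, dst_is_all_nul) for each block where dst != src.

    src and dst are binary file objects. The comparison stops at the end
    of the shorter stream.
    """
    index = 0
    while True:
        a = src.read(block_size)
        b = dst.read(block_size)
        if not a or not b:
            break
        n = min(len(a), len(b))
        if a[:n] != b[:n]:
            yield index, b[:n].count(0) == n
        index += 1


def summarize(src, dst, block_size=BLOCK_SIZE):
    """Count mismatched blocks and how many of them are all NULs in dst."""
    mismatched = 0
    all_nul = 0
    first = None
    for index, is_nul in find_mismatched_blocks(src, dst, block_size):
        if first is None:
            first = index
        mismatched += 1
        if is_nul:
            all_nul += 1
    return {"mismatched": mismatched, "all_nul": all_nul, "first": first}


# Demonstration: 8 blocks of data, with blocks 2..4 zeroed in the copy.
source = b"\x01" * (8 * BLOCK_SIZE)
target = source[:2 * BLOCK_SIZE] + b"\x00" * (3 * BLOCK_SIZE) + source[5 * BLOCK_SIZE:]
print(summarize(io.BytesIO(source), io.BytesIO(target)))

# Size of the NUL region reported above, in bytes.
print(192774 * BLOCK_SIZE)  # 789602304
```

On real devices the two streams would be the source and target
partitions opened read-only in binary mode.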
SMART status of the source disk is clean. I can't get SMART status of
the target disk easily (that's unsupported by the enclosure).
---
I'm not sure what to make of that. It seems like dd(8) silently failed
to write 800MB of data.
The target partition is in an area of the target drive that was likely
never used before. It's possible that all-NULs is what that region contained
before the dd(8) run. Thus, two possibilities: either the sectors
weren't written to at all, or they were written to with NULs rather than
with the correct data.
---
I'd like to understand what caused the silent write failure so I can
ensure it won't happen again, and more importantly, so I can ensure
disks I write will be readable when I need them.
What should be my first suspect here? A hardware issue? What part of
the setup should I look at first?
What should I do to make sure the data will be readable? If I verify
the data after writing it [e.g., by cmp(1) to a known-good copy, or by
verifying PGP signatures], does that guarantee that the data will be
readable /in the future/ assuming the drive is kept in storage in the
meantime?
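To make the "verify after writing" idea concrete, here is a minimal
Python sketch that hashes the first N bytes of the source and of the
target and compares the digests. The length is passed explicitly
because a target partition may be larger than the source. The names
digest_prefix and copies_match are hypothetical. Such a check only
reflects what the drive returns at the time of the read.

```python
import hashlib
import io


def digest_prefix(stream, length, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the first `length` bytes of stream.

    Raises EOFError if the stream ends before `length` bytes were read.
    """
    h = hashlib.sha256()
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f"stream ended {remaining} bytes early")
        h.update(chunk)
        remaining -= len(chunk)
    return h.hexdigest()


def copies_match(src, dst, length):
    """True if the first `length` bytes of src and dst hash identically."""
    return digest_prefix(src, length) == digest_prefix(dst, length)


# Demonstration on in-memory data: the target has trailing extra bytes,
# as a larger partition would.
data = b"abc" * 1000
good = io.BytesIO(data + b"\x00" * 100)
bad = io.BytesIO(data[:1500] + b"\x00" * 1500 + b"\x00" * 100)
print(copies_match(io.BytesIO(data), good, len(data)))
print(copies_match(io.BytesIO(data), bad, len(data)))
```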
Cheers,
Daniel
P.S. I have another, verified backup of that partition, as well as
a non-block-level backup of it, so no need to worry about that partition.