Eliminating binary from a text file

Orna Agmon Ben-Yehuda ladypine at gmail.com
Mon Jul 20 22:19:11 IDT 2015


The bad data turns out to be NUL bytes. (I did not have hexedit, but I was
introduced to the hex editing mode in Emacs, which proved useful.)
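
For the cleanup itself, a line-wise filter in C could look roughly like the
sketch below. It assumes the only garbage is NUL bytes and that the POSIX
getline() function is available. The name drop_nul_lines is made up for
illustration. Any line containing a NUL is dropped whole, so no partial lines
survive.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Copy lines from `in` to `out`, skipping every line that contains a
 * NUL byte. Returns the number of dropped lines, or -1 on an I/O error.
 * (Hypothetical helper, not part of any library.) */
long drop_nul_lines(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long dropped = 0;

    /* getline() returns the real byte count, so embedded NULs are seen. */
    while ((len = getline(&line, &cap, in)) != -1) {
        if (memchr(line, '\0', (size_t)len) != NULL) {
            dropped++;
            continue;
        }
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
    }
    free(line);
    return ferror(in) ? -1 : dropped;
}
```

Calling drop_nul_lines(stdin, stdout) from a small main() would turn this into
a filter that can be run over each damaged file.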

In the meantime, Muli Ben-Yehuda suggested preventing the mess in the first
place. The corrupted file is the output of a C program. The problem is that
the program keeps writing to the file without verifying that the data was
actually written. On a normal filesystem I would not care, but mine fails
several times a day. The NUL bytes are empty data: the program did an fseek
forward, but the file was not written.
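
As a rough illustration of what verifying the writes could mean, here is a
minimal sketch in C. The helper name checked_write_at is hypothetical and is
not taken from the actual program. It only checks the return values of fseek,
fwrite and fflush. A successful fflush means the data was handed to the
kernel, not that it reached the storage.

```c
#include <stdio.h>

/* Write `len` bytes from `buf` at byte offset `offset` of `fp`.
 * Returns 0 on success, -1 if any step reports failure.
 * (Hypothetical helper for illustration.) */
int checked_write_at(FILE *fp, long offset, const void *buf, size_t len)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fwrite(buf, 1, len, fp) != len)
        return -1;
    /* Push the stdio buffer to the kernel so a write error shows up now. */
    if (fflush(fp) != 0)
        return -1;
    return 0;
}
```

Seeking past the current end of the file and writing there leaves a gap that
reads back as NUL bytes, which matches what I see in the damaged files.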

The solution I am testing is syncing. The options I got were:
1. to mount the filesystem such that it will always sync,
2. to sync everything the user is running at a certain point, or
3. to fsync just the problematic file, when I stop writing to it.

I am currently testing the third option, for one file only. It is likely to
hurt performance the least.
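
A minimal sketch of the third option in C, assuming the program writes through
a stdio FILE pointer. The helper name flush_and_sync is made up for
illustration. The stdio buffer has to be flushed first, because fsync only
covers data the kernel has already received.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Flush the stdio buffer of `fp`, then ask the kernel to push the file's
 * data to the storage device. Returns 0 on success, -1 on failure.
 * (Hypothetical helper for illustration.) */
int flush_and_sync(FILE *fp)
{
    if (fflush(fp) != 0)
        return -1;
    if (fsync(fileno(fp)) != 0)
        return -1;
    return 0;
}
```

The idea is to call this once, when the program stops writing to the file, and
to treat a nonzero return value as a sign that the output cannot be trusted.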



On Mon, Jul 20, 2015 at 1:40 PM, Rabin Yasharzadehe <rabin at rabin.io> wrote:

> Can you provide an example of the bad lines and how you would like them to
> look after you fix them?
>
> --
> Rabin
>
> On Mon, Jul 20, 2015 at 11:56 AM, Orna Agmon Ben-Yehuda <
> ladypine at gmail.com> wrote:
>
>> Hello everyone,
>>
>> I often have damaged text files (due to a lovely storage system). The
>> files are of different formats, although I can usually assume they contain
>> spaces. The files are structured as lines.
>>
>> Every once in a while, the lovely destruction (ahm....storage) system
>> inserts binary garbage into the file. I wish to fix the files by removing
>> the cancer without leaving any leftovers. That is, I want to lose partial
>> lines.
>>
>> I tried using grep with all sorts of keys, but it did not do the trick.
>> strings catches too little: it leaves partial lines.
>> Is there an elegant way to do the trick line-wise?
>>
>> Thanks
>> Orna
>>
>> --
>> Orna Agmon Ben-Yehuda.
>> http://ladypine.org
>>
>> _______________________________________________
>> Linux-il mailing list
>> Linux-il at cs.huji.ac.il
>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>>
>>
>


-- 
Orna Agmon Ben-Yehuda.
http://ladypine.org