Eliminating binary from a text file

Dov Grobgeld dov.grobgeld at gmail.com
Mon Jul 20 12:25:39 IDT 2015


Why not do it with a short Python script? Something like this (not tested):


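import os

def islineok(line):
    # Placeholder: replace with whatever test you want (line is bytes here)
    return True

for dirpath, dirnames, filenames in os.walk('damagedfilesystem'):
    for fn in filenames:
        # Skip outputs of a previous run so they are not "fixed" again
        if fn.endswith('.txt') and not fn.endswith('-fixed.txt'):
            path = os.path.join(dirpath, fn)
            new_path = path[:-len('.txt')] + '-fixed.txt'
            # Binary mode: garbage bytes need not be valid in any encoding
            with open(path, 'rb') as in_fh, open(new_path, 'wb') as out_fh:
                for line in in_fh:
                    if islineok(line):
                        out_fh.write(line)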


Just fill in islineok() with whatever logic you want.
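As one possible filling, here is a hypothetical islineok() heuristic. It assumes the files are UTF-8 (or plain ASCII) text and uses the hint from the original question that good lines usually contain spaces. A line is rejected if it is not valid UTF-8, contains control characters other than tab (e.g. NUL), or contains no space at all. It accepts either bytes (file read in 'rb' mode) or str.

```python
import unicodedata

def islineok(line, require_space=True):
    """Hypothetical heuristic: keep a line only if it looks like clean text.

    Assumes UTF-8/ASCII content. `line` may be bytes or str.
    """
    if isinstance(line, bytes):
        try:
            line = line.decode('utf-8')
        except UnicodeDecodeError:
            return False  # invalid UTF-8: very likely binary garbage
    # Inspect the content without the line terminator (\n or \r\n)
    body = line.rstrip('\r\n')
    if not body:
        return True  # an empty line is not garbage
    for ch in body:
        if ch == '\t':
            continue
        # Reject control chars (Cc, e.g. NUL) and lone surrogates (Cs)
        if unicodedata.category(ch) in ('Cc', 'Cs'):
            return False
        # U+FFFD marks bytes that an earlier decode could not handle
        if ch == '\ufffd':
            return False
    # Assumption from the question: real lines usually contain spaces
    if require_space and ' ' not in body:
        return False
    return True
```

Pass require_space=False for files whose legitimate lines may lack spaces; otherwise such lines are dropped along with the garbage.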

Regards,
Dov


On Mon, Jul 20, 2015 at 11:56 AM, Orna Agmon Ben-Yehuda <ladypine at gmail.com> wrote:

> Hello everyone,
>
> I often have damaged text files (due to a lovely storage system). The
> files are of different formats, although I can usually assume they contain
> spaces. The files are structured as lines.
>
> Every once in a while, the lovely destruction (ahm....storage) system
> inserts binary garbage into the file. I wish to fix the files by removing the
> cancer without leaving any leftovers. That is, I want to lose partial lines.
>
> I tried using grep with all sorts of options, but it did not do the trick.
> strings catches too little - it leaves partial lines.
> Is there an elegant way to do the trick line-wise?
>
> Thanks
> Orna
>
> --
> Orna Agmon Ben-Yehuda.
> http://ladypine.org
>
> _______________________________________________
> Linux-il mailing list
> Linux-il at cs.huji.ac.il
> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>
>

