Eliminating binary from a text file

Shlomi Fish shlomif at gmail.com
Mon Jul 20 12:34:36 IDT 2015


Hi Orna,

On Mon, Jul 20, 2015 at 11:56 AM, Orna Agmon Ben-Yehuda <ladypine at gmail.com>
wrote:

> Hello everyone,
>
> I often have damaged text files (due to a lovely storage system). The
> files are of different formats, although I can usually assume they contain
> spaces. The files are structured as lines.
>
> Every once in a while, the lovely destruction (ahm....storage) system
> inserts binary garbage to the file. I wish to fix the files by removing the
> cancer without leaving any leftovers. That is, I want to lose partial lines.
>
> I tried using grep with all sorts of keys, but it did not do the trick.
> strings catches too little - it leaves partial lines.
> Is there an elegant way to do the trick line-wise?
>
>
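It would help to know exactly which lines you wish to eliminate. Otherwise,
you can do various tasks like that using perl -lane (possibly with the -i
flag to edit the file in place). For example (untested), this drops every
line that contains more than THRESH bytes in the range 0x80-0xFF: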

$ export THRESH=5
$ perl -lan -E 'print unless ((() = /([\x80-\xFF])/g) > $ENV{THRESH})' \
    < existing-file.txt > new-file.txt

The "ruby" executable has similar flags (with Ruby's expression syntax,
naturally).

Hope it helps.

Regards,

— Shlomi Fish


-- 
Chuck Norris helps the gods that help themselves.

Please reply to list if it's a mailing list post - http://shlom.in/reply .