Eliminating binary from a text file
Shlomi Fish
shlomif at gmail.com
Mon Jul 20 12:34:36 IDT 2015
Hi Orna,
On Mon, Jul 20, 2015 at 11:56 AM, Orna Agmon Ben-Yehuda <ladypine at gmail.com> wrote:
> Hello everyone,
>
> I often have damaged text files (due to a lovely storage system). The
> files are of different formats, although I can usually assume they contain
> spaces. The files are structured as lines.
>
> Every once in a while, the lovely destruction (ahm....storage) system
> inserts binary garbage into the file. I wish to fix the files by removing the
> cancer without leaving any leftovers. That is, I want to lose partial lines.
>
> I tried using grep with all sorts of options, but it did not do the trick.
> strings catches too little - it leaves partial lines.
> Is there an elegant way to do the trick line-wise?
>
>
It would help to know exactly which lines you wish to eliminate. In the
meantime, you can do tasks like that using perl -lan -E (possibly adding the
-i flag to edit the file in place). E.g. (untested):
$ export THRESH=5
$ perl -lan -E 'print unless ((() = /([\x80-\xFF])/g) > $ENV{THRESH})' \
    < existing-file.txt > new-file.txt
The "ruby" executable has similar flags (with Ruby's expression syntax,
naturally).
Hope it helps.
Regards,
— Shlomi Fish
--
Chuck Norris helps the gods that help themselves.
Please reply to list if it's a mailing list post - http://shlom.in/reply .
More information about the Linux-il mailing list