<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Why not do it through a short python script? Something like (not tested)<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>import os<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>for dirpath, dirnames, filenames in os.walk('damagedfilesystem'):<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> for fn in filenames:<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> if fn.endswith('.txt'):<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> new_fn = fn.replace('.txt','-fixed.txt')<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> out_fh = open(new_fn,'w')<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> for line in open(fn):<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> if islineok(line):<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> out_fh.write(line)<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> close(out_fh)<br><br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Just fill in islineok() with whatever logic you want.<br><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Regards,<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Dov<br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 20, 2015 at 11:56 AM, Orna Agmon Ben-Yehuda <span dir="ltr"><<a href="mailto:ladypine@gmail.com" target="_blank">ladypine@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello everyone,<div><br></div><div>I 
often have damaged text files (due to a lovely storage system). The files are of different formats, although I can usually assume they contain spaces. The files are structured as lines.</div><div><br></div><div>Every once in a while, the lovely destruction (ahm....storage) system inserts binary garbage into the file. I wish to fix the files by removing the cancer without leaving any leftovers. That is, partial lines should be dropped as well.</div><div><br></div><div>I tried using grep with all sorts of options, but it did not do the trick.</div><div>strings catches too little: it leaves partial lines behind.</div><div>Is there an elegant way to do the trick line-wise?</div><div><br></div><div>Thanks</div><span class="HOEnZb"><font color="#888888"><div>Orna<br clear="all"><div><br></div>-- <br><div>Orna Agmon Ben-Yehuda.<br><a href="http://ladypine.org" target="_blank">http://ladypine.org</a></div>
</div></font></span></div>
<br>_______________________________________________<br>
Linux-il mailing list<br>
<a href="mailto:Linux-il@cs.huji.ac.il">Linux-il@cs.huji.ac.il</a><br>
<a href="http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il" rel="noreferrer" target="_blank">http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il</a><br>
<br></blockquote></div><br></div>
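<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">P.S. One possible way to fill in islineok(), as a rough heuristic that has not been tested on your data. It assumes the files are ASCII or UTF-8 text and that the garbage shows up as control bytes (NUL and friends) or as byte sequences that are not valid UTF-8. The require_space switch is a hypothetical knob based on your note that the lines usually contain spaces.</div></div>

```python
# Hypothetical islineok() heuristic. Assumptions, not known facts about the data:
# the files are ASCII/UTF-8 text, and the garbage contains control bytes
# or byte sequences that are not valid UTF-8.

# Control bytes that may legitimately appear in a text line: \t, \n, \r.
ALLOWED_CONTROL = (9, 10, 13)


def islineok(line, require_space=True):
    """Return True if `line` (bytes, as read in binary mode) looks intact."""
    for byte in bytearray(line):
        # NUL, the other C0 control bytes and DEL are treated as garbage.
        if (byte < 32 and byte not in ALLOWED_CONTROL) or byte == 127:
            return False
    try:
        line.decode('utf-8')
    except UnicodeDecodeError:
        # Stray high bytes usually make the line invalid UTF-8.
        return False
    # Optional sanity check: intact lines are assumed to contain a space.
    if require_space:
        return b' ' in line
    return True


if __name__ == '__main__':
    damaged = [b'first good line\n',
               b'garb\x00\xff\x01age line\n',
               b'second good line\n']
    kept = [line for line in damaged if islineok(line)]
    print(kept)
```

<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">A purely per-line test has one blind spot: if the inserted garbage happens to end with a newline byte, the tail of the damaged line that follows it looks like a clean line and will be kept.</div></div>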