<div dir="ltr">The bad data is NUL bytes (I did not have hexedit, but I was introduced to the hex editing mode in Emacs, which proved useful).<div><br></div><div>In the meantime, Muli Ben-Yehuda suggested preventing the mess in the first place. The corrupted file is the output of a C program. The problem is that the program keeps writing to the file without verifying that the data was actually written. On a normal filesystem I would not care, but mine fails several times a day. The NUL bytes are empty data: the program did an fseek forward, but the data was never written to the file. </div><div><br></div><div>The solution I am testing is syncing. The options I was given were:</div><div>1. to mount the filesystem so that it always syncs,</div><div>2. to sync everything the user is running at a certain point, or</div><div>3. to fsync just the problematic file when I stop writing to it.</div><div><br></div><div>I am currently testing the third option, for one file only. It is likely to hurt performance the least. </div><div><div><br></div><div> </div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 20, 2015 at 1:40 PM, Rabin Yasharzadehe <span dir="ltr">&lt;<a href="mailto:rabin@rabin.io" target="_blank">rabin@rabin.io</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">Can you provide an example of some bad lines, and of how you would like them to look after you fix them?<br></div></div><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr"><font size="1"><span style="font-family:courier new,monospace">--<br>Rabin<br></span></font></div></div></div>
<br><div class="gmail_quote"><span class="">On Mon, Jul 20, 2015 at 11:56 AM, Orna Agmon Ben-Yehuda <span dir="ltr">&lt;<a href="mailto:ladypine@gmail.com" target="_blank">ladypine@gmail.com</a>&gt;</span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div dir="ltr">Hello everyone,<div><br></div><div>I often have damaged text files (due to a lovely storage system). The files are of different formats, although I can usually assume they contain spaces. The files are structured as lines.</div><div><br></div><div>Every once in a while, the lovely destruction (ahm... storage) system inserts binary garbage into the file. I wish to fix the files by removing the cancer without leaving any leftovers. That is, I want to drop partial lines as well.</div><div><br></div><div>I tried using grep with all sorts of keys, but it did not do the trick.</div><div>strings catches too little - it leaves partial lines.</div><div>Is there an elegant way to do the trick line-wise?</div><div><br></div><div>Thanks</div><span><font color="#888888"><div>Orna<br clear="all"><div><br></div>-- <br><div>Orna Agmon Ben-Yehuda.<br><a href="http://ladypine.org" target="_blank">http://ladypine.org</a></div>
</div></font></span></div>
<br></div></div><span class="">_______________________________________________<br>
Linux-il mailing list<br>
<a href="mailto:Linux-il@cs.huji.ac.il" target="_blank">Linux-il@cs.huji.ac.il</a><br>
<a href="http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il" rel="noreferrer" target="_blank">http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il</a><br>
<br></span></blockquote></div><br></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Orna Agmon Ben-Yehuda.<br><a href="http://ladypine.org" target="_blank">http://ladypine.org</a></div>
</div>
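<div>A minimal sketch of the third option above (fsync just the problematic file), assuming a POSIX system and a program that writes through stdio. The helper name write_and_sync is hypothetical and is not taken from the actual program. It checks every step, flushes the stdio buffer with fflush, and then calls fsync on the underlying descriptor, so a failed write is reported instead of being silently ignored.</div>

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to fp and push them
 * to stable storage. Returns 0 on success, -1 if any step fails
 * (errno is left as set by the failing call). */
int write_and_sync(FILE *fp, const char *buf, size_t len)
{
    /* fwrite may write fewer items than requested on error. */
    if (fwrite(buf, 1, len, fp) != len)
        return -1;

    /* Move data from the stdio buffer into the kernel. */
    if (fflush(fp) != 0)
        return -1;

    /* Ask the kernel to commit the file's data to the storage device. */
    if (fsync(fileno(fp)) != 0)
        return -1;

    return 0;
}
```

<div>A caller could invoke write_and_sync(fp, line, strlen(line)) at the point where it stops writing to the file, and treat a nonzero return as a reason to retry or abort rather than carry on. The return value of the final fclose is worth checking as well, since it can also report a write error.</div>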