<div dir="ltr"><br><br><div class="gmail_quote">On Thu, Aug 6, 2009 at 9:02 PM, Oleg Goldshmidt <span dir="ltr"><<a href="mailto:pub@goldshmidt.org">pub@goldshmidt.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div><div></div><div class="h5">Erez D <<a href="mailto:erez0001@gmail.com">erez0001@gmail.com</a>> writes:<br>
<br>
> hi<br>
> i have an html file with a few different instances of:<br>
> <span class="myclass"><br>
> ... some html, e.g. <B> blah blah <a href=....> </a> </b><br>
> </span><br>
> i want to remove these instances.<br>
> ( the html inside the <span> varies between instances, and there is a non<br>
> constant number of instances)<br>
> i thought of replacing '<[^/]' (i.e. '<' followed by something other than '/')<br>
> with '{' and '</' with '}' and then doing parenthesis matching<br>
> however i need it done automatically in batch. (i can do parenthesis matching<br>
> in vi. can i do this in sed ?)<br>
<br>
</div></div>Sed is line-oriented, which will make it a bit difficult.<br>
<br>
If I understand you correctly, and you want to remove everything<br>
between "<span" and "span>" including the span tags themselves, *and*<br>
the file does not contain the span tags in comments or string literals<br>
or anything like that, *and* "<span" always has a matching "span>",<br>
then one way to do it would be<br>
<br>
$ awk 'BEGIN {RS="(<span|span>)"} NR%2==1' <filename><br>
<br>
which will consider either "<span" or "span>" as a record separator<br>
and will print only the odd records (everything between "<span" and<br>
"span>" will be even records and will be skipped).<br>
<br>
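All you need to know about awk is that it splits the input into<br>
records, RS is the record separator (set to a regexp in the BEGIN<br>
block), and NR is the number of the current record. A pattern with no<br>
action prints the record, so only records with an odd NR are printed.<br>
Note that a regexp RS works in gawk and mawk; POSIX leaves a<br>
multi-character RS unspecified.<br>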
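<br>
To illustrate the record-splitting idea outside awk, here is a minimal Python sketch of the same logic. It is not the awk command itself: the function name drop_even_records and the sample string are made up for illustration, and unlike awk it joins the kept records without inserting newlines between them.<br>

```python
import re

def drop_even_records(text):
    """Keep only the odd-numbered records, as NR%2==1 does in awk."""
    # Split on either "<span" or "span>", mirroring RS="(<span|span>)".
    records = re.split(r"<span|span>", text)
    # awk's NR is 1-based, so odd NR corresponds to even 0-based indexes.
    # awk would print each kept record followed by a newline; here they
    # are simply concatenated.
    return "".join(records[0::2])

# Hypothetical sample input with one non-nested span.
sample = 'keep <span class="myclass"><b>drop</b></span> keep too'
print(drop_even_records(sample))
```

Like the awk version, this only works when every "<span" is followed by its own "span>" with no other span in between.<br>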
<br>
Does this do what you want?<br>
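<font color="#888888"></font></blockquote><div>the problem is that between <span class="myclass"...> and its </span> there may be other spans, e.g. <span class="otherclass"...> with its own </span>, so the first "span>" after the opening tag does not necessarily close the outer span.<br>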
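<br>
For illustration, the matching can be sketched with a depth counter instead of literal brace substitution. This is a rough Python sketch rather than a sed solution, under stated assumptions: span tags are balanced, none appear inside comments or string literals, and the attribute is written exactly as class="myclass". The name remove_spans and the sample input are hypothetical.<br>

```python
import re

# Matches an opening <span ...> tag or a closing </span> tag.
SPAN_TAG = re.compile(r"<span\b[^>]*>|</span\s*>", re.IGNORECASE)

def remove_spans(html, cls="myclass"):
    """Drop every <span class="cls"> element, including nested spans."""
    out = []    # pieces of text that are kept
    depth = 0   # nesting depth inside a span being removed (0 = outside)
    pos = 0     # position just after the previous span tag
    for m in SPAN_TAG.finditer(html):
        tag = m.group(0)
        closing = tag.startswith("</")
        if depth == 0:
            # Outside a removed span: keep the text before this tag.
            out.append(html[pos:m.start()])
            if not closing and re.search(r'class="%s"' % re.escape(cls), tag):
                depth = 1        # start skipping at the target span
            else:
                out.append(tag)  # unrelated span tag: keep it
        elif closing:
            depth -= 1           # one level closed inside the removed span
        else:
            depth += 1           # nested span inside the removed span
        pos = m.end()
    if depth == 0:
        out.append(html[pos:])   # trailing text after the last tag
    # If depth > 0 here, the input was unbalanced and the tail is dropped.
    return "".join(out)

# Hypothetical sample: a nested span inside the target, plus an unrelated span.
sample = ('a <span class="myclass">x <span class="otherclass">y</span> z</span>'
          ' b <span class="otherclass">keep</span>')
print(remove_spans(sample))
```

The counter plays the role of the parenthesis matching: it goes up on every opening span inside the target and down on every closing one, and the target ends when it returns to zero.<br>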
<br>that is why i wanted parenthesis matching...<br><br><br>thanks,<br>erez<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><font color="#888888"><br>
--<br>
Oleg Goldshmidt | <a href="mailto:pub@goldshmidt.org">pub@goldshmidt.org</a><br>
</font></blockquote></div><br></div>