batch parenthesis matching

Erez D erez0001 at gmail.com
Thu Aug 6 21:33:43 IDT 2009


On Thu, Aug 6, 2009 at 9:02 PM, Oleg Goldshmidt <pub at goldshmidt.org> wrote:

> Erez D <erez0001 at gmail.com> writes:
>
> > hi
> > i have an html file with a few different instances of:
> > <span class="myclass">
> > ... some html, e.g. <B> blah blah <a href=....> </a> </b>
> > </span>
> > i want to remove these instances.
> > (the html inside the <span> varies between instances, and the number of
> > instances is not constant)
> > i thought of replacing '<[^/]' (i.e. '<' followed by something other than
> > '/') with '{' and '</' with '}' and then doing parenthesis matching.
> > however, i need it done automatically in batch. (i can do parenthesis
> > matching in vi. can i do this in sed?)
>
> Sed is line-oriented, which will make it a bit difficult.
>
> If I understand you correctly, and you want to remove everything
> between "<span" and "span>" including the span tags themselves, *and*
> the file does not contain the span tags in comments or string literals
> or anything like that, *and* "<span" always has a matching "span>",
> then one way to do it would be
>
> $ awk 'BEGIN {RS="(<span|span>)"} NR%2==1' <filename>
>
> which will consider either "<span" or "span>" as a record separator
> and will print only the odd records (everything between "<span" and
> "span>" will be even records and will be skipped).
>
> All you need to know about awk is that it splits the input into
> records, RS is the record separator (set to a regexp in the
> beginning), and NR is the number of the current record. It prints the
> records matching the "odd NR" condition.
>
> Does this do what you want?
>
the problem is that between <span class="myclass"...> and its </span> there
may be other spans, e.g. <span class="otherclass"...> with its own </span>,
so the first "span>" after the opening tag is not always the matching one.

that is why i wanted parenthesis matching...
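
for illustration, here is a minimal sketch of that kind of matching done with a
depth counter, in Python rather than sed or awk. it is not something proposed
in this thread. the function name strip_spans and the is_target test are
hypothetical, and it assumes the class is written exactly as class="myclass"
with double quotes, that span tags never appear inside comments or string
literals, and that every <span> has a matching </span>.

```python
import re

# Matches any opening <span ...> tag or any closing </span> tag.
SPAN_TAG = re.compile(r"<span\b[^>]*>|</span\s*>", re.IGNORECASE)


def strip_spans(html, is_target=lambda tag: 'class="myclass"' in tag):
    """Remove every target span, including any spans nested inside it.

    is_target is a hypothetical predicate that decides which opening
    tags start a region to delete (assumed: class="myclass").
    """
    out = []
    depth = 0  # 0 = keeping text; >0 = nesting depth inside a removed span
    pos = 0    # start of the text that has not been copied to out yet

    for m in SPAN_TAG.finditer(html):
        tag = m.group(0)
        closing = tag.startswith("</")
        if depth == 0:
            if not closing and is_target(tag):
                # Keep everything before the target span, then start skipping.
                out.append(html[pos:m.start()])
                depth = 1
            # Any other span tag is left alone and copied with the text later.
        else:
            depth += -1 if closing else 1
            if depth == 0:
                # The matching </span> was found: resume copying after it.
                pos = m.end()

    if depth != 0:
        raise ValueError("unbalanced <span> tags")
    out.append(html[pos:])
    return "".join(out)


sample = ('<p>a</p><span class="myclass"><b>x '
          '<span class="otherclass">y</span> z</b></span>'
          '<span class="otherclass">keep</span>')
print(strip_spans(sample))
```

the counter goes up on every <span opened inside a removed region and down on
every </span>, so the region ends only at the </span> that brings it back to
zero. spans of other classes outside a removed region are left untouched. for
batch use, the same function could be applied to the full contents of each
file.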


thanks,
erez

>
> --
> Oleg Goldshmidt | pub at goldshmidt.org
>