batch parenthesis matching
Oleg Goldshmidt
pub at goldshmidt.org
Fri Aug 7 00:41:25 IDT 2009
Erez D <erez0001 at gmail.com> writes:
> the problem is that between <span class="myclass"...> and its </span> there
> may be other <span class="otherclass"...> and its </span>
> that is why i wanted parenthesis matching...
Ah, nesting, I was afraid it would pop up. Try this script:
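#!/bin/awk -f
# Needs GNU awk (regexp RS, three-argument match()).
BEGIN {
    # Every opening <span class="myclass" ...> tag acts as a record separator.
    RS="<span[ \\t\\n]+class=\\\"myclass\\\"[^>]*>"
}
NR==1 {print}    # text before the first myclass span
NR > 1 {
    record=$0
    nesting=1    # we are just past an opening myclass tag
    ends=0       # number of characters of $0 consumed so far
    while (nesting > 0) {
        # stop on unbalanced input instead of looping forever
        if (!match(record,/<\/?span[^>]*>/,tag))
            break
        if (tag[0] ~ /^<\/span/)
            nesting--
        else
            nesting++
        ends+=(RSTART+RLENGTH-1)
        record=substr(record,RSTART+RLENGTH)
    }
    # everything after the matching </span> is kept
    print substr($0,ends+1)
}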
Each print adds a newline where a span was cut out. I hope extra
whitespace here and there is irrelevant for HTML.
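
For anyone without gawk, here is a rough sketch of the same nesting counter using only two-argument match() and substr(), so it should run in any POSIX awk. The function name strip_myclass and the sample input are made up for illustration. Unlike the script above it works line by line, so it assumes each myclass span opens and closes on the same line.

```shell
# strip_myclass: hypothetical helper, not part of the original script.
# Reads HTML on stdin and drops every <span class="myclass"...>...</span>,
# including any spans nested inside it.
strip_myclass() {
  awk '
  {
    out = ""; rest = $0
    open_re = "<span[ \t]+class=\"myclass\"[^>]*>"
    while (match(rest, open_re)) {
      out = out substr(rest, 1, RSTART - 1)    # keep text before the opening tag
      rest = substr(rest, RSTART + RLENGTH)    # drop the opening tag itself
      depth = 1                                # one span is open now
      # walk over span tags until the counter returns to zero
      while (depth > 0 && match(rest, /<\/?span[^>]*>/)) {
        if (substr(rest, RSTART, 2) == "</") depth--; else depth++
        rest = substr(rest, RSTART + RLENGTH)  # skip up to and including this tag
      }
    }
    print out rest
  }'
}

printf '%s\n' 'a<span class="myclass">x<span class="otherclass">y</span>z</span>b' | strip_myclass
# prints: ab
```

Spans of other classes outside a myclass span are left alone; only the counter between the opening myclass tag and its matching </span> decides what gets dropped.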
--
Oleg Goldshmidt | pub at goldshmidt.org