batch parenthesis matching

Oleg Goldshmidt pub at goldshmidt.org
Fri Aug 7 00:41:25 IDT 2009


Erez D <erez0001 at gmail.com> writes:

> the problem is that between <span class="myclass"...> and its </span> there
> may be other <span class="otherclass"...> and its </span>
> that is why i wanted parenthesis matching...

Ah, nesting, I was afraid it would pop up. Try this script (it needs gawk,
for the regex RS and the three-argument match()):

#!/bin/awk -f

BEGIN {
	RS="<span[ \\t\\n]+class=\\\"myclass\\\"[^>]*>"
}

NR==1 {print}
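NR > 1 {
	record=$0
	nesting=1	# the record starts just inside <span class="myclass"...>
	ends=1		# position in $0 right after the last tag consumed
	while (nesting > 0) {
		# bail out if the span is never closed, instead of looping forever
		if (!match(record,/<\/?span[^>]*>/,tag))
			break
		if (tag[0] ~ /^<\/span/)
			nesting--
		else
			nesting++
		# substr() below drops RSTART+RLENGTH-1 characters, so advance by that much
		ends+=RSTART+RLENGTH-1
		record=substr(record,RSTART+RLENGTH)
	}
	print substr($0,ends)
}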

I hope extra whitespace here and there is irrelevant for HTML.
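
In case gawk is not at hand, here is the same counting idea as a rough Python
sketch, not a port of the script. The function name strip_spans and the two
patterns are my own choices for illustration, and it assumes the tags look
like the ones in your example. It drops everything from each
<span class="myclass"...> up to the </span> that brings the nesting depth
back to zero.

```python
import re

# Assumed shape of the opening tag to drop (mirrors RS in the awk script).
OPEN_TARGET = re.compile(r'<span\s+class="myclass"[^>]*>')
# Any opening or closing span tag.
ANY_SPAN = re.compile(r'</?span\b[^>]*>')


def strip_spans(html):
    """Remove each <span class="myclass">...</span>, nested spans included."""
    out = []
    pos = 0
    while True:
        start = OPEN_TARGET.search(html, pos)
        if start is None:
            out.append(html[pos:])          # nothing left to drop
            break
        out.append(html[pos:start.start()])  # keep text before the span
        pos = start.end()
        depth = 1                            # we are inside the target span
        while depth > 0:
            tag = ANY_SPAN.search(html, pos)
            if tag is None:
                break                        # unclosed span: stop counting
            depth += -1 if tag.group().startswith("</") else 1
            pos = tag.end()
    return "".join(out)


sample = 'a<span class="myclass">b<span class="otherclass">c</span>d</span>e'
print(strip_spans(sample))
```

Unlike the awk version it adds no newlines of its own, since it never splits
the input into records.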

-- 
Oleg Goldshmidt | pub at goldshmidt.org


