preg_match and Hebrew?

Yuval Hager yuval at avramzon.net
Tue Aug 4 11:49:37 IDT 2009


Let's try some regex matching in PHP.
,----
| php > echo preg_match('/\w/', 'a');
| 1
`----

OK, so the basics work for English letters. Let's go on.

,----
| php > echo preg_match("/\w/", 'א');
| 0
| php > echo preg_match('/\w/u', 'א');
| 0
`----

Oops. Maybe some kind of encoding issue? My whole system is UTF-8, so this
should not be a problem, I guess:
,----
| php > var_dump('א');
| string(2) "א"
`----
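
The two bytes reported above are what UTF-8 should give for this letter, so
the encoding looks fine. A small sketch to double-check this outside the
interactive shell (the variable name $alef is mine, and the letter is written
as explicit bytes so the result does not depend on how the script file is
saved; mb_strlen assumes the mbstring extension is loaded):

```php
<?php
// U+05D0 HEBREW LETTER ALEF as explicit UTF-8 bytes (0xD7 0x90).
$alef = "\xD7\x90";

echo strlen($alef), "\n";             // length in bytes
echo bin2hex($alef), "\n";            // the raw bytes, in hex
echo mb_strlen($alef, 'UTF-8'), "\n"; // length in characters
```

This should print 2, d790 and 1: one character, stored as two bytes.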

So I fall back to mb_ereg from mbstring (as far as I know, it is the POSIX
ereg functions that are deprecated as of PHP 5.3, not mb_ereg itself):

,----
| php > mb_regex_encoding('UTF-8');
| php > echo mb_ereg('\w', 'א');
| 1
`----

And now it works.

Maybe I was wrong to expect preg_match to recognize Hebrew alphanumeric
characters in the first place? I understand it will in PHP6, but until
then, beware of sophisticated Hebrew string parsing.
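
A possible workaround that stays with preg_match, as a sketch only: the /u
modifier makes PCRE read the pattern and the subject as UTF-8, but whether it
also gives \w Unicode semantics depends on the PHP and PCRE versions. The
Unicode property classes \p{L} (any letter) and \p{Hebrew} (Hebrew script)
do not rely on \w at all. This assumes a PCRE build with Unicode property
support; the variable names are mine.

```php
<?php
// 'א' (U+05D0) as explicit UTF-8 bytes, independent of file encoding.
$alef = "\xD7\x90";

// \p{L} matches any Unicode letter; /u switches PCRE to UTF-8 mode.
$isLetter = preg_match('/\p{L}/u', $alef);

// \p{Hebrew} matches only characters from the Hebrew script.
$isHebrew = preg_match('/\p{Hebrew}/u', $alef);

// A Latin letter is a letter, but not a Hebrew one.
$latinIsHebrew = preg_match('/\p{Hebrew}/u', 'a');

var_dump($isLetter, $isHebrew, $latinIsHebrew);
```

With property support available I would expect int(1), int(1), int(0). For
"alphanumeric" rather than just letters, a class such as [\p{L}\p{N}] might be
closer to what \w was meant to do here.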

--yuval