Finding if a line contains Hebrew characters in perl

Meir Guttman meir at guttman.co.il
Fri Apr 26 13:42:06 IDT 2013


-----Original Message-----
From: Gabor Szabo [mailto:szabgab at gmail.com] 
Sent: Friday, 26 April 2013 09:25
To: linux-il
Cc: Ori Idan; Meir Guttman
Subject: Re: Finding if a line contains Hebrew characters in perl

>On Thu, Apr 25, 2013 at 6:05 PM, ik <idokan at gmail.com> wrote:
>> try this
>>
>> #!/usr/bin/env perl -w
>> #
>>
>> use v5.14;
>> use utf8;
>>
>> my $text = 'שלוabv';
>>
>> if ($text =~ /^[\x{5D0}-\x{5ea}]{3}/) {
>>   say "yes";
>> } else {
>>   say "no";
>> }
>
>I'd probably use \p{IsHebrew} or \p{InHebrew} instead of the hex codes.
>Check here: http://perldoc.perl.org/perluniprops.html to learn way more than you'd probably want to :)
>
>I also CC-ed Meir Guttman who is *the* Perl Unicode expert.
>He might have something more correct to suggest.
>
>Gabor
>
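For readers comparing the options in the quoted messages: the hex range, \p{IsHebrew} (the Hebrew script) and \p{InHebrew} (the Hebrew block, U+0590 to U+05FF) agree on the 27 basic letters but differ elsewhere. The sketch below is an editorial illustration, not code from the thread; the helper names and the three sample characters are arbitrary choices.

```perl
#!/usr/bin/env perl
# Compare the hex range with the two Unicode properties suggested above.
use v5.14;      # enables strict and say
use warnings;

# The 27 Hebrew letters, alef (U+05D0) through tav (U+05EA), as a hex range.
sub in_letter_range  { return $_[0] =~ /\A[\x{5D0}-\x{5EA}]\z/ ? 1 : 0 }

# \p{IsHebrew} (same as \p{Hebrew}): characters of the Hebrew *script*.
sub is_hebrew_script { return $_[0] =~ /\A\p{IsHebrew}\z/ ? 1 : 0 }

# \p{InHebrew}: code points in the Hebrew *block*, U+0590..U+05FF.
sub in_hebrew_block  { return $_[0] =~ /\A\p{InHebrew}\z/ ? 1 : 0 }

# Illustrative samples: SHIN, POINT SHEVA (a vowel point), and
# SHIN WITH SHIN DOT from the Alphabetic Presentation Forms block.
for my $ch ("\x{5E9}", "\x{5B0}", "\x{FB2A}") {
    say sprintf('U+%04X  range=%d  script=%d  block=%d',
        ord($ch), in_letter_range($ch), is_hebrew_script($ch), in_hebrew_block($ch));
}
```

Run as-is, shin should match all three tests, the vowel point only the two properties, and the presentation-form shin only the script property. Which test is "right" depends on whether vowel points and presentation forms should count as Hebrew.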

Well, first: I am by no means a "Unicode expert", let alone *the* expert. All
I have is some experience.

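Anyway, I used \p{HEBREW} instead of the "\x{...}" range, and it printed "yes".
Note that it is just {HEBREW}, with no "Is" or "In" prefix. I wrote it in all
caps, although Perl matches property names case-insensitively, so \p{Hebrew}
works as well. Here it is: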

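#!/usr/bin/env perl
# Does the line start with a Hebrew character?

use v5.14;      # also enables strict and say
use warnings;   # instead of "-w" on the #! line, which env cannot pass on Linux
use utf8;       # the Hebrew literal below is UTF-8 source text

my $text = 'שלוabv';

if ($text =~ /^[\p{HEBREW}]/) {
  say "yes";
} else {
  say "no";
}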
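Both examples so far anchor the match with ^, so they only ask whether the line *starts* with Hebrew. For the question in the subject line, whether a line contains Hebrew anywhere, dropping the anchor should be enough. A minimal sketch; the helper names are hypothetical and it assumes UTF-8 output:

```perl
#!/usr/bin/env perl
# Does a line contain Hebrew anywhere, or only at its start?
use v5.14;                              # enables strict and say
use warnings;
use utf8;                               # Hebrew literals below are characters, not bytes
use open qw(:std :encoding(UTF-8));     # read and write the standard handles as UTF-8

# No ^ anchor: one Hebrew-script character anywhere in the line is enough.
sub has_hebrew    { return $_[0] =~ /\p{Hebrew}/  ? 1 : 0 }

# The check used in the thread: is the *first* character Hebrew?
sub starts_hebrew { return $_[0] =~ /^\p{Hebrew}/ ? 1 : 0 }

for my $line ('שלוabv', 'abv שלו', 'abv') {
    say "$line: contains=", has_hebrew($line), " starts=", starts_hebrew($line);
}
```

To scan a real file, the literal list could be replaced by a `while (my $line = <>) { ... }` loop; assuming the input is UTF-8, the `use open` line should also make Perl decode it, so \p{Hebrew} sees characters rather than raw bytes.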

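I also tried "if ($text =~ /^[ש]/) {...}", typing the Hebrew letter "Shin"
directly, and it printed "yes" too, confirming that 'ש' is the first character
of the string. (My editor, like MS Outlook, displays the line from left to
right as 'ו', then 'ל', then 'ש', and then "abv". That is only the right-to-left
rendering of the Hebrew run; in memory the characters are stored in logical
order, with 'ש' first.)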
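Display order and storage order are different things. This short sketch prints the code points of the same string in the order Perl stores them (assuming the script is saved as UTF-8, as `use utf8` declares):

```perl
#!/usr/bin/env perl
# Show the logical (storage) order of the characters, independent of display.
use v5.14;
use warnings;
use utf8;

my $text = 'שלוabv';

# The first character in memory, whatever order the screen shows.
say sprintf('length=%d  first=U+%04X', length($text), ord(substr($text, 0, 1)));

# Every code point, in logical order.
say join(' ', map { sprintf('U+%04X', ord($_)) } split //, $text);
```

It should report six characters, with U+05E9 (shin) first, followed by U+05DC (lamed), U+05D5 (vav) and then the three Latin letters.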

I also tried to use the official Unicode name for 'ש', \p{HEBREW LETTER SHIN}
(see http://www.unicode.org/charts/PDF/U0590.pdf), and evidently it isn't
defined: I got a compile-time error, "Can't find Unicode property definition
"HEBREW LETTER SHIN" at...". A bit disappointing!

Try it out!
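The error above is expected: \p{...} takes a property name (a script, block, general category and so on), not the name of a single character. A character is referred to by its Unicode name with \N{...}. A sketch, assuming Perl 5.14 as in the examples above, where `use charnames ':full'` has to be loaded explicitly (from 5.16 on, \N{...} loads it automatically):

```perl
#!/usr/bin/env perl
# \p{...} is for properties; a character's Unicode name goes in \N{...}.
use v5.14;
use warnings;
use utf8;
use charnames ':full';   # required for \N{NAME} before Perl 5.16, harmless later

my $text = 'שלוabv';

if ($text =~ /^\N{HEBREW LETTER SHIN}/) {
    say "yes";
} else {
    say "no";
}

# charnames also goes the other way: from a code point to its name.
say charnames::viacode(ord 'ש');
```

This should print "yes" and then "HEBREW LETTER SHIN". \N{U+05E9} is another way to write the same character, by code point instead of by name.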

Meir




More information about the Linux-il mailing list