Finding if a line contains Hebrew characters in perl
Gabor Szabo
szabgab at gmail.com
Sun Apr 28 14:03:15 IDT 2013
(forwarding to linux-il)
On Sun, Apr 28, 2013 at 1:56 PM, Meir Guttman <meir at guttman.co.il> wrote:
> Dear Gabor and Ido,
>
> My post to the Linux-IL mailing list bounced because I am not a member of the list. Could one of you please forward it so that it shows up and gets distributed?
>
> BTW Ido, from your original post I saw that you want to find lines with (exactly?) three Hebrew characters, so I modified it and it is now:
>
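> #!/usr/bin/env perl
> # Warnings are enabled with "use warnings" below: on Linux, "env perl -w"
> # reaches env as the single argument "perl -w" and fails.
>
> use v5.14;
> use warnings;
> use utf8;
>
> my $text = 'שלוabv';
>
> # "yes" if the line starts with three Hebrew-script characters
> if ($text =~ /^[\p{HEBREW}]{3}/) {
>     say "yes";
> } else {
>     say "no";
> }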
>
> Regards,
> Meir
>
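The pattern above is anchored with ^ and has nothing after the {3}, so it says "yes" for any line that starts with at least three Hebrew characters. Here is a minimal sketch of how "starts with three", "contains any" and "contains exactly three" could be told apart. The helper names are hypothetical (they are not from the thread), and the sample strings use \x{} escapes so the result does not depend on the file encoding.

```perl
use v5.14;
use warnings;

# Hypothetical helpers, for illustration only.

# True if the line begins with (at least) three Hebrew-script characters.
sub starts_with_three_hebrew {
    my ($line) = @_;
    return $line =~ /^\p{Hebrew}{3}/ ? 1 : 0;
}

# True if a Hebrew-script character appears anywhere in the line.
sub contains_hebrew {
    my ($line) = @_;
    return $line =~ /\p{Hebrew}/ ? 1 : 0;
}

# True if the line holds exactly three Hebrew-script characters in total.
sub has_exactly_three_hebrew {
    my ($line) = @_;
    my $count = () = $line =~ /\p{Hebrew}/g;   # count all matches
    return $count == 3 ? 1 : 0;
}

# shin, lamed, vav followed by "abv": the same string as in the script above
my $text = "\x{5E9}\x{5DC}\x{5D5}abv";

say starts_with_three_hebrew($text) ? "yes" : "no";   # yes
say contains_hebrew($text)          ? "yes" : "no";   # yes
say has_exactly_three_hebrew($text) ? "yes" : "no";   # yes
```

Note that \p{Hebrew} also matches vowel points and cantillation marks, so pointed text would count more than just the letters.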
> -----Original Message-----
> From: Gabor Szabo [mailto:szabgab at gmail.com]
> Sent: Friday, 26 April 2013 09:25
> To: linux-il
> Cc: Ori Idan; Meir Guttman
> Subject: Re: Finding if a line contains Hebrew characters in perl
>
>>On Thu, Apr 25, 2013 at 6:05 PM, ik <idokan at gmail.com> wrote:
>>> try this
>>>
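>>> #!/usr/bin/env perl
>>> # Warnings are enabled with "use warnings" below: on Linux, "env perl -w"
>>> # reaches env as the single argument "perl -w" and fails.
>>>
>>> use v5.14;
>>> use warnings;
>>> use utf8;
>>>
>>> my $text = 'שלוabv';
>>>
>>> # U+05D0..U+05EA are the Hebrew letters alef through tav
>>> if ($text =~ /^[\x{5D0}-\x{5ea}]{3}/) {
>>>     say "yes";
>>> } else {
>>>     say "no";
>>> }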
>>
>>I'd probably use \p{IsHebrew} or \p{InHebrew} instead of the hex code points.
>>Check here: http://perldoc.perl.org/perluniprops.html to learn way more than you'd probably want to :)
>>
>>I also CC-ed Meir Guttman who is *the* Perl Unicode expert.
>>He might have something more correct to suggest.
>>
>>Gabor
>>
>
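The three spellings mentioned so far do not select the same set of characters: the \x{5D0}-\x{5EA} range covers only the letters alef through tav, \p{IsHebrew} refers to the Hebrew script, and \p{InHebrew} refers to the Hebrew block. The sketch below compares them on three sample code points. It uses the explicit \p{Script=Hebrew} and \p{Block=Hebrew} forms to avoid any ambiguity, and the helper names are made up for the example.

```perl
use v5.14;
use warnings;

# Hypothetical helpers: each one tests a single character.
sub in_letter_range  { $_[0] =~ /\A[\x{5D0}-\x{5EA}]\z/   ? 1 : 0 }
sub in_hebrew_script { $_[0] =~ /\A\p{Script=Hebrew}\z/   ? 1 : 0 }
sub in_hebrew_block  { $_[0] =~ /\A\p{Block=Hebrew}\z/    ? 1 : 0 }

# U+05E9 HEBREW LETTER SHIN
# U+05BC HEBREW POINT DAGESH OR MAPIQ (a point, not a letter)
# U+FB2A HEBREW LETTER SHIN WITH SHIN DOT (a presentation form outside the Hebrew block)
for my $cp (0x5E9, 0x5BC, 0xFB2A) {
    my $ch = chr $cp;
    say sprintf "U+%04X range=%d script=%d block=%d",
        $cp, in_letter_range($ch), in_hebrew_script($ch), in_hebrew_block($ch);
}
# U+05E9 range=1 script=1 block=1
# U+05BC range=0 script=1 block=1
# U+FB2A range=0 script=1 block=0
```

Which one fits depends on whether points and presentation forms should count as "Hebrew characters" for the task at hand.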
> Well, first I am by no means a "Unicode Expert", let alone *the* expert. All
> I have is some experience.
>
> Anyway, I used \p{HEBREW} instead of the "\x{}" range and it returned "yes".
> Please note, it is just {HEBREW}. I wrote it in ALL-CAPS, but Perl matches property names case-insensitively, so \p{Hebrew} works too. Here it is:
>
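> #!/usr/bin/env perl
> # Warnings are enabled with "use warnings" below: on Linux, "env perl -w"
> # reaches env as the single argument "perl -w" and fails.
>
> use v5.14;
> use warnings;
> use utf8;
>
> my $text = 'שלוabv';
>
> # "yes" if the line starts with a Hebrew-script character
> if ($text =~ /^[\p{HEBREW}]/) {
>     say "yes";
> } else {
>     say "no";
> }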
>
> I also used "if ($text =~ /^[ש]/) {...}", simply entering the Hebrew letter
> "Shin" directly, and it printed "yes" too, signifying that 'ש' is the first
> letter. (My editor, as well as MS Outlook, show, from left to right, first
> 'ו', then 'ל', then 'ש' and then "abv".)
>
> I also tried to use the official Unicode name for 'ש' - \p{HEBREW LETTER SHIN}
> see http://www.unicode.org/charts/PDF/U0590.pdf , and evidently it isn't
> defined. I got a compile time error: "Can't find Unicode property definition
> "HEBREW LETTER SHIN" at...". A bit disappointing!
>
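The compile-time error is expected, as far as I can tell: \p{...} looks up Unicode properties (sets of characters such as a script or a block), while an official character name such as HEBREW LETTER SHIN is written with \N{...}. A minimal sketch follows, assuming the same sample string as above; the starts_with_shin helper is a made-up name for the example. The charnames pragma is loaded explicitly because Perl versions before 5.16 need it for \N{NAME}.

```perl
use v5.14;
use warnings;
use charnames ':full';   # required for \N{NAME} before Perl 5.16, harmless later

# Hypothetical helper: true if the line starts with the letter shin.
sub starts_with_shin {
    my ($line) = @_;
    return $line =~ /^\N{HEBREW LETTER SHIN}/ ? 1 : 0;
}

# shin, lamed, vav followed by "abv", written with escapes
my $text = "\x{5E9}\x{5DC}\x{5D5}abv";

say starts_with_shin($text) ? "yes" : "no";   # yes

# charnames::viacode maps a code point back to its official name.
say charnames::viacode(0x5E9);                # HEBREW LETTER SHIN
```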
> Try it out!
>
> Meir