Unicode in C

Tue Mar 13 22:34:54 IST 2012

On Tue, Mar 13, 2012 at 10:16 PM, Nadav Har'El <nyh at math.technion.ac.il> wrote:

> On Tue, Mar 13, 2012, Elazar Leibovich wrote about "Re: Unicode in C":
> > Something very important one needs to consider is Unicode normalization.
> > That is, how to strip out the Niqud, and to replace, say, KAF WITH
> > DAGESH (U+FB3B) with just a KAF (U+05DB), etc.
>
> Is this really important? Does anybody actually use "Kaf with Dagesh"?
> Why does it even exist? :(
>

I'm not sure, just as I'm not sure why LOVE HOTEL or JAPANESE GOBLIN
exist. When I read about such things I don't know whether to laugh or cry.
Most are probably never used, although I would need to ask people in the
publishing industry; maybe they use special symbols there. Maybe some of
the wise folks on the list will enlighten us.
However, as the Hebrew saying goes, the Unicode Consortium "prepared the
cure before the disease": it defined standard normalization algorithms
(the NFC, NFD, NFKC and NFKD forms) which are supposed to solve this
problem by converting all text to a standard form (though I'm not sure
they really cover all the edge cases).

I'm not sure whether normalization should be included in hspell, but I
would put a notice that the input is expected to be normalized for it to
work correctly. And I would at least support Niqud.
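If hspell only documents that its input must be normalized, it could
still cheaply detect the most obvious violation and warn. The sketch
below (the name first_unnormalized_hebrew is hypothetical, not an
existing hspell function) scans UTF-8 input for U+FB1D, U+FB1F and
U+FB2A..U+FB4E. These Hebrew presentation forms have canonical
decompositions and are on Unicode's composition-exclusion list, so they
cannot appear in text that is in any of the normalization forms. The
check is deliberately partial: passing it does not prove the text is
normalized.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the UTF-8 sequence introduced by lead byte c,
 * or 0 for a continuation byte / invalid lead byte. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Decodes a k-byte sequence. Continuation bytes are not validated;
 * a production decoder should also reject overlong forms. */
static uint32_t utf8_decode(const unsigned char *s, size_t k)
{
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    uint32_t cp = s[0] & lead_mask[k];
    for (size_t j = 1; j < k; j++)
        cp = (cp << 6) | (s[j] & 0x3F);
    return cp;
}

/* Returns the byte offset of the first Hebrew presentation form that
 * normalization would have decomposed, -1 if none is found, or -2 on
 * a malformed or truncated sequence. The reserved code points inside
 * U+FB2A..U+FB4E are flagged too, which is harmless for a warning. */
long first_unnormalized_hebrew(const char *text, size_t len)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t i = 0;
    while (i < len) {
        size_t k = utf8_seq_len(s[i]);
        if (k == 0 || i + k > len)
            return -2;
        uint32_t cp = utf8_decode(s + i, k);
        if (cp == 0xFB1D || cp == 0xFB1F || (cp >= 0xFB2A && cp <= 0xFB4E))
            return (long)i;
        i += k;
    }
    return -1;
}
```

A caller would print a warning (or normalize the text first) whenever
the result is not -1.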