CDDB Hebrew encoding
Chen Levy
mailist at chenlevy.com
Thu Jul 16 20:27:50 IDT 2009
Shalom, fine folks.
-- Short story: --
When ripping Hebrew CDs, the data I get from CDDB (or freeCDDB, I can't tell)
is encoded with Aleph as 0xC3A0, Bet as 0xC3A1 and so on.
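To make the mismatch concrete, here is a minimal Python sketch (Python is not part of my setup above; it is only used here for illustration). It decodes the two byte pairs I quoted as UTF-8, names the resulting characters, and prints them next to the UTF-8 bytes the Hebrew letters should have. The names `observed`, `expected` and `describe` are made up for this example.

```python
import unicodedata

# Bytes as received for the first two letters, per the description above.
observed = {"Aleph": b"\xc3\xa0", "Bet": b"\xc3\xa1"}
# The Unicode code points those letters should have.
expected = {"Aleph": "\u05d0", "Bet": "\u05d1"}


def describe(raw: bytes) -> str:
    """Decode raw as UTF-8 and name the single resulting character."""
    ch = raw.decode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for letter, raw in observed.items():
    good = expected[letter].encode("utf-8")
    print(f"{letter}: got {raw.hex()} ({describe(raw)}), wanted {good.hex()}")
```

Read as UTF-8, the received bytes are accented Latin letters rather than Hebrew ones.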
-- Longer story: --
I was able to convert it into proper utf8 [Aleph as (d7,90)] only via the
pipeline:
... | iconv -f utf8 -t unicode | sed 's/\x0//g' | iconv -c -f iso88598 -t utf8
That is:
C3 A0 ==> `iconv -f utf8 -t unicode` ==> 00 E0
E0 hex = 224 dec, which is Aleph in iso88598, but for each byte I get an extra 00.
So the next part, `sed 's/\x0//g'`, discards the 00 bytes.
Then the final `iconv -c -f iso88598 -t utf8` is a trivial step, but without
the `-c` it complains about illegal characters.
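The same three steps can be sketched in Python, as a rough equivalent of the pipeline rather than a tested replacement for it. The function name `fix_hebrew` is hypothetical. Encoding to Latin-1 stands in for the "convert to two-byte unicode, then drop the 00 bytes" part, on the assumption that every character in the input is below U+0100.

```python
def fix_hebrew(raw: bytes) -> str:
    """Undo the double encoding described above (illustrative sketch)."""
    # Step 1: read the received bytes as UTF-8, e.g. C3 A0 -> U+00E0.
    text = raw.decode("utf-8")
    # Step 2: keep one byte per character, e.g. U+00E0 -> E0.
    # Raises UnicodeEncodeError if a character is above U+00FF.
    single_bytes = text.encode("latin-1")
    # Step 3: interpret those bytes as ISO-8859-8, e.g. E0 -> Aleph.
    # Raises UnicodeDecodeError on bytes undefined in ISO-8859-8.
    return single_bytes.decode("iso8859_8")


if __name__ == "__main__":
    fixed = fix_hebrew(b"\xc3\xa0\xc3\xa1")
    print(fixed, fixed.encode("utf-8").hex())
```

Unlike `iconv -c`, this sketch does not silently drop bytes it cannot convert; it raises an exception instead, which may or may not be what you want for a whole CDDB record.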
-- Some background: --
LANG=en_US.utf8 # but I had no success with any other LANG value.
LC_* is undefined
LANGUAGE=en_US:en
KDE 4.2.4
Kubuntu 9.04
English interface (the Hebrew interface in KDE4 is currently broken)
-- Questions: --
1. Is there an encoding where Aleph is 0xC3A0? If so, what is it? If not, how
did I end up with it?
2. Is there a less ugly way to get from Aleph=0xC3A0 to proper UTF8?
3. Is this a bug, or stupidity on my end?
Thank you for your attention.
__
Cheers,
Chen.