CDDB Hebrew encoding
Levy, Chen
mailist at chenlevy.com
Fri Jul 17 11:44:00 IDT 2009
OK, I found a better way to do it, which also hints at the answers to
the questions below:
cat cddbread.2 | iconv -f utf8 -t latin1 | iconv -f hebrew -t utf8
My guess is that somewhere along the way the data was encoded as
windows-cp1255 and then interpreted as latin-1. From there the
conversion to utf8 is trivial. (Aleph is 0xE0 in cp1255; read as
latin-1 that byte is U+00E0, which utf8 encodes as C3 A0.)
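
To make that guess concrete, here is a small Python sketch of how Aleph
could end up as C3 A0. The name `mangle` is made up for illustration, and
the idea that some program decoded cp1255 bytes as latin-1 is only the
hypothesis stated above, not something confirmed for CDDB.

```python
def mangle(text: str) -> bytes:
    """Reproduce the suspected corruption: cp1255 bytes misread as latin-1."""
    raw = text.encode("cp1255")       # bytes as the Hebrew software wrote them
    misread = raw.decode("latin-1")   # wrong assumption about the encoding
    return misread.encode("utf-8")    # "conversion" to utf8 of the wrong text


if __name__ == "__main__":
    for letter in ("\u05d0", "\u05d1"):  # Aleph, Bet
        print(f"U+{ord(letter):04X} -> {mangle(letter).hex(' ')}")
```

Under that assumption Aleph comes out as C3 A0 and Bet as C3 A1, which
matches the bytes reported in the original message.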
So the iconv pipeline above simply reverses that process, and I am left
with the question: who should I blame, and where should I report the bug?
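
The same reversal can be sketched in Python, which may be handy where
iconv is not available. `unmangle` is a hypothetical helper name. The
default codec `iso8859_8` is, as far as I know, what iconv's `hebrew`
alias refers to; `cp1255` should give the same result for plain letters.

```python
def unmangle(data: bytes, codec: str = "iso8859_8") -> str:
    """Undo the suspected cp1255 -> latin-1 -> utf8 mix-up."""
    as_text = data.decode("utf-8")         # iconv -f utf8 ...
    as_bytes = as_text.encode("latin-1")   # ... -t latin1
    return as_bytes.decode(codec)          # iconv -f hebrew (now real text)


if __name__ == "__main__":
    fixed = unmangle(b"\xc3\xa0\xc3\xa1")
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Input containing characters outside latin-1 raises UnicodeEncodeError at
the second step, which is a useful sign that the data was not mangled in
this particular way.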
__
Cheers,
Chen.
On Thu, 16 Jul 2009 16:27 +0300, "Chen Levy" <mailist at chenlevy.com>
wrote:
> Shalom, fine folks.
>
> -- Short story: --
>
> When ripping Hebrew CDs, the data I get from CDDB (or freeCDDB, I can't
> tell) is encoded with Aleph as 0xC3A0, Bet as 0xC3A1 and so on.
>
>
> -- Longer story: --
>
> I was able to convert it into proper utf8 [Aleph as (d7,90)] only via the
> pipeline:
> ... | iconv -f utf8 -t unicode | sed 's/\x0//g' | iconv -c -f iso88598 -t utf8
>
> That is:
>
> C3 A0 ==> `iconv -f utf8 -t unicode` ==> 00 E0
>
> E0 hex = 224 dec, which is Aleph in iso88598, but for each byte I get an
> extra 00.
>
> So the next part, `sed 's/\x0//g'`, discards the 00 bytes.
>
> Then the `iconv -f iso88598 -t utf8` is a trivial step, but without the
> `-c` it complains about illegal characters.
>
>
> -- Some background: --
>
> LANG=en_US.utf8 # but I had no success with any other LANG value.
> LC_* is undefined
> LANGUAGE=en_US:en
>
> KDE 4.2.4
> Kubuntu 9.04
> English interface (the Hebrew interface in KDE4 is currently broken)
>
>
> -- Questions: --
>
> 1. Is there an encoding where Aleph is 0xC3A0? If so, what is it? If not,
> how did I end up with it?
>
> 2. Is there a less ugly way to get from Aleph=0xC3A0 to proper UTF8?
>
> 3. Is this a bug, or stupidity on my end?
>
>
> Thank you for your attention.
> __
> Cheers,
> Chen.
>
> _______________________________________________
> Linux-il mailing list
> Linux-il at cs.huji.ac.il
> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
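
For comparison, the older pipeline quoted above can be traced byte by
byte in Python. This is only a sketch: it assumes that iconv's `unicode`
target behaves like UTF-16 (little-endian here, and without the
byte-order mark that iconv may also emit), and `strip_nuls_route` is a
made-up name.

```python
def strip_nuls_route(data: bytes) -> str:
    """Trace the quoted pipeline: utf8 -> 16-bit units -> drop 00 -> Hebrew."""
    wide = data.decode("utf-8").encode("utf-16-le")  # C3 A0 becomes E0 00
    narrow = wide.replace(b"\x00", b"")              # like the sed step
    return narrow.decode("iso8859_8")                # final iconv step


if __name__ == "__main__":
    result = strip_nuls_route(b"\xc3\xa0")
    print(f"U+{ord(result):04X}")
```

Dropping every 00 byte only works while all code points are below
U+0100, which is one reason the latin1 route is cleaner. If a byte-order
mark is present, its bytes survive the sed step; that may be what `-c`
was discarding, though this is a guess.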
--
Levy, Chen
chenlevy at imapmail.org