CDDB Hebrew encoding

Levy, Chen mailist at chenlevy.com
Fri Jul 17 11:44:00 IDT 2009


OK, I found a better way to do it, which also hints at answers to the
questions below:

cat cddbread.2 | iconv -f utf8 -t latin1 | iconv -f hebrew -t utf8

My guess is that somewhere along the way the data was encoded as
windows-cp1255, and then interpreted as latin-1. From there the
conversion to utf8 is trivial.

So the line above simply reverses that process, and I am left with the
question: who should I blame, and where should I report the bug?
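
A minimal sketch of that guess, for anyone who wants to check it on a
single letter. It assumes the source byte was Aleph in CP1255 (0xE0, the
same value as in ISO-8859-8) and that an iconv which knows CP1255 is
installed; the variable names are only illustrative. It shows the suspected
corruption producing C3 A0, and the reverse pipeline giving back D7 90:

```shell
# Forward (the suspected bug): the Hebrew byte 0xE0 (octal 340) is
# misread as Latin-1 "a with grave" and then re-encoded as UTF-8.
broken=$(printf '\340' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "broken: $broken"   # broken: c3a0

# Reverse (the fix): decode UTF-8 back to the single byte 0xE0,
# then read that byte as Hebrew and emit proper UTF-8.
fixed=$(printf '\303\240' | iconv -f UTF-8 -t ISO-8859-1 | iconv -f CP1255 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "fixed: $fixed"     # fixed: d790
```

This only confirms that the byte values are consistent with the guess; it
does not show which program in the chain did the misreading.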
__
Cheers,
Chen.

On Thu, 16 Jul 2009 16:27 +0300, "Chen Levy" <mailist at chenlevy.com>
wrote:
> Shalom, fine folks.
> 
> -- Short story: --
> 
> When ripping Hebrew CDs, the data I get from CDDB (or freeCDDB, I can't
> tell) is encoded with Aleph as 0xC3A0, Bet as 0xC3A1 and so on.
> 
> 
> -- Longer story: --
> 
> I was able to convert it into proper utf8 [Aleph as (d7,90)] only via the
> pipeline:
> ... | iconv -f utf8 -t unicode | sed 's/\x0//g' | iconv -c -f iso88598 -t utf8
> 
> That is:
> 
> C3 A0 ==> `iconv -f utf8 -t unicode` ==> 00 E0
> 
> E0 hex = 224 dec, which is Aleph in iso88598, but for each byte I get an
> extra 00.
> 
> So the next part, `sed 's/\x0//g'`, discards the 00 bytes.
> 
> Then `iconv -f iso88598 -t utf8` is a trivial step, but without the
> `-c` it complains about illegal characters.
> 
> 
> -- Some background: --
> 
> LANG=en_US.utf8 # but I had no success with any other LANG value.
> LC_* is undefined
> LANGUAGE=en_US:en
> 
> KDE 4.2.4
> Kubuntu 9.04
> English interface (the Hebrew interface in KDE4 is currently broken)
> 
> 
> -- Questions: --
> 
> 1. Is there an encoding where Aleph is 0xC3A0, and if so, what is it? If
> not, how did I end up with it?
> 
> 2. Is there a less ugly way to get from Aleph=0xC3A0 to proper UTF8?
> 
> 3. Is this a bug, or stupidity on my end?
> 
> 
> Thank you for your attention.
> __
> Cheers,
> Chen.
-- 
  Levy, Chen
  chenlevy at imapmail.org
