Hebrew spell-checking in OpenOffice

Lior Kaplan kaplanlior at gmail.com
Tue Nov 2 11:52:15 IST 2010


2010/11/2 Nadav Har'El <nyh at math.technion.ac.il>

> Recently I noticed that (thanks to Lior Kaplan, it seems) it is now trivial
> to get Hebrew spellchecking (based on Hspell 1.1) in OpenOffice.
> The Hebrew localized version (now available on the official OpenOffice
> site!) comes with Hebrew spell-checking pre-bundled, and there's an
> extension [1] for those who use the English version of OpenOffice.
>

My pleasure (:

For now it's available only in the 3.3 RC releases, and it will be included
in the final release.
http://download.openoffice.org/all_rc.html

> The first issue is acronyms (rashei tevot) and abbreviations. In Hebrew,
> these use the geresh and gershaim (or single or double quotes), which are
> part of the word. OpenOffice does not understand that these quotes are part
> of the Hebrew word, and splits the word on them. As a result, all acronyms
> are marked as spelling mistakes. This is really annoying, especially for
> certain types of documents where acronyms are common.
>

Known issue, and reported at
http://www.openoffice.org/issues/show_bug.cgi?id=99796

It is marked for work during the 3.4 release.
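
To make the splitting problem concrete, here is a minimal Python sketch. It is
not OpenOffice's word-breaking code, only an illustration under my own
assumptions: the names hebrew_words and NAIVE_RE are hypothetical, and a quote
mark is treated as part of a word only when it sits between two Hebrew
letters. A trailing geresh (as in some abbreviations) is deliberately not
handled, since it is ambiguous with a closing quote.

```python
import re

# Hebrew letters alef..tav (U+05D0..U+05EA).
HEBREW = "\u05d0-\u05ea"
# ASCII single/double quote plus the real geresh (U+05F3) and gershayim (U+05F4).
MARKS = "'\"\u05f3\u05f4"

# Naive tokenizer: any quote mark ends the word, so an acronym is cut in two.
NAIVE_RE = re.compile(rf"[{HEBREW}]+")

# Quote-aware tokenizer: a mark is kept when Hebrew letters surround it.
WORD_RE = re.compile(rf"[{HEBREW}]+(?:[{MARKS}][{HEBREW}]+)*")


def hebrew_words(text: str) -> list[str]:
    """Return the Hebrew words in text, keeping inner geresh/gershayim."""
    return WORD_RE.findall(text)


if __name__ == "__main__":
    sample = 'צה"ל הודיע'
    print(NAIVE_RE.findall(sample))  # acronym is split at the double quote
    print(hebrew_words(sample))      # acronym stays in one piece
```

With the quote-aware pattern, ordinary quotation marks around a word are still
left out, because they are not followed (or preceded) by a Hebrew letter.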


> The second issue is the correction suggestions for spelling errors. All
> the suggestions indeed appear to be valid words, but their order is
> terrible - it appears little or no attention was paid to trying to provide
> the most likely suggestions first. The screenshot on the extension page [1]
> provides an excellent example: when given the misspelling עיברי, rather
> than providing the most likely suggestion, עברי, first, it gives it as the
> 8th suggestion, and the first suggestions are highly unlikely.
>
[..]
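
One simple way to think about the ordering complaint above is to sort the
candidates by their edit distance from the misspelled word. The Python sketch
below is only an illustration of that idea, not what OpenOffice or Hspell
does; the function names and the candidate list are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def rank_suggestions(misspelled: str, suggestions: list[str]) -> list[str]:
    """Order suggestions so the closest ones come first (stable for ties)."""
    return sorted(suggestions, key=lambda s: edit_distance(misspelled, s))


if __name__ == "__main__":
    # Hypothetical candidate list, with the likely correction placed last.
    candidates = ["עיר", "עיוור", "עברי"]
    print(rank_suggestions("עיברי", candidates))
```

Plain edit distance is a crude measure; it knows nothing about which mistakes
are typical in Hebrew, so it is at best a starting point.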

> I believe that hunspell's dictionary in fact has a way to give such
> correction rules, but I don't know how to correctly write them, or how to
> make OpenOffice use them.
>

The word list in the extension is built in MySpell's format. Hunspell's
format should be similar, but I couldn't build it at the time. The builds
were done as part of the Debian hspell package, which I maintain.
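
As far as I know, the mechanism in Hunspell for the correction rules mentioned
above is a replacement (REP) table in the affix file, listing typical wrong
and right letter sequences to try first. The Python sketch below only mimics
that idea and is not Hunspell's algorithm or file format; the function name,
the single rule, and the tiny dictionary are assumptions for illustration.

```python
def rule_candidates(word: str,
                    rules: list[tuple[str, str]],
                    dictionary: set[str]) -> list[str]:
    """Apply each (wrong, right) rule at every position where it matches
    and keep the results that are real dictionary words."""
    found: list[str] = []
    for wrong, right in rules:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
            start = word.find(wrong, start + 1)
    return found


if __name__ == "__main__":
    # Hypothetical rule: a superfluous yod is a common mistake, so try
    # dropping it before falling back to generic suggestions.
    rules = [("י", "")]
    dictionary = {"עברי", "עיר"}  # hypothetical, tiny word list
    print(rule_candidates("עיברי", rules, dictionary))
```

Candidates produced this way would be offered before the generic ones, which
is how a short list of language-specific rules can improve the ordering.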