Creating a Ladino spell-checker and including it in OS projects

Dan Kenigsberg danken at cs.technion.ac.il
Fri Jun 17 13:03:17 IDT 2022


On Wed, Jun 15, 2022 at 10:23:47AM +0300, Gabor Szabo wrote:
> On Mon, Jun 13, 2022 at 9:00 AM Dan Kenigsberg <danken at cs.technion.ac.il>
> wrote:
> 
> > On Sun, Jun 12, 2022 at 04:24:27PM +0300, Gabor Szabo wrote:
> > > Hi Dan,
> > >
> > >
> > > On Tue, Jun 7, 2022 at 11:18 PM Dan Kenigsberg <danken at cs.technion.ac.il>
> > > wrote:
> > >
> > > > On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> > > > > Hi,
> > > > >
> > > > > > I've been working on an online Ladino (Judeo-Espanyol) dictionary
> > > > > > https://diksionaryo.szabgab.com/ The code is open source; the content is
> > > > > > CC BY-SA 4.0  https://creativecommons.org/licenses/by-sa/4.0/
> > > > > > All linked from the About page.
> > > > > > Along with the creation of the translation I also have a (growing) list of
> > > > > > Ladino words.
> > > > > >
> > > > > > I would like to make this available as a spell checker in various Open
> > > > > > Source tools.
> > > > > > E.g. Firefox, Chromium, LibreOffice etc.
> > > > > > I wrote about it a few weeks ago
> > > > > > https://szabgab.com/add-spellchecker-to-various-applications.html but I am
> > > > > > still unclear on what to do and how.
> > > > > >
> > > > > > I started to generate a pair of files that resemble the format of hspell,
> > > > > > but I don't know how to really test them and in any case they don't seem to
> > > > > > work well.
> > > > > > I also don't know how to distribute what I already have and how to make it
> > > > > > included in those projects.
> > > > > >
> > > > > > Does anyone here have experience with spell-checkers?
> > > > > Could anyone help me in the project or at least point me in the right
> > > > > direction?
> > > >
> > > > Well, if I were you, I'd start by creating a GitHub repository with your
> > > > code and a tagged version of your artifacts, the .aff and .dic files
> > > > used by hunspell.

I think that if you do so ^^^ (tag a version), you'd have a stable URL to share.

> > > >
> > >
> > > It is being generated now on every push:
> > > https://github.com/szabgab/ladino-diksionaryo-generated/
> >
> > Thanks for the URL. But where are the artifacts? They probably hide in
> > plain sight... Can you provide a URL to the .aff/.dic files?
> >
> 
> Oh well, GitHub can be tricky sometimes :)
> (I think this is the direct link to download the zip of the two files:
> https://github.com/szabgab/ladino-diksionaryo-generated/suites/6938585306/artifacts/270273527
> )
> 
> Manually:
> Visit the project repo:
> https://github.com/szabgab/ladino-diksionaryo-generated/
> click on "Actions"
> then on the job that created the artifact (in this case it is called CI)
> There you'll have the artifacts of the project
> e.g. This is the direct link to a recent build
> https://github.com/szabgab/ladino-diksionaryo-generated/actions/runs/2500028854
> 
> If you now click on "hunspell" it will download the two files in a zip.
> 
> AFAIK the artifacts are removed after a few weeks so these links will be
> gone, but the description above should still work.
> 
> 
> 
> >
> > >
> > > I can put on some tags if you think they are important for some reason, but
> > > I don't have specific release points.
> > > Every change in the dictionary triggers the re-build of the whole web site
> > > and the two files as well.
> > >
> > >
> > >
> > > > This would give anyone with high-enough motivation the ability to test it
> > > > on their own machine (I may volunteer).
> > > >
> > >
> > > I'd really like to know how you (or someone else) test it.
> >
> > `hunspell -D` shows where you can drop the files; then
> > `hunspell -d language` lets me spell-check a text, say a random page from
> > https://lad.wikipedia.org.
> >
> >
> Thanks. And yeah, the Ladino version of vikipedia is quite bad - as I am
> told - as it is written mostly by Spanish speakers
> who include a lot of words from modern Spanish instead of using Ladino.
> That's another project to work on to fix that :)

I can say that it works well on my box. Nice!

I suppose that the coverage still needs to improve. For example, I think
you can add Astronomiya to the lexicon. But I cannot judge Ladino
correctness in any way...
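
If it helps, here is a rough sketch of how one might list candidate words
for the lexicon from a page of text. It is only an approximation of what
hunspell does: it reads the .dic word list alone and ignores the .aff
rules, so a word that hunspell accepts only through an affix would be
reported here as unknown. It also splits words at apostrophes and hyphens.
The function names and the three-word sample dictionary are made up for
the illustration; they are not taken from your repository.

```python
import re

# Runs of Unicode letters; digits, underscores and punctuation split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def load_dic_words(dic_text):
    """Return the lowercased base words found in the text of a .dic file.

    Assumes the usual hunspell layout: an optional word count on the first
    line, then one entry per line, optionally followed by /FLAGS.
    """
    lines = dic_text.splitlines()
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]  # skip the word-count header
    words = set()
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        # Drop tab-separated extra data and the affix flags after "/".
        word = entry.split("\t")[0].split("/")[0].strip()
        if word:
            words.add(word.lower())
    return words


def unknown_words(text, known):
    """Return the sorted, de-duplicated words of text that are not in known."""
    tokens = WORD_RE.findall(text.lower())
    return sorted({t for t in tokens if t not in known})


def coverage(text, known):
    """Return the fraction of word tokens in text that appear in known."""
    tokens = WORD_RE.findall(text.lower())
    if not tokens:
        return 1.0
    return sum(t in known for t in tokens) / len(tokens)


if __name__ == "__main__":
    # Made-up sample data, only to show the calls.
    sample_dic = "3\nkaza\nlivro/S\nsiudad\n"
    known = load_dic_words(sample_dic)
    page = "La kaza i el livro de Astronomiya"
    print(unknown_words(page, known))
    print(round(coverage(page, known), 2))
```

Running something like this over a saved page and reviewing the unknown
words by hand is roughly the loop we went through for hspell; the real
check should still be done with hunspell itself, since only it applies the
affix rules.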

In hspell it took us several years of painstakingly spellchecking
Wikipedia pages to reach good coverage. But please do not let this
discourage you. Helping Ladino survive is a noble cause to follow.




More information about the Linux-il mailing list