Creating a Ladino spell-checker and including it in OS projects

Gabor Szabo gabor at szabgab.com
Wed Jun 15 10:23:47 IDT 2022


On Mon, Jun 13, 2022 at 9:00 AM Dan Kenigsberg <danken at cs.technion.ac.il>
wrote:

> On Sun, Jun 12, 2022 at 04:24:27PM +0300, Gabor Szabo wrote:
> > Hi Dan,
> >
> >
> > On Tue, Jun 7, 2022 at 11:18 PM Dan Kenigsberg <danken at cs.technion.ac.il>
> > wrote:
> >
> > > On Wed, Jun 01, 2022 at 06:47:41AM +0300, Gabor Szabo wrote:
> > > > Hi,
> > > >
> > > > I've been working on an online Ladino (Judeo-Espanyol) dictionary:
> > > > https://diksionaryo.szabgab.com/ The code is open source; the
> > > > content is CC BY-SA 4.0:
> > > > https://creativecommons.org/licenses/by-sa/4.0/
> > > > All linked from the About page.
> > > > Along with the creation of the translation I also have a (growing)
> > > > list of Ladino words.
> > > >
> > > > I would like to make this available as a spell checker in various
> > > > Open Source tools, e.g. Firefox, Chromium, LibreOffice, etc.
> > > > I wrote about it a few weeks ago:
> > > > https://szabgab.com/add-spellchecker-to-various-applications.html
> > > > but I am still unclear what to do and how.
> > > >
> > > > I started to generate a pair of files that resemble the format of
> > > > hspell, but I don't know how to really test them, and in any case
> > > > they don't seem to work well.
> > > > I also don't know how to distribute what I already have or how to
> > > > get it included in those projects.
> > > >
> > > > Does anyone here have experience with spell-checkers?
> > > > Could anyone help me in the project or at least point me in the right
> > > > direction?
> > >
> > > Well, if I were you, I'd start by creating a GitHub repository with
> > > your code and a tagged version of your artifacts, the .aff and .dic
> > > files used by hunspell.
> > >
> >
> > It is being generated now on every push:
> > https://github.com/szabgab/ladino-diksionaryo-generated/
>
> Thanks for the URL. But where are the artifacts? They probably hide in
> plain sight... Can you provide a URL to the .aff/.dic files?
>

Oh well, GitHub can be tricky sometimes :)
(I think this is the direct link to download the zip of the two files:
https://github.com/szabgab/ladino-diksionaryo-generated/suites/6938585306/artifacts/270273527
)

Manually:
Visit the project repo:
https://github.com/szabgab/ladino-diksionaryo-generated/
click on "Actions"
then on the job that created the artifact (in this case it is called CI)
There you'll have the artifacts of the project
e.g. This is the direct link to a recent build
https://github.com/szabgab/ladino-diksionaryo-generated/actions/runs/2500028854

If you now click on "hunspell" it will download the two files in a zip.

AFAIK the artifacts are removed after a few weeks, so these links will stop
working, but the description above should still work.
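
For anyone who downloads the zip, a rough sanity check of the .dic file
can be sketched in Python. This is only a sketch under assumptions, not
what the project's CI does: the file name lad.dic is hypothetical (the
real names inside the zip may differ), the file is assumed to be UTF-8,
and the code relies only on the general hunspell .dic layout, where the
first line is a word count and each following line is a word optionally
followed by /FLAGS.

```python
from pathlib import Path


def read_dic(path):
    """Return (declared_count, words) from a hunspell-style .dic file.

    Assumes UTF-8 and the usual layout: a count on the first line, then
    one entry per line, optionally followed by "/FLAGS" or by
    tab-separated morphological fields.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    declared = int(lines[0].split()[0])
    words = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        # Drop affix flags ("word/AB") and any tab-separated extras.
        word = line.split("/", 1)[0].split("\t", 1)[0].strip()
        words.append(word)
    return declared, words
```

Calling read_dic("lad.dic") and comparing the declared count with the
length of the returned list is a cheap way to see whether the generated
file is at least well-formed. It says nothing about whether the affix
rules in the .aff file work.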



>
> >
> > I can add some tags if you think they are important for some reason,
> > but I don't have specific release points.
> > Every change in the dictionary triggers a rebuild of the whole web
> > site and of the two files as well.
> >
> >
> >
> > > This would give anyone with high-enough motivation the ability to
> > > test it on their own machine (I may volunteer).
> > >
> >
> > I'd really like to know how you (or someone else) test it.
>
> `hunspell -D` shows where you can drop the files; then `hunspell -d
> language` would let me spell-check a text, say a random page from
> https://lad.wikipedia.org.
>
>
Thanks. And yeah, the Ladino version of vikipedia is quite bad, as I am
told, because it is written mostly by Spanish speakers who include a lot of
words from modern Spanish instead of using Ladino.
Fixing that is another project to work on :)
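
The hunspell test described in the quoted text above could be scripted
roughly like this. It is a sketch under assumptions: the dictionary name
"lad" is hypothetical and must match the base name of the .aff/.dic pair
dropped into one of the directories that `hunspell -D` lists. The only
other option used is `-l`, which makes hunspell print just the
misspelled words.

```python
import shutil
import subprocess


def build_command(dictionary):
    """Command line for listing misspelled words with hunspell."""
    # -d selects the dictionary, -l prints only the misspelled words.
    return ["hunspell", "-d", dictionary, "-l"]


def misspelled(text, dictionary="lad"):
    """Feed text to hunspell on stdin and return the words it rejects."""
    if shutil.which("hunspell") is None:
        raise RuntimeError("hunspell is not installed")
    result = subprocess.run(
        build_command(dictionary),
        input=text,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

Passing the text of a page to misspelled() gives a quick list of the
words the dictionary does not know yet.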

Gabor