Emacs & Hebrew

Thu Jun 14 19:58:33 IDT 2012

> Date: Thu, 14 Jun 2012 07:36:21 +0300
> From: Nadav Har'El <nyh at math.technion.ac.il>
> Cc: linux-il at cs.huji.ac.il
> 
> Like I said, try the example Perl instruction of changing aleph into
> bet:
> 
> 	s/א/ב/
> 
> See how the Bidi algorithm makes it appear as if we're changing the
> other way around - bet into aleph. I honestly don't see what kind of
> base direction choice or other high-level protocol can "save" this
> case. I have to admit I didn't try it on Emacs 24 (which I don't have
> yet), but did try it on other bidi-capable programs. Maybe there is a
> workaround in this case, and you can educate me?

This snippet will indeed be rendered confusingly by any bidi-aware
application (including Emacs 24.1).  But the solution is not to turn
off the reordering wholesale.  That's because you do want the
reordering here:

    $word =~ s/ה$/h/o if $opts{"שמור_מפיק"};

and here:

  # TODO: resolve the following linguistic question: Is there a difference
  # in the pronunciation and spelling of בדוחפם (when they push, bdoxpam)
  # and לדחופם (to push them, lidxpam)? The first is an subjectization of
  # לדחוף, and the second is a objectization. I do *not* know if the above
  # differentiation is valid or correct, and failed to find references to
  # support my gut feeling. Thus, on the mean while, I produce a waw-less
  # form, as done by rav-millim.

So what is needed is _selective_ reordering: some portions of the
buffer need to be reordered, while others need to stay in their
original logical order.
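
To see why the s/א/ב/ example flips, it helps to look at the bidi
classes of its characters.  The following is only a small sketch in
Python (not Emacs code); the helper name bidi_classes is made up for
the illustration, and the Hebrew letters are written as escapes so
that the listing itself is not reordered.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for TEXT."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# The Perl substitution from the quoted message: s/aleph/bet/
snippet = "s/\u05d0/\u05d1/"

for ch, cls in bidi_classes(snippet):
    print(f"U+{ord(ch):04X}  {cls}")
```

The slash is a separator with no strong direction of its own, so the
slash that sits between the two Hebrew letters takes their
right-to-left direction.  Aleph, slash and bet then form a single
right-to-left run, which is displayed reversed, and the substitution
appears to go from bet to aleph.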

For buffers that hold source code in some programming language, the
parts to reorder are comments and strings.  Precisely what is a
comment or a string is something each major mode should determine --
and they all do already.  For markup languages such as HTML and XML,
these are labels and perhaps other things.  It's possible that there
are other types of non-plain text that need similar treatment; I need
feedback to make sure all the possibilities are covered.
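
As a rough illustration of the idea (and not the design Emacs uses or
will use), here is a Python sketch that splits one line of Perl-like
source into code, string and comment segments, and wraps only the
code segments in LEFT-TO-RIGHT OVERRIDE ... POP DIRECTIONAL
FORMATTING, so that a bidi-aware display keeps them in logical order
while strings and comments are still reordered.  The tokenizer is
hypothetical and deliberately naive: it knows only double-quoted
strings and '#' comments, which a real major mode handles far more
carefully.  The function names are made up for the example.

```python
import re

# Naive assumption: a "text" token is a double-quoted string or a
# '#' comment running to the end of the line.  Everything else is code.
_TEXT_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|#.*')

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def segments(line):
    """Split LINE into (kind, text) pairs: code, string or comment."""
    out = []
    pos = 0
    for m in _TEXT_TOKEN.finditer(line):
        if m.start() > pos:
            out.append(("code", line[pos:m.start()]))
        kind = "comment" if m.group().startswith("#") else "string"
        out.append((kind, m.group()))
        pos = m.end()
    if pos < len(line):
        out.append(("code", line[pos:]))
    return out

def keep_code_in_logical_order(line):
    """Wrap code segments in LRO...PDF; leave strings and comments alone."""
    return "".join(
        LRO + text + PDF if kind == "code" else text
        for kind, text in segments(line)
    )
```

With this sketch, the regexp in s/א/ב/ counts as code and would stay
in logical order, while a string such as "שמור_מפיק" or a Hebrew
comment would still be reordered for reading.  How a given terminal
or editor honors the control characters varies, so treat the output
as a demonstration only.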

Emacs currently lacks some fundamental infrastructure to support such
selective reordering, but it is on my todo list, and I will publish
the basic design principles in the near future.  However, my ideas about
what infrastructure is needed and how to expose it to Lisp
applications are based on a very limited number of use cases, most of
them from my daily experience, where I almost never use Hebrew.  I
cannot be sure my experience covers enough turf to be a solid basis
for supporting non-plain text.

This is where this community should enter the picture.  When you find
a use case where the default reordering doesn't do a good enough job,
don't just disable reordering.  Instead, report those cases as bugs or
missing features, and if you can, send suggestions for how to solve
them, so that you and others will be happy.  If you just disable
reordering, we will never be able to make it better.

Thanks in advance.