Emacs & Hebrew

Eli Zaretskii eliz at gnu.org
Fri Jun 15 00:32:47 IDT 2012


> From: Omer Zak <w1 at zak.co.il>
> Date: Thu, 14 Jun 2012 23:38:16 +0300
> 
> First of all, it is GREAT that Emacs 24 has BiDi support.

Thanks.

> Simply, have Emacs continue to support programming languages, just
> without default BiDi support when editing such text.
> 
> Since it is anyway good practice to localize strings in separate *.po
> files, I suggest that we all invest our nitpicking skills in
> specifying how to get Emacs to provide excellent support for BiDi
> rendering of *.po files and similar ones (such as Android's
> strings.xml).

The same infrastructure that is needed to support *.po files is also
needed to support HTML/XML markup and comments/strings in programming
languages.  After all, the messages in a .po file are just strings;
they use string syntax.  Whether to use that infrastructure once it
exists is up to the maintainers of the respective modes.  My job will
be done when the infrastructure exists and is usable and flexible
enough to satisfy these needs.

> About comments, let's not bother.  I don't think it'll be a good
> practice to write comments in any language other than English.  If
> anyone wants to do so, caveat emptor.

I just showed you an example from Nadav's Perl script in the Hspell
distribution.  The comments are mostly in English, but they have
embedded Hebrew words and phrases, which is understandable given the
goal of that script.  I don't want to dismiss this use case so easily,
and don't really see a reason to do that, since comments are
recognized by Emacs in any language it supports.

> What if anyone needs to illustrate how a function handles Hebrew text
> and because of this he needs to write a portion of the function's
> comment in Hebrew?

That's what Nadav's script does.

> Would it be reasonable to require the user to insert one of the
> directionality special characters as a signal for Emacs to turn on
> visual BiDi ordering mode in that comment (and turn it back off at
> the comment's end as defined in that programming language)?

There are no directional controls that turn on reordering.  There are
directional controls that override the bidi properties of the
characters, so you can make Hebrew characters have L2R directionality,
which will effectively disable their reordering.  But the effect of
these controls ends at the newline, by virtue of the UBA, so you would
need to use a lot of them to manually enable and disable reordering in
the whole buffer, part by part.
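
To make the "a lot of them" point concrete, here is a minimal Python
sketch (not Emacs code, and the function name is made up for the
illustration).  It assumes, as said above, that an override stops
having any effect at the newline, so every line has to be wrapped
separately in LEFT-TO-RIGHT OVERRIDE (U+202D) and POP DIRECTIONAL
FORMATTING (U+202C):

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def force_ltr_per_line(text):
    """Wrap every non-empty line in LRO ... PDF.

    Hypothetical helper: one pair of controls per line, because the
    override is assumed not to carry over past a newline.
    """
    wrapped = []
    for line in text.split("\n"):
        wrapped.append(LRO + line + PDF if line else line)
    return "\n".join(wrapped)


# Two comment lines with embedded Hebrew words.
sample = ("# \u05e9\u05dc\u05d5\u05dd means hello\n"
          "# \u05ea\u05d5\u05d3\u05d4 means thanks")
result = force_ltr_per_line(sample)
```

Two lines already need four control characters; a whole buffer would
need two per line that contains Hebrew.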

> You ignored other places which might use Hebrew text in programming
> languages:
> - Perl's regular expressions (also Python).

These are already recognized by Emacs for fontification, so the same
machinery can be used to reorder these strings (or not, as the Perl
users want).

> - Identifiers in computer languages, which allow non-ASCII characters in
> identifiers.

Likewise: Emacs already fontifies identifiers, so they are
recognizable.

> - Strings may have various and differing formatting characters/phrases
> (C, LISP and FORTRAN have their differing formatting languages).

And they all are already recognized by Emacs, aren't they?  Otherwise
you get bad fontification and bug reports.

> - HTML/XML fragments - can be part of either strings or comments.

The relevant modes will have to write code to recognize them.  Emacs
has powerful features -- regular expressions and syntax tables -- to
facilitate that.
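
As a rough idea of what such recognition could look like, here is a
small sketch in Python rather than Emacs Lisp.  The regular expression
is deliberately naive (it knows nothing about CDATA, comments, or ">"
inside attribute values), and split_markup is a name invented for this
example.  It separates tags from the text between them, so that a mode
could treat the two kinds of runs differently:

```python
import re

# Naive pattern: anything from "<" up to the next ">" counts as a tag.
TAG_RE = re.compile(r"(<[^>]+>)")


def split_markup(fragment):
    """Return a list of (kind, text) pairs, kind being 'tag' or 'text'."""
    pieces = []
    for part in TAG_RE.split(fragment):
        if not part:
            continue  # re.split yields empty strings at the edges
        kind = "tag" if TAG_RE.fullmatch(part) else "text"
        pieces.append((kind, part))
    return pieces


pieces = split_markup("<b>\u05e9\u05dc\u05d5\u05dd</b> world")
```

A real mode would rely on its syntax tables and font-lock rules for
this, not on a single regular expression.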

> - How will we let the programmer override, for one place in one file,
> the automatic derivation as needed to deal with pathological cases?

By using the directional controls, or by inserting newlines
judiciously.  This should cover almost everything.

> - How reasonable will it be to ask the programmer to insert extra
> characters into the text just to get it correctly rendered?

I don't know, I guess it depends on the programmer.

> Unless priorities have changed without my knowing about this, Emacs is
> an editor.  It is not meant to be a WYSIWYG type editor.  It is not
> meant to be a text viewer (such as a Web browser).  It must make it easy
> to edit the text, not necessarily render it in some final form.

Well, I guess priorities have changed, then: Emacs now has a lot of
specialized modes whose goal is to display plain-text files in a lot
of complex ways that border on WYSIWYG.  Just to mention a few
examples, take Org mode, or the way cross-references are displayed
in Info manuals, or the fancy tabulated list displays provided by
tabulated-list.el (used by buffer-menu.el).

> And when programming, there are enough cases in which it is easier to
> edit BiDi text displayed in logical order (not reordered) rather than
> visual order.

Easiness is not relevant here: the reordering engine is already coded
and fully operational.  If there are no strong R2L characters in a
buffer, the text displays as it was in previous versions of Emacs,
even though it passes through the reordering engine.
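
For anyone who wants to check whether a given file would be affected
at all, here is a minimal Python sketch.  It is not how the Emacs
display engine does it, and has_strong_rtl is a name made up for the
example.  It only asks the Unicode database whether any character in
the text has a strong right-to-left bidi class ("R", or "AL" for
Arabic letters):

```python
import unicodedata


def has_strong_rtl(text):
    """True if any character has bidi class R or AL (strong R2L)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


ascii_only = has_strong_rtl("int main(void) { return 0; }")
with_hebrew = has_strong_rtl("/* \u05e9\u05dc\u05d5\u05dd */")
```

Text for which this returns False contains nothing for the reordering
to move around.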


