Emacs & Hebrew

Dov Grobgeld dov.grobgeld at gmail.com
Fri Jun 15 10:41:28 IDT 2012


I absolutely agree with Eli that getting reasonable Bidi display when
editing source code in Emacs is feasible, precisely because Emacs is
*syntax aware*. As long as the syntax is understood, it is possible to make
sure that the various syntax elements (keywords, strings, comments) are
handled in isolation from one another, and that no influence flows,
e.g., from an embedded Hebrew string to the surrounding syntax
characters.

Consider a text with the following syntax:

    print("HELLO!"); // TELL THE USER HELLO

where small characters are LTR and capitals RTL. Without syntax awareness
the characters "!"); //" would be sandwiched between two RTL characters and
be turned into RTL direction by the Bidi algorithm. But by handling strings
and comments in isolation, that will not occur.
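
To make the difference concrete, here is a rough Python sketch of the idea.
It is only a toy model under the convention above (capitals are RTL,
lowercase is LTR, everything else is neutral), not the real Unicode Bidi
Algorithm and not how Emacs implements it. The names reorder, TOKEN and
display_isolated are made up for this illustration, and the tokenizer only
knows double-quoted strings and // comments.

```python
import re

# Toy model: capitals stand for RTL letters, lowercase for LTR letters,
# everything else is neutral. A run starts and ends on a capital and may
# contain neutrals (but no lowercase) in between.
RTL_RUN = re.compile(r"[A-Z](?:[^a-z]*[A-Z])?")

def reorder(text):
    """Reverse each RTL run; neutrals sandwiched between capitals join the run."""
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], text)

# Hypothetical tokenizer for one C-like line: "strings", // comments, the rest.
TOKEN = re.compile(r'"(?P<string>[^"]*)"|//(?P<comment>.*)|(?P<code>[^"/]+|.)')

def display_isolated(line):
    """Reorder string and comment contents separately; leave code untouched."""
    out = []
    for m in TOKEN.finditer(line):
        if m.group("string") is not None:
            out.append('"' + reorder(m.group("string")) + '"')
        elif m.group("comment") is not None:
            out.append("//" + reorder(m.group("comment")))
        else:
            out.append(m.group("code"))
    return "".join(out)

line = 'print("HELLO!"); // TELL THE USER HELLO'

# No syntax awareness: the whole stretch from H to the final O is one run.
print(reorder(line))           # print("OLLEH RESU EHT LLET // ;)"!OLLEH

# Syntax aware: the string and the comment are reordered on their own.
print(display_isolated(line))  # print("OLLEH!"); // OLLEH RESU EHT LLET
```

In the first output the closing quote, parenthesis and comment marker get
dragged into the RTL run; in the second they stay where the syntax put
them.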

Thus I expect Bidi reordering in source code to affect strings and comments
only and leave everything else intact. This would work for Nadav's example
as well:

   s/A/B/

as the syntax engine should isolate the bidi reordering of A from that of
B, there is no problem. It won't be flawless, though, e.g. in the
following example:

   string.tr("abcdef",
             "ABCDEF")

where you would like to replace 'a' with 'A' and 'b' with 'B'.
By applying reordering on "ABCDEF" you would lose the visual alignment.
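
Both cases can be played through with the same kind of toy model (capitals
RTL, lowercase LTR, everything else neutral; again just a sketch, not the
real Bidi algorithm). The helper display_substitution is hypothetical and
assumes a plain s/PATTERN/REPLACEMENT/ expression with no escaped slashes.
I use three letters per side instead of a single A and B so that the effect
is visible.

```python
import re

# Toy model: a run of capitals, possibly with neutrals in between, is RTL.
RTL_RUN = re.compile(r"[A-Z](?:[^a-z]*[A-Z])?")

def reorder(text):
    """Reverse each RTL run; neutrals sandwiched between capitals join the run."""
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], text)

def display_substitution(expr):
    """Reorder pattern and replacement of s/PATTERN/REPLACEMENT/ separately."""
    op, pattern, replacement, flags = expr.split("/")
    return "/".join([op, reorder(pattern), reorder(replacement), flags])

# Without isolation the middle slash is sandwiched, so the pattern and the
# replacement visually trade places.
print(reorder("s/ABC/DEF/"))               # s/FED/CBA/

# With isolation each part is reordered within its own delimiters.
print(display_substitution("s/ABC/DEF/"))  # s/CBA/FED/

# The tr() case: isolation does not help, because reordering the second
# string by itself already breaks the column-by-column pairing.
logical = list(zip("abcdef", "ABCDEF"))
visual = list(zip("abcdef", reorder("ABCDEF")))
print(logical[0], visual[0])               # ('a', 'A') ('a', 'F')
```

So for s/A/B/ isolation is enough, while for the tr() example the display
is correct for each string taken alone but the 'a' now sits above the 'F'.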

Regards,
Dov

On Fri, Jun 15, 2012 at 12:32 AM, Eli Zaretskii <eliz at gnu.org> wrote:

> > From: Omer Zak <w1 at zak.co.il>
> > Date: Thu, 14 Jun 2012 23:38:16 +0300
> >
> > First of all, it is GREAT that Emacs 24 has BiDi support.
>
> Thanks.
>
> > Simply, have Emacs continue to support programming languages, just
> > without default BiDi support when editing such text.
> >
> > Since it is anyway good practice to localize strings in separate *.po
> > files, I suggest that all of we invest our nitpicking skills in
> > specifying how to get Emacs to provide excellent support for BiDi
> > rendering of *.po files and similar ones (such as Android's
> > strings.xml).
>
> The same infrastructure that is needed to support *.po files, is also
> needed to support HTML/XML markup and comments/strings in programming
> languages.  After all, the messages in a .po file are just strings,
> they use string syntax.  Whether to use that infrastructure once it
> exists is up to the maintainers of the respective modes.  My job will
> be done when the infrastructure exists and is reasonably usable and
> flexible to satisfy these needs.
>
> > About comments, let's not bother.  I don't think it'll be a good
> > practice to write comments in any language other than English.  If
> > anyone wants to do so, caveat emptor.
>
> I just showed you an example from Nadav's Perl script in the Hspell
> distribution.  The comments are mostly in English, but they have
> embedded Hebrew words and phrases, which is understandable given the
> goal of that script.  I don't want to dismiss this use case so easily,
> and don't really see a reason to do that, since comments are
> recognized by Emacs in any language it supports.
>
> > What if anyone needs to illustrate how a function handles Hebrew text
> > and because of this he needs to write a portion of the function's
> > comment in Hebrew?
>
> That's what Nadav's script does.
>
> > Would it be reasonable to require the user to insert one of the
> > directionality special characters as a signal for Emacs to turn on
> > visual BiDi ordering mode in that comment (and turn it back off at
> > the comment's end as defined in that programming language)?
>
> There are no directional controls that turn on reordering.  There are
> directional controls that override the bidi properties of the
> characters, so you can have Hebrew characters have L2R directionality,
> which will effectively disable their reordering.  But these controls'
> effect ends at the newline, by virtue of the UBA, so you would need to
> use a lot of them to manually enable and disable reordering in the
> whole buffer, part by part.
>
> > You ignored other places which might use Hebrew text in programming
> > languages:
> > - Perl's regular expressions (also Python).
>
> These are already recognized by Emacs for fontification, so the same
> machinery can be used to reorder these strings (or not, as the Perl
> users want).
>
> > - Identifiers in computer languages, which allow non-ASCII characters in
> > identifiers.
>
> Likewise: Emacs already fontifies identifiers, so they are
> recognizable.
>
> > - Strings may have various and differing formatting characters/phrases
> > (C, LISP and FORTRAN have their differing formatting languages).
>
> And they all are already recognized by Emacs, aren't they?  Otherwise
> you get bad fontification and bug reports.
>
> > - HTML/XML fragments - can be part of either strings or comments.
>
> The relevant modes will have to write code to recognize them.  Emacs
> has powerful features -- regular expressions and syntax tables -- to
> facilitate that.
>
> > - How will we let the programmer override, for one place in one file,
> > the automatic derivation as needed to deal with pathological cases?
>
> By using the directional controls, or by inserting newlines
> judiciously.  This should cover almost everything.
>
> > - How reasonable will it be to ask the programmer to insert extra
> > characters into the text just to get it correctly rendered?
>
> I don't know, I guess it depends on the programmer.
>
> > Unless priorities have changed without my knowing about this, Emacs is
> > an editor.  It is not meant to be a WYSIWYG type editor.  It is not
> > meant to be a text viewer (such as a Web browser).  It must make it easy
> > to edit the text, not necessarily render it in some final form.
>
> Well, I guess priorities have changed, then: Emacs now has a lot of
> specialized modes whose goal is to display plain-text files in a lot
> of complex ways that border on WYSIWYG.  Just to mention a few
> examples, take the Org mode, or the way cross-references are displayed
> in Info manuals, or the fancy tabulated list displays provided by
> tabulated-list.el (used by buffer-menu.el).
>
> > And when programming, there are enough cases in which it is easier to
> > edit BiDi text displayed in logical order (not reordered) rather than
> > visual order.
>
> Easiness is not relevant here: the reordering engine is already coded
> and fully operational.  If there are no strong R2L characters in a
> buffer, the text displays as it was in previous versions of Emacs,
> even though it passes through the reordering engine.