Odyssey of trying to convert several utf-8 encoded text files into MS-Word *.doc files

Omer Zak w1 at zak.co.il
Sun Nov 27 00:01:03 IST 2011


I need to convert several utf-8 encoded text files into MS-Word *.doc
format.  Since there are several of them, I want to accomplish it from
the command line.
In Linux, it is easy to find tools to convert from MS-Word formats into
text, but a Google search failed to yield converters in the opposite
direction.

I tried two word processors:  LibreOffice and AbiWord.

LibreOffice (version 1:3.4.3-3~bpo60+1 in Debian Squeeze - yes, it's a
backport) allows you to perform the conversion, but you must go through
the GUI.  I did not find instructions on how to accomplish this from the
command line.

AbiWord (version 2.8.2-2.1 in Debian Squeeze) can convert documents
from the command line, but for it to correctly identify utf-8 encoded
text files, you must prefix them with a BOM (0xEF 0xBB 0xBF), which is
not difficult to do from the command line (or with a short Python
script).
However, I did not find a way to specify the font, and when no font is
specified, AbiWord selects a gibberish font.
AbiWord has the --inp-props command line option, but it expects its
argument as a "CSS String" (which is not a true CSS fragment).  Nothing
I passed as the argument seemed to be recognized, and illegal strings
did not yield any error message.
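
For reference, the BOM-prefixing step mentioned above could look roughly
like the following Python sketch.  The function names (add_utf8_bom,
add_bom_to_file) are made up for illustration, and the code assumes the
input files are already valid utf-8; it does not check that.

```python
import codecs
from pathlib import Path


def add_utf8_bom(data: bytes) -> bytes:
    """Return data prefixed with the UTF-8 BOM (0xEF 0xBB 0xBF).

    If the BOM is already present, the data is returned unchanged,
    so running the function twice is harmless.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(src: str, dst: str) -> None:
    """Copy src to dst, adding a UTF-8 BOM if src lacks one.

    Hypothetical helper: the file is read and written as raw bytes,
    so the text itself is never decoded or re-encoded.
    """
    Path(dst).write_bytes(add_utf8_bom(Path(src).read_bytes()))
```

Calling add_bom_to_file("notes.txt", "notes-bom.txt") (both file names
are only examples) would produce a copy that could then be handed to
AbiWord's command line conversion.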

The next thing that I'll try is to convert from text into HTML and then
try to import HTML into the above word processors.
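
A minimal sketch of the text-to-HTML step, in Python, might be as
follows.  It is untested against either word processor; the function
name text_to_html and the choice to treat blank lines as paragraph
breaks are my own assumptions.  The charset declaration is there so
that the importing program does not have to guess the encoding.

```python
import html


def text_to_html(text: str, title: str = "") -> str:
    """Wrap plain text in a minimal HTML page declared as utf-8.

    Assumption: paragraphs are separated by blank lines; single
    newlines inside a paragraph become <br> line breaks.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>\n") + "</p>"
        for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )
```

The returned string should be written out encoded as utf-8, for example
with open(path, "w", encoding="utf-8"), so that the file contents match
the declared charset.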

Did anyone else succeed in performing such conversions?  If yes, how
were they accomplished?

--- Omer


-- 
Did you shave a yak today?
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html



