Reading RTF files

Ehud Karni ehud at unix.mvs.co.il
Sun Jul 12 19:41:44 IDT 2009


On Fri, 10 Jul 2009 09:38:45 Micha Silver wrote:
>
> Ehud Karni wrote:
>
> > I use `catdoc', which works quite well (for both *doc and *rtf).
> > `catdoc' is available as a package for Centos and Debian.
>
> Thanks for the tip, but I can't get any sensible output. I ran:
>
>  catdoc -a -d8859-8 invoice150711.rtf | \
>      fribidi --charset ISO8859-8 --width=80 --rtl
> and I get 188 empty lines. :-(

After Micha sent me his RTF file, I found out that it contains
"text boxes", not plain text.

Using OpenOffice does not help; it shows an (almost) empty page.

Filter your RTF with the sed command below; it will drop the boxes.
Then run catdoc as above or use OpenOffice (I used `ooviewdoc'); both
will show you the data (ooviewdoc preserves more of the original layout).
BTW, when viewed with M$word, the filtered file shows empty boxes.

Ehud.


sed -e "s/{...do.dobxpage.dobypara.dodhgt8192.dptxbx.dptxbxmar0{/{/g"   \
    -e "s/}.dpx[0-9]*.dpy[0-9]*.dpxsize[0-9]*.dpysize[0-9]*.dplinehollow0}/}/g"
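
The sed command above reads from standard input, so it may be convenient
to wrap it in a small shell function. What follows is only a sketch: the
function name `strip_rtf_textboxes' is made up for illustration, and the
two expressions are copied unchanged from the command above. Each `.' in
the patterns stands in for the backslash that precedes an RTF control
word, so the first expression collapses the text-box opening group into a
plain `{' and the second collapses the closing position/size group into a
plain `}'.

```shell
# Hypothetical wrapper (the name is an assumption, not part of any package).
# Reads the named files, or standard input if none are given, and writes
# the RTF with the text-box wrappers removed to standard output.
strip_rtf_textboxes() {
    sed -e "s/{...do.dobxpage.dobypara.dodhgt8192.dptxbx.dptxbxmar0{/{/g"   \
        -e "s/}.dpx[0-9]*.dpy[0-9]*.dpxsize[0-9]*.dpysize[0-9]*.dplinehollow0}/}/g" \
        "$@"
}
```

With the function defined, the whole procedure could look like this
(`filtered.rtf' is an arbitrary output name; the catdoc and fribidi
options are the ones Micha used):

    strip_rtf_textboxes invoice150711.rtf > filtered.rtf
    catdoc -a -d8859-8 filtered.rtf | fribidi --charset ISO8859-8 --width=80 --rtl

The patterns are tied to the exact control-word sequence found in this
particular file (for example `dodhgt8192'), so an RTF file produced with
different box settings will probably need adjusted expressions.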



--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry


