Reading RTF files
Ehud Karni
ehud at unix.mvs.co.il
Sun Jul 12 19:41:44 IDT 2009
On Fri, 10 Jul 2009 09:38:45 Micha Silver wrote:
>
> Ehud Karni wrote:
>
> > I use `catdoc' which works quiet good (for both *doc and *rtf).
> > `catdoc' is available as a package for Centos and Debian.
>
> Thanks for the tip, but I can't get any sensible output. I ran:
>
> catdoc -a -d8859-8 invoice150711.rtf | fribidi --charset ISO8859-8
> --width=80 --rtl
> and I get 188 empty lines. :-(
After Micha sent me his RTF file, I found out that it contains
"text boxes", not plain text.
Using OpenOffice does not help; it shows an (almost) empty page.
Filter your RTF with the sed command below; it will drop the boxes.
Then run catdoc as above or use OpenOffice (I used `ooviewdoc'); both
will show you the data (ooviewdoc preserves more of the original layout).
BTW, when viewed with M$word, the filtered file shows empty boxes.
Ehud.
sed -e "s/{...do.dobxpage.dobypara.dodhgt8192.dptxbx.dptxbxmar0{/{/g" \
-e "s/}.dpx[0-9]*.dpy[0-9]*.dpxsize[0-9]*.dpysize[0-9]*.dplinehollow0}/}/g"
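
The sed command above reads the RTF on standard input and writes the
filtered result to standard output. A minimal sketch of one way to wire
it into the catdoc pipeline follows. The function name
`strip_rtf_textboxes', the output file name `filtered.rtf' and the
one-line sample are assumptions made up for illustration; the sample
only imitates the control words the two patterns look for and is not
taken from the real invoice file.

```shell
#!/bin/sh
# Hypothetical wrapper around the two sed expressions from this message.
# Each `.' in the patterns matches any single character, which is how the
# RTF backslashes are matched without having to escape them.
strip_rtf_textboxes() {
    sed -e "s/{...do.dobxpage.dobypara.dodhgt8192.dptxbx.dptxbxmar0{/{/g" \
        -e "s/}.dpx[0-9]*.dpy[0-9]*.dpxsize[0-9]*.dpysize[0-9]*.dplinehollow0}/}/g"
}

# Synthetic one-line sample (an assumption, not real invoice data).
sample='{\*\do\dobxpage\dobypara\dodhgt8192\dptxbx\dptxbxmar0{sample text}\dpx100\dpy200\dpxsize3000\dpysize400\dplinehollow0}'

# The text box wrapper is dropped and only the inner group remains.
printf '%s\n' "$sample" | strip_rtf_textboxes    # prints: {sample text}

# With a real file (filtered.rtf is a placeholder name):
#   strip_rtf_textboxes < invoice150711.rtf > filtered.rtf
#   catdoc -a -d8859-8 filtered.rtf | fribidi --charset ISO8859-8 --width=80 --rtl
```

Writing the filtered text to a separate file keeps the original RTF
untouched, so the same file can also be opened with ooviewdoc.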
--
Ehud Karni Tel: +972-3-7966-561 /"\
Mivtach - Simon Fax: +972-3-7976-561 \ / ASCII Ribbon Campaign
Insurance agencies (USA) voice mail and X Against HTML Mail
http://www.mvs.co.il FAX: 1-815-5509341 / \
GnuPG: 98EA398D <http://www.keyserver.net/> Better Safe Than Sorry
More information about the Linux-il
mailing list