|
|
Hi there, what is the difference between an RTF file generated by WordPad and one generated by Office 2007?
I find some differences in how images are stored.
Are there any other differences?
Also, I want to extract image files and other embedded files from an RTF file. How is this possible?
Thank you for your time,
miztaken
|
|
"miztaken" <justjunktome[ at ]gmail.com> wrote:
> Hi there,
> What is the difference in rtf file generated by wordpad and that
> generated by office 2007?
>
> I find some differences while storing images.
>
> Are there any other differences?
WordPad is based on a much older version of Word, and doesn't support a lot of features even of older Word versions (97, 2000, ...). I guess the RTF it exports corresponds to maybe RTF version 1.5 or so, whereas Word 2007 uses 1.9.
So if you save Word 2007 documents as RTF in WordPad, a lot of stuff will be stripped out.
> Also i want to extract image files and other embedded files from
> RTF file, how is this possible?
You could save as HTML. The images then end up in a separate folder. Word 2007 *.docx files are ZIP files... I haven't tried it yet, but I guess you could save as *.docx, look through the subfolders after unzipping it, and probably find the images and other embedded files in one of them.
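The unzip idea could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function name `extract_docx_media` is made up for illustration, and it assumes the pictures sit under `word/media/` inside the archive, which is where Word usually puts them (embedded OLE objects usually end up under `word/embeddings/` instead).

```python
import zipfile
from pathlib import Path

def extract_docx_media(docx_path, out_dir):
    """Copy every file stored under word/media/ in a .docx into out_dir.

    Hypothetical helper: the word/media/ location is an assumption about
    how Word lays out the archive, not something read from the document.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as zf:
        for name in zf.namelist():
            # Skip directory entries and anything outside the media folder.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(zf.read(name))
                saved.append(target.name)
    return saved
```

Calling `extract_docx_media("report.docx", "images")` would return the list of file names it wrote; changing the prefix to `word/embeddings/` should pick up embedded objects the same way.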
Regards, Klaus
|
|
Hey Klaus,
> Wordpad is based on a much older version of Word, and doesn't support a lot
> of features even of older Word versions (97, 2000, ...).
> I guess the RTF it exports will correspond to maybe RTF version 1.5 or so,
> Word 2007 uses 1.9.
How can I tell which version of RTF a file uses? I may have a bunch of RTF files. Do I have to prepare a different parser for each RTF version? Either way, what do you suggest?
> You could save as HTML. The images then end up in a separate folder.
If I save those RTF files as HTML, then I believe all the embedded objects and attachments will be dumped in a single binary file named oledata.mso, and I have no clue how to read that.
I want to extract image files from an RTF file without saving it as HTML. Are there any other ways?
Thank you,
miztaken
|
|
"miztaken" <justjunktome[ at ]gmail.com> wrote:
> How can i know which version of RTF file is it.
I don't think you can tell from the RTF file. An RTF reader (say some version of Word or Wordpad) will just ignore things it does not understand.
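To illustrate why the file itself does not help much: the header control word `\rtfN` carries only the major version, and files written against any 1.x revision of the specification normally start with `{\rtf1`. A minimal sketch of reading it (the function name `rtf_major_version` is made up for this example):

```python
import re

def rtf_major_version(data: bytes):
    """Return the number after \\rtf in the file header, or None.

    Only the major version is stored in the header, so a result of 1
    does not say whether the writer followed RTF 1.5 or 1.9.
    """
    m = re.match(rb"\s*\{\\rtf(\d+)", data)
    return int(m.group(1)) if m else None
```

A WordPad file and a Word 2007 file should therefore both be expected to report 1 here.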
> I can have bunch of RTF files.
> So for different version of RTF do i have to prepare different
> parsers?
> If yes/no, what do you suggest me?
Your parser will likely ignore things you aren't interested in, so you won't need different parsers.
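As a rough illustration of a parser that skips everything it is not interested in, the sketch below walks each `{\pict ...}` group, skips nested groups and unrelated control words, and collects only the hex picture data. It is a sketch under assumptions, not a full RTF reader: the names `extract_picts`, `BLIP_EXT` and `CONTROL` are made up here, only hex-encoded pictures are handled (raw `\bin` data is not), and any picture type other than PNG, JPEG or EMF is simply labelled `bin`.

```python
import re

# Control words that announce the picture type (assumed subset).
BLIP_EXT = {"pngblip": "png", "jpegblip": "jpg", "emfblip": "emf"}
# A control word: backslash, letters, optional numeric parameter, optional space.
CONTROL = re.compile(rb"\\([a-z]+)(-?\d+)? ?")
HEX_DIGITS = b"0123456789abcdefABCDEF"

def extract_picts(rtf: bytes):
    """Return a list of (extension, image_bytes), one per \\pict group."""
    images = []
    start = rtf.find(b"{\\pict")
    while start != -1:
        i, depth, ext, hexchars = start + 1, 1, "bin", bytearray()
        while i < len(rtf) and depth > 0:
            ch = rtf[i:i + 1]
            if ch == b"{":
                depth += 1
                i += 1
            elif ch == b"}":
                depth -= 1
                i += 1
            elif ch == b"\\":
                m = CONTROL.match(rtf, i)
                if m:
                    # Only control words directly inside \pict set the type;
                    # everything else is ignored.
                    if depth == 1:
                        ext = BLIP_EXT.get(m.group(1).decode(), ext)
                    i = m.end()
                else:
                    i += 2  # control symbol such as \* : skip it
            else:
                # Picture data is the hex text directly inside the group.
                if depth == 1 and ch in HEX_DIGITS:
                    hexchars += ch
                i += 1
        if len(hexchars) % 2:
            hexchars = hexchars[:-1]  # drop a dangling half byte
        images.append((ext, bytes.fromhex(hexchars.decode("ascii"))))
        start = rtf.find(b"{\\pict", i)
    return images
```

Because unknown control words and nested groups are skipped rather than interpreted, the same loop should cope with files from different RTF writers, which is the point made above.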
If your goal is to extract images and embedded objects, I wouldn't try to parse the RTF though. There are likely better, safer and simpler ways ... either exporting to some other format as described in my last post, or using VBA in Word.
>> You could save as HTML. The images then end up in a separate folder.
> If i save those RTF as html then i believe, all the embedded objects
> and attachments will be dumped in a single binary file named
> oledata.mso and i have no clue how to read that.
Have you tried to export as "HTML, filtered"?
> I want to extract image file from RTF file without saving them as
> HTML? Are there any other ways??
Maybe someone else can help... As I said, I wouldn't try it. If you want to try using VBA, you could post in one of the VBA groups. I suspect the code would depend on the format of the image or embedded object though, and it might be necessary to, say, copy/paste the image into a graphics program and save it from there.
Regards, Klaus
|
|
|
>> If i save those RTF as html then i believe, all the embedded objects
>> and attachments will be dumped in a single binary file named
>> oledata.mso and i have no clue how to read that.
Graham Mayor has an article with detailed tips: http://www.gmayor.com:80/extract_images_from_word.htm
Klaus
|
|
|