Don't use any kind of hash to compare MSG files. Would you consider two messages different if one MSG file stores the sender name first and then the subject, while the other stores the subject first and then the sender name? To an end user, the order in which the properties are stored is irrelevant, but changing the order throws any hashing off. You need to define what "sameness" of two MSG files means. If I were you, I'd extract the properties that you care about (minus trailing carriage returns, etc.), then compare them either separately or as a concatenated string.
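To make that concrete, here is a minimal Python sketch of the comparison, not a definitive implementation. It assumes the properties have already been pulled out of each MSG file into a plain dictionary by whatever MAPI tool or library you use; that extraction step is not shown. The field names in `FIELDS` and the helpers `normalize`, `fingerprint` and `same_message` are hypothetical names chosen for illustration.

```python
# Hypothetical field names; adjust to the properties you decide matter.
# How they get extracted from the MSG file is outside this sketch.
FIELDS = ("sender", "to", "cc", "bcc", "subject", "sent", "body")


def normalize(value):
    """Unify line endings and drop trailing whitespace, including trailing CR/LF."""
    text = "" if value is None else str(value)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.rstrip()


def fingerprint(props):
    """Concatenate the normalized fields in a fixed order.

    The order comes from FIELDS, not from how the properties were stored,
    so storage order cannot affect the result.
    """
    return "\x1f".join(normalize(props.get(name)) for name in FIELDS)


def same_message(a, b):
    """Treat two messages as the same if their fingerprints match."""
    return fingerprint(a) == fingerprint(b)


# Same message, different storage order, one extra trailing carriage return.
first = {"sender": "Martin", "subject": "Hello", "body": "Line one\r\nLine two\r\n\r\n"}
second = {"subject": "Hello", "body": "Line one\r\nLine two\r\n", "sender": "Martin"}
print(same_message(first, second))  # True
```

Because the fields are read in a fixed order and trailing line breaks are stripped first, neither the storage order nor a stray carriage return changes the outcome. If you want to know which property differs, compare `normalize(a[name])` and `normalize(b[name])` field by field instead of comparing the concatenated string.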
Dmitry Streblechenko (MVP) http://www.dimastr.com/ OutlookSpy - Outlook, CDO and MAPI Developer Tool
"Martin" <Martin.Nikel[ at ]gmail.com> wrote in message news:1147971412.067243.297310[ at ]i40g2000cwc.googlegroups.com...
> Hi all,
>
> I noticed some relevant posts in this group, so thought I'd post my
> query here.
>
> I have two MSG files, both exported from PST files. They are the same
> message, same sent time, same sent date, To, From, CC, BCC, Importance
> etc. In fact all the fields are the same.
>
> However, I am trying to de-duplicate these two messages using an MD5
> hash based partially on the body of the email.
>
> Viewing the source of the messages (the examples are HTML messages) and
> comparing them using a diffing tool reveals no differences, but viewing
> the text version of the message, in one there is an extra carriage
> return.
>
> I think this could be due to different versions of Outlook, or
> something that has happened to the messages along the way, but I am
> unsure how to tell. Does anyone have any experience of this particular
> problem?
>
> Many thanks,
>
> Martin.