Group:  Microsoft Word » microsoft.public.word.newusers
Thread: Finding duplicate phrases and paragraphs.

Finding duplicate phrases and paragraphs.
"Frank Martin" <fm[ at ]general.com.au> 11/21/2008 10:25:41 PM
I am copying a particular rare story from many different
newsgroups and pasting the fragments into a Word 2003
document.

Is there some way to automatically find duplicated sections
of the story so as to help weld it into one seamless whole?

In the spell checker one can easily do this for duplicated
words, but I need the same thing for duplicated strings, and
even sentences.

Please help, Frank


Re: Finding duplicate phrases and paragraphs.
"Klaus Linke" <info[ at ]fotosatz-kaufmann.de> 11/21/2008 11:38:52 PM
"Frank Martin" <fm[ at ]general.com.au> wrote:
>I am copying a rare particular story from many different newsgroups and
>pasting the fragments into a Word2003 document.
>
> Is there some way to automatically find duplicated sections of the story
> so as to help weld it into one seamless whole?
>
> In the spell checker one can easily do this for duplicated words, but I
> need the same thing for duplicated strings, and even sentences.


Hi Frank,

For repeated paragraphs, you could try a wildcard search for

(^13[!^13]@^13)*\1

If a repeated paragraph is found, you'll see it at the start and end of the
selection... though there needs to be at least one paragraph in between.
For repeated paragraphs right next to each other, you could use

^13([!^13]@^13)\1
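
To see what the two searches do outside Word, here is a rough Python analogue. It is only a sketch under some assumptions: the document text has been exported as a plain string with "\r" (Word's ^13) ending each paragraph, and the helper name first_repeated_paragraph is made up for this example. Python's regex syntax is not the same as Word's wildcard syntax, so treat this as an illustration of the idea rather than a drop-in replacement.

```python
import re

# Analogue of the first search: a paragraph (with its surrounding
# paragraph marks), then any text, then the same paragraph again.
SEPARATED = re.compile(r"(\r[^\r]+\r).*?\1", re.DOTALL)

# Analogue of the second search: a paragraph immediately followed
# by an identical paragraph.
ADJACENT = re.compile(r"\r([^\r]+\r)\1")


def first_repeated_paragraph(text, pattern):
    """Return the first repeated paragraph found by pattern, or None."""
    # Assumption for this sketch: prepend "\r" so that the very first
    # paragraph can also take part in a match.
    match = pattern.search("\r" + text)
    return match.group(1).strip("\r") if match else None


sample = "alpha\rbeta\ralpha\rgamma\rgamma\r"
print(first_repeated_paragraph(sample, SEPARATED))  # alpha
print(first_repeated_paragraph(sample, ADJACENT))   # gamma
```

As in Word, the first pattern does not catch two identical paragraphs that sit right next to each other, because both copies would need the same paragraph mark between them. That case is what the second pattern is for.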


For repeated sentences or other duplicated strings of some length, you'd
need a more complicated macro.
You could read the whole document into a string and then search that string
for repeated phrases. Algorithms for this are easy to find with Google; the
longest common substring problem is a good place to start:
http://en.wikipedia.org/wiki/Longest_common_substring_problem
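
As a minimal sketch of the algorithm behind that link, the following Python function finds the longest run of characters shared by two text fragments using the standard dynamic-programming approach. It is not a Word macro, and the function name and sample strings are made up for illustration; the assumption is that two pasted fragments have been copied out of the document as plain strings.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous string that occurs in both a and b."""
    best_len = 0
    best_end = 0  # index in a just past the end of the best match
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one character earlier in both
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


fragment_1 = "the quick brown fox jumps"
fragment_2 = "a quick brown dog"
print(repr(longest_common_substring(fragment_1, fragment_2)))  # ' quick brown '
```

The running time grows with the product of the two lengths, so this is practical for comparing individual fragments but would be slow if run on a very long document against itself.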

Regards,
Klaus

Re: Finding duplicate phrases and paragraphs.
"Frank Martin" <fm[ at ]general.com.au> 11/22/2008 1:03:21 AM

"Klaus Linke" <info[ at ]fotosatz-kaufmann.de> wrote in message
news:%23xzxRJDTJHA.2256[ at ]TK2MSFTNGP02.phx.gbl...
> "Frank Martin" <fm[ at ]general.com.au> wrote:
>>I am copying a rare particular story from many different
>>newsgroups and pasting the fragments into a Word2003
>>document.
>>
>> Is there some way to automatically find duplicated
>> sections of the story so as to help weld it into one
>> seamless whole?
>>
>> In the spell checker one can easily do this for
>> duplicated words, but I need the same thing for
>> duplicated strings, and even sentences.
>
>
> Hi Frank,
>
> For repeated paragraphs, you could try a wildcard search
> for
>
> (^13[!^13]@^13)*\1
>
> If a repeated paragraph is found, you'll see it at the
> start and end of the selection... though there needs to be
> at least one paragraph in between.
> For repeated paragraphs right next to each other, you
> could use
>
> ^13([!^13]@^13)\1
>
>
> For repeated sentences or other duplicated strings of some
> length, you'd need a more complicated macro.
> You could read the whole document into a string. You
> probably can find algorithms for finding repeated phrases
> in the string using Google:
> http://en.wikipedia.org/wiki/Longest_common_substring_problem
>
> Regards,
> Klaus


Thank you. I could not get this to work, but I have found a
site with worked examples in Word:
http://www.tutorials-win.com/archive/WordDoc/
Is there any way to search this archive for a specific
example?
Frank

