Group:  Microsoft Excel ยป microsoft.public.excel.datamap
Thread: European Address Data

Geek News

European Address Data
"Mark S" <bob[ at ]bob.com> 5/10/2007 6:19:32 PM
Hi and thanks in advance for any help.

I have 700,000 European Companies in a csv file.

Unfortunately the COUNTRY is separate but all other information (street,
zip/postal, city, region/state) is in one ADDRESS column.

I would like to be able to "find and copy/move" info to separate
street/city/zip columns as best as possible

For example for French or German addresses the zip code is five digits so I
want to search the the ability to move '#####' ("five number characters in a
row")strings to a ZIP column

Ideally this would also allow the ability to also take everything AFTER the
"five number characters in a row" string (i.e. Paris and/or Chalonnes sur
Loire) to a CITY column given the European ordering standard of
address-zip-city)

Suggestions???


Re: European Address Data
"Peo Sjoblom" <terre08[ at ]mvps.org> 5/10/2007 6:28:18 PM
You almost have an impossible task, for instance in your example you can see
how hard it would be given that you have 2 cities in France, one use one
word the other 3 words, how would you know where one city ends? What about
UK postal codes or is this just for Germany and France?


--
Regards,

Peo Sjoblom




"Mark S" <bob[ at ]bob.com> wrote in message
news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no...
[Quoted Text]
> Hi and thanks in advance for any help.
>
> I have 700,000 European Companies in a csv file.
>
> Unfortunately the COUNTRY is separate but all other information (street,
> zip/postal, city, region/state) is in one ADDRESS column.
>
> I would like to be able to "find and copy/move" info to separate
> street/city/zip columns as best as possible
>
> For example for French or German addresses the zip code is five digits so
> I want to search the the ability to move '#####' ("five number characters
> in a row")strings to a ZIP column
>
> Ideally this would also allow the ability to also take everything AFTER
> the "five number characters in a row" string (i.e. Paris and/or Chalonnes
> sur Loire) to a CITY column given the European ordering standard of
> address-zip-city)
>
> Suggestions???
>


Re: European Address Data
"Mark S" <bob[ at ]bob.com> 5/10/2007 7:24:45 PM
Yes, I do realize (and have been working at it for a couple of days
already). FWIW I've had "some success" replacing some spaces with _ then
using the spaces to separate into different coloumns but obviously very time
consuming -- and british codes are the worst, what with not only a space
between but variable total length of characters . . . so as before, I'm
open to suggestions!





"Peo Sjoblom" <terre08[ at ]mvps.org> wrote in message
news:OF%23pXEzkHHA.3472[ at ]TK2MSFTNGP04.phx.gbl...
[Quoted Text]
> You almost have an impossible task, for instance in your example you can
> see how hard it would be given that you have 2 cities in France, one use
> one word the other 3 words, how would you know where one city ends? What
> about UK postal codes or is this just for Germany and France?
>
>
> --
> Regards,
>
> Peo Sjoblom
>
>
>
>
> "Mark S" <bob[ at ]bob.com> wrote in message
> news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no...
>> Hi and thanks in advance for any help.
>>
>> I have 700,000 European Companies in a csv file.
>>
>> Unfortunately the COUNTRY is separate but all other information (street,
>> zip/postal, city, region/state) is in one ADDRESS column.
>>
>> I would like to be able to "find and copy/move" info to separate
>> street/city/zip columns as best as possible
>>
>> For example for French or German addresses the zip code is five digits so
>> I want to search the the ability to move '#####' ("five number characters
>> in a row")strings to a ZIP column
>>
>> Ideally this would also allow the ability to also take everything AFTER
>> the "five number characters in a row" string (i.e. Paris and/or Chalonnes
>> sur Loire) to a CITY column given the European ordering standard of
>> address-zip-city)
>>
>> Suggestions???
>>
>
>


Re: European Address Data
"NickHK" <TungCheWah[ at ]Invalid.com> 5/11/2007 4:22:33 AM
Mark,
You mean you have addresses like these 2 (bad) examples, i.e. no delimeter
between the various parts:

2 Rue de boulogne Chalonnes sur Loire 28544, France
17 Cul de Sac Taunton Somerset TA2 7HF, england

There are list of country names (or create your own limited ones of
interest) that you can search through.
Once you have the country name, you could search a list of that country's
postal/area/zip codes.
May be also a list of country's counties/departements and towns/cities.
Split out the relevant part each time a section is resolved.
Road names may well be a problem, but that may be all you are left with
after you have removed the other information.

Such list are commercially available. Some free lists are also available,
but quality may be patchy.
You could try throwing the address at some suitable web site that deals in
locating/mapping.

Not easy all round, but doing 700K manually would also take a while.

NickHK

"Mark S" <bob[ at ]bob.com> wrote in message
news:1kK0i.173865$aG1.10148[ at ]pd7urf3no...
[Quoted Text]
> Yes, I do realize (and have been working at it for a couple of days
> already). FWIW I've had "some success" replacing some spaces with _ then
> using the spaces to separate into different coloumns but obviously very
time
> consuming -- and british codes are the worst, what with not only a space
> between but variable total length of characters . . . so as before, I'm
> open to suggestions!
>
>
>
>
>
> "Peo Sjoblom" <terre08[ at ]mvps.org> wrote in message
> news:OF%23pXEzkHHA.3472[ at ]TK2MSFTNGP04.phx.gbl...
> > You almost have an impossible task, for instance in your example you can
> > see how hard it would be given that you have 2 cities in France, one use
> > one word the other 3 words, how would you know where one city ends? What
> > about UK postal codes or is this just for Germany and France?
> >
> >
> > --
> > Regards,
> >
> > Peo Sjoblom
> >
> >
> >
> >
> > "Mark S" <bob[ at ]bob.com> wrote in message
> > news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no...
> >> Hi and thanks in advance for any help.
> >>
> >> I have 700,000 European Companies in a csv file.
> >>
> >> Unfortunately the COUNTRY is separate but all other information
(street,
> >> zip/postal, city, region/state) is in one ADDRESS column.
> >>
> >> I would like to be able to "find and copy/move" info to separate
> >> street/city/zip columns as best as possible
> >>
> >> For example for French or German addresses the zip code is five digits
so
> >> I want to search the the ability to move '#####' ("five number
characters
> >> in a row")strings to a ZIP column
> >>
> >> Ideally this would also allow the ability to also take everything
AFTER
> >> the "five number characters in a row" string (i.e. Paris and/or
Chalonnes
> >> sur Loire) to a CITY column given the European ordering standard of
> >> address-zip-city)
> >>
> >> Suggestions???
> >>
> >
> >
>
>


Re: European Address Data
"Tim Williams" <timjwilliams at gmail dot com> 5/11/2007 4:59:17 AM
This is a job for regular expressions. Try this out:

'***********************
Sub ExtractZipAndCity()
Dim regEx As Object
Dim m
Dim i As Integer
Dim rngText As Range
Dim s As String

Set regEx = CreateObject("vbscript.regexp")
regEx.Pattern = "\b(\d{5})\b"
regEx.Global = True
regEx.IgnoreCase = True

Set rngText = ActiveSheet.Range("A1")

Do While rngText.Value <> ""

s = rngText.Value
Set m = regEx.Execute(s)
If Not m Is Nothing Then
If m.Count = 1 Then
rngText.Offset(0, 1).Value = m(0)
rngText.Offset(0, 2).Value = Trim(Split(s, m(0))(1))
End If
End If

Set rngText = rngText.Offset(1, 0)
Loop

End Sub
'***************************

However, with 700k records you've little hope of being able to check the
results
in any meaningful way, so your final version will need to be a lot more
robust.

Tim


"Mark S" <bob[ at ]bob.com> wrote in message
news:1kK0i.173865$aG1.10148[ at ]pd7urf3no...
[Quoted Text]
> Yes, I do realize (and have been working at it for a couple of days
> already). FWIW I've had "some success" replacing some spaces with _ then
> using the spaces to separate into different coloumns but obviously very
> time consuming -- and british codes are the worst, what with not only a
> space between but variable total length of characters . . . so as before,
> I'm open to suggestions!
>
>
>
>
>
> "Peo Sjoblom" <terre08[ at ]mvps.org> wrote in message
> news:OF%23pXEzkHHA.3472[ at ]TK2MSFTNGP04.phx.gbl...
>> You almost have an impossible task, for instance in your example you can
>> see how hard it would be given that you have 2 cities in France, one use
>> one word the other 3 words, how would you know where one city ends? What
>> about UK postal codes or is this just for Germany and France?
>>
>>
>> --
>> Regards,
>>
>> Peo Sjoblom
>>
>>
>>
>>
>> "Mark S" <bob[ at ]bob.com> wrote in message
>> news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no...
>>> Hi and thanks in advance for any help.
>>>
>>> I have 700,000 European Companies in a csv file.
>>>
>>> Unfortunately the COUNTRY is separate but all other information (street,
>>> zip/postal, city, region/state) is in one ADDRESS column.
>>>
>>> I would like to be able to "find and copy/move" info to separate
>>> street/city/zip columns as best as possible
>>>
>>> For example for French or German addresses the zip code is five digits
>>> so I want to search the the ability to move '#####' ("five number
>>> characters in a row")strings to a ZIP column
>>>
>>> Ideally this would also allow the ability to also take everything AFTER
>>> the "five number characters in a row" string (i.e. Paris and/or
>>> Chalonnes sur Loire) to a CITY column given the European ordering
>>> standard of address-zip-city)
>>>
>>> Suggestions???
>>>
>>
>>
>
>


RE: European Address Data
John.Greenan 5/11/2007 7:42:00 AM
There is no simple way to do this - there is no "European standard" for
addresses. You can either try a data driven approach - parse out differently
for cases where you find "england", "france" and so on, but then you have to
be cute to look for spelling mistakes.

Sorry to be a doom merchant, but if I was in your shoes I'd either grit my
teeth, roll up my sleeves and prepare for a long long job OR go back to the
source of the data and ask for the data in a better format. If you cannot get
the data in a better format then good luck.




--
http://www.alignment-systems.com


"Mark S" wrote:

[Quoted Text]
> Hi and thanks in advance for any help.
>
> I have 700,000 European Companies in a csv file.
>
> Unfortunately the COUNTRY is separate but all other information (street,
> zip/postal, city, region/state) is in one ADDRESS column.
>
> I would like to be able to "find and copy/move" info to separate
> street/city/zip columns as best as possible
>
> For example for French or German addresses the zip code is five digits so I
> want to search the the ability to move '#####' ("five number characters in a
> row")strings to a ZIP column
>
> Ideally this would also allow the ability to also take everything AFTER the
> "five number characters in a row" string (i.e. Paris and/or Chalonnes sur
> Loire) to a CITY column given the European ordering standard of
> address-zip-city)
>
> Suggestions???
>
>
>

Home | Search | Terms | Imprint Contact
Newsgroups Reader - provided by WiredBox.Net