|
|
Hi and thanks in advance for any help.
I have 700,000 European Companies in a csv file.
Unfortunately the COUNTRY is separate but all other information (street, zip/postal, city, region/state) is in one ADDRESS column.
I would like to be able to "find and copy/move" info to separate street/city/zip columns as best as possible
For example for French or German addresses the zip code is five digits so I want to search the the ability to move '#####' ("five number characters in a row")strings to a ZIP column
Ideally this would also allow the ability to also take everything AFTER the "five number characters in a row" string (i.e. Paris and/or Chalonnes sur Loire) to a CITY column given the European ordering standard of address-zip-city)
Suggestions???
|
|
You almost have an impossible task, for instance in your example you can see how hard it would be given that you have 2 cities in France, one use one word the other 3 words, how would you know where one city ends? What about UK postal codes or is this just for Germany and France?
-- Regards,
Peo Sjoblom
"Mark S" <bob[ at ]bob.com> wrote in message news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no...
[Quoted Text] > Hi and thanks in advance for any help. > > I have 700,000 European Companies in a csv file. > > Unfortunately the COUNTRY is separate but all other information (street, > zip/postal, city, region/state) is in one ADDRESS column. > > I would like to be able to "find and copy/move" info to separate > street/city/zip columns as best as possible > > For example for French or German addresses the zip code is five digits so > I want to search the the ability to move '#####' ("five number characters > in a row")strings to a ZIP column > > Ideally this would also allow the ability to also take everything AFTER > the "five number characters in a row" string (i.e. Paris and/or Chalonnes > sur Loire) to a CITY column given the European ordering standard of > address-zip-city) > > Suggestions??? >
|
|
Yes, I do realize (and have been working at it for a couple of days already). FWIW I've had "some success" replacing some spaces with _ then using the spaces to separate into different coloumns but obviously very time consuming -- and british codes are the worst, what with not only a space between but variable total length of characters . . . so as before, I'm open to suggestions!
"Peo Sjoblom" <terre08[ at ]mvps.org> wrote in message news:OF%23pXEzkHHA.3472[ at ]TK2MSFTNGP04.phx.gbl...
[Quoted Text] > You almost have an impossible task, for instance in your example you can > see how hard it would be given that you have 2 cities in France, one use > one word the other 3 words, how would you know where one city ends? What > about UK postal codes or is this just for Germany and France? > > > -- > Regards, > > Peo Sjoblom > > > > > "Mark S" <bob[ at ]bob.com> wrote in message > news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no... >> Hi and thanks in advance for any help. >> >> I have 700,000 European Companies in a csv file. >> >> Unfortunately the COUNTRY is separate but all other information (street, >> zip/postal, city, region/state) is in one ADDRESS column. >> >> I would like to be able to "find and copy/move" info to separate >> street/city/zip columns as best as possible >> >> For example for French or German addresses the zip code is five digits so >> I want to search the the ability to move '#####' ("five number characters >> in a row")strings to a ZIP column >> >> Ideally this would also allow the ability to also take everything AFTER >> the "five number characters in a row" string (i.e. Paris and/or Chalonnes >> sur Loire) to a CITY column given the European ordering standard of >> address-zip-city) >> >> Suggestions??? >> > >
|
|
Mark, You mean you have addresses like these 2 (bad) examples, i.e. no delimeter between the various parts:
2 Rue de boulogne Chalonnes sur Loire 28544, France 17 Cul de Sac Taunton Somerset TA2 7HF, england
There are list of country names (or create your own limited ones of interest) that you can search through. Once you have the country name, you could search a list of that country's postal/area/zip codes. May be also a list of country's counties/departements and towns/cities. Split out the relevant part each time a section is resolved. Road names may well be a problem, but that may be all you are left with after you have removed the other information.
Such list are commercially available. Some free lists are also available, but quality may be patchy. You could try throwing the address at some suitable web site that deals in locating/mapping.
Not easy all round, but doing 700K manually would also take a while.
NickHK
"Mark S" <bob[ at ]bob.com> wrote in message news:1kK0i.173865$aG1.10148[ at ]pd7urf3no...
[Quoted Text] > Yes, I do realize (and have been working at it for a couple of days > already). FWIW I've had "some success" replacing some spaces with _ then > using the spaces to separate into different coloumns but obviously very
time > consuming -- and british codes are the worst, what with not only a space > between but variable total length of characters . . . so as before, I'm > open to suggestions! > > > > > > "Peo Sjoblom" <terre08[ at ]mvps.org> wrote in message > news:OF%23pXEzkHHA.3472[ at ]TK2MSFTNGP04.phx.gbl... > > You almost have an impossible task, for instance in your example you can > > see how hard it would be given that you have 2 cities in France, one use > > one word the other 3 words, how would you know where one city ends? What > > about UK postal codes or is this just for Germany and France? > > > > > > -- > > Regards, > > > > Peo Sjoblom > > > > > > > > > > "Mark S" <bob[ at ]bob.com> wrote in message > > news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no... > >> Hi and thanks in advance for any help. > >> > >> I have 700,000 European Companies in a csv file. > >> > >> Unfortunately the COUNTRY is separate but all other information (street, > >> zip/postal, city, region/state) is in one ADDRESS column. > >> > >> I would like to be able to "find and copy/move" info to separate > >> street/city/zip columns as best as possible > >> > >> For example for French or German addresses the zip code is five digits so > >> I want to search the the ability to move '#####' ("five number characters > >> in a row")strings to a ZIP column > >> > >> Ideally this would also allow the ability to also take everything AFTER > >> the "five number characters in a row" string (i.e. Paris and/or Chalonnes > >> sur Loire) to a CITY column given the European ordering standard of > >> address-zip-city) > >> > >> Suggestions??? > >> > > > > > >
|
|
This is a job for regular expressions. Try this out:
'*********************** Sub ExtractZipAndCity() Dim regEx As Object Dim m Dim i As Integer Dim rngText As Range Dim s As String
Set regEx = CreateObject("vbscript.regexp") regEx.Pattern = "\b(\d{5})\b" regEx.Global = True regEx.IgnoreCase = True
Set rngText = ActiveSheet.Range("A1")
Do While rngText.Value <> ""
s = rngText.Value Set m = regEx.Execute(s) If Not m Is Nothing Then If m.Count = 1 Then rngText.Offset(0, 1).Value = m(0) rngText.Offset(0, 2).Value = Trim(Split(s, m(0))(1)) End If End If
Set rngText = rngText.Offset(1, 0) Loop
End Sub '***************************
However, with 700k records you've little hope of being able to check the results in any meaningful way, so your final version will need to be a lot more robust.
Tim
"Mark S" <bob[ at ]bob.com> wrote in message news:1kK0i.173865$aG1.10148[ at ]pd7urf3no...
[Quoted Text] > Yes, I do realize (and have been working at it for a couple of days > already). FWIW I've had "some success" replacing some spaces with _ then > using the spaces to separate into different coloumns but obviously very > time consuming -- and british codes are the worst, what with not only a > space between but variable total length of characters . . . so as before, > I'm open to suggestions! > > > > > > "Peo Sjoblom" <terre08[ at ]mvps.org> wrote in message > news:OF%23pXEzkHHA.3472[ at ]TK2MSFTNGP04.phx.gbl... >> You almost have an impossible task, for instance in your example you can >> see how hard it would be given that you have 2 cities in France, one use >> one word the other 3 words, how would you know where one city ends? What >> about UK postal codes or is this just for Germany and France? >> >> >> -- >> Regards, >> >> Peo Sjoblom >> >> >> >> >> "Mark S" <bob[ at ]bob.com> wrote in message >> news:UmJ0i.173268$6m4.84588[ at ]pd7urf1no... >>> Hi and thanks in advance for any help. >>> >>> I have 700,000 European Companies in a csv file. >>> >>> Unfortunately the COUNTRY is separate but all other information (street, >>> zip/postal, city, region/state) is in one ADDRESS column. >>> >>> I would like to be able to "find and copy/move" info to separate >>> street/city/zip columns as best as possible >>> >>> For example for French or German addresses the zip code is five digits >>> so I want to search the the ability to move '#####' ("five number >>> characters in a row")strings to a ZIP column >>> >>> Ideally this would also allow the ability to also take everything AFTER >>> the "five number characters in a row" string (i.e. Paris and/or >>> Chalonnes sur Loire) to a CITY column given the European ordering >>> standard of address-zip-city) >>> >>> Suggestions??? >>> >> >> > >
|
|
There is no simple way to do this - there is no "European standard" for addresses. You can either try a data driven approach - parse out differently for cases where you find "england", "france" and so on, but then you have to be cute to look for spelling mistakes.
Sorry to be a doom merchant, but if I was in your shoes I'd either grit my teeth, roll up my sleeves and prepare for a long long job OR go back to the source of the data and ask for the data in a better format. If you cannot get the data in a better format then good luck.
-- http://www.alignment-systems.com
"Mark S" wrote:
[Quoted Text] > Hi and thanks in advance for any help. > > I have 700,000 European Companies in a csv file. > > Unfortunately the COUNTRY is separate but all other information (street, > zip/postal, city, region/state) is in one ADDRESS column. > > I would like to be able to "find and copy/move" info to separate > street/city/zip columns as best as possible > > For example for French or German addresses the zip code is five digits so I > want to search the the ability to move '#####' ("five number characters in a > row")strings to a ZIP column > > Ideally this would also allow the ability to also take everything AFTER the > "five number characters in a row" string (i.e. Paris and/or Chalonnes sur > Loire) to a CITY column given the European ordering standard of > address-zip-city) > > Suggestions??? > > >
|
|
|