Group: microsoft.public.windows.powershell

Comparing two files and making changes
Swackhammer1 12/30/2008 9:42:37 PM
Hi,
I've been banging my head against a wall trying to figure this out and
nothing seems to work.

I have two files: File A contains a bunch of student information and File B
contains email aliases. File A has about 15,000 lines, each line being a
different student. File B has about 50,000 lines, about 49,000 of which are
aliases. The first 1,000 or so lines in File B are miscellaneous text but are
required.

File A is in the following format:

year;FirstName;LastName;LastName (again);firstname.lastname;firstname.lastname (again);code
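Splitting one line on semicolons makes the field positions concrete. In this small sketch the sample line is invented. With the layout above, firstname.lastname lands in zero-based field 4 and its repeat in field 5, which are the fields Kiron's reply indexes with [4,5].

```powershell
# Invented sample line in the File A layout:
# year;FirstName;LastName;LastName (again);firstname.lastname;firstname.lastname (again);code
$line = '2009;John;Doe;Doe;john.doe;john.doe;A123'

# Split on ';' and pick the zero-based fields 4 and 5
$fields = $line.Split(';')
$name   = $fields[4]
$repeat = $fields[5]

"$name / $repeat"   # john.doe / john.doe
```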


File B is in the following format:

misc
misc
somename:somename@domain.com
firstname.lastname:firstname.lastname@studentdomain.com
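A regular expression with a backreference can separate the alias lines from the miscellaneous lines. The sketch below applies the pattern the replies in this thread use, ^(\w+\.\w+):\1@, to a few invented sample lines. Because the pattern requires a dot in the name, the sample somename:somename@domain.com line is not treated as a student alias.

```powershell
# Invented sample lines in the File B layout
$sample = @(
    'misc',
    'somename:somename@domain.com',
    'john.doe:john.doe@studentdomain.com'
)

# ^(\w+\.\w+)  captures a dotted firstname.lastname at the start of the line
# :\1@         requires that same name to repeat right before the '@'
$aliases = $sample | Where-Object { $_ -match '^(\w+\.\w+):\1@' }
$aliases   # john.doe:john.doe@studentdomain.com
```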


What I need to do is this:
1) Read a line from File A
2) Get firstname.lastname
3) Search File B for firstname.lastname
4) If found, replace studentdomain.com with domain.com
5) Repeat until all names from File A are processed.

I know an easy way to do it, but it would mean reading and rewriting File B
15,000 times, which is not efficient enough. I've been trying to come up with
something that can handle at least 500-1,000 users per hour.

I've tried a bunch of things and nothing works. I get duplicate entries and
all kinds of nasty results due to the nested loops.

Any help would be greatly appreciated.

Thank you.
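The steps listed in the question can be done with a single read of each file and a single write, provided the names from File A are collected up front instead of searching File B once per student. The following minimal sketch shows that idea. It is not one of the solutions posted in this thread. It uses a PowerShell hashtable as the lookup set, and the invented in-memory sample lines stand in for the real files.

```powershell
# Invented in-memory stand-ins for File A and File B; with real files these
# would come from Get-Content FileA.txt and Get-Content FileB.txt
$fileA = @(
    '2009;John;Doe;Doe;john.doe;john.doe;A123',
    '2009;Jane;Roe;Roe;jane.roe;jane.roe;B456'
)
$fileB = @(
    'misc',
    'somename:somename@domain.com',
    'john.doe:john.doe@studentdomain.com',
    'ann.lee:ann.lee@studentdomain.com'
)

# Steps 1-2, done once: put every firstname.lastname from File A in a hashtable
$students = @{}
foreach ($line in $fileA) {
    $name = $line.Split(';')[4]
    if ($name) { $students[$name] = $true }
}

# Steps 3-4, done once: one pass over File B with one hashtable lookup per line
$result = foreach ($line in $fileB) {
    if ($line -match '^(\w+\.\w+):\1@' -and $students.ContainsKey($matches[1])) {
        $line -replace '@studentdomain\.com$', '@domain.com'
    } else {
        $line
    }
}

# Step 5 is covered by the loops; write the result once, e.g.
# $result | Set-Content FileC.txt
$result
```

Each File B line costs one regex match and one hashtable lookup, so the work grows with the combined size of the two files rather than with their product.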

Re: Comparing two files and making changes
"Keith Hill [MVP]" <r_keith_hill@mailhot.moc.no_spam_I> 12/30/2008 10:23:25 PM
I would read File B into a string array and, while doing so, check each line
to see whether it is a firstname.lastname:firstname.lastname@ alias. If so, use
firstname.lastname as the key into a hashtable whose value is the line index,
e.g.:

$ht = @{}
# $i counts lines; each alias line's index is appended under its name
$fileb = Get-Content fileb.txt | %{$i=0}{
  if ($_ -match '^(\w+\.\w+):\1@') { $ht[$matches[1]] += @($i) }
  $i++; $_ }

Now scan through File A, pull out firstname.lastname, and use it to index into
the hashtable. That will give you back an array of line indices (I'm assuming
each person can have more than one alias). Patch up the lines at those indices
in the array, and when you're done, save the array back out to a new file.

HTH
--
Keith
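Keith's second step (look each File A name up in the hashtable, patch the indexed lines, save once) could look roughly like the following sketch. The File A and File B contents are invented in-memory samples. The replacement of @studentdomain.com with @domain.com follows the original question, and the output file name in the final comment is hypothetical.

```powershell
# Invented samples standing in for File A and fileb.txt
$fileALines = @('2009;John;Doe;Doe;john.doe;john.doe;A123')
$source = @(
    'misc',
    'john.doe:john.doe@studentdomain.com',
    'jane.roe:jane.roe@studentdomain.com'
)

# Step 1, as in Keith's code: record the line index of every alias line
$ht = @{}
$fileb = $source | %{$i=0}{
    if ($_ -match '^(\w+\.\w+):\1@') { $ht[$matches[1]] += @($i) }
    $i++; $_ }

# Step 2: look up each File A name and patch the lines at the stored indices
foreach ($line in $fileALines) {
    $name = $line.Split(';')[4]
    if ($name -and $ht.ContainsKey($name)) {
        foreach ($idx in $ht[$name]) {
            $fileb[$idx] = $fileb[$idx] -replace '@studentdomain\.com$', '@domain.com'
        }
    }
}

# Step 3: save the patched array once, e.g. $fileb | Set-Content fileb_new.txt
$fileb
```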

Re: Comparing two files and making changes
Swackhammer1 12/30/2008 10:40:00 PM
Thanks for the quick response. I'll try what you suggested and let you know
how it works out.

Re: Comparing two files and making changes
"Kiron" <Kiron@HighPlainsDrifter.com> 12/31/2008 3:08:10 AM
Here's another way. First, collect the students from FileA.txt and store the unique ones in an array. Then read FileB.txt and apply the change to every line that matches one of the students. Finally, pipe all lines to a new file. There are four different ways to accomplish the second step:

# using UTF8 encoding in these samples
# extract a list of unique students, good for all four methods;
# fields 4 and 5 are firstname.lastname and its repeat
gc FileA.txt -en utf8 | % {$z = @()} {
  $x,$y = $_.split(';')[4,5]
  if ($x.length -and $x -eq $y -and $z -notcontains $x) {$z += ,$x}
}

# here are the four different methods
# method A - Switch statement, -match and -contains operators
$(switch -file FileB.txt {
  {$_ -match '^(\w+\.\w+):\1@' -and
   $z -contains $matches[1]} {$_ -replace '(?<=@)student(?=domain)'}
  default {$_}
}) | sc FileC_A.txt -en utf8

# method B - Switch statement, -match operator against an ORed RegEx
# build the ORed and escaped RegEx pattern, good for methods B and C
$pat = &{$ofs = '|'
  "$($z | sort | % {[regex]::escape($_)})"}

$(switch -regex -file FileB.txt {
  "($pat):\1@" {$_ -replace '(?<=@)student(?=domain)'}
  default {$_}
}) | sc FileC_B.txt -en utf8

# method C - Get-Content, -match operator against the ORed RegEx
gc FileB.txt -en utf8 | % {
  if ($_ -match "($pat):\1@") {
    $_ -replace '(?<=@)student(?=domain)'
  } else {$_}
} | sc FileC_C.txt -en utf8

# method D - Get-Content, -match and -contains operators
gc FileB.txt -en utf8 | % {
  if ($_ -match '(\w+\.\w+):\1@' -and $z -contains $matches[1]) {
    $_ -replace '(?<=@)student(?=domain)'
  } else {$_}
} | sc FileC_D.txt -en utf8
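Methods B and C depend on $ofs, the output field separator. When an array is expanded inside a double-quoted string, PowerShell joins its elements with $ofs, so setting it to '|' inside the & { } block turns the escaped names into one alternation. Here is a quick check with two invented names:

```powershell
# Two invented student names
$z = @('john.doe', 'ann.lee')

# Same construction as method B: sort, escape, and join the names with '|'
$pat = &{$ofs = '|'
    "$($z | sort | % {[regex]::escape($_)})"}

$pat   # ann\.lee|john\.doe
'john.doe:john.doe@studentdomain.com' -match "($pat):\1@"   # True
```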

I wonder which one is fastest, Keith's included... :)

--
Kiron
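Kiron's closing question can be answered empirically with Measure-Command, which runs a script block and returns a TimeSpan. The sketch below is only a harness. Its synthetic names and counts are invented, and it compares method D's array -contains lookup against a hashtable lookup, which is not one of the posted methods. It reports no timings because they depend on the machine and the PowerShell version.

```powershell
# Synthetic, invented data: 500 student names and 2,000 alias lines
$z     = 1..500  | % { 'first{0}.last{0}' -f $_ }
$lines = 1..2000 | % { 'first{0}.last{0}:first{0}.last{0}@studentdomain.com' -f $_ }

# The same names as hashtable keys (PowerShell hashtables ignore case by default)
$lookup = @{}
foreach ($n in $z) { $lookup[$n] = $true }

# Variant 1, modelled on method D: -contains scans the array for every line
$viaArray = {
    $lines | % {
        if ($_ -match '(\w+\.\w+):\1@' -and $z -contains $matches[1]) {
            $_ -replace '(?<=@)student(?=domain)'
        } else { $_ }
    }
}

# Variant 2: identical logic, but with a hashtable lookup instead of -contains
$viaHash = {
    $lines | % {
        if ($_ -match '(\w+\.\w+):\1@' -and $lookup.ContainsKey($matches[1])) {
            $_ -replace '(?<=@)student(?=domain)'
        } else { $_ }
    }
}

# Measure-Command returns a TimeSpan; the output lines are discarded while timing
$tArray = Measure-Command { & $viaArray | Out-Null }
$tHash  = Measure-Command { & $viaHash  | Out-Null }
'-contains: {0:N0} ms, hashtable: {1:N0} ms' -f $tArray.TotalMilliseconds, $tHash.TotalMilliseconds
```

For a fair comparison with the posted methods, the same harness can wrap each of methods A-D and Keith's approach against copies of the real files.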
