|
|
Since last Sunday (the day before Memorial Day), we have been getting a slew of critical errors in our logs, along with lost functionality on our domain:
We have three Domain Controllers at AD 2003 level. We also have two Exchange 2003 boxes (front/back configuration), and a few other boxes. All servers are Win2K3 SP2.
Two of our three Domain Controllers (DC1 & DC2) are essentially running crippled: the shares work, but nothing else does. The third DC (DC3) is functioning properly, possibly because it is on a separate subnet, although replication is not working.
When we look in the event logs, there are similar errors on both DC1 & DC2.
In the Application Event log, there is:
Event ID 1030 (Windows cannot query for the list of Group Policy objects. Check the event log for possible messages previously logged by the policy engine that describes the reason for this.) three times at five minute intervals.
Event ID 1053 (Windows cannot determine the user or computer name. (Not enough storage is available to complete this operation. ). Group Policy processing aborted. ) every five minutes until reboot. This server also has more than 30GB of space on each partition.
The directory service Event log has a few Event ID 1308 warnings (The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate with the following domain controller has consistently failed.
Attempts: 2 Domain controller: CN=NTDS Settings,CN=***SRVR,CN=Servers,CN=Default-First-Site- Name,CN=Sites,CN=Configuration,DC=***,DC=org Period of time (minutes): 124
The Connection object for this domain controller will be ignored, and a new temporary connection will be established to ensure that replication continues. Once replication with this domain controller resumes, the temporary connection will be removed.
Additional Data Error value: 14 Not enough storage is available to complete this operation.)
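As a side note on the error value above: 14 and 1256 (which shows up in the DCDiag report further down) are standard Win32 error codes. The small lookup below is only a sketch. The symbolic names come from winerror.h, while the `describe` helper and the table layout are illustrative assumptions. Code 14 is ERROR_OUTOFMEMORY, so the "storage" in the message may be referring to a memory-type resource rather than disk space, which would fit with the partitions having plenty of room.

```python
# Win32 error codes that appear in this post, with their winerror.h names.
# The table and helper are illustrative; only the code/name pairs are standard.
WIN32_ERRORS = {
    14: ("ERROR_OUTOFMEMORY",
         "Not enough storage is available to complete this operation."),
    1256: ("ERROR_HOST_DOWN",
           "The remote system is not available."),
}


def describe(code):
    """Return 'code = NAME: message' for a known code, or a fallback string."""
    if code in WIN32_ERRORS:
        name, message = WIN32_ERRORS[code]
        return f"{code} = {name}: {message}"
    return f"{code} = (not in this table)"


if __name__ == "__main__":
    for code in (14, 1256):
        print(describe(code))
```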
The FRS Event log has a few 13508 and 13509 errors.
The DNS log is where the real action is:
Event ID 404: (The DNS server could not bind a Transmission Control Protocol (TCP) socket to address 0.0.0.0. The event data is the error code. An IP address of 0.0.0.0 can indicate a valid "any address" configuration in which all configured IP addresses on the computer are available for use. Restart the DNS server or reboot the computer.)
Event ID 408 (The DNS server could not open socket for address 0.0.0.0. Verify that this is a valid IP address for the server computer. If it is NOT valid use the Interfaces dialog under Server Properties in the DNS Manager to remove it from the list of IP interfaces. Then stop and restart the DNS server. (If this was the only IP interface on this machine and the DNS server may not have started as a result of this error. In that case remove the DNS\Parameters\ListenAddress value in the services section of the registry and restart.) If this is a valid IP address for this machine, make sure that no other application (e.g. another DNS server) is running that would attempt to use the DNS port.)
Event ID 4015 (The DNS server has encountered a critical error from the Active Directory. Check that the Active Directory is functioning properly. The extended error debug information (which may be empty) is "". The event data contains the error.)
Event ID 4004 (The DNS server was unable to complete directory service enumeration of zone .. This DNS server is configured to use information obtained from Active Directory for this zone and is unable to load the zone without it. Check that the Active Directory is functioning properly and repeat enumeration of the zone. The extended error debug information (which may be empty) is "". The event data contains the error.)
Event ID 140 (The DNS server could not initialize the remote procedure call (RPC) service. If it is not running, start the RPC service or reboot the computer. The event data is the error code.) This one appeared after attempting to restart the DNS Server service a few times.
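Events 404 and 408 both describe a failed TCP bind on 0.0.0.0. One way to check, independently of the DNS service, whether the machine can bind a TCP socket at all is a small probe like the sketch below. It assumes a Python interpreter is available on the DC, and `try_bind` is a hypothetical helper name. For the result on port 53 to mean anything, the DNS Server service would have to be stopped first, otherwise the port is legitimately in use.

```python
import socket


def try_bind(address, port):
    """Try to bind a TCP socket to (address, port); return (ok, detail)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return True, "bind succeeded"
    except OSError as exc:
        # The OS error text says whether the port is in use, access was
        # denied, or the system could not allocate the socket resources.
        return False, f"bind failed: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # 0.0.0.0 is the "any address" wildcard mentioned in Event ID 404;
    # 53 is the DNS port. Port 0 asks the OS for any free port.
    for port in (53, 0):
        ok, detail = try_bind("0.0.0.0", port)
        print(f"0.0.0.0:{port} -> {detail}")
```

If even the port 0 probe fails, the problem is probably not specific to DNS.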
Our server has been running without modification for a few months, with the exception of the monthly patches released on the second Tuesday of the month. In an attempt to resolve this, we removed the second IP address from the server and changed the binding order of the network cards to put the active one at the top of the list (although all of the cards but one are disabled).
DC1 runs a DNS server, and its IP configuration points at itself, then at DC3. No DNS forwarding is configured. DC2 points at DC1 and DC3 for DNS.
We first noticed the issue because some of our users with server-based (AD-published) printers lost functionality. Other symptoms we have noticed: the Backup Exec server cannot reach the remote agent on the domain controllers. From the impaired machines, we can resolve FQDNs and computer names to IP addresses (e.g. we can ping), possibly through DC3 even though we pointed nslookup at DC1, but we cannot browse the network, load any web page in IE, print, or use LiveUpdate (Symantec AV).
DCDiag.exe reports about the same for DC1 and DC2 (DC2 does not give the errors about missing printer drivers). Here is the report from DC1:
Domain Controller Diagnosis
Performing initial setup: Done gathering initial info.
Doing initial required tests
Testing server: Default-First-Site-Name\DC1SRVR
   Starting test: Connectivity
      ......................... DC1SRVR passed test Connectivity
Doing primary tests
Testing server: Default-First-Site-Name\DC1SRVR
   Starting test: Replications
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC2SVR to DC1SRVR
         Naming Context: CN=Schema,CN=Configuration,DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 14:48:53.
         The last success occurred at 2007-05-30 17:48:53.
         21 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC3SRVR to DC1SRVR
         Naming Context: CN=Schema,CN=Configuration,DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 14:48:53.
         The last success occurred at 2007-05-30 17:48:53.
         21 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC2SVR to DC1SRVR
         Naming Context: CN=Configuration,DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 14:48:53.
         The last success occurred at 2007-05-30 17:48:53.
         24 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC3SRVR to DC1SRVR
         Naming Context: CN=Configuration,DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 14:57:01.
         The last success occurred at 2007-05-30 17:48:53.
         50 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC2SRVR to DC1SRVR
         Naming Context: DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 14:48:53.
         The last success occurred at 2007-05-30 17:56:43.
         480 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC3SRVR to DC1SRVR
         Naming Context: DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 15:31:09.
         The last success occurred at 2007-05-30 17:56:37.
         2642 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC3SRVR to DC1SRVR
         Naming Context: DC=ForestDnsZones,DC=xyz,DC=org
         The replication generated an error (1256):
         The remote system is not available. For information about network troubleshooting, see Windows Help.
         The failure occurred at 2007-05-31 14:48:53.
         The last success occurred at 2007-05-30 17:48:53.
         21 failures have occurred since the last success.
      [Replications Check,DC1SRVR] A recent replication attempt failed:
         From DC3SRVR to DC1SRVR
         Naming Context: DC=DomainDnsZones,DC=xyz,DC=org
         The replication generated an error (14):
         Not enough storage is available to complete this operation.
         The failure occurred at 2007-05-31 14:57:07.
         The last success occurred at 2007-05-30 17:48:53.
         36 failures have occurred since the last success.
      REPLICATION-RECEIVED LATENCY WARNING
      DC1SRVR: Current time is 2007-05-31 15:31:21.
         CN=Schema,CN=Configuration,DC=xyz,DC=org
            Last replication recieved from DC2SRVR at 2007-05-30 17:48:53
            Last replication recieved from DC3SRVR at 2007-05-30 17:48:53.
         CN=Configuration,DC=xyz,DC=org
            Last replication recieved from DC2SRVR at 2007-05-30 17:48:52
            Last replication recieved from DC3SRVR at 2007-05-30 17:48:53.
         DC=xyz,DC=org
            Last replication recieved from DC2SRVR at 2007-05-30 17:56:43
            Last replication recieved from DC3SRVR at 2007-05-30 17:56:37.
         DC=ForestDnsZones,DC=xyz,DC=org
            Last replication recieved from DC3SRVR at 2007-05-30 17:48:53.
         DC=DomainDnsZones,DC=xyz,DC=org
            Last replication recieved from DC3SRVR at 2007-05-30 17:48:53.
      ......................... DC1SRVR passed test Replications
   Starting test: NCSecDesc
      ......................... DC1SRVR passed test NCSecDesc
   Starting test: NetLogons
      ......................... DC1SRVR passed test NetLogons
   Starting test: Advertising
      ......................... DC1SRVR passed test Advertising
   Starting test: KnowsOfRoleHolders
      ......................... DC1SRVR passed test KnowsOfRoleHolders
   Starting test: RidManager
      ......................... DC1SRVR passed test RidManager
   Starting test: MachineAccount
      ......................... DC1SRVR passed test MachineAccount
   Starting test: Services
      ......................... DC1SRVR passed test Services
   Starting test: ObjectsReplicated
      ......................... DC1SRVR passed test ObjectsReplicated
   Starting test: frssysvol
      ......................... DC1SRVR passed test frssysvol
   Starting test: frsevent
      ......................... DC1SRVR passed test frsevent
   Starting test: kccevent
      ......................... DC1SRVR passed test kccevent
   Starting test: systemlog
      An Error Event occured. EventID: 0x00000457
         Time Generated: 05/31/2007 14:37:40
         Event String: Driver BB PostScript Creator required for printer
      An Error Event occured. EventID: 0x00000457
         Time Generated: 05/31/2007 14:37:40
         Event String: Driver Microsoft XPS Document Writer required for
      An Error Event occured. EventID: 0x00000457
         Time Generated: 05/31/2007 14:37:42
         Event String: Driver
      An Error Event occured. EventID: 0x00000457
         Time Generated: 05/31/2007 14:37:42
         Event String: Driver FinePrint pdfFactory required for printer
      An Error Event occured. EventID: 0x00000457
         Time Generated: 05/31/2007 14:37:42
         Event String: Driver HP Color LaserJet 5500 PCL 6 required for
      An Error Event occured. EventID: 0x00000423
         Time Generated: 05/31/2007 14:53:15
         (Event String could not be retrieved)
      An Error Event occured. EventID: 0x00000423
         Time Generated: 05/31/2007 14:53:15
         (Event String could not be retrieved)
      An Error Event occured. EventID: 0xC0001F60
         Time Generated: 05/31/2007 15:03:40
         Event String: The browser service has failed to retrieve the
      ......................... DC1SRVR failed test systemlog
   Starting test: VerifyReferences
      ......................... DC1SRVR passed test VerifyReferences
Running partition tests on : ForestDnsZones
   Starting test: CrossRefValidation
      ......................... ForestDnsZones passed test CrossRefValidati
   Starting test: CheckSDRefDom
      ......................... ForestDnsZones passed test CheckSDRefDom
Running partition tests on : DomainDnsZones
   Starting test: CrossRefValidation
      ......................... DomainDnsZones passed test CrossRefValidati
   Starting test: CheckSDRefDom
      ......................... DomainDnsZones passed test CheckSDRefDom
Running partition tests on : Schema
   Starting test: CrossRefValidation
      ......................... Schema passed test CrossRefValidation
   Starting test: CheckSDRefDom
      ......................... Schema passed test CheckSDRefDom
Running partition tests on : Configuration
   Starting test: CrossRefValidation
      ......................... Configuration passed test CrossRefValidatio
   Starting test: CheckSDRefDom
      ......................... Configuration passed test CheckSDRefDom
Running partition tests on : xyz
   Starting test: CrossRefValidation
      ......................... xyz passed test CrossRefValidation
   Starting test: CheckSDRefDom
      ......................... xyz passed test CheckSDRefDom
Running enterprise tests on : xyz.org
   Starting test: Intersite
      ......................... xyz.org passed test Intersite
   Starting test: FsmoCheck
      ......................... xyz.org passed test FsmoCheck
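Since the Replications section of the report is long, a short script can condense it to one line per failing link. The sketch below is only an illustration: `parse_failures` and the regular expression are assumptions written against the wording of the report in this post, not part of DCDiag, and the sample text contains two of the entries above.

```python
import re
from collections import Counter

# Two entries copied from the DC1 report in this post.
SAMPLE = """
[Replications Check,DC1SRVR] A recent replication attempt failed:
   From DC3SRVR to DC1SRVR
   Naming Context: DC=xyz,DC=org
   The replication generated an error (14):
   Not enough storage is available to complete this operation.
   The failure occurred at 2007-05-31 15:31:09.
   The last success occurred at 2007-05-30 17:56:37.
   2642 failures have occurred since the last success.
[Replications Check,DC1SRVR] A recent replication attempt failed:
   From DC3SRVR to DC1SRVR
   Naming Context: DC=ForestDnsZones,DC=xyz,DC=org
   The replication generated an error (1256):
   The remote system is not available. For information about network
   troubleshooting, see Windows Help.
   The failure occurred at 2007-05-31 14:48:53.
   The last success occurred at 2007-05-30 17:48:53.
   21 failures have occurred since the last success.
"""

# Matches one failure entry after all whitespace has been collapsed.
FAILURE = re.compile(
    r"From (\S+) to (\S+) Naming Context: (\S+) "
    r"The replication generated an error \((\d+)\): "
    r".*? (\d+) failures have occurred since the last success"
)


def parse_failures(report):
    """Return (source, target, naming_context, error_code, failure_count) tuples."""
    flat = " ".join(report.split())  # tolerate wrapped or flattened output
    return [
        (m.group(1), m.group(2), m.group(3), int(m.group(4)), int(m.group(5)))
        for m in FAILURE.finditer(flat)
    ]


if __name__ == "__main__":
    failures = parse_failures(SAMPLE)
    for source, target, context, code, count in failures:
        print(f"{source} -> {target}  {context}  error {code}  x{count}")
    print("entries per error code:", dict(Counter(f[3] for f in failures)))
```

Because whitespace is collapsed before matching, the same function should work on the report whether it is pasted with its line breaks or as one run of text.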
Unfortunately, searching for this combination of error messages does not reveal anyone post SP2 with this issue.
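For anyone searching on these, note that DCDiag prints the systemlog EventIDs in hexadecimal, while Event Viewer shows decimal IDs. The low 16 bits of the value are the event code, and the upper bits carry severity and facility flags. A minimal conversion sketch follows (`decode_event_id` is just an illustrative helper name):

```python
def decode_event_id(raw):
    """Return the decimal event ID held in the low 16 bits of a hex EventID."""
    return int(raw, 16) & 0xFFFF


if __name__ == "__main__":
    # The three distinct EventIDs from the systemlog test in the report.
    for raw in ("0x00000457", "0x00000423", "0xC0001F60"):
        print(raw, "->", decode_event_id(raw))
```

For the three IDs in the report this gives 1111, 1059 and 8032.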
Any suggestions are appreciated
|
|
Note: Rebooting the domain controllers fixes most of the symptoms (printing, DNS MMC, internet, default Exchange recipient policy) for a day or two, although some things, like the MMC Group Policy snap-ins, still don't work (big red X).
On May 31, 5:43 pm, "amcdi...[ at ]gmail.com" <amcdi...[ at ]gmail.com> wrote:
[Quoted Text] > (full text of the original post, as above)
|
|
|