Hard Disk Corruption

July 21, 2010 at 11:24:27
Specs: Windows Server 2003/Arch Linux
Hello all,

I've been having a hardware problem of some kind that appears to manifest itself as hard disk corruption. This has been happening sporadically for the past few months but has recently begun to occur more frequently.

About one in every three times I turn on my computer, Windows reports that it cannot boot because a file is "missing or corrupt". That file is either the kernel (ntoskrnl.exe), ntfs.sys or the SYSTEM hive. I simply restart the computer and it will usually boot normally the next time. Until a few weeks ago, once the computer had booted into Windows, everything functioned perfectly. Recently, error messages have started to appear about services being unable to start, and certain applications have been crashing very often. Linux has generally been more stable than Windows, but recently it has also begun to kernel panic at boot with errors in the libata module.

This all points to hard disk issues, so I have done the following. I have run Western Digital's Data Lifeguard tool from a bootable CD, and its extended test did not find any bad sectors or other problems with the hard disk. All SMART tools I have tried report that the hard disk is operating properly. On the Windows side, I have run chkdsk a number of times and it has never found any corruption. I have also run fixboot/fixmbr from the installation disc, to no avail, and I have defragmented the Windows partition a number of times.

As for hardware, there is no special setup here (i.e. no RAID). It is simply a single SATA hard drive plugged directly into the motherboard. I have tried different SATA ports on the motherboard, to no effect. Jumpers are set properly. I have also updated to the latest BIOS available for my motherboard.

This simply has me stumped! What could cause these problems during boot that nothing is able to detect? Of course, I am willing to reformat, but I would like to hear some other suggestions as well, since I believe the issue is likely to return. The presence of problems in Linux as well as in Windows suggests that it is unlikely that anything (such as a virus) has actually overwritten data on the hard drive. The fact that the system (in most cases) operates properly once it boots also suggests that the data isn't actually corrupt. I'd like to take care of any potential hardware issues before the warranty expires.

System Specs:
Motherboard: Gigabyte GA-EP45T-UD3P
CPU: Intel Core 2 Quad Q9400 2.66GHz
Memory: 4GB DDR3
HDD: Western Digital 640GB SATA (WD6401AALS)

No overclocking/modification of any kind has been done.

Thanks in advance.


July 21, 2010 at 11:43:32
It could be a memory problem. Have you tested your RAM?

A single obscure error in memory can cause corrupted data to be written to the hard disk.

Run Memtest86 overnight and see if it comes up with any errors.

It could also be a PSU problem if the PSU is operating on the ragged edge of its capabilities.



July 21, 2010 at 11:52:28
Thanks for the reply. I've updated system specs with the PSU. I think my 550W Corsair should be able to handle this system without a hitch. I will try memtest86+ today and report back.


July 21, 2010 at 12:34:45
Try a few of the memory tests as well. Microsoft's memory test, found on the Vista and Windows 7 DVDs, seems to be OK. You might also get the Ultimate Boot CD and run the hard drive diagnostics, or get the OEM diagnostics.

I support the 'Everybody Draw Mohammed Day'. A religion doesn't deny my freedom.


July 21, 2010 at 12:55:01
Check your SATA data cables. The connector on each end should "latch" into the socket on the drive and on the mboard (or on the drive controller card). It should not move when you merely brush your hand against it near the socket; if it does, mere vibration can cause a poor connection. Use another SATA data cable that does "latch", or tape the connector in place.
(There is a slight projection or bump on one side of the outside of the connector that "latches" it into the socket. It's easily broken off or damaged.)

The same thing applies for the SATA power connection.

Ram that has worked fine previously can develop a poor connection in the ram slot(s).

If you get ram errors when you run memtest86: remove the ram, wipe off its contacts with a tissue or soft cloth (don't touch the contacts with your fingers after that), and reinstall it. Make sure its notch lines up with the bump in the bottom of the slot, that it's all the way down in its slot, and that the latches are against the ends of the module(s). Then run the long version of the memtest86 tests again.

Some ram manufacturers' modules do not strictly adhere to the JEDEC standards that most mboard bioses use to determine ram settings.
In that case, the ram settings the bios has automatically chosen in bios Setup may not be correct.
Check the ram settings in your bios (the ram voltage and the ram timing numbers); they should match the specs for the modules themselves. The ram voltage and timing numbers are often printed on the label on the modules.

If the voltage or timing settings in the bios differ from the specs for the ram, change them in the bios. The timing numbers must match the specs as closely as you can get, or be slower (higher numbers = slower); you won't notice the difference the slower settings make.
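The "same or slower" rule can be checked field by field. This is a minimal sketch, assuming timings are written in the usual CL-tRCD-tRP-tRAS order; the function names are hypothetical, and the example values are the 8-8-8-21 rating and the 9-9-9-24 auto setting that come up later in this thread.

```python
def parse_timings(text):
    """Turn a timing string such as "8-8-8-21" into a list of ints."""
    return [int(part) for part in text.split("-")]


def timings_are_safe(configured, rated):
    """True if every configured timing equals or is slower (higher) than the rated value."""
    if len(configured) != len(rated):
        raise ValueError("timing lists must have the same number of fields")
    return all(c >= r for c, r in zip(configured, rated))


rated = parse_timings("8-8-8-21")       # printed on the module label / spec sheet
bios_auto = parse_timings("9-9-9-24")   # what the bios chose automatically
print(timings_are_safe(bios_auto, rated))                  # True
print(timings_are_safe(parse_timings("7-8-8-21"), rated))  # False: CL7 is faster than rated
```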

If you have a mix of different modules:
- don't mix ram that is specified for different voltages. With "by spd" or similar (the default settings), the bios will force the ram to use the lowest voltage, and ram specified for a higher voltage is more likely to not work properly in that situation.
- the bios settings must be those for the slowest timings of all the modules, or slower (higher numbers = slower).
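For a mix of modules, "the slowest timings of all the modules" means taking the highest value of each field separately. A minimal sketch with hypothetical function names and two hypothetical modules:

```python
def combined_timings(modules):
    """Slowest (highest) value of each timing field across all installed modules."""
    return [max(field) for field in zip(*modules)]


def mixed_voltages(voltages):
    """True if the modules are specified for different voltages (the mix to avoid)."""
    return len(set(voltages)) > 1


# Two hypothetical modules with different ratings
modules = [[7, 7, 7, 20], [9, 8, 9, 24]]
print(combined_timings(modules))     # [9, 8, 9, 24]
print(mixed_voltages([1.5, 1.65]))   # True
```

Note that the result can be a combination neither module is labeled with, because each field is taken from whichever module is slower in that field.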


July 21, 2010 at 19:10:40
After a few hours of memtest86+, no errors have been found.

I did not make any changes to the default motherboard settings for RAM since I specifically did not want to overclock it and I assumed that the default settings would be "safe".

This is the memory I have: http://www.newegg.com/Product/Produ...

The memory tab of CPU-Z reports the following configuration from within Windows:

Operating in symmetric, dual-channel mode.
DRAM Frequency: 666.7MHz
CAS Latency: 9
RAS-to-CAS: 9
RAS Precharge: 9
Cycle time: 24

The specifications tab on Newegg lists 8-8-8-21 so these settings should be safe, no?
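The CPU-Z frequency and the module's model number can be tied together with a little arithmetic. A minimal sketch, assuming a standard 64-bit DDR3 module: DDR transfers data twice per clock, so a 666.7 MHz DRAM clock gives about 1333 MT/s (DDR3-1333), and 8 bytes per transfer gives about 10666 MB/s, which appears to be the "10666" in the F3-10666CL8D model name. The function name `ddr_ratings` is hypothetical.

```python
def ddr_ratings(dram_clock_mhz, bus_width_bits=64):
    """Return (MT/s, MB/s) for a DDR module running at the given DRAM clock."""
    transfers = dram_clock_mhz * 2               # double data rate: two transfers per clock
    bandwidth = transfers * bus_width_bits / 8   # bytes per transfer times transfers, in MB/s
    return transfers, bandwidth


clock = 2000 / 3                  # CPU-Z rounds this to 666.7 MHz
mt_s, mb_s = ddr_ratings(clock)
# The module names drop the fractional part
print(f"DDR3-{int(mt_s)} / PC3-{int(mb_s)}")   # DDR3-1333 / PC3-10666
```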

XMP is also mentioned in the SPD tab but I'm not sure what that is.

I opened the case and re-connected the SATA cables. There was an audible click so I am certain that a firm connection has been made.

More ideas?


July 21, 2010 at 20:24:51
Ah!
G.Skill ram. We've sometimes heard of problems with G.Skill ram.
G.Skill is one of what I call the "also-ran" module makers.
The Mushkin web site says some such module makers rate their ram specs with one module by itself in a ram slot, and they may not actually try their ram in all the mboard models they list their modules for. One module may work fine by itself, but when more than one of their modules is installed, the rated timing numbers may not stand up to real-world conditions.
You could try:
- tweaking the timing numbers in the bios so they're a bit higher (slower).
- if that doesn't help, raising the ram voltage a bit in the bios: no more than 0.1 V, and maybe no more than half that for DDR3 ram.
(Kingston gives the max voltage range that can be used for many of their modules in their spec sheets, but I have found no such thing for the G.Skill modules. Some mboards actually produce ram voltages that are a tiny bit lower than what the bios says the ram voltage is.) Don't raise the ram voltage if you'd rather not.
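The voltage rule of thumb above works out to a small window. A minimal sketch, assuming the 1.5 V spec mentioned for this kit and the "+0.1 V, about half that for DDR3" limit from this post (a rule of thumb, not a manufacturer figure; check the module's own spec sheet if it has one). The function names are hypothetical.

```python
def max_ram_voltage(spec_v, ddr3=True):
    """Upper limit from this thread's rule of thumb: +0.1 V, about half that for DDR3."""
    return spec_v + (0.05 if ddr3 else 0.1)


def voltage_setting_ok(setting_v, spec_v, ddr3=True, tol=1e-9):
    """True if a bios voltage setting lies between the spec and the rule-of-thumb cap."""
    return spec_v - tol <= setting_v <= max_ram_voltage(spec_v, ddr3) + tol


print(round(max_ram_voltage(1.5), 2))   # 1.55
print(voltage_setting_ok(1.54, 1.5))    # True
print(voltage_setting_ok(1.60, 1.5))    # False
```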

G.Skill has a ram configurator with which you can look up which of their modules are listed for your mboard model, but it lists far fewer mboard models than many other web sites where you can look up which ram is compatible. If the lists are missing your mboard model, you have to wonder whether they have actually tested the ram in your particular model.

"Model F3-10666CL8D-4GBHK"

"Gigabyte GA-EP45T-UD3P"

In this case, G.Skill does not list your exact model, but it does list the GA-EP45T-UD3T, and for that it lists the F3-10666CL8D-4GBHK.
It uses 1.5 V.


July 21, 2010 at 20:37:37
Sorry, you have me a bit confused. CPU-Z lists the JEDEC tables and 9-9-9-24 @ 666MHz is an option. Are you saying that it may not be valid because I have two modules installed? I wasn't able to find the GA-EP45T-UD3P on G-Skill's website but that doesn't mean that it won't work either. Or am I missing something?

But in either case, since memtest86+ doesn't find any problems and since the system seems to have run smoothly (when it booted up) can't we conclude that memory is not the likely culprit?


July 21, 2010 at 22:34:35
CPU-Z lists the info stored on the tiny SPD (Serial Presence Detect) chip near the top of one end of each module. That info is entered by the module's manufacturer and may or may not conform to standard JEDEC specs; it's up to the manufacturer. Most mboard bioses DO conform to standard JEDEC specs. By default, the bios will use the least (slowest) specs of the SPD info on all the modules. If the modules are identical, the settings the bios uses will usually match the specs the modules are rated for on the SPD chip, but not always.

The issue with the "also-ran" module manufacturers is that they sometimes rate the timings faster than more than one of their modules will support reliably. When you run a memory test, the modules may not be subjected to tests that produce errors, but they might produce errors under certain conditions, causing tiny amounts of random data damage, such as while playing a recent game or using some other program that stresses the computer a lot more. If you manually raise the timing numbers of the ram a bit (higher = slower), you may find that your problem has gone away.

The issue with the ram voltage is that some mboards, in the real world, produce a ram voltage that is a bit lower than stated in the bios. A bit more voltage is also frequently used to make the ram more stable when the mboard is overclocked, so it wouldn't hurt to try that to improve stability on a non-overclocked system, as long as you don't exceed the voltage range the ram can use without damage.

"I wasn't able to find the GA-EP45T-UD3P on G-Skill's website but that doesn't mean that it won't work either. Or am I missing something?"

As I pointed out above, there is a listing for a very similar model - the only difference is the letter at the end. Very similar models often have the same main chipset and can use the same ram.
On the other hand, when there are no listings for your mboard model, or even for similar models that use the same main chipset, there's no guarantee the ram will work properly with your mboard model. Usually it will, but it might not. For that reason it's not a good idea to buy ram, or use ram you have lying around, without making sure it's listed for your mboard model somewhere. Sure, you could use memtest86 or similar to test all ram you install, but it's a pain in the axx to have to send back or take back ram that isn't 100% compatible with your mboard.
In the worst cases of incompatible ram, the mboard WILL NOT BOOT all the way with the module(s) installed, and often the mboard will not beep either. The mboard appears to be dead or damaged.
It's much more common for incompatible ram to produce ram errors, especially when more than one module is installed.


July 22, 2010 at 11:55:18
I see. What are valid timing values to use? Can I simply increase each number by 1 (to 10-10-10-25) or is there a need to take more care?

I agree that memtest86+ is not always perfect. Just a few days ago I had a computer which was constantly crashing for which memtest86+ was not able to find any problems but which magically fixed itself when the RAM was replaced.

However, this computer is generally stable once it boots. Isn't the problem more likely to be with some sort of SATA settings in the BIOS? An addendum to my problem is that often times the system will not boot when there is a USB hard drive plugged in. I don't mean that I get the corruption error, it simply stops on the POST screen and the power has to be pulled.


July 22, 2010 at 12:09:15
It's not a memory issue (I'm 99% sure of this); it sounds like a chipset or CPU defect or problem. Check your Windows system log file.

I have seen this time and time again; it's always been either the CPU or the chipset on the board going flaky.

PowerMac 9600(1 ghz G4)
512mb RAM
50gb SCSI
ATi 9200 PCI


July 22, 2010 at 12:39:09
Run the hard drive diags.


July 22, 2010 at 12:46:00
Outlander, the "System" log under Event Viewer has some errors, but they mostly pertain to certain system services, .NET, bad blocks on CDs, etc. There is nothing in there to indicate hardware failure. Is there some other way to check this?

jefro, as I wrote above, I've run MANY hard drive diagnostics (WD Data Lifeguard, SMART tools, chkdsk, etc.). Do you have another suggestion?


July 22, 2010 at 14:53:05
"What are valid timing values to use?"

Anything a bit higher than G.Skill specifies.

"Can I simply increase each number by 1 (to 10-10-10-25) or is there a need to take more care?"

Sure, try that.

Increasing the timing numbers can't do any harm, and it's extremely hard to tell the difference the higher (slower) settings make.
E.g. the first timing number is the CAS rating; that only comes into effect when the ram is first accessed, not while it continues to be accessed.
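To put "extremely hard to tell the difference" in numbers: CAS latency is counted in DRAM clock cycles, and one cycle at the 666.7 MHz clock CPU-Z reported lasts 1.5 ns. A minimal sketch converting cycles to nanoseconds (the function name is hypothetical):

```python
def cas_latency_ns(cas_cycles, dram_clock_mhz):
    """Convert a CAS latency in clock cycles to nanoseconds."""
    return cas_cycles / dram_clock_mhz * 1000   # one cycle lasts 1000 / MHz ns


clock = 2000 / 3   # 666.7 MHz DRAM clock reported by CPU-Z (DDR3-1333)
for cl in (8, 9, 10):
    print(f"CL{cl}: {cas_latency_ns(cl, clock):.1f} ns")
# CL8: 12.0 ns
# CL9: 13.5 ns
# CL10: 15.0 ns
```

Going from 9 to 10 therefore adds about 1.5 ns to the start of each access.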

Decreasing the timing numbers lower (lower = faster) than the manufacturer specifies can certainly cause ram errors.

I usually use the Microsoft memory diagnostics.
There was one oddball PC133 256MB(?) Kingston module I had (Kingston doesn't list it) that tested fine by itself and tested fine with other modules installed along with it, yet when it was installed along with other modules, the combo produced all sorts of errors in Windows. The other modules had no problem in Windows when that one oddball module was not installed.
I also had one Infineon DDR400(?) 256MB module that tested fine by itself, but memory errors were produced when it was installed with any of the other various DDR modules I had on hand (all of them use standard ram voltages), yet the other modules tested fine when that Infineon module was not installed. I finally got rid of it last week when I found it worked fine in combination with a 500MB module on a computer I was working on.

It should not matter at all what SATA settings you use in the bios - that should have no effect on whether you get errors in Windows.

"An addendum to my problem is that often times the system will not boot when there is a USB hard drive plugged in."

Strange things can happen if the external hard drive can't get enough current from the USB port it's plugged into, etc.

Troubleshooting USB device problems including for flash drives, external drives, memory cards.
See Response 1:

Check that out first.

Rarely, not all the ports on the back of a desktop case may be able to supply 500mA each.

If you have a desktop computer, note that I recently answered a topic on this site where a guy had an external drive, which does require the full 500mA, connected to a port on the back of a desktop case. It would not work properly when a webcam was in the port next to it, but it worked fine when the webcam was unplugged. On the back of a desktop case, the two ports stacked one above the other are often connected to the same USB controller module; you could try connecting the drive's cable to one of those and leaving the other unused.
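The webcam case above is a current-budget problem. A minimal sketch, assuming the two stacked ports share one supply: the 500 mA per-port figure is the USB 2.0 limit for a high-power device, but the shared budget and the webcam's draw below are hypothetical numbers chosen for illustration, and real boards vary. The function name is also hypothetical.

```python
USB2_MAX_PORT_MA = 500   # USB 2.0 limit for a high-power device on one port


def check_shared_supply(draws_ma, shared_budget_ma):
    """Return (fits, total_ma) for devices that share one supply budget."""
    for draw in draws_ma:
        if draw > USB2_MAX_PORT_MA:
            raise ValueError(f"{draw} mA exceeds what a single USB 2.0 port should supply")
    total = sum(draws_ma)
    return total <= shared_budget_ma, total


# Hypothetical: external drive at the full 500 mA plus a webcam at 150 mA,
# on two stacked ports that share a (hypothetical) 500 mA budget.
print(check_shared_supply([500, 150], shared_budget_ma=500))   # (False, 650)
print(check_shared_supply([500], shared_budget_ma=500))        # (True, 500)
```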

In two recent posts I've answered, the problem was the USB cable used between the external drive and the computer: it was inadequate. In one of the two, he was trying to boot from a USB optical drive with an operating system CD or DVD, and the bios would stall forever.

By the way, you can't boot an existing Windows 2000 or later operating system installation from an external hard drive because of Microsoft defaults, but you do get an error message.
