In the IT world, I just won the lottery twice…
Today we had a hard drive fail in a RAID array. A few hours later, we had another drive fail on a different server. We immediately started checking all the environmental logs and systems. Humidity was at 40%, temperature at 75°F, and there was no dust or other particles flying around. We checked the UPS logs and no spikes were reported. About 3 hours later, another hard drive failed on a third system…
The three servers are HP DL380 G7s with sequential serial numbers. I bet the array controllers and motherboards are from the same batch, but the drives are not. HP will be out in the morning. In the meantime, we hope this does not become a habit. We have had 1 drive failure in the entire server rack in 2.5 years, and now 3 in 12 hours today!
What else should we look for? Has anyone else had a similar problem?
Any help is greatly appreciated. This incident has used up our spare drives. If another one fails, we will have to go to HP for a replacement.
Update: The failed drives are 146 GB 10k rpm SAS drives and a 300 GB 10k rpm SAS drive. All are HP originals.
These things happen… though I am surprised to see it at that scale on the same equipment.
You did the right thing by checking for environmental, ESD, temperature and power issues.
On ProLiant DL380 G7 units, the array controller is embedded on the system board. Lot numbers are not that tight there. I don't think it's anything more than a coincidence. However, this may be a good time for some firmware updates, because false drive failures sometimes appear as a symptom of bad firmware revisions.
Since you have support, let HP handle the parts/replacement and move on :)
BTW, it would be helpful to detail the capacity and type (SAS, SATA, Nearline SAS) of the drives involved.
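To put a rough number on the "lottery" feeling, here is a minimal sketch. It assumes failures arrive independently at a constant rate (a Poisson model, which is my assumption and not something stated in the thread) and takes the poster's figure of 1 failure in 2.5 years as the baseline rate for the whole rack. The function and variable names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 365 * 24

def prob_at_least(k, expected):
    """P(N >= k) for a Poisson variable with mean `expected`.

    The upper tail is summed directly, which stays accurate even when
    the probability is tiny.
    """
    return sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(k, k + 60)
    )

# Baseline taken from the question: 1 failure in 2.5 years for the rack.
baseline_failures = 1
baseline_hours = 2.5 * HOURS_PER_YEAR

# Expected number of failures in a 12 hour window at that rate.
window_hours = 12
expected_failures = baseline_failures / baseline_hours * window_hours

# Chance of seeing 3 or more failures in one such window.
p_three = prob_at_least(3, expected_failures)
print(f"expected failures in {window_hours} h: {expected_failures:.6f}")
print(f"P(at least 3 failures): {p_three:.2e}")
```

Under those assumptions the probability comes out on the order of 1e-11 for any given 12 hour window. Real drives do not fail at a constant rate as they age, and a shared cause such as a firmware revision would break the independence assumption, so treat this as a sanity check on the odds and not as a diagnosis.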