Google Doubts Hard Drives Fail Because Of Excessive Temperature, Usage
Feb 17, 2007
Mountain View (CA) - As a company with one of the world's largest IT infrastructures, Google has an opportunity to do more than just search the Internet. From time to time, the company publishes the results of internal research. The most recent one is sure to spark interest, as it explores how and under what circumstances hard drives work, or fail.
A common rule of thumb for replacing hard drives tells customers to move their data to a new drive at least every five years. However, the mechanical nature of hard drives in particular makes these mass storage devices prone to errors, and some drives fail long before they reach the five-year mark. Traditionally, harsh operating conditions are cited as the main causes of hard drive failure, with extreme temperatures and excessive activity being the most prominent.
A Google study presented at this year's Conference on File and Storage Technologies questions these traditional explanations and concludes that many more factors affect the life expectancy of a hard drive, and that predicting failures is much more complex than previously thought. What makes the study interesting is that Google's server infrastructure is estimated to exceed 450,000 fairly mainstream systems, a large share of which use consumer-grade drives with capacities from 80 to 400 GB. According to the company, the project covered "more than 100,000" drives that were put into production in or after 2001. The drives spun at 5400 and 7200 rpm and came from "many of the largest disk drive manufacturers and from at least nine different models."
Google said that it collects "vital information" about all of its systems every few minutes and stores the data for further analysis. This information includes environmental factors (such as temperatures), activity levels and SMART (Self-Monitoring, Analysis and Reporting Technology) parameters, which are commonly considered good indicators of a disk drive's health.
In general, the failure rate of Google's hard drive population increased with drive age. Among drives up to one year old, 1.7% had to be replaced due to failure. The rate jumps to 8% in year 2 and 8.6% in year 3. The failure rate levels out thereafter, but Google believes that the reliability of drives older than 4 years is influenced more by "the particular models in that vintage than by disk drive aging effects."
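The yearly figures are easier to interpret cumulatively. The Python sketch below is a rough illustration and is not the study's method. It assumes that each rate is the chance that a drive still in service fails during that year, and it ignores model and vintage effects. The name `survival_probability` is hypothetical.

```python
from functools import reduce

# Yearly failure rates quoted in the article: year 1, year 2, year 3.
YEARLY_FAILURE_RATES = [0.017, 0.08, 0.086]


def survival_probability(rates):
    """Chance that a drive survives every year listed in `rates`.

    Assumes each rate is the probability of failing during that year,
    given that the drive was still working at the start of the year.
    """
    return reduce(lambda survived, rate: survived * (1.0 - rate), rates, 1.0)


if __name__ == "__main__":
    p = survival_probability(YEARLY_FAILURE_RATES)
    print(f"Estimated 3-year survival: {p:.1%}")
```

Under those assumptions it prints `Estimated 3-year survival: 82.7%`, so roughly one drive in six would fail within its first three years.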
Breaking the data down by utilization level, the Google study shows an interesting result. Only drives six months old or younger show a markedly higher probability of failure when put into a high-activity environment. Once a drive survives its first months, the effect of high usage on failure probability diminishes in years 1, 2, 3 and 4, then becomes significant again in year 5. Google's temperature research found an equally surprising result: "Failures do not increase when the average temperature increases. In fact, there is a clear trend showing that lower temperatures are associated with higher failure rates. Only at very high temperatures is there a slight reversal of this trend," the authors of the study found.
In contrast, the company discovered that certain SMART parameters do appear to be related to drive failures. For example, drives typically scan the disk surface in the background and report errors as they find them. A significant number of scan errors can point to surface defects. Google reports that fewer than 2% of its drives show scan errors, but drives with scan errors turned out to be ten times more likely to fail than drives without them. About 70% of Google's drives with scan errors survived the first eight months after the first scan error was reported.
Similarly, reallocation counts, which record how many faulty sectors have been remapped to spare physical sectors, are closely tied to a hard drive's life expectancy. Google said that drives with one or more reallocations fail more often than those with none. On average, such drives showed failure rates 3 to 6 times higher, while about 85% of them survived past eight months after the first reallocation.
Google discovered similar effects in other SMART categories, but the bottom line is that 56% of all failed drives had no count in any of these categories. In other words, more than half of all failed drives were put out of operation by factors other than scan errors, reallocation counts, offline reallocations and probational counts.
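To watch your own drives for the signals the study found most telling, you can read the raw SMART counters that `smartctl -A` (from smartmontools) prints. The Python sketch below rests on several assumptions. The mapping of the study's categories to attribute IDs 5, 196, 197 and 198 is an approximation. Scan errors have no standard attribute ID, so they are left out. `SAMPLE` is made-up output. Given the 56% figure above, an empty result is no guarantee that a drive is healthy.

```python
import re

# SMART attribute IDs assumed to map onto the study's categories.
# Scan errors have no standard attribute ID, so they are not checked here.
WATCHED_ATTRIBUTES = {
    5: "Reallocated_Sector_Ct",      # sectors already remapped (reallocations)
    196: "Reallocated_Event_Count",  # remap operations performed
    197: "Current_Pending_Sector",   # suspect sectors awaiting remap ("probational")
    198: "Offline_Uncorrectable",    # errors found during offline scans
}


def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} parsed from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])  # raw values may carry extra text
            if match:
                attrs[int(fields[0])] = int(match.group())
    return attrs


def warning_signs(attrs):
    """Describe every watched attribute whose raw count is above zero."""
    return [
        f"{WATCHED_ATTRIBUTES[attr_id]}={attrs[attr_id]}"
        for attr_id in sorted(WATCHED_ATTRIBUTES)
        if attrs.get(attr_id, 0) > 0
    ]


# Made-up sample of `smartctl -A` output, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

if __name__ == "__main__":
    print(warning_signs(parse_smart_attributes(SAMPLE)))
```

For the sample, it flags the 8 reallocated sectors and the 2 offline-uncorrectable sectors. In practice, you would pass it the text of `smartctl -A /dev/sda`.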
In the end, Google's research does not solve the problem of predicting when hard drives are likely to fail. However, it shows that temperature and high usage alone do not necessarily cause failures. The researchers also pointed to a trend they call the "infant mortality phase": a period early in a hard drive's life that shows increased probabilities of failure under certain circumstances. The report lacks a clear-cut conclusion, but the authors indicate that no approach currently looks promising for predicting hard drive failures: "Powerful predictive models need to make use of signals beyond those provided by SMART."
Oct 9, 2009
Resource: Virtual Memory Size
Exceeded: 149 > 100 (MB)
Executable: /usr/bin/php
I've been receiving e-mails about this. How can I fix it?
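lfd's "Virtual Memory Size" check most likely compares a process's VmSize, as reported in /proc/PID/status, with its per-process memory limit. That mapping is an assumption. The Python sketch below lists php processes above 100 MB so you can see which scripts trigger the mail. If the usage is legitimate, the usual fixes are to raise the limit (the `PT_USERMEM` option in csf.conf) or to exempt the executable in csf.pignore. Verify both against your csf version's documentation. The function names are hypothetical.

```python
import os

LIMIT_MB = 100  # threshold reported in the alert above


def vmsize_mb(status_text):
    """Return VmSize in MB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1]) / 1024  # the kernel reports kB
    return None  # kernel threads have no VmSize line


def php_processes_over_limit(exe="/usr/bin/php", limit_mb=LIMIT_MB, proc="/proc"):
    """List (pid, VmSize MB) for processes running `exe` above `limit_mb`."""
    hits = []
    if not os.path.isdir(proc):
        return hits  # not Linux, or /proc not mounted
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            if os.readlink(os.path.join(proc, pid, "exe")) != exe:
                continue
            with open(os.path.join(proc, pid, "status")) as fh:
                size = vmsize_mb(fh.read())
        except OSError:
            continue  # process exited or permission denied
        if size is not None and size > limit_mb:
            hits.append((int(pid), round(size, 1)))
    return hits


if __name__ == "__main__":
    for pid, size in php_processes_over_limit():
        print(f"PID {pid}: {size} MB")
```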
Apr 23, 2009
I'm building a couple of VPS host servers for a client.
Each server has to host 20 VPSes, and each server will have 4 cores and 32GB of RAM. So CPU and RAM should be fine. My question now is about hard drives. The company owns the machines, but not the drives yet.
I searched these forums a lot but found nothing related to VPS. I'm basically a DBA in real life, so I have experience with hard drives for databases, but VPS hosting is completely different.
According to my boss, each VPS will run a LAMP stack (having a separate DB cluster is out of the question for some reason).
First, RAID 1 is a must. There is room for 2x 3.5" drives. I might be able to swap the backplane for a 4x 2.5" one, but I'm not sure...
I've come up with several options:
2x SATA 7.2k => comes to about 140$
2x SATA 10k (velociraptor) => comes to about 500$
2x SAS 10k with PCIe controller => comes to about 850$
2x SAS 15k with PCIe controller => comes to about 1000$
They need at least 300GB storage.
My problem is that the servers do not have onboard SAS, so I would need a controller, and in my case the cheapest solution is best.
But I'm not sure 7.2k SATA drives can handle the load of 20 complete VPSes.
Is it worth going with SAS anyway, or should SATA be fine? And with SATA, is it better to use plain old 7.2k drives or 10k drives?
That's a lot of text for not much: What is best for VPS: SATA 7.2k, SATA 10k or SAS 10k?
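One rough way to compare the options is to estimate random I/O capacity, because 20 LAMP VPSes mostly generate small random reads and writes. The Python sketch below uses the common back-of-the-envelope model of one average seek plus half a rotation per I/O. The seek times and the 70/30 read/write mix are assumptions, not vendor figures. The model also ignores caches, command queuing and controller effects.

```python
# Assumed average seek times in ms; typical spec-sheet ballparks, not measurements.
DRIVES = {
    "SATA 7.2k": (7200, 8.5),
    "SATA 10k": (10000, 4.2),
    "SAS 10k": (10000, 4.2),
    "SAS 15k": (15000, 3.5),
}


def rotational_latency_ms(rpm):
    """Average wait for the sector to come round: half a revolution."""
    return 0.5 * 60_000 / rpm


def random_iops(rpm, seek_ms):
    """Rough random IOPS for one disk: one seek plus average rotational latency per I/O."""
    return 1000 / (seek_ms + rotational_latency_ms(rpm))


def iops_per_vps(rpm, seek_ms, vps=20, read_fraction=0.7):
    """Logical IOPS per VPS on a two-disk RAID 1 mirror.

    Assumes reads are spread evenly over both disks (1 physical I/O each)
    and every write goes to both disks (2 physical I/Os each).
    """
    physical = 2 * random_iops(rpm, seek_ms)
    cost_per_logical_io = read_fraction + 2 * (1 - read_fraction)
    return physical / cost_per_logical_io / vps


if __name__ == "__main__":
    for name, (rpm, seek) in DRIVES.items():
        print(f"{name:10} ~{random_iops(rpm, seek):4.0f} IOPS per disk, "
              f"~{iops_per_vps(rpm, seek):4.1f} IOPS per VPS")
```

With these inputs, the 15k drives give roughly 14 IOPS per VPS against about 6 for 7.2k SATA. The two 10k options come out identical because the model only sees rotation speed and seek time, not the interface.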
Mar 25, 2007
I am about to buy a Compaq server with 6 SCSI hard drives. In your opinion, what is the best RAID configuration for 6 drives?
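With six drives, the trade-off is mostly capacity versus how many drive failures the array can ride out. This small Python sketch tabulates the standard figures for common RAID levels. The function name is hypothetical, and the sketch says nothing about performance, controller support or hot spares.

```python
def raid_summary(level, drives):
    """Return (usable capacity in drives, drive failures always survivable)."""
    if level == "RAID 0":
        return drives, 0
    if level == "RAID 5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return drives - 1, 1  # one drive's worth of parity
    if level == "RAID 6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return drives - 2, 2  # two drives' worth of parity
    if level == "RAID 10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2, 1  # survives more if failures hit different mirror pairs
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10"):
        usable, tolerated = raid_summary(level, 6)
        print(f"{level:7}: {usable} drives usable, survives {tolerated} failure(s)")
```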
Nov 20, 2008
Excessive resource usage: dbus (2015)
I get the alert below from lfd:
Quote:
Time: Sun Sep 28 12:16:06 2008 +0200
Account: dbus
Resource: Process Time
Exceeded: 134303 > 1800 (seconds)
Executable: /bin/dbus-daemon
The file system shows that this executable file that the process is running has been deleted. This typically happens if the original file has been replaced by a new file when the application is updated. To prevent this being reported again, restart the process that runs this excecutable file.
Command Line: dbus-daemon --system
PID: 2015
Killed: No
How can I find which process is running this executable file?
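On Linux, `/proc/<pid>/exe` is a symbolic link to the binary a process is running. The kernel appends ` (deleted)` to the link target once that file has been removed or replaced. The alert above already names the process (PID 2015, `dbus-daemon --system`). The Python sketch below finds every such process, which is useful after a batch of package updates. Run it as root to see all processes. The function names are hypothetical. Restarting the D-Bus system service should clear this alert. On Red Hat-style systems that service is usually named `messagebus`, but check the name on yours.

```python
import os

DELETED_SUFFIX = " (deleted)"  # marker the kernel appends to the link target


def is_deleted_target(link_target):
    """True if a /proc/<pid>/exe target points at a removed or replaced file."""
    return link_target.endswith(DELETED_SUFFIX)


def processes_with_deleted_exe(proc="/proc"):
    """Return sorted (pid, original path) pairs for processes whose binary is gone."""
    results = []
    if not os.path.isdir(proc):
        return results  # not Linux, or /proc not mounted
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(proc, pid, "exe"))
        except OSError:
            continue  # kernel thread, exited process, or no permission
        if is_deleted_target(target):
            results.append((int(pid), target[: -len(DELETED_SUFFIX)]))
    return sorted(results)


if __name__ == "__main__":
    for pid, path in processes_with_deleted_exe():
        print(f"PID {pid} is still running the deleted file {path}")
```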
Jan 7, 2008
Do the old RLX Blade servers use 'mini' hard drives? I can't find an answer anywhere. I seem to recall that they use smaller 2.5" drives. Is this the case?
And, if so, do they make "good" drives worthy of being in a server in that size? Are they essentially just a laptop drive?
Jan 12, 2008
On my CentOS web server I am currently using a 250GB IDE drive.
I just bought 2 Western Digital Raptor WD1500ADFD 150GB 10,000 RPM 16MB hard drives.
Now I am wondering what kind of setup I should use.
Should I use the 250GB drive as a backup drive and put the two Raptors in a RAID 0 array?
What would be the best configuration?
Jul 3, 2007
I am in a bit of trouble. I have five 750GB HDDs that I need to back up to another five 750GB HDDs so I can save the data on them. They are in a Linux box with an LVM setup. I also have a RAID ware card in it, but I'm not using any RAID level on the drives. After finding out what the card can do, I decided to switch the server to Windows 2003 and set up RAID 5/6.
It seems that I will have to give up all my data and have everything wiped off the hard drives, which is very sad for me, but I still have a chance to save the data. So I am thinking of copying it to another set of hard drives and copying it back once the new system is in place.
I was looking at this
[url]
But that's clearly too expensive, as I just need to back up five hard drives (750GB each) and only need to do it once. Does anyone have suggestions on how I should go about this? It doesn't have to be right away, but it's good to know my options.
Is there any place that does this kind of thing and lets you rent their machine for a couple of hours for a fee so you can back up your data? The server is colocated and the hardware is mine, so I have every right to take it out and back it up without any problem from the datacenter.
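Before renting equipment, it helps to estimate how long the copy will take. The Python sketch below assumes a sustained 60 MB/s per drive, which is only a guess for drives of this class, so measure your own hardware. The constant and function names are illustrative.

```python
DRIVES = 5
DRIVE_GB = 750
ASSUMED_MB_PER_S = 60  # sustained copy speed; an assumption, measure your own


def copy_hours(total_gb, mb_per_s):
    """Hours to copy total_gb at mb_per_s (decimal units: 1 GB = 1000 MB)."""
    return total_gb * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    total = DRIVES * DRIVE_GB
    print(f"One drive after another: {copy_hours(total, ASSUMED_MB_PER_S):.1f} h")
    # Copying all five pairs at once only helps if the controller and bus keep up.
    print(f"All five in parallel: {copy_hours(DRIVE_GB, ASSUMED_MB_PER_S):.1f} h")
```

Under that assumption, copying the 3,750 GB one drive after another takes about 17.4 hours. Copying all five pairs at once would take about 3.5 hours, but only if the hardware can sustain five streams.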
Sep 11, 2007
I am getting a new server with two 73GB hard drives, and I need to know the following:
1. I need to put /home on one 73GB drive and the other partitions, such as /boot, /tmp, /usr and /var, on the other drive.
Where should I put /home, on the primary or the secondary drive? Does it affect speed?
2. I am used to servers with one drive. Is there any difference when it comes to security applications such as APF, BFD, mod_security and other applications' settings?
3. In general, should I take the same actions when handling a server with one drive as with a server with two drives?
May 31, 2007
I just added another HDD to my server, and I want one domain to use both hard drives.
How do I set it up so that both HDDs can be used by that one domain? I'm using WHM but can't seem to do it.
Mar 28, 2008
NAS devices don't seem very cheap to me.
They look like just a low-end server with many HDDs,
so why don't people buy a server and put many HDDs in it?
Jan 16, 2008
For those running new servers on 500GB+ hard drives: how do these drives perform once they become 50% full?
Can they really be 50% or more utilized on a cPanel-like server with 200+ accounts?
May 14, 2007
I set up a software RAID 5 the following way:
/dev/sda:
1: /boot 101MB
2: software raid ALL
/dev/sdb
1: software raid ALL
/dev/sdc
1: software raid ALL
/dev/sdd
1: software raid ALL
/dev/md0 (built from all of the software RAID partitions): ext3, mounted as /
I was led to believe this would provide redundancy as long as only one drive is removed from the array. However, when I unplug any one of the hard drives (one at a time), I get input/output errors, and when I try to reboot I get kernel sync errors.
What exactly am I doing wrong when trying to create redundancy? I know that sda contains the /boot partition, so the system wouldn't boot without it, but even if I unplug just B, C, or D, it still can't sync.
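A quick first check is whether the array had finished building when the drive was pulled. By default, `mdadm` creates a RAID 5 array as a degraded array plus a spare and then recovers onto that spare. Until the recovery finishes, `/proc/mdstat` shows something like `[4/3] [UUU_]` and the array has no redundancy. That is one possible explanation, not a diagnosis. The Python sketch below parses `/proc/mdstat`. `SAMPLE` is made-up output, and the function names are hypothetical.

```python
import re

ARRAY_RE = re.compile(r"^(md\d+)\s*:")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # e.g. "[4/3] [UUU_]"
PROGRESS_RE = re.compile(r"(resync|recovery)\s*=\s*([\d.]+)%")

# Made-up sample of /proc/mdstat during the initial RAID 5 build.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda2[0]
      1464380160 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [==>..................]  recovery = 12.6% (61579136/488126720) finish=98.1min speed=72411K/sec

unused devices: <none>
"""


def parse_mdstat(text):
    """Map each md array to its expected/active members, U/_ map and rebuild progress."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            arrays[current] = {"expected": None, "active": None, "map": None, "progress": None}
            continue
        if current is None:
            continue  # header lines before the first array
        s = STATUS_RE.search(line)
        if s:
            arrays[current].update(expected=int(s.group(1)), active=int(s.group(2)), map=s.group(3))
        p = PROGRESS_RE.search(line)
        if p:
            arrays[current]["progress"] = (p.group(1), float(p.group(2)))
    return arrays


def describe(arrays):
    """One human-readable status line per array."""
    lines = []
    for name, info in sorted(arrays.items()):
        state = "OK"
        if info["active"] is not None and info["active"] < info["expected"]:
            state = "DEGRADED"  # fewer working members than the array expects
        if info["progress"]:
            kind, pct = info["progress"]
            state += f", {kind} {pct}% done"
        lines.append(f"{name}: {info['map'] or '-'} {state}")
    return lines


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # no md driver here; fall back to the made-up sample
    print("\n".join(describe(parse_mdstat(text))))
```

Only pull a disk for testing once every array shows all `U`s and no resync or recovery line.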
May 2, 2009
I know SCSI drives are better than SATA, but I wonder whether the community prefers SATA or SCSI, especially when you pay more for less.
Here's an example.
You can get:
2GB of SATA storage for $5, or 500MB of SCSI storage for $5. Which would you choose?
Dec 20, 2007
I want to try something different with our method of replacing or upgrading hard drives.
I want to be able to do most of it via our KVM/IP instead of babysitting the server(s) in the DC for so long.
My question is: how can I add the new hard drive in the DC and then move the data over via the KVM/IP? Can this be done by just adding a raw drive (no new setup) and using dd or even rsync, or is it better to set up a new installation of CentOS on the new drive and use rsync to move the data over? And then how do I get the right drive to boot until I go back to the DC to remove the bad or old drive? I'd be interested in how some of you are doing this, in terms of what's easiest and what can be done over the KVM/IP once the new drive is connected.
Or, on systems that have two drives with cPanel/WHM, how can we temporarily use the backup drive in an emergency to do a new setup, copy the data over from the failing drive, and then replace the bad drive as the backup drive the next time we go to the DC? We have an external USB CD drive in place to allow remote installs. I'm just curious whether anyone does something like this or has ideas on how we could make it work.
We use cloning software now, but we can end up babysitting a clone in the DC for a long time.
Mar 19, 2007
Suppose I have only two physical hard drives. What is the best way to distribute the following on a Windows server:
OS/IIS
Web pages and scripts
SQL server
SQL database
Should it be :
HD1 : OS/IIS + Web pages/scripts + SQL server
HD2: SQL database
or other setups?
Mar 25, 2008
OS: Linux CentOS
I just had an additional 500GB hard drive added and mounted it at /home2.
There are files in /home1 (the original HD) that will constantly need to be moved over to /home2 via FTP.
But i keep getting this error
550 Rename/move failure: Invalid cross-device link
Does anyone have any ideas? I tried changing permissions, with no luck, and also tried mounting the second hard drive on a directory inside /home1. It still gives the error.
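The error comes from the `rename()` system call, which fails with `EXDEV` ("Invalid cross-device link") when the source and destination are on different filesystems. A rename can only relink an existing file within one filesystem. Mounting the new drive inside /home1 doesn't help, because the mount point is still a separate filesystem. Moving a file across drives means copying it and then deleting the original. Whether your FTP server can do that on its own depends on the server software. Python's `shutil.move` already has this fallback built in. The sketch below spells it out with a hypothetical `move_file` helper.

```python
import errno
import os
import shutil


def move_file(src, dst):
    """Move a file, falling back to copy + delete when crossing filesystems."""
    try:
        os.rename(src, dst)  # fast and atomic, but only within one filesystem
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV = "Invalid cross-device link"
            raise
        shutil.copy2(src, dst)  # copy contents plus permission bits and timestamps
        os.remove(src)          # delete the original only after the copy succeeded
```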
Apr 3, 2008
When I check hard disk usage I get the following:
Quote:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             9.7G  8.4G  905M  91% /
/dev/sda5             215G  415M  204G   1% /var
/dev/sda1              99M   16M   78M  17% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sdb1             227G  188M  215G   1% /disk1
sda2 is 91% used, and I am not sure what it is filled with. How can I find out exactly what is on that partition?
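The usual answer is `du -x` on the root filesystem. It stays on one filesystem, so it skips /var, /boot and /disk1 here. The Python sketch below does the same thing. It totals allocated space per top-level entry on the filesystem that holds the given mount point, and it skips anything mounted from another device. Hard-linked files are counted once per link, and it needs root to read everything. `usage_by_top_dir` and its helpers are hypothetical names.

```python
import os
import stat


def _same_device(path, dev):
    """True if `path` is on the filesystem with device id `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def _tree_bytes(top, dev):
    """Allocated bytes under `top`, never crossing into other filesystems."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune subdirectories that are mount points of other filesystems.
        dirnames[:] = [d for d in dirnames if _same_device(os.path.join(dirpath, d), dev)]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_blocks * 512
            except OSError:
                pass  # file vanished or is unreadable
    return total


def usage_by_top_dir(mount="/"):
    """Return [(path, bytes)] for entries directly under `mount`, largest first."""
    dev = os.lstat(mount).st_dev
    totals = {}
    with os.scandir(mount) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            if st.st_dev != dev:
                continue  # another filesystem is mounted here, e.g. /var or /boot
            if stat.S_ISDIR(st.st_mode):
                totals[entry.path] = _tree_bytes(entry.path, dev)
            else:
                totals[entry.path] = st.st_blocks * 512
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Run `usage_by_top_dir("/")` as root and print the first ten entries to see where the 8.4G on /dev/sda2 has gone.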
Mar 7, 2007
Quote:
Today we are going to conduct a detailed study of RAIDability of contemporary 400GB hard drives on a new level. We will take two "professional" drives from Seagate and Western Digital and four ordinary "desktop" drives for our investigation. The detailed performance analysis and some useful hints on building RAID arrays are in our new detailed article.
[url]
Nov 5, 2009
I want to know if there is a difference between enterprise drives and desktop drives, and which ones hosts use.
Oct 30, 2008
I am about to implement my first SSL certificate, so I have many questions about it. I am on a shared host. I created a certificate signing request and got my SSL certificate verified and issued. However, there is no option in my WHM to install it. I asked my host, and they said they would install it for us and asked me for the certificate and key.
That's OK, but they said I need a dedicated IP assigned to the domain for which I need the SSL certificate.
So, guys, is it not possible to install SSL on a shared IP but bound to the domain I want?
Please give your input. I'd also be happy if anyone wants to share common misconceptions about SSL certificates that a noob might have (based on your personal experience).
Apr 11, 2007
I've ordered a RAID 1 server with 2x500GB SATA drives and a 3ware RAID card. I have cPanel installed.
This is the first time I'm using a RAID configuration, so I have some questions:
1. How can I check whether RAID is running on the server, and how can I be sure it is configured correctly?
2. I'm only seeing one HDD. Is this correct?
Oct 1, 2008
Is there a command I can issue over PuTTY, logged in as root or admin, to see what temperature my server's CPU is running at?
Jun 27, 2009
I hope some of you are using Google Apps and can help me to find an answer to the following question:
I own two different and independent domain names (e.g. domain1.com and domain2.com).
I'd like to use the Google Apps (Standard, free edition) with them to create two different and totally independent mailboxes (e.g. abc@domain1.com and xyz@domain2.com).
But how many Google accounts do I need to do this? Can I manage two (or more) independent and fully functional domains using one Google account?
P.S.
The Help section describes aliases for multiple domains, but those are just pointers or shortcuts, not fully functional mailboxes, so that solution isn't what I'm looking for.
Sep 22, 2008
I'd like to know if there is a command to check the CPU temperature. Is the following the right way?
cat /proc/acpi/thermal_zone/THRM/temperature always gives 30 C.
I recently got an Intel quad-core with 8 GB RAM. When the load nears 1.00, the kernel logs the messages below. It is always CPU1 and CPU2, while CPU0 and CPU3 are reported as normal.
====================================================
Sep 22 00:07:47 server2 kernel: CPU2: Temperature above threshold, cpu clock throttled
Sep 22 00:07:47 server2 kernel: CPU3: Temperature/speed normal
Sep 22 00:07:49 server2 kernel: CPU1: Temperature above threshold, cpu clock throttled
Sep 22 00:07:49 server2 kernel: CPU0: Temperature/speed normal
=====================================================
and /proc/acpi/thermal_zone/THRM/* always gives the following
====================================
<setting not supported>
cooling mode: critical
<polling disabled>
state: ok
temperature: 30 C
critical (S5): 110 C
====================================
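A fixed 30 C from `/proc/acpi/thermal_zone` usually suggests that the firmware's ACPI thermal zone is not reporting a live value, though that is a guess. On newer kernels, Intel per-core temperatures are exposed by the `coretemp` driver through the hwmon sysfs interface (`/sys/class/hwmon/hwmon*/temp*_input`, in millidegrees Celsius). lm_sensors' `sensors` command reads this same interface. As far as I know, `coretemp` is not in the stock 2.6.9 kernel of CentOS 4.x, so this may not work there. The Python sketch below and its function name are illustrative.

```python
import glob
import os


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return [(label, degrees C)] for every hwmon temperature input found."""
    readings = []
    for input_path in sorted(glob.glob(os.path.join(base, "hwmon*", "temp*_input"))):
        try:
            with open(input_path) as fh:
                millidegrees = int(fh.read().strip())  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # sensor unreadable or reported an error
        label_path = input_path[: -len("_input")] + "_label"
        try:
            with open(label_path) as fh:
                label = fh.read().strip()  # e.g. "Core 0" with coretemp
        except OSError:
            label = os.path.basename(input_path)  # no label file for this sensor
        readings.append((label, millidegrees / 1000))
    return readings


if __name__ == "__main__":
    temps = read_hwmon_temps()
    if not temps:
        print("No hwmon temperature sensors found (driver not loaded?)")
    for label, celsius in temps:
        print(f"{label}: {celsius:.1f} C")
```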
Jun 26, 2007
I have been loosely monitoring the system temperature on my colocated 1U server and have noticed fluctuations of up to 9 degrees Celsius (around 16 degrees Fahrenheit), depending on the time of day and the current weather in the city where the data center is located.
In the dead of night the system usually reads around 28C, but in mid-afternoon it gets up to 34-38C. That's not terribly hot, but the effect of the constantly changing temperatures on the hard drives has me concerned. Server load doesn't seem to contribute much to the increase, since its peak load times are usually from late evening until early morning, so I'm guessing the data center heats up and cools down following the outside weather.
Do any of you see temperature swings like this on your servers, and how much would be normal?
Sep 14, 2009
I have a couple of Dell 1950s and in one of them, I have 2x Seagate 15K.5s that I purchased through Dell and I also have a spare sitting in my rack in case one goes bad, also from Dell.
I was going to repurpose one of my other 1950s and get two more 15K.5s for it, but I wasn't planning on getting them through Dell (rip-off?). That way, I could still keep the same spare drive around in case a drive went bad in that system as well.
When I was talking to my Dell rep recently when purchasing another system, their hardware tech said you can't use non-Dell drives with Dell drives in the same RAID array because of the different firmware between them.
Anyone know if it is true? Anyone have any experience with using drives from Dell in conjunction with the same model drives from a third party retailer?
Jul 30, 2008
How can I read the CPU temperature on CentOS 4.6 with kernel 2.6.9 (the CentOS kernel from yum)?