Google platform
Google requires large computational resources to provide its services. This article describes the technological infrastructure behind Google's websites, as presented in the company's public announcements.
Network topology
Though the numbers are not publicly known, some estimates hold that Google maintains over 450,000 servers, arranged in racks located in clusters in cities around the world, with major centers in Mountain View, California; Virginia; Atlanta, Georgia; Dublin, Ireland; and new facilities constructed in The Dalles, Oregon [Carr, David F. "[http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/How-Google-Works-%5B1%5D How Google Works]." [http://www.baselinemag.com/ Baseline Magazine]. July 6, 2006. Retrieved on July 10, 2006.] and Saint-Ghislain, Belgium. ["[http://www.investinwallonia.be/ofi-belgium/menu-news/Google-Saint-Ghislain-investment.php]." [http://www.investinwallonia.be Invest Wallonia]. April 27, 2007. Retrieved on May 10, 2007.] In 2009 Google plans to open one of its first sites in the upper Midwest, in Council Bluffs, Iowa, close to abundant wind power resources for fulfilling green energy objectives and proximate to fiber optic communications links. ["[http://www.google.com/datacenter/councilbluffs/]." [http://www.google.com/datacenter/councilbluffs/ Council Bluffs]. July 9, 2007. Retrieved on August 21, 2007.]
When an attempt to connect to Google is made, DNS servers resolve www.google.com to multiple IP addresses, which acts as a first level of load balancing by directing clients to different Google clusters. (When a domain name resolves to multiple IP addresses, clients typically use the first IP address for communication; DNS servers typically rotate the order of the addresses they return using a round-robin policy.)
Each Google cluster has thousands of servers, and upon connection to a cluster further load balancing is performed by hardware in the cluster, in order to send the queries to the least loaded web server. This makes Google one of the biggest and most complex known content delivery networks.
Racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side), while new servers are 2U rackmount systems. [[http://labs.google.com/papers/googlecluster-ieee.pdf Web Search for a Planet: The Google Cluster Architecture] (Luiz André Barroso, Jeffrey Dean, Urs Hölzle).] Each rack has a switch. Servers are connected via a 100 Mbit/s Ethernet link to the local switch. Switches are connected to a core gigabit switch using one or two gigabit uplinks. [citation needed]
Main index
Since queries are composed of words, an inverted index of documents is required. Such an index maps each query word to the list of documents that contain it. The index is very large because of the number of documents stored on the servers.
Server types
Google's server infrastructure is divided into several types, each assigned to a different purpose:
*Google load balancers take the client request and forward it to one of the Google Web Servers via Squid proxy servers.
*Squid proxy servers take the client request from the load balancers and return the result if it is present in the local cache; otherwise they forward it to a Google Web Server.
*Google web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
*Data-gathering servers are permanently dedicated to spidering the Web. They update the index and document databases and apply Google's algorithms to assign ranks to pages.
*Index servers each contain a set of index shards. They return a list of document IDs ("docid"), such that documents corresponding to a certain docid contain the query word. These servers need less disk space, but suffer the greatest CPU workload.
*Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
*Ad servers manage advertisements offered by services like AdWords and AdSense.
*Spelling servers make suggestions about the spelling of queries.
Server hardware and software
Original hardware
The original hardware (ca. 1998) that was used by Google when it was located at Stanford University included: ["[http://web.archive.org/web/19990209043945/google.stanford.edu/googlehardware.html Google Stanford Hardware]." Stanford University (provided by Internet Archive). Retrieved on July 10, 2006.]
*Sun Ultra II with dual 200 MHz processors and 256MB of RAM. This was the main machine for the original Backrub system.
*2 x 300 MHz dual Pentium II servers donated by Intel, which included 512MB of RAM and 9 x 9GB hard drives between the two. The main search ran on these.
*F50 IBM RS/6000 donated by IBM, which included 4 processors, 512MB of memory and 8 x 9GB hard drives.
*Two additional boxes included 3 x 9GB hard drives and 6 x 4GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
*IBM disk expansion box with another 8 x 9GB hard drives donated by IBM.
*Homemade disk box which contained 10 x 9GB SCSI hard drives.
Current hardware
Servers are commodity-class x86 PCs running customized versions of Linux. The goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. Estimates of the power required for over 450,000 servers range upwards of 20 megawatts, which cost on the order of US$2 million per month in electricity charges.
Specifications:
* Upwards of 15,000 servers ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III (as of 2003). A 2005 estimate by Paul Strassmann has 200,000 servers, [Strassmann, Paul A. "[http://www.strassmann.com/pubs/gmu/LectureV4.pdf A Model for the Systems Architecture of the Future]." December 5, 2005. Retrieved on March 18, 2008.] while unspecified sources claimed this number to be upwards of 450,000 in 2006.
* One or more 80GB hard disks per server (2003)
* 2–4 GB of memory per machine (2004)
The exact size and whereabouts of the data centers Google uses are unknown, and official figures remain intentionally vague. In a 2000 estimate, Google's server farm consisted of 6000 processors and 12,000 common IDE disks (2 per machine, and one processor per machine), at four sites: two in Silicon Valley, California and two in Virginia. [Hennessy, John; Patterson, David. (2002). Computer Architecture: A Quantitative Approach. Third Edition. Morgan Kaufmann. ISBN 1-55860-596-7.] Each site had an OC-48 (2488 Mbit/s) internet connection and an OC-12 (622 Mbit/s) connection to other Google sites. The connections are eventually routed down to 4 x 1 Gbit/s lines connecting up to 64 racks, each rack holding 80 machines and two Ethernet switches. The servers run custom server software called Google Web Server.
Project 02
Google is currently developing a supercomputer at a data center located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles from Portland. The project, codenamed "Project 02", [Markoff, John; Hansell, Saul. "[http://www.signonsandiego.com/uniontrib/20060614/news_1n14supercom.html Google's quasi-secret power play]." San Diego Union Tribune. June 14, 2006. Retrieved on July 10, 2006.] is expected to add substantially to Google's current global network, which is capable of processing billions of search queries per day and a growing repertoire of other services. The new complex is approximately the size of two football fields, with cooling towers four stories high. [Markoff, John; Hansell, Saul. The New York Times, Technology section. June 14, 2006. [http://www.nytimes.com/2006/06/14/technology/14search.html?_r=1&n=Top/News/Business/Companies/Google%20Inc.&oref=slogin] Retrieved on Feb 13, 2008.]
Software
Google has acknowledged that Python has played an important role from the beginning, and that it continues to do so as the system grows and evolves. [http://python.org/about/quotes/] The applications that crawl and cache data are thought to be built around the operating system, probably in C/C++. [citation needed]
Server operation
Most operations are read-only. When an update is required, queries are redirected to other servers, which simplifies consistency issues. Queries are divided into sub-queries, which may be sent to different servers in parallel, thus reducing latency.
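The division of a query into parallel sub-queries over index shards can be sketched as follows. This is a minimal illustration and not Google's implementation: the names `build_index`, `search_shard`, `search` and `SHARDS`, the sample documents, and the use of threads in place of separate index servers are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Build an inverted index: each word maps to the docids containing it."""
    index = defaultdict(set)
    for docid, text in docs.items():
        for word in text.lower().split():
            index[word].add(docid)
    return index


def search_shard(index, words):
    """Sub-query: docids in one shard that contain every query word."""
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()


def search(shards, query):
    """Send the same sub-query to every shard in parallel, then merge."""
    words = query.lower().split()
    # Threads stand in for separate index servers (an assumption).
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial = pool.map(lambda shard: search_shard(shard, words), shards)
        return sorted(set().union(*partial))


# Hypothetical sample data: two shards, each indexing two documents.
SHARDS = [
    build_index({1: "google cluster architecture", 2: "linux cluster"}),
    build_index({3: "google web server", 4: "google linux cluster"}),
]
```

With this sample data, `search(SHARDS, "google cluster")` returns `[1, 4]`: each shard answers only for its own documents, and the caller merges the partial lists of docids.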
To lessen the effects of unavoidable hardware failure, data stored in the servers may be mirrored using hardware RAID. [citation needed] Software is also designed to be fault tolerant. Thus when a system goes down, data is still available on other servers, which increases reliability.
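One simple way software can tolerate a failed server is to keep copies of the data on several machines and read from the next copy when one does not answer. The sketch below illustrates that idea only. The `Replica` class, the `ReplicaDown` exception and `read_with_failover` are hypothetical names, and nothing here is taken from Google's actual software.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot serve requests."""


class Replica:
    """In-memory stand-in for one server holding a copy of the data."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Return the value from the first replica that answers."""
    failures = []
    for replica in replicas:
        try:
            return replica.get(key)
        except ReplicaDown as err:
            failures.append(str(err))  # remember which replica failed
    raise ReplicaDown("all replicas failed: " + ", ".join(failures))
```

For example, with `[Replica("a", {"doc": "text"}, up=False), Replica("b", {"doc": "text"})]`, a call to `read_with_failover(replicas, "doc")` skips the failed replica `a` and returns `"text"` from `b`.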
External links
* [http://labs.google.com/papers/index.html Google Research Publications]
* [http://www.uwtv.org/programs/displayevent.asp?rid=1680 The Google Linux Cluster] — Video about Google's Linux cluster
* [http://labs.google.com/papers/googlecluster-ieee.pdf Web Search for a Planet: The Google Cluster Architecture] (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)
* [http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/How-Google-Works-%5B1%5D How Google Works]
* [http://backrub.tjtech.org/May1998/hardware.htm Original Google Hardware Pictures]
Wikimedia Foundation. 2010.