Friday, May 12, 2006

Everything About Google : Part III

Google is a search engine owned by Google, Inc., whose mission statement is to "organize the world's information and make it universally accessible and useful." The largest search engine on the web, Google receives over 200 million queries each day through its various services.

The name "Google" is a play on the word "googol," which refers to the number represented by 1 followed by one hundred zeros. As a further play on this, Google's headquarters, located in California, are referred to as "the Googleplex" — a googolplex being 1 followed by a googol of zeros, and the HQ being a complex of buildings (cf. multiplex, cineplex, etc).

In addition to its tool for searching webpages, Google also provides services for searching images, Usenet newsgroups, news websites, videos, maps, local listings, and items for sale online. As of 2006, Google has indexed over 25 billion web pages, 1.3 billion images, and over one billion Usenet messages (in total, approximately 12 billion items). It also caches much of the content that it indexes. Google operates other tools and services including Google News, Google Suggest, Froogle, and Google Desktop Search.


History

Google began as a research project in January 1996 by Larry Page and Sergey Brin, two Ph.D. students at Stanford.[1] They hypothesized that a search engine that analyzed the relationships between websites would produce better results than existing techniques, which essentially ranked results according to how many times the search term appeared on a page.[2] The project was originally nicknamed "BackRub" because the system checked backlinks to estimate a site's importance.[3] A small search engine called RankDex was already exploring a similar strategy.[4]

Convinced that the pages with the most links to them from other highly relevant web pages must be the most relevant pages associated with the search, Page and Brin tested their thesis as part of their studies and laid the foundation for their search engine. Originally the search engine was hosted on the Stanford University website. Its own domain was registered on September 15, 1997, and the company was incorporated as Google Inc. on September 7, 1998, at a friend's garage in Menlo Park, California.

The Google search engine attracted a loyal following among the growing number of Internet users. They were drawn to its simple, uncluttered design, a competitive advantage with users who did not wish to enter searches on web pages filled with visual distractions or popups. This appearance, while imitating the early AltaVista, had behind it Google's unique search capabilities. In 2000, Google began selling advertisements associated with the search keyword to produce enhanced search results for the user. This strategy was important for increasing advertising revenue, which is based upon the number of clicks users make on ads. The ads were text-based in order to maintain an uncluttered page design and to maximize page loading speed. They also cost the advertising websites only a very small amount per click. This model of selling keyword advertising was pioneered by the company later renamed Overture, then Yahoo! Search Marketing.[5] While many of its dot-com rivals failed in the new Internet marketplace, Google quietly rose in stature while generating revenue.

U.S. Patent 6,285,999 describing Google's ranking mechanism (PageRank) was granted on September 4, 2001. The patent was officially assigned to Stanford University and lists Lawrence Page as the inventor.

"To google," as a verb, has come to mean "to search for something on the internet." Google officials have discouraged this usage of the company's name out of fear of trademark dilution, as it could lead to the name becoming a genericized trademark. To prevent domain hijacking by unaffiliated third parties, Google has also purchased the redirecting rights to several similar-sounding domain names. Registering similar domain names to prevent hijacking is a common practice among other companies as well.


The search engine


Index size

At its start in 1998, Google claimed to index 25,000,000 websites.[6] By June 2005, this number had grown to 8,058,044,651 websites, as well as 1,187,630,000 images, 1 billion Usenet messages, 6,600 print catalogs, and 4,500 news sources.

* January 1998: 25,000,000

* August 2000: 1,060,000,000

* January 2002: 2,073,000,000

* February 2003: 3,083,000,000

* September 2004: 4,285,000,000

* November 2004: 8,058,044,651

* June 2005: 8,058,044,651

* February 2006: 20,000,000,000


Physical structure

Google employs data centers full of low-cost commodity computers running a custom Red Hat Linux in several locations around the world to respond to search requests and to index the web. The server farms in the data centers are built using a shared-nothing architecture. The indexing is performed by a program named Googlebot, which periodically requests new copies of web pages it already knows about. The more often a page updates, the more often Googlebot will visit. The links in these pages are examined to discover new pages to be added to its internal database of the web. Together, this index database and web page cache are several terabytes in size. Google has developed its own file system, called Google File System, for storing all this data.
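The link-discovery step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Googlebot's actual implementation: the names `LinkExtractor`, `discover_pages`, and `toy_pages` are hypothetical, the "web" is an in-memory dictionary rather than real HTTP requests, and revisit scheduling based on how often a page changes is left out.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_pages(fetch, seed_urls):
    """Breadth-first link discovery starting from seed_urls.

    fetch is a caller-supplied function returning a page's HTML,
    or None when the page cannot be retrieved.
    """
    known = set(seed_urls)
    frontier = deque(seed_urls)
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in known:
                # A newly discovered page is queued for a later visit.
                known.add(link)
                frontier.append(link)
    return known


# Hypothetical in-memory "web" standing in for real HTTP requests.
toy_pages = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/archive">Archive</a>',
}
found = discover_pages(toy_pages.get, ["/home"])
```

Starting from a single seed page, the sketch reaches every page that is linked, directly or indirectly, including `/archive`, which is known only through a link and has no stored content of its own.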




Google uses an algorithm called PageRank to rank web pages that match a given search string. The PageRank algorithm computes a recursive figure of merit for web pages, based on the weighted sum of the PageRanks of the pages linking to them. The PageRank thus derives from human-generated links, and correlates well with human concepts of importance. Previous keyword-based methods of ranking search results, used by many search engines that were once more popular than Google, would rank pages by how often the search terms occurred in the page, or how strongly associated the search terms were within each resulting page. In addition to PageRank, Google also uses other secret criteria for determining the ranking of pages on result lists.
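As a rough illustration of the recursive "figure of merit" idea, the following Python sketch runs a simple power iteration over a tiny hand-made link graph. The function name `pagerank`, the toy graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration; Google's production ranking also relies on undisclosed criteria and is not reproduced here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub".
toy_web = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(toy_web)
```

In this toy graph, "hub" ends up with the highest score because it collects links from every other page, while "b" and "c", which nobody links to, keep only the baseline share.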

Google indexes and caches not only HTML files but also 13 other file types, including PDF, Word documents, Excel spreadsheets, Flash SWF, and plain text files. Except in the case of text and SWF files, the cached version is a conversion to HTML, allowing those without the corresponding viewer application to read the file.

Users can customize the search engine somewhat. They can set a default language, use "SafeSearch" filtering technology (which is on the "moderate" setting by default), and set the number of results shown on each page. Google has been criticized for placing long-term cookies on users' machines to store these preferences, a tactic which also enables it to track a user's search terms over time. For any query (of which only the first 32 keywords are taken into account), up to the first 1,000 results can be shown, with a maximum of 100 displayed per page.
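The query and paging limits mentioned above can be expressed as a small Python sketch. The three limits come from the text; the function names `effective_keywords` and `page_window`, the default of 10 results per page, and the whitespace-based keyword splitting are assumptions made for illustration only.

```python
MAX_KEYWORDS = 32     # only the first 32 keywords are considered
MAX_RESULTS = 1000    # at most the first 1,000 results can be shown
MAX_PER_PAGE = 100    # at most 100 results are displayed per page


def effective_keywords(query):
    """Return the keywords that would actually be taken into account."""
    return query.split()[:MAX_KEYWORDS]


def page_window(page, per_page=10):
    """Return (start, end) result indices for a 1-based page number,
    or None when the page lies beyond the last viewable result."""
    page = max(1, page)
    per_page = max(1, min(per_page, MAX_PER_PAGE))
    start = (page - 1) * per_page
    if start >= MAX_RESULTS:
        return None
    end = min(start + per_page, MAX_RESULTS)
    return start, end
```

With 100 results per page, the tenth page covers results 900 to 999 (end index exclusive), and an eleventh page does not exist.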

Despite Google's immense index, a considerable amount of data resides in databases that are accessible from websites by means of queries but not by links. This so-called deep web is minimally covered by Google and contains, for example, catalogs of libraries, official legislative documents of governments, phone books, and more.



Google optimization

[Image: the results page for a search for "miserable failure", an example of Google bombing.]


Since Google is the most popular search engine, many webmasters have become eager to influence their websites' Google rankings. An industry of consultants has arisen to help websites raise their rankings on Google and on other search engines. This field, called search engine optimization, attempts to discern patterns in search engine listings and then develop a methodology for improving rankings.

One of Google's chief challenges is that as its algorithms and results have gained the trust of web users, the profit to be gained by a commercial website in subverting those results has increased dramatically. Some search engine optimization firms have attempted to inflate specific Google rankings by various artifices, and thereby draw more searchers to their clients' sites. Google has managed to weaken some of these attempts by reducing the ranking of sites known to use them.

Search engine optimization encompasses both "on page" factors (like body copy, title tags, H1 heading tags and image alt attributes) and "off page" factors (like anchor text and PageRank). The general idea is to affect Google's relevance algorithm by incorporating the keywords being targeted in various places "on page," in particular the title tag and the body copy (note: the higher up in the page, the better its keyword prominence and thus the ranking). Too many occurrences of the keyword, however, cause the page to look suspect to Google's spam checking algorithms.
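The "on page" signals mentioned above (keyword in the title, how often it occurs, and how high up the page it first appears) can be measured with a short Python sketch. This is not Google's relevance or spam-checking algorithm, which is not public; the function name `keyword_report`, the word-splitting rule, and the sample page are hypothetical, and only single-word keywords are handled.

```python
import re

WORD = re.compile(r"[a-z0-9']+")


def keyword_report(title, body, keyword):
    """Summarize simple on-page signals for a single-word keyword."""
    keyword = keyword.lower()
    words = WORD.findall(body.lower())
    positions = [i for i, word in enumerate(words) if word == keyword]
    return {
        # Does the keyword appear as a word in the title?
        "in_title": keyword in WORD.findall(title.lower()),
        # How many times does it occur in the body copy?
        "count": len(positions),
        # Share of body words that are the keyword.
        "density": len(positions) / len(words) if words else 0.0,
        # First occurrence as a fraction of the body (0.0 = very top).
        "first_position": positions[0] / len(words) if positions else None,
    }


# Hypothetical page used only to exercise the function.
report = keyword_report(
    title="Cheap widgets for sale",
    body="Widgets are useful. Our widgets ship fast and cost less.",
    keyword="widgets",
)
```

A consultant might track numbers like these across pages, but what density counts as "too many occurrences" is not stated in the text and is deliberately not encoded here.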

One "off page" technique that works particularly well is Google bombing, in which websites link to another site using a particular phrase in the anchor text in order to give the site a high ranking when that phrase is searched for.

Google publishes a set of guidelines for website owners who would like to raise their rankings when using legitimate optimization consultants.[7]


Google jargon


Search Engine Optimization

To google

To search for something using Google (also, to seek information on someone by entering their full name or other information).


Googler

A person who uses Google's features very efficiently, mostly using the "I'm Feeling Lucky" button when searching; a fan of Google. "Googler" is sometimes also used for "expert online searcher." Also, a company term for a full-time Google employee.


New Googler


The science of Google

Googlenym, Googlonym, Memomark, Google URL

A mental bookmark expressed as Google search ("go to my site by entering 'John Doe Chicago' into Google"). A phrase or group of random key words for which a Google search returns a corresponding page.


Search Engine Result Pages

Nigritude ultramarine, SERPs, Seraphim Proudleduck, Mangeur de cigogne

SEO competitions

Blackhat SEO

Search engine optimization using dirty tricks such as link farms, wiki or guestbook spamming, and so on.


A person who accidentally exposes information to the web by placing it into a location spidered by Google.

Whitehat SEO

Search engine optimization using enhanced content, improved accessibility and usability, unique page titles, non-JavaScript linking methods, and so on.


A search phrase that delivers exactly the intended result when searching with Google.

Sandbox Effect

The name given to the phenomenon in which Google filters (from its results) websites created after March 2004.

Google bomb

An attempt to influence the ranking of a given site in results returned by the Google search engine. Also known as Google wash.

Blue Red Yellow Blue Green Red

A synonym for Google (from the colors of its logo).


Googlewhack

A search using two dictionary-valid words (underlined by Google) that returns only one hit.


Google games

* In Googlewhack you attempt to find two words that produce exactly one search result.

* In Google Talk Game, Google searches are used to complete the beginning of a sentence, leading to amusing or interesting results.

* In Googlefight, you pit two keywords against each other to find which one has more results.

* In Guess The Google, you attempt to guess which search term resulted in the displayed images.

* In Toogle, you can search images with the text of the search item making up the image. "The most comprehensive image buggery on the web"



* Feb 2001: Deja (the Usenet archive, not the company) was acquired and became part of the relaunched Google Groups [2].

* Sep 2001: Google acquired Outride Inc. Outride was a spin-off from Xerox (PARC). Outride's domain name still exists but currently forwards to Google [3].



* Feb 2003: Google acquired Pyra Labs, a weblogging provider and owner of Blogger [4].

* Apr 2003: Neotonic Software was acquired as part of Google's plan to bring its CRM technology in house [5].

* Apr 2003: Applied Semantics was acquired for $102 million. Applied Semantics was a contextual advertising company whose technology was integrated into Google's AdWords/AdSense programs [6].

* Sep 2003: Kaltix was a small start-up acquired to develop and launch Google Personal [7].

* Oct 2003: Sprinks was acquired to enhance Google's AdWords and AdSense programs [8].

* Oct 2003: Google acquired Genius Labs, another weblogging provider [9].



* Apr 2004: Ignite Logic was acquired [10].

* Jun 2004: Google made a $10M investment into partial ownership of Baidu [11].

* Jul 2004: Picasa was acquired to provide picture management tools to Blogger [12].

* Oct 2004: Keyhole was acquired to provide the core mapping capabilities in Google Earth [13].

* Sep-Dec 2004: Google revealed in its annual 10-K filing that it had acquired two start-up companies, ZipDash and the Australian firm Where2 LLC, founded by Lars Rasmussen. The technology provided by ZipDash was used to develop and launch Google Ride Finder. Where2 provided the core mapping capabilities in Google Maps.



For 2005, Google submitted a 10-K filing with the SEC which revealed that it had acquired 9 companies and substantially all of the assets of another 6 companies. The combined purchase price for these 15 companies was $130.535 million.

* Mar 2005: Web analytics tools provider Urchin Software Corporation was acquired [14]. Urchin's technology was used to develop and launch Google Analytics.

* May 2005: DodgeBall [15], a social networking software provider for mobile devices, was acquired [16].

* Jul 2005: Google, together with Goldman Sachs and the Hearst Corp., invests a total of $100 million in Current Communications Group [17].

* Jul 2005: Google announced in its Q2 quarterly conference call that it had acquired Akwan Information Technologies as a part of its plan to open an R&D office and expand its presence into Latin and South America. [18]

* Jul 2005: Google acquires Canadian start-up firm Reqwireless, a developer of Web browser and mobile e-mail software for wireless devices, as a part of its initiative to develop a version of Gmail for mobile devices. [19][20]

* Aug 2005: Google acquires Android Inc., a software provider for mobile devices [21]

* Dec 2005: Google pays $1 billion to acquire a 5% stake in Time Warner's AOL division. [22]



* Jan 2006: Google acquires dMarc Broadcasting, creator and operator of an automated platform that lets advertisers more easily schedule, deliver, and monitor their ads over radio, and lets radio broadcasters automate schedules and advertising spots. The purchase price was $102 million, with an additional payout of $1.136 billion over three years if certain performance targets are met. [23]

* February 2006: Google acquires Measure Map from Adaptive Path. Measure Map is a blog analytics product; the project was spearheaded by Jeffrey Veen.

* March 2006: Google acquires Writely.

* March 2006: Google acquires SketchUp. Using a plugin, this program allows you to place 3D models into Google Earth.

* April 2006: Google acquires an advanced text search algorithm from the University of New South Wales in Australia. The algorithm was invented by Ori Allon, an Israeli student. The terms of the deal and the purchase price were not disclosed.



