How Google Leverages Big Data

Companies have had data about their key stakeholders for as long as they have existed, and we have been applying technology and analytics to solve problems for as long as we’ve had computers, so what’s new and what’s different here? The first question is, “how much data are we talking about?” and why the prefix “big”? In 1979, Dr. Jack E. Shemer and Dr. Philip M. Neches formed Teradata, a name that represents the idea of unimaginably large amounts of data. The resulting value proposition was “this is what we are going to help our customers optimize on”.

In 2008, WIRED magazine published an article titled “The Petabyte Age: Because More Isn’t Just More — More Is Different”, and so you begin to see that 29 years later the prefix creep moves on: “tera” won’t work anymore in the new normal with all this data being collected. Four years later, WIRED came out with another article on “The Exabyte Revolution”, and the prefixes keep marching up. This one didn’t work for 29 years or 4 years or even one year. It worked for about 6 months, because Cisco recently came out with a report on “The Zettabyte Era—Trends and Analysis”, and that’s not even the scary part. There is only one more prefix in existence that can describe the next phase in big data. It’s called the yottabyte, indicating a value of 1000^8 bytes. The Wall Street Journal recently interviewed Andrew McAfee, principal research scientist for the Center for Digital Business at MIT’s Sloan School of Management, on this problem, and the term “Hellabyte” was unveiled as the potential successor to the yottabyte, if and when it comes to that.
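To put the prefix creep in numbers, here is a small sketch in Python. It assumes the decimal (SI) definitions, where each prefix is 1,000 times the one before it; the names `PREFIXES` and `bytes_for` are made up for this illustration.

```python
# SI decimal prefixes for bytes: each step up is a factor of 1000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def bytes_for(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte (decimal SI definition)."""
    power = PREFIXES.index(prefix) + 1  # kilo -> 1, ..., yotta -> 8
    return 1000 ** power

for power, name in enumerate(PREFIXES, start=1):
    print(f"1 {name}byte = 1000^{power} bytes")

# How many terabytes fit in one yottabyte?
print(bytes_for("yotta") // bytes_for("tera"))
```

The last line shows the gap between the “tera” in Teradata and the yottabyte: four prefix steps, or a factor of 1000^4.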

This goes to show the growing cascade and raw amount of digital data being generated. It’s coming at us more quickly, from all directions, and in forms we are not used to. As a corporate sales professional, I’m used to seeing my data:

  • in numeric and text form,
  • organized in rows and columns,
  • spreadsheets and databases

and we got comfortable working with that. Now we’re getting data from the social web: pictures, status updates, hashtags, videos and search terms. Because of GPS sensors, accelerometers, compasses and a whole array of other sensing devices, our smartphones are giving off an exhaust trail of data. This really is a brave new world in the digital information realm. About 4,000 photos are uploaded to Facebook every second, with a quarter trillion to date. Google processes well over a billion searches a day. A business that finds a way to profitably mine even half of those searches, and to understand what the trends imply, whether informational or transactional, can gain insights and an advantage. Just some facts from less than 10 years ago:

  • Facebook had not yet been founded
  • Google was five years old
  • Twitter was a gleam in someone’s eye
  • The iPhone was four years away

It’s really unreasonable to expect that companies around the world are on top of this phenomenon. By looking at problems with fresh eyes and using the new sources of data that are relevant to an industry, whether from a demand or supply perspective, companies can make accurate predictions about stakeholder intent and streamline their operations.

In the real estate market, a crystal ball showing how housing prices will fluctuate is what brokers demand most. The status quo way to get insight into housing prices is to approach the National Association of Realtors and subscribe to its forecasts of price changes, which are developed by feeding economic indicators such as interest rates, GDP and demographic shifts into a statistical model that spits out forecasts area by area around the country. MIT’s Lynn Wu and Erik Brynjolfsson took the big data approach instead. They sifted through Google search data for a specific territory and looked for links between searches that suggest buying curiosity (“house prices in Detroit”) and the later shift towards call-to-action searches, where the searcher has finally decided (“schools in X area of Detroit”), and then built correlations between what generates interest and what closes the transaction. The result: the prediction model is 23.6% more accurate than the status quo.
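To make the idea concrete, here is a minimal sketch of relating search interest to later sales with a one-variable least-squares fit. This is not Wu and Brynjolfsson’s actual model, which is not described here. The numbers are invented for illustration, and `fit_line`, `search_index` and `next_month_sales` are hypothetical names.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope

# Made-up data, for illustration only: a monthly search-interest index for
# one area, and the home sales recorded in the following month.
search_index = [40, 45, 50, 60, 55, 70]
next_month_sales = [82, 91, 100, 121, 110, 140]

a, b = fit_line(search_index, next_month_sales)
forecast = a + b * 65  # forecast for a month whose search index is 65
print(f"sales = {a:.1f} + {b:.2f} * search_index; forecast at 65: {forecast:.0f}")
```

A real comparison would fit this kind of model alongside the traditional economic indicators and measure both against data held back from the fit.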


According to this paper, employee theft and fraud account for up to $200 billion a year across the economy. As enterprises grow and become more geographically spread out, it becomes really hard to stay on top of everything that is going on. This is especially true of fairly inexpensive table-service restaurants, which run on razor-thin profit margins and generate an immense load of data, because every action and in-house transaction is recorded. The other fact about these businesses is that employee dishonesty is on the rise, and employees can be really devious about making sure that money that should go to the business finds its way into their pockets instead. NCR had the idea of doing pattern matching on this ocean of incoming data to flag suspicious behavior and alert managers. The study looks at the before and after of the implementation of the technology in 400 locations and found the following benefits:

  • drop in observed theft by USD 25 per week per location
  • average weekly revenue per location increased by about USD 3,000
  • tip percentages went up significantly

So what happened? Employees felt the strain of Big Brother, curtailed their theft, focused more on up-selling and customer service, and took home more in tips.
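How NCR’s pattern matching works is not spelled out here, so the following is only a sketch of the general idea under a simple assumption: flag anyone whose rate of voided or comped checks sits far above their peers. The rule, the threshold and the names `flag_outliers` and `void_rates` are all hypothetical.

```python
from statistics import mean, pstdev

def flag_outliers(void_rates, threshold=2.0):
    """Return names whose rate is more than `threshold` standard deviations above the mean."""
    values = list(void_rates.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # everyone behaves identically, so nothing stands out
        return []
    return [name for name, rate in void_rates.items() if (rate - mu) / sigma > threshold]

# Hypothetical share of each server's checks that were voided or comped.
void_rates = {
    "server_a": 0.02, "server_b": 0.03, "server_c": 0.02, "server_d": 0.03,
    "server_e": 0.02, "server_f": 0.03, "server_g": 0.02, "server_h": 0.15,
}
print(flag_outliers(void_rates))  # names a manager might want to review
```

A flag like this is a prompt for a manager to take a closer look, not proof of theft.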

Once upon a time, Google tested applicants with brain teasers such as “How many golf balls can you fit in an airplane?” and “How many gas stations are there in Manhattan?” to assess the methods through which a candidate would reach the answer, as opposed to focusing on the answer itself. After going through their data and second-guessing the effectiveness of their hiring practices and criteria, Google’s Senior Vice President of People Operations famously shared in an interview with the NYTimes: “Brainteasers are a complete waste of time.” It also turned out that measures such as schooling and GPA had little to no correlation with workplace innovativeness and performance.

“Computers are useless. They can only give you answers.” – Pablo Picasso
Relying entirely on technology is not advised, as even advanced machines do not know which questions to ask next. So one of the cultural shifts the industry needs is to move away from making decisions and towards asking questions and figuring out the next opportunity. The question to ask is “what do you need to get better at?” and a good way to do that is to write down 5 questions to which your organization does not yet have the answers. When you look at what data-driven companies do, it’s more incremental and experimental: they test, look at the data to assess the beneficial outcomes (if any) and proceed down the path of highest returns. The scientific view is to have a hypothesis and work very hard to disprove it. The goal is to be willing to be wrong, and not to structure the outcomes around a safe answer that senior managers hope to reach. It’s time to send the signal throughout the enterprise that the era of analytics and geeks is the path upward, towards smarter decision making in the new normal.

One of the things that we’ve seen is that old dogs can learn new tricks. Internal tech and analytics teams can re-skill themselves to thrive in the new era as data scientists: able to work with big data sets and programs, trained in AI and machine learning, and able to talk to their peers in management teams. Universities like KITE have recently launched the Center for Data Sciences to consult on big data and train the industry. With the introduction of MOOCs from MIT, getting teams to upskill on the latest technologies and methodologies in this era of adaptive learning has become easier.

Managers in this era need to start looking at their areas of ignorance and where the blind spots lie. Asking a good question is a really subtle art, and that’s truly why it hasn’t been automated: the ability to be confused is still a very human skill. Managers need to stop thinking like accountants and start thinking more like geeks, and find new ways to combine what they are actually good at with what the data are actually good at. That is the next brave frontier.
