In February 2010 I interviewed Google’s Matt Cutts with my colleague Mike Hanley. It was one of a number of fascinating interviews arranged for us by Google Australia with their key engineers in the Googleplex in California, which had included Google Fellow Amit Singhal. Our thanks to Annie Baxter and Nate.
Mike: One of the stimuli for this interview was when we were at the Google Christmas party last year. Somebody said that, in their opinion, the science of search was about three percent progressed. So the natural question is then, what does the other 97 percent look like?
Matt: Yeah, that’s a great starting point.
Mike: Maybe you can give us a little bit of background on how we got to that three percent and where are we going?
Annie: And I think the final bit of context I will add is, Matt, you're the first of four today, so Amit, Trystan and Ben are following you, and if in the course of this discussion you hit on stuff that you can throw to one of your esteemed colleagues to deal with later on, feel free to do that.
Matt: Absolutely, that sounds good. It's funny, Amit and Ben Gomes have been at Google even longer than I have, so you're in good hands with all those different folks, and Trystan always represents the Australian point of view very well. He gives an international flavour, so he's a very handy guy to have around. It's funny, because every time I've heard the analogy, people either compare search to still being in its infancy or maybe a toddler, or, if they really think it's doing well, maybe early adolescence, but that's about as far as anybody here is willing to go, because there's so much left to do in search. How did we get here? Well, looking back on a decade, Google has done a lot of different stuff to try to push the quality bar higher on search. So historically we talk about three or four different pillars of search quality. The one that of course matters the most to us is relevance. If you type in your query and you don't get an answer that's helpful to you or adds some value for your time, then we've basically failed. So relevance is first and foremost. You would think that would be first in everybody's point of view, but we went to a nursing home and we asked people what they like about Google, and they said it's clean, it's fast and it's relevant, and relevant was only mentioned after clean and fast. So in addition to having relevance, you also want to have a good user experience. Google has tried very hard to make sure its webpage doesn't get crowded with all sorts of pictures and banners and scrolly things that are going to distract you, and you've seen that recently in our most recent homepage, where we make it even more minimal: just a search box and a Google logo. The whole idea now is to focus the attention on the search box and reduce all the little bits of clutter that have grown over the last decade.
So there's the user experience, the UI, how clean it is and how relevant it is. There are two other really important things. There's how comprehensive it is. Google aims to be the most comprehensive version of the web that can be searched. Back when I joined Google early on, we had just topped one billion documents and we felt very happy about that, and recently we did a blog post, I think in 2009, that said we now know of one trillion URLs. Now, we haven't crawled every single one of those one trillion, but we've seen them and we're able to process them within our system, and I would absolutely stack our comprehensiveness, the size of our database, up against any other search engine. But even if you have a really huge database, if it's stale or out of date, then that's not very helpful either. So the last pillar of search quality, at least how I think about it, is freshness, and we've really pushed hard on that over the years. There's a very clear trend of Google trying to have the freshest results that we can. When I started in 2000, they had not updated the index for about four or five months, to the point where they had a war room to get to the point where they could create one new index every month, and for years Google would work on that kind of timeframe, because the web wasn't changing that fast, people weren't making that many domain names, this was even before people were blogging. So Google more or less pushed the pages out by hand.
So updating the index about once a month was okay, but starting in 2003 we launched what we call an incremental indexing system, which means rather than update the whole thing at once, you can update just a part of it, one increment of it. So in the summer of 2003 we started to update a little bit of our index every day, call it daily indexing, and the largest innovation, probably the most visible in 2009, is real time search. So we moved from once a month to once a day, and over the course of the years we got to where people could do a blog post and we could find that blog post within minutes, but real time search is specifically targeted at things that happened seconds ago. Amit is the world expert on real time search, but I will just give you a very simple example. Earlier this morning at 7.55 local time there was a plane crash: three people who worked for Tesla, which is an electric car maker, their plane crashed in East Palo Alto and caused a power outage, and the Stanford Hospital was without power for several hours. Several people I talked to at Google said their power went out at 7.55 and they had no idea what happened, and it takes minutes for these news stories to show up. So they went to Google and they did a search for Palo Alto and they found out about this plane crash. A lot of that information comes from Twitter, some of it comes from FriendFeed; it's sort of an open system, so it can also include news stories.
So when someone does write a story about that plane crash, it can instantly show up and people can search on it, but it makes a huge difference if your power is out and you still have your cell phone: you can search on Google for Palo Alto and find out, oh, it was a plane crash eight blocks away, that's what that large noise was. At first when real time search was launched, a lot of people were like, well, you know, there's this scrolling thing, are we sure that it isn't a little too gaudy? And then California had a series of earthquakes, there was the Haiti earthquake, and you noticed people don't really question whether real time search is valuable after the Haiti earthquake. The ability to see what's happening in real time, the ability to hear people talking about where the refugee camps are: I think most people say yes, you do want information instantly. And even in real time search, Google tries to bring that relevance. We don't just search over who most recently said plane crash or celebrity death or who's getting nominated for the Oscars. Instead we try to say: who are the most reputable people? Who are the people who are really helpful? Who are the people contributing to the conversation, and so forth? So that's the full-circle perspective on the three percent: relevance, good user experience, comprehensiveness and freshness, and we keep pushing on all of those to try to improve.
Mike: So where are you on the three percent thing? Are you in adolescence yet as far as you’re concerned?
Matt: You know what, I'd be willing to say that we're up to 10 percent, that's my personal opinion, but there remain so many hard problems. So now let's look toward the future a little bit. At first glance there remain so many challenges to getting good search quality, getting it to the right person and getting it to them in a really relevant way. So for example, mobile. If your power goes out, you want to be able to search with your phone, you want to get things that are relevant to you right nearby. Google just rolled out Social Search in the last few weeks, and that lets you say, oh, my friend did a blog post or tweeted about something, whatever it be, the Oscars or a plane crash, and you can see that person, that person you know, showing up at the bottom of your search results. Local: just the idea that if you do a search for a bank in the U.S. you will get Bank of America, and if you do a search for a bank in Australia you'll get completely different search results, even down to the metro level, so individual cities can get different search results sometimes. So that's just types of data. There are also things like personalisation. If you're willing to tell us a little bit more about your interests, then if you typed in Saturn we want to know whether you mean the planet rather than the car, the automobile. Boy, it just goes on and on. Universal: the idea of universal is that if somebody comes in and types Martin Luther King, do they want a picture? Do they want a video of his speech? Are they interested in news about Martin Luther King because it's Martin Luther King Day? Do they want web results?
So rather than trying to ask people, oh, you have to remember to go over here to get a picture of a sunset or go over here to get a music video, what we'd like is if you could just go to one place, the Google search box, type in stuff, whether it be weather, whether it be the time in Australia, and get the answer right there in the search results, pulled from all the different types of data that we know about. So that's just a very simple glimpse, that's the different types of data, but there's also knowing more about the queries themselves and the documents. You know, 10 percent of queries are misspelt. A lot of people will type cinnamon or Israel or embarrassed and they'll use the wrong number of 'r's and 's's, and it makes us look bad if we're not able to return good search results. So there's the ability to figure out what the user really means. Something like 25 to 30 percent of the queries that we see are completely unique; we've never seen them before in our lives. At that point you really don't know exactly what someone means when they type a query; you try to figure out what they're saying. And in the same way people are ambiguous in their two or three word queries, sometimes people write about things and they use completely different lingo or vocabulary or terminology. So if a medical doctor is writing about the side effects of a drug, he might use the formal name of that drug and not the informal name, the name that everybody else uses. So if you can learn that ammonia is the same as NH3, or something like that, then someone who searches for one term doesn't need to know that it can also be called this other thing.
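The spelling problem Matt mentions can be sketched in a few lines. This is a toy illustration only, not how Google's speller actually works (which draws on query logs and context, as he goes on to describe); the vocabulary here is just the examples from the conversation.

```python
# Toy query spelling correction: pick the known term with the smallest
# Levenshtein (edit) distance to the misspelt word.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary term closest to `word` by edit distance."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))

vocab = ["cinnamon", "israel", "embarrassed"]
print(correct("embarassed", vocab))  # -> embarrassed
```

A real speller would also weigh how often each candidate is actually searched for, so that a rare word never outranks a popular near-neighbour.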
So the whole notion of understanding what a person really means, whether they're writing a document or a query, that's a really hard problem. That's not just matching keywords, and we do quite a bit to try to figure out synonyms, plural versus singular, misspellings, all those sorts of things, but that's, I think, what makes a lot of people say it's only three percent done. There's still all this stuff yet to be done on figuring out what people really mean when they write things.
Scott: And what about the development of new technologies such as real time Twitter… They mean that Google now needs to reach in and grab the real time data that's coming out of those feeds. And I'm guessing that when Wave gets more integrated into the internet as a whole, those issues of real time are actually going to start changing the nature of the way that stuff is being searched, and they will introduce new challenges that in some respects recalibrate what the percentage depth of search completion is as you go along. How do you evaluate the new technologies as they come in, to then start adding them into search results?
Matt: Absolutely, yeah. Evaluation is one of the biggest challenges any search engine can have, but it's one of the unsung challenges: trying to figure out, given one set of results, is this page better or is this other page better. So we have an entire evaluation group that has a lot of different really neat ways to try to evaluate search quality. A good example is, pick a query like Barack Obama. What should be the top 10 results for Barack Obama? Well, probably his White House page, maybe a biography page. After the first one or two like that, people start to disagree, because there's the New York Times with a category page with all the mentions of Barack Obama. Should there be news results about Barack Obama? You can ask Amit this question, but he can basically tell you that if you put five people in a room and say, here are 100 different search results, put them in the best order and show me what the top 10 would be, you'd only get something like 80 percent agreement. So trying to figure out what the ideal set of results is, is really, really hard. So given new sets of results, they might be for real time search, they might be from Wave, how do you say what the value of a block of real time results is versus two or three regular search results? And we do use a lot of different criteria. For example, if you're seeing a lot of people doing queries and a lot of really new results on the web for something, it could be a hot topic or some sort of trendy topic, maybe Michael Jackson died, and then if you see a lot of articles in the news sources, then suddenly, okay, we think this is breaking news, we think it's some sort of current event. There's a new development, so let's show some real time results or let's show something that's very, very recent. So we can literally change the ranking based on what we think the most appropriate sort of thing is. I wouldn't claim that we're perfect with that, but we have to look…
Mike: But is there a bunch of people sitting there, watching what’s going on and then deciding to change things, or does it happen as a matter of a function?
Scott: Is there an algorithm that creates that?
Matt: That's a fantastic question. So we don't have any editors, we don't have any sort of intermediary to say this is some new event that just happened with the Federal budget, or Senator Conroy's plan for censorship or not censorship; there's no one person who's making these sorts of decisions. What we try to do is write a computer program that mimics what a real person would decide. So for example with Google News, how do we decide what gets escalated to the top of the front page? Well, it turns out you can look at how many articles are showing up on how many different news sources and see there's a bunch of these words in common, Conroy, censorship, so this is a hot breaking sort of thing, so let's escalate this on Google News. We try very hard to avoid using people, because people get tired, they can have bias, they can't work 24/7 like a computer can. So wherever possible we absolutely take people out of the loop. Now, my specific field, what I work on, is called webspam. That's when people try to cheat, and even though the vast majority of the stuff that we catch is caught by computer programs and algorithms, we do have a small set of people who notice that, when you type in your name and you've got porn and you're not a porn star, somebody is trying to spam the search results. So in very limited instances we are willing to take manual action, but the vast, vast majority of the time we want everything to be completely automated.
Scott: How do you develop machines that do that, though? It sounds as though you're creating some kind of taxonomy system that's also got some kind of clustering intelligence built into it. How do you interrelate those concepts to see that you've got some relevance between a couple of items that might come together? How do you actually flag, at a machine level, that items are clustering, and therefore change their status?
Matt: Yeah, absolutely. There are a few different ways you could cluster them. An easy way is, if you crawl the web you might notice that two words co-occur in documents quite a bit: car and automobile, automobile and automobiles. You might also see that when you're looking at queries. So someone is typing in Saturn car and then they change the query to be Saturn automobile; you could imagine starting to learn, well, maybe car and automobile are sort of related, and maybe if you run those queries, the results that are returned have some amount of overlap. So there are a lot of ways you can learn over time what some of the important concepts are, even if the computer doesn't really know what they mean.
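The query-reformulation signal Matt describes can be sketched very simply: when a user rewrites "saturn car" as "saturn automobile" in the same session, count the substituted pair as weak evidence the two words are related. The session data and the one-word-difference heuristic here are illustrative assumptions, not Google's actual pipeline.

```python
from collections import Counter

def substitution_pairs(sessions):
    """Count (old_word, new_word) pairs where two consecutive queries in a
    session have the same length and differ by exactly one word."""
    pairs = Counter()
    for session in sessions:
        for q1, q2 in zip(session, session[1:]):
            w1, w2 = q1.split(), q2.split()
            if len(w1) == len(w2):
                diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
                if len(diffs) == 1:  # exactly one word was swapped
                    pairs[diffs[0]] += 1
    return pairs

# Hypothetical query sessions (each list is one user's consecutive queries).
sessions = [
    ["saturn car", "saturn automobile"],
    ["used car prices", "used automobile prices"],
]
pairs = substitution_pairs(sessions)
print(pairs[("car", "automobile")])  # -> 2
```

At scale, pairs that recur across millions of sessions, and whose queries return overlapping results, become candidate synonyms, even though the system never learns what a "car" actually is.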
Scott: How do you put judgement into that, though, because you're creating a kind of machine intelligence. How do you flick a switch and say you're going to judge this as now being the case, a state in the world?
Matt: Absolutely. In my opinion there are at least two fundamentally different approaches you can take. The first is machine learning in charge of everything. All you do is pay evaluators to judge A versus B, which search result is good, or in theory you could look at clicks on the search results pages and try to show the stuff that gets clicked higher, and essentially you have a neural network or some machine learning paradigm that tries to take care of everything for you. In fact, at least in the early days before Microsoft Bing was called Bing, when it was live.com, whenever they talked about it they talked about it in terms of using a neural network. One problem with that is inputs come in, documents come in and queries come in, and then search results come out, but it's pretty opaque. There's no easy way to go to a neural network, or to a brain, and say, well, why did you make this association? Why do you think this document is better than that document? So the approach that Google has taken, in contrast to letting the machines run the entire thing, is to have a lot of people who are information retrieval experts who write specific algorithms or specific computer programs that try to say 'here's how we can measure the reputation of a site', 'here's how we can measure how well a page answers a query', and then we want to blend the trade-off between how reputable it is and how topical it is, how well it matches the query, in some reasonable way, so that when you type in something like telephone system, you don't just get a random, very reputable guy who happened to mention the word telephone. We want it to be those sorts of people, but who also talk about telephones, and that's really the approach that we've gone with. We have a lot of engineers who understand information retrieval.
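The blending Matt describes, separate hand-written signals combined into one debuggable score, can be illustrated with a toy weighted sum. The form of the combination and the weights are purely hypothetical; the point is only that each input is a named, inspectable signal rather than an opaque network.

```python
# Toy blend of two hand-designed ranking signals, each in [0, 1].

def blended_score(reputation: float, topicality: float,
                  w_rep: float = 0.3, w_top: float = 0.7) -> float:
    """Weight topicality higher so a very reputable page that barely
    mentions the query doesn't beat a genuinely on-topic one."""
    return w_rep * reputation + w_top * topicality

# A reputable page that mentions "telephone" once in passing...
offhand = blended_score(reputation=0.9, topicality=0.1)
# ...versus a moderately reputable page squarely about telephone systems.
on_topic = blended_score(reputation=0.5, topicality=0.9)
print(on_topic > offhand)  # -> True
```

Because the signals are explicit, an engineer can ask "why did this page rank here?" and point at a specific input, which is exactly the debuggability Matt contrasts with the neural-network approach.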
Amit is the world expert in information retrieval, and we looked at the approach of having machines do it all, and the trouble is the machine might get close, but it doesn't have the intuition that a human does. So whenever we get problems, our approach lets us debug it; our approach lets us say, oh, we thought that this site was very, very reputable and really relevant, but it turns out it's not all that relevant, so maybe it's not the best match. So maybe we need to find a way to discern or ascertain that this site is really not as relevant as we thought it was; let's look at tweaking this algorithm or changing these ways of doing things. So we have a meeting that happens once a week, and every week people come with changes that they would propose to the ranking systems, and they're not black box changes where they randomly tweak some number or dial some knob and things just got a little bit better. Instead they're always motivated by intuition. You want to be able to say, hey, we found a better way to segment Chinese, Japanese and Korean, or we found a way to handle spelling errors, so these are not shots in the dark, they are very informed guesses. In any given week we might have sometimes eight or ten things that we're discussing; sometimes the system might get a little bit better but it might make things more complex, and in other cases we might be worried that there might be collateral damage, that it might not improve all languages, for example, maybe only a few languages. So there's this vetting process where everybody gets together once a week and sits around the table and looks at the side by side evaluation, the reports from the evaluation team, and says, yes, this appears to be a clear win. The intuition makes sense; across millions of queries it does seem to improve the quality overall; we don't see any major issues, so let's go ahead and push this.
Scott: Is there an example you could give? Because I imagine you're talking at quite a top level there. When you're all sitting around the table discussing things, you're going to have a shared language and a shared set of concepts that you're talking with. What would an example of a typical item be that you'd bring to the table to discuss, and how would you discuss it?
Matt: Absolutely, and you could maybe ping Trystan on this. Earlier I mentioned that if you do the query ‘bank’ on Google the number one result for me is Bank of America but if I go to Google Australia and type in ‘bank’, here we go…
Annie: Yeah, you get Westpac.
Mike: You get Westpac, our local bank. ANZ, Commonwealth.
Matt: Right, so you know, you guys would rather have St George Bank; you don't need Bank of America. So one of the things that Trystan has worked on, and he has this great Australian international perspective, is exactly when things should be global or U.S. versus specific to a country. So if you type in football, you clearly want the National Football League in the United States, but you clearly want soccer pretty much everywhere else, and trying to find that balance can be really tricky. If you think about somewhere like Switzerland, they might speak French and German and a bunch of different languages, or in Austria they might want some German but they might want some English. A very simple example: if you type in eBay in Japan, should it be ebay.com or ebay.jp? The fact is that there is no perfect answer for some of these search results, but trying to find the right balance is where having an engineer with his intuition to help drive it can really make a big difference.
Mike: How do you visualise the algorithm or the way that the algorithm looks? Is it an animal? Is it a civilisation? Is it just an algorithm?
Matt: For me it's like a car, because people always ask is there one algorithm, one monolithic algorithm that everything feeds into, and that's kind of like asking is a car a machine. Yes, a car is clearly a machine, but a car is also composed of smaller parts which are also machines. You've got the engine, you've got the carburettor; even within the engine you have pistons, all that sort of thing. Maybe Google for me is more like a space shuttle, but it's like a car because it propels you forward, it gives you understanding, but it's not one single monolithic thing. It's composed of smaller parts. So there's one part in our engine that tries to say how reputable a page is. There's another part that says, well, how does it match this query, and then there's a part that tries to blend those to find the best balance of which things are the good search results.
Scott: So when you're chatting about that, instead of talking about the fuel injectors and tyre pressures, you actually talk about look-up tables and that kind of stuff?
Mike: So when you’re working on the car you take out the carburettor and you make it better and then you put it back in and actually when you put it back in you find it actually screws with the exhaust and so before you put it back in you test it?
Matt: Yes, that's exactly right, and not only will the carburettor sometimes not match with the exhaust, but sometimes it will and you'll get an unexpected interaction. But absolutely, you pull out individual parts, like 'proximity'. If somebody types in, you know, Commonwealth Bank of Australia, well, what if you've got Commonwealth here and Bank here on the page, and then down at the very bottom, way down here, there's Australia? That's not going to be as good a match as Commonwealth Bank of Australia together. Those are ways of tuning what we call proximity, for example. All of those little things are individual parts, and Amit is the master mechanic who understands the whole car, but there are a lot of people smarter than him on individual components, individual parts.
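The proximity component Matt sketches can be illustrated by scoring a page higher when the query terms sit close together. The scoring formula below is a made-up illustration (a brute-force smallest-window search), not Google's actual proximity signal.

```python
def min_span(tokens, query_terms):
    """Length of the smallest contiguous window of `tokens` containing
    every query term at least once, or None if some term is missing.
    Brute force; fine for short pages."""
    need = set(query_terms)
    best = None
    for i in range(len(tokens)):
        seen = set()
        for j in range(i, len(tokens)):
            if tokens[j] in need:
                seen.add(tokens[j])
                if seen == need:
                    span = j - i + 1
                    best = span if best is None else min(best, span)
                    break
    return best

def proximity_score(tokens, query_terms):
    """Higher when the terms appear closer together; 0 if any is absent."""
    span = min_span(tokens, query_terms)
    return 0.0 if span is None else len(query_terms) / span

query = ["commonwealth", "bank", "australia"]
tight = "commonwealth bank of australia opening hours".split()
loose = "commonwealth news today bank report yesterday australia".split()
print(proximity_score(tight, query) > proximity_score(loose, query))  # -> True
```

The phrase appearing as a tight unit scores higher than the same three words scattered across the page, which is exactly the tuning Matt is describing.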
Nate: You know, tell me if I'm wrong here, but I think it might be useful to take a step back and talk a little bit about the components of the search engine, how it works. Oftentimes people say they know how search engines work, and what we find even inside the company is that people don't. So, when you're searching Google, you're searching our index of the web. I mean, do you guys get that? Do you understand how that works?
Mike: We were actually having this discussion in the car on the way here. Of course, Larry started by downloading the internet onto computers at Stanford, which to me is a totally foreign idea. But you can’t do that anymore because the internet changes every 20 seconds. So please go ahead.
Matt: Absolutely. I can talk search forever, so I will do, say, five or 10 minutes; at any point just interject and say, okay, enough of that. So there are three things that a search engine needs to do well, in addition to these pillars of search. We have to crawl the web really well, we have to index it quickly, and we also have to serve it: we have to return the most relevant results as quickly as possible. It's funny, crawling the web turns out to be very difficult. A lot of people think that Google is the internet, or that Google has a copy of the entire internet, and you're exactly right, people are adding pages every 20 seconds, so no search engine can have an entire copy of the internet. A very simple home experiment to show that is: suppose on your website you had a calendar of future events that you can march forward through by clicking a link to look at 2011, 2012, 2013, 2014. Well, a search engine spider, if it's stupid, can keep clicking on that link until it gets to the year 3510. We call that a spider trap: if you get caught in an infinite space then, assuming that the world doesn't end in 2012, you've wasted a lot of your indexing capacity crawling useless pages. So one thing you want to do is crawl the most important pages, crawl the pages that change frequently, crawl the pages that are really useful to users. If you just set the bot loose, and when I say bot I really mean a computer program that sits in a Google data centre, goes out to web servers and asks for pages just like your web browser does, if you just set those bots loose, they'll crawl over the web, but in a random way that doesn't give you the best pages first.
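The spider-trap problem above can be sketched with a crawler that caps how many hops it will follow from its starting page, so an infinite "next year" calendar can't eat the whole crawl budget. The link graph and the depth cap are illustrative assumptions; real crawlers use far richer prioritisation, as Matt explains next.

```python
from collections import deque

def crawl(start, links, max_depth=3):
    """Breadth-first crawl of `links` (a dict: url -> list of outlinks),
    never following links more than `max_depth` hops from the start."""
    seen = {start}
    frontier = deque([(start, 0)])
    order = []
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # trap guard: stop expanding past the depth cap
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                frontier.append((out, depth + 1))
    return order

# A spider trap: every year's calendar page links to the next year's.
calendar = {f"/cal/{y}": [f"/cal/{y + 1}"] for y in range(2010, 3510)}
print(len(crawl("/cal/2010", calendar, max_depth=3)))  # -> 4
```

Without the cap, the loop would dutifully fetch every page out to the year 3510; with it, the crawler spends only four fetches before moving on.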
So the primary innovation that Google brought, other than using commodity PCs instead of huge clunker supercomputers and big iron, one of the big innovations was PageRank, and it is kind of nice to reiterate, because a lot of people think that PageRank is just the number of links pointing to you, and that's not what PageRank is. PageRank is the number of links pointing to you, but it's also how important those links are. So for example, if I have 10 links to my website but they're all my college buddies, people I grew up with in high school, nobody knows me, I'm not famous, those are not very high PageRank links. Nate might have six links pointing to his website, which is fewer links, but if they are CNN, the New York Times, the LA Times, Reader's Digest, really reputable guys, he's going to have a lot more PageRank, and we want to crawl his site before we crawl my site. So Google has a tonne of machines, a bank of machines, always computing PageRank continuously, because people are adding new pages and modifying links. We try to keep continuously computing that, and we more or less crawl in PageRank order. We go to a page, we take it back to Google and we say, well, what are the outlinks that a browser could click on, on that page? From those we say, ah, here are some more pages to go and fetch, and PageRank works the same way: if you've got a higher PageRank page, that PageRank flows out along the outlinks, so the pages that important people link to are also considered important. There's a lot of heavy math that goes on behind that, a whole lot of machine processing, but that's the higher order bit on how we crawl well.
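The idea Matt just described, rank flowing out along outlinks so that links from important pages count for more, can be sketched as a short power iteration over a tiny made-up link graph. The graph, site names and damping factor of 0.85 (the value from the original PageRank paper) are illustrative; this is the textbook algorithm, not Google's production implementation.

```python
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping page -> list of pages it links to.
    Returns a dict of page -> PageRank score (scores sum to 1)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Everyone starts each round with the "random jump" share.
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share  # rank flows out along the outlinks
            else:
                # Dangling page with no outlinks: spread its rank evenly.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Nate gets few links, but from reputable sites; I get one from a buddy.
graph = {
    "cnn": ["nate"], "nytimes": ["nate"], "buddy": ["me"],
    "nate": ["cnn"], "me": [],
}
ranks = pagerank(graph)
print(ranks["nate"] > ranks["me"])  # -> True
```

Nate ends up with the higher score despite having the same number of inbound links as me in this graph, because his come from pages that themselves accumulate rank.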
Nate: And an interesting thing that's worth noting is that PageRank is essentially a measure of the value of links between pages, and that is one of the earliest measures of the social value of a page. It's relevant today because for every link that points to every site, there's a human who made a decision about whether they wanted to do that or not. So in some ways you could look at PageRank as being one of the most rudimentary and fundamental measures of the social fabric of the web. Just one thing to be clear on about indexing the web: Googlebot is our crawler, and we send our machine to one page on the web and we copy all the content of that page, we look at it, we send it back to Google, and then what we do is we follow all the links on that page. Does that make sense, you guys?
Nate: And then we go to the pages that are linked from that page, and then we follow the links on those pages, and so on and so on, until we get the entire internet, or as much of the web as we can get to, and then all of that information, the links, the content of the pages etcetera, is sent back to Google and stored in what is called an index. The index is similar to the index in the back of a book, which tells you what page of the book a word appears on. So you can think of the web index we search on Google as similar to what you find in the back of a book.
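Nate's back-of-the-book analogy maps directly onto the classic data structure, an inverted index: each word points to the set of pages it appears on, so a multi-word query becomes a lookup plus a set intersection. The page contents below are made up for illustration.

```python
from collections import defaultdict

def build_index(pages):
    """pages: dict of url -> text. Returns word -> set of urls."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the urls that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return results

pages = {
    "a.com": "plane crash in east palo alto",
    "b.com": "palo alto power outage",
    "c.com": "tesla electric car",
}
idx = build_index(pages)
print(sorted(search(idx, "palo alto")))  # -> ['a.com', 'b.com']
```

This is why serving can be fast: at query time the engine never scans pages, it just intersects precomputed posting lists, exactly like flipping to the back of the book instead of rereading every chapter.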
Unfortunately, the second half of the interview somehow didn’t get recorded.