In February 2010 I had the pleasure of interviewing Google Fellow Amit Singhal, possibly the premier search engineer in the world today, with my colleague Mike Hanley, by video conference from Google Australia to the Googleplex in California. The article on The Future of Search was published in The Australian Financial Review’s BOSS magazine in April 2010. Our hour-long interview was fascinating, so I’m posting the transcript here. My thanks to Annie Baxter, Public Relations Manager, Australia and New Zealand, for setting the interview up.
Amit Singhal: Why don’t I start with a brief summary of what I have done in search in my life, and I’ll keep it brief so you can use that as a springboard for questions. I started in search about 20 years back as a grad student and search was my first love as an academic, and I stuck with it for 20 years. I got a PhD in Search. After that I worked at Bell Labs as a researcher in search and in 2000 when I was contemplating moving to another job my two choices were either a Professor or actually practice what I had been preaching and I chose practice what I had been preaching and I came to Google in 2000.
After getting to Google in 2000 I was still much younger and full of myself, so I came in and I said, like most software engineers, I hate reading what others have written, I’d much rather just write stuff myself, okay. That’s the typical psyche of a software engineer full of himself or herself. I had been preaching search for about a decade then, so why don’t I put some of those ideas into a machine that actually moves code, and I wrote a new search algorithm for web search in December 2001 timeframe and in July 2001 that became Google’s search algorithm around the world. And then, because of being full of myself and saying I can build this, from that day on I became the person making sure that that stuff actually works. And since 2000 I basically have been working with Google Search with a great team looking at our search algorithms. What do you remember search being in 2000 or before that timeframe, where you would type the query and you pretty much got a bunch of pages based on just keyword matches? Do a modern search, which has an incredible amount of semantics built into it, and it’s magic-like because it’s a search that works on real time search and Social search, which we have been doing in the last few years, and I have been with Google search doing that here.
Scott David: There’s a question to lead straight into that which is in December 2009 you mentioned that there were 200 signals approximately that were actually incorporated within relevance for Google searches and I assume when you started way back there was page rank and boolean and a couple of other bits and pieces in terms of signals. So how has that actually evolved from earlier on and what’s the year by year or day by day kind of increase of new signals that get added and what are the criteria for adding them?
Amit Singhal: So clearly there were fewer signals when I started out in 2000 and with time we have added many more signals to our ranking system but the general rule of thumb is – we add signals to our ranking system, but we make changes to algorithms at a higher pace than we add new signals. At the end of the day what comes out is a much improved ranking system, much improved search system. Last year we made on average one and a half changes per day to our search system. Close to 500 changes we launched last year and these changes were a mixture of adding new signals to our system. Last year we launched real time to our system and with that we added a whole bunch of new signals, but it was also many more algorithmic changes to improve search, not only for the U.S. but internationally. So our changes are available to all countries out there and we will bring in changes that were much bigger than say Australia or in Albania because it’s our job to make sure search is amazing around the world and not only the U.S.
Mike Hanley: So what’s the process for improving the algorithm and introducing a signal?
Amit Singhal: Great question Mike, very insightful. You should come work in the Search team. So the process is we try new innovations. Google is a company that thrives on innovation and innovation comes from the great engineers here. So the process goes something like this. Great engineers work in the Search team area and they are observing work and in observing our system work they are even observing a pattern when our system is not working as well as it could be, or they just have an inkling about the problem and they have this one killer idea that they are itching to try out. This idea may be as simple as to put more to improve on great context. The idea might be as simple as an engineer who said to me one day, hey, can I look at the number of queries that mention Tiger Woods in our user stream and can I model that fluctuation to say something is happening related to this query in the real world, okay. That’s a really simple idea but it’s a beautiful idea and we will – we built that new signal into the ranking system for real time search. That’s an example I’m giving you.
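The query-volume idea Singhal describes can be illustrated with a minimal sketch. This is not Google's signal; it simply assumes that a spike can be flagged when the current count of a query sits several standard deviations above its recent history. The function names, the sample counts, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def spike_score(history, current):
    """Return how many sample standard deviations `current` is above the mean of `history`."""
    if len(history) < 2:
        return 0.0  # not enough history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any increase at all is treated as an unbounded spike.
        return float("inf") if current > mu else 0.0
    return (current - mu) / sigma


def is_trending(history, current, threshold=3.0):
    """Flag the query as 'something is happening' when the score reaches the threshold."""
    return spike_score(history, current) >= threshold


# Hypothetical hourly counts of queries mentioning one name.
hourly_counts = [100, 110, 95, 105, 90]
print(is_trending(hourly_counts, 104))  # an ordinary hour
print(is_trending(hourly_counts, 400))  # a sudden surge
```

A production signal would presumably account for time-of-day effects and overall traffic growth; the z-score here is only the simplest way to model "fluctuation".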
Mike Hanley: So if Michael Jackson dies, suddenly Michael Jackson is all over the web and he pops up – Michael Jackson’s death pops up first as the query because that signal has been filled?
Amit Singhal: Yes. Basically we can learn from something is happening in the real world from these return signals. So we would take one of such ideas and we have a playground built inside – which is a sandbox. People can put their ideas into code and run that code on the entire web search, as parallel worlds sitting inside, and then they can compare the before and after. The world before that idea and the world after their idea and then we would do our standard evaluation technologies which is a mixture of metrics that tell us whether that idea actually improved the world as it was or it didn’t quite improve and then you learn from there. You refine the process and after two or three iterations you will have a beautifully sculptured piece sitting out there of this new idea. So they would learn. My idea works ‘but’… and then they will take that ‘but’ and fix it somewhere and that’s how you basically end up shaping an idea and build an algorithm that just really works onwards.
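The before-and-after comparison can be sketched in a few lines. Singhal mentions "a mixture of metrics" without naming them, so the sketch below substitutes one well-known metric, mean reciprocal rank, purely as a stand-in. The queries, document names, and relevance judgments are invented for illustration.

```python
def reciprocal_rank(ranked_results, relevant):
    """Return 1/position of the first relevant result, or 0.0 if none is present."""
    for position, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, judgments):
    """Average the reciprocal rank over every judged query."""
    scores = [reciprocal_rank(rankings[q], judgments[q]) for q in judgments]
    return sum(scores) / len(scores)


def compare_worlds(before, after, judgments):
    """Score the baseline and the experimental ranking on the same judged queries."""
    base = mean_reciprocal_rank(before, judgments)
    new = mean_reciprocal_rank(after, judgments)
    return {"before": base, "after": new, "delta": new - base}


# Hypothetical judged queries and the rankings produced by each "world".
judgments = {"q1": {"a"}, "q2": {"x"}}
before = {"q1": ["b", "a", "c"], "q2": ["x", "y"]}
after = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}

result = compare_worlds(before, after, judgments)
print(result)
```

A positive `delta` suggests the idea helped on these queries; a negative or mixed result is the "but" that sends the engineer back for another iteration.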
Mike Hanley: And that goes to a committee, to a meeting where eventually you guys say yes, let’s do it?
Amit Singhal: Yeah, every week, about 10 team leaders go to the launch committee. The launch committee has all the senior engineers who have been running the system for many years like myself and every idea is presented alongside data, and if questions arise then they’re usually to make the idea much better, and in that committee we decide what we’re going to launch.
Mike Hanley: It must be a very proud day for those engineers whose ideas get launched?
Amit Singhal: It’s actually not like that to be honest with you because we run this very openly and they’re all there. It’s a conversation. There’s lots of jokes. And there’s nothing heavy handed about it, and that’s why the system works so well. Because from a management perspective you cannot meet the cost of perceived failure, even though it’s not a failure, it’s an opportunity to learn. But from a management perspective you can’t let the cost of pursuit get too high. So it’s really very informal, everyone is joking around, we talk about all kinds of things. We make jokes about everything and you know, everyone feels very comfortable and most of the ideas are received with one or two feedback. Great, let’s do it. Or hey, if you did that wouldn’t this be even better. And that’s basically the tone of the meeting.
Scott David: Dark matter… you’re constantly scouring more of the web, you’ve been looking at rolling out a new infrastructure called Caffeine to crawl more of the web. What are you doing to try and increase size and scale of what Google is actually able to crawl and index. And what percentage of the internet do you think is dark to Google at the moment?
Mike Hanley: I think Matt said you’ve got a trillion pages you’ve identified out there.
Amit Singhal: That is true, that is very true, but let me first give you a little bit of technical perspective on this question that I’ve been asked often enough. The number of pages out there in the deep web is infinite, okay. And when I say that, people get taken aback. How am I saying that with so much conviction? The answer is very simple. There are numerous calendar servers out there that are going to keep giving back date after date, after date and they would indeed keep issuing you a new page after a new page, okay. So what’s more important in this debate about dark matter is which pages are more important to our users that carry real information that will allow them to solve a real life problem? I think that’s the real question to ask. There have been several instances in the past of people trying to put a number of pages in the dark matter and truly I would be very worried about the science behind that because I know, I have sites which were generating infinite pages for me if I keep asking them, okay. So we know the existence of over a trillion pages and to answer your question more specifically, we have developed techniques by analysing the web and the sites out there. We can actually predict what part of probing a certain site will give useful results. I’ll give you a simple example. In the U.S. there are many sites where you can change the zip code and it would give you a new meaningful page. A simple example would be health services and my zip code. And even though that information is driven behind database technology and considered part of the deep web, we at Google after years of research, hired a great researcher friend of mine who had built a scheme around specifically this problem. It can predict that this parameter on this site can be cycled through all the zip codes and you will get delivered pages out of it. And that’s a very simple example of what we have done to actually get to the more interesting part of pages hidden from normal crawlers.
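The zip-code example can be sketched as a small probing test. The interview does not describe how Google's scheme actually predicts useful parameters, so everything below is an assumption: the heuristic (sample a few values and check that the pages differ), the `fake_fetch` stand-in for a real HTTP client, the example URL, and the 0.8 threshold are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_url(base, param, value):
    """Attach a single query parameter to a base URL."""
    return f"{base}?{urlencode({param: value})}"


def parameter_is_informative(fetch, base, param, sample_values, min_distinct_ratio=0.8):
    """Probe a few values and report whether most of them yield distinct, non-empty pages."""
    pages = [fetch(build_url(base, param, value)) for value in sample_values]
    non_empty = [page for page in pages if page.strip()]
    if not non_empty:
        return False
    return len(set(non_empty)) / len(sample_values) >= min_distinct_ratio


def fake_fetch(url):
    """Stand-in for an HTTP fetch: this imaginary site only reacts to a `zip` parameter."""
    query = parse_qs(urlparse(url).query)
    if "zip" in query:
        return f"Health services near {query['zip'][0]}"
    return "Generic landing page"


base = "https://example.org/health"  # hypothetical site
zips = ["94301", "10001", "60601"]
print(parameter_is_informative(fake_fetch, base, "zip", zips))      # pages differ per zip code
print(parameter_is_informative(fake_fetch, base, "session", zips))  # same page every time
```

A parameter that passes this kind of check is a candidate for cycling through all its values; one that returns the same page every time, or an endless calendar of near-empty pages, is not worth crawling.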
Scott David: So you’re basically indexing functionality?
Amit Singhal: We are basically learning functionality and indexing useful information that comes out of the functionality of a website.
Mike Hanley: How far down the road – how much of search science has been discovered?
Amit Singhal: See this is another question that’s philosophical when you have been working in Search for 20 years. When I started 20 years back as a grad student I thought any idea I had someone has done work on. So I thought I can’t make any progress but then something happened. I started understanding this science much better and I started seeing potential all over the place. I remember my dissertation topic came to me while driving. We used to live 20 miles from the University, from Cornell, and once, one evening while driving something clicked in my brain and said what if you looked at this certain problem that way. And from that day on I have kept looking at problems in this space and discovering infinite potential. So if I were to quantify how much of the search problems have we solved, I would say it’s somewhere between five and 10 percent. We are still in the beginning of lifecycle of the technology. So you know, it’s at an early stage.
Mike Hanley: Yeah, so what does the world look like when we’re at 50 percent or 75 percent? What can I do then that I can’t do now?
Amit Singhal: I think the question you would ask at that point is what can’t I do with search, okay. What’s happening already is you can find information that happened minutes ago, seconds ago on Google’s page or on a mobile. So just this morning – we live in a town next to a mountain called Palo Alto – and this morning an unfortunate event happened. One of the planes taking off from the local airport carrying three people from a well respected technology company, Tesla the car company, hit powerlines and these three wonderful human beings died in that accident and I didn’t know at the time. But at the time, as a human being my experience was that there was no power in my town and I was getting ready to drop my kids and since there was no power in my town I was trying to figure out what just happened. My first instinct, today, this morning at about 8 am, was to pull out my Nexus One and I typed “Palo Alto power outage” into Google and lo and behold our real time search, which you heard about recently, was at the top of Google’s page mentioning we had widespread power outage in the city of Palo Alto because some powerlines went down. Five minutes later I learned, using real time search that the plane struck those powerlines and unfortunately 15 minutes later I also learned that three people died in that and I was really sad.
That is a personal story from this morning to tell you that this scenario would not have been possible last year. Last year if the town lost power you would not know what to do because the computers didn’t work and the mobiles were not powerful enough and connected enough to the internet to give you what I just narrated. This is an example of how search moves within one year. Last year my expectation would have been to find the local utility phone line number, because I don’t use paper Yellow Pages anymore, and suddenly my electronic Yellow Pages would have gone. So I would have been rather baffled last year, but today thanks to this connection between search technology and mobile phones I was not as much at a loss. And you take this to the next level, where search becomes the technology that answers all your information needs, and it gets merged with various other technologies and a future platform emerges. Search is a critical layer of the future technology platform that emerges in this connective world, where you will be device agnostic, geography agnostic and information will come to you when you need to know it, wherever you need to know it, on whatever device you have. We’re not there yet but that’s where we’re headed.
Mike Hanley: My kids are going to live in that world.
Amit Singhal: Good quality search is a key ingredient to this emerging platform.
Scott David: And the context of events seems to come into it as well. Like you were just saying, you want to hear what the statement from the energy company was, in terms of their time to fix the problem. And you want to hear the human story in terms of what’s happening at the hospital, and whether everybody is alright. The relevance of the results changes based on the context. I imagine life and death scenarios create one kind of news-type object to be returned in search results, and the need for power requires search results based on another requirement – which is when am I going to be back up and running, and who’s got that real time information. I guess those kinds of context issues start playing out right across the future of search. So, how do you analyse the context of the words that are put into search engines, to serve back really relevant answers?
Amit Singhal: So indeed, what you’re saying is very insightful here. Personalisation, which is your own context, who you are, where you are, what situation you are in, is a key ingredient to this future of search. And we have a great personalisation system but it’s only today’s personalisation system. It’s nowhere close to what we will be building in time to come. So your personalised context combined with search technology is powerful data. For example, as you say, Stanford Hospital this morning realised that this power outage is going to last eight hours. Stanford Hospital is a great hospital, lots of critical emergency services, and they also realised that they don’t have enough generators to actually supply the hospital. Now I don’t know fully why it happens, but apparently the hospital ordered a whole bunch of ice to show up to get their vaccines, medication, and their refrigeration good for the next eight or 10 hours until the power shows up. So what you’re saying is the truth, that you would take context – personalised technology, combined with search technology, combined with mobile devices – and a new platform would emerge on which you would build applications where you would get to know what you need to know, whenever you need to know it, wherever you are. And it’s about theory. Mike sitting next to you might get some different information because his context is somewhat different. He’s headed out to the cricket match later and you’re going to the rugby match or footy match. And that would be different for the two of you, which we would use in our personalisation technology to improve your experience in life. I won’t even call it search experience anymore, it’s your life experience.
Scott David: So how do you map meaning into the relationship between contexts that creates a wider understanding?
Amit Singhal: Meaning and context is the holy grail of search in some sense. We are starting to take early steps of what meaning would look like, and some of our early steps have been so successful that when I was starting out as an academic 20 years back in this field I did not think I would see this day in my life and I’m not kidding. The simplest example I give of that is our synonym technology. The simplest way you can move from keywords to meaning is to not match the exact word that the user types, the keyword, but actually find the meaning in what they wanted. If you type GM food we know you mean Genetically Modified, even though you did not type those keywords Genetically or Modified, and on the other hand if you say GM cars we know you mean General Motors. Search has already moved from keywords to words with meaning, although with the first baby step from the human perspective – but a huge step from a scientific perspective because language understanding is an unsolved problem, and we have developed these technologies, synonyms being one of them. There’s a few great blog posts on our technology out there which are building blocks of what meaning would look like tomorrow and we’re far ahead of the game in terms of these building blocks that are already improving our search way beyond most other search systems out there. We have some key technologies here which allow us to do things that most people, most search engines, find very hard to do. And an example would be a car is usually an auto unless you’re talking about trains. Okay. Human beings understand that but computers don’t.
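The "GM food" versus "GM cars" example can be illustrated with a toy disambiguator. This is a sketch only: it assumes a small hand-written table of expansions and cue words, whereas the interview gives no detail on how Google's synonym system works and a real one would be learned from data at scale. The table contents and function name are hypothetical.

```python
# Hypothetical table: each ambiguous term maps to candidate expansions plus
# cue words that, when present elsewhere in the query, favour that expansion.
EXPANSIONS = {
    "gm": [
        ("genetically modified", {"food", "crops", "corn"}),
        ("general motors", {"cars", "trucks", "dealer"}),
    ],
}


def expand_query(query):
    """Replace each ambiguous term with the expansion whose cue words best match the rest of the query."""
    terms = query.lower().split()
    expanded = []
    for term in terms:
        context = set(terms) - {term}
        best, best_overlap = None, 0
        for expansion, cues in EXPANSIONS.get(term, []):
            overlap = len(cues & context)
            if overlap > best_overlap:
                best, best_overlap = expansion, overlap
        # With no supporting context the term is left untouched rather than guessed.
        expanded.append(best if best else term)
    return " ".join(expanded)


print(expand_query("GM food"))
print(expand_query("GM cars"))
print(expand_query("GM"))
```

The same pattern covers the "car is usually an auto unless you're talking about trains" case: the surrounding words decide which meaning, if any, is safe to substitute.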
Mike Hanley: And that’s multiplied across languages and cultures and…
Amit Singhal: We do it in every language out there, in every country out there. Okay, I can give examples in some of the other languages. Like sanga is a sandwich, like you’ve basically got another language down there.
Mike Hanley: Let’s talk about Google Books and arcane knowledge. When you’re scanning the text, and you’re searching for really very arcane academic knowledge, what’s the difference between searching that kind of material versus searching stuff that is refreshed on the web every ten minutes?
Scott David: You’re not necessarily going to get a page rank relationship if there’s only one person in the world and a single website on a particular subject, or one book that Google Books has just scanned, that relates to that person.
Amit Singhal: This is where our mission and our core principles of search come into play. What you say is probably correct for some proportion of the books that we have scanned – that they might not be of interest to great masses out there. But at Google, our mission is to make sure that no query gets left behind. That one user who wants to answer that one query, that is answered by that one page published 100 years back, it’s our job to make sure that works. The beauty of Google, or the way we operate, is that to build that esoteric system based on 100 year old books, our technology comes in handy when we build the applications of tomorrow.
One of the recent applications that I was involved with building was Google Goggles where you can pull out your phone and you can take a picture of something. Based on similar vision and OCR technologies to those we used to scan 100 year old books, we will actually tell you what you’re looking at. It’s an amazing application. I was playing a game last week, at this conference, TED, with a bunch of great people, and this was the party game after a few beers… I take out my phone, I take a picture of their badge, and you do a public figure, and I would say oh yeah, you work for Wall Street Journal, your last article about blah, blah, blah was pretty good, however this other person who is standing right here found this glitch with your article. But my point is that the beauty of Google’s system is you develop generic technologies, and even though it may look when you’re building them that it will also solve an esoteric case, perfecting the technology allows you to build what tomorrow needs.
Mike Hanley: If you could do one thing to boost Google’s innovation capability, high as that capability might be, what would you do to enable that? I know you have the power to do it.
Amit Singhal: That’s a question that I will have to think on for a moment. If I had one wish, what would it be?
Mike Hanley: More time for your engineers to sit and think?
Amit Singhal: So my one wish would be that all students of technology in the world realise their dream and their potential fully. The world would be a better place if people can build their technology dreams for the rest of the world. And I’d like that to happen inside of Google as well, so that all the great people who come and work with us, can develop the future of technology. In all honesty – if we can take every child who is curious and intrigued about technology and dreamt of Star Wars and Avatar, and let them build their dream to help themselves, their family, their parents, whoever – in a limited number of years you would see a much better world with lots of great technologies, not only for important information for biomedical applications, but, you name it, to build something that will improve human beings’ lives. Name a child who has said it would be really good if I could plug something to my iPhone or my Nexus One and I could tell my dad’s blood pressure as it’s changing while he’s sitting there. You see my point. All little kids are dreamers, I was one of them and they have technology dreams. If Google can enable them in some way, or they can come to Google and work on their dreams and make this world a better place, we’d all be happy.
Scott David: A question about context. You were talking about scanning books, and the arcane knowledge side of things. I was wondering what the future might be like, if you imagine somebody doing research with an unusual book that has been scanned and indexed by Google, but there are some other, as yet unconnected, key concepts and information in a separate location. Normally it would take a human intelligence to find and put those objects together to create a new fact – maybe a philosophical principle or the way an historical event unfolded. One book has one set of facts, another book has another set of facts, and another website has another set of facts. I’m wondering whether or not there’s some intelligence likely in search technologies that might be able to put those things together, semantically, independently of a human creating a query, or at the moment of the query going into the search engine?
Amit Singhal: Someone said to me recently, it’s a famous quote, that if you look at any technology you have today, like someone who has been sleeping for 50 years and woke up, they would say it’s a miracle. And you’re making a very deep point of whether we would approach a Singularity, with search technology being part of that. The truth is that today all the technologies, including search technologies, are amazing data processors which can do specific things very well. And we are observing that a machine Singularity is getting talked about much more – where an intelligence emerges by putting pieces together via computers.
The truth is that the world is a better place because a human being was observing those things made possible by technology, either as visualisations or tools, and was able to put things together and able to develop a drug, able to develop some connection, some hypothesis. That component of the system I don’t see going away. This is your question – your question is, is intelligence going to emerge where we can track a concept over time. Yes we can already, using search technology. However, it’s still about accepting a bunch of frequency changes of words and a bunch of mathematical models on top of that.
In the near future, or the foreseeable future, human beings will understand bits and pieces and put those together to improve the lives of other human beings. So I have observed friends who work in the area of drug discovery. They basically take patterns from patients, from existing chemicals, the molecular space, and they do a mathematical modeling of a match – to say this particular gene is slightly deviant from all these other 99 genes, and this particular molecule is slightly different from all these molecules which are known to help these 99 genes. Let me try this molecule on this new gene or new gene detect. So still the human being is needed, the computer can help them predict what molecules to search for and what diseases to look at for example. However, years of understanding from a human being is really, really useful.
Mike Hanley: So you’re an optimist?
Amit Singhal: I am very much an optimist. You know, I believe good things are going to happen. I’m a very strong optimist. I believe in the future and that’s why I get up every morning and smile because my kids are going to live in a better world than I did.
Mike Hanley: Excellent, excellent news. I will let my kids know.
Scott David: Following on from that, there’s been quite a lot of talk for many years about the semantic web. How do you see that actually emerging and what influence is it likely to have on search technologies as it spreads?
Amit Singhal: So the semantic web is really something that people have been thinking about for many years now. And indeed, with the semantic web the idea is to have human knowledge conveyed through websites and web pages and web applications. And even though the utopian world of full understanding might not be there, you’re observing baby steps of what a semantic web may look like, emerging in AJAX applications like draggable maps, which my good friend Lars in Australia did, which is a baby step towards understanding the physical world via a web app. So human knowledge of semantics will emerge and, if we wake up 50 years from now, what you’re observing today as our mapping technology, or the new semantic technologies that are emerging, will then look like we have realised the dream of the semantic web – because by then we will understand web pages, websites, web applications much better. So indeed, I’m really excited about what’s happening in that area. And we keep close tabs on that. And I think semantic intelligence will emerge in applications as time passes and I’m already seeing that happen.
Mike Hanley: One more?
Scott David: Aardvark which you’ve just purchased.
Amit Singhal: Yes.
Scott David: That seems like a really interesting kind of human intelligence component to add into search. And there are lots of other things going on at Google in terms of voice recognition technologies within voice and video over IP. It feels as though the idea of augmented reality is reaching out into the human space to join the two worlds. Where do you see Aardvark going in the future and how that might integrate into the rest of Google’s offerings?
Amit Singhal: So Aardvark, it’s a great product and it’s an amazing set of people. You should talk to them. They have some very interesting ideas about how social networks interact with search information. There are two key components of their interesting idea, which are about degrees of separation and information needs. We were just so excited about these young guys, talking about such deep ideas about how social networks should interact with search, that we just had to have them here.
So what we’re doing right now with Aardvark is recreating Aardvark.com on Google Labs. At the same time we are now taking their interesting ideas and starting to look at various short term integrations with Google products. We’ve done Social Search last year and that’s an obvious starting point. However, we can take it much further. And this is where Google’s philosophy of having great people here and letting them build great things comes in. We have got a great set of people now, with some fundamentally beautiful ideas, and we are going to start integration, and at the same time allow them to build their dreams here.
Scott David: There seems to be a lot of ongoing development of applications – you know, Mail, Buzz the rest of it. The applications seem to be the other side of the search. You’re creating developments, creating communities, buying products, adding them on, creating an ecosystem of Google products interacting with people. How do you see Google’s future in terms of creating applications that keep people within a Google application universe integrated with search, versus them leaving the Google space into the rest of the internet?
Amit Singhal: So let me start out with one fundamental belief that I hold dear, and our Google Search team holds dear. The web is created by people by linking to each other and creating great websites. Google Search’s job is to make sure that we get our users to the destination that they need to go to in the shortest possible time. We do not believe in making people spend time on our website pages. Our job is to send people out there and we do. We are, by many measures, sending almost our entire traffic to the outside web ecosystem, because an open ecosystem is healthy. A closed ecosystem may be a walled garden built by a certain say mobile company, but a closed ecosystem built by a search engine company is just unhealthy. So Google is a strong believer in sending traffic out to the world which is an open ecosystem. If someone else is providing value then I truly want our users to go there because that’s how we have built this company, and how we will survive. Now, there is the other side of Apps like Gmail or most recently Buzz. And we don’t do anything special to push traffic into our Apps. Our job is to build great products and if they succeed people do come to them eventually. So I see these two parts of the world as equally innovative – our Apps part of the world and our search part of the world – and I see us almost religiously in the search part of the world, and that open ecosystems are good, and our job is to send traffic to the rest of the web and not keep it in Google.
Scott David: One last question? You’ve introduced Google DNS and there’s Google URL shortener and things like that. It feels to me as though you’re wanting to find additional ways to track and recognise the patterns of people moving through the rest of the web without them necessarily being within a Google Application environment.
Amit Singhal: So once again, we believe fundamentally in the four pillars of search – the relevance, comprehensiveness, speed and user experience. And we have seen time and time again that when websites are spawned faster, people are happy. I am happy and I think it’s really lovely when I click on something and then I’m singing the song by the time the website has loaded, and I didn’t even realise how many times my DNS service was at fault. So what we have done with our mission at Google is we’ve found some key big points in the internet ecosystem and provided innovation in those key points, open DNS being one. I think users get their results faster that way using open DNS, and then I don’t see any harm in doing that.
Scott David: And are you using it for pattern recognition as well as speed?
Amit Singhal: Not to the best of my knowledge, no. It is indeed a service that helps people get their results faster or their webpage faster.
Annie: Alright, that’s probably a great point at which to wind it up. This has been fascinating.
Photo of Amit Singhal: Niall Kennedy, from the December 2009 Google search event.