Today the tech community (or at least all of my friends) is abuzz with news of Google’s new Code Search. Now, this is just cool. From this day forward, when I’m struggling with some poorly documented, hard-to-use, or even private API, I should be able to just type its name into Google Code Search and see how other people have managed to use it.
But that’s only the useful angle – not enough to really create buzz on the net. The two things people are having fun with today are exploring the answers to these questions:
- What does code search know about me?
- What private information does code search know about others?
The first is the natural extension of the ego search that many of us perform on a regular basis (or have RSS subscriptions set up to do for us). It’s fun to read about yourself, especially when somebody else is doing the writing. For instance, I discovered several new “thanks to Daniel Jalkut” type comments in source code and readme files. Neat! I like that.
The second is more problematic. Google grabbed a bunch of the world’s “source code” … basically anything it could find with a suitable file extension, and made it easily searchable. What’s wrong with this? A lot of files with source-code extensions actually contain sensitive information, and have mistakenly been left world-readable on some web server. For instance, John Gruber points out the rather stunning example of WordPress database configuration files, complete with the database login and password. He directs our attention to Jason Kottke, who has assembled several other interesting finds. Personally, I am amused by the search “This file contains proprietary and confidential information.”
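If you want to know whether your own server is offering up files like these, a rough self-check is easy to sketch. The Python below walks a directory tree and flags lines in source-like files that look like credentials or confidentiality notices. The extension list, the patterns, and the function name `find_sensitive_lines` are all my own assumptions for illustration; nothing here reflects how Google actually chooses what to index.

```python
import re
from pathlib import Path

# Assumed for illustration: extensions a crawler might plausibly treat as "source code".
SOURCE_EXTENSIONS = {".php", ".inc", ".py", ".rb", ".pl", ".conf"}

# Assumed for illustration: a few patterns that suggest a file should not be public.
SENSITIVE_PATTERNS = [
    re.compile(r"define\(\s*['\"]DB_PASSWORD['\"]", re.IGNORECASE),
    re.compile(r"\bpassword\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    re.compile(r"proprietary and confidential", re.IGNORECASE),
]


def find_sensitive_lines(root):
    """Return (path, line_number, text) for each suspicious line under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with a source-like extension.
        if not path.is_file() or path.suffix.lower() not in SOURCE_EXTENSIONS:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in SENSITIVE_PATTERNS):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_sensitive_lines` on your web server’s document root returns a list of path, line number, and line text for anything that matched. A clean result proves very little, since the patterns are only guesses, but a hit is worth investigating before somebody else’s search turns it up.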
Now, the quite reasonable reaction we’re likely to hear from Google is, “This was already public information, we’re just indexing it.”
True! But let’s not dismiss the power of indexing. Google is too big to “just index” anything. They’re the search engine of record. Too big to blunder with technology that endangers the innocent. I imagine that with 8000 employees, at least several hundred of them are smart coders who have been beta testing this service for several weeks or months. It seems vanishingly unlikely that none of them noticed these funny holes, considering that they were the first things my friends and I observed.
So what should they do? Stand in the way of progress to protect the innocent? I’m sure problems like this will become less onerous as time goes on and people become more sophisticated about protecting their own privacy, but until that happens, Google has special responsibilities. When they substantially advance the state of information retrieval on a worldwide basis, they should think about how they can soften the blows that come with those advances.
It’s hard to say what Google should have done, but even a well-publicized warning might have helped. For those who have been compromised, I imagine their opinion of Google would be a lot higher if the buzz last week had been about the forthcoming advancement and what it meant for everybody’s privacy.