Monday, May 15, 2006

"Scan This Book!"

I just finished Kevin Kelly's New York Times article, "Scan This Book!" What an interesting read! I recommend it to all librarians, and to anyone interested in learning more about Google's Book Search project.

He looks at Google's Book Search on a very broad level, claiming that scanning books will one day result in "one very, very, very large single text: the world's only book". By this, he means that scanning makes it possible to link the text of one book to other books. Bibliographies in nonfiction books will contain links to all the cited books. Every word in a book could be linked to another book (although I'm curious what's supposed to happen when people want to link the same word or phrase to multiple sources), and once everything is linked together, you essentially end up with only one book.

However, his paragraph more or less praising Raj Reddy of Carnegie Mellon for shipping 30,000 books to China to scan bothered me. In China, the books are "scanned by an assembly line of workers paid by the Chinese," which means that they are making next to no money, toiling away 12+ hours a day, 7 days a week (I have been to China, and I can now say that with firsthand experience! We ate at the same restaurant many times over several days, and it was always the exact same workers. Just a coincidence? I don't think so). They are the poor who need access to all that literature, but they work too many hours to have time to read it. Someone should pay those workers more. Maybe I'm wrong, and those workers do make pretty good money, but somehow I really doubt it.

Another thing I take issue with in this article is the following: "Because tags are user-generated, when they move to the realm of books, they will be assigned faster, range wider and serve better than out-of-date schemes like the Dewey Decimal System". He's right on two counts: the tags will be assigned faster and range wider. However, I don't believe they will serve better. The subject headings assigned in libraries address the main concepts of the books. Tags assigned by random users could easily deal with only a tiny issue covered on one page of a book (I'm exaggerating slightly, but you get the idea). Plus, the subject headings assigned by librarians are set up so that there is a limited number of them - with good reason! That way, when looking for a bunch of books or articles on domestic violence, you only have to type in "domestic violence", not "domestic violence or family violence or violence against women or abusive relationships or domestic abuse" or... the list goes on and on. In libraries, it's all under one heading. How convenient!
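To make the "one heading" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the mapping of phrases to a heading is not taken from any real subject heading list, and the catalog entries are placeholders.

```python
# Hypothetical controlled vocabulary: several variant phrases all point
# to one authorized heading. Not taken from any real subject heading list.
AUTHORIZED = {
    "domestic violence": "Domestic violence",
    "family violence": "Domestic violence",
    "domestic abuse": "Domestic violence",
    "abusive relationships": "Domestic violence",
}

# Placeholder catalog: each title carries the headings a cataloger assigned.
CATALOG = {
    "Book A": {"Domestic violence"},
    "Book B": {"Domestic violence", "Law"},
    "Book C": {"Gardening"},
}

def heading_for(term):
    """Return the authorized heading for a phrase, or None if it is unknown."""
    return AUTHORIZED.get(term.strip().lower())

def search(catalog, term):
    """Return the titles filed under the heading that the phrase maps to."""
    heading = heading_for(term)
    if heading is None:
        return []
    return sorted(title for title, headings in catalog.items()
                  if heading in headings)

# Any of the variant phrases retrieves the same set of titles.
print(search(CATALOG, "family violence"))
print(search(CATALOG, "Domestic Abuse"))
```

With free-form tags there is no such mapping, so each variant phrase would have to be searched separately.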

Of course, the massive amount of information online now, and the even more massive amount that will be online soon if Google succeeds in its Book Search project, probably makes it unimportant whether or not you locate all the resources on one topic. I personally think that is another problem with the whole idea. The author of the article feels it's a great thing to have all the world's knowledge searchable from the same little search box (Google's). But is that really best? Two words: information overload. For some searches, you'd have to get pretty specific to find what you want because there are just too many books, websites, journals, etc., that contain that word or phrase. But perhaps limiters, such as the option to search only the books, will help somewhat to solve that problem.

One statement made by the author is completely false: "Google scans all Web pages; if it's on the Web, it's scanned." Not true. Google indexes maybe 20 percent of the web on a good day. It can't scan all the pages because it can't get to all of them. Think about all the password-protected sites, pages that no one links to (Google's spiders/crawlers find new pages through links from other pages, so they have no way of finding these), pages within databases that require a login or subscription fee, and so on. Those are all on the web, but Google can't scan them. This part of the web is called the Invisible Web or the Deep Web.
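For anyone curious why an unlinked page stays invisible, here is a rough sketch in Python of link-following discovery. It is a toy model only, not how Google's crawler actually works: the page names are invented, and the "web" is just a dictionary that maps each page to the pages it links to.

```python
from collections import deque

def crawl(links, seeds):
    """Return every page reachable by following links from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical toy web: each page maps to the pages it links to.
toy_web = {
    "home": ["news", "about"],
    "news": ["home", "archive"],
    "about": [],
    "archive": [],
    "orphan": ["home"],  # links out, but no other page links to it
}

found = crawl(toy_web, ["home"])
invisible = set(toy_web) - found

print(sorted(found))
print(sorted(invisible))
```

The "orphan" page is on the toy web, but because the crawler only learns about pages from links on pages it has already visited, it never gets there.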

However, the author makes some excellent arguments for and against copyright as it is set up today. I agree with him that our current copyright laws (terms were extended by 20 years in 1998, and the Supreme Court upheld that extension in 2003) are over-the-top. Does an author's work really need to be under copyright for 70 years after he or she has died? I don't think so. His idea that copyright should not be given unless people agree to let their works be searched is interesting but probably won't go over too well.

Overall, it truly is an interesting article. I know I seem to disagree with far more than I agree with, but actually that's not the case. He paints a very neat picture of the future. This is one article that is definitely worth a read, so check it out. :)
