Deep Web Search

About The Deep Web

There is a ton of information out there. Almost any simple Google search will return millions of results. It may be hard to imagine that there is even MORE information that isn't SEARCHABLE through traditional means.

Content that standard search engines neglect includes:

  • Disconnected pages
  • Pages consisting primarily of images, audio, or video
  • Flash, Shockwave, compressed files
  • Content retrieved as a result of filling out forms
  • Real-time information (e.g., stock quotes)
  • Pages that are proprietary

The Deep Web is defined as the part of the Web that is not indexed by standard search engines like Google (which only search the surface). This content exists but cannot be found through them because it falls into one of the categories above, which makes it very hard to search. However, Deep Web Search Engines find ways to search for such content.
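To make the idea of "only searching the surface" concrete, here is a minimal sketch in Python. It assumes, as a simplification, that a standard search engine discovers pages only by following links from pages it already knows. The page names and the `crawl` function are hypothetical and made up for illustration; real crawlers are far more complex.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "search_form"],
    "news": ["home"],
    "search_form": [],          # its results only exist after a form is filled out
    "form_results": [],         # generated by the form, so no page links to it
    "orphan_report": ["home"],  # disconnected page: nothing links to it
}

def crawl(start, links):
    """Return the set of pages reachable from `start` by following links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

surface = crawl("home", LINKS)   # what a link-following crawler can reach
deep = set(LINKS) - surface      # pages that exist but are never reached

print("Surface:", sorted(surface))
print("Deep:", sorted(deep))
```

In this toy example the disconnected page and the form results end up in the "deep" set even though they exist, which matches two of the categories listed above.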

Good Deep Web Search Engine Traits

- Comprehensive: Reaches out to a lot of journals/websites you want it to search
- Integrated: Offers different forms of search, such as by title or author, integrated across the databases it covers
- Transparent: You know which journal and/or sites it accesses

Usage in this Project

In this project, I mostly wanted to find information such as current events, so not much of it was "invisible" to me. However, for the background portion of my project, I used Google Scholar and, once in a while, BNet.

Name: Google Scholar
Access: Google Scholar
Frequency: Only when I need scholarly research
Information: Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research. Google Scholar aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.
Query:

  • US Coffee industry: Yields 240,000 results. Only 1 or 2 from the first three pages seemed somewhat useful from the title. However, I could not fully read 2 of the 3 because they did not have full text.
  • Starbucks Coffee: Yields 17,300 results. This yielded much more useful information than the general search above. For instance, I was able to read the brand report card for Starbucks, something that was in the Harvard Business Review, a source I would otherwise not have had access to.

Evaluation: (5/10) It was difficult for me to find results that I could access fully. Some were just abstracts; others did not give any text at all. For instance, this book would be very interesting, but I can only read an abstract of it. However, given the right search, it did yield very good scholarly/academic articles like case studies and reports.
Miscellaneous: Google Books is also a good source for information.

Some Other Deep Web Search Engines

  • Google Books - A book-based search site that may raise ethical issues around copyright violation
  • UM Library - University of Michigan library search tool
  • Scirus
  • Biznar - A specialized deep web search tool that focuses on business
  • BNet - A specialized deep web search tool that focuses on management articles
  • IncyWincy
  • DeepDyve
Go here for an index of all the tools I used.