Class 4

Why do search engines return different results? *

Different search targets: each engine has different kinds of files in its database
Different search syntax and different handling of search terms
Different back-end processing of files (each engine can index different document types)

++Class Four: Search techniques

These are most of the search techniques that we'll cover in today's class.

Special search syntax — This is the tool at your disposal that lets you target your searches on specific parts of documents. Since text in different parts of a document means different things and performs different functions, you can use these operators to raise the precision of your queries.
Full text search engines
Title — intitle:
Site — site:
Top-level domain — site:
URL contents — inurl:
Links — link:
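Flintstones site:www.lib.umich.edu

Flintstones site:umich.edu

Flintstones site:edu
The site: value can only be shortened by dropping parts from the left (www.lib.umich.edu, then umich.edu, then edu); site:umich on its own will not work

But inurl:umich does work

inurl: matches a piece of the URL wherever it appears; it looks at the whole address

site:umich.edu, by contrast, matches umich.edu only in a specific place: the end of the host name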
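The difference between site: and inurl: can be sketched in a few lines of Python. This is a simplified model, not how any search engine actually implements these operators: it assumes site: matches whole trailing parts of the host name and inurl: matches a substring anywhere in the address. The function names and the example URL path are made up for illustration.

```python
from urllib.parse import urlsplit

def matches_site(url, site):
    """Simplified site: test. The host must equal `site` or end with '.' + site."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)

def matches_inurl(url, term):
    """Simplified inurl: test. The term may appear anywhere in the address."""
    return term.lower() in url.lower()

# Hypothetical page address used only for this illustration.
url = "http://www.lib.umich.edu/flintstones/index.html"

for site in ["www.lib.umich.edu", "umich.edu", "edu", "umich", "www.lib"]:
    print("site:" + site, matches_site(url, site))

print("inurl:umich", matches_inurl(url, "umich"))
```

In this model the first three site: values match and the last two do not, while inurl:umich matches because the term only has to occur somewhere in the URL.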

Unique words and phrases — Using multiple unique words and phrases is key both to reducing the number of documents that are retrieved and to raising the precision of your queries. Further, using multiple words and phrases increases the chances of retrieving content-filled documents (that is, increasing the number of “meaty” documents). [Don’t think about how you would describe the document; think about what words its author would put in it.]
They can be used to focus in on more specialized pages that would use those terms
Gather related words using summaries
Use search engines to find related words
Example at Ask.com (both “Narrow your search” and “Expand your search”)
Google
Google Suggest feature
“Related searches” at bottom of search results window
Yahoo
Yahoo Search Assist feature
“Also try” at top or bottom of search results window
Yahoo Directory (we'll cover this in a future class) can point in the right direction
Use "means" queries
Query specificity
Narrow to more general: this is when you have a very good idea of what you're looking for.
More general to narrow: this is when you don't know what you're looking for.
Alternative naming
People
Using different name forms can return different information
Sometimes you have to use other information to differentiate two identically named people
Also, search specifiers can help target the information (intitle, site type, include, exclude)
Places
Use addresses (streets, zips, area codes, phone numbers)
Use "official"
Sites

These are the best summaries of the major general search engines that I could come up with. I have also linked to several useful help pages for each site.

Google
The best, most reliable, fastest, most wide-ranging general purpose search engine. Nice features: Showable "Options" on the left with lots of choices (especially time-related and Related Searches switch). When you're serious about searching, you have to make at least one stop here.
Useful pages
Google Advanced Search
Advanced search tips
Search features
Yahoo
Historically, the second best search engine in terms of returning relevant results. Nice feature: the hideable "Search Assist" box at the top that also shows Related Searches.
Useful pages
Yahoo Advanced Search
Yahoo Search Help
Different parts of the results page
Ask
A great search engine for exploring a topic. Nice features: the "Related searches" on the right, the binoculars hiding the page preview and page statistics; also larger images appear on mouse-over. Notice there are sponsored results at the top and bottom of the page.
Useful pages
Advanced search tips
Site features: 1/2
Bing
A search engine that focuses on the user experience during the search. Nice features: "More on this page" and "Popular Links" in the pop-up bar on the right; "Related Searches" immediately available on left.
Useful pages
Help center
Tour of Bing's features (video)
Useful settings

Each of these search engines provides a way to set up an account and, thereby, set up preferences. I generally use the following preferences:

30-50 results per page — I like the ability to scan more information more quickly
Filtering (moderate on Google) — don't want this stuff popping up in the middle of class or a group meeting
Open search results in new browser window — this keeps the search results up and available so that they're not so easily lost or closed
Turn on search suggestions — I find these to be amazingly useful as I structure queries.
In-class examples

For most of the following I will (by default) use Google as the search engine as a demonstration of the search technique. For the most part, each of these search engines (other than Bing) could have been used.

Special search syntax example: Information about tigers

tigers (31.9mm)
tigers -"Detroit Tigers" (29.0mm)
tigers animal (4.61mm)
animal intitle:tigers (1.45mm)
Tigers (the animal but not any sports teams):
Google: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -"Louisiana State" (14.4mm)
Bing: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -"Louisiana State" (5.21mm)
Yahoo: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -"Louisiana State" (49.7mm)
Ask: tigers -detroit -memphis -missouri -baseball -lsu -football -athletics -sports -mlb -soccer -Louisiana -State (4.25mm)
What's wrong with this page?
Which is the best? You don’t care about all the millions of search results; you only care about the first couple of pages (maybe).
Information from an organization
animal intitle:tigers site:org (25.6k)
Information from an organization or a government
animal intitle:tigers site:gov OR site:org (25.8k) (OR has to be in caps)
Information from a zoo
animal intitle:tigers inurl:zoo site:org (28)
animal intitle:tigers intitle:zoo (3.9k)
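The queries in this example are all assembled from the same few pieces: plain terms, quoted phrases, excluded terms, intitle: terms and site: restrictions. The following Python sketch puts those pieces together. The helper names (term, build_query, search_url) are hypothetical, and the search URL pattern is an assumption about how Google accepts a query in its q parameter rather than something taken from these notes.

```python
from urllib.parse import quote_plus

def term(text):
    """Wrap multi-word text in double quotes so it is searched as a phrase."""
    return '"' + text + '"' if " " in text else text

def build_query(include, exclude=(), intitle=(), sites=()):
    """Assemble a query string from the pieces used in the examples above."""
    parts = [term(t) for t in include]
    parts += ["intitle:" + term(t) for t in intitle]
    parts += ["-" + term(t) for t in exclude]
    if sites:
        # OR must be written in capitals.
        parts.append(" OR ".join("site:" + s for s in sites))
    return " ".join(parts)

def search_url(query):
    """Assumed URL pattern for submitting a query to Google."""
    return "https://www.google.com/search?q=" + quote_plus(query)

no_sports = build_query(
    include=["tigers"],
    exclude=["detroit", "memphis", "missouri", "baseball", "lsu", "football",
             "athletics", "sports", "mlb", "soccer", "Louisiana State"],
)
org_or_gov = build_query(include=["animal"], intitle=["tigers"],
                         sites=["gov", "org"])

print(no_sports)
print(org_or_gov)
print(search_url(org_or_gov))
```

The two printed queries reproduce the sports-free tigers query and the "organization or government" query from the list above.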
Unique words and phrases

Bunch of birds example
"flock of seagulls" "gaggle of geese" sparrows turkeys
Lesson: put what you know in the search
Use "means" and "definition" queries: Hydrocephalus
Ask — hydrocephalus (300k) — look at "Related searches"
Yahoo directory — hydrocephalus
A document shows up here because someone, somewhere, filed it under hydrocephalus in the directory of sites (sort of like the tagging feature of RSS…)
You are not searching the documents themselves but searching categories; this is not a full-text site
You are searching the descriptions and categorizations of documents
Google — hydrocephalus — 2.0 million documents (2.34 in 2008; 2.26 in 2007); note the "Refine results" part of the page. Also note the “definition” link near the top of the page.
Google — hydrocephalus means — 1.15mm documents (385k in 2008; 789k in 2007)
Google — 'hydrocephalus means' — 3280 documents (844 in 2008; 415 in 2007)
Looks for documents with the exact words
The numbers may change because Google estimates counts by statistical sampling
Or it may know something about the user and so count differently
Google — intitle:hydrocephalus (intitle:means OR intitle:definition) — 1460 documents (470 in 2008; 200 in 2007)
Google — 'hydrocephalus means' (site:edu OR site:org OR site:gov) — 1020 documents (44 in 2008; 131 in 2007).
Google — define hydrocephalus (359k documents)
define is a specific type of search query
Related words: Investment guidance
investment guidance — 4.05mm (487k in 2008; 4.48mm in 2007)
'investment guidance' — 44.1k (82.8k in 2008; 71.7k in 2007)
investment guidance financial goals stocks bonds portfolio — 600k (235k in 2008; 1.62mm in 2007)
'investment guidance' financial goals stocks bonds portfolio — 872 documents (13.1k in 2008; 10.9k in 2007)
Fun with quotes
'statistical analysis' means — 10.4mm documents (26mm in 2008; 21.5mm in 2007)
'statistical analysis' mean — 6.56mm documents
'statistical analysis' 'means' — 5.55mm documents (4.73mm in 2008; 7.04mm in 2007)
'statistical analysis' 'mean' — 6.57mm documents
Google searches for means, mean, meaning etc. if you just type means or mean
define:"statistical analysis"
Lyrics
Google — 'big rock stars' nickelback lyrics 'we all just' 'drugs come cheap' — 34 results (6 results in 2007, and they were all good)
Google — rockstar nickelback intitle:official video
Query specificity

Dog breed information
Google — dog breed cavalier king charles spaniel — 220k documents (355k in 2008; 888k in 2007)
Google — dog breed 'cavalier king charles spaniel' — 195k documents (890k in 2008; 535k in 2007)
Google — dog breed intitle:'cavalier king charles spaniel' — 40.1k documents (26.2k in 2008; 15.4k in 2007)
Yahoo Directory — dog breed 'cavalier king charles spaniel' — 67 documents (69 documents in 2008 and 2007)
Dog breed disease information
Google — 'cavalier king charles spaniel' 'heart problem' OR 'heart murmur' OR 'mitral valve' — 4.54k documents (7,710 in 2008; 22,900 in 2007)
Google — intitle:'cavalier king charles spaniel' 'heart problem' OR 'heart murmur' OR 'mitral valve' — 2.9k documents (250 in 2008)
Yahoo — dog breed 'cavalier king charles spaniel' 'heart problem' — no documents in the directory
Alternative naming

People

George Washington information
'George Washington' biography -site:com -'Carver' — 941k documents (1.22mm in 2008; 1.06mm in 2007).
intitle:'George Washington' biography -site:com -'Carver' — 293k documents (218k in 2008; 240k in 2007)
"George Washington": — one whole category on George Washington, plus 84 other related categories
Stephen Hawking (as a name example)
Stephen Hawking — 1.93mm documents (3.61mm in 2008; 2.27mm in 2007)
'Stephen Hawking' — 1.86mm documents (3.86mm in 2008; 2.12mm in 2007)
Note that the 2008 results make no sense when compared with the previous result. At least not given my understanding of how Google should operate.
intitle:'Stephen Hawking' — 73.4k documents (61.3k in 2008; 63.1k in 2007)
intitle:"Stephen * Hawking" — 3.1mm documents (9,310 in 2008; 9,190 in 2007)

The * acts as a placeholder.

If we OR the two queries, we should get 3.1mm + 73.4k minus the overlap, and so no fewer than 3.1mm documents.

intitle:"Stephen * Hawking" OR intitle:"Stephen Hawking" — 628k documents (62,900 in 2008; 75,200 in 2007)
Note that the 2009 results really make no sense. Look at the number of results for the previous two queries.
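The reasoning behind that complaint is simple set arithmetic: the documents matching A OR B number at least as many as the larger of the two individual result sets, and at most their sum. The sketch below checks the reported counts against those bounds. The function names are hypothetical, and since the reported counts are only the engine's estimates, this is a sanity check rather than proof of a bug.

```python
def union_bounds(count_a, count_b):
    """Return the (lowest, highest) possible size of A OR B."""
    # Lowest: one result set lies entirely inside the other.
    # Highest: the two result sets do not overlap at all.
    return max(count_a, count_b), count_a + count_b

def is_consistent(count_a, count_b, reported_or):
    """True if the reported OR count falls within the possible range."""
    low, high = union_bounds(count_a, count_b)
    return low <= reported_or <= high

# 2009 counts: intitle:"Stephen * Hawking", intitle:"Stephen Hawking", OR query
print(union_bounds(3_100_000, 73_400))
print(is_consistent(3_100_000, 73_400, 628_000))

# 2008 counts for the same three queries
print(is_consistent(9_310, 61_300, 62_900))
```

With the 2009 numbers the OR query should return somewhere between 3.1mm and about 3.17mm documents, so the reported 628k is too low. The 2008 numbers (62,900 for the OR query) do fall inside their range.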
"Hawking, Stephen" — 270k documents (535k in 2008; 241k in 2007) — library and books, mostly
Does this get everything? Including Stephen W. …
"Hawking, Stephen W." — 47.6k documents (72.1k in 2008; 53.2k in 2007) — again, library and books, mostly.
"Hawking, Stephen William" — 75.2k documents (20,400 in 2008; 13,900 in 2007) — lots of encyclopedia type entries.
Levi Strauss (since there are two/three of them)
"Levi Strauss" — 1.77mm documents (3.97mm in 2008; 2.24mm in 2007)
"Levi Strauss" -french -france -philosopher — 1.19mm documents (2.21mm in 2008; 2.06mm in 2007)
intitle:"Levi Strauss" — 56.6k documents (78,100 in 2008; 68,200 in 2007)
intitle:"Levi Strauss" -french -france -philosopher — 46.2k documents (66,300 in 2008; 53,700 in 2007)
intitle:"Levi Strauss" (french OR france OR philosopher) — 9,420 documents (9,680 in 2008; 15,900 in 2007)
intitle:"Levi Strauss" claude (french OR france OR philosopher) — 883 documents (1,160 in 2008; 556 in 2007)
intitle:"Levi Strauss" bavaria germany — 310 documents (241 in 2008; 48 in 2007)
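The People examples above try the same person under several name forms (first name first, last name first, with a middle initial, with the full middle name). A small Python sketch can list those forms so that none is forgotten. The functions name_forms and or_query are hypothetical helpers written for this illustration, and the set of forms is only the one used in these notes, not an exhaustive list.

```python
def name_forms(first, last, middle=""):
    """List the name forms tried in the examples above."""
    forms = [first + " " + last, last + ", " + first]
    if middle:
        initial = middle[0] + "."
        forms += [
            first + " " + initial + " " + last,
            last + ", " + first + " " + initial,
            first + " " + middle + " " + last,
            last + ", " + first + " " + middle,
        ]
    return forms

def or_query(forms):
    """Join the forms as quoted phrases separated by OR (in capitals)."""
    return " OR ".join('"' + form + '"' for form in forms)

forms = name_forms("Stephen", "Hawking", "William")
for form in forms:
    print(form)

print(or_query(forms[:2]))
```

Each form can be run as its own quoted query, as in the examples, or several can be combined with OR and then narrowed with exclusions and intitle: when two people share a name.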
Places

Pizza places in Ann Arbor
pizza "ann arbor" — 763k documents (1.19mm in 2008; 887k in 2007).
Look at all of the information this query has available at the top of the results page.
pizza "ann arbor" william — 102k documents (547k in 2008; 629k in 2007)
(734) 669-6973
pizza 734 "ann arbor" — 109k documents
The Sears Tower (as a landmark)
"sears tower" — 1.12mm documents (1.44mm in 2008; 1.49mm in 2007)
"sears tower" official — 1.12mm documents (1.11mm in 2008; 257k in 2007)
intitle:"sears tower" — 170k documents (28,800 in 2008; 19k in 2007)
intitle:"sears tower" official — 37.9k documents (28.7k in 2008; 1,440 in 2007)
intitle:"willis tower" official — 15.7k documents
At end of lecture

Start working on today's exercises. The exercises are on this page. You should work on them for no more (but, probably, no less) than another hour outside of class; we will have more time in the next class after the lecture to continue working on them before going on to that day's exercises.
04 Search Techniques And Strategies Exercises

Exercises

Start up

Go to the Google home page.
Sign in to iGoogle. (If you don't have an iGoogle account, then create one.)
Change the preferences (the “preferences” button next to the search box) so that it returns 50 documents instead of 10.
Special search syntax

Don't forget about the quick-and-dirty search syntax page. You will also want to refer to the more complete GoogleGuide quick reference and Yahoo Cheat Sheet.

Suppose you are interested in steel futures. Let's see how these different syntaxes affect the results that you get.

“Steel futures” example

Search for [steel futures] at Google.
What documents does this return?
How many documents does this query return? Write this number down.
Scan down the list of documents. Note the titles of the documents. Do these all seem to be about steel futures?
Search for ["steel futures"] at Google.
What documents does this return?
How many documents does this query return? Write this number down. Is this a useful change? What are the benefits and drawbacks of this change?
Again, scan down the list of documents. Note the titles of the documents. Do these all seem to be about steel futures?
Make sure that both the word steel and the word futures are in title.
How many documents does this query return? Write this number down. Is this a useful change? What are the benefits and drawbacks of this change?
Scan down the list of documents. Note the titles of the documents. Do these all seem to be about steel futures?
Make sure that the phrase steel futures is in the title.
How many documents does this query return? Write this number down. Is this a useful change? What are the benefits and drawbacks of this change?
Scan down the list of documents. Note the titles of the documents. Do these all seem to be about steel futures?
Further refine the above query by ensuring that the document comes from a US-based company (this means that the document comes from a host that ends in .com).
Now make sure that the document comes from something other than a US-based company.
Make sure that, in addition to the document having the phrase steel futures in it, the URL has the word library in it.
You are not interested in the results about the story related to the London Metal Exchange. Define a query that returns documents with the phrase steel futures but not these other stories.
Notice that the documents sometimes refer to the London Metals Exchange. We don't want these stories either. Refine the query appropriately.
Reflect back on where you started above with just a simple query and where you ended up. What does this tell you about the usefulness of thinking about the contents of your query?

Other queries

Suppose you figured out that you don't even know the meaning of futures. Enter a query that returns this definition.
Looking through the results, you are interested in The Steel Index. You wonder who else might be interested enough in this site to link to it on their site. Define a query that finds these web pages.
Suppose that you want to know about the Indian automobile industry. What would you do? Define your query and run it.
Here is what I did to return 2,520 documents (2,860 in 2008).
What is the strength of your query compared with mine? What are its weaknesses?
The home page of the Ross School of Business BBA Program is www.bus.umich.edu/bba. Suppose you want to figure out what sites other than Ross sites link to this page. What is the query you would submit to Google?
Unique words & phrases

Search for information about behavioral finance.
Start with just those two words
Join them within quotes
Now look through the summaries of the first 10-30 documents that are returned. See what concepts, terms, words you might add to the query in order to make the documents that are returned "meatier". Go ahead and add them and look again at the results.
Are there pages that are descriptions of books mixed in with your results? Get rid of them. There are probably a couple of ways that you might try.
Get just the URLs that are housed at educational institutions.
Find the lyrics to "Louie, Louie" by the Kingsmen. The only words that I am relatively certain of are "me gotta go". Other than that, good luck.
Look for lyrics for a song of your own choosing.
Query specificity

Think of some topic that you would like to know more about, and use Google, Yahoo, and Ask to find out more about it. Use Google Suggest, Yahoo Search Assist, and other related tools to explore the topic. Continually build the query's complexity until it returns the information you need with a relatively high precision.
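One way to make "relatively high precision" concrete while you refine a query is to judge the first page or two of results yourself and compute the fraction that were useful. The sketch below assumes you record one True/False judgment per result, in rank order. The function name and the sample judgments are made up for illustration.

```python
def precision_at_k(judgments, k):
    """Fraction of the first k results judged relevant.

    If fewer than k results were returned, the fraction is taken
    over the results that exist.
    """
    top = judgments[:k]
    if not top:
        return 0.0
    return sum(1 for relevant in top if relevant) / len(top)

# Hypothetical judgments for the first ten results of one query.
judgments = [True, True, False, True, False,
             False, True, False, False, False]

print(precision_at_k(judgments, 5))
print(precision_at_k(judgments, 10))
```

Comparing this number before and after each change to the query shows whether the added terms or operators are helping, without having to look at the total result count.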
Alternative naming

People

You want to find information on Thomas Alva Edison, but not Thomas Edison University or Thomas Edison College.

Create a query in Google that returns a reasonable number of high-quality documents. Be complete with the name forms that you try.
Find information on Yahoo as well.
Find encyclopedic entries.
Build a comprehensive query that uses all of the name forms and returns all types of information but which maintains relatively high precision.
Places

Find information on Wrigley Field in Chicago. Define the query so that it returns a reasonable number of high quality documents.

Your project

Start looking for information on possible term project topics. Put your new-found skills to the test.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License