~ Essays ~
(Courtesy of fravia's searchlore.org)



(`.(`. Google has a wild side and you didn't know it! .).)
(and other searching tricks)

by Shally Steckerl


published @ searchlores in June 2002

Old friend, I have been busy. I've written several new items which may contribute to our community. Please feel free to post them on your site as you see fit...

Google has a wild side and you didn't know it!
Search Engines that do More, Part One: Vivisimo
Wisenut vagaries
Search Engines that Do More, Part Two: Teoma
The NEAR Command
Searching With Fuzzy Logic: The ADJ Command


Google has a wild side
Usually search engines allow you to use an asterisk for wildcard
searching. This means that looking for manage* will find instances
of manager, manages, managed, managing, and so on. Google has
taken the position that their search is so vast and accurate that
there is no need for a wildcard, so they do not allow its use.

Well, we discovered an undocumented way to use so-called wildcard
searching with Google. As with many other search engine concepts,
Google again has broken the rules. In Google terms the * happens
to be a wildcard that replaces an entire word, not just the last
part of it like in the above example.

If you use * connected to a keyword by any of the characters
Google ignores, like = , ; \ / < and >, then it acts as a
placeholder or wildcard for "any word" like this:

my resume

gets over 2MM results, but...

"my resume"

gets 353k results and is the exact same as my=resume, my/resume,
myresume and so on.

The most useful aspect of this discovery is that if you use one
more connector and asterisk then it returns results with two words
between my and resume like this:

my-*-*-resume

which returns only 28k results with two words separating my from
resume vs. the previous example of 47k results with only one word
separating our two keywords.

Try this search by using different keywords used to name resumes
(vitae, CV, skills, experience), or combinations of unique skills,
or city and state duets when the city name is found in many
states, like Rochester, which is found in 27 states.
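
For anyone who wants to generate these queries in bulk, here is a
minimal Python sketch. It only builds the query string and does not
contact Google. The function name wildcard_query and its parameters
are hypothetical, chosen for illustration, and the sketch assumes the
connector behavior described above.

```python
def wildcard_query(first, last, gap_words=1, connector="-"):
    """Build a query with `gap_words` whole-word wildcards between two keywords.

    Assumption: the search engine ignores `connector` and treats each
    asterisk as a stand-in for exactly one word, as the article describes.
    """
    if gap_words < 0:
        raise ValueError("gap_words must be zero or more")
    parts = [first] + ["*"] * gap_words + [last]
    return connector.join(parts)


print(wildcard_query("my", "resume", 2))  # my-*-*-resume
```

Swapping in other keyword pairs (vitae, CV, a city and a state) gives
a quick list of queries to paste into the search box one at a time.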

   
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Search Engines that do More, Part One:
Vivisimo: Clustering Search Engine

Searching isn't always an absolute matter. Sometimes there is a
need for search engines that do more than just bring back surgical
results. There are four examples of new breed engines that provide
additional benefits for recruiters besides long pages of links.
Today we will cover the first of four examples.

Although it may look like a meta-search, Vivisimo goes miles above
and beyond simply getting content from other search services. What
they do so differently is automatically cluster results into
topics. Similar to the former (not so former, hehe) Northern Light
and InFind, Vivisimo is more advanced because it's totally automated
and extremely fast.
It dynamically returns search results in relevant topic clusters.
This differentiates Vivisimo from meta-searching because you can
drill down through layers of categorization organized far more
intelligently. I find a major selling point of this new service is
that Vivisimo does not report results from "pay for placement"
search engines like other meta-searches do, hence you will find
fewer commercial site results.

Clustering is indispensable when you want a complete overview of a
topic or when you would like help in narrowing your search. This
is the only automated, hierarchical, conceptual, just-in-time
clustering engine available today. Because it is automated, not
manual, the categories are created on the fly, and they are much
narrower and particularly accurate. That's good for
Competitive Intelligence and Recruitment Research but read on to
learn about other advantages.

Vivisimo removes "most likely" duplicates. Other meta-searches
attempt to remove duplicates, but they often slip into the
results because they are not exact duplicates. They could be a
newer version or for some reason have slightly different content.
Vivisimo broadens the definition of a duplicate to cleverly remove
results that would otherwise slip by meta-search scrutiny.
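
To make the idea of a "most likely" duplicate concrete, here is a
small Python sketch. Vivisimo's actual method is not described here,
so this swaps in a common textbook approach: compare the sets of
two-word sequences in each result and drop any result that overlaps
too heavily with one already kept. The function names and the 0.5
threshold are assumptions made for illustration.

```python
def shingles(text, size=2):
    """Return the set of `size`-word sequences found in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Overlap between two sets: 0.0 means disjoint, 1.0 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(docs, threshold=0.5):
    """Keep the first copy of each result; drop later near-copies."""
    kept = []
    kept_shingles = []
    for doc in docs:
        current = shingles(doc)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept


results = [
    "senior optical network engineer resume updated may 2002",
    "senior optical network engineer resume updated june 2002",
    "licensed practical nurse resume",
]
print(drop_near_duplicates(results))
```

The second result differs from the first by a single word, so an
exact-match filter would keep both, while this looser test keeps
only the first and the unrelated nurse resume.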

Another reason to take a serious look at Vivisimo is that it
offers total control. Best results are obtained when searching
with total control. Traditional meta-searches frequently fail to
meet our expectations because they don't offer the granular
control afforded by advanced field search commands like image:,
title:, url:, link:, host:, site:, domain:, related:, and text:.
In addition to every form of Boolean like "AND," "OR," "AND NOT"
and even "NEAR" Vivisimo also handles all those field search
commands.

If I haven't turned you on to Vivisimo yet then this will clinch
the deal: you can Save and Email your search results! Imagine how
useful that is. In the past saving and emailing was accomplished
only with heavy artillery. Click on the SAVE link in the yellow
frame at the bottom right corner of the Vivisimo screen and a new
page is loaded that contains all the data in one file. Save to
disk the entire page, not just the link, or email directly from
your browser. Netscape 4 does not save the page well, so use
another browser to save it, but you can use I.E. 5 or higher and
Netscape 4 or higher, to view the results.
  

~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Wisenut Vagaries

This recently launched search engine has grown quickly. It is easy
to use with a simple interface and powerful features. Wisenut uses
a relevance engine similar to Google's PageRank, looking at both
link structure and popularity. That is not the end
of the comparison, but I will state that in my opinion Google is
not at risk of losing their number one spot to Wisenut any time
soon. Wisenut offers some things to help you refine your search
along with some features which make it an excellent additional
search engine. Primarily, it automatically categorizes your
results into "WiseGuides" that are related to words in your query.
Each WiseGuide displays the number of results it contains. When
you run a search, open up a WiseGuide category by clicking on a
white text link in black background above your search results.
Clicking the plus icon next to the category opens the search
results for that category and reveals any additional
subcategories. Each category has a link to its right called
"Search This" allowing for an easy new search using itself as the
new query.

Like many other search engines Wisenut compresses results from
individual sites; the difference is they created a very convenient
"See X more pages from this site!" format. Wisenut's compression
is unique in that instead of the plain old "more results" from
this site link, Wisenut lists the exact number of pages on a site
that it has determined are relevant to your query. The niftiest
innovation on their results pages is the ability to Sneak-a-Peek
which opens the target page into a small window below the result
URL. These peeks may be cached pages and eliminate some mouse
movements thus saving time.
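
The site compression described above can be sketched in a few lines
of Python. This is only an illustration of the idea, not Wisenut's
code: it groups a list of result URLs by host, shows the first page
from each site, and reports how many more pages from that site were
folded away. The function name compress_by_site is hypothetical.

```python
from urllib.parse import urlparse


def compress_by_site(urls):
    """Show one result per site plus a count of the remaining pages."""
    groups = {}
    for url in urls:
        host = urlparse(url).netloc.lower()
        groups.setdefault(host, []).append(url)

    lines = []
    for pages in groups.values():
        lines.append(pages[0])
        extra = len(pages) - 1
        if extra:
            lines.append(f"  See {extra} more pages from this site!")
    return lines


hits = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/x",
    "http://a.example/3",
]
for line in compress_by_site(hits):
    print(line)
```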

One of the main reasons this new search engine is very useful for
recruiters is its size and freshness. Although it's not as big as
Google, Wisenut's index is growing quickly. Because Wisenut's robot
can allegedly read 100 million URLs a day, it's likely that it will
be able to give you fresh results even from pages only recently
added to the Internet.

Thanks to www.searchenginewatch.com and www.researchbuzz.com for
inspiration in this series. Also visit www.jobmachine.net to see
the Search Engine Rankings for Recruiters and the brand new
Spyglass.
   

~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Search Engines that Do More. 
Part Two: Teoma 

Size is important when judging a search engine database, but
without good relevance ranking, a large index is practically
useless for recruiters. Teoma joins the game with a new way to
analyze and rank pages.

Searching Teoma's index brings back pages that contain the search
terms, like any other search engine, but also pages that may be
very relevant based on link context. Matching documents are
organized into what are called communities, then ranked based on
link popularity within the community.

Fine, but what relevance does that have to recruiting?

Looking at the Resources section we find that search results
include a list of collections from "Experts and Enthusiasts" on
the bottom right side of the page. These "Resources" are pages
listing relevant collections of links to other sites and resources
for your search topics. Or, in other words, Weblogs with subjects
related to the search keywords.

Weblogs are created and maintained by people who consider
themselves experts, or at least connoisseurs, on a particular
subject. They spend vast amounts of time organizing what links
they consider relevant to their chosen field. These pages not only
make great resources for our recruitment efforts, but their
authors are excellent contributions to our network and their
collections are incredibly useful lists of links in their chosen
field. Effectively, what this means is we are searching "other
people's searches" or searching what others have searched before.

For example, a search for "optical network" brings up
http://www.gmpls.org/ as one of the top resources. Going to the
site we see a plethora of white papers, MPLS tutorials,
presentations, standards organizations, and many technical
documents from hundreds of optical organizations. It's like a set
of perfectly customized search results! You can also contact Mr.
Vinay Ravuri via his email address listed on the site. It's quite
likely he would welcome questions and be helpful.

Other link collections include a page linking to all SONET
vendors, a list of all telecom companies from the United States
Telecom Association, and an article about Supercom 2000 describing
and linking to all the upstart and major competitors in the
Optical space.

Besides the traditional search engine results you would expect,
one more reason to check in with Teoma along with other popular
search engines is its ability to provide highly relevant, or
authoritative, results. Teoma's results ranking is based on what
they call Subject-Specific Popularity. According to Teoma,
Subject-Specific Popularity analyzes the relationship of sites
within the list of results, ranking sites based on the number of
same-subject pages that reference it. In other words, Teoma claims
they provide the best answer to search queries because by
analyzing site peers they can establish authority for the search
result.
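
As a rough illustration of ranking by same-subject references, here
is a short Python sketch. Teoma's real algorithm is not spelled out
here, so this is only a guess at the general shape: count the links
a page receives from other pages in the same result set and ignore
links from everywhere else. The function name and the sample data
are hypothetical.

```python
def subject_specific_rank(result_pages, links):
    """Rank pages by how many other pages in the same result set link to them.

    `links` is a collection of (source, target) pairs. Links coming from
    pages outside the result set are ignored, which is what separates
    this from plain link popularity.
    """
    community = set(result_pages)
    scores = {page: 0 for page in result_pages}
    for source, target in set(links):
        if source in community and target in community and source != target:
            scores[target] += 1
    # sorted() is stable, so tied pages keep their original order.
    return sorted(result_pages, key=lambda page: scores[page], reverse=True)


pages = ["a", "b", "c"]
links = [("a", "c"), ("b", "c"), ("c", "b"),
         ("x", "a"), ("y", "a"), ("z", "a")]
print(subject_specific_rank(pages, links))  # ['c', 'b', 'a']
```

Page "a" has the most inbound links overall, but all of them come
from pages outside the result set, so it ends up last.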

Keep in mind that while results are highly accurate and relevant,
like with Teoma's owner Ask Jeeves, they lack in volume. Our
"optical network" search yields an estimated 300,000 results in
Teoma but over one million in Google.

Interestingly, Teoma's search results were some of the most
authoritative Optical Networking sites around like the National
Transparent Optical Network Consortium NTONC, Sycamore, Ciena and
the All-Optical Networking Consortium. In contrast, Google
displayed a great selection of unique sites but on the first page
of results it was hard to pick out the authoritative sites like
those in Teoma's first page. There was, however, a small amount of
overlap - both engines picked up NTONC and Salira in their first
page of results.

Refining a search is easy because Teoma provides a section on the
top right specifically designed to suggest additional relevant
keywords, which can be added to your search. Clicking on one of
the links under the "Refine" section automatically adds those
words to the search and returns a new set of refined results. Our
example search on "optical network" offered five refining choices.
The last one in the refine list was "Fiber, Manufacture." Clicking
on it brought back results on a new search for "Fiber,
Manufacture, optical network."

Teoma does not support special syntaxes, advanced Booleans,
wildcards, stemming, or field commands like inurl: and such. A
search for "optical network intitle:resume" for example is not
possible. One final treat: just like at Google, it's clear who the
advertisers are on Teoma because their ads are placed under a
section dedicated to paid results - the "Sponsored Links" - making
them easier to avoid.

The Teoma advantage is to be able to find out which sites are the
most relevant to a search. It is by no means a conclusive search,
or a way to locate rare jewels and hard to find pages, but it is
an invaluable tool in the CyberSleuth's ToolBag.
   

~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Fuzzy Logic Searching, The NEAR Command

Fuzzy Logic as related to Booleans and Internet Searching refers
to commands that are not necessarily precise or based on definite
mathematics. The Boolean search command AND is very precise, for
example, requiring search results to include both terms. Each term
must be present, regardless of location on the page.

In comparison, the fuzzy command NEAR requests results where one
term is close to the other. It is considered a fuzzy logic term
because the definition of "close" is left to be interpreted.

Fuzzy terms can be interpreted in many different ways. For
example, how near is NEAR? Within how many words do the search
terms need to be from each other in order to be considered?
Direction can also be left open to interpretation. Does NEAR mean
close by on the right, left or both sides?

The NEAR fuzzy logic command can be used on the AltaVista search
engine. There are other search engines besides AltaVista that are
better at handling this kind of fuzzy logic searching. Because
most readers have used AltaVista, let's review the use of NEAR on
that familiar search engine for the purposes of demonstration.

NEAR searching is very useful for opening up a narrow search to
include other possible combinations of a set of words. AltaVista
interprets NEAR as within 10 words to the left or right of the
first term. Like this:

Nurse NEAR licensed

That will return pages containing the term "Nurse" where it
appears within ten words of "licensed". Results include all the
types of licensed nurses like "Licensed Vocational Nurse" and
"Licensed Practical Nurse." But also included are the other
ones like "Registered nurse in emergency room. Provided and
supervised licensed..." where Nurse is 7 words away from Licensed.

In contrast, the use of fuzzy logic search term NEAR excludes
results like a "Licensed Driver" who was a "Sketch Nurse" in a
play in Wisconsin (read her resume at
http://suzanneadams.com/resume.htm).
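
The ten-word window can be checked by hand, or with a small Python
sketch like the one below. It is an illustration of the rule as
described above (within 10 words, in either direction), not
AltaVista's implementation. The function names and the simple word
splitting are assumptions.

```python
import re


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, first, second, window=10):
    """True if `first` and `second` occur within `window` words of each other."""
    words = tokenize(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= window for i in first_pos for j in second_pos)


snippet = ("Registered nurse in emergency room. "
           "Provided and supervised licensed staff")
print(near(snippet, "nurse", "licensed"))  # True
```

Changing window to 25 would mimic the wider definition that Lycos
is said to use.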

To further illustrate, in AltaVista a search for "nurse NEAR
licensed AND title:resume" returned 86 documents, while "nurse AND
licensed AND title:resume" returned 141.

There are fewer results with the use of the NEAR command. Fewer
results may signify a more accurate search, especially when the
narrower search is successful in eliminating a large percentage of
the undesirable results. In the above example using the NEAR
command proved to be a more accurate search, eliminating pages
similar to the Denham Personnel Services page and the
Rehabilitation Recruitment Center page.

Other search engines define NEAR differently. On AOL Search NEAR
can be defined by the user. At Lycos NEAR is defined to be within
25 words.

Fuzzy terms like NEAR can assist in making many searches more
accurate. If you would like examples of how to apply the NEAR
command to your search drop us a line describing one of your
current searches and which search engine you favor.

Join us in two weeks when we will explore other fuzzy terms.

           

~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Searching With Fuzzy Logic: The ADJ Command

Fuzzy search fun doesn't end with just the NEAR command we
reviewed in the last issue. To broaden the horizon a little we use
two other extremely powerful search engines, one old and one new:
AOL and Vivisimo.

SEARCH AOL: http://search.aol.com
AOL has the little known ability to search with the Boolean NEAR,
which we have used for many years, but also the ability to use the
search commands ADJ and W/ n.

"What is that?" you ask?

ADJ means directly adjacent. With it we find documents that
contain specific keywords directly in front of or behind a primary
keyword. ADJ is different than "double quotes" for three reasons.
First, ADJ in AOL Search automatically allows for root word
variants or truncation as in "program," "programming,"
"programmer" and so on. Second, ADJ can connect complex
expressions. For example:

(engineer or developer or architect) ADJ software

finds items containing either software engineer, software
developer or software architect. Finally, unlike "quoted phrases"
your words can be on either side of each other not necessarily in
the exact order found within the quotations.

To illustrate, if we were to use quotations to find both versions
of database next to design we would have to use

("design database" OR "database design")

but instead by using the ADJ command all we need to do is search
for:

design ADJ database

That's not just easier, it's also a bit more accurate!
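
Here is a minimal Python sketch of what ADJ does with two plain
keywords: the words must sit side by side, in either order. Matching
on the start of each word is a crude stand-in for the root word
variants mentioned above; how AOL Search really handles them is not
known here, so treat that part as an assumption. The sketch does not
handle complex expressions in parentheses, and the function name adj
is hypothetical.

```python
import re


def adj(text, first, second):
    """True if a word starting with `first` sits directly next to a word
    starting with `second`, in either order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    a, b = first.lower(), second.lower()
    for left, right in zip(words, words[1:]):
        if left.startswith(a) and right.startswith(b):
            return True
        if left.startswith(b) and right.startswith(a):
            return True
    return False


print(adj("Database design experience", "design", "database"))  # True
print(adj("Design of a database", "design", "database"))        # False
```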


WITHIN
On Search AOL within is a command expressed as W/n where n is any
number. W/n is a proximity operator that gives us the power to
manually set how close we want things to be. It will find
documents where specific keywords occur within a specified number
of words - n words - to the right of the primary keyword. Any
whole number can be used for "n". Example:

optical W/5 engineer

finds documents in which engineer occurs within five words after,
to the right of, optical - as in optical systems engineer,
optical board level design engineer, optical long-haul systems
engineer, etc. It will look only for words in order of "optical"
first then any other words numbering up to five, and finally
"engineer" but not the inverse.

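The one-directional nature of W/n is easy to see in a short Python
sketch. This is an illustration only, not AOL's implementation. It
assumes "within n words" means the second keyword appears at most n
word positions after the first, and it keeps hyphenated words such as
long-haul as a single word. The function name within is hypothetical.

```python
import re

# A word is letters or digits, optionally joined by hyphens (long-haul).
WORD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def within(text, first, second, n):
    """True if `second` appears at most `n` words after `first`, never before."""
    words = WORD.findall(text.lower())
    first, second = first.lower(), second.lower()
    for i, word in enumerate(words):
        if word == first and second in words[i + 1:i + 1 + n]:
            return True
    return False


print(within("optical board level design engineer", "optical", "engineer", 5))  # True
print(within("engineer with optical background", "optical", "engineer", 5))     # False
```
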
http://www.vivisimo.com
An automated, hierarchical, conceptual, just-in-time clustering
engine, Vivisimo is much more than a meta-search. There are many
reasons, but the most relevant for this article is its ability to
offer total control. Vivisimo offers the use of advanced commands
like image:, title:, url:, link:, linktext:, host:, site:,
domain:, related:, and text:, in addition to every form of Boolean
both traditional and Fuzzy like AND, +, OR, |, AND NOT, -, NEAR
and ~.

Since this is not a search engine of its own but rather gets
results from Yahoo, MSN, Fast, Netscape, Open Directory, Direct
Hit, Looksmart, AskJeeves, Lycos, AOL and HotBot, the advanced
commands are used as they would with the search engines directly.
The absence of Google and AltaVista is purposeful. Also, be aware
that Near is only used by AOL and Lycos, and that on Lycos Near
means within 25 words. Vivisimo should handle command translation
so that the use of "host:" translates to "url.host:" for Fast and
to "domain:" for HotBot.
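
Command translation of this kind can be pictured as a simple lookup
table. The Python sketch below covers only the two translations
mentioned above (host: for Fast and HotBot). The table layout and the
function name translate_query are assumptions for illustration, not a
description of how Vivisimo is built.

```python
# Only the two translations named in the text are listed here.
FIELD_TRANSLATIONS = {
    "Fast": {"host:": "url.host:"},
    "HotBot": {"host:": "domain:"},
}


def translate_query(query, engine):
    """Rewrite generic field commands into the form one engine expects."""
    mapping = FIELD_TRANSLATIONS.get(engine, {})
    translated = []
    for term in query.split():
        for generic, specific in mapping.items():
            if term.startswith(generic):
                term = specific + term[len(generic):]
                break
        translated.append(term)
    return " ".join(translated)


print(translate_query("host:jobmachine.net resume", "Fast"))
# url.host:jobmachine.net resume
print(translate_query("host:jobmachine.net resume", "HotBot"))
# domain:jobmachine.net resume
```

Engines without an entry in the table get the query passed through
unchanged.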

Clarification on who uses what commands and how can be found on
Danny Sullivan's easy reference chart at:
http://www.searchenginewatch.com/facts/ataglance.html



ADVthanksANCE!

Shally Steckerl (JobMachine, Inc ~ jobmachine.net)  


(c) 1952-2032: [fravia+], all rights reserved