This is part three of the 're-ranking' trilogy.
one:[yoyo1.htm]: The yo-yo technique
two:[synecdoc.htm]: The synecdochical searching method
  


The epanaleptical approach

(and other fuzzy searching tricks)

by fravia+
first published @ searchlores in December 2001
Updated in December 2003
[The epanaleptical approach]    [Search engines' tides]
[Epanawhat? Wuzz dat?]    [Other fuzzy searching tricks]

   
The epanaleptical approach

The 'geminatio' or epanalepsis is in rhetoric nothing more than an 'echo sound' or, to be more precise, the textual repetition of the same word or part of a phrase. From a rhetorical point of view we could distinguish the repetition of a single query term (palilogia) from the repetition of a group of terms (or a complex query string), which is a "proper" epanalepsis. But -as a searcher- I'm going to use this term for ALL kinds of approaches where you just refine your search using repetitions.
This approach is -afaik- not widely used in searching, yet when dealing with the algos of the main search engines (which are inherently stupid) it allows you to get at different 'clusters' of your signal than you would have got WITHOUT repetition. So you had better be epanaleptical and use redundancy every time you suspect phony ranking algos and, oh boy, the commercial search engines' ranking algos are indeed fishy most of the time... to say the least... :-)

The 'epanaleptical' touch
Please allow me to use once more as 'Field case' the 'Haiku' targets that we have already eviscerated in the previous essay (The synecdochical searching method). There we saw how slight syntactical variations (e.g.: haïku) or delicate metonymical variations (e.g: tanka) or complex peristatical variations (e.g. "karasu-no tomari-keri") of your original search query would deliver "valid sharp arrowheads", or 'angles' for your searches, enabling you to fetch new clusters of interesting sites.
So let's imagine that we are still interested in Haiku poetry, even if this subject is slowly beginning to bore you (Do not despair! You'll be able to apply these findings to whatever target you are after! Never forget the third optimistical law of seeking :-)
Let's try the simplest 'Bill the zombie' search approach in Google:
haiku: 378000 sites! Lotta crap, unfortunately. We would have to use the yoyo technique in this case in order to find something worthwhile through our [yo-yo wand].
Let's try instead a first, simple, 'epanaleptical' touch:
haiku haiku: 195000 sites! A reduction of 48 % searching with the simplest form of epanalepsis!
Do the results look better? Yep, quite a lot! This is the UNIVERSAL law of epanalepsis when searching... as you may check by yourself: Tolkien Tolkien is BETTER than Tolkien!
Note that in this case we just repeat the term without using boolean AND operators (or plus signs) and without quotes. Thus the epanaleptical approach is effective even in its simplest forms.
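A minimal sketch of how such repeated queries could be built, should you want to script your experiments. The function name `epanaleptic_query` and its `level` parameter are my own illustrative assumptions (level 0 is the plain query, level 1 repeats the term once, and so on); nothing here talks to a real search engine.

```python
def epanaleptic_query(term: str, level: int = 1) -> str:
    """Return `term` written (level + 1) times, separated by spaces.

    No quotes and no boolean operators are added: this is the plain
    repetition described above.
    """
    if level < 0:
        raise ValueError("level must be 0 or greater")
    return " ".join([term] * (level + 1))


if __name__ == "__main__":
    # Print the query strings for epanalevels 0 to 3.
    for level in range(4):
        print(level, epanaleptic_query("haiku", level))
```

You would then paste (or submit) each string as it is, without quotes.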

I have noticed that on GOOGLE and FAST simple epanaleptical reductions of ranked results vary between 40% and 60%: in the case of Tolkien Tolkien versus Tolkien we have 498000 versus 279000 results, i.e. a 44% reduction, scusate se è poco (no small thing).
You'll also note how the reduction rate for Altavista is even more compelling, with a reduction rate of almost 98%!
Of course in the case of Altavista, which is commercially HEAVILY spammed, you'll have to apply the [yoyo] technique as well to fish interesting quarries from within its "down yonder" guts.

Google
Quid               results    reduction %    epanalevel
haiku              378000     -              0
haiku haiku        195000     48%            1
tolkien            597000     -              0
tolkien tolkien    308000     48%            1

Altavista
Quid               results    reduction %    epanalevel
haiku              159589     -              0
haiku haiku        3861       98%            1
tolkien            213694     -              0
tolkien tolkien    4335       98%            1
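For those who want to check the arithmetic: the 'reduction %' is simply the share of the level 0 results that disappears once you repeat the term. Here is a small sketch that recomputes it from the counts in the tables above; the helper name `reduction_pct` is an assumption of mine, and a negative value would mean that the repeated query fetched MORE results.

```python
def reduction_pct(baseline: int, repeated: int) -> float:
    """Percentage by which `repeated` is smaller than `baseline`.

    A negative value means the repeated query returned more results.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (1 - repeated / baseline) * 100


# (level 0 count, level 1 count), copied from the Google and Altavista tables.
observations = {
    ("Google", "haiku"): (378000, 195000),
    ("Google", "tolkien"): (597000, 308000),
    ("Altavista", "haiku"): (159589, 3861),
    ("Altavista", "tolkien"): (213694, 4335),
}

for (engine, term), (base, rep) in observations.items():
    print(f"{engine:10} {term:8} {reduction_pct(base, rep):5.1f}%")
```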

Please always keep in mind that we are trying here to apply OUR OWN parameters to the algos of the main search engines. And that these algos may vary wildly. Yet our epanaleptical results will shed some light on those very algos as well... In fact this is one of a series of approaches that can be used in order to reverse search engines' algos (a part of the web where much money is involved, and where many of the lurkers that roam and leech this site without ever contributing to it are active).
Should we for instance repeat the same epanaleptical search on FAST (Alltheweb) we would get similar results for 'haiku', but, oddly enough, OPPOSITE results for 'tolkien':

Alltheweb (Fast)
Quid               results    reduction %         epanalevel
haiku              219768     -                   0
haiku haiku        122292     44%                 1
tolkien            145548     -                   0
tolkien tolkien    168204     -16% (increment)    1


How is this possible? Fast has INCREASED results when using epanalepsis on 'tolkien'! Mystery of the algos. It could have to do with special post-indexing filters and reranking algos these engines use when dealing with human names ("aristotle" will fetch 249840 while "aristotle aristotle" will fetch 253395 results, a smaller but still inexplicable increment of 1.4%).
This will teach us to accept all these approaches cum grano salis. They do work indeed (at times they work small searching wonders) but the web is so slippery that you can always make mistakes or blunders... That's the fun of the whole thing.
Readers that are going to be wizard searchers should never take themselves too seriously, else they will go ka-buum hitting all sort of obstacles when searching in the dark. And since we are using haiku as quarries...


With the scent of plums
on the web road - suddenly
your target comes!

Let's now increase the epanalepsis level... (keep in mind that the 'results' data may vary at EVERY REQUEST - because of search engines' tides - depending on the group of Google servers that will answer)

Google
Quid                       results    reduction %    epanalevel
haiku                      377000     -              0
haiku haiku                207000     45%            1
haiku haiku haiku          195000     48%            2
haiku haiku haiku haiku    192000     49%            3


Clearly with this kind of simple "one word" repetition (palilogia, from Greek logia & palin: "speaking over again") the 'main' reduction appears at level 1.
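To see how lopsided this is, you can split the total reduction into the part contributed by each additional repetition. A quick sketch, using only the Google counts from the table above (the variable names are mine):

```python
# Google counts for "haiku" at epanalevels 0..3, copied from the table above.
counts = [377000, 207000, 195000, 192000]

baseline = counts[0]

# Share of the level 0 results removed by each additional repetition,
# expressed in percentage points of the baseline.
marginal = [
    (previous - current) / baseline * 100
    for previous, current in zip(counts, counts[1:])
]

for level, step in enumerate(marginal, start=1):
    print(f"level {level}: {step:.1f} points of extra reduction")
```

The first repetition accounts for roughly 45 points, the following ones for only a few points each.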

Once more the results above must be interpreted with care. First of all the results from any given search engine do vary every time you repeat a query. This depends FIRST on the 'depth' (or width) of their specific databases.
These vary considerably, the broadest ("omnipotent") ones being - at the time of writing this snippet - those at Google, see the graphic below.

[Graphic: comparative 'depth' of the main search engines' databases, with Google = 100 (note that the whole indexed web would be ~6 times bigger than Google's width).]

Now the problem is that the bigger the database, the less inclined the search engine will be to search it in its entirety when there are server overloads (or when the search query is evidently much too vague). There are various ways to limit access: different indexes usage, timeouts, redirection towards slower servers. Once more keep ALWAYS in mind that search engines are NOT there in order to provide you some kind of 'free' service for the glory of knowledge and for the sake of the web of old... They are just trying to scrape as much money as possible out of your USE of their concoctions and biased ranking algos. As usual in our doomed society, short-term profit is the ONLY reason search engines exist and someone is paying money for their bandwidth, duh.

Yet, once we understand this, we may be able to reverse part of it  :-)



Search engines' tides

As pointed out above, depending on the vastness of their indexes, the 'biggest' search engines, like Google, will tend to give you inconsistent results either every few hours or even every time you query them, depending on server connection speed, overload, moon phases, how many Americans watch television and so on.
In other words most (big) search engines have TIDES, and accomplished searchers would be well advised to take account of this problem as well.

A 'quick and dirty' check of the database depth 'tides', for instance in Google, can be gathered through the following 'rimbaudian' query, based on the weight of the five vowels (aeiou... note also how the letter 'a' will give you an approximate idea of Google's global depth at the moment of the query)

Google's depth tides
Vowel    10/12/2001: level    12/12/2001 (11.00 GMT): ebb    12/12/2001 (16.00 GMT): flood
a        1300 million         1170 million                   1410 million
e        325 million          286 million                    357 million
i        513 million          447 million                    539 million
o        179 million          159 million                    187 million
u        98 million           85.4 million                   108 million
Try it yourself today: query each vowel (a, e, i, o, u) on its own.

Now you could also try to determine the colors of the vowels... :-)
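If you want a number for the size of these tides, one simple (and admittedly arbitrary) measure is the ebb-to-flood difference as a percentage of the 'level' reading. The sketch below applies it to the five vowel counts listed above; the function name `tidal_swing` and the choice of measure are my assumptions, not something Google publishes.

```python
# Google result counts (in millions) for the five vowel queries, copied from
# the table above: (level on 10/12/2001, ebb on 12/12/2001 at 11.00 GMT,
# flood on 12/12/2001 at 16.00 GMT).
tides = {
    "a": (1300, 1170, 1410),
    "e": (325, 286, 357),
    "i": (513, 447, 539),
    "o": (179, 159, 187),
    "u": (98, 85.4, 108),
}


def tidal_swing(level: float, ebb: float, flood: float) -> float:
    """Ebb-to-flood difference as a percentage of the 'level' reading."""
    return (flood - ebb) / level * 100


for vowel, (level, ebb, flood) in tides.items():
    print(f"{vowel}: swing of {tidal_swing(level, ebb, flood):.1f}%")
```

With these readings every vowel swings by more than 15% of its level within a few hours, which is worth remembering before reading too much into a single results count.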


Back to our approach, let's see what happens with FAST's weird personal name 'increment' when we speed up the epanalepsis...

Alltheweb (Fast)
Quid                               results      reduction %    epanalevel
tolkien                            1,222,522    -              0
tolkien tolkien                    1,222,522    0%             1
tolkien tolkien tolkien            1,171,064    4.21%          2
tolkien tolkien tolkien tolkien    1,222,522    0%             3

Well, once upon a time there was a peak at level two, while level 3 was smaller than level 1 (but still an increment). Nowadays there is a small difference with the term written three times (epanalevel 2 in the table), and that's all. New algos. Let's check it once again, repeating the epanaleptical query for our 'aristotle':

Alltheweb (Fast)
Quid                                       results      reduction %    epanalevel
aristotle                                  1,389,308    -              0
aristotle aristotle                        1,330,215    4.25%          1
aristotle aristotle aristotle              1,389,308    0%             2
aristotle aristotle aristotle aristotle    1,330,215    4.25%          3

Different. Once upon a time (two years ago) there was a peak at level two, while level 3 was always smaller than level 1.
Hence as a (provisional) conclusion for Fast/Alltheweb (and Wisenut): with people's names you could WIDEN your search using the epanalepsis approach. Nowadays there is a difference with the term written two and four times (epanalevels 1 and 3 in the table), no difference with the term written three times (epanalevel 2). New algos.

Other search engines particularities

Hotbot is 'invariable': no matter how often you repeat your search term, it will still give you the same number of results.
Lycos shows the same 'epanaleptical incrementation' phenomenon that we have seen above for Alltheweb/Fast when using 'Tolkien' as search term.
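The three behaviours met so far (narrowing, widening, invariable) can be told apart mechanically from the level 0 and level 1 counts. A rough sketch follows; the function name `classify_engine` and above all the 1% tolerance are arbitrary assumptions of mine, chosen only so that tiny changes are not mistaken for a real effect.

```python
from typing import Sequence


def classify_engine(counts: Sequence[int], tolerance_pct: float = 1.0) -> str:
    """Label how an engine reacts to repeating the search term.

    counts[0] is the plain query, counts[1] the query repeated once.
    tolerance_pct is an arbitrary threshold below which a change is
    treated as noise; it is an assumption, not a measured value.
    """
    base, repeated = counts[0], counts[1]
    change_pct = (repeated - base) / base * 100
    if abs(change_pct) <= tolerance_pct:
        return "invariable"
    return "widens" if change_pct > 0 else "narrows"


# Level 0 and level 1 counts copied from the tables earlier in this essay.
print(classify_engine([378000, 195000]))    # Google, haiku
print(classify_engine([145548, 168204]))    # Fast, tolkien (2001 figures)
print(classify_engine([1222522, 1222522]))  # Fast, tolkien (2003 figures)
```

Given the tides, a single pair of counts proves little: repeat the measurement a few times before trusting the label.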


Epanawhat? Wuzz dat?

You may -legitimately until now- wonder why we should use funny terms like 'epanalepsis' instead of just saying something simpler like 'repeating the search term', or 'redundancy searching'.
Well, there are good reasons for that in my humble opinion. First of all the concept of epanalepsis is quite complex, encompassing simple "palilogical" one-word searches, as we have seen above, AND more complex real epanalepsis searches where you'll use MORE THAN ONE TERM and/or whole (long) strings. Here the repetition can concern only one or more words at the beginning of the string ('Go, go good countrymen'), the center ('daraus kann nimmer, nimmer Gutes kommen') or the end ('Come away, away!') of the string.
Moreover, more subtle approaches when searching could be a 'diacopical' epanalepsis, if you INSERT one or more words in between, or an 'elliptical' epanalepsis, if you OMIT some terms from your search strings. This of course brings us back to the [synecdochical searching method] that we have examined in part two of this trilogy. Since all these techniques give DIFFERENT results on the main search engines, I do believe seekers should indeed strive to be terminologically precise.
Here are some examples (mid-December 2001):
advanced searching: 1,010,000
advanced advanced searching: 982,000
advanced advanced advanced advanced advanced advanced searching: 967,000
advanced searching searching: 1,030,000
advanced tips searching: 186,000
advanced search searching: 881,000
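Variants like the ones above are easy to generate systematically. Here is a hedged sketch: the function name `epanalepsis_variants`, the dictionary labels and the default inserted word are all illustrative assumptions, and the 'elliptical' variant simply drops the last term, which is only one of many possible omissions.

```python
def epanalepsis_variants(terms, extra="tips"):
    """Return a dict of variant query strings for a multi-term query.

    `terms` is a non-empty list of words; `extra` is a hypothetical word
    used only for the diacopical (inserted word) variant.
    """
    if not terms:
        raise ValueError("terms must not be empty")
    middle = len(terms) // 2
    return {
        "plain": " ".join(terms),
        # Repeat the first word ('Go, go good countrymen').
        "repeat at start": " ".join([terms[0]] + terms),
        # Repeat the word sitting in the middle of the string.
        "repeat at center": " ".join(
            terms[:middle] + [terms[middle]] + terms[middle:]
        ),
        # Repeat the last word ('Come away, away!').
        "repeat at end": " ".join(terms + [terms[-1]]),
        # Diacopical: INSERT a word after the first term.
        "diacopical": " ".join(terms[:1] + [extra] + terms[1:]),
        # Elliptical: OMIT the last term (keep the string if it has one word).
        "elliptical": " ".join(terms[:-1]) if len(terms) > 1 else terms[0],
    }


if __name__ == "__main__":
    for label, query in epanalepsis_variants(["advanced", "searching"]).items():
        print(f"{label:17} {query}")
```

For a two-word query the 'center' and 'end' variants coincide; they only differ from three words upwards.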


To be continued

Other fuzzy searching tricks

These have something to do with the famous 'art of guessing', but relate to our re-ranking efforts as well.
  1. Imagining page names:
    Note the elegance of this kind of arrow (devised by ~S~ Jeff), which will 'shoot' you to the center of the signal...
    haiku2 haiku3 basho5 tanka2
    Another possibility is to add the word 'page': haiku page

    To understand why I have chosen above, other than Haiku, examples like 'basho' and 'tanka' please refer to Part two of this trilogy of essays:[The synecdochical searching method], where you'll learn how to gather synecdochical context for your searches.


  2. The broad explanation:
    Another interesting trick is to jump directly to the ' broad explanation' of what you are searching for.
    To continue with our 'haiku' case investigation, you could try the following searches:
    "What is haiku"
    How do you write a haiku? (note the missing quotes)
    "origins +of haiku"
    "history +of haiku"
    an introduction to haiku (note the missing quotes)
    ... you get the idea.


  3. The format hunt:
    Another interesting trick is to seek specific 'formats' of the quarry you are searching for.
    To continue with our 'haiku' case investigation, you could try the following searches:

    haiku.pl -www.haiku.pl -http://haiku.pl
    The NOT terms in the query string are necessary! Whenever you seek perl scripts you will not want to gather all possible Polish sites as well :-)

    haiku.pdf
    When executing some specific searches it could be a valid idea to seek pdf files, which, their cumbersomeness notwithstanding, are for some mysterious reason the preferred medium of many universities and many administrations. Since haiku are Japanese poems, duh, pdf files could indeed be useful, allowing a seamless insertion of kanji characters in the documents.
    ... you get the idea.


  4. The regional hunt:
    This kind of search borders on the common use of regional search engines when combing.
    To continue with our 'haiku' case investigation, you could try (as Jeff did) the following searches:

    allinurl:jp haiku.html which would give you a wealth of Japanese results

    ... you get the idea.
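The four tricks above boil down to string templates, so they can be sketched in a few lines. All function names below are illustrative assumptions; the query syntax (minus signs for NOT terms, the plus sign inside quotes, `allinurl:`) is copied from the examples above and may not behave the same on every engine.

```python
def page_name_guesses(term, numbers=(2, 3)):
    """Trick 1: guess page names such as 'haiku2', plus '<term> page'."""
    return [f"{term}{n}" for n in numbers] + [f"{term} page"]


def broad_explanations(term):
    """Trick 2: phrasings that aim at introductory pages."""
    return [
        f'"What is {term}"',
        f'"origins +of {term}"',
        f'"history +of {term}"',
        f"an introduction to {term}",  # deliberately left unquoted
    ]


def format_hunt(term, extension, exclude_domain=True):
    """Trick 3: seek a file format, optionally excluding same-named hosts."""
    query = f"{term}.{extension}"
    if exclude_domain:
        # NOT terms, as in the haiku.pl example above.
        query += f" -www.{term}.{extension} -http://{term}.{extension}"
    return query


def regional_hunt(term, country_code):
    """Trick 4: restrict results to URLs containing a country code."""
    return f"allinurl:{country_code} {term}.html"


if __name__ == "__main__":
    print(page_name_guesses("haiku"))
    print(broad_explanations("haiku"))
    print(format_hunt("haiku", "pl"))
    print(format_hunt("haiku", "pdf", exclude_domain=False))
    print(regional_hunt("haiku", "jp"))
```

The exclusion terms only make sense when the extension is also a country domain (like pl), hence the `exclude_domain` switch.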
To be continued
December 2001: in fieri, I doubt I'll ever finish this lore... in the meantime your search bows have acquired some extra strings, and you may even enjoy some Haiku...  

The short search is through
on the hairy seeker
little beads of dew

Hey, this is part three of my re-ranking trilogy. Note that there are more concoctions for your reading pleasure:
Part one: [yoyo1.htm]: The yo-yo technique by fravia+ (Tackling the 'down yonder' problem: a discussion about search engines' "depth")
Part two: [synecdoc.htm]: The synecdochical searching method by fravia+ (substituting a part for the whole when searching)


(c) 1952-2032: [fravia+], all rights reserved and reversed