~ Essays ~
(Courtesy of fravia's advanced searching lores)

(`. Feed the search engines with synonyms .)
by sonofsamiam
slightly edited
published at searchlores in July 2000

A very interesting idea: broaden your search results by using synonyms for words you are unsure about.

I have made a synonyming metasearch engine, which can be found at http://www.spunge.org/~uriah/cgi-bin/synmeta.cgi. It queries Lexical FreeNet to find synonyms of words, and "also-known-as" names for people-searching (famous-people searching :)

The user end of any search engine is just digging through its database via a web interface, and there are many other web-searchable databases: 411s, lexical databases (like lexfn), and etexts, for example.
A metasearch just adds another layer onto the onion, acting as an interpreter between us and the databases. (Or rather, between us and the web interfaces to the databases: an interpreter talking to an interpreter, so sometimes you don't get the joke.) Perl or PHP gives the most flexibility, but frontends can also be javascripted; see web scripting secrets, Bombastic Search Engine Front-end, or snooz and the all-in-one searches on fravia's site.

It's easy to make your own two-dimensional (linear, read: slow) metasearches. Anyone with even basic programming skills can use LWP::Simple & HTML::Parser to automate web retrieval. I think REBOL is particularly well suited to this, with its built-in html parser and document retriever. A 3d one will take some experience with multi-threading/tasking (which I don't have yet, other than playing with fork()s ;)). There are several other search-engine front-ends on the web: Oingo tries to give natural-language recognition to altavista and dmoz, as did electricmonk.com, which seems to be closed as of this writing.
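The linear approach is simple enough to sketch in a few lines. Here is a rough Python stand-in for the fetch-loop idea (the essay's own cgi is Perl with LWP::Simple; the engine URL templates below are just placeholders, not real endpoints):

```python
# Sketch of a "linear" metasearch: build one query url per engine, then
# fetch them one after another. Engine templates here are placeholders.
from urllib.parse import quote_plus

def build_queries(term, engines):
    # Each engine is a URL template; the search term is URL-encoded into it.
    return [tpl.format(q=quote_plus(term)) for tpl in engines]

engines = [
    "http://www.altavista.com/cgi-bin/query?q={q}",  # hypothetical templates
    "http://example.org/search?query={q}",
]
urls = build_queries("archetypal figure", engines)
# Fetching each url in turn (e.g. with urllib.request.urlopen) is the
# linear/2d approach; fetching them concurrently would be the "3d" version.
```

The only real work is the URL-encoding; everything past that is plain retrieval and parsing.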

Other possible directions to take metasearching (or front-ending?):

Hmm, let's try it real quick, using, say, +%archetypal figure +Jung, 'coz I'm interested in that stuff right now :) Entering that query in the synonyms metasearcher will return the following url: http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=en&pg=q&text=yes&q=%2b%28archetypal+|+archetypical+|+prototypal+|+prototypic+|+prototypical%29+figure+%2bJung&search=Search
And, wow, that came out much better than I thought it would, heh, good example :)
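The expansion step itself can be sketched like this, as a Python stand-in for what the cgi below does: a token marked with % is replaced by an OR-group of its synonyms (the synonym list is hard-coded here, where the real cgi pulls it from Lexical FreeNet):

```python
# Sketch of the %-marker expansion: "+%archetypal" becomes a +(a | b | c)
# OR-group in altavista syntax. Synonyms are hard-coded for illustration.
def expand_query(query, synonyms):
    out = []
    for token in query.split():
        prefix, word = "", token
        if word.startswith(("+", "-", "~")):        # keep altavista operators
            prefix, word = word[0], word[1:]
        if word.startswith("%"):                    # % marks a word to expand
            word = word[1:]
            group = " | ".join([word] + synonyms.get(word, []))
            out.append(prefix + "(" + group + ")")
        else:
            out.append(token)                       # unmarked tokens pass through
    return " ".join(out)

syns = {"archetypal": ["archetypical", "prototypal", "prototypic", "prototypical"]}
print(expand_query("+%archetypal figure +Jung", syns))
# -> +(archetypal | archetypical | prototypal | prototypic | prototypical) figure +Jung
```

URL-encoding that result gives exactly the kind of altavista query shown above.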

Well, here is the relevant part of the cgi (the rest is just html stuff):

use LWP::Simple; #no need for anything more in-depth

@in = split(/&/,$ENV{'QUERY_STRING'});
foreach $i(@in){
  $i =~ s/\+/ /g;
  $i =~ s/%(..)/chr(hex($1))/ge;
  @key_val = split(/=/,$i,2);
  $in{$key_val[0]} = $key_val[1];
}

$in{'q'}=~s/[^\w()|+\-~"% ]//g; #strip bad chars (globally, not just the first)
open(L,">>$logf"); #append to the logfile ($logf is defined in the html stuff); "<<" is not a valid open mode
print L time()."\n$ENV{'REMOTE_ADDR'}\n$ENV{'HTTP_USER_AGENT'}\n$in{'q'}\n";
close(L);

#there _must_ be a better way to do this! am I just stupid or what?!
$b=$in{'q'};
while($b=~/^.*?"(.*?)"(.*)$/s){
  $a=$1;
  $b=$2;
  $c=$a;
  $c=~tr/ /_/s;
  $in{'q'}=~s/\Q$a\E/$c/s; #\Q protects regex metachars inside the phrase
}
@tokens=split(/ /,$in{'q'});
foreach $token(@tokens){$token=~tr/_/ /;}

foreach $token(@tokens){
  if($token=~/^([+\-~|]*\(?)%/){
    $t="$1(";
    $token=~s/[^\w ]//g;
    @syns=($token);
    $token=~s/ /+/g;
    #grabbing and parsing out the synonyms
    $p=get("http://www.raisch.com/cgi-bin/lexfn/lexfn-cuff.cgi?sWord=$token&tWord=&query=show&maxReach=2&ASYN=on&ABAK=on")
    or last;
    $p=~/^.*<\/form>(.*)<font/is; #the chunk between the form and the footer
    $p=$1;
    while($p=~/<b><a.*?>(.*?)<\/a>(.*)$/is){
      push @syns, $1;
      $p=$2;
    }
    #quote spaced synonyms
    foreach $s(@syns){
      if($s=~/ /){$s='"'.$s.'"';}
    }
    $token=$t.join(' | ',@syns).')';
  }
}
$query=join(' ',@tokens);
$query=~s/\+/%2b/g;
$query=~s/ /+/g;
$url="http://www.altavista.com/cgi-bin/query?sc=on&hl=on&kl=$in{'kl'}&pg=q&text=yes&q=$query&search=Search";
print "Location: $url\n";
print "Content-type: text/html\n\n";
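About that "there _must_ be a better way" comment in the tokenizing section: the space-to-underscore round-trip can indeed be replaced by a single regex that matches either a quoted phrase or a bare word in one pass. A sketch in Python (the same alternation works with Perl's //g in list context; note this version strips the surrounding quotes, which the original kept):

```python
import re

# One-pass tokenizer: each token is either a "quoted phrase" (group 1)
# or a run of non-space characters (group 2), so no underscore dance.
def tokenize(query):
    return [phrase if phrase else word
            for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query)]

print(tokenize('+%archetypal "collective unconscious" +Jung'))
# -> ['+%archetypal', 'collective unconscious', '+Jung']
```

Each findall match fills exactly one of the two groups, so picking whichever is non-empty recovers the token list directly.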
Are these kinds of search-engine additions useful? Or maybe more can be got out of Lexical FreeNet?
Please send any questions or comments or criticisms, they're most wanted :) sonofsamiam




(c) 2000: [fravia+], all rights reserved