(Courtesy of fravia's searching lores)

(`. Hitting The BullsEye .)

by CiNiX
(slightly edited by fravia+)
published at searchlores in May 2001

A very interesting essay by CiNiX, covering Bullseye (after the recent essays about [Copernicus] and [lexibot]).
What should I say? Clearly it seems that there are SO MANY bots around that we'll always be able to 'collate' them back to an understandable script, no matter how encrypted their handshakes are... eheh: the web was built in order to SHARE, not to HOARD knowledge...

Bullseye, the target of this essay by CiNiX, is another searchbot, or searchagent (an agent being "software that knows how to do things that you could do if you had the time").
According to Sundar Kadayam (the cofounder and CTO of Intelliseek, makers of Bullseye) there are "six major task areas that a search bot must undertake on behalf of the user... 1): Expand search coverage to cover distributed sources and the "invisible Web" 2): Guarantee freshness by eliminating dead links and removing stale hits with non-matching content 3): Improve relevance by adding quality metrics and incorporating user feedback 4): Analyze and filter out irrelevant documents and cluster/categorize to aid visualization of data 5): Report and collaborate by annotating and generating reports and aiding in collaboration with other agents 6): Track and alert through continuously monitoring sources and alerting the user when key things are found". As you can see, Bullseye raises big expectations...
This essay is short, but simple only for those that have read the recent exploits of the [oslse project]... it will require some background reading for those who have not followed our efforts until now.
Alee! The moment has come, for accomplished searchers, to learn some simple 'black' magic. Enjoy!


Hitting The BullsEye

by CiNiX

Preparation

Target

BullsEye 2 - IntelliSeek

Tools

1/4 Brain  - You really don't need the rest ;)
C Compiler - For writing the decoder tool: bullsdec.exe bullsdec.zip from CiX
[that's all folks!] 

Introduction

Once upon a time... there was a little boy that wandered around in the big World Wide Web until he stumbled upon a site with huge amounts of knowledge. The little boy started reading, reversing, seeking and searching until he was showing the early symptoms of information overload and he decided that the time had come to step forward, write his own essay and contribute...

When reading the essays by WayOutThere and Laurent about offline search bot reversing, I remembered having a copy of an offline search tool called BullsEye. I took a little peek in the 'hidden' engine directory and found about 897 engine files, mucho interesting information for the people that work on the oslse project! Let's see how we can make it useful.

The first steps

I started opening some of the engine files in an ordinary text editor and was happily surprised to see they had the format of an .ini file, the kind you read from with 'GetPrivateProfileStringA' and 'GetPrivateProfileIntA'.

They all looked a lot like this:

[Data]
Name=Vhi Shf Zig Zmjbnt
DisplayName=Vhi Shf Zig
Id=431
Version=22
FormPageURL=cpqn:+,yza/gstcrpirp*sfe/
SearchEnginePageURL=cpqn:+,yza/gstcrpirp*sfe/
HelpPageURL=
Cookies=2
CanTrack=2
Server=rst(zlmvkibki#lyx
GET=2
QTEngine=Vhi shf Zig
QueryString=&ydg(bjp.wjgykq?bgrfn=#k
MonitorQueryString=&ydg(bjp.wjgykq?bgrfn=#k
QTFormOptions=LP^LNNF
AD=1
Description=Awpr Sfcugm:  Jvw gvt Nwu- Wij tig Xnsl

[Registration]
more data ...
So only the values of the keys are encoded. Hmm... let's see if we can replace them with something that makes sense. The filename was AllThe00.eng, so could the Name= value be something like All The Web ???
Name=Vhi Shf Zig Zmjbnt
Name=All The Web ??????

DisplayName=Vhi Shf Zig
DisplayName=All The Web
Other values that can be decoded? Any URLs that start with http://www.? It's supposed to be a search tool, so: yes!
FormPageURL=cpqn:+,yza/gstcrpirp*sfe/
FormPageURL=http://www.alltheweb.com/

SearchEnginePageURL=cpqn:+,yza/gstcrpirp*sfe/
SearchEnginePageURL=http://www.alltheweb.com/
So what can we learn from this? All the uppercase characters are encoded as uppercase and all the lowercase characters are encoded as lowercase. Spaces don't seem to be encoded at all!

The encryption routine is more than a simple substitution, else the two t's would be encoded the same, and so would the three w's. So the encoding is partly based on the position of the character in the string. Look at the t's: isn't the q one place further in the alphabet than the p? The same goes for the w's (y, z, and then wrapping round to a), so the encryption for lowercase characters is something like this:

encoded_char = char + increment + position_in_string;
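
The guess can be checked against the known plaintext. The sketch below assumes the sum wraps around within a-z (modulo 26), which the w's turning into y, z, a suggest, and recovers the increment from each lowercase pair. The function names are made up for illustration; they are not taken from BullsEye or from bullsdec.

```c
#include <string.h>

/* Increment implied by one lowercase pair, ASSUMING
   encoded = 'a' + (plain - 'a' + increment + position) mod 26. */
static int lowercase_increment(char plain, char encoded, int position)
{
    int diff = (encoded - plain - position) % 26;
    if (diff < 0)
        diff += 26;              /* C's % can return a negative remainder */
    return diff;
}

/* Returns the increment shared by every lowercase pair in the two strings,
   or -1 if the pairs disagree, the lengths differ, or no lowercase occurs. */
static int common_lowercase_increment(const char *plain, const char *encoded)
{
    size_t len = strlen(plain);
    size_t i;
    int found = -1;

    if (strlen(encoded) != len)
        return -1;
    for (i = 0; i < len; i++) {
        int inc;
        if (plain[i] < 'a' || plain[i] > 'z')
            continue;            /* only the lowercase range is tested here */
        inc = lowercase_increment(plain[i], encoded[i], (int)i);
        if (found == -1)
            found = inc;
        else if (inc != found)
            return -1;
    }
    return found;
}
```

Feeding it the pair http://www.alltheweb.com/ and cpqn:+,yza/gstcrpirp*sfe/ is enough to see whether a single increment explains every lowercase character.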
Some counting tells me that the increment for lowercase is 0x15 (21), so our lowercase decoding routine should look like:
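/* character is an int; offset is its index within the value string */
if(character >= 0x61 && character <= 0x7A){   /* a-z only */
   character = character - 0x15 - offset;
   while(character < 0x61)
      character += 0x1A;                      /* wrap back into a-z */
}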
Trying this routine on the engine file gives us something like this:
Name=Vll She Zeb Zearch
DisplayName=Vll She Zeb
FormPageURL=http:+,www/alltheweb*com/
SearchEnginePageURL=http:+,www/alltheweb*com/
We're on the right track, and I haven't even looked at the code of BullsEye yet. I really think I'm not going to look!
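
For those who want to reproduce that half-decoded output, here is a minimal sketch of the lowercase step applied to a whole value. It assumes that the offset is simply the index of the character in the value, counting the characters that are not decoded too. The function name is hypothetical.

```c
#include <string.h>

/* Undo the lowercase step only; every other character is copied unchanged.
   out must have room for strlen(in) + 1 bytes. */
static void decode_lowercase_only(const char *in, char *out)
{
    size_t len = strlen(in);
    size_t i;

    for (i = 0; i < len; i++) {
        int character = (unsigned char)in[i];
        if (character >= 0x61 && character <= 0x7A) {
            /* i % 0x1A gives the same result as i and keeps the loop short */
            character = character - 0x15 - (int)(i % 0x1A);
            while (character < 0x61)
                character += 0x1A;      /* wrap back into a-z */
        }
        out[i] = (char)character;
    }
    out[len] = '\0';
}
```

Uppercase letters, digits and punctuation are still scrambled after this pass, which is exactly what the listing above shows.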

Further Investigation

After some experimenting I came to these conclusions:

BullsEye encodes the characters in these ranges, each range with its own increment:

[0x61] -> [0x7A] (a-z) increment: 0x15
[0x41] -> [0x5A] (A-Z) increment: 0x15
[0x21] -> [0x2F] (!-/) increment: 0x06
[0x30] -> [0x39] (0-9) increment: 0x01
[0x5B] -> [0x60] ([-`) increment: 0x03
[0x7B] -> [0x7E] ({-~) increment: 0x01
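
The table above can be turned into a table-driven decoder. What follows is a sketch, not the source of bullsdec: it assumes that every range wraps around on itself the way a-z does (an assumption that agrees with the sample values in this essay, e.g. Id=431 and QTFormOptions=LP^LNNF), and that positions are counted from the first character after the '='. All names are made up for illustration.

```c
#include <string.h>

/* One encoded character class: inclusive bounds plus its fixed increment. */
struct range { int lo, hi, inc; };

/* Ranges and increments as listed in the essay. */
static const struct range RANGES[] = {
    { 0x61, 0x7A, 0x15 },   /* a-z */
    { 0x41, 0x5A, 0x15 },   /* A-Z */
    { 0x21, 0x2F, 0x06 },   /* !-/ */
    { 0x30, 0x39, 0x01 },   /* 0-9 */
    { 0x5B, 0x60, 0x03 },   /* [-` */
    { 0x7B, 0x7E, 0x01 },   /* {-~ */
};

/* Decode one character found at index pos of the value string.
   ASSUMPTION: each range wraps modulo its own size. */
static int decode_char(int c, int pos)
{
    size_t i;
    for (i = 0; i < sizeof RANGES / sizeof RANGES[0]; i++) {
        const struct range *r = &RANGES[i];
        if (c >= r->lo && c <= r->hi) {
            int size = r->hi - r->lo + 1;
            int idx = (c - r->lo - r->inc - pos) % size;
            if (idx < 0)
                idx += size;
            return r->lo + idx;
        }
    }
    return c;   /* spaces, ':', '=', '?' and the like pass through */
}

/* Decode one line of an engine file in place. Section headers and lines
   without '=' are left alone; only the text after the first '=' changes. */
static void decode_line(char *line)
{
    char *value = strchr(line, '=');
    size_t i;

    if (line[0] == '[' || value == NULL)
        return;
    value++;
    for (i = 0; value[i] != '\0' && value[i] != '\r' && value[i] != '\n'; i++)
        value[i] = (char)decode_char((unsigned char)value[i], (int)i);
}
```

Reading an .eng file line by line with fgets and passing each line through decode_line should be enough to turn a whole engine file into readable text, since the key names and section headers are stored in the clear.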
After coding these routines into our decoder, the result is something like this. Easy to read ;)
[Data]
Name=All The Web Search
DisplayName=All The Web
Id=318
Version=10
FormPageURL=http://www.alltheweb.com/
SearchEnginePageURL=http://www.alltheweb.com/
HelpPageURL=
Cookies=1
CanTrack=1
Server=www.alltheweb.com
GET=1
QTEngine=All the Web
QueryString=/cgi-bin/search?query=%s
MonitorQueryString=/cgi-bin/search?query=%s
QTFormOptions=QT_NONE
AD=0
Description=Fast Search:  All the Web, All the Time

[Registration]
Required=0

[NextHits]
DoNextHits=1
URLTEXTSusbtr=Next
URLHREFSubstr=/asearch?
HitsPerPage=10

[ExtractHits]
How=EXCLUDELINKSINSAMEDOMAIN
IncludeList=
ExcludeList=SA->http://ftpsearch.lycos.com/ SA->http://mp3.lycos.com/ SA->http://www.innit.com/ SA->http://www.fa-premier.com/ SA->http://web.fast.no/company/det.asp?id=8 SA->http://web.fast.no/company/contact/det.asp?id=19 SA->http://www.dell.com/ SA->http://www.dell.com SS->ftpsearch.lycos.com/cgi-bin/search?query= SA->http://www.fast.no/ SA->http://www.fast.no/company.html SA->http://www.fast.no/contact.html SA->http://richmedia.lycos.com/ SA->http://listeningroom.lycos.com/ SS->www.fast.no
DeliversSingleHitAsResultPage=0
NoMatchesId="No Hits."

[TrackExtractHits]
DynUrl=0

[Summary]
Available=1
StartFrom=DD
StopAt=BR
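
One way to put a decoded engine file to use is to build a query URL from its Server and QueryString fields. The sketch below rests on guesses that the essay does not confirm: that %s marks the place where the search term goes, that GET=1 means a plain HTTP GET, and that the two fields are simply concatenated. The function name is hypothetical, and the term is not URL-escaped here.

```c
#include <stdio.h>
#include <string.h>

/* Build "http://<server><template>" with the first %s in the template
   replaced by term. Returns 0 on success, -1 if the template has no %s
   or the output buffer is too small.
   ASSUMPTION: %s is the search-term placeholder. */
static int build_search_url(const char *server, const char *query_template,
                            const char *term, char *out, size_t out_size)
{
    const char *slot = strstr(query_template, "%s");
    int written;

    if (slot == NULL)
        return -1;
    /* The template is never used as a format string itself; only the
       part before and the part after the placeholder are copied. */
    written = snprintf(out, out_size, "http://%s%.*s%s%s",
                       server,
                       (int)(slot - query_template), query_template,
                       term,
                       slot + 2);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

With the All The Web values decoded above, the server www.alltheweb.com and the template /cgi-bin/search?query=%s are the two inputs such a helper would take.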

(c) CiNiX 2001. lord_cinix(at)hotmail(point)com

(c) III Millennium: [fravia+], all rights reserved