fravia_letter
HOW TO SEARCH THE WEB
by fravia+
~
Letter 008 - November 1997

ADVANCED SEARCHING TECHNIQUES
(Combing and klebing)

(Based on some original private emailings from +ORC)
~
This stuff has been gathered and written by fravia+, so if you leech, copy,
use and spread it, have at least the decency to give credit.

__Combing__
(Some other specific combing examples are to be found inside my antismut pages)

What is combing?
Combing is a very effective search strategy: instead of simply searching, you 'milk' (or 'comb') various other net resources:
- The continuously updated "Top 100", "Top 1000", "Top whatever" URL-locations
  ('Real' combing)
- Usenet newsgroups and their various "vigilant filters" and "short range queries"
  (Usenet combing)
- Relevant site links pages.
  (a form of 'crumbs gathering', see my anti-smut pages)

Real combing: The WebSideStory example

The best way to learn combing is to try it yourself. Let's take as an example (there are THOUSANDS of these 'top whatever' sites) one counter-related site that I have been using myself (it offers quick text-only stats and awful graphic stats): WEBSIDE STORY.
Here is websidestory's self-praise:
Updated every four hours. Last update Mon Nov 3 12:00:01 1997 - PST
WebSideStory, Inc. currently monitors 30,783 sites who have 15,264,666 visitors per day.
There are 22,750 sites listed in 36 categories, averaging 1,996,241 visitors per day.
The problem with all these 'top whatever' sites is that you often have to wade through a lot of pages to get to what interests you, because the poor sods want you to read their awful ads.
Famous Listing of the Best Sites on the Internet

In the case of WebSideStory you'll for instance first land at http://www.hitbox.com/wc/world.html:
WebSideStory's first page
Yet you'll eventually land inside this second page (divided by categories):
WebSideStory's second page
And here you'll eventually be able to choose among the various categories that this counter-related database depot offers, for instance the following ones (I have of course chosen those I reckon could yield some results):
Web Resources (802) Hacking / Phreaking (829) Personal Homepage (3159)
Computers (515) Internet Services (419) Software (432)

Now you have seen it...
Obviously combing is an important technique for whatever interest you may have: quite effective, and pretty useful for sparing yourself an incredible number of Internet searching hours.

For combing purposes you may also use:
1) ftp search, looking for "hidden" subdirectories with relevant names
As anybody who knows how to use ftp search ("This server is located in Trondheim, Norway") has already experienced, the ftp search approach (which fishes out hidden directories) can yield incredible (if tricky to interpret) results.
Just do a quick search for 'warez' and you'll see what I mean.
2) the "big page provider" search engines (like the provider-specific search engines for geocities at http://www.geocities.com/search/ or for mygale, or for angelfire, or for fortunecity, or for chez, or for any other of the thousands of free page providers that have their own search engines)
There are THOUSANDS of 'top whatever' counters and many carry some form of 'top site listing' within... you may want to examine a list with MANY counters on this good page: Web Counters and Trackers (Access Counters for Web Sites; Free Counters; Web site auditing)

Usenet combing

Usenet combing can work "on the fly" or "regularly" through the "Vigilant" filter at
filter@vigilant.bc.ca
I'll show you for instance one of my favourite simple queries:
FIND how-to-search tutorial manual
		NOT spam
		NOT top position
		NOT advertising
		MAX 8
Such a query would give you useful information about "searching techniques" on the Web. You may of course construct as many queries as you like and *register* them (for free) with the vigilant filter, in order to get the results of your usenet queries emailed to you every day or week or month.
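
If you keep many such queries around, composing them by program may save some typing. What follows is only a minimal sketch (Python is used purely for illustration): the name build_filter_query and its parameters are hypothetical, and the FIND / NOT / MAX layout is copied from the single example above, since the filter's complete grammar is not documented in this letter.

```python
def build_filter_query(find_terms, exclude=(), max_hits=None):
    """Compose a query laid out like the example above (assumed layout)."""
    lines = ["FIND " + " ".join(find_terms)]
    # One indented NOT line per phrase to be excluded.
    lines += ["\t\tNOT " + phrase for phrase in exclude]
    if max_hits is not None:
        lines.append(f"\t\tMAX {max_hits}")
    return "\n".join(lines)


query = build_filter_query(
    ["how-to-search", "tutorial", "manual"],
    exclude=["spam", "top position", "advertising"],
    max_hits=8,
)
print(query)
```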

The vigilant robot

Learn the secrets of usenet FILTERING! Email filter@vigilant.bc.ca with the word "help" inside BOTH subject and text, and learn how to use it as soon as you get vigilant's automated answer... this robot is capable of automatically sending you ALL usenet messages that contain the wording you have chosen... vigilant is NOT a usenet depot like DejaNews or reference.com... vigilant will send you (obviously for free) "on-the-fly" all usenet messages in transit that deal with matters that may interest you, at times inside newsgroups whose names you do not even know... mastering its filter capabilities is quite tricky though... study it and use it... you'll never regret it and I'm sure you'll thank me for this tip

UNFORTUNATELY DOWN SINCE THE BEGINNING OF AUGUST!
Why? Has anybody any clue? Are there other "vigilant" services? This is another of the "mysteries" of the Web: good services are retired and awful bogus and useless "push" services abound
:(

Dejanews

Remember that you can gather an INCREDIBLE amount of information through the following Usenet "depot":

DejaNews __ONE OF THE *SCARIEST* BIG BROTHER SNOOPERS ON THE WEB__

You'll use it a lot: it allows you to reconstruct a personality profile as soon as somebody uses newsgroups (like all do). As a matter of fact I tried to understand who the hell hides behind this service... have a look at my deja.htm page if you too are interested in this kind of thing... hey, did you know that there also exists a nice stalking page of mine where these matters are explained a little more?
And did you know that you may even "snatch" information from people browsing your pages?

Reference.com

Finally, you can gather an INCREDIBLE amount of information through the following Usenet "depot":
reference.com
Here you'll be able to "register" your automated queries... and THAT, believe me, is really useful to snoop on what's going on and where the sites that you are looking for are...
In fact usenet combing could be translated as 'let other people do the searches for me...': you'll simply find email snippets from people who have found the solution to your query inside some usenet group you do not even know the name of!
Usenet queries that can be done through the two big Usenet "depots", Dejanews and email query, are ALSO possible through the major search engines (if you know how to use them) and through the 'klebing' technique explained below.
Many of the main search engines allow such querying too, and they use (of course) the services of either Dejanews or emailquery.
NOTE THAT THERE ARE MANY MORE 'usenet-depots'... I recently found an 'italian' one at http://www.mailgate.org/mailgate/index.htm and who knows how many more there are around!

 
__klebing__
Fishing query strings and locations

Klebing is a 'reversing search' technique that goes way beyond "combing", and which offers incredible value. We will make clear what klebing is below, using a ready-made example on a site that you'll probably already know (it is an important hacker site and I link to it myself inside my links page): here is the 'normal' URL of that site: L0pht heavy industries.
We can use L0pht for this example because L0pht has (publicly) the 'raw material' that we need for klebing: the 'remote connexions' list. It is basically a very simple CGI-script that records inside its own database (L0pht updates every day) all the "remote" URL locations (i.e. the sites the various visitors come from) accessing any of the pages of a given site.
You may easily write an analogous script and add it to your site! In order to write quickly (and dirtily) a 'crude' CGI-script like this you just need to list all the var where = document.referrer variables that any lamer's browser carries inside (well... not our reversed and 'ameliorated' browsers... in order to learn the relevant techniques you may want to have a look at Mammon_'s Reversing Netscape's buttons and menus essay... my copy of Netscape carries for instance a different random, and of course faked, document.referrer variable every time it accesses a new site :-)
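
The bookkeeping behind such a referrer list can be sketched in a few lines. This is a rough illustration in Python, not L0pht's actual script (which is not shown here): the names record_hit and format_report are hypothetical, and the "hits | referrer -> page" layout is simply copied from the excerpts quoted further down in this letter. In a real CGI script the referrer would typically be read from the HTTP_REFERER environment variable rather than passed in by hand.

```python
from collections import Counter


def record_hit(counts, referrer, page):
    """Count one visit that arrived at `page` coming from `referrer`."""
    if referrer:  # some browsers send no referrer at all: skip those hits
        counts[(referrer, page)] += 1


def format_report(counts):
    """Return the report lines, busiest referrer first."""
    return [
        f"{hits} | {referrer} -> {page}"
        for (referrer, page), hits in counts.most_common()
    ]


counts = Counter()
excite = "http://www.excite.com/search.gw?trace=1&search=hackers"
asta = "http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez"
record_hit(counts, excite, "/cdc.html")
record_hit(counts, excite, "/cdc.html")
record_hit(counts, asta, "/lounge.html")
record_hit(counts, "", "/index.htm")  # no referrer, nothing recorded

for line in format_report(counts):
    print(line)
```

Keeping the pair (referrer, page) as the key is what lets the list show not only WHO sends visitors but also WHERE inside the site they land.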

Well, have a look at the next link and you'll understand what I mean:
Here you have the real, updated L0pht location you'll use yourself in order to perform your own up-to-date klebing endeavours: http://www.l0pht.com/ref.html

And here you have a copy of it that you should examine NOW in order to better follow what I'm telling you.
In order to discuss together with you some of the 'results' of our klebing activities I have copied a 'still image' of this continuously updating database inside my site, taken from the location above on 4 Nov 1997 (to-day), here it is: lophtrev.htm

So, now that you have had a look at them, let's say a couple of things:
1) The utility of such a script from the Webmaster's point of view is obvious: he can immediately see WHO is sending hits to him and WHERE inside his site they link to (and he can 'punish' any 'fastidious' linking inside his site simply by changing the names of the pages linked to, like I'll do soon with the academy section of my site if you keep entering my pages from the side :-( )
2) The utility of such a script (if publicly presented, like this one by L0pht, or else if 'somehow' findable inside a /cgi subdirectory, see my antismut pages for the relevant CGI-cracking techniques :-) is for our search purposes HUGE! If the site has some relevance to fields you are interested in (and L0pht for sure has it with sites that may interest us!) you are in for a surprise... in fact one wonders what's the point of laboriously browsing the web in search of possible new interesting sites where you might learn something! Let those same sites COME TO YOU all by themselves... isn't it nice?

In fact, what do we have here?
Let's have a look at some interesting little fishes:
Yahoo and excite, for instance, both reach this site through its cdc (Cult of the Dead Cow) page:
1409 | http://www.yahoo.com/Society_and_Culture/Religion/Humor/Parody_Religions/Cult_of_the_Dead_Cow/ -> /cdc.html
125 | http://www.excite.com/search.gw?trace=1&search=hackers -> /cdc.html
'our' astalavista is also present:
124 | http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez -> /lounge.html
Note that there is already something that may be interesting for you (albeit well known by all search-experts): the FORM that an excite or astalavista query takes!
Yes, if you have read my previous letters, you'll have seen that it is possible to query search engines by email using URL addresses like:
http://lycos11.lycos.cs.cmu.edu/cgi-bin/flpursuit?first=1\\&maxhits=30\\ &minterms=1\\&minscore=0.01\\&terse=standard\\&query=linguistic+phenomena
Therefore we have here a simple 'template' that we can immediately use for OTHER queries... c'mon, try it out: cut the following line:
http://www.excite.com/search.gw?trace=1&search=hackers
that we have found through our klebing work, and paste it inside the 'location' window of your copy of navigator...
Have you done it?
Well, now backspace over hackers and type crackers instead. Now press enter and have a look: your own ready-made excite search string!
And you'll find THOUSANDS of 'query string' possibilities, powerful and frequent or funny and seldom used, through this klebing method... d'you understand now how POWERFUL this can be?
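
The same backspace-and-retype trick can be done by program. The sketch below (Python, standard library only) replaces the value of one query parameter in a harvested URL; swap_query_term is a hypothetical helper name, and the excite URL is the one quoted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def swap_query_term(url, param, new_value):
    """Return `url` with the value of query parameter `param` replaced."""
    parts = urlsplit(url)
    pairs = [
        (key, new_value if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Rebuild the URL, leaving scheme, host and path untouched.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


template = "http://www.excite.com/search.gw?trace=1&search=hackers"
print(swap_query_term(template, "search", "crackers"))
print(swap_query_term(template, "search", "email intercepting"))
```

Note that urlencode writes a space as "+", the same encoding you can see in the astalavista referrer (submit=+search+).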
New strings
Back to our klebing page... as you can see, in order to land somewhere at L0pht some of these visitors have used Yahoo and have searched for 'hackers', 'attress', 'spycamera' and more.
Now, some of these are banal, like 'hackers', yet some are quite interesting, like 'email intercepting'.
This can be quite useful... I have quite a lot of ready-made strings that I use with the search engines, and some of them I have gathered by klebing sites... otherwise I would probably never have come up with some of these ideas.
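
Fishing such strings out of a referrer list by hand gets tedious, so here is a hedged sketch of how it could be automated. It assumes the "hits | referrer -> page" line layout shown in the excerpts of this letter, and it only looks at the two parameter names that actually appear in those excerpts (search for excite, srch for astalavista); other engines use other names, which you would have to add yourself. harvest_terms, LINE and TERM_PARAMS are hypothetical names.

```python
import re
from urllib.parse import parse_qs, urlsplit

# "<hits> | <referrer>" optionally followed by " -> <page>"
LINE = re.compile(r"^\s*(\d+)\s*\|\s*(\S+)(?:\s*->\s*(\S+))?\s*$")
TERM_PARAMS = ("search", "srch")  # only the names seen in the excerpts


def harvest_terms(report_lines):
    """Return (hits, host, term) for each referrer carrying a search term."""
    found = []
    for line in report_lines:
        match = LINE.match(line)
        if not match:
            continue  # not a report line
        hits, referrer = int(match.group(1)), match.group(2)
        parts = urlsplit(referrer)
        query = parse_qs(parts.query)
        for name in TERM_PARAMS:
            for term in query.get(name, []):
                found.append((hits, parts.hostname, term))
    return found


report = [
    "1409 | http://www.yahoo.com/Society_and_Culture/Religion/Humor/Parody_Religions/Cult_of_the_Dead_Cow/ -> /cdc.html",
    "125 | http://www.excite.com/search.gw?trace=1&search=hackers -> /cdc.html",
    "106 | http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez&submit=+search+ -> /lounge.html",
    "114 | http://netfind.aol.com/search.gw",
]
for hits, host, term in harvest_terms(report):
    print(hits, host, term)
```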
Watch the watchers
Some of our enemies have sites somewhere that they use to check on us... it may be quite interesting to snoop on those sites... through klebing you'll get them... have a look at what we have here at L0pht:
316 | http://www.microsoft.com/security/ntprod.htm -> /advisories.html
103 | http://www.microsoft.com/security/issues.htm -> /advisories.html
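
Once a referrer list is at hand, picking out the watchers is a simple filter on the referring host. A minimal sketch, assuming the same "hits | referrer -> page" layout; referrers_from is a hypothetical helper, and the sample lines are the ones quoted in this letter.

```python
from urllib.parse import urlsplit


def referrers_from(report_lines, domain):
    """Return the lines whose referrer host is `domain` or a subdomain of it."""
    matches = []
    for line in report_lines:
        fields = line.split("|", 1)
        if len(fields) != 2:
            continue  # not a "hits | referrer" line
        referrer = fields[1].split("->")[0].strip()
        host = urlsplit(referrer).hostname or ""
        if host == domain or host.endswith("." + domain):
            matches.append(line)
    return matches


lines = [
    "316 | http://www.microsoft.com/security/ntprod.htm -> /advisories.html",
    "103 | http://www.microsoft.com/security/issues.htm -> /advisories.html",
    "14 | http://spider.bokler.com/bokler/crak_body.html -> /index.htm",
]
for line in referrers_from(lines, "microsoft.com"):
    print(line)
```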

Unknown mysteries
This one links to a cgi-bin page... why?:
105 | http://nowhere/nothing.html -> /cgi-bin/Count.cgi
well, this tells us
FIRST
That there is indeed a cgi-bin directory here with a Count.cgi script and
SECOND
that nowhere/nothing is interested in it.

Old friends
And who the hell is this next one? Our good old friend Bokler from Deja? (See my deja.htm page)
14 | http://spider.bokler.com/bokler/crak_body.html -> /index.htm

Well... rich fishing, isn't it?
And the following ones could be interesting too, don't you think?
106 | http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez&submit=+search+ -> /lounge.html
114 | http://netfind.aol.com/search.gw
Yes, once you start klebing, you never finish experimenting! :-)
Go ahead, enjoy!

(c) fravia+ 1997, work in progress, all rights reserved nevertheless


fravia+ 04 Nov 97