FTP (File Transfer Protocol) is a way of moving computer files from one site to another; anonymous FTP is our concern here, since it is open to anyone. The administrators at computer sites around the world have made directories full of information available to anyone who logs in as 'anonymous' (by convention, you give your email address as the password). Since no special permission is required to get into these sites, they constitute a kind of "swap box" where everything is freely available. The range of programs available at these sites is already enormous and grows by the day.
Archie was developed at McGill University's School of Computer Science by Alan Emtage, Bill Heelan, and Peter Deutsch. Apparently this tool, like many, was devised out of the university's need to save money: looking for public-domain software, the McGill group began searching anonymous FTP archive sites and eventually automated the process of scanning them. From this evolved one of the most widely used information tools on the Internet. Deutsch and Emtage went on to found Bunyip Information Systems of Montreal, which licenses the archie server system and provides upgrades and support (the newest version to date is 3.5).
Think of archie as a set of related functions. The software maintains a list of Internet FTP sites and their holdings, known as the Internet Archives database (the name archie is a play on the word "archive"). The database can be searched through a variety of servers across the Internet. Note well that archie does not sweep the entire Internet; rather, it targets specific sites, with the permission of their administrators, and indexes them. Going through those directories manually would take constant vigilance, since new files and directories are always being added to one site or another. By automating the process, archie makes its own sweeps of FTP directories, compiling the listings and storing the results in its database. Each site's listing is therefore nearly, but not exactly, up to date at any given moment: an archie search could point to a file that was removed shortly after the last database update, and, conversely, a site could add a new file that a given archie server does not yet know about. If you want to see the listings that archie creates for its database, they are available at the individual server sites.
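To make the staleness point concrete, here is a toy model in Python (a sketch, not archie's real code): the index keeps one snapshot of file names per site plus the time of the last sweep, and lookups consult only that snapshot, never the live site. The class name ArchiveIndex and the host ftp.example.edu are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ArchiveIndex:
    """Toy archie-style index: one snapshot of file names per site (hypothetical)."""
    listings: dict[str, set[str]] = field(default_factory=dict)
    last_updated: dict[str, datetime] = field(default_factory=dict)

    def sweep(self, site: str, current_files: set[str], when: datetime) -> None:
        # Replace the stored snapshot with a copy of what the site holds right now.
        self.listings[site] = set(current_files)
        self.last_updated[site] = when

    def lookup(self, name: str) -> list[str]:
        # Answer from the stored snapshots only; the live sites are never consulted.
        return sorted(site for site, files in self.listings.items() if name in files)


live = {"ftp.example.edu": {"pumpkin.txt", "readme"}}
index = ArchiveIndex()
index.sweep("ftp.example.edu", live["ftp.example.edu"], datetime(1999, 1, 22, 14, 30))
live["ftp.example.edu"].discard("pumpkin.txt")  # removed after the sweep
live["ftp.example.edu"].add("tinman.tar.Z")     # added after the sweep
print(index.lookup("pumpkin.txt"))   # ['ftp.example.edu']  (a stale hit)
print(index.lookup("tinman.tar.Z"))  # []  (not known until the next sweep)
```

Both kinds of mismatch described above fall out of the same design choice: the database answers from its last sweep, so it is only as fresh as that sweep.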
The first thing, of course, is to use a telnet client (for the sake of simplicity, let's just say we're using the plain ol' Unix telnet client). Next you need a server to telnet into; let's use archie.th-darmstadt.de. So you issue the command telnet archie.th-darmstadt.de and, after logging in as archie, you see the following:
Aside from identifying the current version of this archie server, it is important to note the default search type, which is 'exact' here: anything you type in will be looked for as an exact match. So this is what +orc meant by nomen omen, eh? If you type in pumpkin, you will not get hits such as pumpkin.txt or pumpkin.tar.Z (simple enough). To change this parameter you use the set command, where search is the variable and the value that follows (exact, sub and so on) says how the search is to be performed. In other words, the 'search' variable determines the kind of search the prog command performs on the database, giving you flexibility over search times and ranges: set search exact gives the behaviour just described, while set search sub gives a substring search. (All of this happens at the archie> prompt before you conduct your search: set variables first, search later.) set search sub, which is the default on some servers, retrieves any file or directory name that contains your search term, ignoring case. So if you search for tin, you will get hits ranging from bulletin to tinman. For example, to search for orc.htm, first type set search sub and then type find orc.htm. Here is an example of this simple process (truncated).
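The difference between the exact and sub search types can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not archie's own code; the function name simple_match and the sample file names are made up for illustration.

```python
def simple_match(name: str, term: str, search_type: str) -> bool:
    """Toy model of archie's 'exact' and 'sub' search types (hypothetical helper)."""
    if search_type == "exact":
        # The whole file or directory name must equal the term.
        return name == term
    if search_type == "sub":
        # The term may appear anywhere in the name; case is ignored.
        return term.lower() in name.lower()
    raise ValueError(f"unsupported search type: {search_type}")


names = ["pumpkin", "pumpkin.txt", "pumpkin.tar.Z", "bulletin", "Tinman.txt"]
print([n for n in names if simple_match(n, "pumpkin", "exact")])  # ['pumpkin']
print([n for n in names if simple_match(n, "tin", "sub")])        # ['bulletin', 'Tinman.txt']
```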
Other types of searches include set search subcase, which works like a regular substring search except that it distinguishes between upper- and lowercase letters. set search regex uses UNIX regular expressions to conduct the search. Used without further specification, a regex search becomes a substring search, because regular expressions assume a wildcard at the beginning and end of the search term. With the caret (^) and the dollar sign ($) you can specify that the search term must appear at the beginning or at the end of the retrieved file or directory name. For example, ^eros returns names that begin with eros, while eros$ returns names that end with eros. (More on this later.)
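Here is a similar Python sketch of subcase and regex searching. It uses Python's re module, which differs from the UNIX regular expressions archie used in some details, although ^, $, ., * and [...] behave the same way for patterns this simple. The helper pattern_match and the sample names are hypothetical.

```python
import re


def pattern_match(name: str, term: str, search_type: str) -> bool:
    """Toy model of archie's 'subcase' and 'regex' search types (hypothetical helper)."""
    if search_type == "subcase":
        # Substring search that respects upper- and lowercase.
        return term in name
    if search_type == "regex":
        # re.search looks anywhere in the name, so an unanchored pattern acts as
        # a substring search; ^ and $ pin it to the start or end of the name.
        return re.search(term, name) is not None
    raise ValueError(f"unsupported search type: {search_type}")


names = ["eros.txt", "heros", "neros.doc", "Eros"]
print([n for n in names if pattern_match(n, "Eros", "subcase")])  # ['Eros']
print([n for n in names if pattern_match(n, "eros", "regex")])    # ['eros.txt', 'heros', 'neros.doc']
print([n for n in names if pattern_match(n, "^eros", "regex")])   # ['eros.txt']
print([n for n in names if pattern_match(n, "eros$", "regex")])   # ['heros']
```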
Some other search parameters explained
- set search exact_sub : here you are searching for an exact hit, but if archie finds no matches, it then tries the substring search stratagem.
- set search exact_subcase : here the search begins using the exact stratagem, but if it fails, it switches to a subcase or case-sensitive substring search.
- set search exact_regex : this begins with the exact search method, but switches to regex if no matches are found.
- set match_domains string : here you can restrict your search to a particular domain, such as universities. For universities you would write set match_domains edu; to limit your search to Switzerland, write set match_domains ch.
- set match_path : here you can restrict your search by specifying the directories archie will look in for your search term. This makes most sense if you suspect the file you are looking for falls into a general category. For instance, if you are looking for a way to prepare seafood, you could restrict your search to directories called recipes, or perhaps food, by writing set match_path recipes. This search ignores case. Multiple terms are separated by colons, e.g., set match_path recipes:food retrieves files with one or both of these components (recipes or food) in the names of the directories that lead to them.
- set sortby ? [ filename | hostname | none | size | time ] These sort archie's output. (none means no sort, which is useful if you have declared a sort and wish to revoke it).
- set output_form terse : the opposite of this is set output_form verbose. The normal results of an archie search are in verbose mode, providing the host name and IP number, the date of the file's last update, its directory location, and associated file information (like permissions). The terse format reduces the output to a single line, for example: ftp.std.com 00:00 6 Oct 1990 512 bytes /src/pc/comm/qmodem
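The parameters above can be combined into one toy search function. This Python sketch only mirrors the descriptions in the list; it is not archie's implementation, and several details are assumptions: match_domains is treated as a single domain suffix, match_path as a case-insensitive substring test against the whole directory path, the exact_* fallback is decided after the domain and path filters are applied, and sortby time is left out because the hypothetical Entry record carries no date. All names (Entry, archie_search, the sample hosts) are made up.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    host: str
    path: str   # directories leading to the file, e.g. "/pub/recipes/fish"
    name: str
    size: int   # bytes


def _match(name: str, term: str, mode: str) -> bool:
    if mode == "exact":
        return name == term
    if mode == "sub":
        return term.lower() in name.lower()
    if mode == "subcase":
        return term in name
    if mode == "regex":
        return re.search(term, name) is not None
    raise ValueError(f"unknown search type: {mode}")


# Each exact_* type tries "exact" first and falls back to the second mode.
FALLBACKS = {
    "exact_sub": ("exact", "sub"),
    "exact_subcase": ("exact", "subcase"),
    "exact_regex": ("exact", "regex"),
}
# 'time' is omitted because this toy Entry carries no date.
SORT_KEYS = {
    "filename": lambda e: e.name,
    "hostname": lambda e: e.host,
    "size": lambda e: e.size,
}


def archie_search(entries, term, search_type="exact",
                  match_domains=None, match_path=None, sortby="none"):
    candidates = list(entries)
    if match_domains:
        # e.g. "edu" or "ch": keep hosts whose name ends in that domain.
        suffix = "." + match_domains.lower()
        candidates = [e for e in candidates if e.host.lower().endswith(suffix)]
    if match_path:
        # e.g. "recipes:food": keep entries whose path contains any term, case ignored.
        terms = [t.lower() for t in match_path.split(":")]
        candidates = [e for e in candidates
                      if any(t in e.path.lower() for t in terms)]
    hits = []
    for mode in FALLBACKS.get(search_type, (search_type,)):
        hits = [e for e in candidates if _match(e.name, term, mode)]
        if hits:
            break  # fall back only when the previous mode found nothing
    if sortby != "none":
        hits.sort(key=SORT_KEYS[sortby])
    return hits


data = [
    Entry("ftp.unige.ch", "/pub/recipes/fish", "salmon.txt", 900),
    Entry("ftp.example.edu", "/pub/Food", "Salmon.doc", 300),
    Entry("ftp.example.edu", "/pub/misc", "salmon", 100),
]
# "almon" has no exact hit, so exact_sub falls back to a substring search.
print([e.name for e in archie_search(data, "almon", "exact_sub", sortby="size")])
# ['salmon', 'Salmon.doc', 'salmon.txt']
```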
If you want to examine your results at leisure (they can sometimes scroll by too rapidly to read), you might type set mailto firstname.lastname@example.org at the prompt. You can then use the mail command to send yourself and others the results of your query. Or you might rather use the set pager command, which displays the output one page at a time (using the program called less under UNIX); you advance through the pages by pressing the spacebar, and when you are finished you enter q followed by RETURN. The list command shows all the sites in the server's database or, in conjunction with a UNIX regular expression, only the sites in particular domains. For example, to list all sites in Switzerland, use the regex term: list .*ch$ This returns any site with the ch domain name and excludes others. The $ sign (again) specifies that no text may follow the search term; the .* allows any text in front of it. A full example:
archie> list .*ch$
# Your queue position: 1
# Estimated time for completion: 5 seconds.
aragorn.unibe.ch 22.214.171.124 14:30 22 Jan 1999
bandon.unisg.ch 126.96.36.199 00:37 29 Oct 1997
claude.ifi.unizh.ch 188.8.131.52 04:49 27 Jan 1999
ftp.inf.ethz.ch 184.108.40.206 12:08 22 Jan 1999
domreg.nic.ch 220.127.116.11 12:08 22 Jan 1999
iacrs1.unibe.ch 18.104.22.168 14:28 22 Jan 1999
liasun3.epfl.ch 22.214.171.124 05:13 29 Jan 1997
liaftp.epfl.ch 126.96.36.199 23:42 24 Feb 1998
lucy.ifi.unibas.ch 188.8.131.52 04:59 29 Jan 1997
ftp.switch.ch 184.108.40.206 22:20 26 Feb 1997
iamftp.unibe.ch 220.127.116.11 14:29 22 Jan 1999
ftp.cscs.ch 18.104.22.168 14:30 22 Jan 1999
ftp.unizh.ch 22.214.171.124 04:46 27 Jan 1999
rd24.cern.ch 126.96.36.199 23:42 24 Feb 1998
sunsite.cnlab-switch.ch 188.8.131.52 12:09 22 Jan 1999
ftp.unibe.ch 184.108.40.206 14:30 22 Jan 1999
ftp.unige.ch 220.127.116.11 12:07 22 Jan 1999
ftp.ethz.ch 18.104.22.168 23:40 24 Feb 1998
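The filtering that list .*ch$ performs can be sketched in Python with re.search. This is a toy model, not archie's code, and the host ftp.example.tech is hypothetical. It also shows a subtlety: because the period in .*ch$ is not escaped, the pattern accepts any name ending in the letters ch, so a stricter pattern escapes the dot.

```python
import re


def list_sites(hosts: list[str], pattern: str) -> list[str]:
    """Keep the host names that the regular expression matches (toy model of `list`)."""
    return [h for h in hosts if re.search(pattern, h)]


hosts = ["ftp.unige.ch", "ftp.switch.ch", "ftp.uni-koeln.de", "ftp.example.tech"]
print(list_sites(hosts, r".*ch$"))  # ['ftp.unige.ch', 'ftp.switch.ch', 'ftp.example.tech']
print(list_sites(hosts, r"\.ch$"))  # ['ftp.unige.ch', 'ftp.switch.ch']
```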
The last few commands I will cover are help (type ? at the help prompt to get a list of available subtopics), quit to exit (of course! ;-), and servers, which generates a list of the publicly available archie servers known to the site you are currently using. You can also type manpage to read the manual page for archie!
The whatis command: archie maintains a second set of data called the Software Description Database, which holds the names of numerous files stored around the Internet along with short descriptions of them. As with archie's Internet Archives database, bear in mind that not all of these files are executable programs; some are documents and other stored data. Whatever they are, searching the Software Description Database with the whatis command can help. To search it, type whatis followed by the term you are looking for. For example, I might want to know what is out there about the moon:
regex and whatis
archie> whatis moon
astro Computes astronomical data about the sun, moon, and planets
jupmoons Jupiter's major moons simple plotter [in perl]
moon A phase-of-the-moon-program
moontool The moon on a Sun
phoon Phase of the moon, date routines
rise_set Sun and Moon rise/set program
xmoon Dynamically display astronomical data concerning the moon and the sun
xphoon Draw the current phase of the moon on the root window (under X11)
Next, you search!
archie> find jupmoons
# Search type: exact.
Host ftp.uni-koeln.de (22.214.171.124)
Last updated 00:32 23 Jan 1999
DIRECTORY drwxrwxr-x 2048 23:00 22 Apr 1993 jupmoons
Host scitsc.wlv.ac.uk (126.96.36.199)
Last updated 09:07 13 Feb 1997
FILE -rw-r--r-- 8934 01:00 25 Aug 1991 jupmoons
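The two-step workflow above (whatis to discover a name, then find to locate it) can be sketched in Python using data copied from the sample output. This is a toy model only: it assumes whatis does a case-insensitive substring match on both names and descriptions, which the text does not state, and the helpers whatis and find_exact are hypothetical.

```python
# Descriptions copied from the whatis output above; host listings from the find output.
DESCRIPTIONS = {
    "astro": "Computes astronomical data about the sun, moon, and planets",
    "jupmoons": "Jupiter's major moons simple plotter [in perl]",
    "moon": "A phase-of-the-moon-program",
    "phoon": "Phase of the moon, date routines",
    "rise_set": "Sun and Moon rise/set program",
}
LISTINGS = {
    "ftp.uni-koeln.de": {"jupmoons"},
    "scitsc.wlv.ac.uk": {"jupmoons"},
}


def whatis(term: str) -> list[str]:
    """Names whose name or description contains term, case ignored (assumed behaviour)."""
    t = term.lower()
    return [name for name, text in DESCRIPTIONS.items()
            if t in name.lower() or t in text.lower()]


def find_exact(name: str) -> list[str]:
    """Hosts holding a file or directory with exactly this name."""
    return sorted(host for host, names in LISTINGS.items() if name in names)


print(whatis("jupiter"))       # ['jupmoons']
print(find_exact("jupmoons"))  # ['ftp.uni-koeln.de', 'scitsc.wlv.ac.uk']
```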
Regex, or UNIX regular expressions (basic stuff). As I said before, using a search term without further regex operators causes the search to be treated as a hunt for substrings; the effect is as if you had entered set search sub. So if you do a find orc (you will probably get "hey fella, there's no FTPs on the moon ;-)"), seriously, it is the same as typing find .*orc.*, the same .* construction used with the list command above.
So aside from the signs ($) and (^), let me say one thing about the period (.) and the asterisk (*). The asterisk stands for zero or more occurrences of the preceding regular expression. In the example .*orc.*, the period matches any single character, and the asterisk means that any number of such characters can occur before the orc string. So the asterisk looks to the preceding expression, which is a period, and lets it occur any number of times (dig it?). The same goes for the end of the term, so any number of characters can follow orc as well.
Use [brackets] to show a set of characters you want to match. Example: [smt]end gives results such as send, mend and tend, matching any one of the three bracketed characters followed by the string end. It will actually find much more, because a regular expression behaves as if it had .* at the beginning and at the end unless a caret (^) or dollar sign ($) appears. There are many more expressions such as these.
I end now! Yet the end is good, no? Heh, I steal Nexor's form. Bye, friends, hope this helps (I know it's all out there, yet I also know it's good to have a quick printed reference ;)
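As a small postscript, the bracket and .* rules above can be checked with Python's re module. This is a sketch with made-up names; Python's regex dialect is not identical to the UNIX one archie used, but it agrees on these basic operators.

```python
import re


def grep(names: list[str], pattern: str) -> list[str]:
    """Names in which the regular expression matches somewhere (unanchored)."""
    return [n for n in names if re.search(pattern, n)]


names = ["send", "mend", "tend", "bend", "attendance"]
print(grep(names, r"[smt]end"))    # ['send', 'mend', 'tend', 'attendance']
print(grep(names, r"^[smt]end$"))  # ['send', 'mend', 'tend']

# An unanchored term and the same term wrapped in .* select the same names.
files = ["orc.htm", "borcht", "ork"]
print(grep(files, "orc") == grep(files, ".*orc.*"))  # True
```

Note how attendance slips into the unanchored result because it contains tend; anchoring with ^ and $ is what narrows the match to whole names.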