searching ebooks

~ How to find any text, any book, any journal or any poem ~
(Yes, any, duh :-)
First published in December 2003
Version 0.31: March 2009

Project Gutenberg
Full-length electronic texts
Collections on specific subjects
Searching books
Et ab hic et ab hoc
History of the e-book scene
Poetry (and Lyrics)
Our own library
Universal library for personal use
Google & Amazon book searches
pdf global searches

(& a rant again and against patents and copyright-enslaved countries)

Pearls of ebook seeking, text fetching and book finding wisdom in this very important section of [searchlore]. It requires some work and reading on your part as well, though. If your attention span is limited to the advertisement rhythms you are slurping on TV, you shouldn't visit this site at all anyway.
Readers that will take the time to experiment a little with the following info will be able to open many a virtual library door, far beyond the limited possibilities offered by -say- the [old google book search] or the [new google print] (more on google & amazon book searches below).
Note that you may also enjoy a more specific [unabridged discussion] about ebooks 'webbits', or you may want to delve into our [universal library for personal use]. Note also that -alternatively- this kind of targets may be fetched using some more dubious, but undoubtedly pretty effective [underground searching lore] as well.

Do not forget to check the older "classrooms essays": [Books & books & dark riders] and [Cat burglers in the museum after dark].
Furthermore, remember the oldest [webbit] trick of the trade, the so called "long reference fishing"...
["'who is that?' Frodo asked, when he got a chance to whisper to Mr. Butterbur"]
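"Long reference fishing" boils down to feeding a search engine a long, distinctive passage as an exact phrase. A minimal Python sketch of the idea follows; the function name, the naive sentence splitting and the 16-word cap are all assumptions made for illustration, not part of the original trick.

```python
import re

def long_reference(text, max_words=16):
    """Build an exact-phrase query from the longest sentence in text."""
    # Naive split on ., ! or ? followed by whitespace (an assumption;
    # abbreviations such as "Mr." will confuse it).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    longest = max(sentences, key=lambda s: len(s.split()))
    # Drop trailing punctuation and cap the length so it fits a search box.
    words = longest.rstrip(".!?").split()[:max_words]
    return '"' + " ".join(words) + '"'
```

The longer and odder the quoted passage, the fewer false hits you should get; if nothing comes back, shorten the phrase and try again.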

Now a possible question is: once you find a book that may be patented (they call them "copyrights", but in fact they just mean "patents") shall/can you download it?
"Latet anguis in herba", ya know, and patents have been made to last longer and longer (nobody seems exactly to know why patents should last so long, btw: already around 70 years long in most copyright-enslaved countries), besides as you will see in the following there are infinite wondrous books in the public domain, so that you hardly will need to fetch patented stuff.
Yet the question remains: can you download stuff that could be patented?
This is a very difficult question to answer: there are several philosophical schools on such matters. Some say that if a book is on the web and you can download it, then you may in bona fide assume that it is probably in the public domain (else they would not have uploaded it at all, would they?)
But be careful, and if in doubt, don't steal. Seekers do not need to hoard anything on their harddisks. Whatever you find will anyway always remain somewhere in the void-forest, ready to be plucked once more should you need it again.
Everything that lands on the web is bound to be copycatted a zillion times and hence remain available forever to those that know the fine lore of searching.
Rest assured: your book, any book, will be out there, somewhere, for the eternity.
Now learn how to find it

Universal library for personal use            Our own library

Project Gutenberg

Project Gutenberg: one of the first full-text Internet collections. It is easiest to use through the alphabetic or specific searches for author/title. Note that there are various "current" Project Gutenberg sites. Many of the links provided on the web point, alas, to earlier addresses which are no longer being maintained.

Gutenberg's online catalog:
Gutenberg's advanced search engine:

Gutenberg's Database search
Search by Author or Title. For more guidance, see the Advanced Search page, where you can specify language, topic and more.

Note that often enough you have links to computer generated audio books in mp3 format as well...

offline catalogs
recent books

(Publish your own stuff)

The EServer is a growing online community where hundreds of writers, artists, editors and scholars gather to publish works as open archives, available free of charge to readers.
In today's world of corporate publishing, value is placed on works that sell to broad markets (read "zombies"). Quick turnover, high-visibility marketing campaigns for often awful bestsellers, and corporate "superstore" bookstores have all made it difficult for good, unique and older texts to be published. (Further, the costs this marketing adds to all books discourage people from leisure reading as a common practice, this is probably intended by the slavemasters.)
Thus publishers tend to encourage authors to write books with strong appeal to the current, undermining (not unknowingly) writings with longer-term implications.
The EServer (founded in 1990 at Carnegie Mellon as the English Server) attempts to provide an alternative niche for quality work, particularly writings in the arts and humanities. The EServer is now based at Iowa State University. Today there are various "free" hosting sites available on the Internet. Some of these, such as GeoCities or Tripod, generate profit by attaching awful advertising to the information that people post on their websites. Instead, every EServer member has an unquotaed private, personal space to store her/his work, without advertisement or popups.


Full-length electronic texts

Check also the Universal library for personal use

First of all, from the darkness of the past (I remember when the Internet was a large turtle with wings of gold), a complete GOPHER library: Wiretap.
For instance, note that the books are in TeX format, so you'll need some savoir faire.

The WWW Virtual Library : The VL is the oldest catalog of the web, started by Tim Berners-Lee, the creator of html and the web itself. Unlike commercial catalogs, it is run by a loose confederation of volunteers, who compile pages of key links for particular areas in which they are expert; even though it isn't the biggest index of the web, the VL pages are widely recognised as being amongst the highest-quality guides to particular sections of the web.   British mirror   Swiss mirror  

The University of Pennsylvania Online Books Page: offers a search by author or title, as well as links to many web sites that offer collections of full-text publications: see below under Searching books

ABU: la Bibliothèque Universelle: BIBLIOTHEQUE NATIONALE DE FRANCE (BnF) (through google)


Full text: most useful search mask.

For instance: Rhetorique: just change the following URL accordingly:
Else here is the mask; remember to wait some seconds after having chosen a result: all books have been scanned.
  • Title words (Mots du titre), e.g. misérables
  • Document types (Types de documents): all documents; works in text mode; monographs in image mode; periodicals in image mode; sets of images; sound documents; manuscript documents
  • Author (Auteur), e.g. victor hugo, with an alphabetical index (A B C D E F G H I J K L M N O P Q R S T U V Z)
  • (unlabelled field), e.g. Médecine expérimentale
  • Free search (Recherche libre), e.g. moulin rouge

Full text (scanned images), most useful search mask.

Online Book Initiative

In parentheses: "In Parentheses is devoted to distributing texts, translations, and commentaries from a wide variety of areas and disciplines in an elegantly presented form", mainly medieval texts.

American Memory from the Library of Congress

Athena: authors and texts: thousands of full-text materials, many from other collections (such as Project Gutenberg) in several European languages. Links may point to website for collection instead of actual book.

The Bartleby project: limited collection of classic works of reference, poetry and literature. 

Dissertation Abstracts: Titles and abstracts from the most recent two years are available free of charge for most dissertations; older work requires access through a subscribing institution

Electronic Editions: Books from the University of California Press. As an experiment, the UC press has placed online the full text of selected books on its list in International Studies, Classics, Literature, History, Anthropology, Politics, and Religious Studies. The site uses frames to prevent downloading the entire book, but the full text can be read online.

Electronic Texts on the Internet: A list of lists from RefDesk

The Internet Public Library (Michigan university)
IPL Online Texts collection 18,000 titles that can be browsed by author, by title, or by Dewey Decimal Classification.  Recommended.
IPL Search engine (advanced): http://ipl.s

National Archives and Records Administration, this site is confusing to navigate, but has a rich collection of documents and images.  The National Archival Information Locator is the search page. Try the homepage for additional information.  The Archival Research Catalog is intended to replace this shortly.

Oxford Text Archive: Links to American Mirror for the OTA because the webmaster often has difficulty using the U.K connection. A large and intimidating listing of electronic texts via FTP. Not recommended for the computer challenged. "We offer searchable online literature for the student, educator, or enthusiast. To find the work you're looking for start by looking through the author index. We currently have over 300 full books and over 1000 short stories and poems by over 90 authors."


Collections on specific subjects

Librarians' Index to the Internet (California) a guide to Internet resources: a searchable, annotated subject directory of more than 12,000 Internet resources selected and evaluated by librarians for their usefulness to users of public libraries. advanced search mask, for instance:

Répertoire des bibliothèques de France (CCFR: Catalogue collectif de France)

Core Historical Literature of Agriculture: From Cornell: several hundred works covering all aspects of rural life and farming including nutrition, rural sociology, food preservation, and economic botany. Extremely well organized. Recommended.

Alex Catalogue of Electronic Texts: Several hundred works from the "western canon" with useful indexing and search tools. Recommended

Ancient Greek Sites on the WWW: Includes works by Plato, Socrates, Euripides, etc.

The Avalon Project at the Yale Law School: Full-text digital documents relating to Law, History, Economics, Politics, Diplomacy and Government. Has lots of basic legal documents and charters with supporting documents

Bartleby Verse American & English Poetry: 1250–1920. Full-text versions of six classic poetry anthologies.

CIHM: Canadian Institute for Historical Microreproductions

Economic History Virtual Library (A-J) Economic History Virtual Library (K-Z) The Economic and Business History section of the WWW Virtual Library is maintained in Amsterdam by the Netherlands Economic History Archive.

Archive of the History of Economic Thought: maintained by Rod Hay of McMaster University: a large full-text collection of classic works in economics and political theory. Not all listed works are accessible to public users. Also provides classic reviews and bibliographies. Recommended.

EuroDocs: Western European Documents: Links to many full-text collections of documents.  Recommended

French Revolutionary pamphlets: From the ARTFL project

Internet History Sourcebooks: By Paul Halsall at Fordham University. A large collection of primary texts and other materials, primarily collected for classroom use. Arranged in three groups, for Ancient, Medieval and Modern History, and many subgroups. Most links are to short selections from larger works, but there are also links to major websites such as the Galileo Project

Internet Library of Early Journals (ILEJ)  Scanned pages from selected years of 6 British 18th. and 19th. Century journals: Philosophical Transactions, the Gentleman's Magazine, Notes and Queries, Blackwood's Edinburgh Magazine, the Builder and the Annual Register. Partly searchable. 

The Literary Gothic: In addition to a wide range of research aids, this site offers an extremely extensive collection of  literary works.  Easy to search alphabetically.

Medieval Manuscripts from The Digital Scriptorium: A test site containing images from the Bancroft Library, Columbia University and other libraries. Thousands of medieval manuscripts are catalogued, but the site is difficult to use. Read the search tips carefully. Manuscripts are stored as images, so they are slow to load. This site will probably involve a fee when completed, so try it now.

The Online Medieval and Classical Library  from Berkeley.  Searchable.  Recommended.

Model Editions Partnership: experimental site offering editions of classic American papers including Documents of the First Federal Congress, the legal papers of Abraham Lincoln, and the papers of Elizabeth Cady Stanton and Susan B. Anthony, as an exercise in the preparation of web editions of texts. Four of the editions are based on full-text searchable document transcriptions; two are based on document images; and one is based on both images and text.

Christian Books on the Web: Large collection of bibles and prayer books, Augustine, Loyola, Calvin, Law, Pascal, Bunyan, Foxe

Sacred and Religious Texts: From Bahai to Zoroastrian

Secular Web Historical Texts Library: electronic texts of authors such as Lucretius, Paine, Voltaire, Locke, Spinoza, Darwin, and Russell



University of Pennsylvania
(a treasure chest you'll never forget)

(University of Pennsylvania's Digital Library)

Author:

   Words in last or first name
   Exact start of name (last name first)


Words in title
Exact start of title ("The", "A", and "An" can be omitted)

  • Entering austen, jane in the Author field finds
    books by Jane Austen.
  • Entering Baum in the Author field and oz
    in the Title field finds L. Frank Baum's Oz books.
  • Entering dosto in the Author field,
    choosing the Exact start of name option, and entering
    underground in the Title field finds Fyodor Dostoevsky's
    Notes from the Underground, even if you don't remember
    how to spell more than the start of the author's name! Upenn's online books.
http://onl Upenn's online books, search mask, the same reproduced above.
For instance: doyle.
Virginia edu
(another treasure chest, especially for all palm & lit formats)

Carry your complete library in your pda!
(and many interesting languages collections)
Full text search: www-ebooks?specfile=/texts/english/ebooks/ebooks.o2w
For instance: doyle
Compound search mask

Search virginia for word or phrase:


If more than 100 results view

Finally: Electronic Text Collections in Western European Literature
(Catalan | Danish | Dutch | Finnish | French | Galician | German | Greek | Irish | Italian | Latin | Norwegian | Old Norse & Icelandic | Portuguese | Romanian | Provençal | Spanish | Swedish)

Library of Congress

[gateway access to LC's catalog] (and those at many other institutions)

Deutscher Bildungsserver (finds quickly what you need)

How does it work? Explanation in german

http://www.bildung, try "doyle"
Advanced search: http://www.bildu
Text search: http://www.bildungsser, for instance: doyle: note the difference with the 'qsuche' search mask above.
This mask is also known as "Broker Abfragemaske": h/.
Freebooks in Oz


xrefer's contains encyclopedias, dictionaries, thesauri & books of quotations. All cross-referenced, all in one place


Another e-books search engine (search also 'in rapidshare'):

Jeff's quick tip

[from the ~S~ Seekers' msgboard] 
~S~ Jeff has given so much to the seekers community that it would be hard to find something great enough to thank him. Here, if you take the time to understand the following, is a 'compendium' on "how to find a book on the web".

Re: book search strategies?

"I am making this post after a search for:
Genius The Natural History of Creativity
written by
Dr. Hans Eysenck
Very poor results, but I did not try everything yet.
Suggestions for this or any book search?"

sometimes "too much" is "too little"...

i begin with your words ...
Dr. Hans Eysenck 1910 returns
ok the guy is there ... what do I want? something written BY him
"by Dr. Hans Eysenck" otoh only 3 returns

regroup-rethink ... too much is too little
by Hans Eysenck 6,670 returns --- ok back on track now

filter alittle more ... Genius 512 ... and I see your full book title ... I could take a different path here now or keep on with this one ... I decide to take the Y in the road

is it online??? lets ask

"full text" "Genius The Natural History of Creativity" returns 4

i only looked at the first return ... seems to indicate a full text ... asks for your proquest login ... ah so now we know how we can get a full text Welcome

many ways to skin a cat ... problem is catchin it
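Jeff's walk-through is a ladder of queries: start loose, tighten, relax again when "too much is too little", then ask directly whether the full text is online. The Python sketch below only generates the query strings in that order; the function and parameter names are hypothetical, and the hit counts Jeff quotes are not reproduced because they depend on the engine and the day.

```python
def query_ladder(full_name, short_name, title, keyword):
    """Return Jeff's queries in the order he tries them."""
    return [
        full_name,                    # is the author on the web at all?
        f'"by {full_name}"',          # exact phrase: turned out too strict
        f"by {short_name}",           # relaxed: too much is too little
        f"by {short_name} {keyword}", # filter with one word of the title
        f'"full text" "{title}"',     # ask whether the book is online
    ]

for query in query_ladder(
    "Dr. Hans Eysenck",
    "Hans Eysenck",
    "Genius The Natural History of Creativity",
    "Genius",
):
    print(query)
```

The point of the ladder is the order, not the exact strings: when a rung returns almost nothing, step back to a looser query instead of adding more words.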


A nice webbit:
-inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip) +"For Dummies"

Also, of course, -inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip) +"o'reilly"

Here the complete story of this webbit:

OReally Google Webbit (17/05/05 12:01:49)


Once you learn google search, you can find anything. Want some ebooks? Oh, yeah... google does that easily. Another power searching lesson coming right up.

Google: -inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip)

What does all of this mean? The -inurl:htm and -inurl:html are attempting to get rid of regular webpages and show just index pages. Looking for "index of" in the title does the same. Using the pipe ( | ) tells google to look for something OR something else. Here we are telling google to look for book or ebook directories... and we have listed several common ebook formats (chm, pdf, zip).

If you would like to look for a particular author or title just tack it to the end of your search.

Google: -inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip) +"o'reilly"

This uses the same idea but attempts to focus on directories that contain O'Reilly stuff. It's not perfect, but it's better than paying.


Re: OReally Google Webbit (17/05/05 12:59:00)
let's clean this query, it looks like a mess. Does obfuscating the query makes it look more leet or what ? ;)

-inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip) +"o'reilly"

could be written :

intitle:index.of ebooks|book chm|pdf|zip o.reilly -ext:htm -ext:html

you don't need to put all those parenthesis or quotes. It basically produces the same result, with less thing to type..

and then you can optimize it a bit : rar and nfo files are good signals for a good ebook release. directories can also be named 'books', and scene ebook tagging usually put 'ebook' in the filename (remember that google has still some big fuzzy results concerning stemming, so stick to piping the different way to write a word). And finally, oreilly can be written without the space between 'o' and 'reilly'.

That finally results to this query :

intitle:index.of ebooks|book|books|ebook chm|pdf|zip|nfo|rar o.reilly|oreilly -ext:htm -ext:html -ext:asp

btw, i guess this webbit was already posted two or three time in our boards in the past 5 years, and was copied over and over ;)
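The cleaned-up query in the reply above is just four groups of piped alternatives plus a few excluded extensions, so it is easy to assemble programmatically. A small Python sketch, with a hypothetical function name; the operator syntax simply mirrors the 2005 post and may no longer behave the same way on today's search engines.

```python
def index_of_query(dirs, exts, names, exclude_exts=("htm", "html", "asp")):
    """Assemble an 'index of' query from piped alternatives."""
    parts = [
        "intitle:index.of",
        "|".join(dirs),   # directory names: ebooks|book|...
        "|".join(exts),   # file formats: chm|pdf|...
        "|".join(names),  # spellings of the target: o.reilly|oreilly
    ]
    # Exclude ordinary web pages so that only directory listings remain.
    parts += [f"-ext:{ext}" for ext in exclude_exts]
    return " ".join(parts)

print(index_of_query(
    ["ebooks", "book", "books", "ebook"],
    ["chm", "pdf", "zip", "nfo", "rar"],
    ["o.reilly", "oreilly"],
))
```

Keeping each group as a list makes it painless to add another spelling or format, which is exactly the kind of piping the reply recommends.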


Re: Re: OReally Google Webbit (17/05/05 20:55:32)
> Does obfuscating the query makes it look more leet or what ? ;)

Probably... looks like the originator, AlexTheBeast, "got a gift certificate, t-shirt, and mug for submitting such a great [webbit]".

> btw, i guess this webbit was already posted two or three time in our boards in the
> past 5 years, and was copied over and over ;)

That was also what I figured - Rabbits.html says it was first posted at ~S~ in November 2002, and I recall using the technique in the 2002 summer, so.......... :)


Poetry (and Lyrics)

Poetry! Poetry! My god, poetry!
Nowadays almost forbidden, and anyway of course deprecated, since it does not bring any immediate profit, since it is hard to insert advertisements inside a poem (not that they wouldn't try, those clowns), and since, more generally, poetry does not significantly contribute to the holy zombification of humankind into brainless consumer automatons that the powers that be so fondly cherish.

One reason more to read poetry! Reading poetry you don't consume anything, you don't spend a dime (everything is on the web, duh), you don't watch a single crappish ad and at the same time you enrich yourself immensely. Like music, painting and playing, and maybe even more, poetry is worth all the hours invested in spades.
An added advantage: while reading poetry you can imagine the livid alarmed green faces of the advertisers and snake oil sellers...
"Sir! Sir! One slave has escaped"
"Yeah, I saw him reading Philip Larkin, he's already too far away, be more careful in the future"
...and enjoy your readings even more :-)

Note that this is more a proof of concept than a real list of poetry search engines. (Yet I'll add more :-)
Poetry and prose edited by members of the Department of English at the University of Toronto from 1912 to the present. Electronic Index by Ian Lancashire

Search a verse at bartleby
   Alphabetical listing of poets featured on Poetry Connection

Anna Akhmatova: "Where Stalin is, there is Freedom, Peace, and the grandeur of the earth"... sounds like a paid "Anchorman" speaking for Bush (or Saddam, or whomever pays him... intellectuals -or clowns posing as intellectuals- are eo ipso prostitutes). And yet... Yet poets are not lowlife Anchormans, despite the many similarities.
Alas.. all real great poets were rather weak characters, he, never forget it, never condone it, read more poetry and enjoy...

Lyrics are not poems, but they may be interesting as well... Search a lyric at leoslyrics


Google & Amazon book searches
(Commercial crappiness & some funny querystrings) Google "print", for instance "the symbolisms of heraldry"
     Advanced Book Search
  Google Book Search Help
Search: A9 search, for instance : "the symbolisms of heraldry"

You can personalize it

The Google Library Project is just a part of the broader Google Print initiative, which -according to google- intends to put books and their content "where you can find it most easily: right in your Google search results."
Google Print has another part, the Publisher Program, which presents full-text searches of *only* those books that have been authorized by publishers (confusingly called "public domain" books, and yet thousands of them).
All this is due to the usual crappy patent/copyrights paranoia of our slavemasters.
Since users can see only a few pages at a time -see below- and cannot print them unless they re-scan the whole bazaar, publishers agreed that this would spur sales with no (or little) risks and no expenses whatsoever.
It is worth underlining that Google Library is not the Publisher Program.
Google Library allows users to search and read the entire text of any work in the public domain, but, for patented books, it provides only short snippets of text with their collated context.
This is bad for global knowledge spreading, of course, but still good for seekers that need to gather good query arrows :-)

A brief digression about books searching on google print (or on amazon).

Let's use as an example W. Cecil Wade's "The Symbolisms of Heraldry or A Treatise on the Meanings and Derivations of Armorial Bearings". Published in London in 1898 and thus not copyrighted anymore.
Easy to find on the web. You can e.g. search inside this book in google for a.*|e.*|i.*|o.*|u.* to get a wallop of pages and then use a small script à la butler or 'customizegoogle' to retrieve many pages automatically.
Of course with a.*|e.*|i.*|o.*|u.* some pages could still escape: the old trick for a comprehensive (all pages) search is to use "la totale" "the | of | to | and | a | in | is | it | you | that | he | was | for | on | are | with | as | I | his | they | be | at | one | have | this | from | or | had | by | hot | word | but | what | some | we | can | out | other | were | all | there | when | up | use | your | how | said | an | each | she | which | do | their | time | if | will | way | about | many | then | them | write | would | like | so | these | her | long | make | thing | see | him | two | has | look | more | day | could | go | come | did | number | sound | no | most | people | my | over | know | water | than | call | first | who | may | down | side | been | now | find | any | new | work | part | take | get | place | made | live | where | after | back | little | only | round | man | year | came | show | every | good | me | give | our | under" (or some variant thereof: there is a 32-word limit per query nowadays).
Now the problem is that for crappy copyright enforcement reasons many pages ARE BLOCKED FOR ALL USERS (both in google and in amazon). So you cannot easily retrieve all pages. But you still can -as explained above- see the snippets for all pages of a book. Hence, having access to snippets from all pages through the above query, you'll easily gather enough arrows in order to find your targets elsewhere on the wide web :-)
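Because of the 32-word limit mentioned above, "la totale" has to be cut into several queries. A minimal Python sketch of that chunking follows; the function name is hypothetical and the word list is shortened for the example, so paste in the full list from the text for real use.

```python
def or_batches(words, limit=32):
    """Split a word list into OR-queries of at most `limit` words each."""
    return [
        " | ".join(words[start:start + limit])
        for start in range(0, len(words), limit)
    ]

# Shortened sample of the common-word list (the full list is in the text).
common = "the of to and a in is it you that he was for on are with".split()

for query in or_batches(common, limit=6):
    print(query)
```

Each batch is one query; the union of the snippets returned by all batches is what approximates an "all pages" search.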

pdf global searches

'Course you can use (almost) any main search engine advanced search options, and limit your search to pdf format results, but there are also some specific "global" pdf search engines out there, that will either collate the results for you, or offer results from a pdf-only database

Here a specific "pdf" search engine:
'Course you can find books and screenplays' "canovacci" as well: big sleep

"pdfgeni" is another one:

See the pdf section for an ad hoc discussion of this ubiquitous proprietary format.
Common pdf problems
  • Books which were scanned directly into PDF may only have the graphic portion: there may be no computer-readable text at all. These books are not searchable.
  • Books that were scanned and converted from graphic display to digital text using OCR (optical character recognition) may have significant numbers of errors. This is more common if the original book is old or was not perfectly aligned. In this case, many search terms will not be matched although the words were in the original printed or typed text, because they were not correctly interpreted. Some search terms may be falsely matched if the OCR software incorrectly interpreted the original text.
  • Books and documents with multiple columns which were converted to PDF by some layout programs will display correctly and contain the correct digital text, but they miss the text flow: the words don't come in the correct sequence. Therefore the search engines will fail to match phrase queries because the phrases were wrapped on the next line of the column in the original, but that relationship was not stored in the PDF. As seekers know, however, you can often use hypallage, tmesis and synchysis to your advantage when searching the web :-)
  • Documents generated by some applications will contain partial words due to hyphenation, incorrect coding of ligatures and extended characters (diacriticals and letters beyond the basic 26), and other unusual situations. These mangled words will not match queries, although the words were in the original text.
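If you have extracted text from such a pdf yourself, some of the damage listed above can be undone before searching it. The Python sketch below assumes the text is already in a string; the function name is hypothetical. It folds ligatures, rejoins words split by end-of-line hyphenation and flattens line breaks so that phrase matching has a chance. It does nothing for OCR misreadings or scrambled column order.

```python
import re
import unicodedata

def normalise_pdf_text(text):
    """Repair ligatures, line-break hyphenation and stray line breaks."""
    # NFKC folds compatibility ligatures such as U+FB01 into plain "fi".
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across a line break: "search-\ning" -> "searching".
    # Caveat: a genuine compound broken at the hyphen loses its hyphen too.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(normalise_pdf_text("the \ufb01ne lore of search-\ning the\nweb"))
```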

Et ab hic et ab hoc

Ebooks for gameboy: ebooks2go/ :-)

Loki's "chm" webbit:
ebook searching - chm format (13/11/03 22:00:36)
    I've seen recently, especially when dealing with tech stuff, a lot of ebooks in the chm format. Most of them were formatted by some bookwarez scene (you know, the package informal rules thing).

    CHM is the Windows' Compiled HTML Help format. Microsoft never released the format specification, but (as written on the site linked above) there are some reverse engineered descriptions.

    Anyway, what's interesting is the fact that it's built on HTML, and that it can create nice ebooks with a TOC even if it wasn't built for that use.
    The fact is : THEY use it.

    that's all, just a small thing to know : chm is a keyword to know if you want to fish ebooks.

    See for yourself.
    note that google seems to index some chm files, even if it can't read their content - most of the time it should be compressed with Microsoft's LZX algorithm. So, queries like +filetype:chm +ebook are powerful

Mordred's "lit" webbit:
~ Searching for lit ebook files. (14/11/03 12:05:53)

    Btw, you'll find this utility very useful (if you haven't already):

Loki's and oxo's "converting lit & chm" for fun & profit
Converting .LIT files for fun and profit (14/11/03 15:30:29)
    One more, also with sources :)

    the core of the lit format is the LZX compression, used also in the CAB and CHM format.

    See libmspack :

    A library for Microsoft compression formats

    The purpose of libmspack is to provide both compression and decompression of some loosely related file formats used by Microsoft. The intention is to support all of the following formats:

    File format name       File extension   Introduced   Algorithm(s) used
    Microsoft Help         .HLP             1990         LZSS
    COMPRESS.EXE [KWAJ]    .??_             1993         LZSS, LZSS+Huffman, deflate
    Microsoft Cabinet      .CAB             1995         deflate, Quantum, LZX
    HTML Help              .CHM             1997         LZX
    Microsoft eBook        .LIT             2000         LZX, SHA, DES


decompiling chm books (14/11/03 12:20:54)
    according to MSDN, you can use the same hh.exe to decompile such a file:

    hh.exe -decompile folder chm

    -decompile is the switch

    folder is the name of the destination folder where you want the decompiled files to be copied

    chm is the name of the compiled help file you want to decompile
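For a whole shelf of chm files the command above is easily scripted. A minimal Python sketch, assuming a Windows machine where hh.exe is on the PATH; the function names and the file names in the usage line are hypothetical. Only the argument order (switch, destination folder, chm file) is taken from the post.

```python
import subprocess

def decompile_command(chm_file, dest_folder):
    """Build the hh.exe call: switch, destination folder, then chm file."""
    return ["hh.exe", "-decompile", dest_folder, chm_file]

def decompile(chm_file, dest_folder):
    # Windows only. The exit status is not checked here because how hh.exe
    # reports failure is not covered by the post (an open assumption).
    subprocess.run(decompile_command(chm_file, dest_folder))

print(decompile_command("book.chm", "book_html"))
```

Once decompiled, the destination folder holds ordinary HTML files that any text search tool can dig through.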

Loki's "amazon" webbits
Re: Re: some comments, hey loki (25/11/03 21:36:22)
    "Hey loki, i do not quite follow your hint to 'a closer look' to amazon. You mean there is something we can access that we do not see on amazon's 'frontside' or do you mean that amazon is more useful for searchers than we believe 'as it is', I mean without hidden functions?"
    I mean their engines are becoming more and more powerful, and i think we should keep an eye on it. I have solved the little quest about the illustration of the new emperor's clothes using the ability of amazon to show some scanned pages (the frontlap carry important 'meta' information). And now amazon allow to search inside the books.

    If you have read the solution of the 909 riddle, written by vvf and jeff, you already know the nice amazon-audio trick : change the .com part of the URL to .fr or .de or whatever you like and see the amazing thing: some countries let you listen to more clips than others!

    But I don't know yet if there are some hidden features that await us, but why don't we try ? There are already some good stuff dealing about that (btw, this book is already scanned and spreaded on the net ;)
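The URL trick Loki describes is a plain host rewrite: keep the path and swap the country ending. A small Python sketch using only the standard library; the function name and the example URL are hypothetical, and whether a given page exists on another country's site is something you have to test by hand.

```python
from urllib.parse import urlsplit, urlunsplit

def swap_tld(url, tld):
    """Replace the trailing .com of the host with another country ending."""
    parts = urlsplit(url)
    host = parts.netloc
    if not host.endswith(".com"):
        raise ValueError("expected a .com host")
    new_host = host[: -len("com")] + tld
    return urlunsplit(parts._replace(netloc=new_host))

for country in ("fr", "de"):
    print(swap_tld("http://www.amazon.com/some/page", country))
```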


A small vignette by cgull
Dear f+,
A small vignette for your growing ebook section.

Using Amazon's "Search inside this book" and some searching techniques to pull the webbit...

1. Mozilla firebird [or any browser capable of disabling javascript 'on-the-fly']

2. account [made with false IDs of course]

3. Some grey matter to make "fishing-line carrots" ;-)

1. Login to with your acct. using Firebird, leave javascript enabled for now.

2. Identify book of interest, in this example, let us pick up "Structural Bioinformatics", by Bourne, published by Wiley-Liss, Inc. [By all means a great compendium of papers pertinent to the field,and darn expensive to boot.]

3. Click on "Search inside this book" link, under the cover pic.

4. Before doing anything else, lets look at the table of contents or TOC, to identify the pages of interest to us. I pick up Chapter 27, "ab initio Methods" on page 547.

5. Well, now if we try leafing through the pages which amazon shows us (without even logging in), we can't get too far. These are just TOC, index and front and back jackets.

6. But, with a shiny new acct. you can log in and search inside the book. So lets do that for our chapter.

7. First try, "ab initio methods", 50 hits. OK. What now? Through the crappy page numbers shown in results, getting the page of interest is a nuisance.

But look at the page number in the TOC, and append it to the query, "ab initio methods 547", bingo the first hit is the chapter 27's first page.

8. Ho hum, big deal. Amazon lets you see only two pages before and two pages after the page you pick from the results. That doesn't help much.

9. Well, what now? Darn, we are searchers, not thieves (to paraphrase +ORC, with due respect).

10. Hmm. There is something called Figures in this book. Let's make our "fishing-line carrot" based on that.

11. "Figure 27.1" is my next query. (Chapter 27, figure 1, duh!)

12. Bingo, on page 2 of the results, I have my target page # 549. Now I can go two pages back to get page # 548 and two pages ahead as well.

13. Just below that you can see the page for Figure 27.2 and so on...

14. Great, now that we can get to the pages, what next? U don't expect me to waste a forest printing those darn pictures directly from the browser, right?

15. Well, try right clicking and saving the page (pic!) that amazon shows. Hmm, nada, zilch, zero. NO USE.

16. Now, Firebird comes to the rescue. Turn off javascript and you can right click and save the images. Just take care to save them as jpeg and number them according to the page numbers so that u can read them at leisure.

17. Have fun! and learn...
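
The bookkeeping in the steps above (carrot queries, the two-pages-either-side window, numbering the saved files) can be sketched in a few lines. This is only an illustration under assumptions: the function names, the `n_figures` parameter and the file-name pattern are invented here and are not part of cgull's method, and nothing below talks to any server.

```python
def carrot_queries(chapter: int, title: str, toc_page: int, n_figures: int) -> list[str]:
    """Build the 'fishing-line carrots': title plus TOC page, then one query per figure."""
    queries = [f"{title} {toc_page}"]
    queries += [f"Figure {chapter}.{i}" for i in range(1, n_figures + 1)]
    return queries


def visible_window(hit_page: int, radius: int = 2) -> range:
    """Pages reachable from one hit: `radius` pages back and `radius` pages ahead."""
    return range(hit_page - radius, hit_page + radius + 1)


def page_filename(page: int, width: int = 4) -> str:
    """Zero-padded file name, so saved pages sort in reading order."""
    return f"page_{page:0{width}d}.jpg"


if __name__ == "__main__":
    # n_figures=2 is an arbitrary choice for the demo.
    print(carrot_queries(27, "ab initio methods", 547, n_figures=2))
    print(list(visible_window(549)))
    print(page_filename(549))
```

Zero-padding matters only because plain numbers sort badly as file names (page 1000 would land before page 549).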

Greetz to all the awesome folks at and fellow ~S~ seekers.

All rights reserved and reversed.
(c) 2004 Cgull

More Amazon hacks by Ben


Abusing Amazon "Search Inside the Book"

I just perpetrated my first abuse of the new Amazon "Search Inside the Book" service. It was interesting, educational and ultimately fruitful, though labour-intensive.

A while back, I read the first half or so of Constance Hale's "Sin and Syntax" (which I thereafter lent to Gord; have I got it back since then? I can't recall). It contained a wonderful quote by George Bernard Shaw:

If you do not immediately suppress the person who takes it upon himself to lay down the law almost every day in your columns on the subject of literary composition, I will give up the Chronicle. The man is a pedant, an ignoramus, an idiot and a self-advertising duffer... Your famous specialist ... is now beginning to rebuke "second-rate" newspapers for using such phrases as "to suddenly go" and "to boldly say." I ask you, Sir, to put this man out ... without interfering with his perfect freedom of choice between "to suddenly go," "to go suddenly" and "suddenly to go".... Set him adrift and try an intelligent Newfoundland dog in his place.

The most distinctive word in that passage is easily Newfoundland, and dropping that into the query box at the Amazon Search Inside the Book page for Sin and Syntax did indeed return part of the passage. The result, though, is only the final sentence (and a section heading, with a shard of the first sentence thereafter):

1. on page 72:

"... of choice between "to suddenly go," "to go suddenly" and "suddenly to go...... Set him adrift and try an intelligent Newfoundland dog in his place. Carnal Pleasures Sometimes a writer does without other parts of speech altogether, letting a verb demand ..."

A little experimentation showed that it is also possible to search for exact strings of text... such as the first few words in the present sentence. Lo and behold, a further search for "of choice between" revealed:

1. on page 72:

"... "to boldly say." I ask you, Sir, to put this man out ... without interfering with his perfect freedom of choice between "to suddenly go," "to go suddenly" and "suddenly to go...... Set him adrift and try an intelligent Newfoundland dog in ..."

2. on page 151 [...and so forth...]

The rest of the quote was trivially reconstructed by applying the same process until the beginning of the sentence was found, then merging together all of the partial results. Altogether, seven queries were necessary to reconstruct the entire paragraph: labour-intensive, yes, but very simple to automate.

These seven, in fact, in order last to first:

"... 72 SIN AND SYNTAX "Ross wants you to for God's sake stop attributing human behavior to dogs." ... apoplectic. Shaw wrote this to the local paper: If you do not immediately suppress the person who takes it upon himself to lay down the law almost every day in your columns on the ..."
"... mar to our Anglo-Saxon tongue insist the split infinitive is a no-no, they're dead wrong. Copy editors need to back down, lest they earn the wrath of a latter-day George Bernard ... you do not immediately suppress the person who takes it upon himself to lay down the law almost every day in your columns on the subject of literary composition, I will give up the ..."
"... upon himself to lay down the law almost every day in your columns on the subject of literary composition, I will give up the Chronicle. The man is a pedant, an ignoramus, an idiot and a self-advertising duffer.... Your famous specialist ... is now beginning ..."
"... I will give up the Chronicle. The man is a pedant, an ignoramus, an idiot and a self-advertising duffer.... Your famous specialist ... is now beginning to rebuke "second-rate" newspapers for using such phrases as "to suddenly go" and "to boldly say." ..."
"... famous specialist ... is now beginning to rebuke "second-rate" newspapers for using such phrases as "to suddenly go" and "to boldly say." I ask you, Sir, to put this man out ... without interfering with his perfect freedom of choice between "to ..."
"... "to boldly say." I ask you, Sir, to put this man out ... without interfering with his perfect freedom of choice between "to suddenly go," "to go suddenly" and "suddenly to go...... Set him adrift and try an intelligent Newfoundland dog in ..."
"... of choice between "to suddenly go," "to go suddenly" and "suddenly to go...... Set him adrift and try an intelligent Newfoundland dog in his place."
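
The merging of partial results is the part that automates most easily. Here is a minimal sketch of one way to do it, assuming the fragments are already in reading order and each one overlaps the end of the text merged so far; the helper names and the `min_overlap` threshold are illustrative choices, not Ben's actual procedure:

```python
def clean(snippet: str) -> str:
    """Drop the surrounding quotes and the leading/trailing '...' of a result snippet."""
    text = snippet.strip().strip('"').strip()
    return text.removeprefix("...").removesuffix("...").strip()


def overlap_len(left: str, right: str, min_overlap: int = 5) -> int:
    """Length of the longest prefix of `right` that is also a suffix of `left`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_fragments(fragments: list[str]) -> str:
    """Stitch fragments (in reading order) together on their overlapping text."""
    merged = fragments[0]
    for frag in fragments[1:]:
        size = overlap_len(merged, frag)
        if size == 0:
            raise ValueError(f"no overlap found for fragment: {frag!r}")
        merged += frag[size:]
    return merged


if __name__ == "__main__":
    # Shortened toy fragments cut from the Shaw passage, not real query results.
    parts = [
        "I ask you, Sir, to put this man out",
        "to put this man out ... without interfering",
        "without interfering with his perfect freedom of choice",
    ]
    print(merge_fragments(parts))
```

Real snippets also carry page headers and section headings at their edges (see the first and last results above), so some hand trimming is still needed before merging.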

The structure of the query URL is pleasantly simple. Here is the second example, with the important parts highlighted: 0767903099 ?v=search-inside&keywords= %22of+choice+between%22

The first part, of course, is the ISBN, and the second is the query string. It's a bog-standard escaped HTTP sequence, as simple as one could possibly desire. Screen-scraping the query result page for the text is trivial (nasty regex hint: if the query is non-empty, the desired string is in the first <td class="small">...</td> after the string "on Page". The latter is contained within an anchor tag, though this isn't strictly necessary to leverage) and turning the first three or four words of this text into another query URL is exactly as easy as you think.

Ease notwithstanding, this is not a general-purpose way to retrieve the full text of a book. Fifteen to twenty queries or more are required to extract the full text of a single page, and the server load imposed by a single IP address trying to reconstruct an entire book could (does?) easily trigger server-side countermeasures. Also, no formatting is present in the extracted text, not even section headings, so the snarfing of structured text (textbooks, reference works, even index pages) would be prohibitively difficult to automate.

Amazon has deeply impressed me with this: they've managed to create a tremendously useful resource which is minimally susceptible to abuse. I can't wait to see how the competition from Google Print shapes up. As long as we can fight off the Intellectual Property wolves, we may yet manage to create an informational golden age.

The IRC path by book

I have been searching for the book "MUD Game Programming" for quite a while, but could not find it using any of the techniques on the text Targets page. I came across the string Premier.Press.MUD.GAME.PROGRAMMING.ebook-lib, though the site showing it was only trying to sell me something. I almost gave up, but then I remembered reading about an IRC way to find music that mentioned books too. So I clicked on it, loaded irc, logged on to undernet->#bookz, then searched with @find, and the very first response was the book I wanted. Maybe the IRC page should be added to the Books target page? It is on the sound one, but not the books one. book

Final hint for all lamers and leechers that 'don't have the time' to learn:
...this is all nice and dandy, but pray, oh great seekers, just tell me where can I download those pesky doyle's books?
(or, if you are more clever than that, maybe here instead)

© 3rd Millennium: [fravia+], all rights reserved