
Thread: BGL (babylon glossary) to GLS (babylon glossary source).

  1. #46
    d8o8s8
    Guest

    I found something

    I found a simple small decompiler using google at
    http://tankado.com/index.php?2008/06/21/281-babylon-bgl-decompiler
    I checked and it worked with the older version of the BGL file I had (it didn't work for the new ones).
    I also verified with filemon it doesn't mess around (still not taking responsibility, not my file).
    Hope this helps.

    ------------------ EDIT: sorry, just realized it's the same app as bglgls previously posted on this thread.
    Last edited by d8o8s8; December 2nd, 2008 at 08:25.
    I promise that I have read the FAQ and tried to use the Search to answer my question.

  2. #47
    antico2
    Guest
    Szereshki, I used nothing special: just DevC++ 4.9.9.2 and, of course, zlib 1.2.3 downloaded as a DevC++ package. The only things to remember are the casting problem (casts are not implicit in C++; see my posts #35 and #36), the inclusion of ctype.h in the main file, and finally the linker problem, which is solved by pointing the DevC++ linker at the folder containing libz.a.
    If it would be useful, I can post my entire DevC++ project.

    regards

  3. #48
    Thx antico2.
    Anybody tried to simply increase the buffer from 1024 up?

  4. #49
    antico2
    Guest
    That's OK.

    I've taken the code of the post #11 and modified the buffer size to 2048 in:

    //write gls

    .
    .
    char tmpbuff[2048];
    .
    .

    Test it and let me know...

    regards
    Attached Files

  5. #50
    Bigal
    Guest
    Quote Originally Posted by antico2 View Post
    That's OK.

    I've taken the code of the post #11 and modified the buffer size to 2048 in:

    //write gls

    .
    .
    char tmpbuff[2048];
    .
    .

    Test it and let me know...

    regards
    Thanks. However I am sure that won't be enough for many dictionaries. I remember I had to multiply the buffer size by 6 or 7 or even more for some dictionaries.

  6. #51
    Quote Originally Posted by antico2 View Post
    That's OK.

    I've taken the code of the post #11 and modified the buffer size to 2048 in:

    //write gls

    .
    .
    char tmpbuff[2048];
    .
    .

    Test it and let me know...

    regards
    It doesn't change anything. How about also increasing the 1019 in the 'if' statement?

  7. #52
    Quote Originally Posted by antico2 View Post
    Szereshki, I used nothing special: just DevC++ 4.9.9.2 and, of course, zlib 1.2.3 downloaded as a DevC++ package. The only things to remember are the casting problem (casts are not implicit in C++; see my posts #35 and #36), the inclusion of ctype.h in the main file, and finally the linker problem, which is solved by pointing the DevC++ linker at the folder containing libz.a.
    If it would be useful, I can post my entire DevC++ project.

    regards
    Could you please post the DevC++ project? I have problems with the linker. Would you please tell us the details? Sorry.

  8. #53
    Ulrezaj
    Guest
    Since I got most of my help from here, I figured I'd register and post what I've discovered for the benefit of all, to give something back.

    I'm working specifically on the Japanese->English dictionary http://info.babylon.com/glossaries/4E9/Babylon_Japanese_English_dicti.BGL. My goal was to decompile it, extract all the data, then add additional entries from a different dictionary.

    I don't know why (not a huge C person) but the code provided thus far misses a lot of entries, and, more importantly in my case, doesn't extract the alternate spellings from each record, which is critical for word recognition in Japanese. So I decided to write my own extractor (in Python, because it's awesome).

    I don't know about other dictionaries, but in this one the record structure is:

    header byte - record type/length byte as described earlier by Bilbo
    length bytes - 1-2 bytes holding length of record
    term length byte - byte holding length of term
    term - the dictionary entry for this record
    0x00 byte
    unknown byte - never figured out what this does. I suspect it specifies the record contents, eg: has a definition, has alternate spelling, has a classification, etc
    definition - term's definition, including html code and such
    0x14 - separator byte (or end byte if definition was the last part of record)
    0x02 - classification specifier - means a word type (noun, verb, etc) will follow
    classification - in this case, was between 0x30 and 0x3b and was mapped in one of the 'id' records earlier in the dictionary
    alternate spellings - separated by a single byte with a value between 0x00 and 0x30 (which separator byte is used seems arbitrary)

    Note that the record length does not include the record header byte or the length bytes themselves.
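The header and length fields described above can be decoded along these lines. This is only a minimal Python 3 sketch. It assumes the nibble rule used by the bglgls C source posted later in this thread: a high nibble of 4 or more stores the length directly as (nibble - 4), otherwise (nibble + 1) big-endian length bytes follow. The function name read_record_header is made up for illustration.

```python
import io


def read_record_header(stream):
    """Return (record_type, record_length), or None at end of data.

    Assumption (taken from the bglgls C code in this thread):
      - low nibble of the first byte  = record type
      - high nibble >= 4              = length is (high nibble - 4)
      - high nibble < 4               = (high nibble + 1) length bytes follow
    The returned length excludes the header byte and the length bytes.
    """
    first = stream.read(1)
    if not first:
        return None
    hdr = first[0]
    record_type = hdr & 0x0F
    high_nibble = hdr >> 4
    if high_nibble >= 4:
        return record_type, high_nibble - 4
    length_bytes = stream.read(high_nibble + 1)
    if len(length_bytes) != high_nibble + 1:
        raise ValueError("truncated length field")
    return record_type, int.from_bytes(length_bytes, "big")


if __name__ == "__main__":
    # 0x11: type 1, two length bytes follow (0x01 0x02)
    print(read_record_header(io.BytesIO(b"\x11\x01\x02")))
```

After the call, the stream is positioned at the first byte of the record body, so reading record_length bytes yields exactly one record.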

    Anyway, armed with this I created a quick and dirty program to parse it, and lo and behold, it works. The resulting file can be run through the Glossary builder and, at least as far as I've tested, appears to be identical to the original.

    Code:
    import traceback
    styles = {48: "n", 49: "adj", 50: "v", 51: "adv", 52: "interj", 53: "pron",
              54: "prep", 55: "conj", 56: "suff", 57: "pref", 58: "art", 59: "aux"}
    
    ps = open("u.gls","w")
    
    # Header
    ps.write("### Glossary title:Uru\n")
    ps.write("### Author:Urudict\n")
    ps.write("### Description:Urudict\n")
    ps.write("### Source language:Japanese\n")
    ps.write("### Source alphabet:Default\n")
    ps.write("### Target language:English\n")
    ps.write("### Target alphabet:Default\n")
    ps.write("### Browsing enabled?No\n")
    ps.write("### Type of glossary:00000000\n")
    ps.write("### Case sensitive words?0\n\n")
    # Glossary section
    ps.write("### Glossary section:\n\n")
    
    r = open("Babylon_Japanese_English_dicti.txt", "rb") # un-gz'd dictionary (see earlier posts); binary mode so seek/read offsets are exact
    e = open("entries.txt") # output of Bilbo's original code
    for line in e.readlines():
        line = line.decode('utf-8')
        if '<entry>' not in line: continue
        line = line.split()
        offset = int(line[0][1:-1],16)
        entry = " ".join(line[2:-2])[1:-2]
        record = int(line[-2],16)
        spelling, type, defn, alts = "", "", "", ""
        
        try:
            r.seek(offset)
            bin = r.read(record)
            nib = int(hex(ord(bin[0]))[2])+1 # length of the 'length' header
            if len(hex(ord(bin[0]))) == 3: nib = 1 # 0x1 case
            bin += r.read(nib+1) # first byte and length headers aren't part of record length
            term = bin[nib+2:nib+2+ord(bin[nib+1])] # Get term from 2 bytes after nib + 'length'
            bin = bin[nib+4+ord(bin[nib+1]):] # discard up to and including term
            if bin.find("<I>") != -1: # check for spelling
                spelling = bin[bin.find("("):bin.find(")")+1] # extract spelling
                bin = bin[bin.find(")")+2:] # discard spelling if exists
            # at this point, bin should start with definition
            defn = bin[:bin.find('\x14')] # extract definition
            bin = bin[bin.find('\x14')+1:] # discard definition
            if bin and bin[0] == '\x02': # check for type
                type = styles[ord(bin[1])] # extract type
                bin = bin[2:] # discard type
            if bin: # if bin isn't empty, rest of record is alts
                alts = bin
                for c in [chr(k) for k in range(1,30)]:
                    alts = alts.replace(c, '\x00')
    
            ps.write(term.decode("shift-jis").encode('utf-8')) # write term
            for k in alts.split('\x00'):
                if k: ps.write("|"+k.decode("shift-jis").encode('utf-8')) # write alts
            #ps.write("\n<font color='blue'>"+type+"</font> ") #type
            ps.write("\n"+type.encode('utf-8')) # write type
            if type: ps.write(". ")
            ps.write(spelling.decode("shift-jis").encode('utf-8'))
            if spelling: ps.write(" ")
            ps.write (defn.decode("shift-jis").encode('utf-8')+"\n\n") # definition
        except Exception:
            print hex(offset), hex(record)
            dmp = open("dmp.txt","w")
            r.seek(offset)
            dmp.write(r.read(record+nib+1))
            dmp.close()
            print traceback.format_exc()
            break
    
    e.close()
    r.close()
    ps.close()
    The code clearly isn't designed to be flexible or anything - I seriously just threw it together in an hour - but hopefully it might provide some insight as to how to go about making the perfect decompiler :P
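For anyone porting this to Python 3, the tail of a record (everything from the definition onward) could be split roughly as follows. This is a sketch under the assumptions stated in the record layout above: 0x14 ends the definition, an optional 0x02 plus one byte gives the word type, and the remainder is alternate spellings separated by any byte below 0x30. The names split_tail, WORD_TYPES and sep_limit are hypothetical, and the script above actually replaces only bytes 1 to 29 (decimal), so the separator limit is left as a parameter.

```python
# Word-type codes, copied from the 'styles' table in the script above.
WORD_TYPES = {0x30: "n", 0x31: "adj", 0x32: "v", 0x33: "adv",
              0x34: "interj", 0x35: "pron", 0x36: "prep", 0x37: "conj",
              0x38: "suff", 0x39: "pref", 0x3A: "art", 0x3B: "aux"}


def split_tail(tail, sep_limit=0x30):
    """Split record bytes (starting at the definition) into
    (definition, word_type, alternates). All text is returned as raw bytes."""
    # 0x14 terminates the definition; if it is missing, everything is definition.
    definition, _, rest = tail.partition(b"\x14")

    word_type = None
    if len(rest) >= 2 and rest[0] == 0x02:
        word_type = WORD_TYPES.get(rest[1])  # None for an unknown code
        rest = rest[2:]

    # Any byte below sep_limit is treated as a separator (assumption).
    alternates, current = [], bytearray()
    for byte in rest:
        if byte < sep_limit:
            if current:
                alternates.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    if current:
        alternates.append(bytes(current))
    return definition, word_type, alternates


if __name__ == "__main__":
    print(split_tail(b"a dog\x14\x02\x30inu\x05wanko"))
```

The returned byte strings still need decoding, for example with .decode("shift-jis") for this particular dictionary.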
    Last edited by Ulrezaj; December 7th, 2008 at 16:35. Reason: spelling

  9. #54
    antico2
    Guest

    Thumbs up

    Quote Originally Posted by szereshki View Post
    Could you please post the DevC++ project? I have problems with the linker. Would you please tell us the details? Sorry.
    OK, attached is the DevC++ project I use.

    I've also attached a compiled bglgls exe with a larger buffer.

    Good luck.
    P.S.
    If you have problems with DevC++, let me know and I'll help.

    regards
    Attached Files
    Last edited by antico2; December 7th, 2008 at 19:37.

  10. #55
    Quote Originally Posted by antico2 View Post
    OK, attached is the DevC++ project I use.
    I have build problems:
    Dev C++: [Build Error] [Progetto1.exe] Error 1
    Borland C++: Error: 'C:\BC5\LIB\ZLIB.LIB' contains invalid OMF record, type 0x21

    Quote Originally Posted by antico2 View Post
    I've also attached a compiled bglgls exe with a larger buffer.
    It doesn't work. I tried a big bgl (>9mb). The previous bgl2gls works great but has the problem of cutting long definitions. This one exports an incomplete 600kb gls and also doesn't delete the 50mb temporary dat file.

    thx antico
    Last edited by szereshki; December 14th, 2008 at 10:15.

  11. #56
    My compiler errors have been solved. Now it's much simpler to change parts of the code.
    antico: you should change some of the other 1024s to 2048 or more.

    Does anybody know the attributes of a bitmap that can be used in a gls? E.g. 24-bit or 16-bit? 72 or 96? ...

  12. #57
    I increased all the 1024s and it worked. I also changed the character validation function to always return 1 (to allow non-English characters), except for the 1E and 1F characters (which are placed before and after a bitmap file name). I changed the target language and alphabet from English and Default to Arabic (in my case).
    Now it can generate a gls from my big bgl. Two problems still exist:
    1- I tried hFrasi advanced version (a Persian dic, 9.38mb) and the reproduced bgl is 4.78mb. Some parts of the definitions are still cut. This problem is not related to the buffer size (try “forces” or “cut” to see).
    2- It now recognizes the bitmap file, but doesn’t correctly include it in the bgl.

    These point to a basic defect in the code. Compare the same entry (author name) in the original and reproduced bgls:
    Attached Images
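One way to avoid cutting a definition at the first 1E byte is to pull the bracketed file names out and keep the rest of the text. The following Python 3 sketch assumes only what is stated in the post above, namely that 0x1E precedes and 0x1F follows a bitmap file name. The helper name extract_bitmaps is hypothetical, and how the names should then be written into the gls is left open because the thread does not say.

```python
import re

# 0x1E ... 0x1F is assumed to bracket an embedded bitmap file name.
_BITMAP = re.compile(rb"\x1e([^\x1f]*)\x1f")


def extract_bitmaps(definition):
    """Return (text_without_markers, [bitmap_names]) for a raw definition."""
    names = _BITMAP.findall(definition)
    text = _BITMAP.sub(b"", definition)
    return text, names


if __name__ == "__main__":
    print(extract_bitmaps(b"see \x1epic.bmp\x1f here"))
```

Unlike truncating at the first 1E byte, this keeps any definition text that follows the file name.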

  13. #58
    antico2
    Guest
    Quote Originally Posted by szereshki View Post
    I increased all the 1024s and it worked. I also changed the character validation function to always return 1 (to allow non-English characters), except for the 1E and 1F characters (which are placed before and after a bitmap file name). I changed the target language and alphabet from English and Default to Arabic (in my case).
    Now it can generate a gls from my big bgl. Two problems still exist:
    1- I tried hFrasi advanced version (a Persian dic, 9.38mb) and the reproduced bgl is 4.78mb. Some parts of the definitions are still cut. This problem is not related to the buffer size (try “forces” or “cut” to see).
    2- It now recognizes the bitmap file, but doesn’t correctly include it in the bgl.

    These point to a basic defect in the code. Compare the same entry (author name) in the original and reproduced bgls:
    OK szereshki, I'm glad you've made progress on the compilation errors.
    I think we need the help of the original author of the code.

    Please post the code you modified here so we can look at it and think about what to do next.

    regards

  14. #59
    here is the code:
    Code:
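    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   // strlen, strcpy, strcat, strncpy, memset
    #include <windows.h>
    #include <ctype.h>
    #include <zlib.h>     // zlib.h already includes zconf.h
    #pragma comment(lib,"zlib.lib") // not honored by GCC (DevC++); there, add libz.a in the linker options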
    
    //include namespace std
    int isvalidchar(char ch);
    void stripjunk(char *buffer,char type);
    int focc(char *cstr,char ch);
    int uncomp_bgl(char *bglname,char *datname);
    int writegls(char *datname);
    
    char glsheader[32768];
    char glsheadertemplate[]=
    "### Glossary title:%s\r\n"
    "### Author:%s\r\n"
    "### Description:%s\r\n"
    "### Source language:English\r\n"
    "### Source alphabet:Default\r\n"
    "### Target language:Arabic\r\n"
    "### Target alphabet:Arabic\r\n"
    "### Browsing enabled?No\r\n"
    "### Type of glossary:00000000\r\n"
    "### Case sensitive words?0\r\n"
    ";gls generated by bglgls\r\n\r\n"
    "### Glossary section:\r\n\r\n";
    
    int main(int argc,char **argv) {
    int ix;
    char szAuth[32];
    char szTitle[32];
    char szDescription[128];
    char datfname[128];
    
    if(argc!=2) { 
    	printf("usage: bglgls.exe filename.bgl\n"); 
    	return 0; 
    }
    //>get input
    printf("gls Author:");
    fgets(szAuth,32,stdin);
    printf("gls Title:");
    fgets(szTitle,32,stdin);
    printf("gls Description:");
    fgets(szDescription,128,stdin);
    
    szAuth[strlen(szAuth)-1]=0;
    szTitle[strlen(szTitle)-1]=0;
    szDescription[strlen(szDescription)-1]=0;
    sprintf(glsheader,glsheadertemplate,szTitle,szAuth,szDescription); // order must match the template: title, author, description
    //>set output filename
    strncpy(datfname,argv[1],127);
    datfname[127]=0; // strncpy does not terminate the string when the source is too long
    ix=focc(datfname,'.');
    if(ix<0) { printf("invalid filename\n"); return 0; }
    datfname[ix]=0;
    strcat(datfname,".dat");
    //>>
    if(!uncomp_bgl(argv[1],datfname)) { printf("error uncompressing BGL.\n"); return 0; }
    if(!writegls(datfname)) { printf("error writing GLS.\n"); return 0; }
    return 0;
    }
    //>>uncompression routine
    int uncomp_bgl(char *bglname,char *datname) {
    FILE *ztmp;
    FILE *zfile;
    char iobuff[128];
    char tmppath[256];
    char tmpfname[256];
    unsigned char zptrbyte;
    int tread;
    
    //get temp filename
    GetTempPath(256,tmppath);
    GetTempFileName(tmppath,"bgl",0,tmpfname);
    ztmp=fopen(tmpfname,"wb");
    if(!ztmp) return 0;
    //>
    zfile=fopen(bglname,"rb");
    if(!zfile) return 0;
    fseek(zfile,0x5,SEEK_SET);
    fread(&zptrbyte,sizeof(char),1,zfile);
    printf("zlib header@0x%X\n",zptrbyte);
    fseek(zfile,zptrbyte,SEEK_SET);
    while(!feof(zfile)) {
    	tread=fread(iobuff,sizeof(char),128,zfile);
    	fwrite(iobuff,sizeof(char),tread,ztmp);
    }
    fclose(zfile);
    fclose(ztmp);
    //>>uncompressing >
    zfile=fopen(datname,"wb");
    ztmp=(FILE*)gzopen(tmpfname,"rb");
    if(!zfile||!ztmp) return 0;
    while(!gzeof(ztmp)) {
    	tread=gzread(ztmp,iobuff,128);
    	fwrite(iobuff,sizeof(char),tread,zfile);
    }
    gzclose(ztmp);
    fclose(zfile);
    DeleteFile(tmpfname); //get rid of temporary file
    return 1;
    }
    //write gls
    int writegls(char *datname) {
    FILE *fdic,*fgls;
    int ix,rec_length;
    short int lenword;
    unsigned char hdr,high_nibble,lenbyte;
    unsigned char lenmul,lenadd;
    unsigned long datapos;
    char tmpbuff[32768];
    char glsf[256];
    int tt=0,lt=0;
    
    //gls filename
    strcpy(glsf,datname);
    ix=focc(glsf,'.');
    glsf[ix]=0;
    strcat(glsf,".gls");
    printf("gls filename:%s\n",glsf);
    fgls=fopen(glsf,"wb");
    if(!fgls) return 0;
    //>write header
    printf("writing GLS");
    fwrite(glsheader,sizeof(char),strlen(glsheader),fgls);
    //>>parsing
    fdic=fopen(datname,"rb");
    if(!fdic) return 0;
    while(1) {
    	fread(&hdr,sizeof(char),1,fdic);
    	if(feof(fdic)) break;
    
    	//get record size
    	high_nibble=hdr >> 4;
    	if(high_nibble>=4) rec_length=high_nibble-4;
    	else {
    		for(ix=rec_length=0;ix<high_nibble+1;ix++) {
    			rec_length*=256;
    			fread(&lenbyte,sizeof(char),1,fdic);
    			rec_length+=lenbyte;
    		}
    	}
    	datapos=ftell(fdic);
    
    	switch(hdr & 0xF) {
    			case 1: {
    			fread(&lenbyte,sizeof(char),1,fdic);
    			memset(tmpbuff,0,32768);
    			fread(tmpbuff,sizeof(char),lenbyte,fdic);
    			if(!isalpha(tmpbuff[0])) break;
    			stripjunk(tmpbuff,0);
    			strcat(tmpbuff,"\r\n");
    			fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);
    			fread(&lenmul,sizeof(char),1,fdic);
    				fread(&lenadd,sizeof(char),1,fdic);
    			memset(tmpbuff,0,32768);
    			lenword=lenmul*256+lenadd;
    			if(lenword>32608) lenword=32608;
    			fread(tmpbuff,sizeof(char),lenword,fdic);
    			stripjunk(tmpbuff,1);
    			strcat(tmpbuff,"\r\n\r\n");
    			fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);
    			if(tt-100==lt) { lt=tt; printf("."); }
    			tt++;
    			} break;
    			default: break;
    	}
    	fseek(fdic,datapos+rec_length,SEEK_SET);
    }
    fclose(fdic);
    fclose(fgls);	
    DeleteFile(datname); //we dont need the *.dat anymore..
    printf("%d terms written to file!\n",tt);
    return 1;
    }
    //find occurrence
    int focc(char *cstr,char ch) { 
    int ix;
    for(ix=0;(unsigned)ix<strlen(cstr);ix++)
    	if(cstr[ix]==ch) return ix;
    return -1;
    }
    //>
    void stripjunk(char *buffer,char type) {
    int ix,slen;
    slen=strlen(buffer);
    
    if(!type) {
    	for(ix=1;ix<slen;ix++)
    		if(buffer[ix]=='$') { buffer[ix]=0; break; }
    	slen=ix;
    }	
    for(ix=0;ix<slen;ix++) 
    	if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }
    }
    //valid term/definition char
    int isvalidchar(char ch) {
        if (ch==30 ||ch==31) return 0; // 0x1E and 0x1F surround a bitmap file name
        return 1; // accept everything else; the old table check below is now unreachable and can be deleted
    	int ix;
    	char valtab[]="abcdefghijklmnopqrstuvwxyz 0123456789!@#$%&8()_-+=|{}[]<>\"',.%%/\\:;!?";
    	ch=tolower(ch);
    	for(ix=0;(unsigned)ix<strlen(valtab);ix++)
    		if(ch==valtab[ix]) return 1;
    return 0;
    }
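For comparison, here is a minimal Python 3 sketch of the same two steps (decompress, then walk the type-1 records) that does not use a fixed-size buffer, so long definitions are not truncated. It assumes the layout the C code above relies on: the byte at offset 0x5 gives the position of the gzip stream, and a type-1 record holds a one-byte term length, the term, a two-byte big-endian definition length, and the definition. The names decompress_bgl and iter_entries are made up for illustration, and the stripjunk cleanup is deliberately not reproduced.

```python
import zlib


def decompress_bgl(raw):
    """Inflate the gzip stream embedded in a .bgl file (raw is the file's bytes)."""
    offset = raw[5]  # the C code reads a single byte at 0x5 as the stream offset
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return inflater.decompress(raw[offset:])


def iter_entries(data):
    """Yield (term, definition) byte strings for every type-1 record."""
    pos = 0
    while pos < len(data):
        hdr = data[pos]
        pos += 1
        high = hdr >> 4
        if high >= 4:
            rec_len = high - 4
        else:
            rec_len = int.from_bytes(data[pos:pos + high + 1], "big")
            pos += high + 1
        body = data[pos:pos + rec_len]
        pos += rec_len  # always skip to the next record, as the fseek in the C code does
        if hdr & 0x0F != 1 or not body:
            continue
        term_len = body[0]
        term = body[1:1 + term_len]
        def_len = int.from_bytes(body[1 + term_len:3 + term_len], "big")
        definition = body[3 + term_len:3 + term_len + def_len]
        yield term, definition


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for term, definition in iter_entries(decompress_bgl(f.read())):
            print(len(term), len(definition))
```

Because slices are sized from the length fields themselves, there is no 1024 or 32768 constant to tune; whether this also fixes the definitions that are cut for other reasons is not something the sketch can promise.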

  15. #60
    peterparler
    Guest
    Hi, I found this site that links to a project, StarDict, a sort of open-source dictionary. There is something in its source code that may interest you.
