
View Full Version : BGL (babylon glossary) to GLS (babylon glossary source).


acidmelt
04-19-2005, 05:15 AM
hello reversers, a while back i tried to reverse the babylon GLS format so i would be able to read data out of it and use it in my own personal project, this task however is beyond my very noobish debugging skills and obviously i failed.

i wasnt sure about where this post best fits in, it was either advanced reversing (since this requires pretty advanced practises) or the mini project area, feel free to move it.

anyways this is the data i have gathered so far:
the decryption algo can be found at the "babylon" program itself (http://www.babylon.com)
the encryption algo can be found at the "babylon builder" program which is used to write dictionaries and is publicly available (http://www.babylon.com/builder)
-----------
there is zero documentation about this format available on the net.
ive found this page (http://fjolliton.free.fr/babytrans/) which asserts that the new babylon bgl format is encrypted using the "Cipher Square" algorithm (http://www.esat.kuleuven.ac.be/~rijmen/square/).
-----------
after examining a few *.bgl's it is visible that the first 8 bytes of the file are the signature.
ive checked wotsit.org for documentation and found nothing.

in a recent thread (http://www.woodmann.com/forum/showthread.php?t=6934) bilbo have suggested this as a project.. so i thought that id start this thread and see what happens.

what do you say?

bilbo
04-19-2005, 12:05 PM
In my opinion, that would be a nice true RE activity, and not related to software stealing...
You have my support, as long as I have time...!

For the moment, I tell you how I would start...

(1) Install Babylon - I have 5.0.1 r7 - dunno if it's the latest - and focus on one BGL you have installed...
The program is not compressed/protected in any way from debuggers.
(2) Menu->Glossaries->Glossary Options, and remove your BGL
(3) Attach to Babylon.exe with your preferred debugger and set a breakpoint on CreateFileA / ReadFile
(4) Menu->Glossaries->Install glossary from disk, and reinstall your target BGL
(5) Debugger will break at start of API: on stack you will find the return address and the BGL file name.
(6) ... no time for now to go on...

Best regards, bilbo

acidmelt
04-20-2005, 08:49 AM
hey bilbo, i have tried your suggestion for debugging the app (using olly) and i have encountered a rather strange behaviour.. it seems that at startup babylon is iterating thru all the files inside %windir%\fonts and opening each one of them... i dont see any reason for that.
anyways, i have stepped-over the code searching for the right CreateFile() and i wasnt able to find any reference that is opening a *.bgl.

another problem was that as soon as i go into glossaries->add glossaries olly reports a memory access violation.. id guess that babylon does holds some sort of anti-reversal protection

as i said my debugging skills are very limited and i would be glad if you (bilbo) or any other experienced reversers would take a look at that

oh, and one last thing.. judging by the ram usage and the speed of seeking i assume that the glossaries are being loaded into memory at startup (duh.) so taking a memory dump should provide us with a valid copy of the decrypted gloss right?

bilbo
04-20-2005, 12:07 PM
Hello, acidmelt!

I had some time to make other nice steps in "our" project. Let's see...
Quote:
[Originally Posted by acidmelt]it seems that at startup babylon is iterating thru all the files inside %windir%\fonts and opening each one of them... i dont see any reason for that.
No, that's not my case (I've checked with FILEMON). It could be that you have a still-fresh installation of Babylon, and it is still auto-learning the fonts installed on your system for OCR. If that is the case you should also see a high CPU load on your system for the following hours.
Quote:
[Originally Posted by acidmelt]anyways, i have stepped-over the code searching for the right CreateFile() and i wasnt able to find any reference that is opening a *.bgl.
That was the reason I suggested you to put a breakpoint only after the initial phase and load a new BGL when the program is already started.
Quote:
[Originally Posted by acidmelt]another problem was that as soon as i go into glossaries->add glossaries olly reports a memory access violation.. id guess that babylon does holds some sort of anti-reversal protection
You're right, I don't use Olly and I did not notice it. It is neither a Memory Access Violation nor an anti-debugging trick. It is a lot of C++ exceptions (E06D7363). I dunno the exact reason. Anyway: Options->Debugging Options->Exceptions->select Ignore Custom Exceptions and press button "Add last exception". This solves the Olly problem!

Quote:
[Originally Posted by acidmelt]oh, and one last thing.. judging by the ram usage and the speed of seeking i assume that the glossaries are being loaded into memory at startup (duh.)
Correct!
Quote:
[Originally Posted by acidmelt]so taking a memory dump should provide us with a valid copy of the decrypted gloss right?
You have yet to locate the data and to interpret them, though!

Quote:
[Originally Posted by acidmelt]Ive found this page (http://fjolliton.free.fr/babytrans/) which asserts that the new babylon bgl format is encrypted using the "Cipher Square" algorithm (http://www.esat.kuleuven.ac.be/~rijmen/square/).
That's a wrong info, as far as I've seen!

And now the good news.
What you already found, the 4(8?)-bytes signature, can be of three types:
12340003 .BDC extension - to be studied
12340002 .BGL generated by the builder in some cases - to be studied
12340001 .BGL distributed on Babylon site - I've started from these...
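
The three signatures above could be told apart with a small helper like the one below. This is only a sketch: it assumes the signature is the first four bytes of the file read as one big-endian value, and the enum and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the three signature values come from the thread. */
enum baby_kind { BABY_UNKNOWN = 0, BABY_BGL_SITE, BABY_BGL_BUILDER, BABY_BDC };

/* Read a 32-bit big-endian value from the first four bytes of buf. */
static uint32_t read_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Classify a file by its signature; len is the number of valid bytes in buf. */
enum baby_kind classify_signature(const unsigned char *buf, size_t len)
{
    if (len < 4) return BABY_UNKNOWN;
    switch (read_be32(buf)) {
    case 0x12340001u: return BABY_BGL_SITE;    /* .BGL from the Babylon site */
    case 0x12340002u: return BABY_BGL_BUILDER; /* .BGL from the builder */
    case 0x12340003u: return BABY_BDC;         /* .BDC */
    default:          return BABY_UNKNOWN;
    }
}
```

Anything that does not match one of the three values is reported as unknown rather than guessed at.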

I've managed to identify their decompression (not decryption) algorithm, using the 5 steps I suggested you. It is simply ZLIB, release 1.1.3 (rather old...). The routines are inside BabyServices.DLL, but they are called from BContentServer.DLL. I will tell you more details in the following messages if you are interested.

Since the Library is completely free, and not GPL-ed, they cannot be blamed for performing a GPL violation, I suppose.

Now, take one BGL of the last type, remove the first 0x47 bytes, and save it with a .GZ extension. The new file must start with 0x1F. Then you can extract it with WinZip, and you can browse its uncompressed contents.
Not so bad, is it? There are many initial fields we must discover yet, though.

If you want to play reversing some more, put a breakpoint at 0x9B29AF, run Baby and "Install glossary from disk" as I told you at step (4).
You must land at this code
Code:

009B29AF lea ecx,[ebp-1030h] ; uncompressed buffer to be filled
009B29B5 push 1 ; number of bytes to uncompress
009B29B7 push ecx
009B29B8 mov ecx,dword ptr [ebp-1Ch]
009B29BB push ecx ; compression structure 64h bytes
009B29BC mov ecx,eax ; ZLIB object (Baby source is in C++)
009B29BE call dword ptr [edx+18h] ; inflate

Execute the whole subroutine and you will find in the buffer the first uncompressed byte, 60 in my case. Try to discover the meaning of that value...
I stop here at the moment... no more time.

Best regards, bilbo

P.S. JMI, I don't know if I can go on. Maybe the subforum is not correct, the matter is against rules, nobody else is interested, etc. etc.
Please let me know...

JMI
04-20-2005, 12:20 PM
Seems OK so far. Go for it.

Regards,

acidmelt
04-20-2005, 01:48 PM
bilbo that is some awesome information!

here are my finding:
to my surprise, after decompression the resulting files dont require any further decryption.. after scrolling a bit (offset 0xC47 at the eng_eng dictionary) you can see simple html tags and inbetween them are the definitions
try changing the extension of the uncompressed file to html

i have created a simple glossary with only 3 words to figure out the way that the definitions are aligned:
TERM 0x000C DEFINITION 0x101809 TERM 0x000C and so on..
however this is different in 12340001 bgls.. ill further analyse them.

the byte at offset 0x5 points to the begining of the gzip header, convenient
the gzip header of 12340001 files starts at 0x47 (as you said).
the gzip header of 12340002 files starts at 0x69.

on 12340003 (*.bdc) files however this is not the case.. these files seem to be uncompressed and it seems that their format is similar to the old *.dic.

p.s im stupid, i totaly forgot about babylons ocr capabilities.

thank you bilbo

bilbo
04-21-2005, 12:28 PM
Hello, acidmelt, and everyone interested (nobody seems to be...),

Quote:

try changing the extension of the uncompressed file to html

ok, but that is just a resource... the whole dictionary is not HTML format
Quote:

the byte at offset 0x5 points to the begining of the gzip header, convenient

great... I would say at offset 0x4, though, because all the Baby entities are in big-endian form (the high byte first, read on)
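
Read that way, the offset lookup might look like the following. A sketch only: it assumes bytes 4 and 5 hold a 16-bit big-endian offset, and it checks for the standard gzip magic bytes 1F 8B at that position; `gzip_offset` is an illustrative name, not something taken from Babylon.

```c
#include <stddef.h>

/* Returns the offset of the gzip stream inside a BGL image, or -1 if the
   header does not look as expected.
   Assumption: bytes 4..5 are a 16-bit big-endian offset. */
long gzip_offset(const unsigned char *file, size_t len)
{
    size_t off;

    if (len < 6) return -1;
    off = ((size_t)file[4] << 8) | file[5];
    if (off + 2 > len) return -1;                 /* offset points past the data */
    if (file[off] != 0x1F || file[off + 1] != 0x8B)
        return -1;                                /* no gzip magic at that offset */
    return (long)off;
}
```

For a 12340001 file the two bytes would be 00 47, giving 0x47; for a 12340002 file, 00 69.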
Quote:

the gzip header of 12340002 files starts at 0x69

great! one point to you!

And now the step for today...

I started from the address I told you yesterday and I have reversed some stuff here and there (sub_9B1DC0 and related ones). These are my findings.

The uncompressed file is a collection of records.
Every record has a one-byte header.
The low nibble is the record type.
The high nibble holds indication of the record length, with the following rule:

high nibble>=4: subtract 4; that is the length
high nibble <4: add 1: that is the number of bytes for the following length (in big-endian format)

As for the record types:
0 - one-byte specifier will follow, and the data next
1 - this is an entry: the entry name will follow as a string preceded by one byte for length, and the definition next
2 - this is a named resource: the resource name will follow as above (e.g xxx.bmp, xxx.html) (and the data next)
3 - two byte specifier will follow, and the data next
4/6 - no specifier, 0 bytes of data - type 6 is at end

But I hate the theory, so here is a little program which will scan the whole uncompressed file.
I have tried it successfully on a little BGL: Code Analysis, at http://info.babylon.com/gl_index/gl_template.php?id=46760

Code:

#include <stdio.h>
#include <string.h>
#include <errno.h>

void
main(int argc, char **argv)
{
char resname[256];
unsigned char hdr, high_nibble, lenbyte;
unsigned char specifier[2];
int i, record_length;
FILE *fpin;
long curpos, datapos;

if (argc != 2) {
printf("usage: %s uncompressed_filename\n", argv[0]);
return;
}

fpin = fopen(argv[1], "rb");
if (!fpin) goto ko;

// a record per loop
while (1) {
curpos = ftell(fpin);
fread(&hdr, 1, sizeof(hdr), fpin);
if (feof(fpin)) return;

// get the record size
high_nibble = hdr >> 4;
if (high_nibble >= 4) record_length = high_nibble - 4;
else for (i=record_length=0; i<high_nibble+1; i++) { // high_nibble+1 length bytes, big-endian
record_length *= 256;
fread(&lenbyte, 1, sizeof(lenbyte), fpin);
record_length += lenbyte;
}
datapos = ftell(fpin);

switch (hdr & 0xF) { // low nibble

// note: the <...> labels in the format strings were lost from the post; the ones here are stand-ins
case 0: // one-byte specifier follows
fread(specifier, 1, 1, fpin);
printf("@%lx: <spec %x> %x bytes\n",
curpos, specifier[0], record_length);
break;
case 3: // two-bytes specifier follows
fread(specifier, 1, 2, fpin);
printf("@%lx: <spec %x> %x bytes\n",
curpos, specifier[0]*256+specifier[1], record_length);
break;
case 4: // no specifier
case 6: // no specifier
printf("@%lx: <type %x> %x bytes\n",
curpos, hdr&0xF, record_length);
break;

case 2: // named resource
fread(&lenbyte, 1, sizeof(lenbyte), fpin);
fread(resname, 1, lenbyte, fpin);
printf("@%lx: <resource \"%.*s\"> %x bytes\n",
curpos, lenbyte, resname, record_length);
break;

case 1: // entry
fread(&lenbyte, 1, sizeof(lenbyte), fpin);
fread(resname, 1, lenbyte, fpin);
printf("@%lx: <entry \"%.*s\"> %x bytes\n",
curpos, lenbyte, resname, record_length);
break;

default:
printf("unexpected low_nibble %x\n", hdr & 0xF);
return;
}
fseek(fpin, datapos+record_length, SEEK_SET);
}

return;
ko:
printf("exit due to error %d: %s\n", errno, strerror(errno));
}


We need only to understand the meaning of the specifiers...
Best regards, bilbo

dELTA
04-22-2005, 04:20 AM
Nice work as always bilbo.

Quote:
and everyone interested (nobody seems to be...)
Sure we are, just lurking. Keep up the good work.

acidmelt
04-22-2005, 04:56 AM
hey bilbo!

again thats plenty of great information.. thank you

i wrote a little program to explore uncompressed bgls based on your code

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <conio.h>
#include <windows.h>

int isvalidchar(char ch);
void stripjunk(char *buffer);

struct bdc {
char szTerm[256];
char szDefinition[256];
} **babyterm[27];
int ptrcnt[27]; //sorted

void main(int argc,char **argv) {
FILE *fdic;
int ix,iy,rec_length;
unsigned char hdr,high_nibble,lenbyte,tmpch;
unsigned long datapos;
char tmpbuff[256];
char uterm[256];
int bg,eg;

if(argc!=2) { printf("usage: %s uncompressed_filename\n", argv[0]); return; }
//initial allocation of pointers
for(ix=0;ix<27;ix++) {
babyterm[ix]=(struct bdc**)malloc(sizeof(struct bdc*));
ptrcnt[ix]=0;
}
//>>parsing
fdic=fopen(argv[1],"rb");
if(!fdic) { printf("error opening file [%s].\n",argv[1]); return; }
bg=GetTickCount();
while(1) {
fread(&hdr,sizeof(char),1,fdic);
if(feof(fdic)) break;

//get record size
high_nibble=hdr >> 4;
if(high_nibble>=4) rec_length=high_nibble-4;
else {
for(ix=rec_length=0;ix<high_nibble+1;ix++) {
rec_length*=256;
fread(&lenbyte,sizeof(char),1,fdic);
rec_length+=lenbyte;
}
}
datapos=ftell(fdic);

switch(hdr & 0xF) {
case 1: {
fread(&lenbyte,sizeof(char),1,fdic);
memset(tmpbuff,0,lenbyte+1);
fread(tmpbuff,sizeof(char),lenbyte,fdic);
if(!isalpha(tmpbuff[0])) break;
stripjunk(tmpbuff);
//printf("TERM [%s] -> \n",tmpbuff);
//>>allocating space for term struct
tmpch=tolower(tmpbuff[0])-'a';
babyterm[tmpch][ptrcnt[tmpch]]=(struct bdc*)malloc(sizeof(struct bdc));
if(babyterm[tmpch][ptrcnt[tmpch]]==NULL) {
printf(":O ran out of space.\n");
return;
}
strcpy(babyterm[tmpch][ptrcnt[tmpch]]->szTerm,tmpbuff);
//>>
fseek(fdic,1,SEEK_CUR); //definiton lenbyte is next
fread(&lenbyte,sizeof(char),1,fdic);
memset(tmpbuff,0,lenbyte+1);
fread(tmpbuff,sizeof(char),lenbyte,fdic);
stripjunk(tmpbuff);
strcpy(babyterm[tmpch][ptrcnt[tmpch]]->szDefinition,tmpbuff);
//printf("DEF [%s]\n",tmpbuff);
ptrcnt[tmpch]++;
} break;
default: break;
}
fseek(fdic,datapos+rec_length,SEEK_SET);
}
eg=GetTickCount();
fclose(fdic);

printf("total parsing time: %dms\n",eg-bg);

printf("--------------------------\n");
for(ix=0;ix<27;ix++) {
if(ptrcnt[ix]>0) {
for(iy=0;iy<ptrcnt[ix];iy++) {
printf("--\n[%s][%s]\n",babyterm[ix][iy]->szTerm,babyterm[ix][iy]->szDefinition);
if(getch()==27) goto takeinp;
}
}
}
printf("--------------------------\n\n");
takeinp:
for(;;) { //input loop
memset(uterm,0,256);
printf("Term:");
scanf("%255s",uterm); //255, so the terminating NUL still fits in uterm[256]
if(uterm[0]) {
tmpch=tolower(uterm[0])-'a';
for(ix=0;ix<ptrcnt[tmpch];ix++)
if(!strcmpi(babyterm[tmpch][ix]->szTerm,uterm))
printf("%s = \n%s\n",uterm,babyterm[tmpch][ix]->szDefinition);
}
}
}

void stripjunk(char *buffer) {
int ix,slen;
slen=strlen(buffer);

for(ix=1;ix<slen;ix++) if(buffer[ix]=='$') { buffer[ix]=0; break; }
slen=ix;
for(ix=0;ix<slen;ix++) if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }

}

int isvalidchar(char ch) {
int ix;
char valtab[]="abcdefghijklmnopqrstuvwxyz 0123456789!@#$%&8()_-+=|{}[]<>\"',.%%/\\:;!?";
ch=tolower(ch);

for(ix=0;(unsigned)ix<strlen(valtab);ix++) if(ch==valtab[ix]) return 1;
return 0;
}


however whats about the rest of the data?
it seems as if the uncompressed files have some sort of header?


i just took a look at some *.gls and i belive our goal is completed (well you did most of the work so kudos to you)

the format is really simple:
### Glossary title:testTitle
### Author:testAuthor
### Description:testGlossDescription
### Source language:English
### Source alphabet:Default
### Browsing enabled?No
### Type of glossary:00000000
### Case sensitive words?0
### Glossary section:

test1
meaning1

test2
meaning2

test3
meaning3
--------
using the code above it is really easy to produce gls's..

bilbo
04-22-2005, 10:52 AM
Good, acidmelt, you added some indexing feature (dynamic array 'babyterm'), but... there is a bug...

You initialized babyterm[ix] just at init time with only one pointer in it! In this way the entries are overwritten as they grow.
You can remove the whole initialization loop, but you must add, before every new entry allocation, a resizing of the **babyterm array:
Code:

babyterm[tmpch] = (struct bdc**)realloc(babyterm[tmpch],
(ptrcnt[tmpch]+1)*sizeof(struct bdc*));
babyterm[tmpch][ptrcnt[tmpch]] = (struct bdc*)malloc(sizeof(struct bdc));

instead of the simple
Code:

babyterm[tmpch][ptrcnt[tmpch]] = (struct bdc*)malloc(sizeof(struct bdc));

By the way, realloc will work also the first time, when the area to reallocate has address 0.

Ok.
And you removed a lot of things: not just spaces in the source, I see, you don't like spaces ;) but also non ASCII characters which are used as quotes or underscores, etc. If you try your program on the BGL I suggested, many definitions are cut.

That's all for this weekend, I have other things to do...
A simple addition would be to integrate ZLIB in the program in order to uncompress the file automatically...

Best regards, bilbo

P.S. thx dELTA (and acidmelt) for appreciation...
P.P.S. I suggest to have a look at the dictionary I linked in my previous message, there is also something for Fravia
Quote:

+Fravia: One of the best reverser in the world. Founder of +Fravia's Pages of Reverse Engineering

and for our friend Zero
Quote:

Universitas Virtualis: Free knowledge project which provides a professional place for Algorithms, Software-Engineering, Software-Protection and Reverse Code Engineering, Cryptography and Cryptanalysis.

acidmelt
04-24-2005, 04:45 AM
hey bilbo!

thanks for the corrections
in my previous code i have ignored some important details which made the parsing crippled.. anyways here is a fixed code incorporating zlib, so there is no need to manually unpack bgls
Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <windows.h>
#include "zlib.h"

#pragma comment(lib,"zlib.lib")

int isvalidchar(char ch);
void stripjunk(char *buffer,char type);
int focc(char *cstr,char ch);
int uncomp_bgl(char *bglname,char *datname);
int writegls(char *datname);

char glsheader[1024];
char glsheadertemplate[]=
"### Glossary title:%s\r\n"
"### Author:%s\r\n"
"### Description:%s\r\n"
"### Source language:English\r\n"
"### Source alphabet:Default\r\n"
"### Target language:English\r\n"
"### Target alphabet:Default\r\n"
"### Browsing enabled?No\r\n"
"### Type of glossary:00000000\r\n"
"### Case sensitive words?0\r\n"
";gls generated by bglgls\r\n\r\n"
"### Glossary section:\r\n\r\n";

int main(int argc,char **argv) {
int ix;
char szAuth[32];
char szTitle[32];
char szDescription[128];
char datfname[128];

if(argc!=2) {
printf("usage: bglgls.exe filename.bgl\n");
return 0;
}
//>get input
printf("gls Author:");
fgets(szAuth,32,stdin);
printf("gls Title:");
fgets(szTitle,32,stdin);
printf("gls Description:");
fgets(szDescription,128,stdin);

szAuth[strlen(szAuth)-1]=0;
szTitle[strlen(szTitle)-1]=0;
szDescription[strlen(szDescription)-1]=0;
sprintf(glsheader,glsheadertemplate,szTitle,szAuth,szDescription); //order must match the template: title, author, description
//>set output filename
strncpy(datfname,argv[1],128);
ix=focc(datfname,'.');
if(ix<0) { printf("invalid filename\n"); return 0; }
datfname[ix]=0;
strcat(datfname,".dat");
//>>
if(!uncomp_bgl(argv[1],datfname)) { printf("error uncompressing BGL.\n"); return 0; }
if(!writegls(datfname)) { printf("error writing GLS.\n"); return 0; }
return 0;
}
//>>uncompression routine
int uncomp_bgl(char *bglname,char *datname) {
FILE *ztmp;
FILE *zfile;
char iobuff[128];
char tmppath[256];
char tmpfname[256];
unsigned char zptrbyte;
int tread;

//get temp filename
GetTempPath(256,tmppath);
GetTempFileName(tmppath,"bgl",0,tmpfname);
ztmp=fopen(tmpfname,"wb");
if(!ztmp) return 0;
//>
zfile=fopen(bglname,"rb");
if(!zfile) return 0;
fseek(zfile,0x5,SEEK_SET);
fread(&zptrbyte,sizeof(char),1,zfile);
printf("zlib header@0x%X\n",zptrbyte);
fseek(zfile,zptrbyte,SEEK_SET);
while(!feof(zfile)) {
tread=fread(iobuff,sizeof(char),128,zfile);
fwrite(iobuff,sizeof(char),tread,ztmp);
}
fclose(zfile);
fclose(ztmp);
//>>uncompressing >
zfile=fopen(datname,"wb");
ztmp=gzopen(tmpfname,"rb");
if(!zfile||!ztmp) return 0;
while(!gzeof(ztmp)) {
tread=gzread(ztmp,iobuff,128);
fwrite(iobuff,sizeof(char),tread,zfile);
}
gzclose(ztmp);
fclose(zfile);
DeleteFile(tmpfname); //get rid of temporary file
return 1;
}
//write gls
int writegls(char *datname) {
FILE *fdic,*fgls;
int ix,rec_length;
int lenword; //int, not short: a length of 0x8000 or more would otherwise go negative
unsigned char hdr,high_nibble,lenbyte;
unsigned char lenmul,lenadd;
unsigned long datapos;
char tmpbuff[1024];
char glsf[256];
int tt=0,lt=0;

//gls filename
strcpy(glsf,datname);
ix=focc(glsf,'.');
glsf[ix]=0;
strcat(glsf,".gls");
printf("gls filename:%s\n",glsf);
fgls=fopen(glsf,"wb");
if(!fgls) return 0;
//>write header
printf("writing GLS");
fwrite(glsheader,sizeof(char),strlen(glsheader),fgls);
//>>parsing
fdic=fopen(datname,"rb");
if(!fdic) return 0;
while(1) {
fread(&hdr,sizeof(char),1,fdic);
if(feof(fdic)) break;

//get record size
high_nibble=hdr >> 4;
if(high_nibble>=4) rec_length=high_nibble-4;
else {
for(ix=rec_length=0;ix<high_nibble+1;ix++) {
rec_length*=256;
fread(&lenbyte,sizeof(char),1,fdic);
rec_length+=lenbyte;
}
}
datapos=ftell(fdic);

switch(hdr & 0xF) {
case 1: {
fread(&lenbyte,sizeof(char),1,fdic);
memset(tmpbuff,0,1024);
fread(tmpbuff,sizeof(char),lenbyte,fdic);
if(!isalpha(tmpbuff[0])) break;
stripjunk(tmpbuff,0);
strcat(tmpbuff,"\r\n");
fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);
fread(&lenmul,sizeof(char),1,fdic);
fread(&lenadd,sizeof(char),1,fdic);
memset(tmpbuff,0,1024);
lenword=lenmul*256+lenadd;
if(lenword>1019) lenword=1019;
fread(tmpbuff,sizeof(char),lenword,fdic);
stripjunk(tmpbuff,1);
strcat(tmpbuff,"\r\n\r\n");
fwrite(tmpbuff,sizeof(char),strlen(tmpbuff),fgls);
if(tt-100==lt) { lt=tt; printf("."); }
tt++;
} break;
default: break;
}
fseek(fdic,datapos+rec_length,SEEK_SET);
}
fclose(fdic);
fclose(fgls);
DeleteFile(datname); //we dont need the *.dat anymore..
printf("%d terms written to file!\n",tt);
return 1;
}
//find occurrence
int focc(char *cstr,char ch) {
int ix;
for(ix=0;(unsigned)ix<strlen(cstr);ix++) if(cstr[ix]==ch) return ix;
return -1;
}
//>
void stripjunk(char *buffer,char type) {
int ix,slen;
slen=strlen(buffer);

if(!type) {
for(ix=1;ix<slen;ix++) if(buffer[ix]=='$') { buffer[ix]=0; break; }
slen=ix;
}
for(ix=0;ix<slen;ix++) if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }
}
//valid term/definition char
int isvalidchar(char ch) {
int ix;
char valtab[]="abcdefghijklmnopqrstuvwxyz 0123456789!@#$%&8()_-+=|{}[]<>\"',.%%/\\:;!?";
ch=tolower(ch);
for(ix=0;(unsigned)ix<strlen(valtab);ix++) if(ch==valtab[ix]) return 1;
return 0;
}


i have tested it with the code_analysis bgl that you suggested and it now works perfectly. i have also tested it with bablyons english_english dictionary (since it is the largest (18mb unpacked)) and it works really well.

though something is missing and i couldnt figure it out.. babylon shows part-of-speech for each word and id guess that this information is stored in a table somewhere inside the bgl.. thats the last piece missing i believe.

heres a binary (http://www.woodmann.com/forum/attachment.php?attachmentid=1230&stc=1) compiled and linked with zlib.

hrmprog
05-20-2008, 12:08 AM
hi
i tried to use above code with unicode BGL but output file didn't complete
which part of this code should be corrected?

dELTA
05-20-2008, 04:44 AM
The one that fails. And why don't you debug it and tell us which one that is?

hrmprog
05-20-2008, 07:53 AM
by use of above code with unicode BGL, in output file, there isn't any unicode letter and only english letter will appear. i try to conver english to farsi BGL, but in output file only english word appear.

dELTA
05-20-2008, 04:11 PM
Windows uses special APIs to handle unicode strings, you must integrate these into the existing source code.

szereshki
10-07-2008, 03:45 AM
hi. I have the same problem as hrmprog and waiting a long time to an answer in this post. But this seems not to be continued. in fact the main guys didn't go here since 2005!
Many of the babylon BGLs are in unicode and so its very important to be able to handle unicode BGLs as well. I have little information in C coding and no success in manupulating acidmelts code for unicode. would someone please help me how to modify his code for unicode BGLs?
dELTA should be right. But it's in theory. Thanks to acidmelt, the code is presented above. it will be appreciated if someone put the unicode corrected code here. thx

bilbo
10-07-2008, 11:42 PM
What is the release of Babylon you are referring to (7.0 is out) and what is an example of unicode BGL? It is a long time since I used Babylon and it has become more and more commercial...
Anyway, some new activity on the target could be interesting... But be prepared to give your contribute: if you do not know C, you can use ASM as well!
Best regards, bilbo

szereshki
10-08-2008, 03:09 AM
i'm using v5. but it doesn't matter. cuz the BGLs should work on the new versions as the old ones. Better working on the new version ofcorse. I tried a little farsi BGL file which is attached.
the unicode (farsi) words dont appear in the output file of acidmelt code.
thx for help

bilbo
10-12-2008, 12:05 AM
szereshki, I looked at the file you posted; it has exactly the same format as the files we were talking about three years ago... The problem is that the data are discarded by the conversion program because they are not valid ASCII characters.
Let's see for example the first definition, taken from the uncompressed dictionary:
Code:

00000ED5 1241 6273 6F72 7074 696F 6E20 636F 7374 .Absorption cost
00000EE5 696E 6700 0FE5 D2ED E4E5 20ED C7C8 ED20 ing....... ....
00000EF5 CCD0 C8ED ....

First byte (12) is the length of the first part of the entry (the term); after 12h bytes ("Absorption costing") you will find the length of the second part, on two bytes in big-endian order (00 0F). And finally the Unicode stuff follows, 15 bytes. The strange thing is that the count is not even (2 bytes per character). Can you interpret this stuff ("E5 D2 ED E4 E5 20 ED C7 C8 ED 20 CC D0 C8 ED"), or can you provide a GLS source with the corresponding compiled BGL file?
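
The layout just described can be walked in memory as in the sketch below. It assumes the payload starts at the term-length byte (that is, after the record header and record length have already been consumed); `struct entry_view` and `parse_entry` are illustrative names. The function only hands back pointers into the caller's buffer, so nothing is copied or cut to a fixed size.

```c
#include <stddef.h>

/* Hypothetical view of one entry; both pointers refer into the input buffer. */
struct entry_view {
    const unsigned char *term;
    size_t term_len;
    const unsigned char *def;
    size_t def_len;
};

/* Layout: [1-byte term length][term][2-byte big-endian def length][def].
   Returns 0 on success, -1 if the payload is too short for what it claims. */
int parse_entry(const unsigned char *p, size_t len, struct entry_view *out)
{
    size_t pos = 0;

    if (len < 1) return -1;
    out->term_len = p[pos++];
    if (len - pos < out->term_len + 2) return -1;   /* term + length field */
    out->term = p + pos;
    pos += out->term_len;

    out->def_len = ((size_t)p[pos] << 8) | p[pos + 1];
    pos += 2;
    if (len - pos < out->def_len) return -1;
    out->def = p + pos;
    return 0;
}
```

Fed the bytes from the dump, this gives an 18-byte term and a 15-byte definition starting with E5.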

Best regards, bilbo

szereshki
10-13-2008, 08:22 AM
Dear Bilbo, you are right.
Code:
The strangeness is that they are not even (2 bytes per character)

Because its not a unicode stuff. Im sorry. I tried removing the first 0x47 bytes and extracting gz file to a html again. I opened it with ie and found the encoding should be on arabic not unicode. indeed the problem is why your program discard this codes and how should not?
many thanks

bilbo
10-15-2008, 06:49 AM
Quote:
[Originally Posted by szereshki]Because its not a unicode stuff

Yeah! Simple indeed!
If we launch "charmap" selecting Arial and we select "Windows: Arabic" we will see exactly the codes E5 D2 ED... in arabic chars!
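
The same lookup can be written down as a tiny table. A sketch under stated assumptions: it covers only the byte values that occur in the dump, using the Windows-1256 (Arabic) code page, and `cp1256_partial` is a made-up name. A real converter would need the full table, or on Windows a call such as MultiByteToWideChar with code page 1256.

```c
/* Partial Windows-1256 to Unicode mapping, limited to the bytes in the dump.
   Returns the Unicode code point, or 0 for any byte not in this small table. */
unsigned int cp1256_partial(unsigned char b)
{
    switch (b) {
    case 0x20: return 0x0020; /* space */
    case 0xC7: return 0x0627; /* ARABIC LETTER ALEF */
    case 0xC8: return 0x0628; /* ARABIC LETTER BEH */
    case 0xCC: return 0x062C; /* ARABIC LETTER JEEM */
    case 0xD0: return 0x0630; /* ARABIC LETTER THAL */
    case 0xD2: return 0x0632; /* ARABIC LETTER ZAIN */
    case 0xE4: return 0x0646; /* ARABIC LETTER NOON */
    case 0xE5: return 0x0647; /* ARABIC LETTER HEH */
    case 0xED: return 0x064A; /* ARABIC LETTER YEH */
    default:   return 0;      /* not covered by this partial table */
    }
}
```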

Quote:
[Originally Posted by szereshki]Indeed the problem is why your program discard this codes and how should not?

Simply remove the following check:
Code:
for(ix=0;ix<slen;ix++) if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }


Best regards. bilbo

szereshki
10-17-2008, 11:41 AM
Many thanks Bilbo. I'll go through checking it.
You are great as you know so much in Reversing, and greatest as an F1 for me and the other newbies.

Bigal
10-24-2008, 06:27 AM
Quote:
[Originally Posted by bilbo;77380]Yeah! Simple indeed!
If we launch "charmap" selecting Arial and we select "Windows: Arabic" we will see exactly the codes E5 D2 ED... in arabic chars!


Simply remove the following check:
Code:
for(ix=0;ix<slen;ix++) if(!isvalidchar(buffer[ix])) { buffer[ix]=0; break; }


Best regards. bilbo


Hi, congrats on the excellent work. I am not a C programmer (just some Perl hacking). Nevertheless I have tried to compile your code with different compilers but I am always getting compile errors apparently related to zlib. Anyway, it would be great if you could post the binary file without the above piece of code, which apparently gives problems whenever there are Unicode chars (also accents, umlauts, etc).

Maybe you could also add some input parameters too. That would surely be great.

In any case, thanks again for your great work.

szereshki
10-25-2008, 03:53 AM
You are right Bigal. I also had some problems and finally unpacked the previously posted binary and changed the asm code in olly. the new problem...
(I'll post some images later) there is also some encoding problems. the reconverted BGL have all the chars, but it didn't follow the encoding. I meen it doesnt show the true charecters, arabic win in my case.
maybe I should explain more. I'll post two images of the original BGL and the reconverted one and also the new binary one week later.

szereshki
10-25-2008, 05:29 AM
The original bgl result is attached (Farsi with windows Arabic encoding). The converted one has all the letters, but with wrong encoding. The source and target languages in the glossary properties tab is selected as English although. Changing from English will result in a more defective output.
I also attached the changed binary file. It seems there is just one step more. Bilbo! Its your turn again!
(p.s. I also tried these on Babylon v5)

bilbo
10-28-2008, 01:39 AM
Quote:
[Originally Posted by Bigal;77467]I have tried to compile your code with different compilers but I am always getting compile errors apparently related with zlib.

ZLIB must be downloaded apart
Quote:
[Originally Posted by szereshki]I also had some problems and finally unpacked the previously posted binary and changed the asm code in olly

great approach: if you can find the spot to patch, you already know 'C' well! but why don't you try some free C compiler? see for example http://www.thefreecountry.com/compilers/cpp.shtml

Anyway, since I do not know Arabic (I tried to learn it when I was young but I have forgotten everything) and I don't know arabic/farsi Windows, could you please post both BGL and GLS files, not just a BGL like ahsan.zip?

Best regards, bilbo

szereshki
10-29-2008, 06:28 AM
The original sample BGL and its converted GLS, and also some recreated BGLs with different settings is attached:

bilbo
11-01-2008, 01:23 AM
szereshki,

what is interesting is not the broken BGLs (you posted 1.BGL, 2.BGL, 3.BGL) nor the reconstructed GLS which, as you said, does not work, but the original GLS used to generate the working BGL.
Only in this way can we - me, or you - compare the original GLS with the reconstructed GLS and see what is different!

Anyway, here is another homework for you... I suspect that the problem is no more in the data contents, but in the initial lines of the reconstructed GLS.
Code:
### Source language:English
### Source alphabet:Default
### Target language:English
### Target alphabet:Default


Try editing them with an ascii editor (e.g. replacing English or Default with Arabic), and see what happens...
Best regards... bilbo

szereshki
11-01-2008, 12:11 PM
Bilbo,

I have posted the original GLS and BGL as well (Read redme>ahsan.gls). although I know you are busy.
You are right as always. Changing the target alphabet from default to Arabic (or Farsi) solved the problem. Your work seems to be completed now. Or someone could add some functionality to the C source to consider this.
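
One way the C source could take this into account is to stop hard-coding the language and alphabet lines. The following is a rough sketch, not the converter's actual code: `struct gls_header` and `format_gls_header` are hypothetical names, the field labels are the ones quoted in this thread, and whether a given alphabet value (such as Arabic) is accepted by the builder is an assumption based on the test described above.

```c
#include <stdio.h>

/* Hypothetical bundle of header values supplied by the caller. */
struct gls_header {
    const char *title, *author, *description;
    const char *src_lang, *src_alphabet;
    const char *dst_lang, *dst_alphabet;
};

/* Formats the GLS header into out (capacity cap). Returns what snprintf
   returns: the length needed, or a negative value on error. */
int format_gls_header(char *out, size_t cap, const struct gls_header *h)
{
    return snprintf(out, cap,
        "### Glossary title:%s\r\n"
        "### Author:%s\r\n"
        "### Description:%s\r\n"
        "### Source language:%s\r\n"
        "### Source alphabet:%s\r\n"
        "### Target language:%s\r\n"
        "### Target alphabet:%s\r\n"
        "### Browsing enabled?No\r\n"
        "### Type of glossary:00000000\r\n"
        "### Case sensitive words?0\r\n"
        "\r\n"
        "### Glossary section:\r\n\r\n",
        h->title, h->author, h->description,
        h->src_lang, h->src_alphabet, h->dst_lang, h->dst_alphabet);
}
```

A caller would fill in `dst_alphabet` with "Arabic" for a glossary like the one discussed here and keep "Default" otherwise.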
Best wishes for you

afree
11-03-2008, 04:09 AM
Hi,
Has anyone compiled it with these new changes (to work with arabic), and if Yes can he post it. I just can't Compile it

bilbo
11-03-2008, 11:54 PM
Quote:
[Originally Posted by afree]I just can't Compile it

Please don't be so categoric! Get a free compiler, get ZLIB.LIB (first hit in Google) and you too will be able to compile it.

Best regards, bilbo

afree
11-04-2008, 02:58 AM
I worked a little bit in C, but I almost forgot it all.
Any way, I did manage to compile it, but something doesn't work
Program starts, reads data(I think) but it doesnt write anything except for the header to the file. I will take a look at it later

szereshki
11-04-2008, 06:52 AM
Hi, please read the previous post carefully before sending such questions. I also wants you to compile and work on it yourself. but I have sent the compiled file before (#25). Use it and after you got the GLS edit it and change the default lang to arabic.

dreamer155
11-20-2008, 03:29 PM
hi
first of all, i wanna thank for this superb code and app. it's been very useful for me. i got plenty of babylon dictionaries, some of them really big. Yesterday, i noticed something regarding the size of characters deflated. for some words, definitions sometimes exceed 1000 chars with all formatting tags eg. "

...." and these are cut off after 1000 chars, i think. is it possible to increase the buffer size or something to overcome this?

antico2
11-25-2008, 06:35 PM
Hi everyone,

I'm trying to compile acidmelt's code but something goes wrong.
I've discovered that #include <ctype.h> is missing (I'm using Dev-C++).
The other error that I cannot solve is:

line1> //>>uncompressing >
line2> zfile=fopen(datname,"wb");
line3> ztmp=gzopen(tmpfname,"rb");
line4> if(!zfile||!ztmp) return 0;

line3> error is: In function `int uncomp_bgl(char*, char*)': invalid conversion from `void*' to `FILE*'

I think it is a type cast error, but I'm not able to solve it.

Can anyone help me?

regards

antico2
12-01-2008, 11:04 AM
Quote:
[Originally Posted by antico2;77902]Hi everyone,

I'm triyng to compile acidmelt code but something goes wrong.
I've discovered that is missing #include <ctype.h> ( I'm using DEVC++ ).
The other error that I cannot solve is:

line1> //>>uncompressing >
line2> zfile=fopen(datname,"wb");
line3> ztmp=gzopen(tmpfname,"rb");
line4> if(!zfile||!ztmp) return 0;

line2> error is: In function `int uncomp_bgl(char*, char*)': invalid conversion from `void*' to `FILE*'

I think is a cast type error, but I'm not able to solve it..

Can anyone help me?

regards


That's all OK now...

I've solved the problem by adding a cast, ztmp=(FILE*)gzopen(tmpfname,"rb"), i.e.:

before: ztmp= gzopen(tmpfname,"rb"); > error
after: ztmp=(FILE*)gzopen(tmpfname,"rb"); > ok (the cast is not implicit in C++)

The other problem came up during the linking step in Dev-C++:

I had 4 linker errors:

[Linker error] undefined reference to `gzopen'
[Linker error] undefined reference to `gzeof'
[Linker error] undefined reference to `gzread'
[Linker error] undefined reference to `gzclose'
ld returned 1 exit status

To solve this problem, it is necessary to tell the linker where libz.a is located; otherwise the linker cannot resolve the zlib functions.

(In Dev-C++, go to project > option > linker and add the required file.)

bye

d8o8s8
12-01-2008, 03:31 PM
You go guys, great effort and good cause. (:
I'm gonna join you soon trying to kick the .bdc files.

szereshki
12-02-2008, 02:13 AM
You are absolutely right, dreamer. The buffer size is 1024 and everything after the first 1019 chars is cut. That may not be a big problem in the C code, but since I, like some other guys, couldn't compile the code and went to the ASM code, I don't know how to work around this. If only our melted guy (acidmelt) were here! And the busy guy, Bilbo, may have a suggestion.
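
One possible way around the fixed 1024-byte limit in the C version is to collect each definition in a buffer that grows on demand. This is only a sketch: dynbuf, dynbuf_append and dynbuf_free are hypothetical names, and how they would plug into the posted code depends on where that code copies the definition bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer (hypothetical, not from the posted source). */
typedef struct {
    char  *data;  /* NUL-terminated contents, or NULL when empty */
    size_t len;   /* bytes in use, not counting the terminating NUL */
    size_t cap;   /* bytes allocated */
} dynbuf;

/* Append n bytes from src, doubling the capacity as needed.
 * Returns 1 on success, 0 if memory runs out (buffer left unchanged). */
static int dynbuf_append(dynbuf *b, const char *src, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 1024;   /* start at the old limit */
        char *p;

        while (cap < b->len + n + 1)
            cap *= 2;
        p = realloc(b->data, cap);
        if (!p)
            return 0;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 1;
}

/* Release the storage and reset the buffer to the empty state. */
static void dynbuf_free(dynbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

A buffer declared as dynbuf b = {0}; can be filled chunk by chunk with dynbuf_append and written out once the entry is complete, so long definitions are no longer cut off.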

antico2
12-02-2008, 03:52 AM
Hi,

if you want, I can help with the development of the full version of the program. Tell me what to modify and I can do it (I have everything installed on my PC).

Bigal
12-02-2008, 04:29 AM
Quote:
[Originally Posted by antico2;78023]Hi,

if you want I can help in the development of the full version of program. Say me what to modify I can do it ( I've all installed on my pc..).


It would be great if you could help with that buffer, which is not big enough for all the characters. With some dictionaries an entry can be really big. It would also be great if you could produce a compiled version so that we could all test it. I had a lot of trouble trying to compile the sources, especially with ZLIB. In the end, after a lot of wasted time, I finally had to give up.

One more thing that would be great to have is decompiling of the bdc format.

Good luck and thanks a million.

antico2
12-02-2008, 05:13 AM
Quote:
[Originally Posted by szereshki;77490]The original bgl result is attached (Farsi with windows Arabic encoding). The converted one has all the letters, but with wrong encoding. The source and target languages in the glossary properties tab is selected as English although. Changing from English will result in a more defective output.
I also attached the changed binary file. It seems there is just one step more. Bilbo! Its your turn again!
(p.s. I also tried these on Babylon v5)


OK, I can make modifications to the file (but let me understand what to modify!!)
First of all, you can find a compiled version of it in the attached file in the post I've quoted. Try it, and afterwards we can talk about the modifications to make.

regards

Bigal
12-02-2008, 05:29 AM
Quote:
[Originally Posted by antico2;78025]Ok, I can make modification to the file ( but let me understand what to modify!! )
First of all you can find a compiled version of it in the atthached file I've quoted. Try it and after we speak about modification to do.

regards



Don't see any attachments. Am I missing something?

antico2
12-02-2008, 05:32 AM
go to the main address of the forum: http://www.woodmann.com/forum/ and log in from there

Bigal
12-02-2008, 07:09 AM
Quote:
[Originally Posted by antico2;78027]go to the main address of forum: http://www.woodmann.com/forum/ and make the login from there


I've done that but i still don't see your attachment

szereshki
12-02-2008, 07:22 AM
Bigal: Go directly to my post #25. I attached it there.
Antico2: How did you compile the C code? Any special compiler? Let us know about it. Thx

d8o8s8
12-02-2008, 08:23 AM
I found a simple small decompiler using google at
http://tankado.com/index.php?2008/06/21/281-babylon-bgl-decompiler
I checked and it worked with the older version of BGL file I had (didn't work for the new ones).
I also verified with filemon it doesn't mess around (still not taking responsibility, not my file).
Hope this helps.

------------------ EDIT: sorry, just realized its the same app as bglgls previously posted on this thread.

antico2
12-02-2008, 08:28 AM
Szereshki, I've used nothing special: simply DevC++ 4.9.9.2 and of course zlib 1.2.3, downloaded as a package from DevC++. The only things to remember are the casting problem (not implicit in C++, see my posts #35 and #36), the inclusion of ctype.h in the main and finally the linker problem, solved by pointing the DevC++ linker to the location (folder) of libz.a.
If it would be useful, I can post my entire DevC++ project.

regards