I'm trying to follow the decoding in a BIOS decompression utility and have confirmed that the encoding is LHA (lh5). I could not follow the logic of the decoder, so instead I took the ROM module, which I had already decompressed by other means, and compressed it again with LHA. I'm running LHA under SoftICE in DOS mode (in a VM), and I have confirmed that LHA produces exactly the same compressed module from the uncompressed module as the one found in the ROM file. The only difference is that LHA adds a short header, which is missing in the ROM file.
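To compare LHA's output byte for byte with the ROM, the archive header can be stripped in code. The sketch below assumes LHA wrote a level-0 header (LHA can also write level 1 and 2 headers, which this sketch rejects). `parse_level0_header` is a hypothetical helper, not part of any LHA tool.

```python
import struct

def parse_level0_header(buf: bytes):
    """Split one LHA level-0 archive member into header fields and raw payload.

    Assumed level-0 layout: byte 0 = header size (not counting bytes 0-1),
    byte 1 = 8-bit sum of those header bytes, then method ID (e.g. "-lh5-"),
    packed size and original size (little-endian uint32), DOS time/date,
    attribute, level, filename length, filename, CRC-16.
    """
    size, checksum = buf[0], buf[1]
    header = buf[2:2 + size]
    if len(header) != size or sum(header) & 0xFF != checksum:
        raise ValueError("truncated header or checksum mismatch")
    if header[18] != 0:
        raise ValueError(f"header level is {header[18]}, not 0")
    packed, original = struct.unpack_from("<II", header, 5)
    name_len = header[19]
    info = {
        "method": header[0:5].decode("ascii"),
        "packed": packed,
        "original": original,
        "name": header[20:20 + name_len].decode("latin-1"),
    }
    payload = buf[2 + size:2 + size + packed]   # the raw compressed stream
    return info, payload
```

Usage would be something like `info, payload = parse_level0_header(open("MODULE.LZH", "rb").read())` (the file name is hypothetical), after which `payload` should match the compressed bytes stored in the ROM.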

I'm scratching my head over the different compression algorithms used in LHA, which was written by a Japanese author and apparently combines Huffman coding, Ziv-Lempel and LZSS. Can someone shed some light on this to get me started? First of all, with Huffman coding, a tree is apparently constructed that represents the probabilities of characters/words appearing in a file. Are the values in the tree independent of the data being compressed? That is, does Huffman-type decoding build a tree and then match data bytes against the probabilities in that tree?
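For reference, in classic Huffman coding the tree is built from symbol frequencies counted in the data being compressed, so it depends on the data. As far as I understand the format, lh5 builds such codes per block and stores the code lengths in the compressed stream so the decoder can rebuild the same tree. The minimal sketch below (function name hypothetical) computes Huffman code lengths from byte frequencies by repeatedly merging the two lightest subtrees.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte_value: code_length} for a Huffman code built from data's byte frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                      # a single symbol still needs a 1-bit code
        return {next(iter(freq)): 1}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth within this subtree})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees...
        w2, _, right = heapq.heappop(heap)
        # ...become children of a new node, so every symbol in them moves one level deeper
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

With `b"aaaabbc"` the frequent byte `a` gets a 1-bit code and `b` and `c` get 2-bit codes; a different input gives a different tree.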

In LHA, they seem to construct a tree out of nothing. The code initializes a 64 KB section of memory, puts a 1 at offset 0x0, and then operates on the table, deriving values somehow. I have seen the same thing done in an SHA-1 implementation as well. What exactly are they doing, and is the binary tree created independently of the data?
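For comparison, one table LHA does build without looking at the data is its CRC-16 lookup table (polynomial 0xA001, the reflected form of 0x8005). That table is only 256 16-bit entries (512 bytes), so the 64 KB region may well be something else, such as the encoder's match-search structures; that is an assumption to check in the debugger. A sketch of the table construction and its use:

```python
def make_crc16_table(poly: int = 0xA001) -> list:
    """Build the 256-entry lookup table for the reflected CRC-16 (0x8005 reflected = 0xA001)."""
    table = []
    for i in range(256):
        r = i
        for _ in range(8):                  # process the 8 bits of the index, LSB first
            r = (r >> 1) ^ poly if r & 1 else r >> 1
        table.append(r)
    return table

CRC16_TABLE = make_crc16_table()

def crc16(data: bytes, crc: int = 0) -> int:
    """Table-driven reflected CRC-16 over data, starting from crc."""
    for b in data:
        crc = CRC16_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc
```

The table depends only on the polynomial, never on the input, which is the kind of "derived from nothing" table that also shows up in hash and checksum code.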

When LHA loads the file to be compressed, it cuts out a chunk that is 0x2000 bytes in size. It takes the first byte and processes it against a value in a table; the second and following bytes are also processed before the compressed byte is formed. Does the app need to read all the data before it does the compression, and is the compression done in chunks of 0x2000 bytes?
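In LZSS, output is either a literal byte or a back-reference that copies bytes from a sliding window of recently seen data, so the compressor only needs a window's worth of history, not the whole file. The lh5 method uses an 8 KB (0x2000-byte) dictionary, which may be why 0x2000 keeps turning up. The sketch below expands a hypothetical token list; the real lh5 stream encodes literals and matches through Huffman tables, not Python tuples.

```python
WINDOW = 0x2000  # lh5 dictionary (sliding window) size: 8 KB

def lzss_decode(tokens):
    """Expand LZSS tokens: an int is a literal byte; a (distance, length) tuple
    copies `length` bytes starting `distance` bytes back in the output."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            distance, length = tok
            if not 1 <= distance <= min(WINDOW, len(out)):
                raise ValueError("match reaches outside the window")
            start = len(out) - distance
            for i in range(length):      # byte by byte, so overlapping copies work
                out.append(out[start + i])
    return bytes(out)
```

For example, `lzss_decode([97, 98, 99, (3, 6)])` expands three literals plus one overlapping match into `b"abcabcabc"`.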

I have not yet confirmed that this table is the one created from the empty 64 KB chunk of memory. The value 0x2000 is significant because the ROM utility references it independently during decompression: the decompression is done in units of 0x2000 bytes. The decompressor also seeds its initialization with the values 0x10 and 0x8.
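One unconfirmed possibility is that 0x10 and 0x8 are the width of the decoder's bit buffer (16 bits) and the number of bits taken from each input byte (8). Decoders in the style of Haruhiko Okumura's ar002, from which lh5 is commonly described as derived, read the compressed stream MSB-first through such a buffer. The class below is a hypothetical sketch of that idea, not code taken from the utility.

```python
class BitReader:
    """MSB-first bit reader that refills 8 bits at a time; reads up to 16 bits per call."""
    BUF_BITS = 0x10   # maximum bits returned by one getbits() call
    BYTE_BITS = 0x8   # bits taken from each input byte on refill

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.bitbuf = 0
        self.count = 0                   # number of valid, unread bits in bitbuf

    def _next_byte(self) -> int:
        b = self.data[self.pos] if self.pos < len(self.data) else 0  # pad with zeros past the end
        self.pos += 1
        return b

    def getbits(self, n: int) -> int:
        """Return the next n bits (0 <= n <= 16) as an unsigned integer."""
        if not 0 <= n <= self.BUF_BITS:
            raise ValueError("n must be between 0 and 16")
        while self.count < n:            # refill one byte at a time
            self.bitbuf = (self.bitbuf << self.BYTE_BITS) | self._next_byte()
            self.count += self.BYTE_BITS
        self.count -= n
        value = (self.bitbuf >> self.count) & ((1 << n) - 1)
        self.bitbuf &= (1 << self.count) - 1   # drop the bits just consumed
        return value
```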

There are many references in the LHA decompression code to values like 0x61, 0x72 and 0x77. At first, I thought that 0x61 was related to the ASCII letter 'a'. I'm curious about that because the ROM utility cannot process my ROM file: it contains data that is out of bounds. For a ROM file that can be decompressed, processing of the same data begins by subtracting 0x41 from the first data byte, followed by a series of DEC 1s and JMPs. My first data byte is far too high, so after 0x41 is subtracted the chain runs out of DEC 1s, leaving the app in an endless loop.
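If that SUB 41h / DEC / JMP sequence is a compiler-generated switch statement (an assumption), the chain tests the adjusted byte against each case in turn. The sketch below emulates that pattern with 8-bit wraparound; `dispatch` and `handlers` are hypothetical names. In this form an out-of-range byte simply falls through to the default branch, so an endless loop would point to what the utility does on that default path.

```python
def dispatch(first_byte: int, handlers: list) -> str:
    """Emulate a switch compiled as SUB AL,41h followed by a chain of DEC AL / JZ.

    handlers[k] handles the byte value 0x41 + k; anything else reaches 'default'.
    """
    al = (first_byte - 0x41) & 0xFF      # SUB AL,41h (8-bit wraparound)
    if al == 0:                          # JZ case_0
        return handlers[0]()
    for k in range(1, len(handlers)):
        al = (al - 1) & 0xFF             # DEC AL
        if al == 0:                      # JZ case_k
            return handlers[k]()
    return "default"                     # fell off the end of the chain
```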

I mention that only because 0x41 is also the ASCII character 'A', and I have seen that method used in apps to determine whether a byte is ASCII or not. That's why I wondered about the 0x61 that LHA uses.
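For reference, 0x41 and 0x61 are 'A' and 'a' (they differ only in bit 0x20), and 0x72 and 0x77 are 'r' and 'w'. The subtract-then-compare trick mentioned above usually looks like the sketch below: subtracting the first letter and comparing the result as an unsigned byte against 26 tests a whole letter range with one comparison. The function name is hypothetical.

```python
def classify(c: int) -> str:
    """Classify a byte (0..255) with the subtract-and-compare-unsigned range check."""
    if ((c - 0x41) & 0xFF) < 26:   # 0x41..0x5A, 'A'..'Z'
        return "upper"
    if ((c - 0x61) & 0xFF) < 26:   # 0x61..0x7A, 'a'..'z'
        return "lower"
    return "other"                 # bytes below the base wrap around to large values
```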