File Type Reference


by Jeff Connelly

The main goal of BAG archives is to provide a simple and efficient way to
combine many files into one. For that reason, the format is extremely simple.

The header is at the beginning of the file:
OFFSET  Count  TYPE   Description
0000h       3  char   ID='BAG'
0003h       2  char   Version, ID='11'

There is one file block for each file:
OFFSET  Count  TYPE   Description
0000h       1  dword  File length in bytes
0004h       1  byte   Filename length
????h       ?  char   Filename
????h       ?  char   File contents

At the very end of the file is:
OFFSET  Count  TYPE   Description
0000h       1  char   EOF marker, ID=1Ah
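
The layout above can be walked with a short reader. This is only a sketch
under stated assumptions: the byte order of the dword is not given here, so
little-endian is assumed, filenames are assumed to be plain ASCII, and the
function name read_bag is made up for illustration.

```python
import struct

def read_bag(data):
    """Parse a BAG archive held in a bytes object; return (version, entries)."""
    if data[:3] != b"BAG":
        raise ValueError("missing 'BAG' signature")
    version = data[3:5].decode("ascii")
    pos = 5
    entries = []
    # The last byte should be the 1A (hex) marker, so stop just before it.
    end = len(data) - 1
    while pos < end:
        # Assumption: the dword is little-endian (the format text does not say).
        (length,) = struct.unpack_from("<I", data, pos)
        name_len = data[pos + 4]
        pos += 5
        # Assumption: filenames are ASCII.
        name = data[pos:pos + name_len].decode("ascii")
        pos += name_len
        contents = data[pos:pos + length]
        pos += length
        entries.append((name, contents))
    if pos != end or data[end:] != b"\x1a":
        raise ValueError("missing or misplaced end-of-file marker")
    return version, entries
```

The reader stops one byte before the end rather than testing for the value
1A, because a file length whose low byte happens to be 1A would otherwise
look like the marker.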

A file length of zero in a file block indicates a special file that will be
used as a description. Directories are also stored in the filename field:
Filename='> dirname' means change current directory to 'dirname'.
For example,
    Filename='> JEFF'
    Filename='> COMPUTER'
    Filename='> BAG'
would make the directory JEFF\COMPUTER\BAG.
Files are put in the current directory.
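
The directory rule can be sketched as a small path resolver. It assumes that
each '> dirname' entry descends one level from the current directory, as the
example implies; how to move back up is not described here, so the sketch
does not attempt it. The name resolve_paths is hypothetical.

```python
def resolve_paths(filenames):
    """Yield (path, is_directory) for each filename field, in archive order."""
    current = []  # components of the current directory
    for name in filenames:
        if name.startswith("> "):
            # Assumption: '> dirname' always descends one level.
            current.append(name[2:])
            yield "\\".join(current), True
        else:
            # Ordinary files are placed in the current directory.
            yield "\\".join(current + [name]), False
```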

The file contents are not always raw data; they can be compressed.
An optional four-byte signature specifies the compression scheme:
---- Compression ----
LZW   LZW encoded (note extra space at end)
RLEn  RLE method n (1 to 4) encoded
HUFF  Huffman encoded
LZHF  LZHUF encoded (LZSS + Huffman)
  *NOTE* The authors of LZHUF do not allow using it for any commercial purpose.
LZAR  LZARI encoded (LZSS + Arithmetic)
WCOD  Word coded -- text only
If no signature is found it is assumed to be raw data.
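
Detecting the signature might look like the sketch below. It assumes the
signature occupies the first four bytes of the file contents and that the
'n' in RLEn is stored as an ASCII digit; neither detail is spelled out here.
The names SIGNATURES and split_signature are made up for illustration.

```python
SIGNATURES = {
    b"LZW ": "LZW",          # note the trailing space
    b"HUFF": "Huffman",
    b"LZHF": "LZHUF",
    b"LZAR": "LZARI",
    b"WCOD": "word coding",
}
# Assumption: the method number is an ASCII digit, '1' to '4'.
for n in "1234":
    SIGNATURES[b"RLE" + n.encode("ascii")] = "RLE method " + n

def split_signature(contents):
    """Return (scheme, payload); scheme is None when the data is raw."""
    sig = contents[:4]
    if sig in SIGNATURES:
        return SIGNATURES[sig], contents[4:]
    return None, contents
```

Raw data that happens to begin with one of these four-byte strings would be
misread as compressed by this check.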

The word coding compression is as follows:
* Initialize the dictionary (max. size FFFFh)
* Loop until end-of-file:
  - Read a space-delimited word from the input stream
  - Search for the word in the dictionary
  - If it is not there, add it at the first free location and output that
    location as a 16-bit integer.
  - If it is there, output its location as a 16-bit integer.
* Write the dictionary at the end of the file. Each word is null-terminated,
  and the whole dictionary is double-null terminated.
This means that each occurrence of a word is encoded as 2 bytes, and each
distinct word is stored once in the dictionary.
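
An encoder following these steps could be sketched as below. Several details
are assumptions rather than part of the description: 16-bit values are
written little-endian, dictionary locations start at 0, empty words produced
by repeated spaces are skipped, and the text is ASCII. The function name
word_encode is hypothetical, and the 'WCOD' signature is not prepended.

```python
import struct

def word_encode(text):
    """Encode text as 16-bit word indices followed by the dictionary."""
    dictionary = {}  # word -> location; dicts keep insertion order
    codes = bytearray()
    # Assumption: runs of spaces yield no empty words.
    for word in (w for w in text.split(" ") if w):
        if word not in dictionary:
            if len(dictionary) >= 0xFFFF:
                raise ValueError("dictionary full")
            dictionary[word] = len(dictionary)  # first free location
        # Assumption: little-endian 16-bit integers.
        codes += struct.pack("<H", dictionary[word])
    # Each word is null-terminated; one more null ends the dictionary.
    table = b"".join(w.encode("ascii") + b"\x00" for w in dictionary)
    return bytes(codes) + table + b"\x00"
```

For the input "the cat the dog" this produces the indices 0, 1, 0, 2
followed by the dictionary entries the, cat, dog.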

The last character is an end-of-file character, so that a receiver can check
that the whole file arrived when this archive is transferred.