Topic: LZSS decompression... am I doing it right?

Schala Zeal

LZSS decompression... am I doing it right?
« on: September 21, 2017, 07:18:56 pm »
So I recently got into Rust, a systems programming language that's designed to rule out memory-safety bugs, data races, etc. at compile time. On a whim, I followed the wiki's explanation of Chrono Cross's LZSS algorithm and used the '2540.out' file from the CD to test my implementation.

I compared my output to decompression via Purple Cat Tools (PCT), and the decompressed files don't match. My output is 4765 bytes versus PCT's 2160 bytes, while the decompressed-size field in the file header says 2048. My file is mostly null (0 value) bytes with some data here and there, while PCT outputs a file filled with data.

I noticed PCT also bitswaps in 512-byte chunks (512 * 8 = 4096, so I'm guessing the data following header is the initial buffer content?), which wasn't mentioned on the wiki, so I did the same, with... not much luck. Below is the entire code for the decode function + the test. It should be similar to C/C++.

Code:
extern crate bitstream_io;
extern crate bit_reverse;

use bitstream_io::{BitReader,LE};
use bit_reverse::ParallelReverse;
use std::io::Cursor;

pub const BUFFER_SIZE: usize = 4096;
pub const LIT_SIZE: u32 = 8;
pub const PTR_SIZE: u32 = 12;
pub const VAL_SIZE: usize = 4;
pub const VAL_ADD: isize = 2;

pub fn decode(data: &[u8]) -> Vec<u8> {
    let mut ic = Cursor::new(&data);
    let mut br = BitReader::<LE>::new(&mut ic);

    let mut header: [u8; 4] = [0; 4];
    let _ = br.read_bytes(&mut header);
    assert_eq!(&header, b"sszl");

    let dsize = br.read::<u32>(32).unwrap_or(0) as usize;
    assert!(dsize != 0);

    let unknown = br.read::<u32>(32).unwrap_or(0);

    // bitswap every byte past the header
    let mut rdata = vec![0u8; data.len()-12];
    rdata.copy_from_slice(&data[12..]);
    for i in rdata.iter_mut() {
        *i = i.swap_bits();
    }
    let mut ic = Cursor::new(&rdata);
    let mut br = BitReader::<LE>::new(&mut ic);

    let mut outbuf = vec![0u8; dsize];
    let mut buf: [u8; BUFFER_SIZE] = [0; BUFFER_SIZE];
    let mut idx: usize = 0;
    while let Ok(is_val) = br.read_bit() { // TL;DR: while not end of the data
        if is_val == true {
            buf[idx % BUFFER_SIZE] = br.read::<u8>(LIT_SIZE).unwrap();
            idx += 1;
        } else {
            if let Ok(offs) = br.read::<u16>(PTR_SIZE) { // compiler tends to choke without the if-let guard on this part
                if let Ok(size) = br.read::<u8>(VAL_SIZE as u32) {
                    let sz = (size as isize) + VAL_ADD;
                    for i in 0..sz {
                        outbuf.push(buf[(offs as usize)+(i as usize)]);
                    }
                }
            }
        }
    }

    outbuf
}

#[test]
fn dec_test() {
    use std::fs::File;
    use std::io::prelude::*;

    let test_data = include_bytes!("2540.out");
    let mut f = File::create("test.dat").unwrap();
    let ucmp = decode(&test_data[..]);
    f.write_all(&ucmp).unwrap();
}
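
For comparison, the header read and the per-byte bit reversal can be sketched with the standard library alone. This is only a sketch under assumptions: the 12-byte layout (magic "sszl", a little-endian u32 size, a little-endian u32 of unknown meaning) is taken from the code above rather than from a verified format description, and the names Header and parse_header are hypothetical. Reversing the bits inside each byte gives the same result whether or not the data is processed in 512-byte chunks, assuming PCT's swap is also per byte.

```rust
/// Header fields as the posted code reads them (hypothetical names).
struct Header {
    decompressed_size: usize,
    unknown: u32,
}

/// Splits a file into its header and a bit-reversed copy of the payload.
/// Returns None if the file is too short or the magic does not match.
/// Assumption: 4-byte magic "sszl", then two little-endian u32 fields.
fn parse_header(data: &[u8]) -> Option<(Header, Vec<u8>)> {
    if data.len() < 12 || &data[..4] != b"sszl" {
        return None;
    }
    let size = u32::from_le_bytes([data[4], data[5], data[6], data[7]]);
    let unknown = u32::from_le_bytes([data[8], data[9], data[10], data[11]]);

    // Reverse the bit order inside every payload byte (0b0000_0001 -> 0b1000_0000).
    let payload: Vec<u8> = data[12..].iter().map(|b| b.reverse_bits()).collect();

    Some((
        Header {
            decompressed_size: size as usize,
            unknown,
        },
        payload,
    ))
}
```

Returning None instead of asserting lets the caller decide how to handle a file that is not in this format.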

OneWingedAngel

Re: LZSS decompression... am I doing it right?
« Reply #1 on: August 27, 2018, 08:01:29 pm »
Using the TT source (Lzss.cpp) as a reference, I can see three possible errors in the code you posted:

1) In the branch taken when the first bit indicates a byte literal (if is_val == true {), you write that byte to your temporary buffer buf but never put it in outbuf.  In contrast, the corresponding block of Lzss_Decompression in the TT code (if (type==OCTET)) stores the byte/character in the temporary buffer and also writes it to the outfile.

2) In the line that writes a byte/character from the buffer to outbuf (outbuf.push(buf[(offs as usize)+(i as usize)])), I think the index into buf needs % BUFFER_SIZE at the end.  As written, it looks like it could ask for an illegal element of buf whenever offs+i >= BUFFER_SIZE.

3) The line in the TT code that reads the buffer offset (offs in your code, position in theirs) strangely *decreases* it by 1 from what the bitreader returned (then takes it modulo 4096 to fix any -1 that results).  That seems really odd, but it's there.  Maybe there's something unusual about TT's bitreader (Bread_M in Binare.cpp).
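
The three points above can be sketched together on tokens that have already been parsed, which leaves the bitreader out of the picture. This is a sketch under assumptions, not TT's implementation: Token and decode_tokens are hypothetical names, offsets are treated as absolute positions in the 4096-byte buffer, the write position starts at 0, copied bytes are fed back into the buffer as in textbook LZSS, and the "minus 1" from point 3 is exposed as a bias parameter because the thread does not settle whether it is needed.

```rust
const BUFFER_SIZE: usize = 4096;

/// One already-parsed LZSS token (hypothetical type for this sketch).
#[derive(Clone, Copy)]
enum Token {
    /// A literal byte.
    Literal(u8),
    /// Copy `len` bytes starting at buffer position `offset`.
    Copy { offset: usize, len: usize },
}

/// Decodes tokens with a ring buffer.
/// `bias` is subtracted from every offset (modulo BUFFER_SIZE); pass 0 for
/// no adjustment or 1 to mimic the decrement described in point 3.
/// Assumes `bias <= BUFFER_SIZE`.
fn decode_tokens(tokens: &[Token], bias: usize) -> Vec<u8> {
    let mut out = Vec::new();
    let mut ring = [0u8; BUFFER_SIZE];
    let mut pos = 0usize; // next write position in the ring buffer

    for token in tokens {
        match *token {
            Token::Literal(byte) => {
                // Point 1: the literal goes to the buffer AND the output.
                ring[pos] = byte;
                pos = (pos + 1) % BUFFER_SIZE;
                out.push(byte);
            }
            Token::Copy { offset, len } => {
                // Point 3: optional offset adjustment, kept non-negative.
                let start = (offset + BUFFER_SIZE - bias) % BUFFER_SIZE;
                for i in 0..len {
                    // Point 2: wrap the read index so it never leaves the buffer.
                    let byte = ring[(start + i) % BUFFER_SIZE];
                    ring[pos] = byte;
                    pos = (pos + 1) % BUFFER_SIZE;
                    out.push(byte);
                }
            }
        }
    }
    out
}
```

With two literals 'a' and 'b' followed by a copy of length 4 from offset 0, the copy reads bytes it has just written, so the output is "ababab".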

But more broadly, something more than these errors must be wrong for your code to generate a 4765-byte file when the header says 2048.  Googling a bit, my guess is that you're writing to outbuf incorrectly.  You initialized it as a vector of dsize zeros of type u8, where dsize is the size of the file you want to write (based on the header).  So when you want to write data to it, you should write to an existing element of outbuf: start from element zero, and increment a position counter every time you write to outbuf (whether from a byte literal in the compressed file or from the temporary buffer).  In your code, however, you use push, which *appends* an element to the end of the vector.  So I would expect your output to be all zeros for the first 2048 bytes, followed by whatever your code produces (the remaining 4765 - 2048 = 2717 bytes).
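
The difference between appending to a zero-filled vector and writing into it can be shown in isolation. The function names below are hypothetical and the input is just a stand-in for decoded bytes; this is a sketch of the two usual fixes, not a statement about which one the original tools use.

```rust
/// Reproduces the symptom: a vector that already holds `expected_len` zeros
/// grows past that length when bytes are pushed onto it.
fn zero_filled_then_push(expected_len: usize, data: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; expected_len]; // len() == expected_len already
    for &b in data {
        out.push(b); // appended after the zeros
    }
    out
}

/// Option A: reserve capacity only (len() starts at 0), then push.
fn reserve_then_push(expected_len: usize, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(expected_len);
    for &b in data {
        out.push(b);
    }
    out
}

/// Option B: keep the zero-filled vector and write through a cursor.
/// Bytes beyond `expected_len` are dropped in this sketch.
fn indexed_write(expected_len: usize, data: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; expected_len];
    let mut cursor = 0;
    for &b in data {
        if cursor >= out.len() {
            break;
        }
        out[cursor] = b;
        cursor += 1;
    }
    out
}
```

Option A is the smaller change to the posted code, since only the initialization of outbuf would differ; Option B matches the position-counter approach described above.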

Caveats: I don't know C++ or Rust, I can barely read French, and I learned about LZSS compression just now.