Lena for bam features:
100% lossless compression, MD5 preserved
Compression with Lena for bam is completely lossless: it preserves all the information, byte for byte, so the decompressed file has the same MD5 as the original. The format embeds two checksums: one validates that the decompressed file is identical to the original data, and the other detects data corruption that occurred during transmission or storage. If corruption is detected, the file format pinpoints the location in the file where the error occurred.
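The two kinds of check described above can be illustrated with a minimal sketch. It does not reflect the actual .bam.lena layout, which is not described here: the block size, the use of CRC32, and the function names are all assumptions made for illustration. A whole-file MD5 confirms that a round trip was byte-exact, and per-block checksums show how an error can be narrowed down to a position in the file.

```python
import hashlib
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for illustration


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one CRC32 value per fixed-size block of data."""
    return [
        zlib.crc32(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def first_corrupt_offset(data, expected, block_size=BLOCK_SIZE):
    """Return the byte offset of the first block that fails its check, or None."""
    actual = block_checksums(data, block_size)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index * block_size
    if len(actual) != len(expected):
        # Data was truncated or extended: the mismatch starts at the first extra block.
        return min(len(actual), len(expected)) * block_size
    return None
```

In practice, comparing `file_md5` of the original .bam with `file_md5` of the decompressed file is enough to confirm a lossless round trip, while the per-block check only matters when that comparison fails.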
High compression ratio, simple to use
Lena for bam achieves a high compression ratio without tying decompression to the reference genome that was used during the alignment step. You can therefore store .bam.lena files for many years without keeping old reference genome versions around for decompression.
The code has been thoroughly optimized for fast compression and decompression without sacrificing the compression ratio. Speed is crucial for easy integration into an existing workflow and for faster file transfer.
When a compressed .bam.lena file is needed for a computation, e.g. variant calling with GATK, the file does not have to be decompressed to disk first. Instead, it can be decompressed on the fly and fed directly to the analysis software that requires BAM input. This greatly reduces disk reads and writes and gives much better performance.
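On-the-fly decompression of this kind amounts to connecting two processes with a pipe, so that no intermediate .bam file is written. The sketch below shows the general pattern only. The Lena command line is not documented here, so both commands are passed in by the caller; it assumes a decompression command that writes to standard output and an analysis tool that reads from standard input, and the helper name `stream_into` is hypothetical.

```python
import subprocess


def stream_into(decompress_cmd, consumer_cmd):
    """Pipe the stdout of decompress_cmd into the stdin of consumer_cmd.

    Both arguments are argument lists as accepted by subprocess.Popen.
    Returns the consumer's stdout as bytes.
    """
    producer = subprocess.Popen(decompress_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early, instead of blocking forever.
    producer.stdout.close()
    output, _ = consumer.communicate()
    if producer.wait() != 0:
        raise RuntimeError("decompression command failed")
    if consumer.returncode != 0:
        raise RuntimeError("consumer command failed")
    return output
```

The decompressed data only ever exists in the pipe buffer, which is what removes the extra write and re-read of a full-size .bam file.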