Software

We have developed many tools for the compression, processing, and analysis of bioinformatics data. Some of them were designed in close collaboration with external researchers. Below is a list of the tools with links to their GitHub repositories. You can also visit our organization's page on GitHub.

Compression of sequencing data

  • CoLoRd — compressor of 3rd gen (ONT, PacBio) sequencing data.
  • DSRC — very fast compressor of 2nd gen (Illumina) sequencing data.
  • FaStore — compressor of 2nd gen (Illumina) sequencing data offering a compromise between compression ratio and speed.
  • FQSqueezer — compressor of 2nd gen (Illumina) sequencing data focused on achieving the best compression ratio.
  • ORCOM — experimental compressor of bases in 2nd gen (Illumina) sequencing data.

Compression of genome collections

  • AGC — compressor of collections of complete genomes (sets of contigs) of the same species.
  • GDC — compressor of collections of complete genomes (sets of chromosomes) of the same species.

Compression of genotype collections

  • GTC — compressed data structure for collections of genotypes, supporting various types of queries.
  • GTShark — compressor of collections of genotypes.
  • MuGI — index to a collection of genomes of the same species.
  • TGC — experimental compressor of collections of genomes (sets of chromosomes) of the same species.
  • VCFShark — compressor of VCF files.

Multiple sequence alignment of proteins

  • CoMSA — compressor of collections of multiple sequence alignments (MSA) of proteins.
  • FAMSA — very fast aligner of huge protein families (1M+ sequences).
  • QuickProbs — high-quality aligner of moderate-size protein families (approx. 1k sequences).

Analysis based on k-mers

  • CoMeta — classifier of reads in metagenomic sequencing data.
  • KMC — very fast k-mer counter.
  • KMC tools — tools to operate on sets of k-mers.
  • Kmer-db — compact data structure representing a collection of k-mers in genomes.
  • PHIST — tool to predict prokaryotic hosts for phage (meta)genomic sequences.
  • RECKONER — corrector of errors in 2nd gen (Illumina) sequencing data.
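To illustrate what a k-mer counter such as KMC computes, here is a minimal in-memory sketch in Python. It is not KMC's algorithm or API: KMC is designed for inputs far larger than main memory, whereas this toy version simply keeps a dictionary. We assume canonical counting (each k-mer is merged with its reverse complement, a common convention in k-mer counters) and skip k-mers containing symbols other than A, C, G, T. All function names are hypothetical; `intersect` only mimics, in spirit, the kind of set operation that KMC tools performs on k-mer sets.

```python
from collections import Counter

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Only k-mers built entirely from these symbols are counted (assumption).
_VALID = frozenset("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers in an iterable of reads (hypothetical helper).

    k-mers containing symbols other than A/C/G/T (e.g. N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def intersect(a: Counter, b: Counter) -> Counter:
    """k-mers present in both sets, keeping the smaller of the two counts."""
    return a & b
```

For example, `count_kmers(["ACGTAC", "GTACGT"], 3)` yields 4 occurrences each of `ACG` and `GTA`, because `ACG`/`CGT` and `GTA`/`TAC` are reverse-complement pairs and collapse onto the same canonical k-mer.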

Read mapping

  • Whisper — robust mapper of 2nd gen (Illumina) sequencing data.

External projects to which we contributed