Skip to content

SAMtools Support

At present, GCGC does not support working with BAM/SAM files directly. Instead one can pre-process the files by using samtools fasta or samtools fastq, for example. In particular, see the parsing options to ensure data from the alignments transfers to the file, then use one of the built-in GCGC Field parsers.

For example, by running samtools fasta -T RG ./alignments.bam, the RG tag from each alignment is copied into the description of the fasta sequence. Then, by using the DescriptionField in GCGC, the RG tag can converted into a label. The preprocessing function might look like:

def parse_description(d: str): str:
    for di in d.split("\t"):
        if di.startswith("RG"):
            return di
    else:
        raise RuntimeError("No RG tag found.")