SAMtools Support
At present, GCGC does not support working with BAM/SAM files directly. Instead, one can preprocess
the files with, for example, samtools fasta or samtools fastq. In particular, check those commands'
options for copying alignment data (such as tags) into the output file, then use one of GCGC's
built-in Field parsers.
For example, running samtools fasta -T RG ./alignments.bam copies the RG tag from each alignment
into the description of the corresponding FASTA record. samtools writes to standard output, so
redirect it to a file (e.g. > alignments.fasta). Then, by using the DescriptionField in GCGC, the
RG tag can be converted into a label. The preprocessing function might look like:
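    def parse_description(d: str) -> str:
        # samtools appends tags as tab-separated SAM fields, e.g. "RG:Z:sample1".
        for di in d.split("\t"):
            if di.startswith("RG:"):
                return di
        # No field in the description carried an RG tag.
        raise RuntimeError("No RG tag found.")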
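To check what the preprocessing function receives before wiring it into GCGC, the following is a minimal, stdlib-only sketch. It assumes samtools writes each FASTA header as ">name<TAB>RG:Z:value", so everything after the first tab is the description. The helpers iter_fasta_headers, read_group_value, and labels_by_read are hypothetical illustrations, not part of GCGC's API.

```python
from typing import Dict, Iterator, Tuple


def iter_fasta_headers(fasta_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, description) for each '>' header line.

    Hypothetical helper: assumes the header looks like '>name<TAB>RG:Z:value'.
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split the header once: read name before the first tab, tags after it.
            name, _, description = line[1:].partition("\t")
            yield name, description


def parse_description(d: str) -> str:
    # Same logic as parse_description above: return the first RG field.
    for di in d.split("\t"):
        if di.startswith("RG:"):
            return di
    raise RuntimeError("No RG tag found.")


def read_group_value(tag: str) -> str:
    # A SAM optional field has the form TAG:TYPE:VALUE; keep only VALUE.
    return tag.split(":", 2)[2]


def labels_by_read(fasta_text: str) -> Dict[str, str]:
    """Map each read name to its read-group label (hypothetical helper)."""
    return {
        name: read_group_value(parse_description(description))
        for name, description in iter_fasta_headers(fasta_text)
    }


example = ">read1\tRG:Z:groupA\nACGT\n>read2\tRG:Z:groupB\nGGCC\n"
print(labels_by_read(example))  # {'read1': 'groupA', 'read2': 'groupB'}
```

Returning the full "RG:Z:..." field, as parse_description does, keeps the label unambiguous; stripping it down to the value with something like read_group_value is optional and depends on how you want labels to appear.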