Broken test due to platform differences in
Users can optionally specify whether to use start or end tokens.
Removed one_hot_encoding. Users can do this easily themselves if needed, e.g. with
scatter in PyTorch.
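As a rough illustration of what replacing one_hot_encoding by hand involves, here is a minimal pure-Python sketch. The function name `one_hot` and its arguments are hypothetical and not part of gcgc; the comment notes the PyTorch `scatter_` call that performs the same write on tensors.

```python
def one_hot(encoded, alphabet_size):
    """Return a one-hot matrix (list of rows) for a list of integer codes.

    Hypothetical helper, not part of gcgc. With PyTorch tensors the same
    result can be built with:
        torch.zeros(len(idx), alphabet_size).scatter_(1, idx.unsqueeze(1), 1)
    where idx is a 1-D LongTensor of integer codes.
    """
    rows = [[0] * alphabet_size for _ in encoded]
    for row, code in zip(rows, encoded):
        row[code] = 1  # set the column matching the integer code
    return rows


print(one_hot([0, 2, 1], 3))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```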
Properties to access the integer encodings of special tokens. (35cae2a)
Remove uniprot dataset creation. (e233162)
Simplify index handling for GenomicDataset. (3213a9e)
Updated package management so gcgc is easier to use with other versions of
Ability for kmer size to be passed to an alphabet.
Add Dockerfile and docker-compose.yml for development.
EncodedSeq.shift, which shifts the sequence by an integer offset.
EncodedSeq.from_integer_encoded_seq will take a list of integers and an
alphabet and return an EncodedSeq object.
Add the ability to apply a function to the rollout_kmers yielded values.
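The idea of applying a function to each yielded kmer can be sketched as follows. This is a standalone illustration on plain strings, not gcgc's actual implementation; the name `rollout_kmers_sketch` and the parameters `kmer_length`, `step`, and `func` are assumptions made for the example.

```python
def rollout_kmers_sketch(seq, kmer_length, step=1, func=None):
    """Yield successive kmers of seq, optionally transformed by func.

    Hypothetical stand-in for a kmer rollout; parameter names are assumed.
    """
    for start in range(0, len(seq) - kmer_length + 1, step):
        kmer = seq[start:start + kmer_length]
        # Apply the caller's function to each kmer before yielding it.
        yield func(kmer) if func is not None else kmer


print(list(rollout_kmers_sketch("ATCGA", 3)))                  # ['ATC', 'TCG', 'CGA']
print(list(rollout_kmers_sketch("ATCGA", 3, func=str.lower)))  # ['atc', 'tcg', 'cga']
```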
Alphabet special characters are now located at the start, rather than the end,
of the letters and token sequence.
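One practical effect of placing special characters first is that their integer codes no longer depend on how many letters the alphabet has. The sketch below illustrates this with hypothetical special characters (`>`, `<`, `|`) and a hypothetical helper `build_encoding`; neither is taken from gcgc's source.

```python
def build_encoding(letters, special=(">", "<", "|")):
    """Build a token string and a char-to-int map with special tokens first.

    The special characters here are placeholders chosen for illustration.
    """
    tokens = "".join(special) + letters
    return tokens, {ch: i for i, ch in enumerate(tokens)}


dna_tokens, dna_encoding = build_encoding("ATCG")
_, protein_encoding = build_encoding("ACDEFGHIKLMNPQRSTVWY")

print(dna_tokens)             # ><|ATCG
print(dna_encoding["A"])      # 3
# The special tokens keep the same codes in both alphabets.
print(dna_encoding[">"], protein_encoding[">"])  # 0 0
```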
Add extra CSS to underline links in articles.
Exit if the download directory doesn't exist in the call to download organism.
Wording improvements in docs.
seq_tensor_one_hot in the PyTorch Parser.
gcgc.random module to start holding sequence data.
gcgc.rollout module to handle working through chunks of sequences.
rollout_kmers will roll out
rollout_seq_features will roll out the SeqFeatures from a
EncodingAlphabet can now optionally take gap_characters, a set of characters to add to the
alphabet letters. It also takes add_lower_case_for_inserts, which duplicates the alphabet
with the letters converted to lowercase.
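A minimal sketch of how these two options might extend an alphabet's letters is shown below. The function `extend_letters` is hypothetical, and the ordering of the result (original letters, then lowercase copies, then sorted gap characters) is an assumption made for the example, not gcgc's documented behavior.

```python
def extend_letters(letters, gap_characters=None, add_lower_case_for_inserts=False):
    """Return letters extended with lowercase copies and gap characters.

    Hypothetical helper; the ordering of the additions is assumed.
    """
    result = letters
    if add_lower_case_for_inserts:
        # Duplicate the alphabet with every letter lowercased.
        result += letters.lower()
    if gap_characters:
        # Sort so the output is deterministic for a set input.
        result += "".join(sorted(gap_characters))
    return result


print(extend_letters("ATCG", gap_characters={"-", "."}, add_lower_case_for_inserts=True))
# ATCGatcg-.
```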
Fixed bug in
GenomicDataset.from_path where it still referred to
EncodedSeq now supports iterating through kmers, see
EncodedSeq.rollout_kmers for options.
GCGC is citable.
GCGC now has a CHANGELOG.md.