Basic tools
Fundamental operations for genome file reading and basic k-mer frequency analysis. This module provides lightweight utilities commonly used as preprocessing steps in DNA sequence analysis workflows.
Loading genome from file
- GenomeVisualizer.basic.load_genome_from_txt(filepath: str) -> str
Loads a genome sequence from a plain text (.txt) file.
This function reads the entire content of a text file and strips all whitespace, including newlines, returning one continuous DNA string.
- Parameters:
filepath (str) – Path to the genome file (must be a .txt file containing only the characters A, C, G, and T, apart from whitespace).
- Returns:
A cleaned DNA sequence as a single string.
- Return type:
str
- Raises:
FileNotFoundError – If the specified file does not exist.
ValueError – If the file is empty or contains invalid characters.
Example
>>> genome = load_genome_from_txt("data/ecoli.txt")
>>> genome[:10]
'AGCTTTTCAT'
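The documented behaviour (read, strip whitespace, validate) can be approximated with the standard library alone. The sketch below is not the module's actual source. The name load_genome_sketch is hypothetical, and treating only uppercase A, C, G, and T as valid is an assumption, since the text does not say how lowercase input is handled.

```python
from pathlib import Path


def load_genome_sketch(filepath: str) -> str:
    """Hypothetical stand-in for load_genome_from_txt; not the library's code."""
    # Path.read_text() raises FileNotFoundError itself when the path is missing.
    raw = Path(filepath).read_text()

    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and newlines in one pass.
    genome = "".join(raw.split())

    if not genome:
        raise ValueError(f"{filepath} contains no sequence data")

    # Assumption: only uppercase A, C, G, T count as valid characters.
    invalid = set(genome) - set("ACGT")
    if invalid:
        raise ValueError(f"invalid characters in {filepath}: {sorted(invalid)}")

    return genome
```

Collecting the invalid characters as a set keeps the error message short even for very large files.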
k-mer frequency analysis
- GenomeVisualizer.basic.FrequencyMap(Text: str, k: int) -> dict[str, int]
Computes the frequency of all k-length substrings (k-mers) in a DNA sequence.
This function slides a window of width k across the input sequence one position at a time and counts how often each k-mer appears, so overlapping occurrences are counted separately. The output is a dictionary in which each key is a distinct k-mer and each value is its number of occurrences.
- Parameters:
Text (str) – The DNA sequence to scan.
k (int) – Length of the k-mers to count.
- Returns:
A dictionary mapping each k-mer to its count in the sequence.
- Return type:
dict[str, int]
Example
>>> FrequencyMap("ATATA", 3)
{'ATA': 2, 'TAT': 1}
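The sliding-window count described above can be sketched in a few lines. This illustrates the technique and is not the module's source. The name frequency_map_sketch is hypothetical, and the sketch assumes k is at least 1.

```python
def frequency_map_sketch(text: str, k: int) -> dict[str, int]:
    """Hypothetical illustration of sliding-window k-mer counting (assumes k >= 1)."""
    counts: dict[str, int] = {}
    # A sequence of length n has n - k + 1 windows of width k;
    # range() is empty when k > n, so the result is then an empty dict.
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


print(frequency_map_sketch("ATATA", 3))  # {'ATA': 2, 'TAT': 1}
```

The two ATA windows start at positions 0 and 2 and overlap at position 2, which is why ATA is counted twice in a five-character sequence.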
Most frequent k-mers
- GenomeVisualizer.basic.FrequentWords(Text: str, k: int) -> list[str]
Identifies the most frequent k-length substrings (k-mers) in a DNA sequence.
This function uses FrequencyMap to count every k-mer in the sequence and then returns all k-mers whose count equals the maximum.
- Parameters:
Text (str) – The DNA sequence to search.
k (int) – Length of the k-mers.
- Returns:
A list of all k-mers tied for the highest frequency in the sequence.
- Return type:
list[str]
Example
>>> FrequentWords("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)
['CATG', 'GCAT']
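Selecting the most frequent k-mers is a count followed by a filter on the maximum. The sketch below is self-contained and uses collections.Counter in place of FrequencyMap. The name frequent_words_sketch is hypothetical. Sorting the result is an assumption made here for deterministic output, because the documentation does not state the order the module returns.

```python
from collections import Counter


def frequent_words_sketch(text: str, k: int) -> list[str]:
    """Hypothetical illustration: return every k-mer tied for the top count."""
    # Count every window of width k (assumes k >= 1).
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        # No windows exist when k is longer than the sequence.
        return []
    top = max(counts.values())
    # Sorting is an assumption of this sketch, not documented behaviour.
    return sorted(kmer for kmer, n in counts.items() if n == top)


print(frequent_words_sketch("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```

In this sequence both CATG and GCAT occur three times, and no other 4-mer occurs more than twice, so both are returned.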