Basic tools

Fundamental operations for genome file reading and basic k-mer frequency analysis. This module provides lightweight utilities commonly used as preprocessing steps in DNA sequence analysis workflows.

Loading genome from file

GenomeVisualizer.basic.load_genome_from_txt(filepath: str) → str

Loads a genome sequence from a plain text (.txt) file.

This function reads the entire content of a text file and removes all whitespace, including newlines, returning one continuous DNA string.

Parameters:

filepath (str) – Path to the genome file (must be a .txt file containing ACGT characters).

Returns:

A cleaned DNA sequence as a single string.

Return type:

str

Raises:
  • FileNotFoundError – If the specified file does not exist.

  • ValueError – If the file is empty or contains invalid characters.

Example

>>> genome = load_genome_from_txt("data/ecoli.txt")
>>> genome[:10]
'AGCTTTTCAT'
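
The behaviour described above (read the file, strip whitespace, validate) can be approximated as follows. This is a minimal sketch and not the library's source: the name load_genome_sketch is hypothetical, and it assumes that only uppercase A, C, G, and T count as valid characters, which the documentation does not state explicitly.

```python
from pathlib import Path

# Assumption: only uppercase bases are valid; the real function may differ.
VALID_BASES = frozenset("ACGT")


def load_genome_sketch(filepath: str) -> str:
    """Hypothetical stand-in for load_genome_from_txt."""
    # Path.read_text raises FileNotFoundError if the file does not exist.
    raw = Path(filepath).read_text()
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs, and newlines.
    genome = "".join(raw.split())
    if not genome:
        raise ValueError(f"{filepath} contains no sequence data")
    invalid = set(genome) - VALID_BASES
    if invalid:
        raise ValueError(f"invalid characters in {filepath}: {sorted(invalid)}")
    return genome
```

Validating after the whitespace is removed means a file that contains only blank lines is reported as empty rather than as containing invalid characters.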

k-mer frequency analysis

GenomeVisualizer.basic.FrequencyMap(Text: str, k: int) → dict[str, int]

Computes the frequency of all k-length substrings (k-mers) in a DNA sequence.

This function slides a window of length k across the input DNA sequence one position at a time and counts each k-mer it sees, so overlapping occurrences are counted separately. The output is a dictionary in which each key is a distinct k-mer and each value is its number of occurrences.

Parameters:
  • Text (str) – The DNA sequence to scan.

  • k (int) – Length of the k-mers to count.

Returns:

A dictionary mapping each k-mer to its count in the sequence.

Return type:

dict[str, int]

Example

>>> FrequencyMap("ATATA", 3)
{'ATA': 2, 'TAT': 1}
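
The sliding-window count can be sketched in a few lines. This is an illustration under assumptions, not the library's implementation: frequency_map_sketch is a hypothetical name, and rejecting k < 1 is a choice made here because the documentation does not say how FrequencyMap treats such values.

```python
def frequency_map_sketch(text: str, k: int) -> dict[str, int]:
    """Hypothetical stand-in for FrequencyMap."""
    if k < 1:
        # Assumption: non-positive k is treated as an error in this sketch.
        raise ValueError("k must be a positive integer")
    counts: dict[str, int] = {}
    # A sequence of length n has n - k + 1 windows of length k;
    # range() is empty when k > n, so the result is then {}.
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


print(frequency_map_sketch("ATATA", 3))  # {'ATA': 2, 'TAT': 1}
```

The two occurrences of 'ATA' overlap at the middle 'A', which shows that the window advances by one position rather than by k.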

Most frequent k-mers

GenomeVisualizer.basic.FrequentWords(Text: str, k: int) → list[str]

Identifies the most frequent k-length substrings (k-mers) in a DNA sequence.

This function uses FrequencyMap to count every k-mer in the sequence and then returns all k-mers whose count equals the maximum, so ties produce more than one result.

Parameters:
  • Text (str) – The DNA sequence to search.

  • k (int) – Length of the k-mers.

Returns:

A list of k-mers with the highest frequency in the sequence.

Return type:

list[str]

Example

>>> FrequentWords("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)
['CATG', 'GCAT']
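
The count-then-filter approach can be sketched as follows. This is a self-contained illustration, not the library's code: frequent_words_sketch is a hypothetical name, collections.Counter stands in for FrequencyMap, and sorting the result alphabetically is an assumption made here to get a deterministic order (the documentation does not specify the ordering).

```python
from collections import Counter


def frequent_words_sketch(text: str, k: int) -> list[str]:
    """Hypothetical stand-in for FrequentWords."""
    if k < 1:
        # Assumption: non-positive k is treated as an error in this sketch.
        raise ValueError("k must be a positive integer")
    # Count every window of length k (overlaps included).
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        # No windows exist when k is longer than the sequence.
        return []
    top = max(counts.values())
    # Keep every k-mer tied for the highest count.
    return sorted(kmer for kmer, n in counts.items() if n == top)


print(frequent_words_sketch("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
# ['CATG', 'GCAT']
```

Both 'CATG' and 'GCAT' occur three times in this sequence, so both are returned.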