SARS-CoV-2 nsp12 Motif Mutation Analysis Toolkit: Q1 2025 Edition
Overview
This package contains a complete set of Python scripts developed to analyze mutation patterns in four conserved functional motifs (A, C, F, G) of the SARS-CoV-2 RNA-dependent RNA polymerase (nsp12) gene.
All genomic sequence data used in this work are available from public repositories such as:
- [NCBI Virus](https://www.ncbi.nlm.nih.gov/labs/virus/)
- [GISAID EpiCoV Database](https://www.gisaid.org/) (access requires registration)
I developed these scripts independently to process publicly available FASTA sequences and identify mutations relevant to antiviral drug resistance, particularly at known resistance sites (X denotes any substituted residue) such as:
- Motif A: D614X, P621X, N623X
- Motif C: L655X, V657X (linked to Remdesivir resistance)
These scripts are reusable and extendable for analyzing new datasets or expanding the scope of mutation tracking.
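As a rough illustration of how labels such as D614X can be screened, the sketch below parses amino-acid substitution labels and flags those that fall on the positions listed above. It is not taken from the packaged scripts: the names `WATCHED_SITES`, `parse_label`, and `flag_watched` are hypothetical, and it assumes mutations are written as reference residue, position, observed residue.

```python
import re

# Positions copied from the resistance-site list above, keyed by motif.
# The dictionary layout is an assumption made for this sketch.
WATCHED_SITES = {
    "A": {614, 621, 623},
    "C": {655, 657},
}

# Matches labels such as "P621S": reference residue, position, observed residue.
LABEL_RE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def parse_label(label):
    """Split 'P621S' into ('P', 621, 'S'); raise ValueError on malformed input."""
    match = LABEL_RE.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def flag_watched(labels):
    """Return (label, motif) pairs for substitutions at a watched position."""
    hits = []
    for label in labels:
        ref, pos, alt = parse_label(label)
        if ref == alt:
            continue  # residue unchanged, nothing to flag
        for motif, positions in WATCHED_SITES.items():
            if pos in positions:
                hits.append((label, motif))
    return hits


if __name__ == "__main__":
    print(flag_watched(["P621S", "A500V", "V657L"]))
```

Keeping the watched positions in one dictionary means new sites can be added without touching the parsing logic.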
Included Python Scripts
| Script | Purpose |
| --- | --- |
| extract_motifs.py | Extracts motif regions A, C, F, G from aligned FASTA files |
| extract_mutations_motif_a.py, extract_mutations_motif_c.py, etc. | Calls nucleotide mutations vs Wuhan-Hu-1 reference per motif |
| generate_mutations_all_motifs.py | Combines all motif mutation results into one CSV |
| extract_accessions_per_motif.py | Lists sample accessions used per motif |
| extract_unique_mutations.py | Identifies unique or potentially undescribed mutations across motifs |
| count_mutation_frequency.py | Counts how often each mutation appears |
| count_motif_accessions.py | Reports number of samples used per motif |
| print_motif_mutation_tables.py | Prints formatted mutation tables using tabulate |
All scripts are written in Python 3.x and require Biopython and pandas; print_motif_mutation_tables.py additionally requires tabulate.
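To make the core steps concrete, here is a minimal sketch of slicing a motif window out of aligned sequences and calling nucleotide differences against a reference. It uses only the standard library so it runs as-is; in a Biopython-based workflow, `Bio.SeqIO.parse` would normally replace the hand-written `read_fasta` helper. The toy sequences, the window coordinates, and all function names are placeholders for illustration, not real nsp12 motif coordinates and not the packaged implementation.

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence} (stand-in for Bio.SeqIO.parse)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def call_mutations(reference, sample, start, end):
    """Compare one aligned window and return labels such as 'C5T'.

    start and end are 1-based, inclusive alignment columns.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must come from the same alignment")
    mutations = []
    for col in range(start, end + 1):
        ref_base, alt_base = reference[col - 1], sample[col - 1]
        # Skip gaps and ambiguous calls instead of reporting them as mutations.
        if ref_base not in "ACGT" or alt_base not in "ACGT":
            continue
        if ref_base != alt_base:
            mutations.append(f"{ref_base}{col}{alt_base}")
    return mutations


# Toy alignment: 'ref' plays the role of the Wuhan-Hu-1 reference.
TOY_FASTA = """>ref
ATGGCTAGCTAA
>sample1
ATGGTTAGCTAA
>sample2
ATGGCTNGC-AA
"""
TOY_WINDOW = (4, 10)  # placeholder columns, NOT a real motif location

if __name__ == "__main__":
    records = read_fasta(TOY_FASTA)
    reference = records.pop("ref")
    for accession, sequence in records.items():
        print(accession, call_mutations(reference, sequence, *TOY_WINDOW))
```

Positions here are alignment columns; mapping them to genome or codon coordinates depends on how the alignment was built and is left out of the sketch.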
Licensing & Usage Note
All scripts are provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0):
https://creativecommons.org/licenses/by/4.0/
You are free to:
- Adapt: remix, transform, and build upon the scripts
Under the following conditions:
- Attribution: appropriate credit must be given
This license does not apply to FASTA files, metadata, or sequence data, which are sourced from open databases and governed by their own terms.
Who This Is For
Researchers, bioinformaticians, virologists, and public health professionals interested in:
- Reproducible bioinformatics workflows
- Mutation tracking in SARS-CoV-2 nsp12
- Drug resistance surveillance
- Customizable pipelines for global genomic data
How to Use
1. Install dependencies:
   pip install biopython pandas tabulate
2. Place your FASTA files in the working directory.
3. Run the scripts in order to extract motifs, call mutations, and generate reports.
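The run order can be captured in a small driver, sketched below. Several things here are assumptions rather than documented behavior: the order is inferred from the script table, each script is assumed to run without command-line arguments and to read FASTA files from the working directory, and only the two per-motif scripts named explicitly in the table are listed. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Assumed order: extract motifs, call mutations, then generate reports.
# Per-motif scripts other than A and C are not named in the README, so they
# are left out here rather than guessed.
PIPELINE = [
    "extract_motifs.py",
    "extract_mutations_motif_a.py",
    "extract_mutations_motif_c.py",
    "generate_mutations_all_motifs.py",
    "count_mutation_frequency.py",
    "print_motif_mutation_tables.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one argument list per script, in pipeline order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing executes until the order has been checked.
    run_pipeline(build_commands(), dry_run=True)
```

Switching `dry_run` to `False` executes the scripts for real, so it is worth confirming the printed order against your copy of the package first.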
Why Buy This?
While the underlying sequence data is freely accessible, these scripts provide:
- A reproducible pipeline for mutation detection
- Tools to identify drug resistance mutations
- Ready-to-use logic for future datasets
- Clean code that helps researchers skip the development phase and jump straight into analysis
Disclaimer
This research is based on publicly available sequence data and does not involve clinical trials or direct patient impact.
Results should not be used for diagnosis, treatment, or policy without further validation.
Author
Tahir Bhatti
Email: tahirhb@tahirhb.com
GitHub: @tahir-hb
Twitter: @tahir_hb
Published via Zenodo:
https://zenodo.org/record/15450402