Differential Gene Expression Pipeline
Reproducible RNA-seq DEG pipeline comparing healthy control (HC) vs. wild-type (WT) fungal samples, covering BAM processing, coverage-based ROI discovery, featureCounts quantification, and DESeq2 statistical analysis with full visualization outputs.
R, Python, Bash, RNA-seq, DESeq2, Bioinformatics
Overview
- Reproducible RNA-seq pipeline for identifying differentially expressed genomic regions between healthy control (HC) and wild-type (WT) fungal samples
- Covers the full workflow: raw BAM processing, coverage-based region discovery, read quantification, and DESeq2 statistical analysis
- Automated scripts at each stage with full visualization outputs
Pipeline Stages
- BAM processing: Automated shell scripts that position-sort, index, and name-sort 9 sample BAM files (4 HC, 5 WT) using samtools
- Coverage analysis: Per-sample coverage maps generated with bedtools, then merged across all samples to identify high-expression regions of interest (ROIs: ≥5x depth over ≥10 bp)
- GTF annotation: Python scripts that clean and standardize GFF/GTF annotations exported from Geneious for compatibility with featureCounts; both automated and manually curated ROI definitions are handled
- Read counting: featureCounts (Rsubread) in paired-end mode, generating count matrices across ~120 ROIs per sample
- Differential expression: DESeq2 negative binomial GLM comparing HC vs. WT; outputs include adjusted p-values, log2 fold changes, and a full visualization suite
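The ROI thresholds from the coverage-analysis stage (≥5x depth over ≥10 bp) can be illustrated with a short Python sketch. This is not the pipeline's own code, which relies on bedtools. The function `find_rois`, its parameter names, and the example depth vector are hypothetical, and the sketch assumes a single contig whose per-base depths have already been merged across samples (how that merge is performed is not specified here).

```python
from typing import Iterable, List, Tuple


def find_rois(
    depths: Iterable[int], min_depth: int = 5, min_len: int = 10
) -> List[Tuple[int, int]]:
    """Return half-open, 0-based (start, end) intervals, BED-style, where
    depth stays >= min_depth for at least min_len consecutive bases."""
    rois: List[Tuple[int, int]] = []
    run_start = None  # start of the current above-threshold run, if any
    pos = -1          # stays -1 when the depth vector is empty
    for pos, depth in enumerate(depths):
        if depth >= min_depth:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            # The run ended at pos (exclusive); keep it only if long enough.
            if pos - run_start >= min_len:
                rois.append((run_start, pos))
            run_start = None
    # Close a run that extends to the last base of the contig.
    if run_start is not None and (pos + 1) - run_start >= min_len:
        rois.append((run_start, pos + 1))
    return rois


# Made-up depth vector: a 12 bp run, a low-coverage gap, then a 15 bp run.
example = [0] * 3 + [6] * 12 + [2] * 4 + [9] * 5 + [5] * 10
print(find_rois(example))
```

Half-open coordinates are used so the intervals map directly onto BED records; converting to 1-based inclusive GTF coordinates would only require adding 1 to each start.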
Key Results
- 123 ROIs analyzed; multiple statistically significant DEGs identified (padj < 0.05)
- The majority of significant ROIs were downregulated in WT samples relative to HC
- Hierarchical clustering in the heatmaps clearly separated the HC and WT groups
| ROI | log2 Fold Change | padj |
|---|---|---|
| ROI_1239 | -8.89 | 9.6e-20 |
| ROI_21503 | -7.06 | 1.1e-20 |
| ROI_32505 | -8.24 | 2.6e-05 |
| ROI_63603 | -2.59 | 4.1e-07 |
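As a rough illustration of how rows like these are read, the sketch below applies the padj < 0.05 cutoff, labels direction by the sign of the log2 fold change, and converts log2 values to linear fold changes. The names `RoiResult`, `significant`, `direction`, and `fold_change` are hypothetical. It also assumes that a negative log2 fold change means lower expression in WT than in HC; the actual contrast direction depends on how the DESeq2 comparison was set up.

```python
from typing import List, NamedTuple, Optional


class RoiResult(NamedTuple):
    roi: str
    log2fc: float
    padj: Optional[float]  # DESeq2 can report NA for padj; modelled as None


# Values copied from the results table above.
RESULTS = [
    RoiResult("ROI_1239", -8.89, 9.6e-20),
    RoiResult("ROI_21503", -7.06, 1.1e-20),
    RoiResult("ROI_32505", -8.24, 2.6e-05),
    RoiResult("ROI_63603", -2.59, 4.1e-07),
]


def significant(results: List[RoiResult], alpha: float = 0.05) -> List[RoiResult]:
    """Keep rows with padj below alpha, most significant first."""
    hits = [r for r in results if r.padj is not None and r.padj < alpha]
    return sorted(hits, key=lambda r: r.padj)


def direction(r: RoiResult) -> str:
    """Label by sign of log2FC (assumed here to be WT relative to HC)."""
    if r.log2fc > 0:
        return "up"
    if r.log2fc < 0:
        return "down"
    return "none"


def fold_change(r: RoiResult) -> float:
    """Convert a log2 fold change to a linear ratio."""
    return 2.0 ** r.log2fc


for r in significant(RESULTS):
    print(f"{r.roi}\t{direction(r)}\t{fold_change(r):.3g}x\tpadj={r.padj:.1e}")
```

On the linear scale, a log2 fold change of -8.89 corresponds to a difference of several hundred fold, which is why the log2 scale is the usual choice for reporting.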
Visualizations
- Volcano plots
- MA plots
- Z-score normalized heatmaps of top DEGs
- Sample distance heatmaps
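Z-score normalization for a heatmap rescales each ROI (row) so that its values across samples have mean 0 and standard deviation 1, which makes relative patterns comparable between highly and weakly expressed ROIs. The following is a minimal sketch using only the Python standard library. The function name `zscore_rows`, the ROI labels, and the expression values are made up, and whether the pipeline scaled rows this way (for instance through pheatmap's `scale = "row"` option) is an assumption.

```python
from statistics import mean, stdev
from typing import Dict, List


def zscore_rows(values_by_roi: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Center and scale each row using the sample standard deviation (n - 1)."""
    scaled: Dict[str, List[float]] = {}
    for roi, values in values_by_roi.items():
        mu = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        if sd == 0:
            # A constant row carries no pattern; map it to all zeros.
            scaled[roi] = [0.0] * len(values)
        else:
            scaled[roi] = [(v - mu) / sd for v in values]
    return scaled


# Hypothetical transformed expression values for four samples (2 HC, 2 WT).
example = {
    "ROI_A": [10.2, 9.8, 3.1, 2.9],
    "ROI_B": [4.0, 4.0, 4.0, 4.0],
}
for roi, z in zscore_rows(example).items():
    print(roi, [round(v, 2) for v in z])
```

Scaling is typically applied to transformed values (such as log or variance-stabilized counts) rather than raw counts, so that a few very large counts do not dominate each row.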
Tools & Technologies
- R, Python, Bash
- DESeq2, featureCounts, samtools, bedtools, ggplot2, pheatmap