Agent skill

bio-rnaseq-qc

RNA-seq specific quality control including rRNA contamination detection, strandedness verification, gene body coverage, and transcript integrity metrics. Use when validating RNA-seq libraries before differential expression analysis.

View SKILL.md on GitHub Repository

Stars 2,009

Forks 275

Install this agent skill to your Project

npx add-skill https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-rnaseq-qc

SKILL.md

Version Compatibility

Reference examples tested with: NCBI BLAST+ 2.15+, numpy 1.26+, picard 3.1+, pysam 0.22+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

RNA-seq Quality Control

RNA-seq specific QC metrics beyond general read quality.

"Check RNA-seq alignment quality" → Assess gene body coverage, read distribution (exonic/intronic/intergenic), strand specificity, and rRNA contamination rate.

CLI: infer_experiment.py, read_distribution.py (RSeQC)
CLI: picard CollectRnaSeqMetrics

rRNA Contamination Detection

High rRNA content indicates failed rRNA depletion or polyA selection.

SortMeRNA (NCBI BLAST+)

bash

sortmerna \
    --ref rRNA_databases/smr_v4.3_default_db.fasta \
    --reads sample.fastq.gz \
    --aligned rRNA_reads \
    --other non_rRNA_reads \
    --fastx \
    --threads 8

rrna_count=$(grep -c "^@" rRNA_reads.fastq 2>/dev/null || echo 0)
total_count=$(zcat sample.fastq.gz | grep -c "^@")
rrna_pct=$(echo "scale=2; $rrna_count / $total_count * 100" | bc)
echo "rRNA: ${rrna_pct}%"

BLAST Against rRNA (NCBI BLAST+)

bash

seqkit sample -n 10000 sample.fastq.gz | seqkit fq2fa > sample_10k.fasta
blastn -query sample_10k.fasta -db rrna_db -outfmt 6 -evalue 1e-10 -max_target_seqs 1 | wc -l

Expected rRNA Levels

Library Type	Expected rRNA
PolyA selected	< 5%
rRNA depleted	< 10%
Total RNA	50-80%

Strandedness Verification

RSeQC infer_experiment (NCBI BLAST+)

bash

infer_experiment.py -i aligned.bam -r genes.bed

Output Interpretation

Fraction of reads explained by "1++,1--,2+-,2-+": 0.9856  # Forward stranded
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0144  # Reverse (should be low)

Strand Inference

Tool Setting	1++,1--,2+-,2-+	1+-,1-+,2++,2--
Forward (dUTP)	~0	~1
Reverse (Illumina)	~1	~0
Unstranded	~0.5	~0.5

Salmon Strandedness (NCBI BLAST+)

bash

salmon quant -i index -l A -r sample.fastq.gz -o quant/
grep "library_types" quant/lib_format_counts.json

Gene Body Coverage

Check for 3' or 5' bias indicating RNA degradation.

RSeQC geneBody_coverage (NCBI BLAST+)

bash

geneBody_coverage.py \
    -i aligned.bam \
    -r housekeeping_genes.bed \
    -o coverage

Interpretation

Pattern	Indicates
Even coverage	Good quality
3' bias	Degradation or polyA artifacts
5' bias	Incomplete reverse transcription
Steep drop	Severe degradation

Read Distribution

RSeQC read_distribution (NCBI BLAST+)

bash

read_distribution.py -i aligned.bam -r genes.bed > distribution.txt

Expected Distribution

Region	Good Library
CDS_Exons	60-80%
UTRs	10-20%
Introns	5-20%
Intergenic	< 10%

Transcript Integrity Number (TIN)

Measure of RNA degradation per transcript.

RSeQC tin (NCBI BLAST+)

bash

tin.py -i aligned.bam -r genes.bed > tin_scores.txt

TIN Interpretation

TIN Score	Quality
> 70	Good
50-70	Moderate
< 50	Poor

Duplication Rate

Picard MarkDuplicates (NCBI BLAST+)

bash

java -jar picard.jar MarkDuplicates \
    I=aligned.bam \
    O=marked.bam \
    M=dup_metrics.txt \
    REMOVE_DUPLICATES=false

grep -A 1 "LIBRARY" dup_metrics.txt | tail -1 | cut -f9

RNA-seq Expected Duplication

Library	Expected
High complexity	< 20%
Low input	20-50%
Concerning	> 50%

Insert Size (Paired-End)

Picard CollectInsertSizeMetrics (NCBI BLAST+)

bash

java -jar picard.jar CollectInsertSizeMetrics \
    I=aligned.bam \
    O=insert_metrics.txt \
    H=insert_histogram.pdf

Saturation Analysis

Subsampling Analysis

bash

for frac in 0.1 0.25 0.5 0.75 1.0; do
    samtools view -bs $frac aligned.bam > sub_${frac}.bam
    featureCounts -a genes.gtf -o counts_${frac}.txt sub_${frac}.bam
    detected=$(awk '$7 > 0' counts_${frac}.txt | wc -l)
    echo "$frac: $detected genes"
done

Picard CollectRnaSeqMetrics

Comprehensive RNA-seq metrics from Picard.

bash

java -jar picard.jar CollectRnaSeqMetrics \
    I=aligned.bam \
    O=rnaseq_metrics.txt \
    REF_FLAT=refFlat.txt \
    STRAND=SECOND_READ_TRANSCRIPTION_STRAND \
    RIBOSOMAL_INTERVALS=rRNA.interval_list

Key Metrics

Metric	Description
PCT_CODING_BASES	% in coding regions
PCT_UTR_BASES	% in UTRs
PCT_INTRONIC_BASES	% in introns
PCT_INTERGENIC_BASES	% intergenic
PCT_RIBOSOMAL_BASES	% rRNA
MEDIAN_5PRIME_TO_3PRIME_BIAS	3' bias

MultiQC Report

Aggregate all QC metrics.

bash

multiqc fastqc/ star_output/ featurecounts/ -o multiqc_report/

Complete RNA-seq QC Pipeline (NCBI BLAST+)

Goal: Generate a comprehensive RNA-seq QC report covering strandedness, read distribution, gene body coverage, transcript integrity, duplication, and RNA-seq metrics.

Approach: Run RSeQC tools (infer_experiment, read_distribution, geneBody_coverage, TIN) and Picard (MarkDuplicates, CollectRnaSeqMetrics) sequentially, appending all results to a single summary report file.

bash

#!/bin/bash
SAMPLE=$1
BAM=$2
GENES_BED=$3
REF_FLAT=$4

echo "=== RNA-seq QC: $SAMPLE ===" > qc_report.txt

echo -e "\n--- Strandedness ---" >> qc_report.txt
infer_experiment.py -i $BAM -r $GENES_BED >> qc_report.txt

echo -e "\n--- Read Distribution ---" >> qc_report.txt
read_distribution.py -i $BAM -r $GENES_BED >> qc_report.txt

echo -e "\n--- Gene Body Coverage ---" >> qc_report.txt
geneBody_coverage.py -i $BAM -r $GENES_BED -o coverage

echo -e "\n--- TIN Scores ---" >> qc_report.txt
tin.py -i $BAM -r $GENES_BED > tin.txt
awk '{sum+=$3; count++} END {print "Mean TIN:", sum/count}' tin.txt >> qc_report.txt

echo -e "\n--- Duplication ---" >> qc_report.txt
java -jar picard.jar MarkDuplicates I=$BAM O=/dev/null M=dup.txt 2>/dev/null
grep -A 1 "LIBRARY" dup.txt | tail -1 | awk '{print "Duplication rate:", $9}' >> qc_report.txt

echo -e "\n--- RNA-seq Metrics ---" >> qc_report.txt
java -jar picard.jar CollectRnaSeqMetrics I=$BAM O=rnaseq.txt REF_FLAT=$REF_FLAT STRAND=SECOND_READ_TRANSCRIPTION_STRAND 2>/dev/null
grep -A 2 "## METRICS CLASS" rnaseq.txt >> qc_report.txt

cat qc_report.txt

Python QC Summary

python

import pysam
import numpy as np
from collections import Counter

def rnaseq_qc(bam_file, sample_size=100000):
    bam = pysam.AlignmentFile(bam_file, 'rb')

    strand_counts = Counter()
    insert_sizes = []

    for i, read in enumerate(bam.fetch()):
        if i >= sample_size:
            break
        if not read.is_unmapped:
            if read.is_read1:
                if read.is_reverse:
                    strand_counts['1-'] += 1
                else:
                    strand_counts['1+'] += 1
            if read.is_proper_pair and read.template_length > 0:
                insert_sizes.append(read.template_length)

    bam.close()

    total = sum(strand_counts.values())
    print(f'Read 1 forward: {strand_counts["1+"]/total:.2%}')
    print(f'Read 1 reverse: {strand_counts["1-"]/total:.2%}')

    if insert_sizes:
        print(f'Median insert: {np.median(insert_sizes):.0f}')

rnaseq_qc('aligned.bam')

QC Thresholds Summary

Metric	Good	Warning	Fail
Mapping rate	> 85%	70-85%	< 70%
rRNA %	< 10%	10-20%	> 20%
Exonic %	> 60%	40-60%	< 40%
Duplication	< 20%	20-40%	> 40%
Mean TIN	> 70	50-70	< 50
3' bias	< 1.5	1.5-2	> 2

Related Skills

quality-reports - General FastQC
fastp-workflow - Read trimming
alignment-files/alignment-validation - General BAM QC
rna-quantification/featurecounts-counting - Quantification after QC

Maintainer

FreedomIntelligence Core maintainer

Source details

Full Name: FreedomIntelligence/OpenClaw-Medical-Skills
Branch: main
Path in repo: skills/bio-rnaseq-qc
Topics: claude-code skills openclaw awesome clawhub openclaw-skills medical nanoclaw

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

FreedomIntelligence/OpenClaw-Medical-Skills

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

chemist-analyst

Analyzes events through chemistry lens using molecular structure, reaction mechanisms, thermodynamics, kinetics, and analytical techniques (spectroscopy, chromatography, mass spectrometry). Provides insights on chemical processes, material properties, reaction pathways, synthesis, and analytical methods. Use when: Chemical reactions, material analysis, synthesis planning, process optimization, environmental chemistry. Evaluates: Molecular structure, reaction mechanisms, yield, selectivity, safety, environmental impact.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-io

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

sleep-analyzer

分析睡眠数据、识别睡眠模式、评估睡眠质量，并提供个性化睡眠改善建议。支持与其他健康数据的关联分析。

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

metabolomics-workbench-database

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

2,009 275

Explore

FreedomIntelligence/OpenClaw-Medical-Skills

bio-hi-c-analysis-matrix-operations

Balance, normalize, and transform Hi-C contact matrices using cooler and cooltools. Apply iterative correction (ICE), compute expected values, and generate observed/expected matrices. Use when normalizing or transforming Hi-C matrices.

2,009 275

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Version Compatibility

RNA-seq Quality Control

rRNA Contamination Detection

SortMeRNA (NCBI BLAST+)

BLAST Against rRNA (NCBI BLAST+)

Expected rRNA Levels

Strandedness Verification

RSeQC infer_experiment (NCBI BLAST+)

Output Interpretation

Strand Inference

Salmon Strandedness (NCBI BLAST+)

Gene Body Coverage

RSeQC geneBody_coverage (NCBI BLAST+)

Interpretation

Read Distribution

RSeQC read_distribution (NCBI BLAST+)

Expected Distribution

Transcript Integrity Number (TIN)

RSeQC tin (NCBI BLAST+)

TIN Interpretation

Duplication Rate

Picard MarkDuplicates (NCBI BLAST+)

RNA-seq Expected Duplication

Insert Size (Paired-End)

Picard CollectInsertSizeMetrics (NCBI BLAST+)

Saturation Analysis

Subsampling Analysis

Picard CollectRnaSeqMetrics

Key Metrics

MultiQC Report

Complete RNA-seq QC Pipeline (NCBI BLAST+)

Python QC Summary

QC Thresholds Summary

Related Skills

Recommended Agent Skills

vcf-annotator

chemist-analyst

bio-alignment-io

sleep-analyzer

metabolomics-workbench-database

bio-hi-c-analysis-matrix-operations