Install this agent skill into your project:
npx add-skill https://github.com/drshailesh88/integrated_content_OS/tree/main/skills/cardiology/knowledge-pipeline
Knowledge Pipeline Skill
Metadata
- Name: knowledge-pipeline
- Version: 1.1
- Purpose: Build rich knowledge context using RAG + PubMed before writing
- Trigger: Any content creation task requiring evidence-based writing
Overview
This skill implements a parallel knowledge-building pipeline that queries BOTH:
- RAG Pipeline - Your AstraDB vector store containing cardiology textbooks and guidelines
- PubMed Pipeline - Latest research via NCBI E-utilities API
NOTE: Perplexity is used SEPARATELY for social listening and demand assessment (YouTube workflow), NOT for research/evidence gathering.
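On the PubMed side, a search through E-utilities is an HTTP GET against the ESearch endpoint, which returns matching PMIDs. Abstracts are then fetched with a second call, such as EFetch. The sketch below only builds the ESearch request URL. `esearch_url` is a hypothetical helper, not the skill's actual code, and the defaults (JSON output, five results) are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 5, api_key: Optional[str] = None) -> str:
    """Build an ESearch URL that returns matching PubMed IDs (PMIDs) as JSON."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": query,      # free-text query, same syntax as the PubMed search box
        "retmax": retmax,   # maximum number of PMIDs to return
        "retmode": "json",  # JSON instead of the default XML
    }
    if api_key:
        params["api_key"] = api_key  # optional; a key raises NCBI's request-rate limit
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url("LDL targets high-risk patients"))
```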
When to Use
ALWAYS use this skill BEFORE writing content that requires:
- Evidence-based claims
- Statistics or study citations
- Guideline references
- Current best practices
- Recent trial results
Architecture
┌─────────────────────────────────────────────────────────────┐
│  KNOWLEDGE PIPELINE (Research)                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Question ──┬──► RAG Pipeline                               │
│             │    (AstraDB Vector Store)                     │
│             │    - YOUR textbooks (Braunwald, etc.)         │
│             │    - Guidelines (ESC, ACC, AHA)               │
│             │    - Reference materials                      │
│             │    Tech: Vector + BM25 + RRF + Cohere rerank  │
│             │                                               │
│             └──► PubMed Pipeline                            │
│                  (NCBI E-utilities API)                     │
│                  - Latest research articles                 │
│                  - Systematic reviews                       │
│                  - Meta-analyses                            │
│                  - Clinical trials                          │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│  SYNTHESIS                                                  │
│  Combined context ──► GPT-4o-mini ──► Knowledge Brief       │
│                                                             │
│  Output:                                                    │
│    1. Established Knowledge (guidelines)                    │
│    2. Latest Research (PubMed)                              │
│    3. Key Data Points                                       │
│    4. Areas of Consensus                                    │
│    5. Areas of Uncertainty                                  │
│    6. Citation Summary                                      │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│  SEPARATE: DEMAND ASSESSMENT (YouTube)                      │
├─────────────────────────────────────────────────────────────┤
│  Perplexity ──► Social listening, trends, what people ask   │
│  Free LLMs  ──► Demand analysis from YouTube comments       │
│                                                             │
│  This is NOT research. This is audience intelligence.       │
└─────────────────────────────────────────────────────────────┘
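The RAG pipeline's "Vector + BM25 + RRF" step merges the vector-search ranking and the BM25 ranking with Reciprocal Rank Fusion. Each document scores the sum of 1/(k + rank) over every list it appears in. This is a minimal sketch, assuming ranks start at 1 and the commonly used k = 60; the pipeline's actual constant is not stated here. Cohere reranking would then reorder the fused top results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a deterministic order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs from the two retrievers
vector_hits = ["esc_2021_p45", "braunwald_p1203", "acc_2018_p12"]
bm25_hits = ["esc_2021_p45", "aha_2019_p7", "braunwald_p1203"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])
# ['esc_2021_p45', 'braunwald_p1203', 'aha_2019_p7', 'acc_2018_p12']
```

RRF uses only rank positions. That is why it can combine BM25 scores and cosine similarities without normalizing them onto a common scale.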
Usage
Python Integration
# Assumes the rag-pipeline project is importable as the package rag_pipeline
from rag_pipeline.src.knowledge_pipeline import KnowledgePipeline
pipeline = KnowledgePipeline(verbose=True)
question = "What are optimal LDL targets for high-risk patients?"
# Option 1: raw combined context (RAG chunks + PubMed articles, no synthesis step)
context = pipeline.build_knowledge_context(question)
# Option 2: knowledge brief synthesized from the combined context by GPT-4o-mini
brief = pipeline.synthesize_knowledge(question)
print(brief)
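Conceptually, building the raw context sends the same question to both sources at once and joins the results under the two headings shown under Output Format. The sketch below illustrates that fan-out with a thread pool. `search_rag`, `search_pubmed`, and `build_context` are hypothetical stand-ins that return canned strings; they are not the skill's real retrievers.

```python
from concurrent.futures import ThreadPoolExecutor


def search_rag(question: str) -> str:
    # Hypothetical stand-in for the AstraDB hybrid search
    return f"[RAG results for: {question}]"


def search_pubmed(question: str) -> str:
    # Hypothetical stand-in for the NCBI E-utilities search
    return f"[PubMed results for: {question}]"


def build_context(question: str) -> str:
    """Query both sources concurrently and join their results into one context."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rag_future = pool.submit(search_rag, question)
        pubmed_future = pool.submit(search_pubmed, question)
        rag_text = rag_future.result()
        pubmed_text = pubmed_future.result()
    return (
        f"## FROM TEXTBOOKS & GUIDELINES (RAG)\n{rag_text}\n\n"
        f"## FROM PUBMED (Latest Research)\n{pubmed_text}"
    )


print(build_context("What are optimal LDL targets for high-risk patients?"))
```

Threads fit here because both calls wait on the network. With real retrievers, wall-clock time approaches that of the slower source rather than the sum of both.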
CLI Usage
cd "/Users/shaileshsingh/cowriting system/rag-pipeline"
python src/knowledge_pipeline.py
Configuration
Environment Variables (in .env)
# RAG Pipeline (AstraDB)
ASTRA_DB_APPLICATION_TOKEN=your_token
ASTRA_DB_API_ENDPOINT=your_endpoint
ASTRA_DB_COLLECTION=documents
OPENAI_API_KEY=your_key
COHERE_API_KEY=your_key
# PubMed Pipeline
NCBI_API_KEY=your_key
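A missing key usually surfaces only when the first API call fails. A small startup check can fail fast instead. The sketch below is an assumption, not part of the skill. It omits the NCBI key because E-utilities also accepts requests without a key, at a lower rate limit. If the `.env` file is loaded with python-dotenv, call `load_dotenv()` before the check.

```python
import os
from typing import Mapping

# Variables the RAG side needs, as listed in the .env example above
REQUIRED_VARS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_COLLECTION",
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example: only the OpenAI and Cohere keys are configured
print(missing_vars({"OPENAI_API_KEY": "x", "COHERE_API_KEY": "y"}))
# ['ASTRA_DB_APPLICATION_TOKEN', 'ASTRA_DB_API_ENDPOINT', 'ASTRA_DB_COLLECTION']
```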
Output Format
Raw Context (build_knowledge_context)
## FROM TEXTBOOKS & GUIDELINES (RAG)
--------------------------------------------------
[Source 1: ESC Guidelines 2021.pdf, Page 45] (Score: 0.892)
LDL-C targets for patients with established CVD...
[Source 2: Braunwald Cardiology.pdf, Page 1203] (Score: 0.856)
The evidence for aggressive LDL lowering...
## FROM PUBMED (Latest Research)
--------------------------------------------------
[1] PMID: 38123456
Smith, Jones, Brown et al. (2024)
Novel LDL-C targets in high-risk populations
Journal of the American College of Cardiology
Abstract: Recent meta-analysis of 15 trials...
==================================================
KNOWLEDGE SUMMARY
- RAG chunks (textbooks/guidelines): 8
- PubMed articles (latest research): 5
- Total sources: 13
==================================================
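Each PubMed entry in the raw context follows a fixed layout: index and PMID, then authors and year, title, journal, and abstract. The following is a sketch of that formatting, assuming a simple record type. `PubMedRecord` and `format_pubmed_entry` are hypothetical names. The rule of showing three authors before "et al." is inferred from the single example above.

```python
from dataclasses import dataclass


@dataclass
class PubMedRecord:
    pmid: str
    authors: list[str]
    year: int
    title: str
    journal: str
    abstract: str


def format_pubmed_entry(index: int, rec: PubMedRecord, max_authors: int = 3) -> str:
    """Render one record in the '[n] PMID: ...' layout of the raw context."""
    shown = ", ".join(rec.authors[:max_authors])
    if len(rec.authors) > max_authors:
        shown += " et al."
    return "\n".join([
        f"[{index}] PMID: {rec.pmid}",
        f"{shown} ({rec.year})",
        rec.title,
        rec.journal,
        f"Abstract: {rec.abstract}",
    ])


# Values from the sample output above; "Lee" is a placeholder fourth author
rec = PubMedRecord(
    "38123456", ["Smith", "Jones", "Brown", "Lee"], 2024,
    "Novel LDL-C targets in high-risk populations",
    "Journal of the American College of Cardiology",
    "Recent meta-analysis of 15 trials...",
)
print(format_pubmed_entry(1, rec))
```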
Priority When Sources Conflict
- Highest: Established guidelines from RAG (ESC, ACC, AHA)
- High: Major trials and meta-analyses (RAG + PubMed)
- Medium: Recent updates not yet in guidelines (PubMed)
- Lower: Single studies, expert opinion
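One way to apply this ordering mechanically is to tag each retrieved claim with a tier and sort on the tier before synthesis. The `SourceTier` enum and `by_priority` helper below are hypothetical. The skill itself states the ordering only as a rule for resolving conflicts.

```python
from enum import IntEnum


class SourceTier(IntEnum):
    """Lower value = higher priority when sources conflict."""
    GUIDELINE = 0       # established ESC, ACC, AHA guidelines (RAG)
    MAJOR_EVIDENCE = 1  # major trials and meta-analyses (RAG or PubMed)
    RECENT_UPDATE = 2   # recent PubMed findings not yet in guidelines
    SINGLE_STUDY = 3    # single studies, expert opinion


def by_priority(items: list[tuple[str, SourceTier]]) -> list[str]:
    """Order claims so higher-priority sources come first; stable within a tier."""
    return [claim for claim, _ in sorted(items, key=lambda item: item[1])]


claims = [
    ("single-study finding", SourceTier.SINGLE_STUDY),
    ("guideline statement", SourceTier.GUIDELINE),
    ("meta-analysis finding", SourceTier.MAJOR_EVIDENCE),
]
print(by_priority(claims))
# ['guideline statement', 'meta-analysis finding', 'single-study finding']
```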
Integration with Writing Skills
This skill feeds into ALL writing skills:
- cardiology-writer - Uses knowledge brief for factual grounding
- cardiology-newsletter-writer - Research phase uses this pipeline
- cardiology-editorial - Evidence synthesis from both sources
- youtube-script-master - Educational sections use RAG + PubMed context
Cost Estimates
| Component | Model/Service | Cost per Query |
|---|---|---|
| RAG Embeddings | text-embedding-3-small | ~$0.001 |
| RAG Reranking | Cohere rerank-english-v3.0 | ~$0.01 |
| PubMed API | NCBI E-utilities | Free |
| Synthesis | GPT-4o-mini | ~$0.002 |
| Total | | ~$0.013 |
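The total is the sum of the per-query components (0.001 + 0.01 + 0 + 0.002 = 0.013 USD), and reranking accounts for most of it. The sketch below scales that total to a monthly volume. The figures are the approximate estimates from the table, not current list prices. The 300-queries-per-month volume is an arbitrary example.

```python
from decimal import Decimal

# Approximate per-query costs in USD, taken from the cost table
COST_PER_QUERY = {
    "rag_embeddings": Decimal("0.001"),
    "rag_reranking": Decimal("0.01"),
    "pubmed_api": Decimal("0"),
    "synthesis": Decimal("0.002"),
}


def monthly_cost(queries_per_month: int) -> Decimal:
    """Estimated monthly spend for a given number of pipeline queries."""
    return sum(COST_PER_QUERY.values(), Decimal("0")) * queries_per_month


print(sum(COST_PER_QUERY.values(), Decimal("0")))  # 0.013
print(monthly_cost(300))                           # 3.900
```

`Decimal` avoids binary floating-point rounding, so the per-query sum comes out as exactly 0.013.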
Maintenance
Adding New Documents to RAG
cd "/Users/shaileshsingh/cowriting system/rag-pipeline"
python src/ingest_documents.py --folder /path/to/new/pdfs
Rebuilding BM25 Index
Delete the cache files to force a rebuild:
rm .bm25_cache.pkl .doc_cache.json
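Deleting the files forces a rebuild presumably because the index is loaded from the cache when the file exists and rebuilt when it does not. The sketch below shows that load-or-build pattern, assuming the cache holds a pickled object. `load_or_build` and the `build_index` callable are hypothetical, and the real cache contents are not documented here.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_path: Path, build_index: Callable[[], Any]) -> Any:
    """Return the cached index if the cache file exists; otherwise build and cache it."""
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    index = build_index()  # the expensive step, e.g. tokenizing every chunk
    with cache_path.open("wb") as f:
        pickle.dump(index, f)
    return index
```

In this pattern nothing checks whether the document set has changed. A stale cache therefore survives newly ingested PDFs until the file is deleted.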