knowledge-pipeline

Install this agent skill into your project:

```bash
npx add-skill https://github.com/drshailesh88/integrated_content_OS/tree/main/skills/cardiology/knowledge-pipeline
```


Knowledge Pipeline Skill

Metadata

Name: knowledge-pipeline
Version: 1.1
Purpose: Build rich knowledge context using RAG + PubMed before writing
Trigger: Any content creation task requiring evidence-based writing

Overview

This skill implements a parallel knowledge-building pipeline that queries BOTH:

RAG Pipeline - Your AstraDB vector store containing cardiology textbooks and guidelines
PubMed Pipeline - Latest research via NCBI E-utilities API

NOTE: Perplexity is used SEPARATELY for social listening and demand assessment (YouTube workflow), NOT for research/evidence gathering.
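A minimal sketch of the parallel fan-out, assuming each pipeline can be wrapped as a function that takes the question and returns text snippets. `build_context_in_parallel`, `fake_rag`, and `fake_pubmed` are hypothetical names, not part of the skill's code, and the real `KnowledgePipeline` may parallelize differently. Threads are used because both lookups are network-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A retriever takes the question and returns text snippets (hypothetical interface).
Retriever = Callable[[str], List[str]]


def build_context_in_parallel(question: str, retrievers: Dict[str, Retriever]) -> Dict[str, List[str]]:
    """Run every retriever on the same question concurrently; collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        # Submit everything first so the network-bound calls overlap in time.
        futures = {name: pool.submit(fn, question) for name, fn in retrievers.items()}
        # result() blocks until each call finishes and re-raises any worker exception.
        return {name: fut.result() for name, fut in futures.items()}


# Stand-ins for the real AstraDB and PubMed calls (hypothetical).
def fake_rag(question: str) -> List[str]:
    return [f"guideline chunk about: {question}"]


def fake_pubmed(question: str) -> List[str]:
    return [f"PubMed abstract about: {question}"]


if __name__ == "__main__":
    context = build_context_in_parallel(
        "What are optimal LDL targets for high-risk patients?",
        {"rag": fake_rag, "pubmed": fake_pubmed},
    )
    for source, snippets in context.items():
        print(source, snippets)
```

Because results are keyed by pipeline name, the synthesis step can still label which snippets came from textbooks and which from PubMed.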

When to Use

ALWAYS use this skill BEFORE writing content that requires:

Evidence-based claims
Statistics or study citations
Guideline references
Current best practices
Recent trial results

Architecture

```text
┌──────────────────────────────────────────────────────────────┐
│                KNOWLEDGE PIPELINE (Research)                 │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Question ──┬──► RAG Pipeline                                │
│             │    (AstraDB Vector Store)                      │
│             │    - YOUR textbooks (Braunwald, etc.)          │
│             │    - Guidelines (ESC, ACC, AHA)                │
│             │    - Reference materials                       │
│             │    Tech: Vector + BM25 + RRF + Cohere rerank   │
│             │                                                │
│             └──► PubMed Pipeline                             │
│                  (NCBI E-utilities API)                      │
│                  - Latest research articles                  │
│                  - Systematic reviews                        │
│                  - Meta-analyses                             │
│                  - Clinical trials                           │
│                                                              │
├──────────────────────────────────────────────────────────────┤
│                          SYNTHESIS                           │
│  Combined context ──► GPT-4o-mini ──► Knowledge Brief        │
│                                                              │
│  Output:                                                     │
│  1. Established Knowledge (guidelines)                       │
│  2. Latest Research (PubMed)                                 │
│  3. Key Data Points                                          │
│  4. Areas of Consensus                                       │
│  5. Areas of Uncertainty                                     │
│  6. Citation Summary                                         │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│            SEPARATE: DEMAND ASSESSMENT (YouTube)             │
├──────────────────────────────────────────────────────────────┤
│  Perplexity ──► Social listening, trends, what people ask    │
│  Free LLMs  ──► Demand analysis from YouTube comments        │
│                                                              │
│  This is NOT research. This is audience intelligence.        │
└──────────────────────────────────────────────────────────────┘
```
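The RAG branch lists "Vector + BM25 + RRF + Cohere rerank". The sketch below shows only the RRF (reciprocal rank fusion) step, under the assumption that the vector search and BM25 each return a ranked list of chunk IDs. Each chunk scores the sum of 1/(k + rank) over the lists it appears in. k=60 is a common default; the value this skill uses is not stated. The chunk IDs are made up, and Cohere reranking of the fused list is left out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk IDs into one list using RRF.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1, so chunks ranked well by both retrievers win.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    vector_hits = ["esc_2021_p45", "braunwald_p1203", "acc_2018_p12"]  # hypothetical IDs
    bm25_hits = ["esc_2021_p45", "aha_2019_p7", "braunwald_p1203"]
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

RRF uses only ranks, not raw scores, so it can merge cosine similarities and BM25 scores without normalizing them onto a common scale first.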

Usage

Python Integration

```python
from rag_pipeline.src.knowledge_pipeline import KnowledgePipeline

# verbose=True turns on progress output while the pipelines run
pipeline = KnowledgePipeline(verbose=True)

# Option 1: Get raw combined context (RAG chunks + PubMed articles)
context = pipeline.build_knowledge_context(
    "What are optimal LDL targets for high-risk patients?"
)

# Option 2: Get synthesized knowledge brief (GPT-4o-mini summary of the same sources)
brief = pipeline.synthesize_knowledge(
    "What are optimal LDL targets for high-risk patients?"
)
```

CLI Usage

```bash
# Path from the skill author's machine; adjust to your own checkout
cd "/Users/shaileshsingh/cowriting system/rag-pipeline"
python src/knowledge_pipeline.py
```

Configuration

Environment Variables (in .env)

```bash
# RAG Pipeline (AstraDB)
ASTRA_DB_APPLICATION_TOKEN=your_token
ASTRA_DB_API_ENDPOINT=your_endpoint
ASTRA_DB_COLLECTION=documents
OPENAI_API_KEY=your_key
COHERE_API_KEY=your_key

# PubMed Pipeline
NCBI_API_KEY=your_key
```
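A small pre-flight check can catch a missing setting before any API call fails. This is a sketch, not part of the skill: it assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`), and it assumes the PubMed key is read as `NCBI_API_KEY` and treated as optional, since E-utilities also works without a key at a lower rate limit (3 requests per second instead of 10). `missing_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Required settings for the RAG branch and synthesis.
REQUIRED_VARS = [
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_COLLECTION",
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
]
# Assumption: the PubMed key name is NCBI_API_KEY and the key is optional.
OPTIONAL_VARS = ["NCBI_API_KEY"]


def missing_vars(names: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(REQUIRED_VARS)
    if missing:
        raise SystemExit("Missing required settings: " + ", ".join(missing))
    for name in missing_vars(OPTIONAL_VARS):
        print(f"Note: {name} is not set; PubMed requests get a lower rate limit.")
```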

Output Format

Raw Context (build_knowledge_context)

```text
## FROM TEXTBOOKS & GUIDELINES (RAG)
--------------------------------------------------
[Source 1: ESC Guidelines 2021.pdf, Page 45] (Score: 0.892)
LDL-C targets for patients with established CVD...

[Source 2: Braunwald Cardiology.pdf, Page 1203] (Score: 0.856)
The evidence for aggressive LDL lowering...

## FROM PUBMED (Latest Research)
--------------------------------------------------
[1] PMID: 38123456
    Smith, Jones, Brown et al. (2024)
    Novel LDL-C targets in high-risk populations
    Journal of the American College of Cardiology
    Abstract: Recent meta-analysis of 15 trials...

==================================================
KNOWLEDGE SUMMARY
- RAG chunks (textbooks/guidelines): 8
- PubMed articles (latest research): 5
- Total sources: 13
==================================================
```
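The PubMed entries in the raw context follow a fixed layout. The sketch below reproduces that layout from a simple record. `PubMedArticle`, `format_article`, and `format_pubmed_section` are hypothetical names; the three-author cutoff, the 200-character abstract truncation, and the 50-character rule are assumptions inferred from the example, not settings documented by the skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PubMedArticle:
    pmid: str
    authors: List[str]  # surnames, in publication order
    year: int
    title: str
    journal: str
    abstract: str


def format_article(index: int, art: PubMedArticle, max_abstract: int = 200) -> str:
    """Render one article in the layout shown in the raw-context example."""
    names = ", ".join(art.authors[:3])
    if len(art.authors) > 3:
        names += " et al."  # more than three authors: list three, then "et al."
    abstract = art.abstract
    if len(abstract) > max_abstract:
        abstract = abstract[:max_abstract].rstrip() + "..."
    lines = [
        f"[{index}] PMID: {art.pmid}",
        f"    {names} ({art.year})",
        f"    {art.title}",
        f"    {art.journal}",
        f"    Abstract: {abstract}",
    ]
    return "\n".join(lines)


def format_pubmed_section(articles: List[PubMedArticle]) -> str:
    """Section header and rule, then the numbered articles separated by blank lines."""
    header = "## FROM PUBMED (Latest Research)\n" + "-" * 50
    body = "\n\n".join(format_article(i, a) for i, a in enumerate(articles, start=1))
    return header + "\n" + body
```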

Priority When Sources Conflict

Highest: Established guidelines from RAG (ESC, ACC, AHA)
High: Major trials and meta-analyses (RAG + PubMed)
Medium: Recent updates not yet in guidelines (PubMed)
Lower: Single studies, expert opinion
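The priority list above can be encoded as ordered tiers, so that when two sources disagree the higher tier is presented first. This is a hypothetical sketch: `Tier`, `Claim`, and `rank_claims` are illustrative names, and assigning a tier to each source is left to the caller (the skill describes the ordering in prose only).

```python
from enum import IntEnum
from typing import List, NamedTuple


class Tier(IntEnum):
    """Evidence tiers from the conflict-priority list; a higher value wins."""
    LOWER = 1    # single studies, expert opinion
    MEDIUM = 2   # recent updates not yet in guidelines (PubMed)
    HIGH = 3     # major trials and meta-analyses (RAG + PubMed)
    HIGHEST = 4  # established guidelines from RAG (ESC, ACC, AHA)


class Claim(NamedTuple):
    text: str
    source: str
    tier: Tier


def rank_claims(claims: List[Claim]) -> List[Claim]:
    """Order conflicting claims so the highest-priority source comes first.

    sorted() stays stable with reverse=True, so claims in the same tier keep their input order.
    """
    return sorted(claims, key=lambda c: c.tier, reverse=True)
```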

Integration with Writing Skills

This skill feeds into ALL writing skills:

cardiology-writer - Uses knowledge brief for factual grounding
cardiology-newsletter-writer - Research phase uses this pipeline
cardiology-editorial - Evidence synthesis from both sources
youtube-script-master - Educational sections use RAG + PubMed context

Cost Estimates

| Component | Model/Service | Cost per Query |
|---|---|---|
| RAG Embeddings | text-embedding-3-small | ~$0.001 |
| RAG Reranking | Cohere rerank-english-v3.0 | ~$0.01 |
| PubMed API | NCBI E-utilities | Free |
| Synthesis | GPT-4o-mini | ~$0.002 |
| Total | | ~$0.013/query |
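The total is the sum of the per-component estimates (0.001 + 0.01 + 0 + 0.002 = 0.013 USD). A sketch for projecting spend over many runs, assuming cost scales linearly with query count; actual token usage per query will vary, so treat the result as a rough estimate. `COST_PER_QUERY` and `estimated_cost` are hypothetical names.

```python
# Approximate per-query costs from the table above, in USD.
COST_PER_QUERY = {
    "rag_embeddings": 0.001,
    "rag_reranking": 0.01,
    "pubmed_api": 0.0,  # NCBI E-utilities is free
    "synthesis": 0.002,
}


def estimated_cost(num_queries: int) -> float:
    """Projected spend for num_queries pipeline runs, rounded to cents."""
    per_query = sum(COST_PER_QUERY.values())
    return round(per_query * num_queries, 2)


if __name__ == "__main__":
    print(f"1,000 queries: ~${estimated_cost(1000):.2f}")
```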

Maintenance

Adding New Documents to RAG

```bash
cd "/Users/shaileshsingh/cowriting system/rag-pipeline"
python src/ingest_documents.py --folder /path/to/new/pdfs
```

Rebuilding BM25 Index

Delete the cache files to force a rebuild:

```bash
rm .bm25_cache.pkl .doc_cache.json
```
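Deleting the files forces a rebuild only if the loader falls back to rebuilding when a cache is missing. The sketch below shows that load-or-rebuild pattern for a pickle cache. It is an assumption about how `.bm25_cache.pkl` is handled, not the skill's actual code; `load_or_build` and the `build` callback are hypothetical.

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(cache_path: str, build: Callable[[], Any]) -> Any:
    """Return the pickled object at cache_path, or build it and write the cache.

    Deleting the cache file makes the next call rebuild from scratch.
    Only unpickle files you created yourself: pickle can execute code on load.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    obj = build()  # e.g. tokenize all chunks and build the BM25 index
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj
```

The trade-off is that a stale cache is never detected automatically, which is why the index has to be rebuilt by hand after ingesting new documents.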

Maintainer

drshailesh88 (core maintainer)

Source details

Full Name: drshailesh88/integrated_content_OS
Branch: main
Path in repo: skills/cardiology/knowledge-pipeline
