semantik-plugin-development
Create semantik plugins (connectors, embeddings, chunkers, rerankers, extractors, agents). Use when developing plugins, creating new integrations, or asking about plugin patterns, protocols, or testing.
Semantik Plugin Development
This skill helps you create plugins for Semantik, a self-hosted semantic search engine. Plugins extend Semantik's capabilities for document ingestion, embedding, chunking, reranking, extraction, and AI agents.
Protocol Version
Current Version: 1.0.0
Breaking changes to protocols increment the major version. Your plugins continue to work as long as they satisfy the protocol interface.
Security Note
Plugins run in-process with the main Semantik application (no sandboxing). Only install plugins you trust. See Security Guide for details.
Quick Start
Create a minimal connector plugin in 5 minutes:
# my_connector.py
from typing import ClassVar, Any, AsyncIterator
import hashlib


class MyConnector:
    PLUGIN_ID: ClassVar[str] = "my-connector"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    def __init__(self, config: dict[str, Any]) -> None:
        self._config = config

    async def authenticate(self) -> bool:
        return True

    async def load_documents(self, source_id: int | None = None) -> AsyncIterator[dict[str, Any]]:
        content = "Document content..."
        yield {
            "content": content,
            "unique_id": "doc-1",
            "source_type": self.PLUGIN_ID,
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }

    @classmethod
    def get_config_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_secret_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {"id": cls.PLUGIN_ID, "type": cls.PLUGIN_TYPE, "version": cls.PLUGIN_VERSION,
                "display_name": "My Connector", "description": "Custom connector"}
Plugin Types
| Type | Purpose | Key Method | Template |
|---|---|---|---|
| connector | Ingest documents from sources | load_documents() | connector.py |
| embedding | Convert text to vectors | embed_texts() | embedding.py |
| chunking | Split documents into chunks | chunk() | chunking.py |
| reranker | Reorder search results | rerank() | reranker.py |
| extractor | Extract entities/metadata | extract() | extractor.py |
| agent | LLM-powered capabilities | execute() | agent.py |
Type-specific guides:
- Connector Guide - Document sources, async iterators
- Embedding Guide - Query/document modes, dimensions
- Chunking Guide - Text segmentation strategies
- Reranker Guide - Cross-encoder reranking
- Extractor Guide - Entity and metadata extraction
- Agent Guide - LLM agents, streaming, context
Cross-cutting guides:
- Testing Guide - Contract tests, mocks, fixtures
- Security Guide - Trust model, best practices
- Advanced Guide - Health checks, dependencies, migration
Development Approach
Protocol-Based (Recommended)
Use plain Python classes with no semantik imports. Plugins are validated by structural typing (duck typing):
from typing import ClassVar

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"  # or embedding, chunking, etc.
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"
    # ... implement required methods
Benefits:
- Zero dependencies on semantik
- Develop in separate repository
- Distribute via PyPI or git
- No version conflicts
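To make the structural-typing idea concrete, here is a minimal sketch of how a host could check a plugin by shape alone. PluginLike, GoodPlugin, and NotAPlugin are hypothetical names for illustration; Semantik's real protocol definitions and validation logic may differ.

```python
from typing import Any, ClassVar, Protocol, runtime_checkable


@runtime_checkable
class PluginLike(Protocol):
    """Hypothetical protocol; not Semantik's actual definition."""

    PLUGIN_ID: ClassVar[str]
    PLUGIN_TYPE: ClassVar[str]
    PLUGIN_VERSION: ClassVar[str]

    @classmethod
    def get_manifest(cls) -> dict[str, Any]: ...


class GoodPlugin:
    """Plain class: no base class, no imports from the host application."""

    PLUGIN_ID: ClassVar[str] = "good-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {"id": cls.PLUGIN_ID, "type": cls.PLUGIN_TYPE, "version": cls.PLUGIN_VERSION}


class NotAPlugin:
    """Missing every required member."""


# isinstance() only checks that the members exist, not their types.
print(isinstance(GoodPlugin(), PluginLike))  # True
print(isinstance(NotAPlugin(), PluginLike))  # False
```

Because the check is purely structural, GoodPlugin can live in a separate repository and never import the host package.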
ABC-Based (Advanced)
Inherit from semantik base classes when you need access to internal utilities:
from shared.connectors.base import BaseConnector

class MyConnector(BaseConnector):
    # ... inherit helper methods
    ...
Use this approach when you are:
- Building embedding plugins with GPU management
- Relying on shared utilities
- Developing internal/builtin plugins
Required Class Variables
Every plugin must define:
from typing import ClassVar, Any

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"    # Unique ID (lowercase, hyphens)
    PLUGIN_TYPE: ClassVar[str] = "connector"  # One of 6 types
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"   # Semantic version
Some plugin types require additional class variables:
| Type | Additional Variables |
|---|---|
| connector | METADATA (dict with name, description, icon) |
| embedding | INTERNAL_NAME, API_ID, PROVIDER_TYPE, METADATA |
| chunking | (none) |
| reranker | (none) |
| extractor | (none) |
| agent | (none) |
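A quick pre-flight check for these class variables can catch mistakes before packaging. The sketch below is an assumption-laden helper, not part of Semantik: class_var_problems and EXTRA_CLASS_VARS are hypothetical names, and the ID pattern (lowercase letters, digits, hyphens) is one reading of "lowercase, hyphens".

```python
import re

# Per-type extras, copied from the table above.
EXTRA_CLASS_VARS = {
    "connector": ["METADATA"],
    "embedding": ["INTERNAL_NAME", "API_ID", "PROVIDER_TYPE", "METADATA"],
}

# Assumption: digits are allowed alongside lowercase letters and hyphens.
PLUGIN_ID_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def class_var_problems(plugin_cls: type) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    required = ["PLUGIN_ID", "PLUGIN_TYPE", "PLUGIN_VERSION"]
    required += EXTRA_CLASS_VARS.get(getattr(plugin_cls, "PLUGIN_TYPE", None), [])
    for name in required:
        if not hasattr(plugin_cls, name):
            problems.append(f"missing {name}")
    plugin_id = getattr(plugin_cls, "PLUGIN_ID", None)
    if isinstance(plugin_id, str) and not PLUGIN_ID_PATTERN.match(plugin_id):
        problems.append(f"PLUGIN_ID {plugin_id!r} is not lowercase-with-hyphens")
    return problems


class BareConnector:
    PLUGIN_ID = "My_Connector"  # deliberately malformed
    PLUGIN_TYPE = "connector"
    PLUGIN_VERSION = "1.0.0"


print(class_var_problems(BareConnector))
# ['missing METADATA', "PLUGIN_ID 'My_Connector' is not lowercase-with-hyphens"]
```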
Manifest Method
All plugins must implement get_manifest():
@classmethod
def get_manifest(cls) -> dict[str, Any]:
    return {
        "id": cls.PLUGIN_ID,
        "type": cls.PLUGIN_TYPE,
        "version": cls.PLUGIN_VERSION,
        "display_name": "My Plugin",
        "description": "What the plugin does",
        # Optional fields:
        "author": "Your Name",
        "license": "MIT",
        "homepage": "https://github.com/...",
        "requires": ["other-plugin"],  # Dependencies
        "capabilities": {},            # Plugin-specific capabilities
    }
Configuration
Config Fields (UI)
Define configuration fields for the Semantik UI:
@classmethod
def get_config_fields(cls) -> list[dict[str, Any]]:
    return [
        {
            "name": "base_url",
            "type": "text",  # text, password, number, boolean, select
            "label": "Base URL",
            "description": "API endpoint",
            "required": True,
            "placeholder": "https://api.example.com",
        },
        {
            "name": "model",
            "type": "select",
            "label": "Model",
            "options": ["model-a", "model-b"],
            "default": "model-a",
        },
    ]
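To see how these field definitions relate to the config dict a plugin receives, the following sketch applies defaults and checks required and select fields. apply_config_fields is a hypothetical helper written for illustration; how Semantik itself validates submitted config is not specified here.

```python
from typing import Any

FIELDS: list[dict[str, Any]] = [
    {"name": "base_url", "type": "text", "label": "Base URL", "required": True},
    {"name": "model", "type": "select", "label": "Model",
     "options": ["model-a", "model-b"], "default": "model-a"},
]


def apply_config_fields(fields: list[dict[str, Any]], config: dict[str, Any]) -> dict[str, Any]:
    """Fill defaults and reject missing required or invalid select values."""
    resolved: dict[str, Any] = {}
    for field in fields:
        name = field["name"]
        if name in config:
            value = config[name]
        elif "default" in field:
            value = field["default"]
        elif field.get("required"):
            raise ValueError(f"missing required config field: {name}")
        else:
            continue  # optional field with no default: leave it out
        if field.get("type") == "select" and value not in field.get("options", []):
            raise ValueError(f"{name} must be one of {field.get('options', [])}")
        resolved[name] = value
    return resolved


print(apply_config_fields(FIELDS, {"base_url": "https://api.example.com"}))
# {'base_url': 'https://api.example.com', 'model': 'model-a'}
```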
Secret Fields
Mark fields that contain secrets (encrypted at rest):
@classmethod
def get_secret_fields(cls) -> list[dict[str, Any]]:
    return [
        {"name": "api_key", "label": "API Key", "required": True},
    ]
Environment Variables
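Use the _env suffix pattern for secrets. The user enters the name of an environment variable rather than the secret itself, and Semantik resolves it at runtime:
# In the config schema, the user enters the env var name
config = {"api_key_env": "OPENAI_API_KEY"}
# At runtime, Semantik resolves it
config = {"api_key": "sk-actual-key-value"}  # Resolved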
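The resolution step can be pictured roughly as follows. This is a sketch of the pattern only: resolve_env_config is a hypothetical function, and Semantik's actual resolution logic (including how it reports a missing variable) may differ.

```python
import os
from typing import Any, Mapping


def resolve_env_config(config: Mapping[str, Any],
                       environ: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Replace each '<name>_env' key with '<name>' holding the variable's value."""
    resolved: dict[str, Any] = {}
    for key, value in config.items():
        if key.endswith("_env"):
            if value not in environ:
                raise KeyError(f"environment variable {value!r} is not set")
            resolved[key[: -len("_env")]] = environ[value]
        else:
            resolved[key] = value
    return resolved


# A fake environment keeps the example independent of the real one.
fake_env = {"OPENAI_API_KEY": "sk-actual-key-value"}
print(resolve_env_config({"api_key_env": "OPENAI_API_KEY", "model": "model-a"}, fake_env))
# {'api_key': 'sk-actual-key-value', 'model': 'model-a'}
```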
Testing
Manual Verification
pip install -e .
python -c "
from my_plugin import MyConnector
print(f'ID: {MyConnector.PLUGIN_ID}')
print(f'Type: {MyConnector.PLUGIN_TYPE}')
print(f'Manifest: {MyConnector.get_manifest()}')
"
Protocol Validation
import pytest

from my_plugin import MyPlugin  # hypothetical module; adjust to your package and class


class TestMyPlugin:
    def test_has_required_attributes(self):
        assert hasattr(MyPlugin, "PLUGIN_ID")
        assert hasattr(MyPlugin, "PLUGIN_TYPE")
        assert hasattr(MyPlugin, "PLUGIN_VERSION")
        assert MyPlugin.PLUGIN_TYPE == "connector"

    def test_manifest_format(self):
        manifest = MyPlugin.get_manifest()
        assert "id" in manifest
        assert "type" in manifest
        assert "display_name" in manifest

    @pytest.mark.asyncio  # requires the pytest-asyncio plugin
    async def test_core_functionality(self):
        plugin = MyPlugin(config={})
        # Test plugin-specific methods
With Semantik Test Mixins
If semantik is installed:
from shared.plugins.testing.contracts import ConnectorProtocolTestMixin

class TestMyConnector(ConnectorProtocolTestMixin):
    plugin_class = MyConnector
Packaging
pyproject.toml
[project]
name = "semantik-plugin-myconnector"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = [] # Your dependencies only
[project.entry-points."semantik.plugins"]
my-connector = "my_plugin.connector:MyConnector"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
See templates/pyproject.toml for a complete template.
Entry Point Format
plugin-id = "module.path:ClassName"
- plugin-id: Should match PLUGIN_ID
- module.path: Python import path
- ClassName: Your plugin class
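The value side of an entry point is a module path and an attribute name separated by a colon. As a rough illustration of how such a string resolves to a class, the sketch below uses only importlib; load_entry_point is a hypothetical helper, and the example resolves a standard-library class because my_plugin is not installed here.

```python
import importlib


def load_entry_point(spec: str) -> type:
    """Resolve a 'module.path:ClassName' string to the class it names."""
    module_path, sep, class_name = spec.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in target; a real plugin would use e.g. "my_plugin.connector:MyConnector".
cls = load_entry_point("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

An ImportError or AttributeError from this kind of lookup usually points to a typo in the module path or class name in pyproject.toml.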
Installation
# Development
pip install -e .
# From git
pip install git+https://github.com/you/semantik-plugin-myconnector.git
# Via Semantik API
POST /api/v2/plugins/install
{"install_command": "git+https://github.com/..."}
Common Issues
Plugin Not Loading
- Check that the entry point is registered:
  pip show semantik-plugin-myconnector
- Verify that PLUGIN_TYPE is valid:
  assert MyConnector.PLUGIN_TYPE in ["connector", "embedding", "chunking", "reranker", "extractor", "agent"]
- Check for import errors:
  try:
      from my_plugin import MyConnector
  except ImportError as e:
      print(f"Error: {e}")
Validation Errors
| Error | Fix |
|---|---|
| missing required keys: {'content'} | Add all required fields to returned dict |
| Invalid role: 'xyz' | Use valid string from MESSAGE_ROLES |
| content_hash must be 64 characters | Use hashlib.sha256(text.encode()).hexdigest() |
Async Issues
All I/O methods must be async:
# Wrong
def load_documents(self):
    yield {"content": "..."}

# Right
async def load_documents(self) -> AsyncIterator[dict]:
    yield {"content": "..."}
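The difference between the two forms can be detected without running them, and the async form is consumed with async for. The sketch below uses hypothetical SyncConnector and AsyncConnector classes and a hypothetical yields_asynchronously helper; it illustrates the distinction and is not Semantik's own validation.

```python
import asyncio
import inspect
from typing import Any, AsyncIterator, Iterator


class SyncConnector:
    def load_documents(self) -> Iterator[dict[str, Any]]:
        yield {"content": "..."}


class AsyncConnector:
    async def load_documents(self) -> AsyncIterator[dict[str, Any]]:
        yield {"content": "..."}


def yields_asynchronously(plugin_cls: type) -> bool:
    """True if load_documents is an async generator function."""
    return inspect.isasyncgenfunction(getattr(plugin_cls, "load_documents", None))


async def collect(connector: Any) -> list[dict[str, Any]]:
    """Drain an async generator the way an async caller would."""
    return [doc async for doc in connector.load_documents()]


print(yields_asynchronously(SyncConnector))    # False
print(yields_asynchronously(AsyncConnector))   # True
print(asyncio.run(collect(AsyncConnector())))  # [{'content': '...'}]
```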
Templates
Ready-to-use templates in templates/:
| File | Description |
|---|---|
| connector.py | Document source connector |
| embedding.py | Embedding model provider |
| chunking.py | Text chunking strategy |
| reranker.py | Search result reranker |
| extractor.py | Entity/metadata extractor |
| agent.py | LLM-powered agent |
| pyproject.toml | Package configuration |
Copy a template and modify:
cp templates/connector.py my_connector.py
# Edit PLUGIN_ID, PLUGIN_VERSION, and implement methods
Data Format Reference
Connector Documents (IngestedDocumentDict)
{
    "content": str,           # Full text (required)
    "unique_id": str,         # Unique identifier (required)
    "source_type": str,       # Your PLUGIN_ID (required)
    "metadata": dict,         # Source metadata (required)
    "content_hash": str,      # SHA-256, 64 hex chars (required)
    "file_path": str | None,  # Local path (optional)
}
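A connector's output can be checked against this shape before it reaches Semantik. The following is a minimal sketch under that assumption; document_errors is a hypothetical helper, and its messages only approximate the validation errors listed under Common Issues.

```python
import hashlib
import string
from typing import Any

REQUIRED_DOCUMENT_KEYS = {"content", "unique_id", "source_type", "metadata", "content_hash"}


def document_errors(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    errors = []
    missing = REQUIRED_DOCUMENT_KEYS - doc.keys()
    if missing:
        errors.append(f"missing required keys: {sorted(missing)}")
    if "content_hash" in doc:
        digest = str(doc["content_hash"])
        if len(digest) != 64 or any(ch not in string.hexdigits for ch in digest):
            errors.append("content_hash must be 64 hex characters")
    return errors


content = "Document content..."
good = {
    "content": content,
    "unique_id": "doc-1",
    "source_type": "my-connector",
    "metadata": {},
    "content_hash": hashlib.sha256(content.encode()).hexdigest(),
}
print(document_errors(good))                      # []
print(document_errors({"content_hash": "abc"}))
# ["missing required keys: ['content', 'metadata', 'source_type', 'unique_id']",
#  'content_hash must be 64 hex characters']
```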
Chunk Format (ChunkDict)
{
    "content": str,                   # Chunk text (required)
    "metadata": {                     # Chunk metadata (required)
        "chunk_index": int,
        "start_offset": int,
        "end_offset": int,
    },
    "chunk_id": str | None,           # Unique ID (optional)
    "embedding": list[float] | None,  # Pre-computed (optional)
}
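As an illustration of how the offsets fit together, here is a sketch of a fixed-size character chunker that emits dicts in this shape. chunk_text and its parameters are hypothetical; the real chunk() method signature is defined in the Chunking Guide and may differ.

```python
from typing import Any


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[dict[str, Any]]:
    """Split text into overlapping character windows with ChunkDict-style output."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks: list[dict[str, Any]] = []
    step = chunk_size - overlap
    for index, start in enumerate(range(0, len(text), step)):
        end = min(start + chunk_size, len(text))
        chunks.append({
            "content": text[start:end],
            "metadata": {"chunk_index": index, "start_offset": start, "end_offset": end},
            "chunk_id": None,
            "embedding": None,
        })
        if end == len(text):
            break  # the last window already reached the end of the text
    return chunks


for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
    print(chunk["content"], chunk["metadata"])
# abcd {'chunk_index': 0, 'start_offset': 0, 'end_offset': 4}
# defg {'chunk_index': 1, 'start_offset': 3, 'end_offset': 7}
# ghij {'chunk_index': 2, 'start_offset': 6, 'end_offset': 10}
```

Here end_offset is exclusive, so text[start_offset:end_offset] reproduces the chunk content; that convention is an assumption of this sketch.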
Rerank Result (RerankResultDict)
{
    "index": int,             # Original document index (required)
    "score": float,           # Relevance score (required)
    "text": str | None,       # Document text (optional)
    "metadata": dict | None,  # Metadata (optional)
}
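The index field refers back to the position in the input list, which matters once results are reordered. The toy reranker below scores by term overlap purely to show the output shape; rerank_by_overlap is a hypothetical function, not a cross-encoder and not the real rerank() signature.

```python
from typing import Any, Optional


def rerank_by_overlap(query: str, documents: list[str],
                      top_k: Optional[int] = None) -> list[dict[str, Any]]:
    """Score each document by the fraction of query terms it contains."""
    query_terms = set(query.lower().split())
    results: list[dict[str, Any]] = []
    for index, text in enumerate(documents):
        doc_terms = set(text.lower().split())
        score = len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0
        results.append({"index": index, "score": score, "text": text, "metadata": None})
    # Highest score first; "index" still points at the original position.
    results.sort(key=lambda item: item["score"], reverse=True)
    return results[:top_k]


docs = ["cooking recipes", "vector search engines", "search tips"]
for item in rerank_by_overlap("vector search", docs):
    print(item["index"], item["score"])
# 1 1.0
# 2 0.5
# 0 0.0
```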
Agent Message (AgentMessageDict)
{
    "id": str,               # Unique ID (required)
    "role": str,             # user, assistant, system, tool_call, tool_result, error
    "type": str,             # text, thinking, tool_use, tool_output, partial, final, error
    "content": str,          # Message content (required)
    "timestamp": str,        # ISO 8601 (required)
    "is_partial": bool,      # Streaming partial (optional)
    "sequence_number": int,  # Message order (optional)
}
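A small constructor helps keep the required fields and allowed values consistent. In the sketch below, make_message is a hypothetical helper, and MESSAGE_ROLES and MESSAGE_TYPES are local copies of the values listed in the comments above rather than imports from Semantik.

```python
import uuid
from datetime import datetime, timezone
from typing import Any

# Local copies of the values listed above (assumed to match Semantik's constants).
MESSAGE_ROLES = {"user", "assistant", "system", "tool_call", "tool_result", "error"}
MESSAGE_TYPES = {"text", "thinking", "tool_use", "tool_output", "partial", "final", "error"}


def make_message(role: str, content: str, message_type: str = "text",
                 is_partial: bool = False, sequence_number: int = 0) -> dict[str, Any]:
    """Build an AgentMessageDict-style dict with a fresh ID and UTC timestamp."""
    if role not in MESSAGE_ROLES:
        raise ValueError(f"Invalid role: {role!r}")
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"Invalid type: {message_type!r}")
    return {
        "id": str(uuid.uuid4()),
        "role": role,
        "type": message_type,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "is_partial": is_partial,
        "sequence_number": sequence_number,
    }


message = make_message("assistant", "Hello", sequence_number=1)
print(message["role"], message["type"], message["sequence_number"])  # assistant text 1
```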
Getting Help
- Semantik docs: See semantik/docs/external-plugins.md for protocol details
- Protocol reference: See semantik/docs/plugin-protocols.md for full specifications
- Examples: Check semantik/packages/shared/plugins/builtins/ for built-in plugins