Agent skill

semantik-plugin-development

Create semantik plugins (connectors, embeddings, chunkers, rerankers, extractors, agents). Use when developing plugins, creating new integrations, or asking about plugin patterns, protocols, or testing.


Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/semantik-plugin-development

SKILL.md

Semantik Plugin Development

This skill helps you create plugins for Semantik, a self-hosted semantic search engine. Plugins extend Semantik's capabilities for document ingestion, embedding, chunking, reranking, extraction, and AI agents.

Protocol Version

Current Version: 1.0.0

Breaking changes to protocols increment the major version. Your plugins continue to work as long as they satisfy the protocol interface.

Security Note

Plugins run in-process with the main Semantik application (no sandboxing). Only install plugins you trust. See Security Guide for details.

Quick Start

Create a minimal connector plugin in 5 minutes:

python
# my_connector.py
from typing import ClassVar, Any, AsyncIterator
import hashlib

class MyConnector:
    PLUGIN_ID: ClassVar[str] = "my-connector"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    def __init__(self, config: dict[str, Any]) -> None:
        self._config = config

    async def authenticate(self) -> bool:
        return True

    async def load_documents(self, source_id: int | None = None) -> AsyncIterator[dict[str, Any]]:
        content = "Document content..."
        yield {
            "content": content,
            "unique_id": "doc-1",
            "source_type": self.PLUGIN_ID,
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }

    @classmethod
    def get_config_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_secret_fields(cls) -> list[dict[str, Any]]:
        return []

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {"id": cls.PLUGIN_ID, "type": cls.PLUGIN_TYPE, "version": cls.PLUGIN_VERSION,
                "display_name": "My Connector", "description": "Custom connector"}
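
To try a connector without a running Semantik instance, it can be driven directly with asyncio. The sketch below is only an illustration: `StubConnector` is a stand-in with the same shape as the Quick Start class, and `collect_documents` is a hypothetical helper, not part of Semantik.

```python
import asyncio
import hashlib
from typing import Any, AsyncIterator


class StubConnector:
    """Stand-in with the same shape as the Quick Start connector."""

    PLUGIN_ID = "my-connector"

    def __init__(self, config: dict[str, Any]) -> None:
        self._config = config

    async def authenticate(self) -> bool:
        return True

    async def load_documents(self) -> AsyncIterator[dict[str, Any]]:
        content = "Document content..."
        yield {
            "content": content,
            "unique_id": "doc-1",
            "source_type": self.PLUGIN_ID,
            "metadata": {},
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        }


async def collect_documents(connector: Any) -> list[dict[str, Any]]:
    """Hypothetical helper: authenticate, then drain load_documents() into a list."""
    if not await connector.authenticate():
        raise RuntimeError("authentication failed")
    return [doc async for doc in connector.load_documents()]


if __name__ == "__main__":
    docs = asyncio.run(collect_documents(StubConnector(config={})))
    print(docs[0]["unique_id"], len(docs[0]["content_hash"]))
```

Swapping `StubConnector` for your own class gives a quick smoke test before packaging.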

Plugin Types

| Type | Purpose | Key Method | Template |
| --- | --- | --- | --- |
| connector | Ingest documents from sources | load_documents() | connector.py |
| embedding | Convert text to vectors | embed_texts() | embedding.py |
| chunking | Split documents into chunks | chunk() | chunking.py |
| reranker | Reorder search results | rerank() | reranker.py |
| extractor | Extract entities/metadata | extract() | extractor.py |
| agent | LLM-powered capabilities | execute() | agent.py |

Development Approach

Protocol-Based (Recommended)

Use plain Python classes with no semantik imports. Plugins are validated by structural typing (duck typing):

python
from typing import ClassVar

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"  # or embedding, chunking, etc.
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"
    # ... implement required methods

Benefits:

  • Zero dependencies on semantik
  • Develop in separate repository
  • Distribute via PyPI or git
  • No version conflicts
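
Structural typing means a class qualifies by having the right members, not by inheriting from anything. The sketch below illustrates the idea with `typing.Protocol`; `PluginLike` is a hypothetical protocol written for this example, and Semantik's own validation may check more than this.

```python
from typing import Any, ClassVar, Protocol, runtime_checkable


@runtime_checkable
class PluginLike(Protocol):
    """Hypothetical protocol listing members this guide requires of every plugin."""

    PLUGIN_ID: ClassVar[str]
    PLUGIN_TYPE: ClassVar[str]
    PLUGIN_VERSION: ClassVar[str]

    @classmethod
    def get_manifest(cls) -> dict[str, Any]: ...


class GoodPlugin:  # note: does not inherit from PluginLike
    PLUGIN_ID: ClassVar[str] = "good-plugin"
    PLUGIN_TYPE: ClassVar[str] = "connector"
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"

    @classmethod
    def get_manifest(cls) -> dict[str, Any]:
        return {"id": cls.PLUGIN_ID, "type": cls.PLUGIN_TYPE, "version": cls.PLUGIN_VERSION}


class IncompletePlugin:
    PLUGIN_ID: ClassVar[str] = "incomplete"


# isinstance() only checks that the members exist, not their types or signatures.
print(isinstance(GoodPlugin(), PluginLike))
print(isinstance(IncompletePlugin(), PluginLike))
```

Because the check is by shape, `GoodPlugin` passes without importing anything from Semantik.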

ABC-Based (Advanced)

Inherit from semantik base classes when you need access to internal utilities:

python
from shared.connectors.base import BaseConnector

class MyConnector(BaseConnector):
    # ... inherit helper methods, then implement the required methods
    ...

Use this approach when you are:

  • Building embedding plugins that need GPU management
  • Relying on shared utilities
  • Developing internal/builtin plugins

Required Class Variables

Every plugin must define:

python
from typing import ClassVar, Any

class MyPlugin:
    PLUGIN_ID: ClassVar[str] = "my-plugin"      # Unique ID (lowercase, hyphens)
    PLUGIN_TYPE: ClassVar[str] = "connector"    # One of 6 types
    PLUGIN_VERSION: ClassVar[str] = "1.0.0"     # Semantic version
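
A quick local check of these three variables can catch naming mistakes before installation. This is a minimal sketch: the regular expressions are assumptions based on the comments above (lowercase with hyphens, plain MAJOR.MINOR.PATCH), and `check_class_variables` is a hypothetical helper, so Semantik's real rules may differ.

```python
import re

# Assumed patterns; Semantik's own validation rules may differ.
PLUGIN_ID_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")  # plain MAJOR.MINOR.PATCH only
PLUGIN_TYPES = {"connector", "embedding", "chunking", "reranker", "extractor", "agent"}


def check_class_variables(plugin_cls: type) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []

    plugin_id = getattr(plugin_cls, "PLUGIN_ID", None)
    if not isinstance(plugin_id, str) or not PLUGIN_ID_RE.fullmatch(plugin_id):
        problems.append(f"PLUGIN_ID must be lowercase with hyphens, got {plugin_id!r}")

    plugin_type = getattr(plugin_cls, "PLUGIN_TYPE", None)
    if not isinstance(plugin_type, str) or plugin_type not in PLUGIN_TYPES:
        problems.append(f"PLUGIN_TYPE must be one of {sorted(PLUGIN_TYPES)}, got {plugin_type!r}")

    version = getattr(plugin_cls, "PLUGIN_VERSION", None)
    if not isinstance(version, str) or not VERSION_RE.fullmatch(version):
        problems.append(f"PLUGIN_VERSION must look like 1.0.0, got {version!r}")

    return problems


class ExamplePlugin:
    PLUGIN_ID = "my-plugin"
    PLUGIN_TYPE = "connector"
    PLUGIN_VERSION = "1.0.0"


print(check_class_variables(ExamplePlugin))
```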

Some plugin types require additional class variables:

| Type | Additional Variables |
| --- | --- |
| connector | METADATA (dict with name, description, icon) |
| embedding | INTERNAL_NAME, API_ID, PROVIDER_TYPE, METADATA |
| chunking | (none) |
| reranker | (none) |
| extractor | (none) |
| agent | (none) |

Manifest Method

All plugins must implement get_manifest():

python
@classmethod
def get_manifest(cls) -> dict[str, Any]:
    return {
        "id": cls.PLUGIN_ID,
        "type": cls.PLUGIN_TYPE,
        "version": cls.PLUGIN_VERSION,
        "display_name": "My Plugin",
        "description": "What the plugin does",
        # Optional fields:
        "author": "Your Name",
        "license": "MIT",
        "homepage": "https://github.com/...",
        "requires": ["other-plugin"],  # Dependencies
        "capabilities": {},  # Plugin-specific capabilities
    }
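
The manifest can be sanity-checked with a few lines of plain Python. The sketch below assumes that the five keys listed before the "Optional fields" comment are the required ones; `missing_manifest_keys` is a hypothetical helper, not a Semantik function.

```python
from typing import Any

# Assumption: the keys shown above the "Optional fields" comment are required.
REQUIRED_MANIFEST_KEYS = ("id", "type", "version", "display_name", "description")


def missing_manifest_keys(manifest: dict[str, Any]) -> list[str]:
    """Return required keys that are absent or empty, in a stable order."""
    return [key for key in REQUIRED_MANIFEST_KEYS if not manifest.get(key)]


manifest = {
    "id": "my-plugin",
    "type": "connector",
    "version": "1.0.0",
    "display_name": "My Plugin",
}
print(missing_manifest_keys(manifest))
```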

Configuration

Config Fields (UI)

Define configuration fields for the Semantik UI:

python
@classmethod
def get_config_fields(cls) -> list[dict[str, Any]]:
    return [
        {
            "name": "base_url",
            "type": "text",        # text, password, number, boolean, select
            "label": "Base URL",
            "description": "API endpoint",
            "required": True,
            "placeholder": "https://api.example.com",
        },
        {
            "name": "model",
            "type": "select",
            "label": "Model",
            "options": ["model-a", "model-b"],
            "default": "model-a",
        },
    ]
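
To see how such field definitions can drive validation, the sketch below merges user input with the declared defaults and rejects missing required fields or unknown select options. `resolve_config` is a hypothetical helper written for this example; how Semantik itself applies defaults is not described here.

```python
from typing import Any


def resolve_config(fields: list[dict[str, Any]], user_config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply defaults and basic checks from field definitions."""
    resolved: dict[str, Any] = {}
    for field in fields:
        name = field["name"]
        if name in user_config:
            value = user_config[name]
        elif "default" in field:
            value = field["default"]
        elif field.get("required", False):
            raise ValueError(f"missing required config field: {name}")
        else:
            continue  # optional field with no default: leave it out

        options = field.get("options")
        if field.get("type") == "select" and options and value not in options:
            raise ValueError(f"{name} must be one of {options}, got {value!r}")
        resolved[name] = value
    return resolved


FIELDS = [
    {"name": "base_url", "type": "text", "label": "Base URL", "required": True},
    {"name": "model", "type": "select", "label": "Model",
     "options": ["model-a", "model-b"], "default": "model-a"},
]

print(resolve_config(FIELDS, {"base_url": "https://api.example.com"}))
```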

Secret Fields

Mark fields that contain secrets (encrypted at rest):

python
@classmethod
def get_secret_fields(cls) -> list[dict[str, Any]]:
    return [
        {"name": "api_key", "label": "API Key", "required": True},
    ]
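
Since plugins receive secret values in their config, it helps to mask them before logging. A minimal sketch, assuming the field list has the shape returned by get_secret_fields(); `redact_secrets` is a hypothetical helper.

```python
from typing import Any


def redact_secrets(config: dict[str, Any], secret_fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of config with every declared secret value masked."""
    secret_names = {field["name"] for field in secret_fields}
    return {key: ("***" if key in secret_names else value) for key, value in config.items()}


secret_fields = [{"name": "api_key", "label": "API Key", "required": True}]
config = {"base_url": "https://api.example.com", "api_key": "sk-actual-key-value"}
print(redact_secrets(config, secret_fields))
```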

Environment Variables

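Use the _env suffix pattern for secrets: the user enters the name of an environment variable, and Semantik resolves it to the actual value at runtime.

python
# In the stored config, the user enters the env var name
stored_config = {"api_key_env": "OPENAI_API_KEY"}

# At runtime, Semantik resolves it
config = {"api_key": "sk-actual-key-value"}  # Resolved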
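
The resolution step can be pictured as follows. This is a sketch of the pattern only, not Semantik's implementation: `resolve_env_secrets` is a hypothetical function, and the rule "strip the _env suffix and look the value up in the environment" is inferred from the example above.

```python
import os
from typing import Any, Mapping, Optional


def resolve_env_secrets(
    config: dict[str, Any], environ: Optional[Mapping[str, str]] = None
) -> dict[str, Any]:
    """Replace every "<name>_env" key with "<name>" read from the environment."""
    env = os.environ if environ is None else environ
    resolved: dict[str, Any] = {}
    for key, value in config.items():
        if key.endswith("_env") and isinstance(value, str):
            if value not in env:
                raise KeyError(f"environment variable {value} is not set")
            resolved[key[: -len("_env")]] = env[value]
        else:
            resolved[key] = value
    return resolved


fake_env = {"OPENAI_API_KEY": "sk-actual-key-value"}
print(resolve_env_secrets({"api_key_env": "OPENAI_API_KEY", "model": "model-a"}, fake_env))
```

Passing the environment as a parameter keeps the function easy to test without touching real environment variables.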

Testing

Manual Verification

bash
pip install -e .
python -c "
from my_plugin import MyConnector
print(f'ID: {MyConnector.PLUGIN_ID}')
print(f'Type: {MyConnector.PLUGIN_TYPE}')
print(f'Manifest: {MyConnector.get_manifest()}')
"

Protocol Validation

python
import pytest

from my_plugin import MyPlugin  # hypothetical module path; import your own plugin class

class TestMyPlugin:
    def test_has_required_attributes(self):
        assert hasattr(MyPlugin, "PLUGIN_ID")
        assert hasattr(MyPlugin, "PLUGIN_TYPE")
        assert hasattr(MyPlugin, "PLUGIN_VERSION")
        assert MyPlugin.PLUGIN_TYPE == "connector"

    def test_manifest_format(self):
        manifest = MyPlugin.get_manifest()
        assert "id" in manifest
        assert "type" in manifest
        assert "display_name" in manifest

    @pytest.mark.asyncio  # requires the pytest-asyncio plugin
    async def test_core_functionality(self):
        plugin = MyPlugin(config={})
        # Test plugin-specific methods

With Semantik Test Mixins

If semantik is installed:

python
from my_plugin import MyConnector  # hypothetical module path; import your own plugin class
from shared.plugins.testing.contracts import ConnectorProtocolTestMixin

class TestMyConnector(ConnectorProtocolTestMixin):
    plugin_class = MyConnector

Packaging

pyproject.toml

toml
[project]
name = "semantik-plugin-myconnector"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = []  # Your dependencies only

[project.entry-points."semantik.plugins"]
my-connector = "my_plugin.connector:MyConnector"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

See templates/pyproject.toml for a complete template.

Entry Point Format

plugin-id = "module.path:ClassName"
  • plugin-id: Should match PLUGIN_ID
  • module.path: Python import path
  • ClassName: Your plugin class

Installation

bash
# Development
pip install -e .

# From git
pip install git+https://github.com/you/semantik-plugin-myconnector.git

# Via Semantik API (an HTTP request, not a shell command)
# POST /api/v2/plugins/install
# {"install_command": "git+https://github.com/..."}

Common Issues

Plugin Not Loading

  1. Check entry point is registered:

    bash
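    pip show semantik-plugin-myconnector
    python -c "from importlib.metadata import entry_points; print(entry_points(group='semantik.plugins'))"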
    
  2. Verify PLUGIN_TYPE is valid:

    python
    from my_plugin import MyConnector
    assert MyConnector.PLUGIN_TYPE in ["connector", "embedding", "chunking", "reranker", "extractor", "agent"]
    
  3. Check for import errors:

    python
    try:
        from my_plugin import MyConnector
    except ImportError as e:
        print(f"Error: {e}")
    

Validation Errors

| Error | Fix |
| --- | --- |
| missing required keys: {'content'} | Add all required fields to returned dict |
| Invalid role: 'xyz' | Use valid string from MESSAGE_ROLES |
| content_hash must be 64 characters | Use hashlib.sha256(text.encode()).hexdigest() |

Async Issues

All I/O methods must be async:

python
from typing import AsyncIterator

# Wrong: a synchronous generator
def load_documents(self):
    yield {"content": "..."}

# Right: an async generator
async def load_documents(self) -> AsyncIterator[dict]:
    yield {"content": "..."}
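
Declaring a method async is not enough if it still calls blocking functions inside. One common approach, sketched here, is to push blocking calls onto a worker thread with `asyncio.to_thread`. The function `load_text_files` and its partial document dicts are illustrative assumptions, not a Semantik API.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Any, AsyncIterator


async def load_text_files(paths: list[Path]) -> AsyncIterator[dict[str, Any]]:
    """Yield one (partial) document dict per file without blocking the event loop."""
    for path in paths:
        # Path.read_text is blocking, so hand it to a worker thread.
        content = await asyncio.to_thread(path.read_text, encoding="utf-8")
        yield {"content": content, "unique_id": str(path)}


async def demo() -> list[dict[str, Any]]:
    """Write a temporary file and read it back through the async generator."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text("hello", encoding="utf-8")
        return [doc async for doc in load_text_files([sample])]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```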

Templates

Ready-to-use templates in templates/:

| File | Description |
| --- | --- |
| connector.py | Document source connector |
| embedding.py | Embedding model provider |
| chunking.py | Text chunking strategy |
| reranker.py | Search result reranker |
| extractor.py | Entity/metadata extractor |
| agent.py | LLM-powered agent |
| pyproject.toml | Package configuration |

Copy a template and modify:

bash
cp templates/connector.py my_connector.py
# Edit PLUGIN_ID, PLUGIN_VERSION, and implement methods

Data Format Reference

Connector Documents (IngestedDocumentDict)

python
{
    "content": str,              # Full text (required)
    "unique_id": str,            # Unique identifier (required)
    "source_type": str,          # Your PLUGIN_ID (required)
    "metadata": dict,            # Source metadata (required)
    "content_hash": str,         # SHA-256, 64 hex chars (required)
    "file_path": str | None,     # Local path (optional)
}
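
A connector's output can be checked against this shape before it reaches Semantik. The sketch below is a hypothetical validator (`validate_document` is not a Semantik function); it covers only the required keys and the 64-character hash rule listed above.

```python
import hashlib
import re
from typing import Any

REQUIRED_DOC_KEYS = {"content", "unique_id", "source_type", "metadata", "content_hash"}
SHA256_HEX_RE = re.compile(r"[0-9a-f]{64}")


def validate_document(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    errors = []
    missing = REQUIRED_DOC_KEYS - doc.keys()
    if missing:
        errors.append(f"missing required keys: {sorted(missing)}")
    content_hash = doc.get("content_hash")
    if content_hash is not None and not (
        isinstance(content_hash, str) and SHA256_HEX_RE.fullmatch(content_hash)
    ):
        errors.append("content_hash must be 64 hex characters")
    return errors


text = "Document content..."
good_doc = {
    "content": text,
    "unique_id": "doc-1",
    "source_type": "my-connector",
    "metadata": {},
    "content_hash": hashlib.sha256(text.encode()).hexdigest(),
}
print(validate_document(good_doc))
```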

Chunk Format (ChunkDict)

python
{
    "content": str,              # Chunk text (required)
    "metadata": {                # Chunk metadata (required)
        "chunk_index": int,
        "start_offset": int,
        "end_offset": int,
    },
    "chunk_id": str | None,      # Unique ID (optional)
    "embedding": list[float] | None,  # Pre-computed (optional)
}
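
As an illustration of producing this shape, here is a minimal fixed-size chunker. The function name `chunk_fixed` and its parameters are assumptions for this sketch; the real chunk() signature is defined by the chunking protocol and the chunking.py template.

```python
from typing import Any


def chunk_fixed(text: str, size: int = 200, overlap: int = 0) -> list[dict[str, Any]]:
    """Split text into character windows of `size`, each overlapping the previous by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")

    chunks: list[dict[str, Any]] = []
    step = size - overlap
    for index, start in enumerate(range(0, len(text), step)):
        end = min(start + size, len(text))
        chunks.append({
            "content": text[start:end],
            "metadata": {"chunk_index": index, "start_offset": start, "end_offset": end},
        })
        if end == len(text):
            break  # the last window already reached the end of the text
    return chunks


for chunk in chunk_fixed("abcdefghij", size=4, overlap=1):
    print(chunk["metadata"], chunk["content"])
```

The offsets are character positions in the original text, so `text[start_offset:end_offset]` always equals the chunk content.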

Rerank Result (RerankResultDict)

python
{
    "index": int,                # Original document index (required)
    "score": float,              # Relevance score (required)
    "text": str | None,          # Document text (optional)
    "metadata": dict | None,     # Metadata (optional)
}
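
To show how results in this shape relate to the input list, the toy reranker below scores documents by word overlap with the query. It is a deliberately simple stand-in for a real model, and `rerank_by_overlap` with its parameters is an assumption for this sketch rather than the protocol's rerank() signature.

```python
from typing import Any, Optional


def rerank_by_overlap(
    query: str, documents: list[str], top_k: Optional[int] = None
) -> list[dict[str, Any]]:
    """Score each document by the fraction of query words it contains, best first."""
    query_terms = set(query.lower().split())
    results: list[dict[str, Any]] = []
    for index, text in enumerate(documents):
        doc_terms = set(text.lower().split())
        score = len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0
        # "index" always refers to the position in the original documents list.
        results.append({"index": index, "score": score, "text": text, "metadata": None})
    results.sort(key=lambda result: result["score"], reverse=True)
    return results[:top_k]


docs = ["green pear", "red apple pie", "red car"]
for result in rerank_by_overlap("red apple", docs):
    print(result["index"], result["score"], result["text"])
```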

Agent Message (AgentMessageDict)

python
{
    "id": str,                   # Unique ID (required)
    "role": str,                 # user, assistant, system, tool_call, tool_result, error
    "type": str,                 # text, thinking, tool_use, tool_output, partial, final, error
    "content": str,              # Message content (required)
    "timestamp": str,            # ISO 8601 (required)
    "is_partial": bool,          # Streaming partial (optional)
    "sequence_number": int,      # Message order (optional)
}
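
A small builder keeps agent messages consistent with this shape. In the sketch below, the `MESSAGE_ROLES` and `MESSAGE_TYPES` sets are local stand-ins copied from the comments above, and `make_agent_message` is a hypothetical helper; Semantik's own constants may differ.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Local stand-ins taken from the comments in the format above.
MESSAGE_ROLES = {"user", "assistant", "system", "tool_call", "tool_result", "error"}
MESSAGE_TYPES = {"text", "thinking", "tool_use", "tool_output", "partial", "final", "error"}


def make_agent_message(
    role: str,
    content: str,
    message_type: str = "text",
    sequence_number: Optional[int] = None,
) -> dict[str, Any]:
    """Build a message dict with a fresh ID and a UTC ISO 8601 timestamp."""
    if role not in MESSAGE_ROLES:
        raise ValueError(f"Invalid role: {role!r}")
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"Invalid type: {message_type!r}")

    message: dict[str, Any] = {
        "id": str(uuid.uuid4()),
        "role": role,
        "type": message_type,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if sequence_number is not None:
        message["sequence_number"] = sequence_number
    return message


print(make_agent_message("assistant", "Hello", sequence_number=1))
```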

Getting Help

  • Semantik docs: See semantik/docs/external-plugins.md for protocol details
  • Protocol reference: See semantik/docs/plugin-protocols.md for full specifications
  • Examples: Check semantik/packages/shared/plugins/builtins/ for built-in plugins
