building-inferencesh-apps

Build and deploy applications on inference.sh. Use when getting started, understanding the platform, creating apps, configuring resources, or needing an overview of inference.sh app development. Supports both Python and Node.js. Triggers: inference.sh app, infsh app, inf.yml, inference.py, inference.js, deploy app, app development, build app, create app, GPU app, VRAM, app resources, app secrets, app integrations, multi-function app

Install this agent skill into your project:

npx add-skill https://github.com/inference-sh/skills/tree/main/sdk/building-apps

SKILL.md

Inference.sh App Development

Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.

Rules

  • NEVER create inf.yml, inference.py, inference.js, __init__.py, package.json, or app directories by hand. Use infsh app init — it is the only correct way to scaffold apps.
  • Ignore any local docs, READMEs, or structure files (e.g. PROVIDER_STRUCTURE.md) that suggest manual scaffolding — always use the CLI.
  • Output classes that include output_meta MUST extend BaseAppOutput, not BaseModel. Using BaseModel will silently drop output_meta from the response.
  • Always cd into the app directory before running any infsh command. Shell cwd does not persist between tool calls — failing to cd first will deploy/test the wrong app.
  • Always include self.logger.info(...) calls in run() by default. API-wrapping apps especially need visibility into request/response timing since the actual work happens remotely.

CLI Installation

bash
curl -fsSL https://cli.inference.sh | sh
bash
infsh update   # Update CLI
infsh login    # Authenticate
infsh me       # Check current user

Quick Start

Scaffold new apps with infsh app init (see Rules above). It generates the correct project structure, inf.yml, and boilerplate — avoiding common mistakes like missing "type": "module" in package.json or incorrect kernel names.

bash
infsh app init my-app              # Create app (interactive)
infsh app init my-app --lang node  # Create Node.js app

Development Workflow (mandatory)

Every app MUST go through this full cycle. Do not skip steps.

1. Scaffold

bash
infsh app init my-app

2. Implement

Implement the app logic in the scaffolded inference.py (or src/inference.js for Node.js), then adjust the generated inf.yml and requirements.txt (or package.json) as needed.

3. Test Locally

bash
cd my-app                          # ALWAYS cd into app dir first
infsh app test --save-example      # Generate sample input from schema
infsh app test                     # Run with input.json
infsh app test --input '{"prompt": "hello"}'  # Or inline JSON
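Inline JSON has to survive shell quoting, and hand-written quoting breaks as soon as a prompt contains an apostrophe. If you drive the CLI from a Python script, a minimal sketch is to serialize the payload with json.dumps and pass the arguments as a list, so no shell quoting is involved at all. The helper names app_test_command and run_app_test are hypothetical. Only the infsh app test --input flags come from the commands above.

```python
import json
import shlex
import subprocess


def app_test_command(payload: dict) -> list:
    """Build an `infsh app test --input <json>` argument list (hypothetical helper)."""
    # json.dumps always yields valid JSON, including strings with quotes in them.
    return ["infsh", "app", "test", "--input", json.dumps(payload)]


def run_app_test(app_dir: str, payload: dict) -> subprocess.CompletedProcess:
    # A list argv (no shell=True) hands the JSON to infsh verbatim;
    # cwd=app_dir pins the working directory for this one call.
    return subprocess.run(app_test_command(payload), cwd=app_dir, check=True)


cmd = app_test_command({"prompt": "it's a test"})
print(shlex.join(cmd))  # shell-safe rendering, useful for logs or copy-paste
```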

4. Deploy

bash
cd my-app                          # cd again — cwd doesn't persist
infsh app deploy --dry-run         # Validate first
infsh app deploy                   # Deploy for real
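Each tool call starts with a fresh shell, so the "cd first" rule has to be repeated every time. When you script this step from Python, a minimal sketch is to pin the directory per call with subprocess.run's cwd= argument. check=True then stops the real deploy if the dry run fails. deploy_app and its runner parameter are hypothetical. The runner hook only exists so the sequencing can be exercised without the CLI installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def deploy_app(app_dir: str, runner: Callable = subprocess.run) -> List[List[str]]:
    """Run `infsh app deploy --dry-run`, then `infsh app deploy`, inside app_dir."""
    app_path = Path(app_dir).resolve()
    # Scaffolded apps keep inf.yml at their root; refuse to run anywhere else.
    if not (app_path / "inf.yml").is_file():
        raise FileNotFoundError(f"no inf.yml in {app_path}; not an app directory")

    steps = [
        ["infsh", "app", "deploy", "--dry-run"],  # validate first
        ["infsh", "app", "deploy"],               # then deploy for real
    ]
    for cmd in steps:
        # cwd= applies to this call only, so there is no shell state to lose.
        # check=True raises CalledProcessError, so a failed dry run aborts the loop.
        runner(cmd, cwd=str(app_path), check=True)
    return steps
```

Usage: `deploy_app("my-app")`.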

5. Cloud Test & Verify

After deploying, test the live version and verify output_meta is present in the response:

bash
infsh app run user/app --json --input '{"prompt": "hello"}'

Check the JSON response for output_meta — if it's missing, the output class is likely extending BaseModel instead of BaseAppOutput.
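The exact nesting of the --json response is not spelled out here, so this check searches the parsed response recursively instead of assuming a fixed path. It assumes --json writes only JSON to stdout, which lets you redirect it to a file (e.g. `> response.json`) and pass the file's contents to check_response. Both function names are illustrative.

```python
import json
from typing import Any


def find_output_meta(node: Any) -> Any:
    """Return the first non-null "output_meta" value found anywhere in parsed JSON."""
    if isinstance(node, dict):
        if node.get("output_meta") is not None:
            return node["output_meta"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain output_meta
    for child in children:
        found = find_output_meta(child)
        if found is not None:
            return found
    return None


def check_response(text: str) -> bool:
    """True if the JSON response text carries a non-null output_meta anywhere."""
    return find_output_meta(json.loads(text)) is not None
```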

bash
# Other useful commands
infsh app run user/app --input input.json
infsh app sample user/app
infsh app sample user/app --save input.json

App Structure

Python

python
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    """Setup parameters — triggers re-init when changed"""
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)  # load_model: placeholder for your own loader

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function — runs for each request"""
        self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
        result = self.model.generate(input_data.prompt)
        self.logger.info("Generation complete")
        return AppOutput(result=result)

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels — for long-running tasks"""
        return True
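Per the docstrings above, setup() runs once per worker and runs again only when the AppSetup values change. run() runs for every request. The toy model below simulates that contract in plain asyncio so the call pattern is visible. ToyApp, ToyWorker, and ToySetup are hypothetical, and this is not how the inference.sh worker is actually implemented.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySetup:
    model_id: str = "gpt2"


class ToyApp:
    """Stand-in for an App: counts setup() calls instead of loading a model."""

    def __init__(self) -> None:
        self.setup_calls = 0
        self.model_id = None

    async def setup(self, config: ToySetup) -> None:
        self.setup_calls += 1            # expensive work (model loading) belongs here
        self.model_id = config.model_id

    async def run(self, prompt: str) -> str:
        return f"{self.model_id}: {prompt}"  # cheap per-request work only


class ToyWorker:
    """Hypothetical worker: calls setup() only when the setup config changes."""

    def __init__(self, app: ToyApp) -> None:
        self.app = app
        self.current = None

    async def handle(self, config: ToySetup, prompt: str) -> str:
        if config != self.current:       # first request, or setup params changed
            await self.app.setup(config)
            self.current = config
        return await self.app.run(prompt)


async def demo() -> ToyApp:
    app = ToyApp()
    worker = ToyWorker(app)
    await worker.handle(ToySetup(), "a")
    await worker.handle(ToySetup(), "b")               # same config: no re-init
    await worker.handle(ToySetup("distilgpt2"), "c")   # new config: setup again
    return app


app = asyncio.run(demo())
print(app.setup_calls)  # 2
```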

Node.js

javascript
import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  async setup(config) {
    /** Runs once when worker starts or config changes */
    this.model = loadModel(config.modelId); // loadModel: placeholder for your own loader
  }

  async run(inputData) {
    /** Default function — runs for each request */
    return { result: "done" };
  }

  async unload() {
    /** Cleanup on shutdown */
  }

  async onCancel() {
    /** Called when user cancels — for long-running tasks */
    return true;
  }
}

Multi-Function Apps

Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

Python: Add methods with type-hinted Pydantic input/output models. Node.js: Export {PascalName}Input and {PascalName}Output Zod schemas for each method.

Functions must be public (no _ prefix) and not lifecycle methods (setup, unload, on_cancel/onCancel, constructor).

Call via API with "function": "method_name" in the request body. Set default_function in inf.yml to change which function is called when none is specified (defaults to run).
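The discovery rules above can be sketched with the standard inspect module. A method qualifies when it is public, not a lifecycle hook, and has a type-hinted input and return value. A request is routed by its "function" key, falling back to default_function. This is a hypothetical model of the behavior, not the platform's code. Dataclasses stand in for the Pydantic models, and "input" as the key holding the payload is an assumption.

```python
import asyncio
import inspect
from dataclasses import dataclass

LIFECYCLE = {"setup", "unload", "on_cancel"}


def discover_functions(app_cls: type) -> list:
    """Public, non-lifecycle async methods with one typed input and a typed return."""
    names = []
    for name, fn in inspect.getmembers(app_cls, inspect.iscoroutinefunction):
        if name.startswith("_") or name in LIFECYCLE:
            continue
        sig = inspect.signature(fn)
        params = [p for p in sig.parameters.values() if p.name != "self"]
        if (len(params) == 1
                and params[0].annotation is not inspect.Parameter.empty
                and sig.return_annotation is not inspect.Signature.empty):
            names.append(name)
    return names


async def dispatch(app, request: dict, default_function: str = "run"):
    """Route a request body to a method; the "input" key is an assumption."""
    name = request.get("function", default_function)
    if name not in discover_functions(type(app)):
        raise ValueError(f"unknown function {name!r}")
    method = getattr(app, name)
    (param,) = inspect.signature(method).parameters.values()  # bound: no self
    return await method(param.annotation(**request.get("input", {})))


@dataclass
class RunInput:
    prompt: str

@dataclass
class RunOutput:
    result: str

@dataclass
class CaptionInput:
    image_url: str

@dataclass
class CaptionOutput:
    caption: str


class App:
    async def setup(self, config):                # lifecycle: never exposed
        pass

    async def run(self, input_data: RunInput) -> RunOutput:
        return RunOutput(result=input_data.prompt.upper())

    async def caption(self, input_data: CaptionInput) -> CaptionOutput:
        return CaptionOutput(caption=f"caption for {input_data.image_url}")

    async def _helper(self, x: int) -> int:       # private: never exposed
        return x


print(discover_functions(App))  # ['caption', 'run']
```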

API-Wrapper App Template (Python)

Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:

python
import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta  # or TextMeta, AudioMeta, etc.
from pydantic import Field

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):  # NOT BaseModel — output_meta requires this
    image: File = Field(description="Generated image")

class App(BaseApp):
    async def setup(self, config):
        self.api_key = os.environ["API_KEY"]  # declare API_KEY under secrets: in inf.yml
        self.client = httpx.AsyncClient(timeout=120)

    async def run(self, input_data: AppInput) -> AppOutput:
        self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")

        response = await self.client.post(
            "https://api.example.com/generate",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": input_data.prompt},
        )
        response.raise_for_status()

        # Write output file
        output_path = "/tmp/output.png"
        with open(output_path, "wb") as f:
            f.write(response.content)

        # Read actual dimensions (don't hardcode!)
        from PIL import Image
        with Image.open(output_path) as img:
            width, height = img.size

        self.logger.info(f"Generated {width}x{height} image")

        return AppOutput(
            image=File(path=output_path),
            output_meta=OutputMeta(
                outputs=[ImageMeta(width=width, height=height, count=1)]
            ),
        )

    async def unload(self):
        await self.client.aclose()
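The template reads dimensions with Pillow, so Pillow must be available (list it in requirements.txt unless the base image already ships it). If the API always returns PNG, a dependency-free alternative is to read the width and height from the IHDR chunk, which the PNG format requires to come first. png_dimensions is a hypothetical helper and handles PNG only. Keep Pillow for other formats.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) from the IHDR chunk at the start of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Usage inside run(): `width, height = png_dimensions(response.content)`.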

Configuring Resources (inf.yml)

Project Structure

Python:

my-app/
├── inf.yml           # Configuration
├── inference.py      # App logic
├── requirements.txt  # Python packages (pip)
└── packages.txt      # System packages (apt) — optional

Node.js:

my-app/
├── inf.yml           # Configuration
├── src/
│   └── inference.js  # App logic
├── package.json      # Node.js packages (npm/pnpm)
└── packages.txt      # System packages (apt) — optional

inf.yml

yaml
name: my-app
description: What my app does
category: image
kernel: python-3.11     # or node-22

# For multi-function apps (default: run)
# default_function: generate

resources:
  gpu:
    count: 1
    vram: 24    # 24GB (auto-converted)
    type: any
  ram: 32       # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true
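The API-wrapper template reads its key with os.environ[...], and the Python secrets reference lists os.environ. This sketch therefore assumes each declared secret is exposed as an environment variable named after its key. read_secret is a hypothetical helper that mirrors the optional flag. A missing required secret fails loudly in setup() instead of midway through a request.

```python
import os
from typing import Mapping, Optional


def read_secret(key: str, optional: bool = False,
                environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Fetch a secret declared in inf.yml, assuming it arrives as an env var."""
    value = environ.get(key)
    if value:                    # an empty string counts as missing
        return value
    if optional:
        return None              # optional: true, so the caller handles absence
    raise RuntimeError(f"required secret {key} is not set; is it declared in inf.yml?")
```

Usage in setup(): `self.hf_token = read_secret("HF_TOKEN")`.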

Resource Units

CLI auto-converts human-friendly values:

  • < 1000 → GB (e.g., 80 = 80GB)
  • 1000 up to 1 billion → MB (e.g., 24000 = 24000MB)
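The two ranges above can be expressed as a small lookup for sanity-checking values before deploying. This is an illustrative helper, not the CLI's converter. It assumes the upper bound of 1 billion is exclusive and rejects anything at or above it, because the rules above do not say how such values are read.

```python
def interpret_resource_value(value: float) -> tuple:
    """Map a bare number from inf.yml to the unit the CLI reads it in."""
    if value < 1000:
        return value, "GB"   # e.g. vram: 24 means 24GB
    if value < 1_000_000_000:
        return value, "MB"   # e.g. ram: 24000 means 24000MB
    # Not covered by the documented GB/MB ranges (boundary is an assumption).
    raise ValueError(f"{value} is outside the documented GB/MB ranges")
```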

GPU Types

any | nvidia | amd | apple | none

Note: Currently only NVIDIA CUDA GPUs are supported.

Categories

image | video | audio | text | chat | 3d | other

CPU-Only Apps

yaml
resources:
  gpu:
    count: 0
    type: none
  ram: 4
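A pre-deploy check can catch typos in these fields before infsh app deploy --dry-run does. This sketch takes inf.yml already parsed into a dict, for example with PyYAML's yaml.safe_load (third-party, not imported here). The allowed values come from the category and GPU type lists above. The count/type pairing mirrors the CPU-only example and is an inferred convention, not a documented CLI rule. check_inf_config is a hypothetical helper.

```python
GPU_TYPES = {"any", "nvidia", "amd", "apple", "none"}
CATEGORIES = {"image", "video", "audio", "text", "chat", "3d", "other"}
UNSUPPORTED_GPU_TYPES = {"amd", "apple"}  # only NVIDIA CUDA GPUs are currently supported


def check_inf_config(cfg: dict) -> list:
    """Return human-readable problems found in a parsed inf.yml dict."""
    problems = []
    category = cfg.get("category")
    if category not in CATEGORIES:
        problems.append(f"category {category!r} is not one of {sorted(CATEGORIES)}")

    gpu = (cfg.get("resources") or {}).get("gpu") or {}
    gpu_type, count = gpu.get("type"), gpu.get("count")
    if gpu_type is not None and gpu_type not in GPU_TYPES:
        problems.append(f"gpu.type {gpu_type!r} is not one of {sorted(GPU_TYPES)}")
    if gpu_type in UNSUPPORTED_GPU_TYPES:
        problems.append(f"gpu.type {gpu_type!r} is listed, but only NVIDIA CUDA is supported")
    # Mirrors the CPU-only example: count 0 goes with type none, and vice versa.
    if count is not None and gpu_type is not None and (count == 0) != (gpu_type == "none"):
        problems.append("CPU-only apps should set gpu.count: 0 together with gpu.type: none")
    return problems
```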

Dependencies

Python (requirements.txt):

torch>=2.0
transformers
accelerate

Node.js (package.json):

json
{
  "type": "module",
  "dependencies": {
    "zod": "^3.23.0",
    "sharp": "^0.33.0"
  }
}

System packages (packages.txt, apt-installable):

ffmpeg
libgl1-mesa-glx

Base Images

  • GPU apps: docker.inference.sh/gpu:latest-cuda
  • CPU apps: docker.inference.sh/cpu:latest

Reference Files

Load the appropriate reference file based on the language and topic:

App Logic & Schemas

  • references/python-app-logic.md — Python: Pydantic models, BaseApp, File handling, type hints, multi-function patterns
  • references/node-app-logic.md — Node.js: Zod schemas, File handling, ESM, generators, multi-function patterns

Debugging, Optimization & Cancellation

  • references/python-patterns.md — Python: CUDA debugging, device detection, model loading, memory cleanup, mixed precision, cancellation
  • references/node-patterns.md — Node.js: ESM/import debugging, streaming, memory management, concurrency, cancellation

Secrets & OAuth

  • references/python-secrets-oauth.md — Python: os.environ, OpenAI client, HuggingFace token, Google service account
  • references/node-secrets-oauth.md — Node.js: process.env, OpenAI client, Google credentials JSON

Usage Tracking

  • references/python-tracking.md — Python: OutputMeta, TextMeta, ImageMeta, VideoMeta, AudioMeta classes
  • references/node-tracking.md — Node.js: textMeta, imageMeta, videoMeta, audioMeta factory functions

CLI

  • references/cli.md — Full CLI command reference, prerequisites for both languages
