Agent skill

tokf-filter

This skill should be used when the user asks to "create a filter", "write a tokf filter", "add a filter for <tool>", "how do I filter output", or needs guidance on tokf filter step types, templates, pipes, or placement conventions.

View SKILL.md on GitHub Repository

Stars 147

Forks 14

Install this agent skill to your Project

npx add-skill https://github.com/mpecan/tokf/tree/main/crates/tokf-cli/skills/tokf-filter

SKILL.md

tokf Filter Authoring

You are an expert at writing tokf filter files. tokf is a config-driven CLI that compresses command output before it reaches an LLM context. Filters are TOML files that define how to process a command's output.

When the user asks you to create or modify a filter, follow this guide exactly. Produce valid, idiomatic TOML that matches the schema described below.

Section 1 — What a Filter File Is

A filter file is a TOML file that describes:

Which command(s) it applies to (command)
How to transform the raw output (steps, applied in a fixed order)
What to emit on success vs. failure

Filters live in three places, searched in priority order:

.tokf/filters/ — project-local (repo-level overrides)
~/.config/tokf/filters/ — user-level overrides
Built-in library (embedded in the tokf binary)

First match wins. Use tokf which "cargo test" to see which filter would activate for a given command.

Section 2 — Processing Order

Steps execute in this fixed order — do not rearrange them:

match_output — whole-output substring checks; if matched, short-circuits the entire pipeline and emits immediately
[[replace]] — per-line regex transforms applied to every line, in array order
strip_ansi / trim_lines — per-line cleanup (ANSI stripping, whitespace trimming)
skip / keep — line-level filtering (drop or retain lines by regex)
dedup / dedup_window — collapse duplicate consecutive lines
lua_script — Luau escape hatch; runs after dedup, before JSON/section/parse
[json] — JSON extraction via JSONPath; when configured, replaces section/parse/chunk
[[section]] OR [parse] — structured extraction (these are mutually exclusive; section is a state machine, parse is a declarative grouper). Skipped when [json] is configured.
[[chunk]] — block-based structured extraction with per-block aggregation, grouping, and tree output (runs on raw output, alongside sections). Skipped when [json] is configured.
Exit-code branch — [on_success] or [on_failure] depending on exit code
[fallback] — if neither on_success nor on_failure produced output
strip_empty_lines / collapse_empty_lines — post-processing cleanup on the final output

Within [on_success] and [on_failure], fields are processed as:

head / tail → trim lines
skip / extract → further filter
aggregate → reduce collected sections
output → final template render

Section 3 — Top-Level Fields Reference

Field	Type	Default	Description
`command`	string or array of strings	required	Command pattern(s) to match. Supports `*` wildcard.
`run`	string	(same as command)	Override the actual command executed. Use `{args}` to forward arguments.
`match_output`	array of tables	`[]`	Whole-output checks. Short-circuit on first match.
`[[replace]]`	array of tables	`[]`	Per-line regex replacements, in order.
`skip`	array of strings (regex)	`[]`	Drop lines matching any regex.
`keep`	array of strings (regex)	`[]`	Retain only lines matching any regex. (Inverse of skip.)
`dedup`	bool	`false`	Collapse consecutive identical lines.
`dedup_window`	integer	`0` (off)	Dedup within a sliding window of N lines.
`strip_ansi`	bool	`false`	Strip ANSI escape sequences before skip/keep.
`trim_lines`	bool	`false`	Trim leading/trailing whitespace from each line.
`lua_script`	table	(absent)	Luau escape hatch.
`[json]`	table	(absent)	JSON extraction via `JSONPath`. When configured, replaces `[[section]]`/`[parse]`/`[[chunk]]`.
`[[section]]`	array of tables	`[]`	State-machine section collectors.
`[[chunk]]`	array of tables	`[]`	Block-based structured extraction with per-block aggregation and grouping.
`[parse]`	table	(absent)	Declarative structured parser (branch + group).
`[on_success]`	table	(absent)	Output branch for exit code 0.
`[on_failure]`	table	(absent)	Output branch for non-zero exit.
`[output]`	table	(absent)	Top-level output template (used by `[parse]`).
`[fallback]`	table	(absent)	Fallback when no branch matched.
`strip_empty_lines`	bool	`false`	Remove all blank lines from the final output.
`collapse_empty_lines`	bool	`false`	Collapse consecutive blank lines into one.
`show_history_hint`	bool	`false`	Append a hint line after filtered output pointing to the full output in history.
`[[variant]]`	array of tables	`[]`	Context-aware delegation to specialized child filters.

Section 4 — Step Types

4.1 `match_output` — Whole-Output Short-Circuit

Check the entire raw output for a substring. If matched, emit a fixed string and stop — no further processing.

toml

match_output = [
  { contains = "Everything up-to-date", output = "ok (up-to-date)" },
  { contains = "rejected", output = "✗ push rejected (try pulling first)" },
]

contains: literal substring to search for (case-sensitive)
output: string to emit if matched
{line_containing} template variable: the first line that contains the substring

toml

match_output = [
  { contains = "error", output = "Error on: {line_containing}" },
]

When to use: for well-known one-liner outcomes that make the rest of filtering irrelevant (e.g., "already up to date", "nothing to push", "authentication failed").

4.2 `[[replace]]` — Per-Line Regex Transforms

Applied to every line, in array order, before skip/keep. Use to reformat noisy lines.

toml

[[replace]]
pattern = '^(\S+)\s+\S+\s+(\S+)\s+(\S+)'
output = "{1}: {2} → {3}"

[[replace]]
pattern = '^\s+Compiling (\S+) v(\S+)'
output = "compiling {1}@{2}"

pattern: Rust regex (RE2 syntax, no lookaheads)
output: template with {1}, {2}, … for capture groups; {0} is the full match
If the pattern doesn't match a line, that line passes through unchanged
Invalid patterns are silently skipped at runtime

When to use: when a line contains useful information but in a verbose format — reformat it rather than dropping it.

4.3 `skip` / `keep` — Line Filtering

skip drops lines matching any regex. keep retains only lines matching any regex. They compose:

toml

skip = [
  "^\\s*Compiling ",
  "^\\s*Downloading ",
  "^\\s*$",
]

keep = ["^error", "^warning"]

Both are arrays of regex strings
Applied after [[replace]]
skip is checked first, then keep
A line must pass both: not skipped, and (if keep is non-empty) matching keep

When to use: skip for removing known noise patterns; keep for allow-listing (e.g., keep only lines that start with error or warning).

Also available inside [on_success] and [on_failure] for branch-level filtering.

4.4 `dedup` / `dedup_window` — Deduplication

toml

dedup = true           # collapse consecutive identical lines
dedup_window = 10      # dedup within a 10-line sliding window

dedup = true: removes consecutive duplicate lines (like uniq)
dedup_window = N: deduplicates within a sliding window of N lines (catches near-consecutive repeats)
They are independent; you can use both

When to use: for commands that emit repetitive progress lines (e.g., npm install printing the same package multiple times, spinner frames, repeated warnings).

4.5 `lua_script` — Luau Escape Hatch

For logic that pure TOML cannot express: numeric math, multi-line lookahead, conditional branching.

toml

[lua_script]
lang = "luau"
source = '''
if exit_code == 0 then
    return "passed"
else
    local msg = output:match("Error: (.+)") or "unknown error"
    return "FAILED: " .. msg
end
'''

Or load the script from an external file:

toml

[lua_script]
lang = "luau"
file = "scripts/my-filter.luau"

The file path resolves relative to the current working directory. Exactly one of source or file must be set.

Globals available:

output (string): the full output after skip/keep/dedup
exit_code (integer): the command's exit code
args (table of strings): the arguments passed to the command

Return semantics:

Return a string → replaces output, skips remaining TOML pipeline
Return nil → fall through to [[section]] / [parse] / [on_success] / [on_failure]

Sandbox: io, os, and package are blocked. No filesystem or network access. Standard math/string/table libraries are available.

When to use: only when no TOML step can express the logic. Most filters do not need this. Consider it after exhausting match_output, skip/keep, [[replace]], [[section]], and [parse].

4.6 `[json]` — JSON Extraction via `JSONPath`

For commands that produce JSON output (e.g. kubectl get pods -o json, gh api, docker inspect). Extracts values using JSONPath (RFC 9535) queries and produces template variables and structured collections.

toml

[json]

[[json.extract]]
path = "$.items[*]"
as = "pods"

[[json.extract.fields]]
field = "metadata.name"
as = "name"

[[json.extract.fields]]
field = "status.phase"
as = "phase"

[on_success]
output = "Pods ({pods_count}):\n{pods | each: \"  {name}: {phase}\" | join: \"\\n\"}"

[[json.extract]] fields:

Field	Type	Required	Description
`path`	string	yes	`JSONPath` expression (RFC 9535), e.g. `"$.items[*]"`, `"$.version"`
`as`	string	yes	Variable name to bind the result to
`fields`	array of tables	no	Sub-field extraction for each matched object

[[json.extract.fields]] fields:

Field	Type	Required	Description
`field`	string	yes	Dot-separated path within each object (e.g. `"metadata.name"`, `"containers.0.name"`). Not JSONPath — uses simple dot-notation. Supports numeric array indices.
`as`	string	yes	Variable name for the extracted value

Result mapping:

Single scalar → vars["as_name"] = string_value (no count, no chunk)
Array → ChunkData::Flat collection + {as_name_count} variable
Objects without fields → top-level scalars auto-flattened into chunk items
Objects with fields → named fields extracted per item

Pipeline behavior: when [json] is configured, [[section]], [parse], and [[chunk]] are skipped. JSON replaces line-based structural processing. Extracted vars and chunks flow into [on_success]/[on_failure] template rendering.

Error handling: invalid JSON input → extraction skipped, pipeline falls back to raw output (templates are not rendered). Invalid JSONPath → rule silently skipped, other rules still run. Empty array with fields → emits {as_name_count} = "0".

When to use: when the command produces structured JSON output and you need to extract specific fields. Prefer this over [parse] + skip/keep for JSON-native commands.

4.7 `[[section]]` — State-Machine Section Collector

The most powerful step. Defines a state machine that collects lines into named variables as it scans top-to-bottom.

toml

[[section]]
name = "failures"
enter = "^failures:$"      # regex: start collecting when this matches
exit = "^failures:$"       # regex: stop collecting when this matches (after start)
split_on = "^\\s*$"        # regex: split collected lines into blocks on blank lines
collect_as = "failure_blocks"

[[section]]
name = "summary"
match = "^test result:"    # regex: collect only lines matching this (no enter/exit)
collect_as = "summary_lines"

Fields:

Field	Required	Description
`name`	yes	Identifier for this section (used in error messages)
`enter`	no	Regex to start collecting (state transitions to "inside")
`exit`	no	Regex to stop collecting (state transitions to "outside")
`match`	no	Collect any line matching this regex, without enter/exit state
`split_on`	no	Split collected lines into blocks when this regex matches
`collect_as`	yes	Variable name to bind the result to

Accessing collected variables in templates:

Expression	Type	Description
`{name}`	string	Full collected text joined with newlines
`{name.lines}`	collection	Individual lines as a list
`{name.blocks}`	collection	Blocks split by `split_on`
`{name.count}`	integer	Number of blocks (or lines if no split_on)

When to use: when the output has distinct sections with clear start/end markers — test failure blocks, error sections, file change groups.

4.8 `[[chunk]]` — Block-Based Structured Extraction

Chunks split raw output into repeating structural blocks (e.g., per-crate test suites in a Cargo workspace), extract structured data per-block, and produce named collections for template rendering. Like sections, chunks operate on the raw (unfiltered) command output — skip/keep patterns do not affect chunk processing.

toml

[[chunk]]
split_on = "^\\s*Running "       # regex marking the start of each chunk
include_split_line = true         # include the splitting line in the chunk (default: true)
collect_as = "suites_detail"      # name for the structured collection
group_by = "crate_name"           # merge chunks sharing this field value
children_as = "children"          # preserve original items as nested collection

[chunk.extract]
pattern = 'unittests.+deps/([\w_-]+)-'  # extract a field from the header line
as = "crate_name"
carry_forward = true              # inherit value from previous chunk when pattern doesn't match

[[chunk.body_extract]]
pattern = 'Running\s+(.+?)\s+\('
as = "suite_name"

[[chunk.aggregate]]
pattern = '(\d+) passed'          # aggregates run within each chunk's lines
sum = "passed"

[[chunk.aggregate]]
pattern = '^test result:'
count_as = "suite_count"

Fields:

Field	Type	Required	Description
`split_on`	string (regex)	yes	Regex marking the start of each chunk
`include_split_line`	bool	no	Whether the splitting line is part of the chunk (default: `true`)
`collect_as`	string	yes	Name for the resulting structured collection
`extract`	table	no	Extract a named field from the header line (`pattern` + `as`)
`body_extract`	array of tables	no	Extract fields from body lines (`pattern` + `as`, first match wins)
`aggregate`	array of tables	no	Per-chunk aggregation rules (`pattern` + `sum`/`count_as`)
`group_by`	string	no	Merge chunks sharing the same field value, summing numeric fields
`children_as`	string	no	When set with `group_by`, preserve original items as a nested collection

carry_forward (on extract or body_extract): when a chunk's pattern doesn't match, inherit the value from the most recent chunk that did. Useful when boundary markers (like Running unittests) identify a group, and subsequent chunks should inherit that identity.

Structured collections in templates: each item has named fields accessible in each pipes:

toml

[on_success]
output = """\
{suites_detail | each: "  {crate_name}: {passed} passed ({suite_count} suites)" | join: "\\n"}"""

Tree output with children_as: groups preserve their child items for nested template rendering:

toml

[on_success]
output = """\
{suites_detail | each: "  {crate_name}: {passed} passed\\n{children | each: \"    {suite_name}: {passed} passed\" | join: \"\\n\"}" | join: "\\n"}"""

When to use: when output contains repeating structural blocks with per-block data you want to aggregate and display. Common for workspace build tools (Cargo, Gradle, Nx) where output is organized by sub-project.

4.9 `[parse]` — Declarative Structured Parser

Alternative to [[section]] for commands with table-like output. Declaratively extracts a header field and groups remaining lines.

toml

[parse]
branch = { line = 1, pattern = '## (\S+?)(?:\.\.\.(\S+))?(?:\s+\[(.+)\])?$', output = "{1}" }

[parse.group]
key = { pattern = '^(.{2}) ', output = "{1}" }
labels = { "M " = "modified", "??" = "untracked", "D " = "deleted" }

[output]
format = """
{branch}{tracking_info}
{group_counts}"""
group_counts_format = "  {label}: {count}"
empty = "clean — nothing to commit"

[parse] fields:

Field	Description
`branch`	Extract a single value from a specific line (`line`, `pattern`, `output`)
`[parse.group]`	Group remaining lines by a key pattern

[parse.group] fields:

Field	Description
`key`	`{ pattern, output }` — extract the grouping key from each line
`labels`	Map from raw key string to human-readable label

[output] fields (used with [parse]):

Field	Description
`format`	Template string for the overall output
`group_counts_format`	Template for each group entry: `{label}`, `{count}`
`empty`	String to emit when no lines were grouped

When to use: for commands like git status, docker ps, kubectl get — table-formatted output where you want to extract a header and count/group rows.

4.10 `[on_success]` / `[on_failure]` — Exit Code Branches

These branches run after all top-level steps. They have their own sub-fields:

toml

[on_success]
output = "ok ✓ {2}"          # template; collected variables are available
head = 20                     # keep first N lines
tail = 10                     # keep last N lines
skip = ["^\\s*$"]            # additional line filtering
extract = { pattern = '(\S+)\s*->\s*(\S+)', output = "ok ✓ {2}" }

# Singular form (one rule):
aggregate = { from = "summary_lines", pattern = 'ok\. (\d+) passed', sum = "passed", count_as = "suites" }

# Plural form (multiple rules):
# [[on_success.aggregates]]
# from = "summary_lines"
# pattern = 'ok\. (\d+) passed'
# sum = "passed"
# count_as = "suites"
#
# [[on_success.aggregates]]
# from = "summary_lines"
# pattern = '(\d+) failed'
# sum = "failed"

[on_failure]
tail = 10
output = "FAILED: {summary_lines | join: \"\\n\"}"

Branch sub-fields:

Field	Description
`output`	Template string for the output. Has access to all `[[section]]` and `[[chunk]]` variables. `{output}` = the filtered output text.
`head`	Keep first N lines of filtered output
`tail`	Keep last N lines of filtered output
`skip`	Array of regexes to filter output lines within this branch
`extract`	`{ pattern, output }` — find first match, render template with capture groups
`aggregate`	Reduce collected section lines into numeric summaries (singular form)
`aggregates`	Array of aggregate rules (plural form — use `[[on_success.aggregates]]`)

aggregate / aggregates fields:

Field	Description
`from`	Variable name (a `collect_as` result from `[[section]]`)
`pattern`	Regex with one capture group to extract a number
`sum`	Variable name to bind the sum to
`count_as`	Variable name to bind the count (number of lines matched) to

Both singular aggregate and plural aggregates can be used together — they are merged at runtime.

When to use: Always. Every filter should have at least one of [on_success] or [on_failure]. Use [on_success] to produce a clean summary. Use [on_failure] to show enough context to diagnose the issue.

4.11 `[fallback]` — Last Resort

Emits output when neither [on_success] nor [on_failure] produced anything.

toml

[fallback]
tail = 5

When to use: as a safety net when you have complex branching logic. Ensures tokf never silently swallows output.

4.12 `[[variant]]` — Context-Aware Filter Delegation

Some commands are wrappers around different underlying tools (e.g. npm test may run Jest, Vitest, or Mocha). A parent filter can declare [[variant]] entries that delegate to specialized child filters based on project context.

toml

command = ["npm test", "pnpm test", "yarn test"]

strip_ansi = true
skip = ["^> ", "^\\s*npm (warn|notice|WARN|verbose|info|timing|error|ERR)"]

[on_success]
output = "{output}"

[on_failure]
tail = 20

[[variant]]
name = "vitest"
detect.files = ["vitest.config.ts", "vitest.config.js", "vitest.config.mts"]
filter = "npm/test-vitest"

[[variant]]
name = "jest"
detect.files = ["jest.config.js", "jest.config.ts", "jest.config.json"]
filter = "npm/test-jest"

Fields:

Field	Type	Required	Description
`name`	string	yes	Human-readable identifier for this variant
`detect.files`	array of strings	no	File paths to check in CWD (pre-execution detection)
`detect.output_pattern`	string (regex)	no	Regex to match against command output (post-execution fallback)
`filter`	string	yes	Filter to delegate to (relative path without `.toml`, e.g. `"npm/test-vitest"`)

Two-phase detection:

File detection (before execution) — checks if any listed config files exist in the current directory. First match wins.
Output pattern (after execution) — regex-matches the command output. Used as a fallback when no file was detected.

At least one of detect.files or detect.output_pattern must be set.

Behavior:

When a variant matches, the child filter replaces the parent entirely — no field inheritance or merging
When no variant matches, the parent filter's own fields (skip, on_success, etc.) apply as the fallback
The filter field references another filter by its discovery name (e.g. "npm/test-vitest" maps to filters/npm/test-vitest.toml)

TOML ordering: [[variant]] entries must appear after all top-level fields (skip, [on_success], etc.) because TOML array-of-tables sections capture subsequent keys.

When to use: when a single command pattern maps to different underlying tools that produce fundamentally different output formats. Create a parent filter with a generic fallback, then create specialized child filters for each tool.

Section 5 — Template Pipes

Output templates support pipe chains: {var | pipe | pipe: "arg"}.

Pipe	Input → Output	Description
`lines`	Str → Collection	Split string on newlines into a list
`join: "sep"`	Collection → Str	Join list items with separator string
`each: "tmpl"`	Collection → Collection	Map each item through a sub-template; `{value}` = item, `{index}` = 1-based index. For structured collections (from chunks), all named fields are also available (e.g. `{crate_name}`, `{passed}`).
`keep: "re"`	Collection → Collection	Retain items matching the regex
`where: "re"`	Collection → Collection	Alias for `keep:`
`truncate: N`	Str → Str	Truncate to N characters, appending `…`

Examples:

Filter a multi-line output variable to only error lines:

toml

[on_failure]
output = "{output | lines | keep: \"^error\" | join: \"\\n\"}"

For each collected block, show only > (pointer) and E (assertion) lines:

toml

[on_failure]
output = "{failure_blocks | each: \"{value | lines | keep: \\\"^[>E] \\\"}\" | join: \"\\n\"}"

Truncate long lines and number them:

toml

[on_failure]
output = "{summary_lines | each: \"{index}. {value | truncate: 120}\" | join: \"\\n\"}"

Section 6 — Naming & Placement Conventions

File naming:

filters/<tool>/<subcommand>.toml for two-word commands: filters/git/push.toml for git push
filters/<tool>.toml for single-word commands: filters/pytest.toml for pytest
For wildcards: filters/npm/run.toml with command = "npm run *" in the TOML
Lowercase filenames only, no spaces

Placement:

Location	Purpose
`.tokf/filters/`	Project-local override (committed to the repo)
`~/.config/tokf/filters/`	User-level override (your personal filters)
`filters/` in the tokf source repo	Built-in library (requires a tokf release)

When creating a filter for a user's project, default to .tokf/filters/ unless they specify otherwise.

Command field:

Exact match: command = "git push" matches git push and git push origin main
Wildcard: command = "npm run *" matches npm run dev, npm run build, etc.
Array: command = ["cargo test", "cargo t"] matches either form

Section 7 — Workflow for Creating a New Filter

Follow these steps when asked to create a filter:

Step 1: Understand the command's output

Ask the user to provide (or capture) example output from the command. If they don't have it, generate a plausible example based on the tool's known output format. Look for:

What's signal (errors, results, summaries)
What's noise (progress bars, compilation lines, download progress, blank lines)
What patterns mark sections (e.g., "failures:", "test result:")

Step 2: Choose the right complexity level

Level	When to use	Steps to use
Level 1 (simple)	Command produces one-liner outcomes	`match_output`, `skip`, `extract`
Level 2 (structured)	Table-like output needing grouping	`[parse]` + `[output]`
Level 2J (JSON)	Command produces JSON output	`[json]` + `on_success`/`on_failure` templates
Level 3 (stateful)	Multi-section output with nested structure	`[[section]]` + `aggregate` + pipes
Level 4 (chunked)	Repeating blocks with per-block aggregation (workspaces)	`[[chunk]]` + `[[section]]` + `aggregates` + tree templates

Start at the lowest level that handles the use case. Don't reach for [[section]] when skip + extract suffices.

Step 3: Draft the filter

Set command to match the command pattern
Add match_output for well-known short-circuit cases (empty output, auth failure, "already done")
Add skip to drop noise lines (progress, compile output, blank lines)
Add [[replace]] to reformat noisy-but-useful lines
Add [[section]] or [parse] if you need structured extraction
Write [on_success] with the desired output format
Write [on_failure] with enough context to diagnose (tail = 20 is a safe default)
Add [fallback] with tail = 5 as a safety net for complex filters

Step 4: Write test cases and verify

Create a <stem>_test/ directory adjacent to the filter TOML and add at least one test case per meaningful outcome (success, failure, edge cases):

filters/mytool/
  mysubcmd.toml       ← filter config
  mysubcmd_test/      ← test suite
    success.toml
    failure.toml

Each test case is a TOML file:

toml

name = "success shows one clean line"
fixture = "tests/fixtures/mytool_success.txt"  # path relative to this file, then CWD
exit_code = 0

[[expect]]
equals = "ok ✓"

[[expect]]
not_contains = "noise"

Or with an inline fixture (no file needed):

toml

name = "known error message"
inline = "Error: connection refused\n"
exit_code = 1

[[expect]]
contains = "connection refused"

Run the suite:

tokf verify mytool/mysubcmd   # run one suite
tokf verify                   # run all suites

For quick one-off testing without creating test files:

tokf apply filters/mytool/mysubcmd.toml tests/fixtures/mytool_output.txt --exit-code 0

Step 5: Place and name the file correctly

Two-word command: .tokf/filters/mytool/mysubcmd.toml
Single-word command: .tokf/filters/mytool.toml
Wildcard command: .tokf/filters/mytool/run.toml with command = "mytool run *"

Section 8 — Three Annotated Examples

Example 1: `git push` (Level 1 — match_output + extract)

Goal: 15 lines of push noise → "ok ✓ main" (or failure message).

toml

# filters/git/push.toml — Level 1
# Raw output: 15 lines of object counting, compression, "remote:" lines
# Filtered (success): "ok ✓ main"
# Filtered (up-to-date): "ok (up-to-date)"
# Filtered (rejected): "✗ push rejected (try pulling first)"

command = "git push"

# Check full output for well-known outcomes before any processing
match_output = [
  { contains = "Everything up-to-date", output = "ok (up-to-date)" },
  { contains = "rejected", output = "✗ push rejected (try pulling first)" },
]

[on_success]
# Drop all the noise lines
skip = [
  "^Enumerating objects:",
  "^Counting objects:",
  "^Delta compression",
  "^Compressing objects:",
  "^Writing objects:",
  "^Total \\d+",
  "^remote:",
  "^To ",
]
# Extract the branch name from the ref update line: "abc1234..def5678  main -> main"
extract = { pattern = '(\S+)\s*->\s*(\S+)', output = "ok ✓ {2}" }

[on_failure]
tail = 10

Key decisions:

match_output handles the two most common "instant" outcomes
extract captures the branch name from the ref update line
tail = 10 on failure gives enough context without overwhelming

Example 2: `git status` (Level 2 — parse + group)

Goal: 30+ lines of verbose status → branch name + grouped file counts.

toml

# filters/git/status.toml — Level 2
# Raw output: 30+ lines with hints, file paths, status codes
# Filtered: "main [ahead 2]\n  modified: 3\n  untracked: 2"

command = "git status"

# Override: use porcelain format for reliable machine parsing
run = "git status --porcelain -b"

match_output = [
  { contains = "not a git repository", output = "Not a git repository" },
]

[parse]
# First line: "## main...origin/main [ahead 2]"
# Extract: branch name, upstream, ahead/behind info
branch = { line = 1, pattern = '## (\S+?)(?:\.\.\.(\S+))?(?:\s+\[(.+)\])?$', output = "{1}" }

[parse.group]
# Group remaining lines by their two-character status code
key = { pattern = '^(.{2}) ', output = "{1}" }
labels = {
  "M " = "modified",
  " M" = "modified (unstaged)",
  "MM" = "modified (staged+unstaged)",
  "A " = "added",
  "??" = "untracked",
  "D " = "deleted",
  " D" = "deleted (unstaged)",
  "R " = "renamed",
  "UU" = "conflict",
  "AM" = "added+modified"
}

[output]
format = """
{branch}{tracking_info}
{group_counts}"""
group_counts_format = "  {label}: {count}"
empty = "clean — nothing to commit"

Key decisions:

run overrides to porcelain format — machine-readable is easier to parse
[parse] extracts the branch header line declaratively
[parse.group] groups by status code without needing [[section]]
[output] uses built-in {group_counts} variable populated by the parser

Example 3: `cargo test` (Level 4 — section + chunk + aggregates + tree)

Goal: 200+ lines with compile noise, per-test "ok" lines, failure blocks → per-crate tree summary on pass, structured failure report on fail.

toml

# filters/cargo/test.toml — Level 4
# Raw output: 200+ lines
# Filtered (pass): "✓ cargo test: 1279 passed, 0 failed, 119 ignored (42 suites)"
#                   with per-crate tree breakdown showing individual test suites
# Filtered (fail): failure details + summary

command = "cargo test"
strip_ansi = true

# Drop all the noise
skip = [
  "^\\s*Compiling ",
  "^\\s*Downloading ",
  "^\\s*Downloaded ",
  "^\\s*Finished ",
  "^\\s*Locking ",
  "^running \\d+ tests?$",
  "^test .+ \\.\\.\\. ok$",   # individual passing tests
  "^\\s*$",
  "^\\s*Doc-tests ",
]

# State machine: collect the "failures:" section into blocks split by blank lines
[[section]]
name = "failures"
enter = "^failures:$"
exit = "^failures:$"
split_on = "^\\s*$"
collect_as = "failure_blocks"

# Collect "test result: ok/FAILED" summary lines (one per test suite)
[[section]]
name = "summary"
match = "^test result:"
collect_as = "summary_lines"

# Chunk processing: per-crate breakdown from "Running" headers.
# "unittests" lines define crate boundaries; integration test suites
# inherit the crate name via carry_forward.
[[chunk]]
split_on = "^\\s*Running "
include_split_line = true
collect_as = "suites_detail"
group_by = "crate_name"
children_as = "children"

[chunk.extract]
pattern = 'unittests.+deps/([\w_-]+)-'
as = "crate_name"
carry_forward = true

[[chunk.body_extract]]
pattern = 'Running\s+(.+?)\s+\('
as = "suite_name"

[[chunk.aggregate]]
pattern = '(\d+) passed'
sum = "passed"

[[chunk.aggregate]]
pattern = '(\d+) failed'
sum = "failed"

[[chunk.aggregate]]
pattern = '(\d+) ignored'
sum = "ignored"

[[chunk.aggregate]]
pattern = '^test result:'
count_as = "suite_count"

# Success: aggregate summaries + per-crate tree breakdown
[on_success]
output = "✓ cargo test: {passed} passed, {failed} failed, {ignored} ignored ({suites} suites)\n{suites_detail | each: \"  {crate_name}: {passed} passed ({suite_count} suites)\\n{children | each: \\\"    {suite_name}: {passed} passed\\\" | join: \\\"\\\\n\\\"}\" | join: \"\\n\"}"

[[on_success.aggregates]]
from = "summary_lines"
pattern = 'ok\. (\d+) passed'
sum = "passed"
count_as = "suites"

[[on_success.aggregates]]
from = "summary_lines"
pattern = '(\d+) failed'
sum = "failed"

[[on_success.aggregates]]
from = "summary_lines"
pattern = '(\d+) ignored'
sum = "ignored"

# Failure: show failure details + summary
[on_failure]
output = "✗ cargo test: {passed} passed, {failed} failed ({suites} suites)\n\nFAILURES ({failure_blocks.count}):\n{failure_blocks | each: \"\\n── {index}. ──\\n{value}\" | join: \"\\n\"}\n\n{summary_lines | join: \"\\n\"}"

[[on_failure.aggregates]]
from = "summary_lines"
pattern = '(\d+) passed'
sum = "passed"
count_as = "suites"

[[on_failure.aggregates]]
from = "summary_lines"
pattern = '(\d+) failed'
sum = "failed"

[fallback]
tail = 5

Key decisions:

skip removes all per-test "ok" lines — only failures and summaries remain
[[section]] collectors handle failure blocks and summary lines
[[chunk]] splits on Running headers, extracts crate names from unittests lines
carry_forward = true makes integration test suites inherit the crate name from the preceding unit test suite
children_as = "children" preserves per-suite detail within each crate group
[[on_success.aggregates]] (plural) sums passed/failed/ignored across all suite summary lines
Nested each pipes produce tree output: crate → suites
[fallback] catches edge cases (compile errors with no test output)

Section 9 — Writing Test Cases

Every filter in the standard library has a <stem>_test/ directory with declarative test cases. When writing or modifying a filter, write test cases alongside it.

Test case format

toml

name = "success output is a single clean line"   # required, human-readable
fixture = "tests/fixtures/cargo_build_success.txt"  # path to raw output file
# inline = "some raw output\nline two"            # alternative: inline fixture
exit_code = 0                                    # optional, default 0
args = []                                        # optional, forwarded to filter

[[expect]]
equals = "ok ✓"          # exact match

[[expect]]
not_contains = "Compiling"  # noise must be gone

Assertion types

Field	Description
`equals`	Output exactly equals this string
`contains`	Output contains this substring
`not_contains`	Output does not contain this substring
`starts_with`	Output starts with this string
`ends_with`	Output ends with this string
`line_count`	Output has exactly N non-empty lines
`matches`	Output matches this regex
`not_matches`	Output does not match this regex

Every [[expect]] entry checks one assertion. A test case with multiple [[expect]] entries must pass all of them. A test case with no [[expect]] entries is an error.

What to test

For every filter, write at least:

Success case: the happy path produces the expected one-liner or summary
Failure case: a failing exit code produces enough context to diagnose
Edge cases: cover each match_output branch (e.g., "up-to-date", "rejected")

Directory convention

filters/
  git/
    push.toml          ← filter config
    push_test/         ← test suite (identified by _test suffix)
      success.toml
      up_to_date.toml
      rejected.toml
      failure.toml

The _test suffix makes suite directories immediately identifiable in file listings and distinguishes them from filter category directories.

Section 10 — Common Mistakes to Avoid

Don't use keep when skip is enough. keep is an allow-list — it drops everything that doesn't match. Use it only when you want to radically filter to a specific type of line.
Escape backslashes in TOML strings. In regular strings, \\d means literal \d in the regex. In TOML raw strings ('...'), backslashes are literal. Use raw strings for complex patterns.
match_output is a short-circuit. If it matches, nothing else runs. Don't put it at the end expecting it to be a fallback — it runs first.
[[section]], [parse], and [json] are mutually exclusive in practice. [json] replaces [[section]]/[parse]/[[chunk]] — when [json] is configured, those line-based steps are skipped. Use [json] for JSON output, [[section]]/[parse]/[[chunk]] for line-based output.
{output} in branch templates is the filtered output text (after skip/keep/replace/dedup), not the raw command output.
Pipe chains need careful quoting. When nesting templates inside each:, escape inner quotes: {each: "{value | lines | keep: \\\"^error\\\"}"}.
Don't skip the [fallback]. Complex filters with [[section]] can produce empty output if sections don't match. Always add [fallback] tail = 5 as a safety net.
Test with realistic fixture data. A filter that works on a trimmed example may miss edge cases. Use real command output saved to a .txt fixture file.

Section 11 — Generic Commands (When a Filter Isn't Needed)

Before writing a full filter, consider whether a generic command would be sufficient. tokf provides three built-in subcommands that work on any command without a TOML filter:

Command	Use case	Example
`tokf err <cmd>`	Extract errors/warnings	`tokf err mix compile`
`tokf test <cmd>`	Extract test failures	`tokf test ctest --output-on-failure`
`tokf summary <cmd>`	Heuristic summary	`tokf summary terraform plan`

When to use generic commands vs. writing a filter

Use generic commands when the tool's output follows common patterns (error lines, test results, repetitive logs) and you don't need precise control over the output format.
Write a filter when you need structured extraction, specific sections, custom templates, or the generic output isn't useful enough.

Routing generic commands through rewrites

To make generic commands trigger automatically via the hook, add rewrite rules to .tokf/rewrites.toml:

toml

# Build tools without dedicated filters
[[rewrite]]
match = "^mix compile"
replace = "tokf err {0}"

# Test runners without dedicated filters
[[rewrite]]
match = "^mix test"
replace = "tokf test {0}"

# Long output without dedicated filters
[[rewrite]]
match = "^terraform plan"
replace = "tokf summary {0}"

Important: Only add rewrite rules for commands that don't already have a filter. Check with tokf which "<command>" first. Commands with dedicated filters produce better output through tokf run.

Maintainer

mpecan Core maintainer

Source details

Full Name: mpecan/tokf
Branch: main
Path in repo: crates/tokf-cli/skills/tokf-filter
License: MIT License
Topics: claude-code cli developer-tools llm token-optimization ai-tools rust command-line homebrew context-window toml output-filter

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

mpecan/tokf

tokf-discover

Find missed token savings in Claude Code sessions and create filters for unfiltered commands

147 14

Explore

mpecan/tokf

tokf-discover

Find missed token savings by scanning AI coding session files for commands that ran without tokf filtering.

147 14

Explore

mpecan/tokf

tokf-run

Compress verbose CLI output with tokf before returning results. Activates for git, cargo, npm, docker, go, gradle, kubectl, and other supported commands.

147 14

Explore

davila7/claude-code-templates

verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

23,776 2,298

Explore

davila7/claude-code-templates

openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

23,776 2,298

Explore

davila7/claude-code-templates

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.

23,776 2,298

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

tokf Filter Authoring

Section 1 — What a Filter File Is

Section 2 — Processing Order

Section 3 — Top-Level Fields Reference

Section 4 — Step Types

4.1 match_output — Whole-Output Short-Circuit

4.2 [[replace]] — Per-Line Regex Transforms

4.3 skip / keep — Line Filtering

4.4 dedup / dedup_window — Deduplication

4.5 lua_script — Luau Escape Hatch

4.6 [json] — JSON Extraction via JSONPath

4.7 [[section]] — State-Machine Section Collector

4.8 [[chunk]] — Block-Based Structured Extraction

4.9 [parse] — Declarative Structured Parser

4.10 [on_success] / [on_failure] — Exit Code Branches

4.11 [fallback] — Last Resort

4.12 [[variant]] — Context-Aware Filter Delegation

Section 5 — Template Pipes

Section 6 — Naming & Placement Conventions

Section 7 — Workflow for Creating a New Filter

Step 1: Understand the command's output

Step 2: Choose the right complexity level

Step 3: Draft the filter

Step 4: Write test cases and verify

Step 5: Place and name the file correctly

Section 8 — Three Annotated Examples

Example 1: git push (Level 1 — match_output + extract)

Example 2: git status (Level 2 — parse + group)

Example 3: cargo test (Level 4 — section + chunk + aggregates + tree)

Section 9 — Writing Test Cases

Test case format

Assertion types

What to test

Directory convention

Section 10 — Common Mistakes to Avoid

Section 11 — Generic Commands (When a Filter Isn't Needed)

When to use generic commands vs. writing a filter

Routing generic commands through rewrites

Recommended Agent Skills

tokf-discover

tokf-discover

tokf-run

verl-rl-training

openrlhf-training

gguf-quantization

4.1 `match_output` — Whole-Output Short-Circuit

4.2 `[[replace]]` — Per-Line Regex Transforms

4.3 `skip` / `keep` — Line Filtering

4.4 `dedup` / `dedup_window` — Deduplication

4.5 `lua_script` — Luau Escape Hatch

4.6 `[json]` — JSON Extraction via `JSONPath`

4.7 `[[section]]` — State-Machine Section Collector

4.8 `[[chunk]]` — Block-Based Structured Extraction

4.9 `[parse]` — Declarative Structured Parser

4.10 `[on_success]` / `[on_failure]` — Exit Code Branches

4.11 `[fallback]` — Last Resort

4.12 `[[variant]]` — Context-Aware Filter Delegation

Example 1: `git push` (Level 1 — match_output + extract)

Example 2: `git status` (Level 2 — parse + group)

Example 3: `cargo test` (Level 4 — section + chunk + aggregates + tree)