Agent skill

channel-name-parsing

Multi-format channel name parsing for KINTSUGI CHANNELNAMES.txt files


Install this agent skill in your project:

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/development/channel-name-parsing-smith6jt-cop-skills-registry


Channel Name Parsing - Research Notes

Experiment Overview

| Item | Details |
|------|---------|
| Date | 2024-12-15 |
| Goal | Parse channel names from various CHANNELNAMES.txt formats |
| Environment | KINTSUGI pipeline, Python 3.10+ |
| Status | Success |

Context

Different microscopy systems and users produce CHANNELNAMES.txt files in various formats. KINTSUGI needs to parse channel/marker names to label output files correctly. The parsing must auto-detect the format and handle multiple conventions.

Supported Formats

Format 1: Simple List (One Channel Per Line)

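The most common format, produced by CODEX systems. Each line holds one channel name, with 4 channels per cycle by default (configurable via channels_per_cycle). Each cycle starts with a numbered DAPI marker (DAPI-01, DAPI-02, etc.), and the cycle number is extracted from that marker name.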

DAPI-01
Blank
Blank
Blank
DAPI-02
CD31
CD8
CD45
DAPI-03
CD20
Ki67
CD3e
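The grouping rule for this format can be sketched on an in-memory list, without any file handling. This is a simplified illustration rather than KINTSUGI code: the helper name group_simple_list is hypothetical, and it assumes every cycle begins with a numbered DAPI marker.

```python
import re

def group_simple_list(lines, channels_per_cycle=4):
    """Group a one-name-per-line list into {cycle: [names]} (hypothetical helper)."""
    cycles = {}
    current = None
    for name in lines:
        match = re.match(r"DAPI[-_]?(\d+)", name, re.IGNORECASE)
        if match:
            # A numbered DAPI marker opens a new cycle
            current = int(match.group(1))
            cycles[current] = [name]
        elif current is not None and len(cycles[current]) < channels_per_cycle:
            # Fill the open cycle up to channels_per_cycle names
            cycles[current].append(name)
        # Names before the first DAPI marker, or past the cycle size, are dropped
    return cycles

sample = ["DAPI-01", "Blank", "Blank", "Blank",
          "DAPI-02", "CD31", "CD8", "CD45",
          "DAPI-03", "CD20", "Ki67", "CD3e"]
print(group_simple_list(sample))
```

Note that the DAPI entries keep their numeric suffix (DAPI-01), whereas the manual fallback in the Usage section uses plain DAPI, so code that looks markers up by name may need to allow for both.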

Format 2: Cycle-Prefixed with Colon

1: DAPI, Blank, Blank, Blank
2: DAPI, CD31, CD8, CD45
3: DAPI, CD20, Ki67, CD3e

Format 3: Tab-Separated

1	DAPI	Blank	Blank	Blank
2	DAPI	CD31	CD8	CD45
3	DAPI	CD20	Ki67	CD3e

Format 4: CSV (Comma-Separated)

1,DAPI,Blank,Blank,Blank
2,DAPI,CD31,CD8,CD45
3,DAPI,CD20,Ki67,CD3e
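Formats 2 to 4 differ only in the separator between the cycle number and the names. A minimal per-line parser, assuming the same separator priority as the loader in the Verified Workflow (colon, then tab, then comma), might look like this; parse_prefixed_line is a hypothetical name:

```python
def parse_prefixed_line(line):
    """Return (cycle, names) for one cycle-prefixed line (hypothetical helper)."""
    if ":" in line:
        # "2: DAPI, CD31, CD8, CD45" -> names after the colon, comma-separated
        cycle_str, rest = line.split(":", 1)
        names = rest.split(",")
    elif "\t" in line:
        # "2\tDAPI\tCD31\tCD8\tCD45"
        cycle_str, *names = line.split("\t")
    else:
        # "2,DAPI,CD31,CD8,CD45"
        cycle_str, *names = line.split(",")
    # int() raises ValueError if the prefix is not a number
    return int(cycle_str.strip()), [n.strip() for n in names]

for line in ["2: DAPI, CD31, CD8, CD45",
             "2\tDAPI\tCD31\tCD8\tCD45",
             "2,DAPI,CD31,CD8,CD45"]:
    print(parse_prefixed_line(line))
```

All three lines yield the same cycle and names. A line without a numeric prefix raises ValueError, which the loader catches and skips.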

Verified Workflow

Complete Parsing Function

```python
import re
from pathlib import Path

def load_channel_names(meta_dir, filename="CHANNELNAMES.txt", channels_per_cycle=4):
    """
    Load channel names from any of the supported CHANNELNAMES.txt formats.

    channels_per_cycle only applies to the simple list format.

    Returns: dict {cycle_number: [channel_names]}, or None if the file is
    missing or no cycles could be parsed
    """
    channel_file = Path(meta_dir) / filename

    # Try alternative filenames
    if not channel_file.exists():
        alt_names = ["CHANNELNAMES.txt", "channelnames.txt", "channel_names.txt",
                     "channel_names.csv", "channels.txt", "markers.txt"]
        for alt_name in alt_names:
            alt_file = Path(meta_dir) / alt_name
            if alt_file.exists():
                channel_file = alt_file
                break
        else:
            return None

    # Read non-empty, non-comment lines ("utf-8-sig" drops a leading BOM if present)
    lines = []
    with open(channel_file, 'r', encoding='utf-8-sig') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                lines.append(line)

    if not lines:
        return None

    channel_dict = {}
    first_line = lines[0]

    # Detect format from first line
    if ':' in first_line or '\t' in first_line or \
       (first_line.split(',')[0].strip().isdigit() and len(first_line.split(',')) > 2):
        # Cycle-prefixed format: colon, tab, or comma separator
        for line in lines:
            try:
                if ':' in line:
                    cycle_str, names_str = line.split(':', 1)
                    cycle = int(cycle_str.strip())
                    names = [n.strip() for n in names_str.split(',')]
                elif '\t' in line:
                    parts = line.split('\t')
                    cycle = int(parts[0].strip())
                    names = [n.strip() for n in parts[1:]]
                else:
                    parts = line.split(',')
                    cycle = int(parts[0].strip())
                    names = [n.strip() for n in parts[1:]]
                channel_dict[cycle] = names
            except (ValueError, IndexError):
                # Skip lines without a numeric cycle prefix
                continue
    else:
        # Simple list format: a numbered DAPI marker (DAPI-01, ...) starts each cycle
        current_cycle = 0
        cycle_channels = []

        for line in lines:
            dapi_match = re.match(r'DAPI[-_]?(\d+)', line, re.IGNORECASE)

            if dapi_match:
                # Save previous cycle
                if cycle_channels and current_cycle > 0:
                    channel_dict[current_cycle] = cycle_channels
                # Start new cycle
                current_cycle = int(dapi_match.group(1))
                cycle_channels = [line]
            elif current_cycle > 0 and len(cycle_channels) < channels_per_cycle:
                # Names before the first DAPI marker, or beyond channels_per_cycle
                # within a cycle, are ignored instead of overwriting the cycle
                cycle_channels.append(line)

        # Save final cycle
        if cycle_channels and current_cycle > 0:
            channel_dict[current_cycle] = cycle_channels

    # An empty result means nothing was parsed; report it like a missing file
    return channel_dict or None
```
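The alternative-filename loop tries a fixed list of spellings, so on a case-sensitive filesystem a file named, say, ChannelNames.txt would not be found. One option, an assumption rather than part of the verified workflow, is to compare lowercased names against the directory listing. find_channel_file and CANDIDATES are hypothetical names:

```python
import tempfile
from pathlib import Path

# Hypothetical candidate list, lowercased, in priority order
CANDIDATES = ("channelnames.txt", "channel_names.txt", "channel_names.csv",
              "channels.txt", "markers.txt")

def find_channel_file(meta_dir, candidates=CANDIDATES):
    """Return the first file in meta_dir whose lowercased name is a candidate, else None."""
    meta_dir = Path(meta_dir)
    if not meta_dir.is_dir():
        return None
    # Map lowercased names to real paths; if several files differ only in case,
    # which one wins depends on directory listing order
    by_lower = {p.name.lower(): p for p in meta_dir.iterdir() if p.is_file()}
    for name in candidates:
        if name in by_lower:
            return by_lower[name]
    return None

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ChannelNames.txt").write_text("DAPI-01\n")
    print(find_channel_file(tmp))  # finds ChannelNames.txt despite the mixed case
```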

Usage

```python
meta_dir = project.paths.meta  # or Path("/path/to/meta")
channel_name_dict = load_channel_names(meta_dir)

if not channel_name_dict:
    # Fall back to a manual definition if the file is missing or nothing was parsed
    channel_name_dict = {
        1: ["DAPI", "Blank1a", "Blank1b", "Blank1c"],
        2: ["DAPI", "CD31", "CD8", "CD45"],
        3: ["DAPI", "CD20", "Ki67", "CD3e"],
    }

# Third channel of cycle 2 (list index 2); a missing cycle yields an empty name
marker = channel_name_dict.get(2, [''] * 4)[2]  # "CD8"
```
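Since the parsed names are used to label output files, it can help to flatten the dictionary into (cycle, channel, marker) records in acquisition order. The sketch below assumes 1-based channel numbers; iter_channel_labels and the filename pattern are illustrative only, not KINTSUGI's confirmed naming convention.

```python
def iter_channel_labels(channel_name_dict):
    """Yield (cycle, channel, marker) in cycle order, channels numbered from 1."""
    for cycle in sorted(channel_name_dict):
        for channel, marker in enumerate(channel_name_dict[cycle], start=1):
            yield cycle, channel, marker

channel_name_dict = {
    1: ["DAPI", "Blank1a", "Blank1b", "Blank1c"],
    2: ["DAPI", "CD31", "CD8", "CD45"],
}
for cycle, channel, marker in iter_channel_labels(channel_name_dict):
    # Hypothetical naming pattern, for illustration only
    print(f"cyc{cycle:03d}_ch{channel}_{marker}.tif")
```

Sorting the keys keeps output order stable even if the file listed cycles out of order. Marker names containing spaces or slashes would need sanitizing before use in filenames.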

Failed Attempts (Critical)

| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| Only supporting cycle-prefixed format | The simple list format is common in CODEX systems | Auto-detect the format from the first line |
| Hardcoding 4 channels per cycle | Some systems use a different number of channels per cycle | Make channels_per_cycle a parameter |
| Requiring an exact "DAPI" match | Some files use "DAPI-01", "DAPI-02" with a cycle number | Use a regex to extract the cycle from the DAPI marker |
| Case-sensitive matching | "dapi-01" and "DAPI-01" are both valid | Use the re.IGNORECASE flag |

Final Parameters

Format Detection Heuristic

```python
# Check first line for format indicators
# (lines = the non-empty, non-comment lines read from the file)
first_line = lines[0]

is_cycle_prefixed = (
    ':' in first_line or           # "1: DAPI, Blank..."
    '\t' in first_line or          # "1\tDAPI\tBlank..."
    (first_line.split(',')[0].strip().isdigit() and
     len(first_line.split(',')) > 2)  # "1,DAPI,Blank..."
)
```
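The heuristic can be wrapped in a small classifier to check it against the sample first lines from Supported Formats. The function name detect_format and its return labels are hypothetical; the checks mirror the expression above.

```python
def detect_format(first_line):
    """Classify a CHANNELNAMES.txt first line (hypothetical labels)."""
    if ":" in first_line:
        return "colon"
    if "\t" in first_line:
        return "tab"
    parts = first_line.split(",")
    # CSV needs a numeric first field and at least two names after it
    if parts[0].strip().isdigit() and len(parts) > 2:
        return "csv"
    return "simple"

for line in ["DAPI-01",
             "1: DAPI, Blank, Blank, Blank",
             "1\tDAPI\tBlank\tBlank\tBlank",
             "1,DAPI,Blank,Blank,Blank"]:
    print(detect_format(line), repr(line))
```

One caveat of checking for ':' first: a simple list whose first entry contains a colon (for example a hypothetical "DAPI-01:UV") is routed to the cycle-prefixed branch, where every line then fails the integer conversion and nothing is parsed.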

DAPI Cycle Extraction Regex

```python
dapi_match = re.match(r'DAPI[-_]?(\d+)', line, re.IGNORECASE)
# Matches: DAPI-01, DAPI_01, DAPI01, dapi-1, etc.
```
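Because re.match anchors only at the start of the string, trailing text after the number is accepted, while a space separator, an unnumbered "DAPI", or a prefix before "DAPI" does not match. The sketch below (dapi_cycle is a hypothetical helper) makes those edge cases visible; re.fullmatch would be a stricter alternative if trailing text should be rejected.

```python
import re

DAPI_CYCLE = re.compile(r"DAPI[-_]?(\d+)", re.IGNORECASE)

def dapi_cycle(name):
    """Return the cycle number encoded in a DAPI marker name, or None."""
    match = DAPI_CYCLE.match(name)  # anchored at the start only
    return int(match.group(1)) if match else None

for name in ["DAPI-01", "DAPI_01", "DAPI01", "dapi-1",
             "DAPI", "DAPI 01", "CD31", "DAPI-02-ext", "Nuc_DAPI-01"]:
    print(f"{name!r}: {dapi_cycle(name)}")
```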

Key Insights

  • Auto-detect the format rather than requiring the user to specify it
  • The simple list format uses the numbered DAPI marker to determine cycle boundaries
  • Always provide a fallback for when the file is not found or parsing fails
  • Support multiple filename conventions (CHANNELNAMES.txt, channelnames.txt, etc.)
  • Comments (lines starting with #) should be ignored
  • Empty lines should be skipped
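A parse can succeed structurally and still look wrong, for example a cycle with fewer names than expected or a gap in cycle numbers. A quick sanity check before trusting the result is one way to decide when to use the fallback. The helper check_channel_dict and its rules (cycles numbered 1..N, a fixed channel count per cycle) are assumptions, not part of the verified workflow.

```python
def check_channel_dict(channel_dict, channels_per_cycle=4):
    """Return a list of problems found; an empty list means the parse looks sane."""
    if not channel_dict:
        return ["no cycles parsed"]
    problems = []
    cycles = sorted(channel_dict)
    # Assumption: cycle numbers run 1..N without gaps
    if cycles != list(range(1, len(cycles) + 1)):
        problems.append(f"cycle numbers {cycles} are not 1..{len(cycles)}")
    # Assumption: every cycle has exactly channels_per_cycle names
    for cycle in cycles:
        count = len(channel_dict[cycle])
        if count != channels_per_cycle:
            problems.append(f"cycle {cycle} has {count} names, expected {channels_per_cycle}")
    return problems

print(check_channel_dict({1: ["DAPI-01", "Blank", "Blank", "Blank"],
                          3: ["DAPI-03", "CD20"]}))
```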

References

  • CODEX channel naming conventions
  • KINTSUGI Notebook 2 cell-7 (Processing Parameters)
