Agent skill

glmocr-table

Official skill for recognizing and extracting tables from images and PDFs into Markdown format using ZhiPu GLM-OCR API. Supports complex tables, merged cells, and multi-page documents. Use this skill when the user wants to extract tables, recognize spreadsheets, or convert table images to editable format.

Install this agent skill to your Project

npx add-skill https://github.com/zai-org/GLM-skills/tree/main/skills/glmocr-table

Metadata

Additional technical details for this skill

openclaw: { "emoji": "\ud83d\udcca", "homepage": "https://github.com/zai-org/GLM-OCR/tree/main/skills/glmocr-table", "requires": { "env": [ "ZHIPU_API_KEY", "GLM_OCR_TIMEOUT" ], "bins": [ "python" ] }, "primaryEnv": "ZHIPU_API_KEY" }

SKILL.md

GLM-OCR Table Recognition Skill / GLM-OCR 表格识别技能

Extract tables from images and PDFs and convert them to Markdown format using the ZhiPu GLM-OCR layout parsing API.

When to Use / 使用场景

Extract tables from images or scanned documents / 从图片或扫描件中提取表格
Convert table images to Markdown or another editable format / 将表格图片转为 Markdown 或可编辑格式
Recognize complex tables with merged cells / 识别含合并单元格的复杂表格
Parse financial statements, invoices, reports with tables / 解析财务报表、发票、带表格的报告
User mentions "extract table", "recognize table", "表格识别", "提取表格", "表格OCR", "表格转文字"

Key Features / 核心特性

Complex table support: Handles merged cells, nested tables, multi-row headers
Markdown output: Tables are output in clean Markdown format, easy to edit and convert
Multi-page PDF: Supports batch extraction from multi-page PDF documents
Local file & URL: Supports both local files and remote URLs

Resource Links / 资源链接

| Resource | Link |
|----------|------|
| Get API Key | 智谱开放平台 API Keys |
| API Docs | Layout Parsing / 版面解析 |

Prerequisites / 前置条件

API Key Setup / API Key 配置（Required / 必需）

脚本通过 ZHIPU_API_KEY 环境变量获取密钥，可与其他智谱技能复用同一个 key。 This script reads the key from the ZHIPU_API_KEY environment variable. Reusing the same key across Zhipu skills is optional.

Get Key / 获取 Key： Visit 智谱开放平台 API Keys to create or copy your key.

Setup options / 配置方式（任选一种）：

Global config (recommended) / 全局配置（推荐）： Set once in openclaw.json under env.vars; all Zhipu skills will share it:

```json
{
  "env": {
    "vars": {
      "ZHIPU_API_KEY": "your-api-key"
    }
  }
}
```

Skill-level config / Skill 级别配置： Set for this skill only in openclaw.json:

```json
{
  "skills": {
    "entries": {
      "glmocr-table": {
        "env": {
          "ZHIPU_API_KEY": "your-api-key"
        }
      }
    }
  }
}
```

Shell environment variable / Shell 环境变量： Add to ~/.zshrc:

```bash
export ZHIPU_API_KEY="your-api-key"
```

💡 If you have already configured a key for another Zhipu skill (such as glmocr, glmv-caption, or glm-image-generation), they all share the same ZHIPU_API_KEY, so no further configuration is needed.
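
Whichever setup option is used, the key reaches the script through the environment. The following is a minimal sketch of a pre-flight check, not the script's actual code: the helper name `get_api_key` is hypothetical, and the error text simply reuses the message documented under Error Handling.

```python
import os


def get_api_key(env=None):
    """Return the configured key, or raise with a setup hint if it is missing.

    Hypothetical helper for illustration; the real script may check differently.
    """
    env = os.environ if env is None else env
    key = env.get("ZHIPU_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ZHIPU_API_KEY not configured. Get your API key at: "
            "https://bigmodel.cn/usercenter/proj-mgmt/apikeys"
        )
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.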

Security & Transparency / 安全与透明度

Environment variables used / 使用的环境变量：
- ZHIPU_API_KEY (required / 必需)
- GLM_OCR_TIMEOUT (optional timeout seconds / 可选超时秒数)

Fixed endpoint / 固定官方端点： https://open.bigmodel.cn/api/paas/v4/layout_parsing
No custom API URL override / 不支持自定义 API URL 覆盖： this avoids accidental key exfiltration via redirected endpoints.
Raw upstream response is optional / 原始响应默认不返回： use --include-raw only when needed for debugging.
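
As an illustration of how an optional timeout variable like GLM_OCR_TIMEOUT might be read, here is a small sketch. The helper name `read_timeout` and the 60-second fallback are assumptions made for this example; the script's real default is not stated in this document.

```python
import os

# Assumed fallback for illustration only; the script's real default is not documented here.
DEFAULT_TIMEOUT_SECONDS = 60.0


def read_timeout(env=None):
    """Parse GLM_OCR_TIMEOUT as positive seconds, falling back to the assumed default."""
    env = os.environ if env is None else env
    raw = env.get("GLM_OCR_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        # Non-numeric values are ignored rather than treated as fatal.
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```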

⛔ MANDATORY RESTRICTIONS / 强制限制 ⛔

ONLY use GLM-OCR API — Execute the script python scripts/glm_ocr_cli.py
NEVER parse tables yourself — Do NOT try to extract tables using built-in vision or any other method
NEVER offer alternatives — Do NOT suggest "I can try to recognize it" or similar
IF API fails — Display the error message and STOP immediately
NO fallback methods — Do NOT attempt table extraction any other way

📋 Output Display Rules / 输出展示规则

After running the script, present the OCR result clearly and safely.

Show extracted table Markdown (text) in full
Summarization is allowed, but do not hide important extraction failures
If layout_details contains table-related entries, you may highlight them
If the result file is saved, tell the user the file path
Show raw upstream response only when explicitly requested or debugging (--include-raw)
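
The display rules above could be sketched roughly as follows. The function name `render_result` and the wording of the messages are hypothetical; only the `ok`, `text`, and `error` fields come from the response format documented in this skill.

```python
def render_result(result, saved_path=None):
    """Build the text shown to the user from a parsed result dict."""
    lines = []
    if result.get("ok"):
        # Show the extracted Markdown in full.
        lines.append(result.get("text") or "")
    else:
        # Never hide a failed extraction behind a summary.
        lines.append(f"Table extraction failed: {result.get('error')}")
    if saved_path:
        lines.append(f"Result saved to: {saved_path}")
    return "\n".join(lines)
```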

How to Use / 使用方法

Extract from URL / 从 URL 提取

```bash
python scripts/glm_ocr_cli.py --file-url "https://example.com/table.png"
```

Extract from Local File / 从本地文件提取

```bash
python scripts/glm_ocr_cli.py --file /path/to/table.png
```

Save Result to File / 保存结果到文件

```bash
python scripts/glm_ocr_cli.py --file table.png --output result.json --pretty
```

Include Raw Upstream Response (Debug Only) / 包含原始上游响应（仅调试）

```bash
python scripts/glm_ocr_cli.py --file table.png --output result.json --include-raw
```
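
When the script is driven from Python rather than a shell, the same invocations can be assembled as an argument list. This is a sketch under stated assumptions: `build_command` and `run_ocr` are hypothetical helper names, the script path is taken relative to the skill directory, and it is assumed (not confirmed here) that the script writes its result JSON to standard output.

```python
import subprocess
import sys

SCRIPT = "scripts/glm_ocr_cli.py"  # path relative to the skill directory


def build_command(source, output=None, pretty=False, include_raw=False):
    """Assemble the CLI arguments, choosing --file-url or --file from the source."""
    is_url = source.startswith(("http://", "https://"))
    cmd = [sys.executable, SCRIPT, "--file-url" if is_url else "--file", source]
    if output:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    if include_raw:
        cmd.append("--include-raw")  # debug only
    return cmd


def run_ocr(source, **options):
    """Run the CLI and return the CompletedProcess with captured stdout/stderr."""
    return subprocess.run(build_command(source, **options), capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems with paths or URLs that contain spaces or special characters.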

CLI Reference / CLI 参数

```
python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty] [--include-raw]
```

| Parameter | Required | Description |
|-----------|----------|-------------|
| `--file-url` | One of | URL to image/PDF |
| `--file` | One of | Local file path to image/PDF |
| `--output`, `-o` | No | Save result JSON to file |
| `--pretty` | No | Pretty-print JSON output |
| `--include-raw` | No | Include raw upstream API response in `result` field (debug only) |
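
For readers who want to see how an interface of this shape is typically declared, the sketch below mirrors the table with Python's `argparse`. It is an illustration of the documented flags only, not the source of `glm_ocr_cli.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Declare a parser matching the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="glm_ocr_cli.py")
    # Exactly one of --file-url / --file must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-url", help="URL to image/PDF")
    source.add_argument("--file", help="Local file path to image/PDF")
    parser.add_argument("--output", "-o", help="Save result JSON to file")
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")
    parser.add_argument(
        "--include-raw",
        action="store_true",
        help="Include raw upstream API response in the result field (debug only)",
    )
    return parser
```

With a required mutually exclusive group, `argparse` rejects a command line that supplies both sources or neither.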

Response Format / 响应格式

```json
{
  "ok": true,
  "text": "| Column 1 | Column 2 |\n|----------|----------|\n| Data     | Data     |",
  "layout_details": [...],
  "result": null,
  "error": null,
  "source": "/path/to/file",
  "source_type": "file",
  "raw_result_included": false
}
```

Key fields:

ok — whether extraction succeeded
text — extracted text in Markdown (use this for display)
layout_details — layout analysis details
error — error details on failure
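
Once the `text` field is available, a simple Markdown table can be turned into rows for further editing, for example as CSV. This is a minimal sketch that handles plain pipe tables like the sample above; the helper names are hypothetical, and it does not handle escaped pipes inside cells or any non-pipe table markup the API may return for complex layouts.

```python
import csv
import io


def markdown_table_to_rows(markdown):
    """Split pipe-delimited Markdown table lines into lists of cell strings."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore text outside the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row such as |-----|:---:|
        if cells and all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Applied to the sample `text` value in the response above, `markdown_table_to_rows` yields a header row and one data row.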

Error Handling / 错误处理

API key not configured:

ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
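
The status-code cases above can be summarized as a small lookup. This is a sketch only: the function name `guidance_for_status` and the message wording are hypothetical, and the real script may report errors in a different form.

```python
def guidance_for_status(status_code):
    """Map an HTTP status code to the user guidance described in this section."""
    if status_code in (401, 403):
        return "Authentication failed: the API key is invalid or expired. Reconfigure ZHIPU_API_KEY."
    if status_code == 429:
        return "Rate limit reached or quota exhausted. Wait before retrying."
    # Any other failure: report it and stop, with no fallback extraction.
    return f"Request failed with HTTP status {status_code}. Show the error and stop."
```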

Maintainer

zai-org Core maintainer

Source details

Full Name: zai-org/GLM-skills
Branch: main
Path in repo: skills/glmocr-table
License: Apache License 2.0
Topics: skills glm ocr multimodal vision
