Topic: vision
17 skills in this topic.
-
glmv-doc-based-writing
Write text content based on one or more given documents and a set of requirements, using the ZhiPu GLM-V multimodal model. Reads and comprehends one or more documents (PDF/DOCX) and writes the content in Markdown format according to the specified requirements. Use when the user wants to draft a paper, article, essay, report, review, post, brief, proposal, plan, or similar.
zai-org/GLM-skills 304
-
glmv-web-replication
Frontend visual replication skill. Explores a target website’s publicly visible pages via Playwright MCP or agent-browser, captures screenshots and layout information, then generates a static or client-side frontend replica that approximates the original’s visual appearance and page structure. This skill replicates FRONTEND PRESENTATION ONLY — it does not reproduce backend logic, server-side behavior, databases, or any non-public content. The user is responsible for ensuring they have proper authorization (ownership, license, or explicit permission) before replicating any website.
⚠️ Authorization gate: Before starting, the agent MUST confirm with the user that they have the legal right to replicate the target site. If the user cannot confirm, the skill MUST refuse to proceed.
zai-org/GLM-skills 304
-
glmv-stock-analyst
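Stock analysis and price-movement prediction. Triggers when the user expresses an intent to analyze, judge, or predict, such as "分析一下腾讯" ("analyze Tencent"), "0700最近走势如何" ("how has 0700 been trending lately"), "XX能不能买" ("is XX worth buying"), "预测一下后续走势" ("predict the upcoming trend"), or "生成一份分析报告" ("generate an analysis report"). Does not trigger for simple lookups such as "腾讯当前价格是多少" ("what is Tencent's current price") or "茅台代码是什么" ("what is Moutai's ticker code"). Supports Hong Kong stocks, A-shares, and US stocks. Integrates multi-source data (news, fundamentals, technicals, capital flows, and macro information) for multi-dimensional analysis, and outputs a structured analysis report that combines text with visual charts. ⚠️ Requires a multimodal main model (e.g. glm-5v-turbo); the main model must be able to read images.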
zai-org/GLM-skills 304
-
glmv-resume-screen
Screen and evaluate resumes against criteria using the ZhiPu GLM-V multimodal model. Reads multiple resume files (PDF/DOCX/TXT), compares them against user-defined screening criteria, and outputs a Markdown table with pass/fail analysis. Use when the user wants to filter resumes, compare candidates, or batch-evaluate job applications.
zai-org/GLM-skills 304
-
glmv-prompt-gen
Analyze images/videos and generate professional prompts for text-to-image and text-to-video AI tools (Midjourney, Stable Diffusion, DALL-E, Sora, Runway, Kling, Pika). Use when the user wants to generate prompts from reference images/videos, create AI art prompts, or get prompt engineering suggestions from visual content.
zai-org/GLM-skills 304
-
glmv-prd-to-app
Build a complete, production-ready full-stack web application from PRD documents, prototype images, and resource files. Handles the entire pipeline: system design, database schema, seed data, backend API, frontend UI, visual verification against prototypes, and deployment script generation.
Use this skill whenever the user:
- Provides a PRD (product requirement document) and wants a working app built
- Says things like "根据PRD开发" ("develop from the PRD"), "build from PRD", "implement this product", "把需求文档做成应用" ("turn the requirements doc into an app"), "develop this app from requirements"
- Has prototype images + requirements and wants full-stack implementation
- Wants to turn product specifications into a running web application
- Mentions building an app from wireframes/mockups combined with a requirements doc
Trigger this skill even if the user just says "帮我开发" ("develop this for me") or "build this" with PRD materials present in the working directory.
zai-org/GLM-skills 304
-
glmv-pdf-to-web
Convert a PDF (research paper, technical report, or project document) into a beautiful single-page academic/project website with a structured outline JSON. Trigger this skill when the user wants to make a paper page, project homepage, or academic website from a PDF — in Chinese or English.
zai-org/GLM-skills 304
-
glmv-pdf-to-ppt
Convert a PDF (research paper, report, or any document) into a polished multi-slide HTML presentation with a structured outline JSON and summary markdown. Trigger this skill when the user mentions making slides or a PPT from a PDF — in Chinese or English.
zai-org/GLM-skills 304
-
glmv-grounding
A skill that uses GLM-V native grounding capabilities for coordinate conversion, bounding-box visualization, and more. GLM-V native grounding can locate any target specified by the prompt in an image and output relative coordinates normalized to 0-1000 based on image size. Coordinate formats include 2D bounding box (default), 2D points, and 3D bounding box. GLM-V also supports spatiotemporal localization and tracking of multiple prompt-specified targets in videos, outputting 2D bounding boxes per second.
zai-org/GLM-skills 304
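The coordinate conversion mentioned for glmv-grounding can be illustrated with a minimal sketch. It assumes a 2D bounding box arrives as `[x1, y1, x2, y2]` on the 0-1000 scale described above; that ordering, the function name `to_pixel_box`, and the rounding choice are illustrative assumptions, not the skill's actual interface.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    box: Sequence[float],
    width: int,
    height: int,
    scale: int = 1000,
) -> Tuple[int, int, int, int]:
    """Map a box normalized to 0..scale onto pixel coordinates.

    Assumption: box is ordered [x1, y1, x2, y2] (hypothetical layout).
    """
    x1, y1, x2, y2 = box
    # x values scale with image width, y values with image height.
    return (
        round(x1 * width / scale),
        round(y1 * height / scale),
        round(x2 * width / scale),
        round(y2 * height / scale),
    )


# Example: a box covering the lower-middle region of a 1920x1080 image.
pixel_box = to_pixel_box([250, 500, 750, 1000], width=1920, height=1080)
```

Because the coordinates are relative to image size, the same normalized box can be drawn on a resized copy of the image by passing the new width and height.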
-
glm-image-gen
Official skill for generating high-quality images from text prompts using ZhiPu GLM-Image API. Excellent at scientific illustrations, high-quality portraits, social media graphics, and commercial posters. Supports multiple aspect ratios, HD quality, and watermark control. Use this skill when the user wants to generate images, create AI art, text-to-image, or convert text descriptions into visual content.
zai-org/GLM-skills 304
-
glmv-caption
Generate captions (descriptions) for images, videos, and documents using the ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single or multiple inputs given as URLs, local paths, or base64 (images only).
zai-org/GLM-skills 304
-
glmocr-table
Official skill for recognizing and extracting tables from images and PDFs into Markdown format using ZhiPu GLM-OCR API. Supports complex tables, merged cells, and multi-page documents. Use this skill when the user wants to extract tables, recognize spreadsheets, or convert table images to editable format.
zai-org/GLM-skills 304
-
glmocr
Trigger when: (1) User wants to extract text, tables, formulas, or structured data from images/PDFs/scanned documents, (2) User mentions "OCR", "文字识别", "文档解析", (3) User has a document (screenshot, scanned page, invoice, paper, whiteboard photo) and needs its content in structured form, (4) User asks to parse, digitize, or extract content from a visual document.
Invokes the GLM-OCR SDK (pip install glmocr) to parse documents via Zhipu's cloud API. No GPU required. Returns structured JSON (regions with labels + bounding boxes) and Markdown. Agent can operate entirely via CLI — no YAML files needed.
NOT for: real-time camera feeds, audio transcription, or non-document images (photos, illustrations).
zai-org/GLM-skills 304
-
glmocr-handwriting
Official skill for recognizing handwritten text from images using ZhiPu GLM-OCR API. Supports various handwriting styles, languages, and mixed handwritten/printed content. Use this skill when the user wants to read handwritten notes, convert handwriting to text, or OCR handwritten documents.
zai-org/GLM-skills 304
-
glmocr-formula
Official skill for recognizing and extracting mathematical formulas from images and PDFs into LaTeX format using ZhiPu GLM-OCR API. Supports complex equations, inline formulas, and formula blocks. Use this skill when the user wants to extract formulas, convert formula images to LaTeX, or OCR mathematical expressions.
zai-org/GLM-skills 304
-
glmocr
Extract text from images using GLM-OCR API. Supports images and PDFs with high accuracy OCR, table recognition, formula extraction, and handwriting recognition. Use this skill whenever the user wants to extract text from images, perform OCR on pictures, scan documents, convert images to text, or process any image files to get their textual content.
zai-org/GLM-skills 304
-
glm-master-skill
Documentation-only master skill for GLM ecosystem discovery and installation.
This skill does not execute scripts or subprocess commands.
It provides a curated list of official GLM skills, install methods, and source links.
zai-org/GLM-skills 304