Agent skill

coreweave-rate-limits

Handle CoreWeave API and GPU quota limits. Use when hitting quota limits, managing GPU resource allocation, or implementing request queuing for inference endpoints. Trigger with phrases like "coreweave quota", "coreweave limits", "coreweave gpu allocation", "coreweave throttle".

Stars 1,803
Forks 241

Install this agent skill to your Project

npx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/main/plugins/saas-packs/coreweave-pack/skills/coreweave-rate-limits

SKILL.md

CoreWeave Rate Limits

Overview

CoreWeave limits are primarily GPU quota-based rather than API rate limits. Each namespace has allocated GPU quotas per type.

Check GPU Quota

bash
kubectl describe resourcequota -n my-namespace
kubectl get resourcequota -o json | jq '.items[].status'

Inference Request Queuing

python
import asyncio
from collections import deque

class InferenceQueue:
    def __init__(self, max_concurrent: int = 10):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.queue_depth = 0

    async def inference(self, client, prompt: str) -> str:
        self.queue_depth += 1
        async with self.semaphore:
            try:
                return await asyncio.to_thread(client.generate, prompt)
            finally:
                self.queue_depth -= 1

Resources

Next Steps

For security, see coreweave-security-basics.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results