Documentation Index

Fetch the complete documentation index at: https://docs.veedcrawl.com/llms.txt

Use this file to discover all available pages before exploring further.

Veedcrawl is designed for AI pipelines from the ground up. Every endpoint returns structured JSON, async jobs follow a consistent enqueue-and-poll pattern, and the same API surface works for metadata lookups, transcript generation, and prompt-driven extraction. Whether you wire it up through the MCP server or call the REST API directly, the architecture stays the same: fetch only what you need, at the depth the task actually requires.

The three-layer approach

Structure your workflows around three layers of increasing cost and depth. Start cheap — pull metadata first, then commit credits only when a video proves worth it.

Layer 1: Metadata

Free · instant

Pull title, author, view count, like count, duration, and thumbnail from any video URL. Use this as a cheap first pass to decide whether a video is worth deeper processing. No credits consumed.

Layer 2: Transcript

1–5 credits · async

Get the full spoken content with timestamps. Use native captions (1 credit) when available, or Whisper AI generation (5 credits) as a fallback. Feed the result into RAG pipelines, search indexes, or summarization steps.

Layer 3: Extract

10 credits · async

Ask any question about a video and get a structured answer — hooks, claims, product mentions, sentiment, summaries. Optionally enforce the output shape with a JSON Schema for machine-readable results.
Start every workflow at Layer 1. Promote to Layer 2 or Layer 3 only when the metadata signals the video is relevant.
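This promotion rule can be sketched as a small gate function. The field names mirror the metadata response shown later in this guide, but the 10,000-view threshold is an illustrative assumption, not a Veedcrawl default:

```python
# Sketch of the layer-promotion gate: spend credits only when the free
# Layer 1 metadata clears a relevance bar. The view threshold is an
# arbitrary example value, not an API default.

def should_promote(metadata: dict, min_views: int = 10_000) -> bool:
    """Return True when a free metadata result justifies Layer 2/3 credits."""
    return (
        metadata.get("viewCount", 0) >= min_views
        and metadata.get("duration", 0) > 0  # skip unavailable or broken videos
    )

print(should_promote({"viewCount": 482910, "duration": 182}))  # True
```

Swap in whatever signals matter for your pipeline: creator allowlists, duration caps, like-to-view ratios.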

Using the MCP server for agents

If you’re building with Claude, Cursor, or Windsurf, the Veedcrawl MCP server exposes all five tools directly to your agent — no REST calls, no polling logic to write. The server handles job queuing and retries internally and returns finished results to the agent. Install and connect the server, then your agent can call get_video_metadata, get_video_transcript, extract_from_video, get_tiktok_profile, and get_instagram_profile natively within its reasoning loop. For full setup instructions, see the MCP overview.

Using the REST API directly

For pipelines outside of MCP-compatible hosts, call the REST API directly. The following example runs all three layers against a single video in sequence.

1. Check the video with metadata

Start with a free metadata call. If the view count or creator doesn’t match your criteria, stop here — no credits spent.
curl "https://api.veedcrawl.com/v1/metadata?url=https://www.tiktok.com/@creator/video/123" \
  -H "x-api-key: YOUR_KEY"
{
  "title": "Building AI agents in 2025",
  "author": "@creator",
  "platform": "tiktok",
  "duration": 182,
  "viewCount": 482910,
  "likeCount": 21400,
  "thumbnail": "https://example.com/thumbnails/abc123.jpg"
}
If the video passes your filter, proceed to transcription.
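In a script, the same call is a single GET with the video URL as a query parameter. One sketch in Python; note that the video URL must be percent-encoded when passed as a parameter (curl tolerates the raw form shown above, but most HTTP clients do not):

```python
from urllib.parse import urlencode

API_BASE = "https://api.veedcrawl.com/v1"

def metadata_request(video_url: str) -> str:
    """Build the Layer 1 metadata URL, percent-encoding the video URL."""
    return f"{API_BASE}/metadata?{urlencode({'url': video_url})}"

print(metadata_request("https://www.tiktok.com/@creator/video/123"))
```

Send the resulting URL with your HTTP client of choice, passing the x-api-key header exactly as in the curl example.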

2. Enqueue a transcript job and poll for completion

POST to /v1/transcript to start a transcription job. Use mode: "auto" to try native captions first and fall back to AI generation.
# Enqueue the job
curl -X POST "https://api.veedcrawl.com/v1/transcript" \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.tiktok.com/@creator/video/123", "mode": "auto"}'
{ "jobId": "job_abc123", "status": "queued" }
Poll the job endpoint until status is "completed" or "failed":
# Poll for result
curl "https://api.veedcrawl.com/v1/transcript/job_abc123" \
  -H "x-api-key: YOUR_KEY"
{
  "jobId": "job_abc123",
  "status": "completed",
  "resultJson": {
    "text": "Most people building AI agents in 2025 are making the same three mistakes..."
  }
}
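The text field can be fed straight into a RAG pipeline or search index. As a sketch, a naive word-window chunker (the window and overlap sizes are arbitrary choices, not API parameters):

```python
def chunk_transcript(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split transcript text into overlapping word windows for indexing."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]
```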

3. Extract structured information with a prompt

Once you have the transcript and have confirmed the video is worth deeper analysis, send the URL to /v1/extract with a specific prompt. This watches the full video — visual content included — not just the transcript text.
# Enqueue the extract job
curl -X POST "https://api.veedcrawl.com/v1/extract" \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.tiktok.com/@creator/video/123",
    "prompt": "Extract all product names shown"
  }'
Poll until complete:
curl "https://api.veedcrawl.com/v1/extract/job_xyz456" \
  -H "x-api-key: YOUR_KEY"
{
  "jobId": "job_xyz456",
  "status": "completed",
  "resultJson": {
    "products": ["iPhone 16 Pro", "AirPods Pro 2"]
  }
}
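Put together, the three steps form one pipeline. In this sketch, get_metadata, run_transcript, and run_extract are hypothetical stand-ins for the HTTP calls above (enqueue plus polling); they are not names from an official SDK:

```python
def analyze_video(url, get_metadata, run_transcript, run_extract,
                  min_views: int = 10_000) -> dict:
    """Run the three layers in order, stopping early when metadata fails the filter."""
    meta = get_metadata(url)                    # Layer 1: free, instant
    if meta.get("viewCount", 0) < min_views:
        return {"skipped": True, "meta": meta}  # stop here: no credits spent
    transcript = run_transcript(url)            # Layer 2: 1-5 credits
    extract = run_extract(url, "Extract all product names shown")  # Layer 3: 10 credits
    return {"skipped": False, "meta": meta,
            "transcript": transcript, "extract": extract}
```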

Handling async jobs

Both /v1/transcript and /v1/extract are asynchronous. Every POST immediately returns a jobId — the actual processing happens in the background.
Every async job follows the same two-step pattern:
  1. POST to the endpoint → receive { "jobId": "job_abc123", "status": "queued" }
  2. GET /{endpoint}/{jobId} repeatedly until status is "completed" or "failed"
The poll endpoint is always the same base path with the jobId appended:
  • Transcript: GET /v1/transcript/{jobId}
  • Extract: GET /v1/extract/{jobId}
Poll every 1–2 seconds. Most jobs complete within 30 seconds for native transcripts and under 2 minutes for AI generation or visual extraction. If you’re implementing polling yourself, cap at 120 attempts before treating the job as timed out — that’s approximately 3 minutes at a 1.5-second interval. The MCP server handles polling and retry logic automatically, so this only applies when calling the REST API directly.
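A polling loop matching those numbers (1.5-second interval, 120-attempt cap) might look like this. Here fetch_job is any callable that GETs the job endpoint and returns the parsed JSON body; it is injected so the loop itself stays transport-agnostic:

```python
import time

def poll_job(fetch_job, job_id: str,
             interval: float = 1.5, max_attempts: int = 120) -> dict:
    """Poll until the job reports "completed" or "failed"; raise on timeout."""
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_attempts} attempts")
```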
When a job fails or a request is invalid, the API returns a consistent error object:
{
  "error": "error-code",
  "message": "Human-readable description",
  "details": null
}
Check for status "failed" on a polled job response the same way you check for "completed". The resultJson field will be null on failure; the error details appear at the top level of the job response.
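A small helper can centralize that branch so downstream code only ever sees a result or an exception (the helper name is ours, not part of the API):

```python
def unwrap_job(job: dict):
    """Return resultJson for a completed job; raise for failed or pending ones."""
    if job["status"] == "completed":
        return job["resultJson"]
    if job["status"] == "failed":
        # On failure the error fields sit at the top level of the job response.
        raise RuntimeError(f"{job.get('error')}: {job.get('message')}")
    raise ValueError(f"job still pending: {job['status']}")
```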

Enforcing output shape with JSON Schema

When your extract results need to feed into a downstream system — another API call, a database write, a structured agent response — pass a schema field with your POST request to constrain the output to a machine-readable format.
Use JSON Schema on extract calls whenever the result needs to be parsed programmatically. Without a schema, extract returns free-form text that requires additional parsing. With a schema, the output matches the shape you define.
curl -X POST "https://api.veedcrawl.com/v1/extract" \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.youtube.com/watch?v=...",
    "prompt": "Extract all product names shown",
    "schema": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }
  }'
The response resultJson will conform to your schema:
{
  "products": ["iPhone 16 Pro", "AirPods Pro 2"]
}
Pass that array directly to a downstream tool, write it to a database, or feed it into the next step in your agent’s reasoning chain — no parsing or reshaping required.
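Even with a schema enforced server-side, a cheap client-side shape check before a database write is a reasonable belt-and-braces step. This is a deliberately tiny structural check covering only the object/array/string subset used above, not a full JSON Schema validator:

```python
def conforms(value, schema) -> bool:
    """Minimal structural check for the object/array/string schema subset."""
    t = schema.get("type")
    if t == "object":
        return isinstance(value, dict) and all(
            key not in value or conforms(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
        )
    if t == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if t == "string":
        return isinstance(value, str)
    return True  # unknown types pass; extend as needed

schema = {"type": "object",
          "properties": {"products": {"type": "array",
                                      "items": {"type": "string"}}}}
print(conforms({"products": ["iPhone 16 Pro", "AirPods Pro 2"]}, schema))  # True
```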