WorldFlow AI Developer Documentation

WorldFlow AI is an enterprise memory layer for LLM applications. It sits between your agents and LLM providers and delivers four capabilities: semantic caching that cuts cost and latency for repeated queries; a contextual memory system that gives agents persistent knowledge across sessions; an intelligence layer that turns accumulated project knowledge into queryable team context; and KV-cache inference acceleration that reuses GPU key-value caches across semantically similar prompts for dramatic prefill speedups.

Why WorldFlow AI?

  • Cost reduction --- Cache semantically similar queries and serve responses in under 50ms instead of calling the LLM provider, reducing inference costs by 40-70%.
  • Inference acceleration --- KV-cache reuse (SemBlend) eliminates redundant GPU prefill computation for long-context prompts, delivering 2-12x TTFT speedup with near-lossless quality.
  • Context continuity --- Agents lose context between sessions. WorldFlow AI's memory layer persists milestones, branches, and reasoning traces so every new session starts with full project awareness.
  • Team intelligence --- Search across all projects, contributors, and external sources (Slack, JIRA, Confluence) with natural language queries.
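The proxy-based caching model above can be sketched in Python. This is an illustrative request builder for the OpenAI-compatible `/v1/chat/completions` route listed under the API Surface; the base URL, key prefix, and model name here are placeholders, not documented values (see the API Reference and your dashboard for the real ones). With an OpenAI-style SDK, the same idea is usually just pointing the client's base URL at the proxy.

```python
import json

# Placeholder values -- the real base URL, key format, and model
# identifiers come from your WorldFlow AI dashboard and the API Reference.
WORLDFLOW_BASE_URL = "https://api.worldflow.example/v1"
WORLDFLOW_API_KEY = "wf_live_..."  # placeholder key

def build_chat_request(messages, model="gpt-4o-mini"):
    """Build the URL, headers, and JSON body for a chat completion
    routed through the OpenAI-compatible proxy endpoint. The proxy can
    answer from the semantic cache instead of forwarding to the provider."""
    url = f"{WORLDFLOW_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {WORLDFLOW_API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = build_chat_request(
    [{"role": "user", "content": "Summarize our deployment runbook."}]
)
```

Because the route mirrors the OpenAI wire format, existing clients need only a base-URL change; repeated or near-duplicate prompts are then eligible for sub-50ms cached responses.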

Get Started

Quickstart

Make your first cached query and store your first memory milestone in under 5 minutes.

API Reference

Complete reference for all endpoints: authentication, memory, and OpenAI/Anthropic-compatible proxy.

Core Concepts

Understand the semantic cache, the three-tier cache architecture, the GCC memory model, branches, milestones, and the intelligence layer.
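To make the semantic-cache concept concrete before diving into Core Concepts, here is a deliberately tiny, single-tier sketch: it stores (embedding, response) pairs and returns a hit when a new query's embedding is within a similarity threshold of a stored one. The real system uses a three-tier architecture and production embedding models; this toy uses raw vectors and cosine similarity purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy single-tier semantic cache. A lookup scans stored entries and
    returns the best response if its similarity clears the threshold."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        best, best_sim = None, -1.0
        for emb, resp in self.entries:
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])   # near-duplicate embedding -> cache hit
miss = cache.get([0.0, 1.0])    # unrelated embedding -> None
```

Unlike an exact-match cache keyed on the prompt string, a semantic cache serves paraphrased repeats of the same question, which is where the cost and latency savings come from.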

API Surface

| Area | Endpoints | Description |
| --- | --- | --- |
| Authentication | 1 | API key exchange for JWT |
| Memory | 32 | Projects, milestones, branches, search, intelligence |
| Proxy (OpenAI) | 2 | Drop-in /v1/chat/completions replacement |
| Proxy (Anthropic) | 1 | Drop-in /v1/messages replacement |
| Proxy (Gemini) | 6 | Drop-in Gemini API replacement |
| Proxy (Cohere) | 3 | Drop-in Cohere API replacement |
| MCP / Agentic | 20+ | Model Context Protocol server management |
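The Memory area above covers projects, milestones, and branches. As a hedged sketch of what recording a milestone might look like, the helper below builds a POST payload; the path shape, field names, and base URL are assumptions for illustration only, not the documented schema (check the Memory endpoints in the API Reference).

```python
import json

# Hypothetical base URL -- substitute the value from your deployment.
BASE_URL = "https://api.worldflow.example"

def build_milestone_request(project_id, title, summary, branch="main"):
    """Build the URL and JSON body for recording a milestone on a
    project branch. Field names here are illustrative assumptions."""
    url = f"{BASE_URL}/memory/projects/{project_id}/milestones"
    body = json.dumps({
        "title": title,
        "summary": summary,
        "branch": branch,
    })
    return url, body

url, body = build_milestone_request(
    "proj_123",
    "Shipped v2 auth flow",
    "Replaced session cookies with JWT exchange; see PR discussion.",
)
```

Persisting milestones like this is what lets a fresh agent session resume with full project awareness instead of an empty context window.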

SDK Examples

  • Python --- OpenAI SDK and requests examples
  • TypeScript --- Node.js with openai and fetch
  • cURL --- Copy-paste examples for every endpoint