What is Mezmo?
Mezmo is an AI-driven telemetry data platform built for agentic operations—helping teams detect, diagnose, and resolve incidents faster using intelligent observability. Instead of drowning in millions of raw logs, metrics, and traces, Mezmo’s Active Telemetry pipeline filters out noise, clusters related signals, and delivers only the most relevant, context-rich data to your AI agents. This means faster root cause analysis, lower costs, and smarter automation.
At the heart of Mezmo’s approach is AURA, an open-source agentic control plane that orchestrates AI workflows across your infrastructure. AURA works with any LLM and connects to tools like PagerDuty, Datadog, or internal APIs via the Model Context Protocol (MCP). Together, Mezmo and AURA ensure your AI agents act on high-quality data—not data chaos—so they can investigate incidents in seconds, not hours.
What are the features of Mezmo?
- Active Telemetry Pipeline: Reduces up to 99.98% of raw telemetry data by deduplicating, clustering, and enriching signals before agents see them—saving tokens, cost, and time.
- AURA Open-Source Agent Framework: An Apache 2.0–licensed, Rust-based control plane that runs on your infrastructure and supports multi-agent orchestration with human-in-the-loop safety.
- AI-Ready Context Engineering: Curates just-in-time, task-specific context for agents using MCP, ensuring they reason over <1K signals instead of millions of raw events.
- LLM-Agnostic & Tool-Native: Works with OpenAI, Anthropic, Ollama, Bedrock, Gemini, and more—plus dynamic discovery of MCP tools like Slack, ClickHouse, or internal APIs.
- Pre-Built Agentic SRE Workflows: Includes ready-to-use agents for incident triage, root cause analysis (RCA), and remediation, grounded in real runbooks.
- Seamless Integration with Popular Frameworks: Plug Mezmo into LangChain, CrewAI, Temporal, or Deep Agents via its remote MCP server—no local tool server needed.
- Cost & Vendor Control: Route OpenTelemetry data flexibly across destinations (Datadog, Grafana, S3, etc.), cut observability spend by up to 70%, and avoid lock-in.
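The deduplicate-and-cluster idea behind the Active Telemetry Pipeline can be illustrated with a small sketch. This is not Mezmo's implementation; it is a minimal illustration that assumes variable tokens (IPs, hex IDs, numbers) can be masked so that near-identical log lines collapse into one template. The masking rules and the names `template_of` and `cluster` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable tokens so that lines which
# differ only in IDs, addresses, or counters share one template.
_MASKS = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<ip>"),
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def template_of(line: str) -> str:
    """Return the line with variable tokens masked out."""
    out = line.strip()
    for pattern, token in _MASKS:
        out = pattern.sub(token, out)
    return out


def cluster(lines):
    """Group lines by template; return clusters ordered by frequency."""
    counts = Counter()
    examples = {}
    for line in lines:
        key = template_of(line)
        counts[key] += 1
        # Keep the first raw line seen as a representative example.
        examples.setdefault(key, line)
    return [
        {"template": key, "count": count, "example": examples[key]}
        for key, count in counts.most_common()
    ]


if __name__ == "__main__":
    raw = [
        "timeout connecting to 10.0.0.1 after 30 ms",
        "timeout connecting to 10.0.0.2 after 45 ms",
        "disk full on /dev/sda1",
    ]
    for item in cluster(raw):
        print(item["count"], item["template"])
```

An agent would then receive one entry per cluster (template, count, one example) rather than every raw line, which is the kind of reduction the feature list describes.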
What are the use cases of Mezmo?
- Incident Triage Automation: Automatically classify PagerDuty alerts, prioritize by business impact, and launch investigation workflows within seconds.
- Root Cause Analysis (RCA): Correlate logs, metrics, and traces to narrow hypotheses from dozens to just a few—then identify the true root cause in under 20 seconds.
- On-Call AI Assistant: Deploy a conversational agent that answers SRE questions using real-time, curated telemetry—no hallucinations, just facts from your stack.
- Kubernetes Troubleshooting: Inspect cluster health, query pod metrics, and analyze error logs across namespaces with a specialized K8s SRE agent.
- Post-Incident Automation: Automatically generate post-mortems and timeline summaries after resolution, cutting manual reporting from 4 hours to minutes.
- Observability Cost Optimization: Profile high-volume, low-value telemetry streams and reroute or drop them to reduce vendor bills by up to 70%.
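As a rough illustration of the cost-optimization use case, the sketch below profiles telemetry by source and flags streams that account for a large share of volume but carry almost no warning or error content. The event shape (`source`, `level`, `message`), the thresholds, and the function name `profile_streams` are assumptions made for this example, not part of Mezmo's product.

```python
from collections import defaultdict


def profile_streams(events, min_volume_share=0.2, max_signal_ratio=0.05):
    """Return sources that are high-volume but low-signal.

    A source qualifies when it produces at least `min_volume_share` of all
    bytes and at most `max_signal_ratio` of its own bytes are warn/error.
    """
    bytes_by_source = defaultdict(int)
    signal_by_source = defaultdict(int)
    for event in events:
        size = len(event["message"].encode("utf-8"))
        bytes_by_source[event["source"]] += size
        if event["level"] in ("warn", "error"):
            signal_by_source[event["source"]] += size

    total = sum(bytes_by_source.values()) or 1  # avoid division by zero
    candidates = []
    for source, size in bytes_by_source.items():
        share = size / total
        ratio = signal_by_source[source] / size if size else 0.0
        if share >= min_volume_share and ratio <= max_signal_ratio:
            candidates.append(source)
    return sorted(candidates)


if __name__ == "__main__":
    sample = (
        [{"source": "healthcheck", "level": "debug", "message": "GET /healthz 200"}] * 50
        + [{"source": "payments", "level": "error", "message": "card declined"}] * 2
    )
    print(profile_streams(sample))
```

Sources returned by a check like this are candidates for dropping, sampling, or rerouting to cheaper storage; the right thresholds depend on the environment.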
How to use Mezmo?
- Start with AURA Single Agent: Pick a use case (e.g., incident triage), configure it in a TOML file, and connect your LLM and MCP tools—go live in under an hour.
- Connect to Mezmo’s MCP Server: Point your agent to https://mcp.mezmo.com/mcp with your API key to get curated, pipeline-processed signals instead of the raw firehose.
- Use pre-built agent templates for SRE tasks like log analysis or metric validation, or build custom agents using LangChain, CrewAI, or Temporal.
- Enable multi-agent orchestration in AURA to coordinate specialists (e.g., log analyst + metrics analyst) with automatic handoffs and safety gates.
- Migrate incrementally with OpenTelemetry: Route data to Mezmo alongside existing tools during OTel migration—zero downtime required.
- Monitor agent performance with OpenInference tracing: Export full audit trails (prompts, tool calls, decisions) to Jaeger, Arize Phoenix, or Datadog.
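To make the MCP connection step more concrete, here is a minimal sketch of what a request to the remote MCP server might look like. MCP messages are JSON-RPC 2.0 and `tools/list` is a standard MCP method, but the way the API key is passed (a Bearer token in the `Authorization` header) is an assumption; check Mezmo's documentation for the actual scheme. The code only builds the request and does not send it. A real client would first perform the MCP `initialize` handshake, which an MCP client library normally handles.

```python
import json
import urllib.request

# Endpoint named in the steps above.
MCP_URL = "https://mcp.mezmo.com/mcp"


def build_tools_list_request(api_key: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 `tools/list` request."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        # Assumption: the API key is sent as a Bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_tools_list_request("YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

In practice, frameworks such as LangChain or CrewAI would usually be configured with the server URL and credentials and would issue these calls themselves.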









