Question 1

Which is better: Azure Foundry Hosted Agents or MDArena?

Accepted Answer

Based on our expert panel, Azure Foundry Hosted Agents has a stronger verdict with a 50% Ship rate. Azure Foundry Hosted Agents received a panel verdict of Mixed and MDArena received Mixed.

Question 2

Is Azure Foundry Hosted Agents free?

Accepted Answer

Azure Foundry Hosted Agents pricing: $0.0994/vCPU-hour, $0.0118/GiB-hour (public preview)

Question 3

Is MDArena free?

Accepted Answer

MDArena pricing: Free / Open Source

Question 4

What do experts say about Azure Foundry Hosted Agents vs MDArena?

Accepted Answer

Azure Foundry Hosted Agents: Microsoft Azure's Foundry Agent Service now offers Hosted Agents in public preview — per-session isolated compute sandboxes purpose-built for running AI agents at scale. Each session gets its own container with a persistent filesystem, internet access (optional), and a Python environment pre-loaded with common agent dependencies. Sessions spin up in seconds and terminate — and stop billing — the moment the agent task completes.

The design is framework-agnostic: it officially supports LangGraph, OpenAI Agents SDK, Claude Agent SDK, and Microsoft's own Agent Framework, with others planned. This removes one of the most awkward parts of deploying agents in production: figuring out where they actually run. The persistent filesystem per session means agents can read and write files across their task without external storage configuration.

Pricing is $0.0994/vCPU-hour and $0.0118/GiB-hour — competitive with Lambda/Cloud Run for bursty workloads. The service is available in six Azure regions at launch. For enterprises already invested in Azure, this is a compelling "we just figured out the infra" moment. Independent developers can also use it without an enterprise agreement. MDArena: MDArena is an open-source benchmarking tool that answers a question every Claude Code user eventually asks: do my CLAUDE.md context files actually improve agent performance, or am I just adding tokens? It mines merged PRs from your repository, strips or injects context files, runs your actual test suite, and measures success rates with statistical significance tests.

The methodology mirrors SWE-bench: use `git archive` to create history-free checkpoints so agents can't peek at future commits, detect test commands from CI/CD configs automatically, and run paired t-tests to determine whether differences are real or noise. The project was motivated by academic research showing many CLAUDE.md files reduce agent success rates by 20% while consuming more tokens.

For any team investing heavily in Claude Code infrastructure, MDArena provides empirical feedback that most developers currently lack. It's a small, focused tool that solves an annoying but real problem in the emerging AI coding workflow.

Azure Foundry Hosted Agents vs MDArena

Azure Foundry Hosted Agents

MDArena

Bookmarks