7 min read · Andrew Madison & Justin Madison

Why We Chose a Game Engine (And Why Determinism Matters)

Agent Arena runs on Godot, a real game engine. Here's why we made that choice — and why deterministic simulation is essential for learning agentic AI.

When we started building Agent Arena, we had a choice: build a simple Python environment for agents to run in, or use a real game engine.

We chose Godot — a full-featured, open-source game engine. This wasn't the easy path. It meant building a C++ extension module, managing cross-language communication, and dealing with real physics and rendering.

But it was the right choice. Here's why.

Agents Need Real Environments

Most agent tutorials run in toy environments. The "world" is a Python dictionary. Actions are function calls that update the dictionary. There's no physics, no time, no consequences.

These environments are easy to set up, but they teach the wrong lessons. Agents that succeed in toy environments often fail in reality because reality has properties that dictionaries don't:

  • Physics: Movement takes time. Objects collide. Paths can be blocked.
  • Partial observability: You can't always see everything. Information is local.
  • Continuous time: The world doesn't wait for your agent to decide. Things happen while you're thinking.
  • Other actors: The environment includes other agents, each with their own goals.

You can't learn to build agents for the real world by training in environments that lack these properties.

What Godot Provides

Godot is a mature game engine with features we'd never want to build ourselves:

Physics Simulation

Real collision detection, pathfinding, and movement. When an agent moves, it actually moves through space. Obstacles block paths. Resources have positions.

Navigation

Built-in navigation meshes and pathfinding. Agents can navigate complex environments without us reimplementing A* for the hundredth time.

Sensors and Perception

Area detection, raycasting, visibility checks. Agents can "see" their environment through proper sensing, not just dictionary lookups.

Visualization

Real-time rendering of the simulation. You can watch your agents behave, not just read logs. Seeing is understanding.

Time Management

Proper time steps, delta time handling, and frame-rate independence. The simulation runs at a consistent pace regardless of system performance.

Entity Management

Scene trees, node hierarchies, and lifecycle management. Adding agents, resources, and hazards to scenarios is straightforward.

We didn't choose Godot because we wanted to build a game. We chose it because game engines solve hard problems that are essential for meaningful agent environments.

The Determinism Requirement

Here's where it gets interesting. Game engines are typically designed for entertainment, not reproducibility. They optimize for visual smoothness, not exact repeatability.

We needed the opposite. For Agent Arena to be a learning environment, simulations must be deterministic — the same inputs must always produce the same outputs.

Why Determinism Matters

Debugging: When an agent fails, you need to reproduce the failure exactly. If every run is different, you can't isolate what went wrong. You're guessing.

Comparison: When you change your agent's behavior, you need to know if the change helped. If randomness varies between runs, you can't tell if the improvement is real or luck.

Benchmarking: For scenarios to be meaningful benchmarks, agents must face identical challenges. Otherwise, scores are incomparable.

Learning: Understanding cause and effect requires controlled variables. Determinism gives you that control.

Making Godot Deterministic

This required careful engineering:

Fixed Tick Loop: The simulation runs on a fixed tick rate, independent of rendering frame rate. Every run processes the same number of ticks with the same duration.
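To make this concrete, here's a minimal Python sketch of a fixed tick loop (the real loop lives in the C++ extension; the tick rate and function names are illustrative):

  TICK_RATE = 30             # ticks per second (illustrative value)
  TICK_DT = 1.0 / TICK_RATE  # every tick advances the world by the same duration

  def run_fixed_ticks(num_ticks, step_fn):
      # The loop never consults wall-clock time or the rendering frame rate,
      # so every run processes the same number of ticks with the same dt.
      for tick in range(num_ticks):
          step_fn(tick, TICK_DT)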

Seedable Randomness: Any randomness in the simulation (resource spawning, agent initial positions, etc.) uses seeded random number generators. Same seed, same results.
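The same idea in a short Python sketch (the actual generators live in the C++ core; the seed and spawn logic are just examples):

  import random

  def spawn_resources(seed, count, world_size):
      # A dedicated, seeded generator: the same seed always produces
      # the same spawn positions, independent of any other randomness.
      rng = random.Random(seed)
      return [(rng.uniform(0, world_size), rng.uniform(0, world_size))
              for _ in range(count)]

  assert spawn_resources(42, 5, 100.0) == spawn_resources(42, 5, 100.0)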

Ordered Event Processing: Events are processed in a defined order. No race conditions, no timing-dependent behavior.
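In practice this means ordering events by a stable key before applying them. A hedged sketch (the field names and the apply_event helper are illustrative):

  def process_tick_events(events, apply_event):
      # Sort by a fully deterministic key so arrival order, which can vary
      # between runs, never affects the outcome.
      for event in sorted(events, key=lambda e: (e["tick"], e["priority"], e["id"])):
          apply_event(event)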

Isolated State: Simulation state is isolated from rendering and input processing. The simulation doesn't depend on how fast your computer renders frames.

Deterministic Physics: We configure Godot's physics for reproducibility over smoothness. Same forces, same collisions, same outcomes.

The result: you can run a simulation, save the seed and initial conditions, and replay it identically on any machine. This is non-negotiable for a learning environment.
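One way to verify that property is to run the same seed twice and compare a digest of the recorded events. A minimal sketch, assuming a hypothetical run_scenario entry point that returns the event log:

  import hashlib, json

  def run_digest(seed):
      events = run_scenario(seed=seed, num_ticks=1000)  # hypothetical entry point
      return hashlib.sha256(json.dumps(events, sort_keys=True).encode()).hexdigest()

  assert run_digest(42) == run_digest(42)  # must hold on any machine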

The C++ Extension

Agent Arena's core simulation logic lives in a C++ GDExtension module. This gives us:

Performance

C++ is fast. Simulation logic runs at native speed, which matters when you're running thousands of ticks for evaluation.

Control

We control memory layout, execution order, and state management. No surprises from garbage collection or interpreter overhead.

Determinism Guarantees

C++ gives us precise control over floating-point operations, memory access patterns, and execution flow. This helps ensure determinism across platforms.

Clean Separation

The simulation logic is separate from Godot's rendering and input systems. The simulation can run headless for automated evaluation.

The extension exposes key classes to both GDScript (for scenario design) and Python (for agent implementation):

  • SimulationManager: Controls the tick loop, start/stop/step
  • Agent: Manages perception, decision, action execution
  • EventBus: Records and replays events
  • ToolRegistry: Manages available agent actions
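As a rough illustration of how these pieces fit together from the calling side (the constructor and method names below are assumptions made for the sketch, not the actual API):

  # Hypothetical usage sketch, not the real Agent Arena API.
  sim = SimulationManager(scenario="foraging", seed=42)  # illustrative constructor
  sim.start()
  while not sim.is_finished():
      sim.step()                        # advance exactly one deterministic tick
  for event in sim.event_bus.events():  # EventBus records everything for replay
      print(event)
  sim.stop()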

Python Integration

Agents are implemented in Python — it's the language ML developers know, with access to the best AI/ML libraries. The architecture bridges Godot and Python through a clean IPC layer:

[Python Agent] ←HTTP→ [IPC Service] ←GDExtension→ [Godot Simulation]

Each simulation tick:

  1. Godot collects observations for each agent
  2. Observations are sent to Python via HTTP
  3. Python agents process observations and return decisions
  4. Decisions are sent back to Godot
  5. Godot executes actions in the simulation
  6. State updates, next tick begins

The HTTP boundary keeps Python and Godot cleanly separated. Agents can be swapped without changing the simulation. The simulation can run without agents for testing.
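To give a feel for the Python side of that boundary, here's a minimal agent server built on the standard library (the port and the observation/decision JSON fields are illustrative assumptions, not Agent Arena's actual protocol):

  import json
  from http.server import BaseHTTPRequestHandler, HTTPServer

  class AgentHandler(BaseHTTPRequestHandler):
      def do_POST(self):
          # 1. Receive the observation Godot collected for this tick.
          length = int(self.headers["Content-Length"])
          observation = json.loads(self.rfile.read(length))

          # 2. Decide. A real agent would run its policy or call an LLM here.
          decision = {"action": "move", "target": observation.get("nearest_resource")}

          # 3. Return the decision; Godot executes it in the simulation.
          body = json.dumps(decision).encode()
          self.send_response(200)
          self.send_header("Content-Type", "application/json")
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  HTTPServer(("127.0.0.1", 8000), AgentHandler).serve_forever()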

Why Not Just Python?

You might wonder: couldn't we build a simpler environment in pure Python?

We could. And for some use cases, that's fine. But we wanted Agent Arena to teach skills that transfer to real agent deployments, and real deployments have properties that pure Python environments lack:

Concurrency: Real systems involve multiple processes, timing, and communication. The Godot/Python split mirrors production agent architectures.

Latency: There's actual network latency between agent decisions and execution. Agents must handle the gap between deciding and acting.

Environmental Complexity: A game engine provides rich environments that would take years to build from scratch. We wanted complex scenarios without building a physics engine.

Visualization: Watching agents behave is invaluable for learning. Godot provides real-time visualization for free.

The added complexity of a game engine pays off in the quality of learning it enables.

The Tradeoffs

We're not pretending this architecture is simple. It has costs:

Setup Complexity: Getting Godot, the C++ extension, and Python all working together requires more setup than a pure Python package.

Development Overhead: Changes to simulation logic require C++ recompilation. This slows iteration on core systems.

Resource Requirements: A game engine needs more resources than a minimal Python environment. Not a concern for modern machines, but not zero.

We accepted these tradeoffs because the benefits — real physics, visualization, determinism, and transferable learning — outweigh the costs.

The Result

The combination of Godot's capabilities with deterministic execution and Python agent development creates a learning environment that:

  • Feels like a real world, not a toy
  • Enables exact reproduction of any behavior
  • Supports systematic debugging and comparison
  • Teaches skills that transfer to production systems

The game engine isn't a gimmick. It's the foundation that makes meaningful agent learning possible.


This is Part 4 of our series on Agent Arena. Next up: Tools, Memory, and Debugging — how Agent Arena makes agent internals visible and debuggable.

Previous posts: Why We Built Agent Arena | What Agentic AI Skills Actually Mean | The Core Learning Loop
