A New Path for Artificial Intelligence

Dive into AXIOM, a revolutionary AI that learns like a human—efficiently, intuitively, and transparently. It challenges the foundations of modern deep learning by mastering complex games in minutes, not millennia, without relying on neural networks, backpropagation, or gradient-based optimization.

The AI Efficiency Crisis

Modern AI, particularly Deep Reinforcement Learning (DRL), is powerful but incredibly data-hungry. While a human can learn a new game in minutes, a DRL agent often needs billions of interaction steps—"tens of thousands of years of game play"[2]. This section frames the problem AXIOM was built to solve, contrasting the two dominant philosophies in the quest for more efficient AI.

The "Scale" Philosophy

This approach tackles inefficiency by building bigger, faster neural networks and training them with immense computational power and data. It pushes the current paradigm to its absolute limits, seeking intelligence through sheer capacity.

Method: Brute-force computation, massive function approximators.

Example: BBF (Bigger, Better, Faster)[11]

The "Structure" Philosophy

This approach argues that the key to intelligence lies in building AI with more explicit, human-like cognitive structures. Models learn an internal "world model," enabling them to "imagine," reason, and plan with far greater data efficiency.

Method: Built-in world knowledge, causal reasoning, strong inductive biases.

Example: AXIOM, DreamerV3[6]

The Anatomy of AXIOM

AXIOM is not a monolithic black box. It's a "modular 'digital brain'"[13] composed of four distinct yet interconnected mixture models. They work in concert to transform raw pixels into structured knowledge and goal-directed action, mirroring a logical decomposition of cognitive functions: perception, recognition, prediction, and reasoning.

1. Slot Mixture Model (SMM)

Perception: Deconstructs a raw image into a set of distinct "objects" or slots, separating entities from the background.

2. Identity Mixture Model (iMM)

Recognition: Takes the discovered objects and classifies them into types (e.g., "ball", "paddle") based on their features like color and shape.

3. Transition Mixture Model (tMM)

Prediction: Acts as a "physics engine," modeling how objects can move with a library of simple motion primitives (e.g., "fall", "bounce").

4. Recurrent Mixture Model (rMM)

Reasoning: The cognitive core. It infers *why* an object moves in a certain way by modeling the causal rules of interaction and context.

AXIOM Brain Functions

Figure 1: Inference and prediction flow using AXIOM: The sMM extracts object-centric representations from pixel inputs. For each object latent and its closest interacting counterpart, a discrete identity token is inferred using the iMM and passed to the rMM, along with the distance and the action, to predict the next reward and the tMM switch. The object latents are then updated using the tMM and the predicted switch to generate the next state for all objects. (a) Projection of the object latents into image space. (b) Projection of the kth latent whose dynamics are being predicted and (c) of its interaction partner. (d) Projection of the rMM in image space; each of the visualized clusters corresponds to a particular linear dynamical system from the tMM. (e) Projection of the predicted latents. The past latents at time t are shown in gray.

Source: AXIOM Digital Brain Whitepaper

SMM: From Pixels to Objects

The SMM is AXIOM's perceptual front-end. It takes a raw image, treats it as a collection of pixel tokens (each with RGB color and X,Y coordinates), and uses a Gaussian Mixture Model (GMM) to probabilistically group these pixels into a set of 'slots'. Each slot is an object-centric representation. Crucially, unlike methods such as Slot Attention[14], which use iterative attention over CNN feature maps, the SMM is a more direct, probabilistic model in which a slot's latent state (position, color, shape) *directly* defines the parameters of its corresponding Gaussian component. This creates a tight, interpretable link between the abstract representation and its visual manifestation.[1]
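To make that direct link concrete, here is a minimal sketch (hypothetical slot parameters, not the authors' implementation): each slot's latent state (position, color, spatial extent) directly parameterizes a Gaussian over 5-D pixel tokens, so segmenting the image reduces to a mixture responsibility computation.

```python
# Sketch of the SMM idea: slots are Gaussian components over pixel tokens.
import numpy as np

# Synthetic 16x16 image: red square and blue square on a black background.
img = np.zeros((16, 16, 3))
img[2:6, 2:6] = [1.0, 0.0, 0.0]      # red object
img[10:14, 10:14] = [0.0, 0.0, 1.0]  # blue object

# Pixel tokens: (x, y, r, g, b) with coordinates scaled to [0, 1].
ys, xs = np.mgrid[0:16, 0:16]
tokens = np.column_stack([xs.ravel() / 15.0, ys.ravel() / 15.0,
                          img.reshape(-1, 3)])

# Hypothetical slot latents: mean = (position, color), diagonal variances.
# Slot 0 = background, slot 1 = red object, slot 2 = blue object.
means = np.array([[0.50, 0.50, 0.0, 0.0, 0.0],
                  [0.23, 0.23, 1.0, 0.0, 0.0],
                  [0.77, 0.77, 0.0, 0.0, 1.0]])
vars_ = np.array([[0.20, 0.20, 0.01, 0.01, 0.01],   # broad background
                  [0.01, 0.01, 0.01, 0.01, 0.01],   # compact objects
                  [0.01, 0.01, 0.01, 0.01, 0.01]])

# Log-density of every token under every slot's Gaussian, then hard-assign
# each pixel to its most responsible slot.
log_p = -0.5 * (((tokens[:, None, :] - means[None]) ** 2) / vars_[None]
                + np.log(2 * np.pi * vars_[None])).sum(-1)
labels = log_p.argmax(1).reshape(16, 16)

print(labels[3, 3], labels[12, 12], labels[0, 0])  # → 1 2 0
```

In the full model the slot parameters are themselves inferred, but the mapping from latent state to image-space Gaussian stays exactly this transparent.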

iMM: What Am I Seeing?

Once objects are segmented by the SMM, the iMM answers, "What kind of objects are these?" It takes the continuous features of the slots (specifically, their 5-dimensional color and shape vectors) and assigns a discrete identity code to each one using another GMM. For example, it might learn that all small, red, circular objects belong to "type 1" (cherries) and all long, yellow, rectangular objects are "type 2" (paddles). This is vital for generalization, as it allows AXIOM to learn physical laws for an object *type*, not just for one specific instance, so it doesn't have to re-learn that all cherries fall down every time it sees a new one.[1]
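The gist can be sketched as a toy nearest-prototype scheme (hypothetical features and threshold, not the paper's Bayesian update): slots whose (color, shape) features match an existing prototype share an identity code, and an unmatched feature vector spawns a new type.

```python
# Sketch of iMM-style identity assignment over 5-D (color, shape) features.
import numpy as np

def assign_identity(feature, prototypes, threshold=0.1):
    """Return a discrete type index; add a new prototype if nothing matches."""
    if prototypes:
        d = [np.linalg.norm(feature - p) for p in prototypes]
        k = int(np.argmin(d))
        if d[k] < threshold:
            return k                     # matches an existing object type
    prototypes.append(feature)           # novel appearance: new type
    return len(prototypes) - 1

# Illustrative features: (r, g, b, width, height), all in [0, 1].
cherry_a = np.array([0.90, 0.10, 0.1, 0.05, 0.05])
cherry_b = np.array([0.88, 0.12, 0.1, 0.05, 0.06])  # another cherry
paddle   = np.array([0.90, 0.90, 0.1, 0.30, 0.05])

types = []
ids = [assign_identity(f, types) for f in (cherry_a, cherry_b, paddle)]
print(ids)  # → [0, 0, 1]
```

Both cherries map to the same code, which is exactly what lets dynamics learned for "type 0" transfer to every new cherry on screen.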

tMM: The "Physics Engine"

The tMM provides AXIOM with a predictive "physics engine." It is formulated as a switching linear dynamical system (SLDS). Instead of learning one highly complex function for all motion, the tMM maintains a shared library of up to L simple, linear motion primitives (e.g., "falling under gravity," "moving left," "bouncing"). For each object, a discrete switch variable selects a primitive from this library to predict its state at the next timestep. By switching between these simple modes, the tMM can approximate highly complex, non-linear trajectories. This shared library creates a compact, general-purpose physics engine applicable to any object based on context.[1]
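A minimal sketch of such a switching linear system (two illustrative primitives, not the learned library from the paper): each primitive is a linear map x' = Ax + b on the state (height, vertical velocity), and a discrete switch selects which one applies at each step, producing a non-linear bouncing trajectory from purely linear pieces.

```python
# Sketch of tMM-style switching linear dynamics with a tiny primitive library.
import numpy as np

dt = 0.1
primitives = {
    # "fall": constant downward acceleration on the state (y, vy).
    "fall":   (np.array([[1.0, dt], [0.0, 1.0]]), np.array([0.0, -dt])),
    # "bounce": lossy velocity reversal at the floor (still linear).
    "bounce": (np.array([[1.0, 0.0], [0.0, -0.8]]), np.array([0.0, 0.0])),
}

def step(x, switch):
    """Advance one timestep under the selected linear primitive."""
    A, b = primitives[switch]
    return A @ x + b

x = np.array([1.0, 0.0])   # start 1 unit above the floor, at rest
trajectory = [x]
for _ in range(60):
    # A hand-coded switch rule stands in for the rMM's inference here.
    switch = "bounce" if (x[0] <= 0.0 and x[1] < 0.0) else "fall"
    x = step(x, switch)
    trajectory.append(x)

ys = np.array([p[0] for p in trajectory])
print(round(float(ys.min()), 2), float(ys[0]))  # floor dip and start height
```

In AXIOM the switch is not hand-coded: inferring it from context is precisely the rMM's job, described next.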

rMM: The Causal Reasoner

The rMM is the cognitive core where high-level reasoning occurs. Its main job is to infer the correct switch state for the tMM—that is, to answer: "Given the situation, which motion primitive should apply to this object now?" To do this, it acts as a sophisticated causal reasoning engine, implemented as a generative mixture model. It considers a rich context for a "focal" object: its own state (position, velocity), interaction features (distance to nearest neighbor and that neighbor's identity), the action taken by the agent, and the reward received. By learning a joint probability distribution over these variables, it discovers rules like, "If the paddle (type 2) is very close to the ball (type 1), and the agent moves up, then the tMM switch for the ball should be 'bouncing up'."[1]
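In toy form (hand-picked clusters, not ones learned by the model), that inference looks like scoring each rMM cluster against the current context and reading off the tMM switch of the best match; discrete features must agree, while the continuous distance is scored under a Gaussian.

```python
# Sketch of rMM-style switch inference from a mixed discrete/continuous context.

# Each cluster pairs a context (mean distance, neighbor identity, agent
# action) with the tMM motion primitive it predicts. All values illustrative.
clusters = [
    {"dist": 0.02, "neighbor": "paddle", "action": "up",   "switch": "bounce_up"},
    {"dist": 0.50, "neighbor": "paddle", "action": "up",   "switch": "fall"},
    {"dist": 0.50, "neighbor": "wall",   "action": "none", "switch": "fall"},
]

def infer_switch(dist, neighbor, action, sigma=0.05):
    """Pick the tMM primitive of the best-matching rMM cluster."""
    best, best_score = None, float("-inf")
    for c in clusters:
        if c["neighbor"] != neighbor or c["action"] != action:
            continue                      # discrete context must match
        score = -((dist - c["dist"]) ** 2) / (2 * sigma ** 2)
        if score > best_score:
            best, best_score = c["switch"], score
    return best

print(infer_switch(0.01, "paddle", "up"))  # → bounce_up
print(infer_switch(0.60, "paddle", "up"))  # → fall
```

The same context (paddle nearby, agent moving up) yields a different prediction depending on distance, which is the "if the paddle is very close to the ball" rule from the text.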

Learning and Deciding

AXIOM's learning process is as unique as its architecture. It learns online, one frame at a time, in a dynamic dance of growth and simplification. Its decisions are guided by a single, powerful principle: Active Inference.

Adaptive Growth & Pruning

AXIOM starts with a minimal model. When it encounters something novel or surprising (a new object, a new type of movement), it grows its internal models by adding a new component—a new "hypothesis" to explain the phenomenon. This is Adaptive Growth. To avoid simply memorizing everything it sees, it periodically engages in Bayesian Model Reduction (BMR), a pruning process that merges redundant or overly specific hypotheses into a single, more general rule. This explicit cycle of hypothesis generation and theory simplification is how AXIOM generalizes from limited data.[1]
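The grow-then-prune loop can be sketched in one dimension (simplified thresholds and merge rule, not the paper's exact variational criteria): a surprising observation spawns a new component, and a BMR-style pass later merges components whose means nearly coincide.

```python
# Sketch of adaptive growth followed by BMR-style pruning for 1-D components.

components = []  # list of (mean, count)

def observe(x, grow_threshold=0.1):
    """Adaptive growth: extend the model only when x is surprising."""
    if components:
        d = [abs(x - m) for m, _ in components]
        k = min(range(len(d)), key=d.__getitem__)
        if d[k] < grow_threshold:
            m, n = components[k]
            components[k] = ((m * n + x) / (n + 1), n + 1)  # update match
            return
    components.append((x, 1))  # surprising: add a new hypothesis

def reduce_model(merge_threshold=0.5):
    """BMR-style pruning: merge components whose means nearly coincide."""
    global components
    merged = []
    for m, n in sorted(components):
        if merged and abs(m - merged[-1][0]) < merge_threshold:
            pm, pn = merged[-1]
            merged[-1] = ((pm * pn + m * n) / (pn + n), pn + n)
        else:
            merged.append((m, n))
    components = merged

for x in [0.0, 5.0, 5.2, 9.0]:
    observe(x)
print(len(components))  # → 4  (5.0 and 5.2 became over-specific hypotheses)
reduce_model()
print(len(components))  # → 3  (the two near-5 hypotheses merged into one)
```

The merge collapses two overly specific explanations into one general rule, which is the compression step that keeps the model from memorizing every frame.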

Planning with Active Inference

AXIOM plans by selecting policies (sequences of actions) that are expected to minimize Expected Free Energy (G). This elegantly unifies two drives: seeking rewards and seeking information.[1]

Pragmatic Value (Exploitation): The first term drives the agent to seek states it expects to be rewarding (maximizing utility).

Epistemic Value (Exploration): The second term (a KL Divergence) is a formalization of curiosity. It drives the agent to take actions that are expected to resolve its uncertainty about how the world works (maximizing information gain).
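The two terms above can be put in toy form (hypothetical utilities and information gains, not values computed from AXIOM's generative model): each candidate policy gets a score G that penalizes low expected reward and low expected information gain, and the agent picks the policy with minimal G.

```python
# Toy sketch of policy selection by minimizing Expected Free Energy G.
# G(pi) = -(pragmatic value) - (epistemic value); lower G = better policy.

policies = {
    # Illustrative expected reward and expected information gain per policy.
    "chase_reward": {"utility": 1.0, "info_gain": 0.1},
    "explore":      {"utility": 0.2, "info_gain": 1.5},
    "do_nothing":   {"utility": 0.0, "info_gain": 0.0},
}

def expected_free_energy(p):
    # Exploitation and exploration enter with the same sign, so a policy
    # can be attractive for reward, for information, or for both.
    return -p["utility"] - p["info_gain"]

G = {name: expected_free_energy(p) for name, p in policies.items()}
best = min(G, key=G.get)
print(best)  # → explore
```

With these numbers the curious policy wins: early in training, when uncertainty is high, information gain dominates G, and the same objective smoothly shifts toward reward-seeking as the world model improves.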

Putting AXIOM to the Test

The Gameworld 10k benchmark was designed to test the limits of fast learning. With a strict budget of just 10,000 interactions (a few minutes of gameplay), agents must learn complex object-based dynamics from pixels. The charts below compare AXIOM to two state-of-the-art deep learning models: the structured, model-based DreamerV3 and the scaled-up, model-free BBF.

Gameworld 10k: Cumulative Reward

This chart shows the final score achieved by each model across 10 games after 10,000 steps. Higher scores are better. AXIOM attains higher or on-par average cumulative reward in every environment.[1, Table 1]

Model Size: Parameters

This chart compares the number of learnable parameters for each model (on a logarithmic scale). AXIOM's model is orders of magnitude smaller—up to 440 times smaller than DreamerV3 and 7 times smaller than BBF—making it lighter and more efficient.[1, Table 2]

Computational Speed

This chart shows how long it takes each model to perform its core update step. AXIOM's gradient-free variational Bayesian update is substantially faster than the backpropagation used by deep learning models.[1, Table 2]

The "Glass Box" Advantage

Beyond performance, AXIOM's greatest strength is its transparency. Unlike opaque "black box" neural networks, AXIOM's internal states are interpretable, allowing for unprecedented analysis and even direct, surgical intervention.

Case Study: Cognitive Surgery

In a perturbation experiment, researchers suddenly changed the colors of game objects mid-training. A traditional AI would suffer catastrophic failure, its entangled representation of appearance and dynamics rendered useless. With AXIOM, the developers could pinpoint the problem and perform a "cognitive intervention."[1, Appx. E.3]

1. Diagnosis: The developers could see the failure point was precisely localized to the Identity Model (iMM), which uses color to infer object identity. The new, unseen colors were causing it to create new object types with no associated physics.

2. Intervention: Instead of costly retraining, they performed a "cognitive surgery," modifying the iMM's inference process to temporarily ignore color information and rely solely on shape when determining object identity.

3. Result: The model instantly re-associated the color-changed objects with their original identities. The entire library of previously learned dynamics and causal rules in the tMM and rMM became immediately applicable again. Performance was rescued with no retraining required.

This functional interpretability, which allows a human to act as a "cognitive debugger," is simply not possible with today's black-box systems and highlights a path toward building safer, more reliable AI.

A Glimpse of the Future

AXIOM is more than a new model; it's a compelling argument that the path to more general and efficient AI may lie in structure, not just scale. By combining principled Bayesian inference with strong, object-centric priors, it achieves remarkable performance while remaining lightweight, transparent, and adaptable. While it currently relies on engineered priors, the quest to have AI learn these structures autonomously marks the next exciting frontier. AXIOM offers a glimpse of what that future might look like: structured, curious, efficient, and, above all, understandable.

References

A comprehensive list of papers and resources referenced in this report.

Test Your Understanding

Challenge yourself with this interactive quiz about AXIOM's architecture, principles, and capabilities. Each question is designed to deepen your understanding of the key concepts.


Critical Thinking: Deeper Questions

Beyond the impressive results, AXIOM's approach raises profound questions about the nature of intelligence, scalability, and the path toward artificial general intelligence. Explore these thought-provoking questions that challenge the boundaries of what we've learned.

The Path Forward

These questions don't diminish AXIOM's remarkable achievements, but rather illuminate the profound challenges that remain in our quest for artificial general intelligence. They remind us that each breakthrough in AI opens new questions as fascinating as the answers it provides.