Artificial intelligence is evolving rapidly and transforming the way developers work: code now flows into repositories like GitHub faster than ever, with machine intelligence collaborating alongside human effort.
According to the Allen Institute for Artificial Intelligence (AI2), coding agents face a fundamental issue: most are closed-source, expensive to train, and difficult to study or adapt for private codebases. To address this, the institute has released the AI2 Open Coding Agent, a suite of tools designed to simplify the building and training of custom coding agents.
The first release in this series is named SERA, the Soft-Validated Efficient Repository Agent. It can resolve over 55% of the problems in the SWE-bench Verified benchmark, outperforming previous open-source models of comparable scale.
Every component of SERA is open, including the model, the code, and its integration with Anthropic's Claude Code. It can also be launched with just a single line of code, requiring no prior experience in training large language models.
SERA comes in two versions: SERA-32B and SERA-8B. The first is a 32-billion-parameter model that solves approximately 55% of SWE-bench Verified problems in the standard setting, surpassing most open-source models such as Qwen3-Coder and even some closed models such as Mistral AI's Devstral Small 2. The second is an 8-billion-parameter model that solves 29.4% of SWE-bench Verified problems in a matched inference setting, compared with a 9.4% baseline from reinforcement-learning agents such as SkyRL-Agent-8B-v0, which is built on the Qwen3-8B model.
AI2 specializes the models by training them on 8,000 synthetic trajectories per codebase. This approach consistently matches, and often exceeds, the performance of GLM-4.5-Air, the teacher model with over 100 billion parameters.
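As a rough, hedged illustration of what this kind of trajectory-based specialization might look like in practice, the sketch below fine-tunes a small open backbone on a JSONL file of flattened teacher trajectories with Hugging Face TRL. The model id, file name, and hyperparameters are illustrative assumptions, not AI2's released training recipe.

```python
# Hypothetical sketch: supervised fine-tuning of a small "student" model on
# coding-agent trajectories distilled from a larger teacher. All names and
# hyperparameters here are assumptions for illustration only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each record holds one flattened trajectory (issue text, tool calls, edits,
# final patch) in a "text" field, which SFTTrainer reads by default.
trajectories = load_dataset("json", data_files="trajectories.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # stand-in student backbone (assumption)
    train_dataset=trajectories,
    args=SFTConfig(
        output_dir="specialized-student",
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
```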
AI2 highlights a particularly promising result: smaller, fully open models can match or even surpass more powerful "teacher" coding agents. By specializing on specific codebases and fine-tuning at the 32-billion-parameter scale, SERA can outperform some 100-billion-parameter generalist models while being only one-third their size. For deployment, that means a smaller memory footprint and significantly lower compute costs without compromising quality.
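For a rough sense of that deployment gap, model weights stored in bf16 take about two bytes per parameter; the quick calculation below compares the sizes mentioned above (it ignores KV-cache and activation memory, so real serving requirements are higher).

```python
# Weights-only footprint at bf16: roughly 2 bytes per parameter.
def weight_footprint_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    return params_billion * bytes_per_param  # billions of params * bytes/param = GB

for name, size in [("SERA-8B", 8), ("SERA-32B", 32), ("~100B teacher", 100)]:
    print(f"{name:>13}: ~{weight_footprint_gb(size):.0f} GB of weights")
# Prints ~16 GB, ~64 GB, and ~200 GB respectively.
```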
Reproducing the main results on standard cloud hardware cost AI2 approximately $400, roughly 100 times less than many existing approaches.
The release includes everything developers and researchers need to quickly reproduce, test, and build on SERA: a lightweight deployment path that launches, serves, and runs inference in two lines of code, plus a setup script and inference optimizations that let SERA work with Claude Code.
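As a hedged sketch of what lightweight local inference could look like, the snippet below loads an open checkpoint with vLLM's offline API and runs a single prompt. The model identifier is a placeholder, and this is not the release's actual setup script or its Claude Code integration.

```python
# Minimal local-inference sketch with vLLM; the model id is a placeholder,
# not a confirmed name for the released SERA checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="allenai/SERA-8B")  # placeholder identifier (assumption)
params = SamplingParams(temperature=0.2, max_tokens=512)

prompt = "Fix the off-by-one error in utils/pagination.py and explain the change."
print(llm.generate([prompt], params)[0].outputs[0].text)
```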
AI2 says it plans to keep improving the approach and to scale it to larger backbone models using the same methodology, while emphasizing that the current process is already affordable and practical enough for anyone to run, customize, and iterate on.