Cognitive Agents for the Next Era of Silicon

Marrying Reinforcement Learning with Agentic Reasoning to accelerate Physical Design (PD)

Traditional chip development is sharply divided into two domains: the frontend (system specification and RTL/VHDL coding) and the backend (translating that programmatic description into transistor-level GDSII data via Electronic Design Automation (EDA) tools).

Currently, only 40% of a chip's development lifecycle is spent on frontend code. With the rise of agentic coding tools like Cursor and GitHub Copilot, frontend generation is accelerating massively. However, the backend physical design (PD) remains a severely manual bottleneck, consuming nearly 60% of design time with minimal AI support. Traditional PD takes 3 to 5 months to complete.

This creates a massive organizational throughput blockage. As frontend engineers generate designs faster, the backend cannot keep up, resulting in quantifiable costs in engineering payroll, reliance on outsourced design services, and lost market opportunities.

The Evolution of EDA Convergence

Magma built the engine that fixed the physics of timing closure; ArchGen AI is building the driver that fixes the economics of it. The EDA industry is progressing through three distinct phases of convergence:

EDA Convergence Timeline - Three phases from Data to Optimization to Reasoning

PD Code Architecture: RL + Agentic LLM

ArchGen achieves Reasoning Convergence by marrying a brute-force RL optimization engine with a high-level cognitive LLM brain.

The Reinforcement Learning (RL) Engine

We look at macro placement as a sequential game which is modelled in the following manner:

The Long-Running Agent (The LLM Brain)

While RL is a brilliant optimization engine, it is fundamentally a black box. Left to its own devices, an RL agent will often pack macros entirely too tightly to artificially minimize HPWL, creating massive routing congestion that makes timing closure impossible in a real backend flow.

The long-running LLM agent acts as the orchestrator that governs the RL tool, adding context, memory, and physical constraints to the process.

By using the LLM to dynamically tune the RL reward weights, evaluate the physical outputs, and enforce deterministic legalizations, you eliminate the human iteration loop and achieve true autonomous workflow.

The PD Code Workflow: Step-by-Step

The system acts as a "Digital Senior Engineer" by breaking the massive physical design problem into distinct, manageable cognitive steps handled by specialized sub-agents.

Step 0: Initialization (Root Agent)
The human engineer provides a high-level natural language prompt (e.g., "Close timing on gcd at 2 GHz on nangate45"). The pd_orchestrator acts as the lead manager. It immediately uses Orchestrator Tools (initialize_run) to create a new session in the Supabase DB.

Step 1: The Planning Phase (planning_agent)
Before any code is executed, the orchestrator delegates to the Planning Agent. It reads the current target specifications from the Design Files (config.mk, constraint.sdc). It queries the Supabase DB (db_get_run_history) to look at past iterations. If it sees that a previous run failed due to routing congestion, it factors that into its new strategy. Output: a concrete parameter strategy to hit the 2 GHz target.

Step 2: The Execution Phase (runtime_agent)
Once a plan is formulated, the orchestrator hands it off to the Runtime Agent. It physically modifies the parameters in the Design Files (runtime_apply_config). It triggers the actual EDA compiler via the OpenROAD Flow (runtime_execute_flow) and monitors its progress (runtime_check_status).

Step 3: The Analysis Phase (analyzer_agent)
After the OpenROAD flow finishes (or fails), the orchestrator brings in the Analyzer Agent to interpret the raw results. It extracts WNS, TNS, power, and DRC metrics directly from the OpenROAD timing reports (analyzer_get_timing_metrics). It figures out why the design failed or succeeded (analyzer_get_violations) and writes its reasoning logs back to the Supabase DB.

Step 4: Results & Iteration
The pd_orchestrator takes the Analyzer's findings, uses Orchestrator Tools to log the completed iteration (record_iteration), and returns the final results and recommendations to the user. If the goal wasn't met, this loop repeats, utilizing the newly logged data to make a smarter plan for the next iteration.

PD Orchestrator Architecture - Step-by-step workflow from initialization to analysis

Benchmark Case Studies

Ariane RISC-V CPU (133 Macros)

The Ariane RISC-V CPU benchmark exposes the fundamental flaw in pure reinforcement learning (RL) and analytical placers: they over-optimize for proxy metrics at the expense of routability and timing.

The Result: PD agent virtually closed timing with a WNS of 0.05ns and a negligible TNS of -0.08ns. Meanwhile, both circuit_training and autodmp failed catastrophically, hemorrhaging over -5500ns of Total Negative Slack.

The Narrative: AlphaChip (circuit_training) achieved a visibly lower Half-Perimeter Wirelength (HPWL of ~5.78M vs PD Agent's ~6.87M). However, because RL agents blindly pack macros together to minimize wirelength, they create massive routing congestion that makes timing closure impossible in the actual backend flow. PD Code's reasoning loop understands that relaxing HPWL slightly gives the router breathing room, leading to a manufacturable design.

Benchmark results for Ariane RISC-V CPU - PD Agent vs circuit_training vs autodmp

BlackParrot Backend (bp_be_top)

This mid-sized design (10 macros) reinforces the system's ability to navigate tradeoffs on highly constrained dies. Closing a massive Worst Negative Slack (WNS) violation typically requires a severe penalty in power or area.

The Result: PD agent is the only method that achieved positive setup slack (0.02ns). The Google and AutoDMP baselines failed timing significantly (WNS ~-0.8ns).

The Narrative: Even on smaller designs where the die isn't heavily constrained, black-box optimization struggles to cross the finish line. PD Code successfully navigated the tradeoffs to produce a tapeout-ready macro placement while the established baselines required further human intervention.

Benchmark results for BlackParrot Backend - PD Agent vs baselines

BlackParrot Frontend (bp_fe_top)

This benchmark shows what happens when the design is relaxed enough that all algorithms successfully close timing.

The Result: All three methods met timing constraints (WNS > 0). However, PD agent delivered the lowest total power (26.00 mW) compared to the 27.2 mW generated by the baselines, while tying autodmp for the best HPWL.

The Narrative: When timing isn't the primary bottleneck, PD Code dynamically shifts its optimization strategy to squeeze out significant power savings (a ~4.4% reduction) without sacrificing area or wirelength.

Benchmark results for BlackParrot Frontend - PD Agent vs baselines

Conclusion

PD Code was deployed to optimize the backend of the Black Parrot RISC-V Processor:

ArchGen is building the Claude Code for physical design engineering. Through a simple terminal interface, a PD engineer provides a high-level description of their target, and our agentic system takes over—executing synthesis, floorplanning, placement, and routing in weeks rather than months.

If you love tinkering with chips, AI agents, and OpenClaw, or if you're like us and leave multiple Claude Code terminals running build workflows late into the night, we'd love to chat. Reach out to hari@archgen.tech or naveen@archgen.tech.

https://archgen.tech/


Disclaimer: This article was written with assistance from Gemini 3.1 pro and Claude Sonnet 4.5


Originally published on Substack.

Back to Blog