Compliance by Design II: LOAC in U.S. Software Provided for Autonomous Combat Systems
Editors’ note: This is the seventh post in a series dedicated to Lethal Autonomous Weapons Systems (LAWS) and the questions of human oversight and legal accountability under international humanitarian law. Previous posts have focused on LAWS, China, Russia, the United States, and Europe. The second part of the series examines defense industry practices. It analyzes how hardware developers, software providers, and manufacturers approach the integration of law of armed conflict requirements into autonomous combat systems in the United States, China, Europe, and Israel.
In February 2026, the U.S. Air Force (USAF) publicly validated the Autonomy Government Reference Architecture (A-GRA) across multiple Collaborative Combat Aircraft (CCA) platforms, marking progress in the service’s drive to field semi-autonomous systems at scale. Two months later, in April 2026, the USAF requested nearly $1 billion in procurement funding for the first tranche of CCA, meaning that software compliance choices will soon be tested in actual production systems.
Within the USAF’s emerging concepts of operation, CCA are envisioned as semi-autonomous platforms capable of acting as wingmen that fly alongside crewed platforms to conduct sensing, jamming, and strike missions. Framing autonomy and law of armed conflict (LOAC) reviewability at this operational level underscores that software design decisions will directly affect how human air crews exercise judgment in contested airspace.
Given the United States’ leading role in military aviation and autonomous systems development, these choices may influence the broader evolution of international norms and practices regarding the responsible use of such systems under LOAC.
The USAF has now moved beyond individual vendor stacks to institutionalize a government-owned modular framework that decouples mission autonomy software from specific airframes. This shift highlights software’s growing role as an enabling layer for autonomous combat capability, consistent with the planned establishment of a sub-unified command dedicated to autonomous warfare.
Consequently, the software layer is uniquely important for LOAC compliance. While hardware provides the physical platform, the autonomy stack, made up of its algorithms, interfaces, runtime safeguards, and verification artifacts, ultimately determines whether a system can be shown to behave predictably “in some or all circumstances,” the standard required by U.S. weapons-review policy.
Private sector software providers are therefore no longer peripheral suppliers but active participants in “compliance by design.” Within the early stages of a competitive software ecosystem, architectural choices are taking shape around human-on-the-loop supervisory controls, explainable decision outputs, runtime assurance monitors, and auditable simulation-to-flight data pipelines. Such choices directly shape the feasibility of Article 36-equivalent reviews. The directions taken now will significantly influence the extent to which autonomous platforms can be made legally reviewable at production scale, the practical ability of commanders to retain meaningful human judgment amid high-tempo operations, and ultimately the credibility with which the United States can demonstrate compliance with distinction, proportionality, and precautions in attack on tomorrow’s battlefields.
LOAC and the Software Layer
The core principles of LOAC govern software-driven autonomy exactly as they govern any other means or method of warfare. The United States’ Article 36-equivalent weapons-review process, conducted under the U.S. Law of War Program, assesses whether a system’s design and operational profile enable compliance with LOAC rules even when tasks are delegated to software-based autonomy. U.S. policy documents frame the legal-review task as determining whether autonomous functions can operate within LOAC-compliant parameters under realistic operational conditions.
Effective Article 36-equivalent reviews turn critically on the software layer’s predictability, explainability, and traceability. By comparison, traditional weapons reviews have generally focused on hardware performance envelopes and relatively static software, whose behavior remained fixed once the weapon was fielded. Learning-enabled and highly networked autonomy therefore demands far more extensive test, evaluation, and data-generation regimes if reviewers are to understand how a system will behave across the range of conditions in which commanders are likely to employ it.
Algorithmic opacity, often described as the black box problem, complicates the legal reviewer’s ability to determine whether a system will behave lawfully “in some or all circumstances.” Emergent behaviors under electronic attack, the validation of over-the-air updates, and the difficulty of generating auditable decision logs introduce further complications. Runtime assurance frameworks are designed to monitor an autonomy stack during execution, constrain it within validated safety envelopes, and, when necessary, trigger reversionary modes or hand control back to a human operator. By bounding the system’s behavior in this way, they provide a concrete technical mechanism for meeting legal expectations of predictability and controllability.
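The runtime assurance pattern described above can be illustrated with a minimal sketch. It is not drawn from any fielded system or from the ASTM standard. The envelope limits, the state fields, and the rule that a reachable operator is preferred over the fallback controller are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUTONOMY = "autonomy"          # complex autonomy function holds authority
    REVERSIONARY = "reversionary"  # simple, pre-validated fallback behavior
    HUMAN = "human"                # control handed back to the operator


@dataclass(frozen=True)
class Envelope:
    """Hypothetical validated safety envelope (illustrative limits only)."""
    min_altitude_m: float
    max_bank_deg: float


@dataclass(frozen=True)
class State:
    altitude_m: float
    bank_deg: float
    link_ok: bool  # assumed flag: is the operator datalink available?


def select_mode(state: State, env: Envelope) -> Mode:
    """Decide which controller should hold authority for this time step."""
    inside = (state.altitude_m >= env.min_altitude_m
              and abs(state.bank_deg) <= env.max_bank_deg)
    if inside:
        return Mode.AUTONOMY
    # Outside the envelope: hand back to the human if reachable (assumed
    # policy), otherwise revert to the pre-validated fallback behavior.
    return Mode.HUMAN if state.link_ok else Mode.REVERSIONARY


env = Envelope(min_altitude_m=500.0, max_bank_deg=60.0)
nominal = select_mode(State(1000.0, 10.0, link_ok=True), env)
too_low = select_mode(State(300.0, 10.0, link_ok=True), env)
too_low_no_link = select_mode(State(300.0, 10.0, link_ok=False), env)
```

The monitor itself is deliberately simple. Because the switching rule is a short, deterministic check, a reviewer can examine it directly even when the autonomy function it supervises is opaque.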
As the previous post explained, private-sector developers have responded by embedding human-on-the-loop supervisory controls, runtime assurance frameworks aligned with ASTM F3269-21, and operator-facing explainable interfaces directly into mission autonomy stacks. These software-centric design choices can reduce the legal and operational friction that autonomy otherwise creates and thereby strengthen the feasibility of sustained commander accountability.
Such features align with DoD Directive 3000.09, which emphasizes human judgment, predictability, and appropriate levels of human supervision in autonomous capabilities. Yet human-on-the-loop configurations do not guarantee sustained meaningful human judgment, particularly in high-tempo and/or sustained operations where operators may supervise multiple autonomous assets simultaneously. Ultimately, effective oversight depends not only on interface design but also on training, task allocation, and operational concepts that preserve the commander’s ability to exercise distinction and proportionality consciously and deliberately.
Collins Aerospace Sidekick
One example is Collins Aerospace’s integration of its Sidekick Collaborative Mission Autonomy software with the CCA program. In February 2026, the software completed a four-hour semi-autonomous flight test on General Atomics Aeronautical Systems’ YFQ-42A platform. During the test, a human operator on the ground maintained full supervisory control while issuing high-level commands that Sidekick interpreted and executed. The integration used the A-GRA to enable data exchange between the autonomy stack and the aircraft’s mission systems.
Sidekick operates as an open-systems solution that supports collaboration between human teams and autonomous platforms in combat air operations. It functions in a human-on-the-loop configuration in which the operator retains real-time intervention authority and the software executes discrete tasks only within the bounds of the operator’s intent. This design enables the commander to apply distinction and proportionality judgments. The software presents fused sensor data and decision recommendations through an operator interface that may address explainability requirements identified in Article 36-equivalent assessments.
The test showed Sidekick’s compatibility with the USAF’s modular open-systems approach, allowing integration across platforms without vendor lock-in. This modularity produces traceable performance data, including simulation logs, hardware-in-the-loop results, and live-flight artifacts, that can be used to demonstrate system behavior under U.S. weapons-review policy. Collins’ focus on open architectures and operator-centric design thus provides one example of how software features can support LOAC compliance at the code and interface level.
Modular Open Architectures and the Broader Software Ecosystem
The A-GRA functions as the standardized framework for mission autonomy within the CCA program. It establishes common interfaces that allow autonomy software developed by different providers to operate across multiple platforms. The February 2026 validation confirmed that the architecture supports data exchange between autonomy stacks and aircraft mission systems without requiring vendor-specific modifications. Parallel experiments in other services with AI-enabled autonomy for future collaborative aircraft indicate that questions of software architecture, traceability, and legal reviewability will arise across the joint force rather than remaining confined to a single USAF program.
For example, Northrop Grumman’s Talon IQ testbed has been used to evaluate additional autonomy solutions. In April 2026, the Talon IQ performed a mid-flight switch between Applied Intuition’s Acuity Air Combat Autonomy software and Accelint’s mission autonomy software before returning to Northrop Grumman’s Prism software. The demonstration included a three-hour flight in which control transitioned dynamically among the three autonomy stacks while the aircraft continued mission-relevant maneuvers. These handoffs generated data on software performance, interface compatibility, and system behavior under realistic conditions.
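A short sketch may help show why a common interface makes such handoffs both possible and traceable. Nothing here reflects the actual A-GRA specification or any vendor’s software. The interface, the class names, the stub behavior, and the log format are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Protocol


class AutonomyStack(Protocol):
    """Hypothetical common interface; not the actual A-GRA specification."""
    name: str

    def command(self, sensor_frame: dict) -> dict: ...


@dataclass
class StubStack:
    """Stand-in for a vendor stack; it simply holds the current heading."""
    name: str

    def command(self, sensor_frame: dict) -> dict:
        return {"heading_deg": sensor_frame["heading_deg"]}


@dataclass
class MissionComputer:
    active: AutonomyStack
    log: list = field(default_factory=list)

    def handoff(self, new_stack: AutonomyStack, t_s: float) -> None:
        """Swap the active stack and record which stack replaced which."""
        self.log.append({"event": "handoff", "t_s": t_s,
                         "from": self.active.name, "to": new_stack.name})
        self.active = new_stack

    def step(self, sensor_frame: dict, t_s: float) -> dict:
        """Query the active stack and record its input and output."""
        cmd = self.active.command(sensor_frame)
        self.log.append({"event": "command", "t_s": t_s,
                         "stack": self.active.name,
                         "input": sensor_frame, "output": cmd})
        return cmd


mc = MissionComputer(active=StubStack("stack_a"))
mc.step({"heading_deg": 90.0}, t_s=0.0)
mc.handoff(StubStack("stack_b"), t_s=1.0)
mc.step({"heading_deg": 90.0}, t_s=2.0)
audit_trail = json.dumps(mc.log)  # serializable record for later review
```

Because every stack is called through the same method and every call passes through one logging point, the resulting record attributes each command to the stack that issued it, regardless of which vendor supplied that stack.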
The modular approach produces simulation logs, hardware-in-the-loop records, and live-flight artifacts that can be examined for traceability. Such records help determine whether autonomy software will behave consistently with U.S. weapons-review policy. The A-GRA is structured as a modular open systems approach that establishes a universal standard for mission autonomy and helps prevent vendor lock-in. This enables the USAF to integrate algorithms from both traditional and non-traditional providers without being tied to any single solution or platform. The architecture has been described as a means of fostering a competitive software ecosystem in which the most capable solutions can be deployed rapidly across compliant aircraft.
By decoupling mission autonomy software from specific airframes, the framework produces standardized interfaces and traceable performance data across multiple vendors, thereby reducing some of the longstanding barriers that proprietary systems have posed for Article 36-equivalent reviews.
These design choices illustrate a broader point under the LOAC. The law does not prohibit autonomy as such; instead, it requires that autonomous functions operate within clear legal and operational constraints. While States continue to differ on the precise level of human control required, there is broad recognition that rigorous weapons-review processes and careful system design are essential if autonomous systems are to satisfy LOAC obligations regarding distinction, proportionality, and precautions in attack.
Conclusion
These design efforts do not eliminate every difficulty. Operators can still fall victim to automation bias, over-relying on AI recommendations in ways that erode their independent exercise of distinction and proportionality. Many commercial software stacks depend on proprietary algorithms, which restrict the thoroughness of government scrutiny during Article 36-equivalent reviews. Validation grows particularly difficult in contested electromagnetic environments, where jamming and electronic warfare can disrupt sensor fusion and decision outputs. The rapid pace of over-the-air updates adds another layer of complexity to keeping auditable records current, while sustained operator training is essential if commanders are to retain meaningful human judgment amid the speed and scale of collaborative operations.
Nevertheless, the incorporation of human-on-the-loop controls, explainable interfaces, runtime assurance mechanisms, and modular architectures into mission autonomy stacks is generating traceable data that can support Article 36-equivalent weapons reviews by demonstrating system predictability “in some or all circumstances.” While these software design choices make LOAC compliance more feasible at production scale, they do not by themselves resolve the underlying doctrinal and operational challenges associated with autonomy and human control in armed conflict. Even so, continued progress in this area may contribute to the United States’ ability to field autonomous platforms that remain consistent with distinction, proportionality, and precautions under the LOAC.
***
Dr Gerald Mako is a Research Affiliate at the Cambridge Central Asia Forum at Cambridge University.
The views expressed are those of the author, and do not necessarily reflect the official position of the United States Military Academy, Department of the Army, or Department of Defense.
Articles of War is a forum for professionals to share opinions and cultivate ideas. Articles of War does not screen articles to fit a particular editorial agenda, nor endorse or advocate material that is published. Authorship does not indicate affiliation with Articles of War, the Lieber Institute, or the United States Military Academy West Point.
Photo credit: Secretary of the Air Force Public Affairs
