Brian Christian’s The Alignment Problem: A Cautionary Tale to Proponents of LAWS

by Benjamin Zeskind | May 5, 2025


Brian Christian’s The Alignment Problem: Machine Learning and Human Values is an accessible, comprehensive look under the hood of contemporary machine learning. The title suggests, and Christian states, “This book is about machine learning and human values: about systems that learn from data without being explicitly programmed, and about how exactly—and what exactly—we are trying to teach them.” Yet, the book is also about the biases that manifest as we seek to be more efficient, objective, and predictive with machine-aided decision-making. These biases present a formidable challenge for developers to engineer systems that coherently and consistently reflect the normative values that we, ourselves, struggle to articulate.

After reading The Alignment Problem, one might be convinced that the profoundly complex mechanics and limitations inherent in algorithmic design, dataset selection, and deep reinforcement learning (RL) render such systems unfit to perform our most sensitive tasks autonomously. As human developers nudge, shape, and reward systems to do their bidding, the sprawling web of inputs, nodes, and layers within neural networks simultaneously obscures the path to the output. Should the output produce undesirable effects, identifying the proximate cause of those effects, and by extension determining who, if anyone, should be accountable for autonomous “decisions,” would itself be a formidable challenge from moral, legal, and technical perspectives.

This post offers The Alignment Problem as a cautionary tale to those who serve in disciplines where the life, limb, or liberty of any person may be affected by this emerging technology, to include judge advocates across the military services. As the race to operationalize machine learning in a military context is already underway, and although Christian gives the subject a relatively wide berth (with the exception of a brief mention of “killer robots” in chapter nine), this review examines relevant portions of the book’s three sections as they apply to lethal autonomous weapon systems (LAWS) and cautions that a measured approach is necessary in this rapidly evolving field.

Style and Organization

Conversational and extremely well-researched, Christian’s prose finds coherence in machine learning’s non-linear history, the fundamentals common to systems in wide-ranging disciplines, and the behavioral conditions (both animal and human) that serve as analogs for machine learning principles. His tone is simultaneously that of an expert and neutral bystander, reflecting an intimate, practitioner-level grasp of machine learning while maintaining a healthy skepticism about its proper place in the world.

Christian’s cadence is flowing, his tone pleasantly varied, and his main points tidy and succinct. He navigates technical detail with the clinical eye of a computer scientist, explains abstract concepts with the humble sophistication of a philosopher, and remains economical with his sentence structure and word selection, much like that of a poet—all fields in which Christian is classically trained. As the book progresses, Christian iteratively exposes readers to a roster of italicized technical terms, allowing them to build confidence—and their machine learning-specific vocabulary—as they digest what may have seemed indigestible at the book’s outset.

Christian is measured and deliberate in where he delves into technical nuance, typically in instances of elegance or crudeness in system design or output. This approach allows the text to retain a technical flavor without being overly academic and invites the reader to engage in cross-functional application. Christian organizes The Alignment Problem into thoughtful categories that allow readers to sort out the critiques most relevant to their own disciplines.

As discussed below, the book’s three sections, while explanatory at their core, offer different reasons why aligning model outputs with desirable human values—both of which are necessarily variable—is a challenging task to perform fairly, transparently, and consistently. In the context of LAWS development, this is especially true.

The Problem with “Prophecy”

Christian, through various anecdotes, offers that a foundational challenge for applied machine learning is “to get increasingly general purpose [artificial intelligence (AI)] systems to do what we want, particularly when what we want—and what we don’t want—is difficult to state directly or completely.” Because human values are, themselves, normative constructs that may reasonably mean different things to different people in different contexts, one can see the difficulty in “defin[ing]—in statistical and computational terms—the principles, rights, and ideals articulated by the law[]” that presumably undergird such values. As a workaround of sorts—and in the context of large language models (LLMs)—Christian explains that developers use “biologically inspired ‘neural network[s]’” to comb the dataset(s) used to train such systems “for correlations and connections between terms,” identifying patterns to make predictions, and using predictions to meaningfully guide model decisions.
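To make the idea of “correlations and connections between terms” concrete, the minimal sketch below (my illustration, not the book’s, and far cruder than an actual neural network) simply counts how often words appear in the same sentence of an assumed toy corpus; pairs that co-occur frequently stand in, very roughly, for “connected” terms.

```python
# Crude illustration of finding "correlations and connections between terms":
# count co-occurrences within sentences of an assumed toy corpus.
from collections import Counter
from itertools import combinations

corpus = [
    "cold rain falls in autumn",
    "cold wind blows in autumn",
    "warm sun shines in summer",
]

cooccurrence = Counter()
for sentence in corpus:
    words = sorted(set(sentence.split()))
    for a, b in combinations(words, 2):
        cooccurrence[(a, b)] += 1  # terms seen together in the same sentence

# The most frequent pairs stand in (very roughly) for "connected" terms.
print(cooccurrence.most_common(3))
```

A real large language model learns such relationships as weights inside a network rather than as explicit counts, but the underlying signal, statistical regularity in the training data, is the same.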

Christian illustrates the fundamentals of neural network design through a description of the perceptron, a system that “contains a model architecture … a single artificial ‘neuron’ with four hundred inputs, each with its own ‘weight’ multiplier, which are then summed together and turned into an all-or-nothing output.” Put another way, while the input data is presumably unmanipulated at capture (even if it reflects a human bias that exists in the sample from which the data is derived), the “weight multipliers,” or “parameters,” are applied after the fact and often produce heavily manipulated outputs via a human-designed training algorithm.
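A perceptron of the kind Christian describes can be written in a few lines. The sketch below is a minimal, self-contained illustration (mine, not the book’s code): each input has its own weight, the weighted inputs are summed, and the sum is pushed through an all-or-nothing step; training nudges the weights whenever the output is wrong. The toy data and parameter names are assumptions.

```python
# Minimal perceptron sketch: one artificial "neuron" with a weight per input,
# summed and passed through an all-or-nothing step output.
def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, labels, epochs=20, lr=0.1):
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)   # -1, 0, or +1
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Toy example: learn a simple "OR"-like rule from two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]
weights, bias = train(samples, labels)
print([predict(weights, bias, x) for x in samples])  # expected: [0, 1, 1, 1]
```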

This process, Christian explains, is called “stochastic gradient descent,” where system developers provide feedback “by literal turning of physical knobs or simply the changing of numbers in software—to lower the error [rate].” In the context of LAWS development, the ability of the developer to spot a material error (especially if the execution, as opposed to the result, is undesirable), the extent to which a knob (real or proverbial) is turned to correct such an error, or even the ability to identify the variables for which a knob should be created at all, are themselves normative assessments, inviting variability where some users would likely prefer predictability.
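In software, the “knob” is just a number. The sketch below (my illustration, with assumed toy data) shows the basic loop Christian describes: for each training example, compute the error, then turn a single weight slightly in the direction that lowers that error.

```python
# Minimal "knob turning" sketch: stochastic gradient descent nudges a single
# weight to lower the error on one training example at a time.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (input, desired output), assumed toy data
weight = 0.0          # the "knob"
learning_rate = 0.05

for epoch in range(200):
    for x, target in data:
        prediction = weight * x
        error = prediction - target
        gradient = 2 * error * x             # slope of the squared error w.r.t. the weight
        weight -= learning_rate * gradient   # turn the knob slightly to lower the error

print(round(weight, 2))  # settles near 2.0, the slope that best fits the toy data
```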

Stochastic gradient descent also bears on the concept of “meaningful human control” during the employment of LAWS, though much of the academic discussion on this topic concerns where, when, and to what extent in the automated targeting process a human gatekeeper should be positioned and have control, not who the gatekeeper should be. This means that the experience and aptitude of the human who renders the assessment, and who turns the knob accordingly, are also variable, and perhaps dispositive, in LAWS’ final output. For this reason, a process that focuses solely on the size and quality of datasets, algorithmic design, and where in the process a human may intervene—as opposed to the criteria used to select the human(s) who must supervise and reinforce the machine learning—does not fully address all the critical aspects of the system.

Perhaps the greatest problem with prophecy, however, is whether machine learning is ever the appropriate tool for the next war. Christian, citing the University of California, Berkeley’s Moritz Hardt, is directly on point: “A machine-learning model, trained by data, ‘is by definition a tool to predict the future, given that [the future] looks like the past … . That’s why it’s fundamentally the wrong tool for a lot of domains, where you’re trying to design interventions and mechanisms to change the world.’”

Accordingly, a war in the South China Sea would look different from the war in Ukraine, which looks different from the Global War on Terror, which looked different from the Vietnam War. The war that could hypothetically be fought in the South China Sea three years from now would presumably look different from the same war fifteen years from now, with both looking different from a war against the Axis of Resistance, which, too, could occur on varying timelines, if at all. The process, then, of constantly identifying, updating, and testing datasets, all while incorporating the latest emerging technologies, is not only impractical, but will almost always produce a dataset that is stale or obsolete, creating a next-war/machine-learning paradox.

“Agency” – Reinforcement Learning Models Struggle with What They Ought to Do

Assuming, however, that a dataset is temporally, factually, and contextually appropriate for the model it will eventually train, Christian explains that the goals and purpose of deep RL models are largely driven by “the maximization of the cumulative sum of a received scalar reward.” Put another way, the algorithms that steer deep RL decision-making promote reward-seeking “behavior” and are designed to indirectly shape outcomes via reward-inspired decisions. Christian, citing computer scientist John McCarthy, offers that deep RL models, in a way, “offer[] us a powerful, and perhaps even universal, definition of what intelligence is … ‘the computational part of the ability to achieve goals in the world.’”
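The “cumulative sum of a received scalar reward” has a simple concrete form. The sketch below (my illustration, with assumed reward values) computes the discounted return that a deep RL agent’s policy is trained to maximize; rewards received sooner count slightly more than rewards received later.

```python
# Illustrative sketch: the quantity a deep RL agent is built to maximize is
# the cumulative sum of scalar rewards, usually discounted over time.
def discounted_return(rewards, gamma=0.99):
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

episode_rewards = [0, 0, 1, 0, 5]          # scalar reward received at each step (assumed)
print(discounted_return(episode_rewards))  # the value the agent's policy tries to maximize
```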

That said, Christian notes that goals and model “decisions,” regardless of whether they produce high levels of “intelligence,” do not yield static end-states that exist “in a vacuum … Every decision we make sets the context in which our next decision will be made—and, in fact, it may change that context permanently.” Perhaps more importantly, even if human-made adjustments assist models in changing course or making the best of a newly charted course, deep RL models still cannot “tell us what we value, or what we ought to value.” Alas, the alignment problem.

Aside from the inability to incorporate desired human values regardless of context, Christian explains that each deep RL model decision is also not exclusively determined by the reward(s) the model is programmed to seek. Such decisions are also materially affected by the density or sparseness of rewards existing in the environment, with sparseness being more problematic. Christian demonstrates this point through a deep RL model programmed to outperform its human counterparts in various Atari video games. While the model outperformed the human testers in reward-dense games in which the algorithm encountered numerous reward-producing guideposts with which to chart its course, the model struggled with reward-sparse games—namely, Montezuma’s Revenge—failing to score a single point.

Explaining the results, Christian offers, “In an environment with so few rewards, a randomly exploring algorithm can’t get a toehold; it’s incredibly unlikely that by essentially wiggling the joystick and mashing buttons, it will manage to do all of the necessary steps to get that first reward. Until it does, it has no idea that it’s even on the right track.”
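The mechanics behind that observation are easy to simulate. The sketch below (my illustration, not from the book) assumes a game in which the first reward requires eight specific actions in a row; a purely random policy almost never produces that sequence, and so almost never receives any learning signal at all.

```python
# Rough sketch of why sparse rewards stall random exploration: if the first
# reward requires k exact actions in a row, a random policy almost never sees
# it, so it gets no learning signal whatsoever.
import random

def random_episode(required_sequence, num_actions=8, max_steps=500):
    progress = 0
    for _ in range(max_steps):
        action = random.randrange(num_actions)   # "wiggling the joystick"
        progress = progress + 1 if action == required_sequence[progress] else 0
        if progress == len(required_sequence):
            return 1     # first reward finally reached
    return 0             # episode ends with zero reward, zero signal

required = [3, 1, 4, 1, 5, 2, 6, 5]   # eight exact steps needed before any points (assumed)
successes = sum(random_episode(required) for _ in range(1000))
print(successes, "rewarded episodes out of 1000")  # almost always 0
```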

The lesson Montezuma’s Revenge teaches us, Christian contends, is that failure is an integral component of deep RL model “learning,” particularly in environments where: (1) failure is more likely, comes earlier, and occurs more often in the model’s exposure to the environment; and (2) success requires “a huge number of things to go exactly right before the player gets any points at all.” But points are perhaps not the appropriate reward to drive desirable results, even if maximizing points is the ultimate goal. Christian offers that shaping, or “additional incentive rewards to nudge the algorithm in the right direction[]”—like rewarding staying alive in Montezuma’s Revenge instead of rewarding points, for example—could be a solution.

However, Christian admits that structuring rewards in this way risks creating unintended “loophole[s] that the algorithm can exploit,” like the decision to stand idly in a safe location while reaping the benefits of the new reward structure but failing to accomplish anything, let alone the underlying goal. While standing idle may have been what the revised algorithm implicitly nudged the model to do, it is not what the model ought to do. This not only cuts against John McCarthy’s definition of intelligence, but also suggests the state of this technology as of 2020 (when The Alignment Problem was published) was not ripe for higher-stakes autonomous use. A human must still be in or on the loop to reinforce desirable decisions like not standing idly in perpetuity.
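The shaping idea, and the loophole it invites, can be reduced to a few lines. The sketch below (my illustration, with assumed reward values, not the book’s code) adds a small “stay alive” bonus to the sparse game score; under that shaped reward, a policy that stands idle in a safe spot outscores one that actually plays the game.

```python
# Hedged sketch of reward shaping and its loophole: a "stay alive" bonus gives
# the learner a signal in a reward-sparse game, but a policy that simply stands
# still collects that bonus forever without pursuing the real objective.
def shaped_reward(game_points, survived_step, alive_bonus=0.1):
    # Original sparse reward (points) plus a shaping bonus for surviving a step.
    return game_points + (alive_bonus if survived_step else 0.0)

# Policy A: explores, occasionally dies, eventually scores a point.
# Policy B: stands idle in a safe corner and never scores.
explorer_total = sum(shaped_reward(p, s) for p, s in [(0, True), (0, False), (1, True)])
idler_total = sum(shaped_reward(0, True) for _ in range(1000))
print(explorer_total, idler_total)  # the idle policy "wins" under the shaped reward
```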

What a system ought to do, then, is largely driven by context. Often indiscernible from the operational environment alone, subtle context remains as important to system “behavior” as overt reward-producing guideposts. Efforts to reduce context to a quantitative value, to include making sense of its intersection with proportionality determinations and “confidence levels surrounding [law of armed conflict (LOAC)] decisions,” may seem conceptually necessary. However, doing so would force a military user to commit to specific evaluative criteria, the weight assigned to each distinct criterion, and the unknown downstream effects of combining the two, a practice the United States has avoided in the past to preserve legal and operational “maneuver space.”

In armed conflict, context will always include express or implied commander intent. This guidance serves as the lens through which military operators—and presumably LAWS—must view each tactical-level decision. Commander intent may be communicated through various mediums, at different classification levels, and from different geographical locations. It may be disseminated through phone calls, emails, in-person meetings, or more formal written documents, and may change every day, even during low-intensity conflict. The content of a single statement in a single phone call or email may amount to a tactical directive, context LAWS would obviously need to do a commander’s bidding.

This raises a rather large but practical question: how do programmers timely capture, interpret, quantify, and code the nuance and subjectivity of another person’s intent, test the resulting algorithm before operationalizing it, solicit and incorporate human feedback from various subject matter experts, and retest the system for viability? If the timing of this process is not operationally feasible, what corners would a military user be willing to cut—and what risk would the user be willing to assume—for LAWS to keep pace with the relevant combat intensity level?

Assuming this process could be streamlined, the remaining risks may not be possible to mitigate through programming alone. What physical or geographical limitations must commanders emplace—like the use of operational boxes (“opboxes”), for example—to limit the number of legal regimes at play, reduce the risks inherent to near-border operations, or simply to yield more predictable results? At what point would such restrictions negate the underlying reason(s) for developing and using LAWS in the first place? If context is not gleanable from the operational environment alone, what good are models trained to make predictions based on observable context or predict context based on the information provided?

These questions are not rhetorical; they demand meaningful answers. Yet, other complicated questions remain, like how to quantify LOAC decisions which expressly contemplate the balancing of normative, non-numeric factors. The LOAC obligations relating to distinction and proportionality are particularly illustrative of this challenge. Distinction “obliges parties to a conflict to distinguish principally between the armed forces and the civilian population, and between unprotected and protected objects” (§ 2.5). Prior to using force, “[c]ommanders and other decision-makers must determine whether a potential target is a military objective based on the available information that is relevant to whether the potential target meets the applicable legal standard for a military objective” (§ 5.4.3.2). This standard is typically communicated via combatant command-level positive identification (PID) policies, laying the groundwork for what amounts to a quasi-judicial process.

Much as with the rules of evidence, judge advocates assist commanders in assessing the weight one should afford intelligence during the targeting cycle, not merely whether the intelligence should be considered at all. Assuming LAWS could consistently ferret out circular human reporting, assess and balance source reliability, and accurately translate the content and context of intercepted enemy communications, each piece of intelligence would still need to be weighed against a standard of certainty informed by human experience and judgment—not a precise probability.

For the same reason lawyers balk at assigning numeric value to probable cause determinations in criminal cases, delegating such a task to an autonomous system designed to lawfully kill a person should give lawyers and commanders pause. When combined with additional context like commander intent, the sensitivities of peace negotiations, or a recent, highly publicized civilian casualty incident, assigning weighable values to “squishy” legal terms like “reasonableness” or “certainty” is high-stakes guesswork.

Proportionality determinations are similarly challenging. Under customary international law, “this principle creates obligations to refrain from attacks in which the expected harm incidental to such attacks would be excessive in relation to the concrete and direct military advantage anticipated to be gained … .” This balancing is often so “difficult and subjective” that “States have declined to use the term ‘proportionality’ in law of war treaties because it could incorrectly imply an equilibrium between considerations or suggest that a precise comparison between them is possible” (§ 2.4.1.2).

Two distinct actions that are otherwise factually identical could be deemed proportionate in one context and excessive in another, with the distinction resting solely on information not readily observable on the battlefield. For this reason, coding LOAC, and by extension, proportionality, is only half the problem, with coding context—to include fluid tactical, operational, and strategic-level considerations—the critically important other half. However, even if proportionality and context could be expressed numerically, whose values, judgment, and legal interpretation would be imbued into the algorithm that marries the two?

Should the military user eventually design systems with low enough failure rates to be operationally feasible, the path to such a point, in light of the mechanics of stochastic gradient descent and deep RL, would require failure, and a lot of it. For air-to-surface LAWS, failure means undesirable, real-world kinetic strike execution or outcomes—like the death or serious bodily harm of non-combatants at an unreasonably large scale—raising significant, and perhaps insurmountable, legal and ethical concerns. Christian finds scenarios such as these ripe for the following “precautionary principle: for systems to be designed to err against taking ‘irreversible’ or ‘high-impact’ actions in the face of uncertainty.” For the United States to credibly maintain its moral footing and responsibility on the world stage, embracing such a principle in the context of air-to-surface LAWS would seem wise.
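As a purely conceptual sketch (my illustration, not a proposed weapon-system design, with an assumed confidence threshold), Christian’s precautionary principle amounts to a default rule of inaction: when an action is irreversible or high-impact and the model is not sufficiently certain, the system declines to act and defers to a human.

```python
# Conceptual sketch only: a system that errs against irreversible or
# high-impact actions in the face of uncertainty. The threshold is assumed.
def permitted(action_is_irreversible, model_confidence, threshold=0.999):
    if action_is_irreversible and model_confidence < threshold:
        return False   # err against acting; defer to a human decision-maker
    return True

print(permitted(True, 0.95))   # False: high-impact action blocked under uncertainty
print(permitted(False, 0.95))  # True: reversible, low-impact action may proceed
```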

The Ubiquity of “Normativity”

In The Alignment Problem’s third and final section, the normative concepts of imitation, inference, and uncertainty emerge as both pervasive complicators of model design and the domains most likely to produce the alignment problem’s long-term solution(s). Christian’s description of imitation suggests machine learning and imitation are effectively two sides of the same coin, with “overimitation” (or mimicking actions not relevant to the underlying purpose of the learning) being of particular concern. This is the case not because the practice results from the negligence of a model developer, but because the practice results from entirely “reasonable, sophisticated insight based on imagining the demonstrator as making rational choices and performing the action as easily and efficiently as possible.”

Christian contends that while humans, and even toddlers, are “acutely sensitive to whether the [teacher] demonstrating something is deliberately teaching them, or just experimenting,” a teacher who presents themselves as an expert but performs unnecessary actions in the course of teaching can cause the child or student to mimic those unnecessary actions.

For purposes of LAWS development and use, the lack of standardization in training and operational execution across different combat functions, in different theaters, with different missions, and with varying levels of commander and judge advocate combat experience, can lead to varying results on the battlefield. This may prompt the imitation of actions in one theater that have been rooted out as ineffective in another, yielding what may be deemed “overimitation” in one context and reasonable “best practice” in another.

Conclusion

The Alignment Problem is an instructive, appealing read for judge advocates regardless of service or portfolio. In addition, it is a cautionary note for those racing to unleash this technology into the world’s most highly contested skies and seas, and a call to focus more intently on how the world around us, and what we are seeking to achieve, are represented and translated into these models.

The Alignment Problem does what current law and doctrine are unable to do. It dives deeply into machine learning’s technical limitations and speaks frankly about what we do not know or understand. In today’s world, where a single “bad strike” can yield a negative strategic impact more readily than a “good strike” can yield a positive one, the large failure rates one can expect to accompany LAWS during RL may come at too high a cost, and for a gain that has yet to be persuasively articulated.

***

MAJ Benjamin Zeskind is a national security law attorney and former Deputy Regimental Judge Advocate for the 75th Ranger Regiment.

The views expressed are those of the author, and do not necessarily reflect the official position of the United States Military Academy, Department of the Army, or Department of Defense.

Articles of War is a forum for professionals to share opinions and cultivate ideas. Articles of War does not screen articles to fit a particular editorial agenda, nor endorse or advocate material that is published. Authorship does not indicate affiliation with Articles of War, the Lieber Institute, or the United States Military Academy West Point.

Photo credit: Unsplash, W. W. Norton & Company