Israel-Hamas 2024 Symposium – AI-Based Targeting in Gaza: Surveying Expert Responses and Refining the Debate

by Klaudia Klonowska | Jun 7, 2024

The reported use of artificial intelligence-enabled decision-support systems (AI-DSS), in particular the Gospel and Lavender, by the Israel Defense Forces (IDF) in their military operations in Gaza has been controversial. Allegedly, these two systems are used to identify object targets and human targets, respectively, and, as Israeli experts have asserted, they are applied at the “very preliminary stages of the chain of creating, and authorizing, a military target.”

The use of AI-DSS in hostilities has been on the rise, with new applications also emerging in the war in Ukraine, for example in drone targeting operations and in intelligence gathering and analysis. However, none of these applications has sparked as great an outcry as the recent use of AI-DSS by the IDF. This outcry stems chiefly from concern over the massive civilian harm inflicted on the population of Gaza and from disagreement over whether Israel has fulfilled its duties to take precautions in attack and to mitigate civilian harm. Experts are also concerned with whether and how the use of AI systems has contributed to the effects of military operations on the civilian population, and with what the obligation to take feasible precautions in attack means when these AI tools are leveraged in targeting procedures.

In this post, I scope and review expert interventions that emerged in the wake of the IDF’s reported AI-DSS use with the aim of illuminating points of contention and, most importantly, highlighting not what we say but how we frame the legal issues. While military professionals and technologies are under scrutiny, I observe that the nature of AI systems renders invisible the crucial role of AI engineers. In conclusion, I suggest that it is important to avoid exalting either humans or AI when formulating legal arguments on the topic of military AI.

Two Types of Interventions

The scoping and reviewing of expert commentaries on the IDF’s reported uses of AI-DSS exposes two broad lines of interrogation: the first focuses on the conduct of States and humans, and the second examines the types of weapons and tools that are considered permissible. The first group frames the inquiry around the human (e.g., agency, judgment, discretion), the benefits and limits of human interactions with technologies, and hence also the impact of these interactions on the fulfillment of responsibilities and duties under international humanitarian law (IHL) and international criminal law (ICL).

The second group of interventions frames the inquiry around the technology. How does the design of the systems condition their possible uses and the conduct of hostilities? These latter questions are more closely aligned with the traditional inquiries of weapons law, such as those under the Convention on Certain Conventional Weapons (CCW), where the focus is on whether the design of a system per se may have indiscriminate effects or cause unnecessary suffering or superfluous injury. But this second form of interrogation also looks notably beyond individual instances of human-machine interaction and considers the impacts that AI-DSS have on security, society, politics, and law.

Although related, these two approaches have different starting and end points that are worth considering. The first approach asks whether humans (whether or not they are using AI-DSS) fulfilled their responsibilities, among others to take precautions in attack and minimize civilian harm, whereas the second approach asks whether AI-DSS specifically have interfered with the duties to take precautions in attack and contributed to or exacerbated civilian harm. To be sure, both groups of questions are of crucial and unique importance. But I explain below why we need more of the latter line of interrogation, which looks at the specific impacts of AI-DSS, in order to better understand and regulate the impacts of AI technologies on warfare.

The Human in Military AI Debates

As mentioned, the Gospel and Lavender qualify as AI-DSS. As opposed to autonomous systems, where the AI software moderates the behavior of the system’s robotic parts, decision-support systems are designed to moderate how information is processed, filtered, and presented to their end users. Despite this conceptual distinction, it is important to remember that in practice software applications can easily be networked and repurposed in ways that blur the separation between autonomous AI systems and decision-support systems. For this reason, among others, the reports of the IDF’s use of AI-DSS have echoed earlier discussions regarding the role of humans in targeting cycles.

Experts who have commented on the alleged use of these systems by the IDF in the ongoing military operations in Gaza have taken different positions and approached the topic with different interests. But a common theme has been the analysis of whether and how the timeframe of deliberation in targeting procedures affects compliance with existing legal norms. The +972 Magazine report quotes an interviewee who observed that “during the early stages of the war” there was as little as “20 seconds” devoted “to each target before authorizing a bomb.”

It is worth noting that, at the time of writing, the IDF has not disclosed or specifically responded to how much time it takes to conduct a (legal) review of targets. This lack of clarity about the duration of the procedure has led experts to different interpretations. For example, Mimran and Dahan argue that the AI-DSS is only the first step of intelligence gathering and that precautionary measures are performed when the selection of a target “undergoes an additional separate examination and approval by several other authorities (operational, legal, and intelligence).” Moyes argues that a shortened timeframe of deliberation may mean that humans become “cogs in a mechanized process,” while Agenjo echoes these concerns by highlighting that such a timeframe undermines the possibility of exercising meaningful human control, thereby also linking to a broader debate in AI governance about what is required for human control to be meaningful.

On the relationship between the timeframe of deliberation and the law, Elliott comments that it is implausible that “well-informed, dispassionate legal determinations at that pace in a social milieu of nation-wide trauma and fury” could be made. Bo and Dorsey suggest that it is impossible to perform the duties required by the precautionary principle when there is no time to “delve deep into the target.” Notably for Bo and Dorsey’s argument, the outpacing of the human ability to take precautions is of concern “whether or not an AI-DSS is employed.” Relatedly, Hinds and Stewart add that “the increased tempo of decision-making” creates “additional risks to civilians,” and advise exercising tactical patience.

Another common point of analysis in response to the IDF’s use of AI technologies concerns the cognitive limits of human operators. Bo warns that humans develop an automation bias that leads them to “over-rely on systems which come to have too much influence over complex human decisions.” In contrast, Mimran et al. argue that under existing IDF procedures there is always a possibility that a commander will “choose to disregard the recommendations” of an AI-DSS, although Mimran also notes elsewhere that “under pressure” such an AI recommendation is more likely to be accepted. This mirrors a broader discussion in the field about the extent to which AI-DSS undermine the human capacity to contest or refute AI-generated suggestions, with many noting that even training, education, and verification procedures may be insufficient to uphold independent human judgment given the cognitive influence of AI-DSS (see here and here).

In addition to these challenges, there is a related concern about the technological impacts of AI-DSS on human judgment and responsibilities, specifically in relation to system explainability. Although it is often discussed as a technical issue, the notion of explainability stems from concerns over the human capacity to understand the reasoning of a statistical model in a software application (though legally it is not an explicit requirement). In this regard, it has been argued that the likely lack of explainability of the Gospel and Lavender systems could “inhibit the [human] ability to minimize the risk of recurrent mistakes” and “is likely to impact the duty to conduct investigations into alleged breaches of IHL.” Relatedly, Shehabi and Lubin argue that a limited understanding of the machine-learning algorithm’s processes may exacerbate inadequate supervision of intelligence processes and further erode the moral agency of a commander.

Interest in the human in military AI debates is long-standing, with voices on both sides either frightened by or hopeful about the prospect of such technologies mediating the experience of warfare. Attitudes towards concerns over the displacement of human responsibilities often reflect attitudes towards the human: whether one is optimistic about the human ability to exercise judgment properly or, on the contrary, considers humans too flawed to act properly under the pressure of warfare. Optimists often forget human cognitive biases, while pessimists neglect that technologies are not error-free and that they are, indeed, designed and made by the same flawed and imperfect humans.

For lawyers, however, the concern over human judgment, agency, and expertise is also linked to a central legal preoccupation: ensuring that, despite the use of force, individuals within these institutions are aware of their responsibilities in enforcing lawful limits on the conduct of hostilities and are held accountable when those limits are not respected. Hence, questions such as how humans perform their duties in, on, or out of the loop; what the human’s unique role in targeting is; and what kinds of judgments should not be quantified or mechanized have all been of crucial importance, and much ink has been spilled on them. I have grouped these interventions together because their starting point is concern with human obligations under international law, whereas the next group of interventions starts from concern over technology and its unique impacts on warfare.

AI Characteristics and Military AI Debates

A growing body of literature challenges the assumption that AI-DSS are neutral or objective tools in targeting processes. These scholars ask whether there is anything about AI-DSS specifically that makes their use more likely to lead to un/lawful decisions. Yet again, the answer to this question is neither straightforward nor free of controversy.

One of the major concerns over the specificity of AI technology is processing speed. Discussions on this topic mirror, to a certain extent, the above-mentioned concerns about the shortened timeframe of deliberation. However, the interventions whose starting point is the technology argue that the shorter timeframe follows specifically from the use of AI-driven technologies. As Schwarz and Renic put it most eloquently, “AI-enabled targeting systems, fixed as they are to the twin goals of speed and scale, will forever make difficult the exercise of morally and legally restrained violence.” I also argue elsewhere that “the speed and volume of target recommendations introduce a climate of risk where recommendations are not to be ignored.” On the implications of these technologies for deterrence, Baggiarini suggests that the speed of AI systems can “make it easier to enter into a conflict.”

Another concern specific to AI technologies broadly, but especially acute in high-risk contexts such as warfare, is machine bias and false accuracy. As with all AI technologies, AI in the military domain also produces biased outcomes. Many scholars have intervened in the wake of the IDF’s AI-DSS controversy to demonstrate this characteristic. Bo and Dorsey posit that AI’s inherent programming errors mean not only that targets may be mistakenly missed (false negative IDs) but also that such systems can incorrectly identify targets (false positive IDs). In this vein, Baggiarini argues that it is important to understand the limitations of technologies in a given context. For example, it must be appreciated that computer vision technologies perform worse in contested environments where concepts are not “objective, reasonably stable, and internally consistent,” as is the case in Gaza and in warfare more broadly, given the dynamic nature of conflict.

Stewart and Hinds further highlight that networking AI software applications also means that machine biases and errors become compounded, noting that “a small inadequacy in the first algorithmic recommendation is fed into and skews a second algorithmic process, which feeds into a third, and so on.” Here I would add, echoing Bode, that “bias is inherent in society and thus it is inherent in AI as well,” which means that human errors and biases also contribute to the compounding effect, both at the stage of system development and through the systems’ use.

The contribution of this type of analysis lies in going beyond the focus on a single human or even a single human-machine interaction and, instead, acknowledging the broader, multifaceted impacts that AI technologies have on military practices of intelligence gathering and analysis, situational awareness, target nomination, and course of action development. These accounts challenge the perspective that AI systems are “mere replacements” for humans, there to automate one of the existing tasks. That perspective may sometimes lead to the assumption that the tool used to analyze information is irrelevant, since the outcomes would arguably be the same whether or not the system was used to make assessments in targeting decisions.

However, this second group of expert interventions argues that AI systems are not mere tools but that they reshape how warfare is thought of, understood, and acted upon. And it is precisely this reshaping of warfare that has so far been analyzed in the abstract. But with the reported real-life uses of AI applications, it is time to formulate better questions and expand research that attends to the “realities of algorithmic warfare” (for inspiration see here).

Invisible Engineers

Lastly, I want to highlight that the reviewed interventions following the IDF’s use of AI-DSS have largely neglected the role of the humans involved in the engineering and production of the AI-DSS. A simple (and somewhat legalistic) explanation for the focus on people in command centers and at the edges of military operations could be that it is most important to address the role of those individuals in military positions. After all, they are the ones whom the law makes responsible for implementing legal principles in the conduct of hostilities. By no means do I mean to downplay their important duties under international law. Nonetheless, I would argue that yet another unique characteristic of AI systems leads many not so much to forget the role of engineers and developers as to be unable to properly address it. The characterization of AI systems as “black boxes” makes their “inner workings” invisible and, by extension, invisibilizes the role of engineers in making those systems work. Meanwhile, system engineers make essential decisions about the appropriateness of the AI model, the suitability of training data and its labels, and the acceptability of system performance results.

For example, the choice of whose data are labeled as “Hamas operative” for the purpose of training the Lavender system to standardize what a Hamas operative’s pattern of life “looks like” is a crucial determination that will affect not only system performance but also, once the system is deployed as an AI-DSS, targeting decisions. Similarly, deciding whether a system that achieves 90 percent accuracy in a testing environment is suitable enough to be relied upon by military professionals in targeting decisions is yet another “technical” but, legally speaking, crucial decision. Furthermore, if a system is being updated during use, system engineers likely continue to be involved throughout deployment in cleaning, organizing, and labeling data in order to update the machine-learning algorithms’ performance. Arguably, it is precisely this neglect of the role of engineers that leads to the “black box” problem: as some have argued, engineers are often not expected to carefully trace their choices and communicate their subjective judgments, which exacerbates the opacity and inscrutability of the systems (see here and here).

I recognize that the role of engineers in military institutions (and with private military contractors) has always been veiled in a degree of confidentiality and, as Dickinson observes, the roles of AI engineers specifically “are less visible and harder to define, their roles often intersect with intelligence which requires secrecy.” However, I also concur with her that their roles cannot be obfuscated, as their design choices can undermine international legal norms and how those norms are operationalized. We therefore need new ways of thinking about and addressing their role within military targeting cycles, and these should be informed by an initial examination of whether, and if so why, their role is being invisibilized in current uses of AI-DSS in warfare.

Conclusion

This post’s review of interventions in the wake of the IDF’s use of AI-DSS has revealed common themes. The main concerns of informed experts center on the shortened timeframe of deliberation, the speed and scale of processing, and compounded human-machine errors and biases. The principle of feasible precautions in attack emerges from these interventions as the main preoccupation of legal experts.

These interventions conclude that fast-paced decision-making, conditioned partly by AI-DSS technologies and coupled with the lack of explainability characteristic of AI, undermines the feasibility of precautions and hence is also likely to exacerbate civilian harm. In reviewing these interventions, I have argued that more attention should be paid to the characteristics of AI systems and their influence not only on individual decisions and their legality, but also on the character of warfare more broadly. To this end, I encourage scholars of military studies to pursue further empirical research into the real impacts of AI on contemporary battlefields, as it is high time we moved away from abstract discussions of emerging AI’s futures and concentrated on those applications that have already emerged. In studying the realities of warfare, I have also noted that the role of engineers and developers of military AI systems should not be overlooked, even though it is currently unaccounted for under international law.

Lastly, as scholars continue to respond to current uses of AI systems on the battlefield, it is important to avoid romanticizing either the human or the technology in our analyses. We romanticize the human as a rational subject, often sidelining not only the already well-studied cognitive biases but, more importantly, the violent histories of our species. We also romanticize AI as an objective, all-knowing, better-than-human robot. This image is not only unattainable but deeply flawed, as AI technology will always be programmed with limitations and will therefore yield errors and produce mistakes. AI systems are the product of their engineers’ input; those who advocate using AI to avoid relying on flawed humans therefore often forget that AI is also made by humans, whose errors and biases are embedded in design choices and hence reproduced iteratively when AI systems are used in targeting. Therefore, when studying these phenomena, we may be encouraged to study not only how technologies enable violence, but also how they reproduce it in novel ways.

***

Klaudia Klonowska is a Ph.D. Candidate in International Law at the Asser Institute and the University of Amsterdam and a former Visiting Researcher with the Lieber Institute at West Point Military Academy.

Photo credit: Unsplash
