By Stephen L. Dorton & Samantha Harper
With a combination of legitimate potential and hype, artificial intelligence and machine learning (AI/ML) technologies are often considered the future of naval intelligence. More specifically, AI/ML technologies promise not only to increase the speed of analysis, but also to deepen the quality of insights generated from large datasets.[1] One can readily imagine numerous applications for AI/ML in naval intelligence at tactical, operational, and strategic levels: threat detection and tipping and cueing from signals intelligence (SIGINT) or electronic intelligence (ELINT), target classification with acoustic intelligence (ACINT) or imagery intelligence, prediction of enemy movements for anti-submarine warfare, and many others.
The government, industrial, and academic sectors will continue to work fervently on challenges such as data collection, preprocessing, and storage, the development of better algorithms, and the development of infrastructure for storage and compute resources. However, even the best-performing AI/ML technologies are moot if the analyst or downstream decision maker cannot trust the outputs. Given the gravity of decisions driven by naval intelligence, AI/ML outputs must be readily interpretable, providing not only the what, but also the why (i.e., why the answer is what it is) and the how (i.e., how specifically the AI/ML arrived at the answer).
The Challenge: Trust in AI
To illustrate this challenge, consider the following hypothetical scenario: a watch supervisor on an aircraft carrier is doing pre-deployment qualifications off the coast of Virginia. After making a brief head call, they come back to find that one of their junior watchstanders has reported a dangerous, but unlikely, aerial threat to the Tactical Action Officer (TAO). After nearly sending the ship to general quarters, the TAO realizes that, based on the operating area, the considerable range from the threat, and other intelligence on the threat’s location, it is impossible for that threat to be there. Further inspection shows that the AI/ML in the system was programmed to automatically classify tracks as the most dangerous possible entity that could not be ruled out, but the junior watchstander was unaware of this setting. Unfortunately, the AI/ML did not explain why it classified the track as a high-threat contact, what signatures or parameters it considered, or how it generated a list of possible tracks, so the junior watchstander made a bad call based on an incomplete understanding of the AI system.
The problem is that this is not a purely hypothetical scenario, but a real event that happened several years ago, as recounted during an ongoing study investigating the role of trust and AI/ML in intelligence. While one may easily dismiss this and say “no harm, no foul,” that would be myopic. First, if this same scenario had happened in contested waters or with a less experienced TAO, there could have been serious ramifications (such as another Vincennes incident, in which an Iranian airliner was shot down by a U.S. Navy cruiser). Second, this “boy who cried wolf” scenario caused the TAO to lose trust in the watchstander, the supervisor, and the entire section. Not only was the watchstander afraid to make decisive calls after the event, but it took nearly half of the deployment, spent making correct calls and answering requests for information, to regain the trust of the TAO. This lack of trust might have caused the TAO to hesitate to act on their reports if a real threat had been identified. These kinds of delays and second-guessing can cost lives.
This example highlights another dimension to the challenge facing employment of AI/ML in naval intelligence. The goal is not to simply develop systems that sailors and analysts trust as much as possible. Having too much trust in AI/ML can result in misuse of the system (e.g. immediately accepting its outputs without considering the other available intelligence). Conversely, having too little trust can result in disuse of the system (missing out on genuine benefits of the system). Therefore, the pressing challenge for the future of naval intelligence is to develop AI/ML capabilities that allow operators to rapidly develop and calibrate their trust to appropriate levels in the right contexts and scenarios, the same way they would with their human teammates.
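To make the watchstander scenario concrete, the following is a minimal Python sketch of a "most dangerous entity that cannot be ruled out" rule that also reports the what, the why, and the how alongside its answer. It is not the fielded system described above. The class names, the single range-based rule-out check, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One entity the track might be. All values used here are hypothetical."""
    name: str
    threat_level: int               # higher means more dangerous
    max_plausible_range_nm: float   # farthest it could plausibly be from its last known position


@dataclass
class Explanation:
    label: str                                   # the what
    policy: str                                  # the how
    reasons: list = field(default_factory=list)  # the why, one line per candidate


def classify_worst_case(candidates, range_from_last_known_nm):
    """Return the most dangerous candidate that cannot be ruled out, with reasons."""
    policy = "report the most dangerous candidate that cannot be ruled out"
    remaining, reasons = [], []
    for c in candidates:
        if range_from_last_known_nm > c.max_plausible_range_nm:
            reasons.append(
                f"ruled out {c.name}: {range_from_last_known_nm} nm exceeds "
                f"plausible range of {c.max_plausible_range_nm} nm"
            )
        else:
            remaining.append(c)
            reasons.append(f"kept {c.name}: within plausible range")
    if remaining:
        label = max(remaining, key=lambda c: c.threat_level).name
    else:
        label = "unknown"
    return Explanation(label, policy, reasons)


# Hypothetical candidates and ranges, for illustration only.
candidates = [
    Candidate("commercial airliner", 1, 5000.0),
    Candidate("hostile strike aircraft", 5, 300.0),
]
result = classify_worst_case(candidates, 800.0)
print("What:", result.label)
print("How:", result.policy)
for line in result.reasons:
    print("Why:", line)
```

Surfacing the policy string next to the label is the design point: an operator who can see that the system defaults to the worst case that has not been ruled out is better placed to weigh its output against other available intelligence.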
What is Trust? What Affects It?
The experimental psychology community has studied trust for years, defining it as “the attitude that an agent will help achieve an individual’s goals in a situation characterized by uncertainty and vulnerability.”[2] In other words, trust is the degree to which one is willing to make oneself vulnerable, or put oneself in the hands of another agent (e.g., a person or an AI/ML system). It is critical to understand what makes people gain or lose trust, as trust greatly impacts the adoption of new systems and can make or break the performance of a human-machine team. This is especially challenging in the context of naval intelligence, where uncertainty and vulnerability are always present.
Designing AI/ML systems to engender trust is a complicated affair, due in no small part to the fact that trust is a complex, high-dimensional phenomenon. There are roughly a dozen factors that affect trust, including the following:[3]
- Reputation: The AI/ML has received endorsement or reviews from others.
- Usability: The AI/ML is easy to interact with.
- Predictability: The ability to predict the actions of the AI/ML.
- Security: The importance of operational safety and data security to the AI/ML.
- Utility: The usefulness of the AI/ML in a task.
- Goal Congruence: The extent to which the AI/ML’s goals align with the user’s.
- Reliability: The AI/ML is reliable and consistent in functioning over time.
- Understandability/Explainability/Feedback: The extent to which one can understand what the AI/ML is doing, why it is doing it, and how it is doing it, either implicitly or explicitly.
- Trialability: There is opportunity to interact with the AI/ML prior to accepting or adopting it for operational use.
- Job Replacement: There is concern about the AI/ML taking one’s job.
- Errors/False Alarms: Information provided by the AI/ML does not contain errors or false alarms.
A Naturalistic Study of Trust, AI, and Naval Intelligence: Early Findings
We are currently conducting a study to test these factors and better understand how trust is gained or lost in the context of naval intelligence, using a naturalistic decision making approach. Naturalistic decision making is the study of how people use their experience to make decisions in naturalistic settings, rather than in a controlled laboratory environment.[4] This approach allows us to understand how these factors affect trust and decision making in the chaos of real-world operations, complicated by missing information and time pressure.
More specifically, we used the Critical Incident Technique, a structured and repeatable means of collecting data on prior incidents to solve practical problems.[5] We recruited participants who had experience in intelligence, including planning, collection, analysis, or even military decision making as an active consumer of intelligence. Those in naval intelligence had experience in different intelligence fields, including ACINT, SIGINT, ELINT, geospatial intelligence (GEOINT), and all-source intelligence, although most of their experiences were in tactical intelligence or operations using AI/ML that exploits intelligence.
Participants were asked to identify an AI/ML technology they worked with in the context of intelligence, and then to think of any defining event (or series of events) that made them gain or lose trust in that technology. This resulted in a sample of nine stories about trust in AI/ML in the context of naval intelligence: four about gaining trust, and five about losing trust. These stories were similar to the earlier story about the junior watchstander reporting an impossible threat. A research team coded each story for the presence or absence of each trust factor, allowing insights to be gained from the data. So, what factors affected trust in AI/ML in naval intelligence?
Explainability and Utility are Paramount
Understandability/Explainability/Feedback was the most common factor in gaining or losing trust, appearing in eight of the nine examples. It was present in all five stories about losing trust, where a lack of explainability manifested itself in multiple ways. In one case, not understanding how the AI/ML generated results prevented the captain of a ship from knowing whether they could safely override navigation recommendations from a GEOINT tool. In another case, it prevented search and rescue planners from even knowing if there were errors or limitations in another GEOINT product: “they put garbage in and got garbage out… but our people didn’t understand the theory behind what the machine was doing, so they couldn’t find [the] errors [in the first place].” In stories about gaining trust, analysts said that understanding the underlying algorithms enabled them to trust the AI/ML, because even when the outputs were wrong, they knew why. This knowledge enabled a SIGINT collector to adapt their workflow based on their understanding of the strengths and weaknesses of their AI/ML system, capitalizing on its strengths (as a tipper) and mitigating its weaknesses (as a classifier): “ultimately I was happy with the system… it gave me good enough advice as a tipper that a human could have missed.”
Utility, or the usefulness of the AI/ML in completing tasks, was the second-most commonly cited factor in gaining or losing trust. It was present in three stories about gaining trust and three stories about losing trust. Ultimately, if the AI/ML helps someone do their job successfully, then it is trusted, and the inverse is true if it makes success more difficult. As an all-source analyst said of one of their AI/ML tools, “it’s an essential part of my job… if I can’t use this tool it’s a mission failure for me.” Conversely, another all-source analyst lost trust in an AI/ML tool because its capabilities were so limited that it did not help them complete their tasking: “When I first heard of it I thought it was going to be useful… then I learned it was built on bad assumptions… then I saw the answers [it produced]…”
Other Findings and Factors
Reputation, or endorsement from others, was cited in half of the stories about gaining trust, but never as a factor in losing trust. Because of the immense interpersonal trust required in naval intelligence, endorsement from another analyst can carry significant weight: “the team was already using the tool and I trusted the team I was joining… that made me trust the tool a bit before even using it.” Interestingly, predictability of the AI/ML was not cited as a factor in gaining or losing trust. One participant suggested why: the operational domain is rife with uncertainty, so one cannot expect predictability in an inherently unpredictable environment. “I’m smart enough to know that the [AI/ML tools] are taking data and making estimates… the nature of submarine warfare is dealing with ambiguous information…”
Finally, errors and false alarms were cited in three of the five stories about losing trust in AI/ML, but were never cited as factors in gaining trust. This may be because a lack of errors manifests itself as utility or reliability (the system functions consistently over time), or it could be because of the previous sentiment: there will always be errors in an inherently uncertain domain such as naval intelligence, so there is no reasonable expectation of error-free AI/ML.
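For readers who want the reported counts side by side, the short Python sketch below tabulates them. The numbers come only from the findings stated above. The one derived value is the three gain stories for explainability, inferred from "eight of the nine" overall minus the five loss stories. The variable and function names are illustrative and are not part of the study's actual analysis.

```python
GAIN_STORIES = 4   # stories about gaining trust
LOSS_STORIES = 5   # stories about losing trust

# factor -> (count in gain stories, count in loss stories)
reported = {
    # 8 of 9 overall and all 5 loss stories, so 3 gain stories (derived)
    "Understandability/Explainability/Feedback": (3, 5),
    "Utility": (3, 3),
    "Reputation": (2, 0),            # "half of the stories about gaining trust"
    "Errors/False Alarms": (0, 3),
    "Predictability": (0, 0),        # not cited in either direction
}


def summarize(counts):
    """Return rows of (factor, gain, loss, total, share of all stories), most cited first."""
    all_stories = GAIN_STORIES + LOSS_STORIES
    rows = []
    for factor, (gain, loss) in counts.items():
        total = gain + loss
        rows.append((factor, gain, loss, total, total / all_stories))
    return sorted(rows, key=lambda row: row[3], reverse=True)


rows = summarize(reported)
for factor, gain, loss, total, share in rows:
    print(f"{factor}: gain {gain}/{GAIN_STORIES}, loss {loss}/{LOSS_STORIES}, "
          f"total {total}/{GAIN_STORIES + LOSS_STORIES} ({share:.0%})")
```

With only nine stories, these shares describe the sample and should not be read as population estimates.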
AI/ML tools will become more ubiquitous in naval intelligence across a wide variety of applications. Several factors affect trust in AI/ML, and our naturalistic investigation identified factors, such as explainability and utility, that play a role in gaining or losing trust in these systems. Appropriately calibrated trust, based on an understanding of the capabilities and limitations of AI/ML, is critical. Even in cases where the AI/ML does not produce a correct answer, operators will adapt their workflows and reasoning processes to use it for the limited cases or tasks for which they do trust it.
Unfortunately, AI/ML capabilities are often developed with good intentions, but fall into disuse and fail to provide value if they do not consider the human element of analysis. Analyst reasoning and sensemaking is one such component of the human element,[6] but trust is another component that must be considered in the development of these systems, particularly in regard to explainability. Greatly complicating the matter of trust, but not yet adequately addressed, is the fact that AI/ML can be deceived.[7] Our potential adversaries are well aware of this weakness, so developing an understanding of how our AI/ML systems can be deceived, and ultimately protected from deception, is crucial.
If an analyst were asked how they arrived at their findings and their response was simply “.79,” the commander would likely not trust their findings enough to make a high-stakes decision from them, so why would that be acceptable output from AI/ML? Developing trustable AI/ML technologies is one of the greatest challenges facing the future of naval intelligence.
Steve Dorton is a Human Factors Scientist and the Director of Sonalysts’ Human-Autonomy Interaction Laboratory. He has spent the last decade conducting RDT&E of complex human-machine systems for the Navy and other sponsors. More recently, his research has focused on human interactions with AI/ML and applying crowdsourcing in the context of intelligence analysis.
Samantha Harper is a Human Factors Engineer in Sonalysts’ Human-Autonomy Interaction Laboratory, who has experience in the design, execution, analysis, and application of user-centered research across various technical domains, including intelligence analysis, natural language processing, undersea warfare, satellite command and control, and others.
This work was supported in part by the U.S. Army Combat Capabilities Development Command (DEVCOM) under Contract No. W56KGU-18-C-0045. The views, opinions, and/or findings contained in this report are those of the authors and should not be construed as an official Department of the Army position, policy, or decision unless so designated by other documentation. This document was approved for public release on 10 March 2021, Item No. A143.
[1] McNeese, N. J., Hoffman, R. R., McNeese, M. D., Patterson, E. S., Cooke, N. J., & Klein, G. (2015). The human factors of intelligence analysis. Proceedings of the Human Factors and Ergonomics Society 59th Annual Meeting, 59(1), 130-134.
[2] Lee, J. & See, K. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50-80. https://doi.org/10.1518/hfes.46.1.50_30392
[3] Siau, K. & Wang, W. (2018). Building trust in artificial intelligence, machine learning, and robotics. Cutter Business Technology Journal, 31(2).
    Muir, B. M. (1994). Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics, 37(11), 1905-1922.
    Rempel, J. K., Holmes, J. G., & Zanna, M. P. (1985). Trust in close relationships. Journal of Personality and Social Psychology, 49(1), 95-112. https://doi.org/10.1037/0022-3514.49.1.95
    Balfe, N., Sharples, S., & Wilson, J. R. (2018). Understanding is key: An analysis of factors pertaining to trust in a real-world automation system. Human Factors, 60(4), 477-495.
    Hoff, K. A., & Bashir, M. (2015). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57(3), 407-434.
[4] Klein, G. (2017). Sources of Power: How People Make Decisions (20th Anniversary Edition). Cambridge, MA: MIT Press.
[5] Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51(4), 327-358. https://doi.org/10.1037/h0061470
[6] Moon, B. M. & Hoffman, R. R. (2005). How might “transformational” technologies and concepts be barriers to sensemaking in intelligence analysis. In J. M. C. Schraagen (Ed.), Proceedings of the Seventh International Naturalistic Decision Making Conference, Amsterdam, The Netherlands, June 2005.
[7] Brennan, M. & Greenstadt, R. (2009). Practical attacks against authorship recognition techniques. Proceedings of the Twenty-First Innovative Applications of Artificial Intelligence Conference, 60-65.
Featured image: Lt. Jon Bielar, and tactical action officer Lt. Paul O’Brien call general quarters from inside the combat information center during the total ship’s survivability exercise aboard the Ticonderoga-class guided-missile cruiser USS Antietam (CG 54). (U.S. Navy photo by Mass Communication Specialist 3rd Class Walter M. Wayman/Released)
2 thoughts on “Trustable AI: A Critical Challenge for Naval Intelligence”
Trust is built on relationships. People to People; person-to-person, and with AI we need to find ways that help foster a relationship between the person and the machine.
It should not require expensive, exhaustive research to determine that humans need to understand what makes software/AI or hardware tick before they can trust it.
Until humans know what’s going on under the hood, they can’t figure out if it’s running properly or not; if it’s been hacked, is malfunctioning, or is spitting out wrong answers due to poor programming or bad data. Furthermore, without understanding how it works, they can’t modify/manipulate it to produce more useful outputs.
AI and ML are not magical, mystical gods; they are ones-and-zeros coding created by fallible humans. We should be investing in “management of machines” courses that teach us basics of coding and how to shepherd machines and software as effectively as we do other humans.