maintenance | Center for International Maritime Security

By LT P.J. Greenbaum and LT Vince Freschi

Introduction

Operational availability is the Nuclear Navy’s bread and butter, yet shipboard technicians are currently prevented from improving maintenance outcomes by an archaic data bottleneck. Scheduled maintenance to prevent system degradation or failure is planned years in advance and kept up to date using detailed records boards that meticulously track maintenance completion. When a system fails or is degraded, corrective maintenance often requires hours of reading, symptom elaboration, and troubleshooting before a component can be repaired. Fatigued watchstanders burn critical man-hours sifting through static technical libraries and disjointed databases to isolate casualty root causes – a complex, time-consuming process akin to using a card catalog to search tens of thousands of pages for a single sentence.

Consider an MMN2 (Machinist’s Mate, Nuclear) troubleshooting and repairing a noisy coolant pump. Currently, this Sailor must spend hours manually cross-referencing decades’ worth of material history logs with historical maintenance records, send manually collected vibration data off-ship for analysis by pricey contractors, and flip through multiple volumes of tech manuals, all to diagnose a problem that may or may not be the root cause of the issue. Generating a quality control package to repair the pump takes several hours, and generating the work authorizations and safety protocols (i.e., “tag-outs”) takes several more – all of which delays the repair, leaving critical propulsion plant components offline longer than needed. In the authors’ experience leading Sailors in the maintenance of nuclear systems, finding the right part alone can take hours, and even a successful search often proves futile because the part number has become obsolete. The result is an asset unavailable for tasking and a reduced operational force posture.

The solution is to use existing technology to equip Sailors with the tools necessary to keep ships in the fight. Fortunately, the technology exists to provide locally hosted, air-gapped Small Language Models (SLMs) to enhance the nuclear propulsion plant troubleshooting process. By implementing a Retrieval-Augmented Generation (RAG) framework, Naval Reactors can transform its massive repositories of procedures, technical manuals, and material history into a live, up-to-date database of Reliable External Knowledge (REK). This approach will reduce the man-hours required for fault investigation, improve the accuracy of system troubleshooting, and reduce downtime for propulsion plant components.

The Structure: Coupling Small-Language Models with Retrieval Augmented Generation

Since the release of ChatGPT in late 2022, much attention has focused on the use cases for large language models (LLMs) – models that possess between 100 billion and 2 trillion parameters – which are able to digest large volumes of disaggregated data and generate human-like text in response.¹ While extremely powerful in their ability to synthesize and process loosely structured datasets, these models require a high-bandwidth connection to an off-ship data center – a non-starter for warships that frequently operate at EMCON in signal-denied environments. Even the Department of War’s release of GenAI.mil – a well-intentioned attempt at bringing generative AI capabilities directly to the warfighter – has limited applicability on a warship, where bandwidth and EMCON restrictions prevent its widespread use. Furthermore, LLMs can be prone to “hallucinating” – generating irrelevant, incorrect, or misleading responses to queries – with potentially catastrophic consequences in a nuclear propulsion plant.²

The development and refinement of small language models (SLMs) possessing orders of magnitude fewer parameters (1 billion to 10 billion), however, allows for highly specialized, localized, air-gapped models capable of running on a single high-end workstation – like a high-performance gaming laptop – which would suit a rugged shipboard environment.³ These SLMs would be most effective when coupled with a Retrieval-Augmented Generation (RAG) framework.⁴ This architecture replaces the model’s reliance on static, pre-trained memory with a dynamic system.⁵ By using a local vector database to index every page of the ship’s electronic technical manuals – Reactor Plant Manuals (RPMs), Steam Plant Manuals (SPMs), machine-specific technical manuals, maintenance logs, and material history entries – the SLM no longer has to recall every answer from its training. All it has to do is find the answer within the verified, pre-uploaded documentation. RAG will enable it to return responses based on the semantic meaning of a query rather than the word-by-word match currently available with a “CTRL+F” search. This shift means that even a “small” 10 billion parameter model could deliver hull-specific, nuclear-grade technical guidance with a level of accuracy that matches or even exceeds its larger, data center-bound counterparts, and with a substantially reduced risk of hallucinations.⁶
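The retrieval step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a proposed shipboard implementation: the page excerpts, source labels, and function names are all hypothetical, and the toy `embed` function only counts words, so it matches on shared vocabulary. A real system would swap in a learned embedding model to obtain the semantic matching the paragraph describes.

```python
import math
import re
from collections import Counter

# Hypothetical excerpts standing in for indexed manual pages (not real Navy text).
PAGES = {
    "pump-manual p.12": "Excessive pump vibration often indicates bearing wear or shaft misalignment.",
    "valve-manual p.4": "Valve packing leaks are corrected by adjusting the gland nut.",
    "pms-card 7": "Inspect pump bearing lubrication level every month.",
}

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Assumption: a real system uses a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The "vector database": one stored vector per indexed page.
INDEX = {source: embed(text) for source, text in PAGES.items()}

def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k sources most similar to the query, best first."""
    q = embed(query)
    scored = [(source, cosine(q, vec)) for source, vec in INDEX.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

def build_prompt(query: str, k: int = 2) -> str:
    """Assemble the text handed to the SLM: retrieved passages, each tagged with its source."""
    context = "\n".join(f"[{src}] {PAGES[src]}" for src, _ in retrieve(query, k))
    return f"Answer using only the cited context.\n{context}\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("pump vibration and bearing noise"))
```

Because every passage in the prompt carries its source label, the model's answer can be traced back to the page it came from, which is what lets a technician check the recommendation against the manual.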

The Nuclear Reliable External Knowledge Database

Key to the development of the vector database will be compiling and maintaining – with proper version control – a database of reliable external knowledge (REK) that the SLM can draw from when responding to Sailor queries. The widespread use of interactive electronic technical manuals (IETMs) like the Reactor Plant Manual and Steam Plant Manual onboard nuclear-powered naval vessels simplifies the process of indexing the Nuclear Navy’s relevant data into a machine-readable format. But a machine that sees and understands procedural context – via the IETMs – without a thorough understanding of the system design bases – present in component technical manuals and procedural front-matter – runs the risk of providing inaccurate and incomplete recommendations. Therefore, in addition to the IETMs, legacy manuals for all propulsion plant components – like pumps and valves – as well as procedural guidelines – like the Joint Fleet Maintenance Manual (JFMM), the Radiological Controls for Ships, and the propulsion plant Planned Maintenance System (PMS) – must be included in the REK database as well, allowing SLM recommendations to be pre-filtered through Nuclear Navy maintenance rules and regulations before they reach the technician.

But the REK database cannot be static; it must be periodically updated to incorporate the latest feedback reports, Naval Reactors technical bulletins, and CASREPs, ensuring that the SLM maintains current hull- and platform-specific information. A monthly maintenance check, accomplished by uploading a “refresh” library – centrally created and made available via the Naval Reactors local area network – would allow for periodic updates in all but the strictest EMCON conditions. A critical advantage of incorporating fleet‑wide data is that equipment casualties encountered on one hull are often repeat iterations of known failure modes across other ships‑in‑class. By enabling Sailors to leverage prior diagnoses and corrective actions on sister platforms, the system reduces redundant troubleshooting, supports failure‑rate trending, and ultimately shortens equipment downtime.
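One way to picture the version-controlled “refresh” library is as a list of entries, each carrying a document identifier, a revision number, and a checksum. The sketch below is purely illustrative: the `Entry` fields, the `apply_refresh` function, and the acceptance rules (checksum must match, revision must be newer) are assumptions for the example, not a description of any existing Naval Reactors process.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One document in the REK library (hypothetical schema)."""
    doc_id: str
    revision: int
    text: str
    sha256: str  # checksum published by the central authority

def digest(text: str) -> str:
    """SHA-256 hex digest of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def apply_refresh(library: dict[str, Entry], refresh: list[Entry]) -> list[str]:
    """Apply a refresh package in place and return the doc_ids that were updated.

    An entry is accepted only if its checksum matches its text (guards against
    corruption or tampering) and its revision is newer than the one on board.
    """
    applied = []
    for entry in refresh:
        if digest(entry.text) != entry.sha256:
            continue  # corrupted or altered entry: reject
        current = library.get(entry.doc_id)
        if current is None or entry.revision > current.revision:
            library[entry.doc_id] = entry
            applied.append(entry.doc_id)
    return applied

if __name__ == "__main__":
    library: dict[str, Entry] = {}
    package = [Entry("pump-tm", 1, "rev 1 text", digest("rev 1 text"))]
    print(apply_refresh(library, package))
```

Rejecting entries whose checksum does not match is a simple guard in the same spirit as the centralized control the authors call for; a fielded system would presumably rely on signed packages rather than bare hashes.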

Onboard the modern Ford-class engineering plant, the pre-existing automation and smart sensors open the door to even more AI applications. Integrating the SLM with data log sets, vibration analysis data, and material history would allow the plant to be monitored not only through the eyes of its highly trained nuclear operators, but also through AI powered by the latest Silicon Valley advances. Coupling the SLM with Eulerian Video Magnification and installing fixed cameras on vital pumps and turbines would allow for early detection of failure on a minute-by-minute basis. The efficacy of preventive maintenance could then be audited based on failure frequency and downtime, empowering Sailors to continuously preserve the plant in the most efficient manner possible.

The Use Case

Consider once again the MMN2 from the introduction and the noisy reactor coolant pump. With a shipboard SLM co-pilot, the MMN2 could describe the observed symptoms in a natural language prompt, accompanied by the equipment logs that preceded the failure and the vibration analysis data for the pump. After a few seconds of “thinking,” the SLM would use its RAG framework to pull the relevant drawings, scrape the log data for trends, and analyze the vibration data for potential failure mechanisms. Using its live-updated REK library, the SLM could also flag CASREPs from other ships-in-class that noted similar failure modes. After a few more seconds, a quality assurance package could be generated in the ship-specific format, complete with all required tools, parts, and materials, the isolations needed for the work to be conducted, and a step-by-step repair procedure drawn from the relevant tech manual. But perhaps most importantly, each reference the SLM pulls its material from can be displayed in an adjacent window, with the MMN2 checking the SLM’s recommendations against each cited tech manual.

The Way Forward

The first step in deploying a shipboard RAG-equipped SLM at scale must be accelerating the conversion of legacy technical manuals and drawings into machine-readable formats suitable for the creation of a vector database. Once all relevant technical publications, procedures, and best practices have been compiled into a single database (no small feat considering the Navy’s notorious compartmentalization), the data must be “chunked” into smaller sub-sections, each of which is embedded as a high-dimensional numerical vector (due to computing restrictions, likely in the low hundreds of dimensions) for retrieval by the model. Ensuring the model remains free of “data poisoning” – the injection of corrupted or incorrect data into a model’s training – will require a centralized organization, like Naval Reactors, to manage the training and development of the vector database. For optimal utility, shipboard technicians must be able to add data – like material history entries – into the vector database, although care must be taken to ensure that the model does not incorporate this potentially flawed shipboard data into its training phase. Fortunately, the recent establishment of the Propulsion Plant Local Area Network (PPLAN) technician school provides an avenue to train designated technicians on the procedures required to maintain the REK database. Designating these Sailors with a Naval Enlisted Classification (NEC) and designating those NECs as “critical” for billet-based distribution would ensure that ships always have the requisite knowledge onboard to support their shipboard AI nodes.
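The chunk-then-embed step can be illustrated with a short sketch. Everything here is an assumption made for the example: the window size, the overlap, the 256-dimension width (the article says only “low hundreds”), and above all the hashing-based `hash_embed`, which is a stand-in for a trained embedding model and captures word identity rather than meaning.

```python
import hashlib
import math

DIM = 256  # assumed width; the article only says "low hundreds of dimensions"

def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

def hash_embed(chunk: str, dim: int = DIM) -> list[float]:
    """Map a chunk to a unit-length vector by hashing each word into one of `dim` buckets.

    Stand-in only: a real pipeline would call a learned embedding model here.
    """
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

if __name__ == "__main__":
    manual = " ".join(f"w{i}" for i in range(100))  # placeholder "manual" of 100 words
    chunks = chunk_words(manual)
    vectors = [hash_embed(c) for c in chunks]
    print(len(chunks), len(vectors[0]))
```

The overlap between neighboring windows is a common hedge against splitting a procedure step across two chunks; the right sizes would have to be found empirically during prototype testing.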

Next, the Navy must define the technical hardware specifications for a “Shipboard AI Node” that meets MIL-SPEC requirements for shock, vibration, and EMCON security. The Navy must prioritize video random access memory (VRAM), since the amount of VRAM limits the size of the model that can be loaded and therefore the capability (the “intelligence”) of the SLM. Using a laptop client already approved to handle classified information would save time and speed delivery of the system to the fleet.
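A rough sense of why VRAM drives the hardware specification comes from simple arithmetic: the memory needed to hold a model's weights is approximately the parameter count times the bytes stored per parameter. The sketch below works that out for the 1 to 10 billion parameter range discussed earlier. The function names and the 80 percent headroom figure are assumptions for illustration, and the estimate deliberately ignores working memory used during inference, so real requirements are somewhat higher.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory (decimal GB) needed just to hold the model weights."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

def fits_in_vram(params_billion: float, bits_per_param: int,
                 vram_gb: float, usable_fraction: float = 0.8) -> bool:
    """True if the weights fit in the usable share of VRAM.

    `usable_fraction` is an assumed margin left for inference working memory.
    """
    return weight_memory_gb(params_billion, bits_per_param) <= vram_gb * usable_fraction

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"10B parameters at {bits}-bit: {weight_memory_gb(10, bits):.1f} GB")
```

At 16 bits per parameter a 10 billion parameter model needs about 20 GB for weights alone, while the same model stored at 4 bits per parameter needs about 5 GB, which is why reduced-precision storage and generous VRAM both matter for a laptop-class node.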

Once the software and client have been paired together, the Nuclear Navy can utilize its already-existing training pipeline to begin integrating the system into maintenance and troubleshooting workflows. The Nuclear Power Training Units in Charleston, South Carolina and Ballston Spa, New York function as the perfect “proving ground” for the SLM in a fleet-like scenario. Treating the prototype rollout as a dynamic test – experimenting with model size, tokenization, chunking, and retrieval methods – would improve model performance and validate system functionality prior to fleet-wide deployment.

Lastly, and perhaps most importantly, Naval Reactors must establish a “Human-in-the-Loop” certification framework, ensuring that AI-assisted troubleshooting remains an advisory tool that works within the rigorous standards of the Nuclear Navy. Just as the SWO community still teaches its navigators paper charting at SURFNAV, despite the existence of well-proven Voyage Management Systems (VMS), the Nuclear Navy must work to ensure that any AI troubleshooting co-pilot used by its Sailors enhances their understanding of integrated plant operations rather than replacing it. While some may argue that using such a co-pilot bypasses the deep-dive traditionally required to build system-wide expertise, a shipboard SLM acts as a learning accelerant, not as a crutch. By removing manual document searches, the tool allows technicians to spend more time synthesizing plant information and conducting high-level analysis – tasks that truly build systemic understanding – as opposed to spending hours searching through manuals for the “NIIN in a haystack.” When integrated into a “Human-in-the-Loop” framework, this technology ensures that a Sailor’s learning is grounded in the most accurate, cited technical data available. The stakes in nuclear reactor plant operations are simply too high to outsource critical thinking. This technology must be viewed as a force multiplier that sharpens a nuclear technician’s judgment and reinforces the culture of procedural compliance that defines the program.

Conclusion

The integration of SLMs equipped with RAG architecture into the nuclear propulsion environment represents a significant upgrade to the maintenance capabilities of individual nuclear operators. Incorporating AI into the shipboard troubleshooting and maintenance workflow is a fundamental shift designed to reduce equipment downtime. In a future conflict characterized by denied communications and long stints at sea, a ship’s ability to remain self-sufficient – by diagnosing and repairing its primary propulsion and electrical generation capacity without racing back to stateside contractors – could mean the difference between sustained operability and taking a capital ship out of the fight. Ultimately, the goal is clear: a propulsion plant where easily accessible, machine-readable data works as hard as the Sailor. The technology already exists; the Nuclear Navy’s leadership need only deploy it. By embracing SLMs and RAG architecture, the Navy can keep its most valuable and complex nuclear assets mission-ready, even in the most contested environments.

LT Vincenzo Freschi was born in Olbia, Italy, to an American Navy Officer and an Italian chef, where he enjoyed life in the countryside with his Border Collies until he attended Penn State University. There he earned a degree in Nuclear Engineering with a minor in Military Studies, and commissioned as a Naval Officer in 2020. Following commissioning, he served aboard the USS Stockdale (DDG-106). In 2023 he completed the Navy Nuclear Power School and Prototype training pipelines before reporting to the USS Gerald R. Ford (CVN-78), the world’s largest aircraft carrier, where he worked as the RE DIVO. Now he serves as an NROTC instructor at Carnegie Mellon University while pursuing a master’s degree in Artificial Intelligence.

LT Paul (P.J.) Greenbaum grew up in Boiling Springs, Pennsylvania, and attended Princeton University on an NROTC scholarship, where he studied international relations, public policy, and African studies. He commissioned and pursued a follow-on master’s degree at Tsinghua University in Beijing, China as a Schwarzman Scholar, where he studied international relations and Chinese language. He served onboard the USS Benfold (DDG-65) homeported in Yokosuka, Japan as the Electronic Warfare Officer, and he completed his follow-on nuclear propulsion training in Charleston, South Carolina. He reported to the USS Abraham Lincoln (CVN-72) in October 2023 and served for two years as the Reactor Mechanical division officer. He currently serves as an NROTC instructor at the University of California, Berkeley while pursuing a degree in nuclear engineering.

References

“A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | ACM Transactions on Intelligent Systems and Technology.” Accessed December 21, 2025. https://dl.acm.org/doi/full/10.1145/3768165.

Agrawal, Prof. Pallavi. “Running LLMs Locally on Consumer Devices.” International Journal for Research in Applied Science and Engineering Technology 13, no. 4 (2025): 5433–41. https://doi.org/10.22214/ijraset.2025.69433.

Kandala, Savitha Viswanadh, Pramuka Medaranga, and Ambuj Varshney. “TinyLLM: A Framework for Training and Deploying Language Models at the Edge Computers.” arXiv:2412.15304. Preprint, arXiv, December 19, 2024. https://doi.org/10.48550/arXiv.2412.15304.

Lewis, Patrick, Ethan Perez, Aleksandra Piktus, et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” arXiv:2005.11401. Preprint, arXiv, April 12, 2021. https://doi.org/10.48550/arXiv.2005.11401.

Ruiz, Daniel C., and John Sell. “Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain.” arXiv:2410.20297. Preprint, arXiv, October 27, 2024. https://doi.org/10.48550/arXiv.2410.20297.

Shuster, Kurt, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. “Retrieval Augmentation Reduces Hallucination in Conversation.” arXiv:2104.07567. Preprint, arXiv, April 15, 2021. https://doi.org/10.48550/arXiv.2104.07567.

Featured Image: Sailors make repairs aboard the destroyer USS Halsey in the Arabian Sea in 2021. (U.S. Navy photo)

Last week an article came out about state-sponsored hacking that had nothing to do with Edward Snowden or the NSA. Bloomberg News detailed the ongoing hacking of U.S. defense contractor QinetiQ. Two paragraphs in the piece particularly struck me:

“The [China-based] spies also took an interest in engineers working on an innovative maintenance program for the Army’s combat helicopter fleet. They targeted at least 17 people working on what’s known as Condition Based Maintenance, which uses on-board sensors to collect data on Apache and Blackhawk helicopters deployed around the world, according to experts familiar with the program.

The CBM databases contain highly sensitive information including the aircrafts’ individual PIN numbers, and could have provided the hackers with a view of the deployment, performance, flight hours, durability and other critical information of every U.S. combat helicopter from Alaska to Afghanistan, according to Abdel Bayoumi, who heads the Condition Based Maintenance Center at the University of South Carolina.”

A remote diagnostic system: safe and secure…

While it’s unclear whether the hackers succeeded in accessing or exploiting the data, it is clear that they saw the information as valuable. And rightly so – systems such as condition based maintenance, remote diagnostics, and remote C2 systems are designed to reduce the workload burden on front-line “warfighters”, or the logistics burden on their platforms, by shifting the work elsewhere. This can also facilitate the use of off-site processing power for more in-depth analysis of historical data sets and trends for such things as predicting part failures. The Army is not alone in pursuing CBM. The U.S. Navy has integrated CBM into its Arleigh Burke-class DDG engineering main spaces, meaning “ship and shore engineers have real maintenance data available, in real time, at their fingertips.”

However, the very information that enables this arrangement and the benefits it brings also creates risk. Every data link or information conduit created for the benefit of an operator is a point of vulnerability that can be targeted and potentially exploited, whether to reveal or to corrupt potentially crucial information. This applies not only to CBM, but more dramatically to the C2 circuits for unmanned systems. I’m by no means the first to point out that CBM and similar systems make tempting targets. UAV hacking has garnered a great deal of attention in the past year, but the Bloomberg article confirms that an active interest also exists in exploiting lower-profile access points.

This raises several questions for CBM and remote diagnostics, not least of which is “is it worth it?” At what point does the benefit derived from the remote access become outweighed by the risks of that access being compromised? Given the sophistication of adversary hacking, should planners operate from the starting assumption that the data will be exploited and limit the extent of its use to non-critical systems? If operating under this assumption, should “cyber defense” attempts to protect this information be kept to a minimum so as not to incur unnecessary additional costs? Or should the resources be devoted to make the access as secure as the C2 systems allowing pilots to fly drones in Afghanistan from Nevada?

Scott is a former active duty U.S. Navy Surface Warfare Officer, and the former editor of Surface Warfare magazine. He now serves as an officer in the Navy Reserve and civilian writer/editor at the Pentagon. Scott is a graduate of Georgetown University and the U.S. Naval War College.

Note: The views expressed above are solely those of the authors and do not necessarily represent those of their governments, militaries, or the Center for International Maritime Security.

Optimizing Reactor Plant Maintenance: The Case for Shipboard SLMs

Project Tango and Communicating the Problem

The Full Cost of Remote Diagnostics

Fostering the Discussion on Securing the Seas.