Clever Hans in the Loop? A Critical Examination of ChatGPT in a Human-in-the-Loop Framework for Machinery Functional Safety Risk Analysis - 2025

Research Area: Machine Learning

Abstract:

This paper presents a first-of-its-kind evaluation of integrating Large Language Models (LLMs) within a Human-In-The-Loop (HITL) framework for risk analysis in machinery functional safety, adhering to ISO 12100. The methodology systematically addresses LLM limitations, such as hallucinations and lack of domain-specific expertise, by embedding expert oversight to ensure reliable and compliant outputs. Applied to four diverse industrial case studies (motorized gates, autonomous transport vehicles, weaving machines, and rotary printing presses), this study assesses the applicability of ChatGPT to routine risk analysis tasks central to machinery functional safety workflows, such as hazard identification and risk assessment. The results demonstrated substantial improvements: with HITL involvement and subsequent iterations of risk assessment incorporating expert feedback, complete agreement with the ground truth was achieved across all four use cases. ChatGPT also identified additional scenarios and edge cases, enriching the risk analysis. Efficiency gains were notable, with time efficiency rated at 4.95 out of 5 on average across the case studies. Overall accuracy (4.7 out of 5) and usability (4.8 out of 5) ratings demonstrated the robustness of the HITL framework in ensuring reliable and practical outputs. These Likert-scale evaluations reflected high confidence in the refined outputs, emphasizing the critical role of HITL in enhancing both trust and usability. The study also highlights the importance of prompt design, revealing that longer initial prompts improve accuracy, while shorter iterative prompts maintain usability without compromising efficiency. The iterative HITL process further ensures that refined outputs align with safety standards and practical requirements.
This evaluation underscores the transformative potential of generative AI in functional safety workflows, enhancing routine activities while ensuring rigorous human oversight in safety-critical, regulated industries.

Keywords:
risk analysis; large language models (LLMs); human-in-the-loop (HITL); ChatGPT; functional safety; ISO 12100; ISO 13849; hazard identification; safety-critical systems

Author(s) Name: Padma Iyenghar

Journal name: Eng

Publisher name: MDPI

DOI: 10.3390/eng6020031

Volume Information: Volume 6 (2025)

Paper Link: https://www.mdpi.com/2673-4117/6/2/31
