[2602.20202] Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
Summary
This paper evaluates the reliability of digital forensic evidence identified by large language models (LLMs), proposing a structured framework for artifact extraction and validation.
Why It Matters
As AI technologies become integral to forensic investigations, ensuring the reliability of AI-identified evidence is crucial for legal integrity. This study addresses significant challenges in digital forensics with a methodology that improves accuracy and traceability, both vital for law enforcement and legal proceedings.
Key Takeaways
- The proposed framework automates forensic artifact extraction and validation.
- Achieved over 95% accuracy in artifact extraction on a 13 GB forensic image dataset (61 applications, 2,864 databases, 5,870 tables).
- Utilizes a Digital Forensic Knowledge Graph to enhance evidence reliability.
- Addresses challenges of credibility and integrity in AI-assisted digital forensics.
- Supports chain-of-custody adherence and contextual consistency in forensic relationships.
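To make the deterministic Unique Identifiers (UIDs) mentioned above concrete, here is a minimal sketch of one way such an identifier could be derived. The paper does not publish its UID scheme, so the field choices and the function name `artifact_uid` below are assumptions; the idea is simply that hashing stable artifact attributes yields the same identifier on every run, which supports traceability and auditability.

```python
import hashlib

def artifact_uid(app: str, database: str, table: str, content_hash: str) -> str:
    """Derive a deterministic UID from stable artifact fields (hypothetical scheme).

    The same inputs always produce the same identifier, so an artifact can be
    cross-referenced across extraction runs and audit logs.
    """
    # Join fields in a fixed order with an unambiguous separator, then hash.
    canonical = "|".join([app, database, table, content_hash])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Repeated runs over the same artifact fields yield identical UIDs.
uid1 = artifact_uid("ChatApp", "messages.db", "conversations", "a3f1")
uid2 = artifact_uid("ChatApp", "messages.db", "conversations", "a3f1")
assert uid1 == uid2
```

Because the UID depends only on the artifact's own fields, two independent analysts processing the same image would derive the same identifiers, which is what makes cross-referencing and chain-of-custody checks possible.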
Paper Details
Computer Science > Cryptography and Security, arXiv:2602.20202 (cs)
Submitted on 22 Feb 2026
Authors: Jeel Piyushkumar Khatiwala, Daniel Kwaku Ntiamoah Addai, Weifeng Xu
Abstract
The growing reliance on AI-identified digital evidence raises significant concerns about its reliability, particularly as large language models (LLMs) are increasingly integrated into forensic investigations. This paper proposes a structured framework that automates forensic artifact extraction, refines data through LLM-driven analysis, and validates results using a Digital Forensic Knowledge Graph (DFKG). Evaluated on a 13 GB forensic image dataset containing 61 applications, 2,864 databases, and 5,870 tables, the framework ensures artifact traceability and evidentiary consistency through deterministic Unique Identifiers (UIDs) and forensic cross-referencing. We propose this methodology to address challenges in ensuring the credibility and forensic integrity of AI-identified evidence, reducing classification errors, and advancing scalable, auditable methodologies. A comprehensive case study on this dataset demonstrates the framework's effectiveness, achieving over 95 percent ...
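The abstract's validation step, checking an LLM-identified artifact against a Digital Forensic Knowledge Graph, can be sketched in miniature. The graph structure and entries below are invented for illustration (the paper's actual DFKG is far richer); the point is that an artifact is accepted only when its app/database/table path is contextually consistent with known forensic relationships.

```python
# Hypothetical, drastically simplified DFKG: app -> database -> known tables.
DFKG = {
    "ChatApp": {"messages.db": {"conversations", "contacts"}},
    "Browser": {"history.db": {"urls", "visits"}},
}

def validate_artifact(app: str, database: str, table: str) -> bool:
    """Return True only if the (app, database, table) path exists in the graph,
    i.e. the LLM-identified artifact is contextually consistent."""
    return table in DFKG.get(app, {}).get(database, set())

# A consistent artifact passes; a mismatched one is flagged for review.
assert validate_artifact("ChatApp", "messages.db", "conversations")
assert not validate_artifact("ChatApp", "messages.db", "urls")
```

In a real pipeline the failing case would not be discarded outright but routed to a human examiner, since a mismatch may indicate either an LLM classification error or a gap in the knowledge graph.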