[2602.10149] Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires

arXiv - AI March 05, 2026 4 min read

About this article

Abstract page for arXiv paper 2602.10149: Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires

Computer Science > Cryptography and Security

arXiv:2602.10149 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires

Authors: Ali Nour Eldin, Mohamed Sellami, Walid Gaaloul, Julien Steunou

Abstract: Third-Party Risk Assessment (TPRA) is a core cybersecurity practice for evaluating suppliers against standards such as ISO/IEC 27001 and NIST. TPRA questionnaires are typically drawn from large repositories of security and compliance questions, yet tailoring assessments to organizational needs remains a largely manual process. Existing retrieval approaches rely on keyword or surface-level similarity, which often fails to capture implicit assessment scope and control semantics. This paper explores strategies for organizing and retrieving TPRA cybersecurity questions using semantic labels that describe both control domains and assessment scope. We compare direct question-level labeling with a Large Language Model (LLM) against a hybrid semi-supervised semantic labeling (SSSL) pipeline that clusters questions in embedding space, labels a small representative subset using an LLM, and propagates labels to remaining questions...
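The three SSSL steps named in the abstract (cluster, label representatives, propagate) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract is truncated and does not say which clustering algorithm, embedding model, or propagation rule the paper uses. Here we assume plain k-means, one representative per cluster (the member nearest the centroid), and propagation of that representative's label to every member of its cluster. The 2-D vectors, the question texts, and the `stub_labeler` function standing in for an LLM call are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors: List[List[float]], k: int,
           iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means, seeded deterministically with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: dist(v, centroids[c]))
                      for v in vectors]
        if new_assign == assign:
            break  # assignments stopped changing: converged
        assign = new_assign
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def sssl_labels(questions: List[str], embeddings: List[List[float]], k: int,
                labeler: Callable[[str], str]) -> List[str]:
    """Cluster, label one representative per cluster, propagate to members."""
    assign, centroids = kmeans(embeddings, k)
    labels = [""] * len(questions)
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Assumption: the representative is the member nearest the centroid.
        rep = min(members, key=lambda i: dist(embeddings[i], centroids[c]))
        cluster_label = labeler(questions[rep])  # the only labeler call per cluster
        for i in members:
            labels[i] = cluster_label  # assumption: propagate within the cluster
    return labels


# Hypothetical toy data: invented questions with hand-made 2-D "embeddings".
questions = [
    "Do you encrypt data at rest?",
    "Is there a documented incident response plan?",
    "Are encryption keys rotated regularly?",
    "Are incidents reported to customers?",
    "Is TLS enforced for data in transit?",
    "Do you run incident response drills?",
]
embeddings = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2],
              [0.2, 0.8], [1.0, 0.0], [0.0, 1.0]]

labeled_by_stub: List[str] = []  # records which questions reached the "LLM"


def stub_labeler(question: str) -> str:
    """Hypothetical stand-in for an LLM labeling call (keyword rule only)."""
    labeled_by_stub.append(question)
    return "Cryptography" if "encrypt" in question.lower() else "Incident Management"


labels = sssl_labels(questions, embeddings, k=2, labeler=stub_labeler)
for q, label in zip(questions, labels):
    print(f"{label:20s} {q}")
```

In this toy run the stand-in labeler is called once per cluster rather than once per question, which is the cost argument for a semi-supervised pipeline over direct question-level labeling. The TLS question also receives its label through its cluster, even though the keyword stub alone would not have matched it.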

Originally published on March 05, 2026. Curated by AI News.

