[2602.15791] Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings

arXiv - AI · 4 min read · Article

Summary

This paper presents a novel approach that uses large language model embeddings as encodings to preserve building semantics during AI model training, demonstrating improved performance over traditional one-hot encoding.

Why It Matters

The research addresses the critical need for accurate semantic representation in the architecture, engineering, construction, and operation (AECO) industry. By using large language model embeddings as encodings, the study shows how AI models can better interpret complex building semantics, which can improve model training and downstream applications in this sector.

Key Takeaways

  • Large language model encodings improve semantic comprehension in AI training.
  • The proposed method outperforms traditional one-hot encoding techniques (see the encoding sketch after this list).
  • LLM embeddings can effectively classify building object subtypes in BIMs.
  • Dimensionality reduction techniques enhance the utility of LLM encodings.
  • The findings suggest broader applications for AI in the AECO industry.
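
To make the comparison with one-hot encoding concrete, here is a minimal sketch that embeds a handful of building-object subtype names and compacts the vectors to 1,024 dimensions. The embedding model (text-embedding-3-large), the example subtype names, and the use of OpenAI's dimensions parameter as the Matryoshka-style compaction are illustrative assumptions; the summary does not specify the paper's exact embedding pipeline.

```python
# Minimal sketch: LLM embeddings vs. one-hot vectors for building-object
# subtype labels. Model name, subtype strings, and the `dimensions` parameter
# (a Matryoshka-style truncation) are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A few example BIM subtypes; the paper classifies 42 of them.
subtypes = ["curtain wall", "shear wall", "partition wall", "fire door", "sliding door"]

# Baseline: one-hot encoding treats every pair of subtypes as equally dissimilar.
one_hot = np.eye(len(subtypes))

# LLM encoding: embed the subtype names so that semantically close subtypes
# (e.g., the three wall types) land near each other in vector space.
resp = client.embeddings.create(
    model="text-embedding-3-large",  # assumed model, not necessarily the paper's
    input=subtypes,
    dimensions=1024,                 # compacted 1,024-dimensional embeddings
)
llm_enc = np.array([d.embedding for d in resp.data])

# Cosine similarities expose the graded relationships that one-hot cannot express.
unit = llm_enc / np.linalg.norm(llm_enc, axis=1, keepdims=True)
print(np.round(unit @ unit.T, 2))
```

The similarity matrix will typically show higher values among the wall subtypes than between walls and doors, the kind of graded relationship an identity one-hot matrix cannot represent.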

Abstract

Computer Science > Artificial Intelligence · arXiv:2602.15791 (cs) · Submitted on 17 Feb 2026
Title: Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings
Authors: Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee

Accurate representation of building semantics, encompassing both generic object types and specific subtypes, is essential for effective AI model training in the architecture, engineering, construction, and operation (AECO) industry. Conventional encoding methods (e.g., one-hot) often fail to convey the nuanced relationships among closely related subtypes, limiting AI's semantic comprehension. To address this limitation, this study proposes a novel training approach that employs large language model (LLM) embeddings (e.g., OpenAI GPT and Meta LLaMA) as encodings to preserve finer distinctions in building semantics. We evaluated the proposed method by training GraphSAGE models to classify 42 building object subtypes across five high-rise residential building information models (BIMs). Various embedding dimensions were tested, including original high-dimensional LLM embeddings (1,536, 3,072, or 4,096) and 1,024-dimensional compacted embeddings generated via the Matryoshka representation model. Experimental results demonstrated that LLM encodings...
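
As a rough illustration of the classification setup described in the abstract, the sketch below wires compacted LLM encodings into a two-layer GraphSAGE node classifier using PyTorch Geometric. The graph here is synthetic placeholder data: how BIM objects are connected into a graph, the hidden size, and the training loop are assumptions for illustration, not the paper's configuration.

```python
# Minimal GraphSAGE sketch (PyTorch Geometric): each BIM object is a graph node
# whose feature vector is a (compacted) LLM encoding; edges stand in for
# spatial/topological relationships. All sizes and data below are placeholders.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import SAGEConv

NUM_NODES, ENC_DIM, NUM_SUBTYPES = 200, 1024, 42

# Placeholder graph: random LLM-encoding features and random connectivity.
x = torch.randn(NUM_NODES, ENC_DIM)
edge_index = torch.randint(0, NUM_NODES, (2, 800))
y = torch.randint(0, NUM_SUBTYPES, (NUM_NODES,))
data = Data(x=x, edge_index=edge_index, y=y)

class SubtypeClassifier(torch.nn.Module):
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden)
        self.conv2 = SAGEConv(hidden, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = SubtypeClassifier(ENC_DIM, 256, NUM_SUBTYPES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(50):
    optimizer.zero_grad()
    logits = model(data.x, data.edge_index)
    loss = F.cross_entropy(logits, data.y)  # 42-way subtype classification
    loss.backward()
    optimizer.step()
```

In the paper's setting, the node features would carry the objects' LLM-encoded semantic information (or its 1,024-dimensional Matryoshka-compacted version), and one-hot features of matching role would serve as the baseline the study compares against.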

