Key LLM-Related Papers - 2026-02-18

1. Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings


2. This human study did not involve human subjects: Validating LLM simulations as behavioral evidence


3. Recursive Concept Evolution for Compositional Reasoning in Large Language Models


4. PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra


5. CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving


6. Quantifying construct validity in large language model evaluations


7. GenAI-LA: Generative AI and Learning Analytics Workshop (LAK 2026), April 27–May 1, 2026, Bergen, Norway


8. Improving LLM Reliability through Hybrid Abstention and Adaptive Detection


9. World-Model-Augmented Web Agents with Action Correction


10. AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents


11. EAA: Automating materials characterization with vision language model agents


12. Secure and Energy-Efficient Wireless Agentic AI Networks


13. Mind the (DH) Gap! A Contrast in Risky Choices Between Reasoning and Conversational LLMs


14. Panini: Continual Learning in Token Space via Structured Memory


15. Protecting Language Models Against Unauthorized Distillation through Trace Rewriting


16. ResearchGym: Evaluating Language Model Agents on Real-World AI Research


17. CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing


18. Decision Quality Evaluation Framework at Pinterest


19. The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety


20. ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models


21. Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation


22. A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models


23. Revisiting Northrop Frye’s Four Myths Theory with Large Language Models


24. Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections


25. STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens


26. The geometry of online conversations and the causal antecedents of conflictual discourse


27. VLM-DEWM: Dynamic External World Model for Verifiable and Resilient Vision-Language Planning in Manufacturing


28. Improving MLLMs in Embodied Exploration and Question Answering with Human-Inspired Memory Modeling


29. SecCodeBench-V2 Technical Report


30. Logit Distance Bounds Representational Similarity


31. ActionCodec: What Makes for Good Action Tokenizers


32. Orchestration-Free Customer Service Automation: A Privacy-Preserving and Flowchart-Guided Framework


33. Far Out: Evaluating Language Models on Slang in Australian and Indian English


34. GMAIL: Generative Modality Alignment for generated Image Learning


35. Automated Multi-Source Debugging and Natural Language Error Explanation for Dashboard Applications


36. NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question Answering


37. Fine-Tuning LLMs to Generate Economical and Reliable Actions for the Power Grid


38. Prescriptive Scaling Reveals the Evolution of Language Model Capabilities


39. Unforgeable Watermarks for Language Models via Robust Signatures


40. On Surprising Effectiveness of Masking Updates in Adaptive Optimizers


41. Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs


42. Visual Persuasion: What Influences Decisions of Vision-Language Models?


43. How to Train Your Long-Context Visual Document Model


44. Automatically Finding Reward Model Biases


45. Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems


46. OpaqueToolsBench: Learning Nuances of Tool Behavior Through Interaction


47. Weight space Detection of Backdoors in LoRA Adapters


48. ScrapeGraphAI-100k: A Large-Scale Dataset for LLM-Based Web Information Extraction


49. Beyond Context Sharing: A Unified Agent Communication Protocol (ACP) for Secure, Federated, and Autonomous Agent-to-Agent (A2A) Orchestration


50. Indic-TunedLens: Interpreting Multilingual Models in Indian Languages


51. CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis


52. EduResearchBench: A Hierarchical Atomic Task Decomposition Benchmark for Full-Lifecycle Educational Research


53. LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets