[arXiv Digest] 2025-07-24

1. Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations

Authors: Zhao Song, Song Yue, Jiahao Zhang
URL: https://arxiv.org/abs/2507.17699
Summary: Reasoning models are a central focus of today's large language model (LLM) research. These models are designed to output a step-by-step thinking process before arriving at a final answer, so that they can handle complex reasoning tasks. However, this thinking process may not actually enhance reasoning ability.

2. Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks

Authors: Ilias Chatzistefanidis, Navid Nikaein
URL: https://arxiv.org/abs/2507.17695
Summary: Autonomous agents are expected to play a vital role in the evolution of 6G networks. This shift facilitates the transition away from a specialized-intelligence approach, in which artificial intelligence algorithms handle isolated tasks. Agents possess broader reasoning capabilities and can manage diverse network functions.

3. Simulating multiple human perspectives in socio-ecological systems using large language models

Authors: Yongchao Zeng, Calum Brown, Ioannis Kyriakou, Ronja Hotz, Mark Rounsevell
URL: https://arxiv.org/abs/2507.17680
Summary: To enable alternative, simulation-based exploration of different stakeholder perspectives, we develop the HoPeS (Human-Oriented Perspective Shifting) modelling framework. Users can step into the agent roles to experience perspectival differences.

4. Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning

Authors: Xinyao Liu, Diping Song
URL: https://arxiv.org/abs/2507.17539
Summary: Multimodal large language models (MLLMs) demonstrate significant potential in the field of medical diagnosis. However, they face critical challenges in specialized domains such as ophthalmology, particularly the fragmentation of annotation granularity.

5. An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models

Authors: Haoran Sun, Zekun Zhang, Shaoning Zeng
URL: https://arxiv.org/abs/2507.17477
Summary: Large language models have demonstrated remarkable progress in instruction following and general-purpose reasoning. However, achieving high-quality alignment with human intent and safety norms without human annotations remains a fundamental challenge.

6. Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments

Authors: Shitong Zhu, Chenhao Fang, Derek Larson, Neel Reddy Pochareddy, Rajeev Rao, Sophie Zeng, Yanqing Peng, Wendy Summer, Alex Goncalves, Arya Pudota, Herve Robert
URL: https://arxiv.org/abs/2507.17289
Summary: Compliance Brain Assistant (CBA) is a conversational, agentic AI assistant designed to boost the efficiency of compliance tasks for personnel in enterprise environments. We design a user query router that can choose between modes, including (i) FastTrack mode, which handles simple requests that only need additional relevant context retrieved from knowledge corpora.

7. Agent Identity Evals: Measuring Agentic Identity

Authors: Elija Perrier, Michael Timothy Bennett
URL: https://arxiv.org/abs/2507.17257
Summary: Central to the agentic capability and trustworthiness of language model agents (LMAs) is the extent to which they maintain a stable, reliable identity over time. However, LMAs inherit pathologies from large language models (LLMs) that can undermine their identifiability, continuity, persistence and consistency by interfering with their agentic capabilities.

8. Improving LLMs’ Generalized Reasoning Abilities by Graph Problems

Authors: Qifan Zhang, Nuo Chen, Zehua Li, Miao Peng, Jing Tang, Jia Li
URL: https://arxiv.org/abs/2507.17168
Summary: Large language models have made remarkable strides in reasoning tasks, but their performance often falters on novel and complex problems. We pioneer the use of Graph Problem Reasoning (GPR) to enhance the general reasoning capabilities of LLMs.

9. HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study

Authors: Mandar Pitale, Jelena Frtunikj, Abhinaw Priyadershi, Vasu Singh, Maria Spence
URL: https://arxiv.org/abs/2507.17118
Summary: The architecture of recent autonomous systems is trending toward end-to-end (E2E) monolithic architectures such as large language models (LLMs) and vision language models.

10. Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Authors: Linbo Cao, Jinman Zhao
URL: https://arxiv.org/abs/2507.17747
Summary: We propose a debate-driven evaluation paradigm that transforms any existing QA dataset into structured adversarial debates. One model is given the official answer to defend, and another constructs and defends an alternative answer.

11. Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Authors: Anisha Gunjal, Anthony Wang, Elaine Lau, Vaskar Nath, Bing Liu, Sean Hendryx
URL: https://arxiv.org/abs/2507.17746
Summary: Many tasks lack a single, unambiguous ground truth, making it difficult to define reliable reward signals. Traditional preference-based methods offer a workaround, but they rely on opaque reward functions that are difficult to interpret and prone to spurious correlations.

12. AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer

Authors: Danny D. Leybzon, Shreyas Tirumala, Nishant Jain, Summer Gillen, Michael Jackson, Cameron McPhee, Jennifer Schmidt
URL: https://arxiv.org/abs/2507.17718
Summary: Quantitative survey researchers can scale their studies by using AI to conduct phone interviews. Voice AI enables a more natural and adaptive respondent experience than IVR.

13. From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

Authors: Karen Zhou, John Giorgi, Pranav Mani, Peng Xu, Davis Liang, Chenhao Tan
URL: https://arxiv.org/abs/2507.17717
Summary: Existing automated metrics often fail to align with real-world physician preferences. To address this, we propose a pipeline that distills user feedback into structured checklists for note evaluation.

14. CASCADE: LLM-Powered JavaScript Deobfuscator at Google

Authors: Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan
URL: https://arxiv.org/abs/2507.17691
Summary: This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler. Gemini is employed to identify critical prelude functions ... IR (JSIR) ...

15. Enabling Cyber Security Education through Digital Twins and Generative AI

Authors: Vita Santa Barletta, Vito Bavaro, Miriana Calvano, Antonio Curci, Antonio Piccinno, Davide Pio Posa
URL: https://arxiv.org/abs/2507.17518
Summary: Digital twins (DTs) are gaining prominence in cybersecurity for their ability to replicate complex IT (Information Technology), OT (Operational Technology) and IoT (Internet of Things) infrastructures. Integrating DTs with penetration testing tools and large language models can enhance cybersecurity education and operational readiness.

16. MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs

Authors: Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing
URL: https://arxiv.org/abs/2507.17476
Summary: The evaluation of LLMs' multilingual reasoning capability across diverse languages and cultural contexts remains limited. Existing multilingual benchmarks are typically constructed by translating existing English reasoning benchmarks. In this work, we introduce MultiNRC, a native multilingual reasoning evaluation benchmark.

17. BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles

Authors: Junhua Liu, Roy Ka-Wei Lee, Kwan Hui Lim
URL: https://arxiv.org/abs/2507.17472
Summary: This work presents a novel approach to enhancing complex decision-making workflows through the integration of hierarchical learning alongside various enhancements. We propose an enhanced Byte-Pair Encoded, Gated Multi-head Hierarchical Attention Network (BGM-HAN).

18. Probing Vision-Language Understanding through the Visual Entailment Task: promises and pitfalls

Authors: Elena Pitta, Tom Kouwenhoven, Tessa Verhoef
URL: https://arxiv.org/abs/2507.17467
Summary: This study investigates the extent to which the Visual Entailment task serves as a reliable probe of vision-language understanding in multimodal language models. We conduct a series of experiments across zero-shot, few-shot and fine-tuning settings.

19. Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

Authors: Situo Zhang, Hanqi Li, Lu Chen, Zihan Zhao, Xuanze Lin, Zichen Zhu, Bo Chen, Xin Chen, Kai Yu
URL: https://arxiv.org/abs/2507.17448
Summary: Traditional graph-based and sequence-to-sequence models often lack generalized chemical knowledge, leading to predictions that are neither consistently accurate nor easily explainable. To address these challenges, we introduce retroDFM-R, a reasoning-based large language model.

20. Each to Their Own: Exploring the Optimal Embedding in RAG

Authors: Shiting Chen, Zijian Zhao, Jinsong Chen
URL: https://arxiv.org/abs/2507.17442
Summary: Methods for incorporating up-to-date information into LLMs, or for adding external knowledge to construct domain-specific models, have garnered wide attention. The various embedding models used in RAG differ in their training data and model architecture.

21. HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs

Authors: Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin
URL: https://arxiv.org/abs/2507.17394
Summary: Video Anomaly Detection (VAD) aims to identify and locate deviations from normal patterns in video sequences. Traditional methods often struggle with substantial computational demands and a reliance on extensive labeled datasets, which restricts their practical applicability.

22. Investigating Training Data Detection in AI Coders

Authors: Tianlin Li, Yunxiang Wei, Zhiming Li, Aishan Liu, Qing Guo, Xianglong Liu, Dongning Sun, Yang Liu
URL: https://arxiv.org/abs/2507.17389
Summary: Recent advances in code large language models (CodeLLMs) have made them indispensable tools in modern software engineering. However, these models occasionally produce outputs that contain proprietary or sensitive code snippets, so training data detection (TDD) has become a critical task.

23. DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning

Authors: Chuzhan Hao, Wenfeng Feng, Yuewei Zhang, Hao Wang
URL: https://arxiv.org/abs/2507.17365
Summary: Multi-step agentic retrieval systems based on large language models (LLMs) have demonstrated remarkable performance in complex information search tasks. However, these systems still face significant challenges in practical applications, particularly the generation of factually inconsistent intermediate queries.

24. A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model

Authors: Zhe Xu, Ziyi Liu, Junlin Hou, Jiabo Ma, Cheng Jin, Yihui Wang, Zhixuan Chen, Zhengyu Zhang, Zhengrui Guo, Fengtao Zhou, Yingxue Xu, Xi Wang, Ronald Cheong Kin Chan, Li Liang, Hao Chen
URL: https://arxiv.org/abs/2507.17303
Summary: Multimodal large language models (MLLMs) have emerged as powerful tools for computational pathology. They offer unprecedented opportunities to integrate pathological images with language context for comprehensive diagnostic analysis. However, current MLLM approaches in pathology demonstrate significantly constrained reasoning capabilities.

25. Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance

Authors: Rishi Parekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar
URL: https://arxiv.org/abs/2507.17273
Summary: Our framework integrates Knowledge Graphs (KGs) and Large Language Model (LLM)-based agents to analyze complex DES output data from warehouse operations. It transforms raw DES data into a semantically rich KG, capturing relationships.

26. Understanding Prompt Programming Tasks and Questions

Authors: Jenny T. Liang, Chenyang Yang, Agnia Sergeyuk, Travis D. Breaux, Brad A. Myers
URL: https://arxiv.org/abs/2507.17264
Summary: Developers are embedding prompts in software, producing what are known as prompt programs. Prompt programming requires the developer to make many changes to their prompt, yet the questions developers ask when updating their prompts are unknown.

27. A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task

Authors: Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata
URL: https://arxiv.org/abs/2507.17232
Summary: Large language models (LLMs) are trained on a vast amount of procedural texts, but they do not directly observe real-world phenomena. This poses a challenge, as intermediate states of ingredients are often omitted.

28. The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models

Authors: Giuseppe Russo, Debora Nozza, Paul Röttger, Dirk Hovy
URL: https://arxiv.org/abs/2507.17216
Summary: The paper presents a benchmark of 1,618 real-world moral dilemmas, each paired with a distribution of human moral judgments consisting of a binary evaluation and a free-text rationale. We treat this problem as a pluralistic distributional alignment task.

29. DesignLab: Designing Slides Through Iterative Detection and Correction

Authors: Jooyeol Yun, Heng Wang, Yotaro Shimose, Jaegul Choo, Shingo Takamatsu
URL: https://arxiv.org/abs/2507.17202
Summary: Design-related issues can be challenging for non-experts because of the complexity involved in navigating various design choices. Existing design tools often lack the ability to refine their output, which is a key aspect of real-world workflows.

30. LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks

Authors: Lijie Zheng, Ji He, Shih Yu Chang, Yulong Shen, Dusit Niyato
URL: https://arxiv.org/abs/2507.17188
Summary: This work tackles the physical layer security problem of maximizing the secrecy rate in heterogeneous UAV networks. We consider a realistic scenario in which UAVs with diverse payloads and computation resources collaborate to serve ground terminals in the presence of eavesdroppers.

31. SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs

Authors: Zhiqiang Liu, Enpei Niu, Yin Hua, Mengshu Sun, Lei Liang, Huajun Chen, Wen Zhang
URL: https://arxiv.org/abs/2507.17178
Summary: Large language models have made significant progress in understanding Structured Knowledge (SK) such as knowledge graphs (KGs) and tables. Existing evaluations of SK understanding are non-rigorous and focus on a single type of SK.

32. Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination

Authors: Mariam ALMutairi, Hyungmin Kim
URL: https://arxiv.org/abs/2507.17134
Summary: This paper presents a novel hybrid framework that integrates blockchain technology with a decentralized, large language model (LLM)-powered multi-agent negotiation system to enhance the resilience and accountability of medical supply chains during crises.

33. Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

Authors: Yufei He, Ruoyu Li, Alex Chen, Yue Liu, Yulin Chen, Yuan Sui, Cheng Chen, Yi Zhu, Luca Luo, Frank Yang, Bryan Hooi
URL: https://arxiv.org/abs/2507.17131
Summary: Agents often struggle in environments where rules and required domain knowledge change frequently. Current approaches, such as offline fine-tuning and standard prompting, are insufficient because they cannot adapt to new knowledge during actual operation.

34. BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving

Authors: Wanyi Zheng, Minxian Xu, Shengye Song, Kejiang Ye
URL: https://arxiv.org/abs/2507.17120
Summary: Large language models (LLMs) have become increasingly popular in various areas, with traditional businesses gradually shifting from rule-based systems to LLM-based solutions. Existing LLM serving systems often use static or continuous batching strategies.

35. Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models

Authors: Andrii Balashov
URL: https://arxiv.org/abs/2507.17107
Summary: RL fine-tuning consistently modifies only a small subnetwork (typically 5-30% of weights), leaving most parameters unchanged. We call this phenomenon RL-induced parameter update sparsity.
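
As an illustration of what "5-30% of weights" modified could mean in practice, the sketch below counts the fraction of parameters that differ between two checkpoints. It is a minimal example under stated assumptions, not the paper's measurement procedure: the function name `updated_fraction`, the tolerance parameter `tol`, and the toy weights stored as plain Python lists are all hypothetical.

```python
from typing import Dict, List


def updated_fraction(
    before: Dict[str, List[float]],
    after: Dict[str, List[float]],
    tol: float = 0.0,
) -> float:
    """Return the fraction of weights whose value changed by more than tol.

    `before` and `after` map a (hypothetical) parameter name to a flat
    list of weights; both must share the same names and lengths.
    """
    total = 0
    changed = 0
    for name, old in before.items():
        new = after[name]
        if len(old) != len(new):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        total += len(old)
        changed += sum(1 for a, b in zip(old, new) if abs(a - b) > tol)
    return changed / total if total else 0.0


# Toy checkpoints (made-up numbers): 2 of the 10 weights differ.
base = {
    "layer0": [0.1, 0.2, 0.3, 0.4],
    "layer1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}
tuned = {
    "layer0": [0.1, 0.2, 0.35, 0.4],
    "layer1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.5],
}

fraction = updated_fraction(base, tuned)
print(f"{fraction:.0%} of weights changed")  # prints "20% of weights changed"
```

With real checkpoints, the same count would be taken over each parameter tensor, and the choice of `tol` decides whether tiny numerical differences count as updates.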