[arXiv Digest] 2025-07-08

1. When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors

Authors: Scott Emmons, Erik Jenner, David K. Elson, Rif A. Saurous, Senthooran Rajamanoharan, Heng Chen, Irhum Shafkat, Rohin Shah
URL: https://arxiv.org/abs/2507.05246
Summary: Chain-of-thought (CoT) monitoring is an appealing AI safety defense, but recent work on “unfaithfulness” has cast doubt on its reliability.

2. Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration

Authors: Benjamin Li, Shuyang Shi, Lucia Romero, Huao Li, Yaqi Xie, Woojun Kim, Stefanos Nikolaidis, Michael Lewis, Katia Sycara, Simon Stepputtis
URL: https://arxiv.org/abs/2507.05244
Summary: TALENTS is a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to human partners in real time. Such adaptation is particularly challenging in tasks with time pressure.

3. SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity’s Last Exam?

Authors: Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xinyu Zhu, Mengcheng Zhou, Yanfeng Wang, Weinan E, Siheng Chen
URL: https://arxiv.org/abs/2507.05241
Summary: The rapid advancement of AI agents has ignited the long-held ambition of leveraging them to accelerate scientific discovery. Humanity’s Last Exam (HLE) provides an exceptionally challenging touchstone for evaluating scientific AI agents.

4. MedGemma Technical Report

Authors: Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-Bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry (Dima) Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang
URL: https://arxiv.org/abs/2507.05201
Summary: Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerating the development of healthcare AI applications. MedGemma is a collection of medical vision-language foundation models based on Gemma 3 4B and 27B.

5. GIST: Cross-Domain Click-Through Rate Prediction via Guided Content-Behavior Distillation

Authors: Wei Xu, Haoran Li, Baoyuan Ou, Lai Xu, Yingjie Qin, Ruilong Su, Ruiwen Xu
URL: https://arxiv.org/abs/2507.05142
Summary: Cross-domain click-through rate (CTR) prediction aims to tackle data sparsity and cold-start problems in online advertising systems. Most existing methods rely on overlapping users to facilitate cross-domain transfer, but in real-world industrial settings, joint training struggles to learn optimal representations with different distributions.

6. Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift

Authors: Shixuan Liu, Yue He, Yunfei Wang, Hao Zou, Haoxiang Cheng, Wenjing Yang, Peng Cui, Zhong Liu
URL: https://arxiv.org/abs/2507.05110
Summary: Knowledge graph (KG) reasoning is a critical research area focused on inferring missing knowledge. The i.i.d. assumption underlying existing methods can easily be violated by unknown sample selection bias during training or agnostic distribution shifts during testing.

7. How Rules Represent Causal Knowledge: Causal Modeling with Abductive Logic Programs

Authors: Kilian Rückschloß, Felix Weitkämper
URL: https://arxiv.org/abs/2507.05088
Summary: This paper extends Pearl’s approach to causality and interventions to the setting of stratified abductive logic programs. It provides a translation of causal knowledge by building on philosophical foundations.

8. When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

Authors: Maxence Boels, Harry Robertshaw, Alejandro Granados, Prokar Dasgupta, Sebastien Ourselin
URL: https://arxiv.org/abs/2507.05011
Summary: Teleoperated robotic surgery provides natural expert demonstrations for imitation learning (IL), while reinforcement learning (RL) could potentially discover superior strategies through exploration. The paper presents the first comprehensive comparison of IL versus RL for surgical action planning.

9. Supported Abstract Argumentation for Case-Based Reasoning

Authors: Adam Gould, Gabriel de Olim Gaul, Francesca Toni
URL: https://arxiv.org/abs/2507.04994
Summary: Supported Abstract Argumentation for Case-Based Reasoning (sAA-CBR) is a binary classification model in which past cases engage in debates by arguing in favour of their labelling, attacking those with opposing labels or supporting those with agreeing labels.

10. MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction

Authors: Kaleem Ullah Qasim, Jiashu Zhang
URL: https://arxiv.org/abs/2507.04893
Summary: Accident severity prediction plays a critical role in transportation safety systems but is a persistently difficult task due to incomplete data, strong feature dependencies, and severe class imbalance. Existing methods often rely on monolithic models or black-box prompting, which struggle to scale in noisy, real-world settings.

11. DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine

Authors: Zewen Sun, Ruoxiang Huang, Jiahe Feng, Rundong Kong, Yuqian Wang, Hengyu Liu, Ziqi Gong, Yuyuan Qin, Yingxue Wang, Yu Wang
URL: https://arxiv.org/abs/2507.04877
Summary: Current large language models exhibit notable limitations in medical applications, particularly in conducting effective multi-turn dialogues and proactive questioning. These shortcomings hinder their practical application and effectiveness in simulating real-world diagnostic scenarios.

12. Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents

Authors: George Jagadeesh, Srikrishna Iyer, Michal Polanowski, Kai Xin Thia
URL: https://arxiv.org/abs/2507.04803
Summary: This study examines the feasibility of applying large language models (LLMs) to forecast the impact of traffic incidents on traffic flow. LLMs have several advantages over existing machine-learning-based solutions, such as not requiring a large training dataset and being able to utilize incident logs.

13. FurniMAS: Language-Guided Furniture Decoration using Multi-Agent System

Authors: Toan Nguyen, Tri Le, Quang Nguyen, Anh Nguyen
URL: https://arxiv.org/abs/2507.04770
Summary: We propose a multi-agent system for automatic furniture decoration. Given a human prompt and a household furniture item such as a working desk or a TV stand, our system automates the decoration process.

14. LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

Authors: Sungmin Lee, Minju Kang, Joonhee Lee, Seungyong Lee, Dongju Kim, Jingi Hong, Jun Shin, Pei Zhang, JeongGil Ko
URL: https://arxiv.org/abs/2507.04748
Summary: Question-answering (QA) interfaces powered by large language models (LLMs) are a promising direction for improving interactivity with HVAC systems. Enabling accurate, real-time, and context-aware interactions introduces unique challenges, including the integration of frequently updated sensor data.

15. Activation Steering for Chain-of-Thought Compression

Authors: Seyedarmin Azizi, Erfan Baghaei Potraghloo, Massoud Pedram
URL: https://arxiv.org/abs/2507.04742
Summary: Large language models excel at complex reasoning when they include intermediate steps. Verbose, English-heavy CoTs and concise, math-centric CoTs occupy distinct regions in the model’s residual-stream activation space.
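If two kinds of CoT occupy distinct regions of activation space, one common way to move between them is a difference-of-means steering vector. The sketch below is a minimal illustration under that assumption and is not necessarily the authors' method; the function names, the toy 2-dimensional "activations", and the `alpha` scale are all hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(verbose_acts, concise_acts):
    """Direction pointing from the verbose-CoT mean to the concise-CoT mean."""
    mean_verbose = mean_vector(verbose_acts)
    mean_concise = mean_vector(concise_acts)
    return [c - v for c, v in zip(mean_concise, mean_verbose)]


def steer(activation, vector, alpha):
    """Shift one activation along the steering direction, scaled by alpha."""
    return [a + alpha * s for a, s in zip(activation, vector)]


# Toy data standing in for residual-stream activations (hypothetical values).
verbose = [[1.0, 0.0], [3.0, 0.0]]
concise = [[0.0, 2.0], [0.0, 4.0]]

direction = steering_vector(verbose, concise)
shifted = steer([2.0, 0.0], direction, alpha=1.0)
```

With `alpha=1.0` the verbose mean is moved exactly onto the concise mean; smaller values give a partial shift.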

16. ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning

Authors: Zhirong Chen, Kaiyan Chang, Zhuolin Li, Xinyang He, Chujie Chen, Cangyuan Li, Mengdi Wang, Haobo Xu, Yinhe Han, Ying Wang
URL: https://arxiv.org/abs/2507.04736
Summary: Large language models (LLMs) show significant potential for automating RTL code generation, but current approaches face a critical challenge: they cannot simultaneously optimize for functional correctness and hardware quality. Post-processing techniques that attempt to improve PP can improve performance.

17. LumiCRS: Asymmetric Contrastive Prototype Learning for Long-Tail Conversational Movie Recommendation

Authors: Jinzhi Wang, Bin Li, Qingke Peng, Haozhou Li, Zeyuan Zeng, Ruimeng Li, Biyi Zhou
URL: https://arxiv.org/abs/2507.04722
Summary: Head movies, only 10% of titles, account for nearly half of all mentions, while tail movies, about 70% of titles, receive merely 26% of the attention. This imbalance gives rise to three critical challenges.

18. Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs

Authors: Roozbeh Yousefzadeh, Xuenan Cao
URL: https://arxiv.org/abs/2507.04719
Summary: This position paper provides a critical but constructive discussion of current benchmarking and evaluation practices for formal reasoning. We identify practices that create barriers to contributing to this field and suggest ways to remove them.

19. Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message

Authors: Wei Duan, Li Qian
URL: https://arxiv.org/abs/2507.04673
Summary: The rise of conversational interfaces has greatly enhanced usability, but reliance on conversation history introduces an unexplored attack surface: a malicious payload is injected into a model-attributed message.

20. Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

Authors: Yun Qu, Qi Cheems Wang, Yixiu Mao, Vincent Tao Hu, Xiangyang Ji
URL: https://arxiv.org/abs/2507.04632
Summary: Recent advances have demonstrated the effectiveness of reinforcement learning (RL) finetuning in enhancing the reasoning capabilities of large language models (LLMs). The optimization process often requires numerous iterations to achieve satisfactory performance.

21. DisMS-TS: Eliminating Redundant Multi-Scale Features for Time Series Classification

Authors: Zhipeng Liu, Peibo Duan, Binwu Wang, Xuan Tang, Qi Chu, Changsheng Zhang, Yongsheng Huang, Bin Zhang
URL: https://arxiv.org/abs/2507.04600
Summary: Real-world time series typically exhibit complex temporal variations. Existing analysis-based time series prediction methods fail to eliminate redundant scale-shared features across multi-scale time series, causing the model to over- or under-focus on temporal patterns.

22. Exploring Core and Periphery Precepts in Biological and Artificial Intelligence: An Outcome-Based Perspective

Authors: Niloofar Shadab, Tyler Cody, Alejandro Salado, Taylan G. Topcu, Mohammad Shadab, Peter Beling
URL: https://arxiv.org/abs/2507.04594
Summary: Engineering methodologies revolve around established principles of decomposition and recomposition, which involve partitioning inputs and outputs at the component level. This view does not transfer well to intelligent systems, particularly when addressing the scaling of intelligence as a system property.

23. Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Authors: Yuanzhe Hu, Yu Wang, Julian McAuley
URL: https://arxiv.org/abs/2507.05257
Summary: Benchmarks for LLM agents focus on evaluating reasoning, planning, and execution capabilities. Another critical component, memory, which encompasses how agents memorize, update, and retrieve long-term information, is under-evaluated.

24. From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

Authors: Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp, Moritz Sackmann, Ulrich Hofmann, Marco Caccamo, Christoph Stiller
URL: https://arxiv.org/abs/2507.05254
Summary: APG predicts traffic participants’ future trajectories in dynamic environments. Compared to previous models, joint prediction models explicitly account for interactions between agents.

25. Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving

Authors: Elahe Delavari, Feeza Khan Khanzada, Jaerock Kwon
URL: https://arxiv.org/abs/2507.05251
Summary: Reinforcement learning (RL) allows agents to learn control policies through interaction with environments. Large, high-dimensional action spaces, often used to support fine-grained control, can impede training efficiency and increase exploration costs.

26. CTA: Cross-Task Alignment for Better Test Time Training

Authors: Samuel Barbeau, Pedram Fekri, David Osowiechi, Ali Bahri, Moslem Yazdanpanah, Masih Aminbeidokhti, Christian Desrosiers
URL: https://arxiv.org/abs/2507.05221
Summary: Test-time training has emerged as an effective method to enhance model robustness by incorporating an auxiliary unsupervised task during training and leveraging it for model updates at test time.

27. All in One: Visual-Description-Guided Unified Point Cloud Segmentation

Authors: Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong, Jinhong Wang, Rao Muhammad Anwer
URL: https://arxiv.org/abs/2507.05211
Summary: Unified segmentation of 3D point clouds is crucial for scene understanding but is hindered by their sparse structure, limited annotations, and the challenge of distinguishing fine-grained objects in complex environments. To address these challenges, we propose VDG-Uni3DSeg, a novel framework.

28. EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling

Authors: Boyuan Wang, Xinpan Meng, Xiaofeng Wang, Zheng Zhu, Angen Ye, Yang Wang, Zhiqin Yang, Chaojun Ni, Guan Huang, Xingang Wang
URL: https://arxiv.org/abs/2507.05198
Summary: The rapid advancement of embodied AI has led to an increasing demand for large-scale, high-quality real-world data. However, collecting such embodied data remains costly and inefficient.

29. Train-before-Test Harmonizes Language Model Rankings

Authors: Guanhua Zhang, Ricardo Dominguez-Olmedo, Moritz Hardt
URL: https://arxiv.org/abs/2507.05195
Summary: Conflicting rankings hamper model selection, cloud comparisons, and add confusion to a growing ecosystem of competing models. A candidate solution to the problem is to train on the test task.

30. Infrastructuring Contestability: A Framework for Community-Defined AI Value Pluralism

Authors: Andreas Mayer
URL: https://arxiv.org/abs/2507.05187
Summary: The proliferation of AI-driven systems presents a fundamental challenge to human-computer interaction and computer-supported cooperative work. Current approaches to value alignment lack mechanisms for meaningful contestability, leaving users unable to challenge or shape the values embedded in the systems that govern their digital lives.

31. CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale

Authors: Jonathan Hyun, Nicholas R Waytowich, Boyuan Chen
URL: https://arxiv.org/abs/2507.05178
Summary: Despite rapid progress in large language model (LLM)-based multi-agent systems, current benchmarks fall short in evaluating their scalability, robustness, and coordination capabilities. Existing environments typically focus on small-scale, fully observable, or low-complexity domains.

32. OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model

Authors: Chen Wang, Tianyu Peng, Wen Yang, Yinan Bai, Guangfu Wang, Jun Lin, Lanpeng Jia, Lingxiang Wu, Jinqiao Wang, Chengqing Zong, Jiajun Zhang
URL: https://arxiv.org/abs/2507.05177
Summary: The most powerful empathetic large speech language models (LSLMs) are closed, leaving crucial details about their architecture, data, and development opaque to researchers. OpenS2S is a fully open-source, transparent, end-to-end empathetic LSLM.

33. Critiques of World Models

Authors: Eric Xing, Mingkai Deng, Jinyu Hou, Zhiting Hu
URL: https://arxiv.org/abs/2507.05169
Summary: There has been much debate on what a world model really is, how to build it, and how to evaluate it. In this essay, we draw inspiration from the concept of “hypothetical thinking”.

34. LAID: Lightweight AI-Generated Image Detection in Spatial and Spectral Domains

Authors: Nicholas Chivaran, Jianbing Ni
URL: https://arxiv.org/abs/2507.05162
Summary: Current state-of-the-art AI-generated image (AIGI) detection methods typically rely on large, deep neural architectures. This creates significant computational barriers to real-time, large-scale deployment on social media platforms.

35. AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models

Authors: Chinnappa Guggilla, Budhaditya Roy, Trupti Ramdas Chavan, Abdul Rahman, Edward Bowen
URL: https://arxiv.org/abs/2507.05157
Summary: Large language models (LLMs) adapt to various styles and genres and produce content that is both grammatically correct and semantically meaningful. Recently, they have been misused to create highly realistic phishing emails, spread fake news, and generate code to automate cybercrime.

36. Effects of Unplanned Incoming Flights on Airport Relief Processes after a Major Natural Disaster

Authors: Luka Van de Sype, Matthieu Vert, Alexei Sharpanskykh, Seyed Sahand Mohammadi Ziabari
URL: https://arxiv.org/abs/2507.05150
Summary: After a major natural disaster, airports are important hubs where relief aid arrives and from which people need to be evacuated. The airport often becomes a bottleneck due to the sudden need for increased capacity.

37. OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows

Authors: Tom Hickling, Jonathan F. MacArt, Justin Sirignano, Den Waidmann
URL: https://arxiv.org/abs/2507.05149
Summary: Engineering quantities of interest typically take the form of time-average statistics such as $\frac{1}{t} \int_0^t f \dots$. Optimization over $F(x; \theta)$ has many engineering applications, including geometric optimization.
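The time-average statistic mentioned in this summary, (1/t) times the integral of f from 0 to t, can be illustrated numerically. The sketch below is not taken from the paper: it assumes a toy scalar signal sin^2(tau) in place of a turbulent-flow quantity and uses a plain left Riemann sum, and every name in it is hypothetical. It only shows how the running average of an unsteady signal settles toward a steady-state value as t grows.

```python
import math


def running_time_average(f, dt, n_steps):
    """Left Riemann-sum approximation of (1/t) * integral_0^t f(tau) dtau,
    where t = n_steps * dt."""
    total = 0.0
    for k in range(n_steps):
        total += f(k * dt) * dt
    return total / (n_steps * dt)


def signal(tau):
    """Toy unsteady signal; its long-time average is 0.5."""
    return math.sin(tau) ** 2


short_avg = running_time_average(signal, dt=0.01, n_steps=100)      # t = 1
long_avg = running_time_average(signal, dt=0.01, n_steps=100_000)   # t = 1000
```

Over the short window the average is still far from 0.5, while over the long window it is within about 1e-3 of it, which is the sense in which a time average converges to a steady-state statistic.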

38. Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization

Authors: Jaewook Lee, Alexander Scarlatos, Andrew Lan
URL: https://arxiv.org/abs/2507.05137
Summary: Japanese combines syllabaries like hiragana with kanji, which are logographic characters of Chinese origin. Keyword mnemonics are a common strategy to aid memorization.