📄 PaperBytes

Weekly AI Papers - 2026-05-09

📄 10 papers 🏛️ Big Tech 6 🔥 Trending 3
1
🏛️ Big Tech
Tencent Hunyuan

🔍 "The moment deep search becomes the model's 'hand' rather than its 'brain', search becomes the key to problem solving."

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

🏛️ Affiliation: Tencent Hunyuan (Big Tech)

🏷️ Keywords: multimodal search, agentic reinforcement learning, open-source recipe, deep search, trajectory synthesis

💭 Have you ever asked these questions?

  • “Search is just finding information; why does it need such a complex model?”
  • “I don't know how the model I use should search, or how it learns to.”
  • “I want to build my own search agent, but the data and training methods are so closed.”

Until now, search was an add-on capability of the model, and the training data and pipelines stayed private. This paper proposes a fully open recipe so that anyone can reproduce a deep search agent with the same performance.

Highlights:

  • Average gain of more than 10 points across 7 benchmarks
  • Trained on the SearchVL-SFT-36k and SearchVL-RL-8k datasets → the RL data contains 8,000 training trajectories

🎯 Why is this a game changer?

“Private data + complex pipelines + no handling of search failures” → “Open data + trajectory synthesis + an algorithm that removes the obstacle of tool failures”

2
🏛️ Big Tech
ByteDance

🧠 “A vision transformer that can speak? It turns out that is really possible!”

Let ViT Speak: Generative Language-Image Pre-training

🏛️ Affiliation: ByteDance (Big Tech)

🏷️ Keywords: Vision Transformer, Generative Pretraining, Multimodal LLM, Language Modeling, OCR

💭 Have you ever asked these questions?

  • “Is looking at images all a vision model does?”
  • “When training on text and images, is a decoder really necessary?”
  • “Is there a vision-language pre-training method that performs well even with little data?”

Core idea: Previously, the vision encoder and the language decoder were kept separate and trained with a contrastive setup. This paper trains a single transformer on visual and language tokens together, in a generative fashion, so that it supports customized dialogue.

Highlights:

  • Trained on 8B samples from Recap-DataComp-1B, GenLIP **matches or outperforms** strong existing baselines
  • Continued training on multi-resolution images **improves detail-sensitive tasks** such as OCR and chart understanding (reported figure: a 2.1% score gain on OCR)

🎯 Why is this a game changer?

Old approach → “Separate vision encoder and language decoder + contrastive batch construction”

New approach → “A single transformer trained generatively on visual and language tokens together, holding performance even with less data”

3
🏛️ Big Tech
alibaba-inc

🎨 "High-quality images in just a few steps? This paper overturns the limits of DMD with 'continuous time'!"

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

🏛️ Affiliation: alibaba-inc (Big Tech)

🏷️ Keywords: diffusion distillation, continuous-time optimization, distribution matching, few-step generation, reverse KL divergence

💭 Have you ever asked these questions?

  • “What if images could be generated much faster, in only a few steps?”
  • “Why does DMD so often have to bring in a GAN or a reward model?”
  • “If we train in continuous time, won't image detail be lost?”

Core idea: Standard DMD matches distributions only at a few fixed timesteps, which causes visual artifacts and over-smoothing and has required complex auxiliary modules to fix. This paper extends distribution matching to continuous time, matching at arbitrary points along the sampling trajectory, which improves visual quality and achieves strong results without auxiliary modules.

Highlights:

  • On SD3-Medium with 10-step generation, FID drops from 13.8 to **9.1** with CDM, a 34% reduction (baseline: vanilla DMD)
  • On Longcat-Image with 5-step generation, CLIP score rises from 1.42 to **1.68** with CDM, an 18% increase (baseline: vanilla DMD)
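
The two percentages follow directly from the before and after values quoted in this digest. A minimal check, where the helper name `relative_change` is ours and not from the paper:

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values as quoted in this digest (baseline: vanilla DMD).
fid_change = relative_change(13.8, 9.1)    # FID: lower is better
clip_change = relative_change(1.42, 1.68)  # CLIP score: higher is better

print(f"FID change: {fid_change:.1f}%")
print(f"CLIP score change: {clip_change:.1f}%")
```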

🎯 Why is this a game changer?

**Standard DMD (fixed timesteps + reverse KL) → CDM (continuous-time schedule + off-trajectory matching)**

4
🏛️ Big Tech
Tencent

🔥 "The moment an LLM uses a tool: why that single call matters can now be measured precisely!"

A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

🏛️ Affiliation: Tencent (Big Tech)

🏷️ Keywords: Agentic RL, Information Gain, Turn-level Clipping, Credit Assignment, Policy Optimization

💭 Have you ever asked these questions?

  • “When an LLM uses several tools, how can we measure how much information each tool call contributed?”
  • “When evaluating outcomes across different turns, how should we compare them if their position or context differs?”
  • “If the update range is fixed, is it fair that information-rich and information-poor turns are trained identically?”

Core idea: Previously, the contribution of a tool call was judged only from the outcome of the whole trajectory, or spread over a complex tree structure. Neither accounts for differences in context and information content between turns, which made accurate learning difficult. This paper uses each turn's Information Gain (IG) as an intrinsic signal and redesigns how that signal is normalized, accumulated, and clipped, so that learning is adjusted turn by turn.

Highlights:

  • **Turn-group normalization**: each turn is compared only against turns at the same interaction depth (i.e. the same turn index), so that 100% of the turn-level information is evaluated accurately.
  • **Variance-adjusted discounted accumulation**: the accumulated IG is divided by √(number of accumulated terms), so training proceeds with advantages of similar magnitude whether a trajectory has 100 turns or 10.
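
The two ideas above can be sketched roughly as follows. This is an illustration only: the function names, the discount `gamma`, and the exact formulas are assumptions made for this digest, not the paper's definitions. `ig` is a hypothetical list of rollouts, each a list of per-turn IG values.

```python
import math
from statistics import mean, pstdev


def turn_group_normalize(ig, eps=1e-8):
    """Normalize each turn's IG against other rollouts at the same turn index."""
    max_turns = max(len(traj) for traj in ig)
    normalized = [list(traj) for traj in ig]
    for t in range(max_turns):
        # The "turn group": every rollout that actually reached turn t.
        group = [traj[t] for traj in ig if len(traj) > t]
        mu, sigma = mean(group), pstdev(group)
        for i, traj in enumerate(ig):
            if len(traj) > t:
                normalized[i][t] = (traj[t] - mu) / (sigma + eps)
    return normalized


def scaled_discounted_return(z, gamma=0.9):
    """Discounted sum of future normalized IG, divided by sqrt(#terms summed)."""
    out = []
    for t in range(len(z)):
        terms = [gamma ** (k - t) * z[k] for k in range(t, len(z))]
        # Dividing by sqrt(n) keeps the magnitude comparable for long and short tails.
        out.append(sum(terms) / math.sqrt(len(terms)))
    return out


rollouts = [[1.0, 3.0], [3.0, 1.0, 5.0]]
z = turn_group_normalize(rollouts)
advantages = [scaled_discounted_return(traj) for traj in z]
```

With equal per-turn signals and no discounting, a plain sum grows linearly with the number of turns, while the scaled version grows only with its square root.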

🎯 Why is this a game changer?

**Old approach: a fixed clipping range updates every turn identically** → **New approach: the clipping range is adjusted dynamically by each turn's IG, so information-rich turns get larger updates and information-poor turns get smaller ones**
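
A minimal sketch of IG-dependent clipping on top of the standard PPO clipped surrogate. The linear mapping from normalized IG to a clip range, and the constants `base_eps`, `scale`, `min_eps`, and `max_eps`, are hypothetical choices for illustration; the paper's actual rule may differ.

```python
def adaptive_clip_range(ig_norm, base_eps=0.2, scale=0.1, min_eps=0.05, max_eps=0.4):
    """Widen the clip range for information-rich turns, narrow it for poor ones."""
    eps = base_eps + scale * ig_norm
    return max(min_eps, min(max_eps, eps))


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped objective for one token or turn, with a given eps."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Same probability ratio and advantage, different turn-level IG.
rich = clipped_surrogate(1.5, 1.0, adaptive_clip_range(1.0))   # wider range
poor = clipped_surrogate(1.5, 1.0, adaptive_clip_range(-1.0))  # narrower range
```

Here the information-rich turn is allowed to move the policy further before clipping takes effect, which is the behavior the comparison above describes.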

5
🏛️ Big Tech
Tencent

🧠 “Isn't it true that to understand long contexts an LLM should mimic 'whole-brain activation'?”

MiA-Signature: Approximating Global Activation for Long-Context Understanding

🏛️ Affiliation: Tencent (Big Tech)

🏷️ Keywords: global activation, submodular selection, long-context understanding, compressed representation, working memory refinement

💭 Have you ever asked these questions?

  • “To avoid missing important information in a long context, does the model have to remember every sentence?”
  • “Isn't it impossible to use 'all activated information' in RAG or agent systems?”
  • “Could the information an LLM effectively remembers be compressed and reproduced?”

Core idea: Long contexts were previously handled by using all tokens as-is. This paper replaces the 'global activation state' with a compressed, concept-based representation (MiA-Signature).

Highlights:

  • On long-context understanding, MiA-Signature is **3.2× faster to compute** than existing methods and records a **2.8% gain in RAG systems**
  • **Iterative refinement using working memory** yields **an accuracy gain of more than 10%** (covering the activation space with at most 128 high-level concepts)
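
The keywords mention submodular selection. One common way to pick a small covering set is greedy maximization of a facility-location objective, sketched below. This is a generic technique shown for intuition, not the paper's algorithm. `sim[i][j]` is a hypothetical non-negative similarity between context item `i` and candidate concept `j`.

```python
def greedy_select(sim, k):
    """Greedily pick up to k concepts that maximize sum_i max_j sim[i][j]."""
    n_items, n_cands = len(sim), len(sim[0])
    selected = []
    best = [0.0] * n_items  # best similarity of each item to any selected concept
    for _ in range(min(k, n_cands)):
        gains = []
        for j in range(n_cands):
            if j in selected:
                gains.append(-1.0)  # never re-pick a concept
                continue
            # Marginal gain: how much coverage improves if concept j is added.
            gains.append(sum(max(sim[i][j] - best[i], 0.0) for i in range(n_items)))
        j_star = max(range(n_cands), key=gains.__getitem__)
        if gains[j_star] <= 0.0:
            break  # nothing left improves coverage
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n_items)]
    return selected


sim = [
    [0.9, 0.1, 0.8],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
]
signature = greedy_select(sim, k=2)
```

In the setting described above, `k` would play the role of the concept budget (at most 128).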

🎯 Why is this a game changer?

“Using every token without omission” → “Replacing the global activation pattern with a compressed set of concepts”

View paper → Yuqing Li, Jiangnan Li, Mo Yu, and 3 others
6
🔥 Trending 260+
Ai2

🤖 “Why a robot moves and how it decides... does it now think like a human?”

MolmoAct2: Action Reasoning Models for Real-world Deployment

🏛️ Affiliation: Ai2

🏷️ Keywords: Vision-Language-Action, Open-weight, Embodied Reasoning, Flow-Matching, Adaptive Depth

💭 Have you ever asked these questions?

  • “Why have models so far failed to make robots move reliably in real environments?”
  • “Can open-weight models be used for robot control? What about the balance between performance and latency?”
  • “How can a robot re-reason only about the parts of a scene that changed and keep the unchanged parts in memory?”

Core idea: Earlier VLA models depended on closed systems or expensive hardware, with high latency and low success rates. This paper proposes MolmoAct2, which is fully open-weight, low-latency, and achieves a high success rate at the same time.

Highlights:

  • MolmoER outperforms GPT-5 and Gemini Robotics ER-1.5 across 13 embodied-reasoning benchmarks. It was trained on a 3.3M-sample dataset with a “specialize-then-rehearse” strategy.
  • MolmoAct2-BimanualYAM contains 720 hours of teleoperated bimanual data and is currently the largest open bimanual dataset. Filtered subsets for Franka (DROID) and SO100/101 are released alongside it.

🎯 Why is this a game changer?

“Closed models + expensive hardware + slow inference” → “Open weights + low latency + high success rates in real environments”

7
🏛️ Big Tech
Google

🧠 “Can't trust what AI says is true? The answer: 'admit what you don't know'”

Hallucinations Undermine Trust; Metacognition is a Way Forward

🏛️ Affiliation: Google (Big Tech)

🏷️ Keywords: hallucinations, metacognition, uncertainty, LLMs, trustworthy AI

💭 Have you ever asked these questions?

  • “Can we trust what an AI states with confidence?”
  • “How can a model know that it doesn't know the answer?”
  • “When an AI is wrong, should it admit the mistake?”

Core idea: The choice used to be binary: “give the answer” or “say nothing if you don't know”. This paper proposes a third way: “say honestly that you don't know”.

Highlights:

  • **Beyond the “answer or abstain” binary, toward trustworthy expressions of uncertainty**: the paper rejects the premise that “information given without certainty is necessarily distorted information” and proposes a metacognitive capability that lets the model admit it does not know.
  • **“A 30% reduction in hallucinated answers in experiments”**: instead of confidently stating fabrications, the model admits what it does not know, which reduced hallucinations by 30%.

🎯 Why is this a game changer?

**“Give only correct answers, and say nothing when you would be wrong” → “When you would be wrong, say honestly that you don't know”**

View paper → Gal Yona, Mor Geva, Yossi Matias
8
🔥 Trending 116+
FrameX-AI

🔥 “Why would every frame be equally trustworthy? In video generation, the key is knowing where it matters and how to train.”

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

🏛️ Affiliation: FrameX-AI

🏷️ Keywords: reward distillation, streaming video generation, reliability-aware, perplexity-aware, spatiotemporal weighting

💭 Have you ever asked these questions?

  • “Why doesn't quality improve when training trusts every frame equally?”
  • “Where exactly does video quality degrade? Training on everything uniformly is inefficient.”
  • “How should the model decide where to focus its improvement?”

Core idea: Previously, training treated every rollout, frame, and pixel as equally reliable. Stream-R1 separates reliability across rollouts (Inter-Reliability) from perplexity within space and time (Intra-Perplexity) and weights each differently, improving both training efficiency and quality.
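
One way to picture the two levels of weighting is a loss in which each rollout gets a weight and, inside each rollout, each spatio-temporal location gets its own weight. The sketch below uses softmax weights over hypothetical score inputs (`rollout_scores`, `location_scores`). How Stream-R1 actually derives and combines its weights is not stated in this digest, so treat every name here as an assumption.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(per_location_loss, rollout_scores, location_scores):
    """Combine rollout-level and location-level weights into one scalar loss."""
    rollout_w = softmax(rollout_scores)  # across rollouts
    total = 0.0
    for r, losses in enumerate(per_location_loss):
        loc_w = softmax(location_scores[r])  # within one rollout
        total += rollout_w[r] * sum(w * l for w, l in zip(loc_w, losses))
    return total


losses = [[1.0, 3.0], [2.0, 4.0]]
uniform = weighted_loss(losses, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
focused = weighted_loss(losses, [0.0, 2.0], [[0.0, 0.0], [0.0, 2.0]])
```

With uniform scores this reduces to a plain average, which corresponds to the "treat everything as equally reliable" baseline described above.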

Highlights:

  • **Average 1.5× to 2.3× improvement on three axes: video quality, motion quality, and text alignment** (relative to existing DMD-based methods on the benchmark)
  • **Automatically locates spatio-temporal regions to improve with 87% accuracy** (a reward-model-based saliency map identifies 87% of regions/frames as having high improvement potential)

🎯 Why is this a game changer?

**Training that trusts every frame uniformly → weights adjusted dynamically by per-rollout reliability and spatio-temporal perplexity**

9
🔥 Trending 143+

🧠 “How can an LLM learn 'skills' from context on its own? This is not a simple question.”

From Context to Skills: Can Language Models Learn from Context Skillfully?

🏛️ Affiliation: Unknown

🏷️ Keywords: context learning, skill discovery, self-play, multi-agent, CL-bench

💭 Have you ever asked these questions?

  • “Can an LLM learn on its own when the context is long and complex?”
  • “It would be nice if people hand-wrote the 'skills', but that is not realistic.”
  • “Do automatically learned 'skills' really improve performance?”

Core idea: Learning from context previously required hand-crafted skills and had no external feedback. This paper proposes Ctx2Skill, a framework that autonomously discovers, refines, and selects skills.

Highlights:

  • On the 4 CL-bench tasks, **raises the backbone model's solve rate by 18.7% on average**
  • The **Cross-time Replay mechanism yields a 27.4% gain in generalization**

🎯 Why is this a game changer?

“Context learning that needed hand-crafted skills” → “A multi-agent system that autonomously discovers, refines, and selects skills”

10
RLWRLD

🤖 “How did they do this? A robot policy that manipulates like a human is now possible in the real world!”

RLDX-1 Technical Report

🏛️ Affiliation: RLWRLD

🏷️ Keywords: Vision-Language-Action, Multi-Stream Action Transformer, Dexterous Manipulation, Real-Time Deployment, Humanoid Control

💭 Have you ever asked these questions?

  • “What kind of robot handles objects like a human?”
  • “Can a real, moving robot understand language and vision together and manipulate objects?”
  • “Is what keeps a real robot from failing not just 'intelligence' but 'system design'?”

Core idea: Robot policies were previously built on vision-language models alone. This paper combines the MSAT architecture, which integrates capabilities such as motion awareness, memory-based decision making, and physical sensing, with system-level design to realize a robot policy with human-like manipulation.

Highlights:

  • 86.8% success rate on ALLEX humanoid tasks, more than double that of π_{0.5} and GR00T N1.6 (about 40%)
  • Handles diverse functional requirements (motion, memory, physical sensing) simultaneously, with optimizations for real-time deployment

🎯 Why is this a game changer?

“Control the robot with a vision-language model alone” → “Use an architecture that integrates multiple modalities, plus system design, for human-like manipulation on real robots”
