Silicon Valley Races to Invest in ‘Environments’ for Training AI Agents

2025.09.22

Reinforcement learning (RL) environments for training AI agents are emerging as a next-generation core technology in Silicon Valley. While current agents released by OpenAI and Perplexity still show clear limitations, the industry is betting on simulated training grounds capable of handling complex tasks as a new growth driver. An RL environment is essentially a training ground where AI mimics real software use to perform tasks, offering more complexity and stronger learning effects than static datasets. Google DeepMind’s AlphaGo and OpenAI’s early RL projects are cited as precedents. Recently, RL environments have evolved to handle practical tasks, such as using a simulated browser to make an online purchase. Investment momentum is strong. Startups like Mechanize and Prime Intellect are entering the market to compete for leadership, while traditional data-labeling companies such as Surge and Mercor are shifting their strategies from static datasets toward simulation-focused approaches. Anthropic is reportedly considering more than $1 billion in investment in RL environments by next year. Still, questions remain over scalability. Experts warn that RL environments are vulnerable to structural problems such as “reward hacking,” and some major labs predict it will be difficult to achieve quick results. Even so, Silicon Valley is rallying around the idea that RL environments could be the key to maintaining AI’s next wave of competitive advantage, fueling further investment.

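In Silicon Valley, reinforcement learning (RL) environments for training AI agents are emerging as a next-generation core technology. With agents from OpenAI, Perplexity, and others still showing clear limitations, the industry as a whole has singled out training grounds that can simulate complex tasks as its new growth engine. An RL environment is a training ground in which an AI carries out tasks by imitating the process of using real software, and it offers a more complex and more powerful learning effect than a simple dataset. Google DeepMind's AlphaGo and OpenAI's early RL projects are cited as precedents. More recently, the approach has evolved to cover practical tasks, such as completing an online purchase in a simulated browser.

Investment is running hot as well. New startups such as Mechanize and Prime Intellect are appearing one after another and joining the race for market leadership, while traditional data-labeling companies Surge and Mercor are shifting their strategies from static datasets toward simulation. Anthropic is also weighing a plan to invest more than $1 billion by next year.

Doubts about scalability persist, however. Experts point out that RL environments are vulnerable to structural problems such as ‘reward hacking’, and some major labs also expect that results will be hard to deliver in the short term. Even so, Silicon Valley is converging on the view that RL environments are central to next-generation AI competitiveness, and it continues to invest.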

#AI Agents #Silicon Valley

버트

ai@tech42.co.kr
