beat365 Lecture Series, Elite Forum No. 40: Structure-driven design of reinforcement learning algorithms: a tale of two estimators

           

Title: Structure-driven design of reinforcement learning algorithms: a tale of two estimators

 

Date & Time: Friday, 2024.12.20, 15:00

 

Location: Room 813, Yanyuan Building #1 (Yanyuan Campus)

 

Speaker: Wenlong Mou (牟文龍)

 

Host: Xuanzhe Liu (劉譞哲)

 

Abstract

Reinforcement learning (RL) is emerging as a powerful tool for adaptive decision-making in dynamic environments. A key challenge in RL is learning value functions efficiently, which plays a critical role in optimizing decision policies. Over the years, a diverse range of RL algorithms has been proposed, but at their core, two foundational principles stand out: bootstrapping and rollout. Despite the success of both, the optimal trade-off between these principles in practical applications remains elusive, and current theoretical guarantees often fall short of providing actionable insights.

 

In this talk, I will discuss recent advances in methods that optimally reconcile bootstrapping and rollout for policy evaluation. The bulk of this talk will focus on a new class of algorithms that strikes an optimal balance between temporal difference learning and Monte Carlo methods. Through the statistical lens, I will highlight how the local structure of the underlying Markov chain influences the complexity of these problems, and how the new algorithm adapts to these structures. Extending this perspective to continuous-time RL, I will explore how the elliptic structure of diffusion processes provides key insights for making algorithmic choices.

 

Speaker Bio

 

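Wenlong Mou is currently an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. He received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 2023. In 2017 he graduated from the School of Information Science and Technology at beat365 with a bachelor's degree in computer science and a double degree in economics. His research focuses on theory and algorithms in machine learning and data science, with a recent emphasis on machine learning methods for data-driven decision-making problems. His work has been published in top journals and conferences in machine learning, statistics, and operations research, and he received a nomination for the best student paper award in applied probability from the international operations research society.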

 

Follow the beat365 WeChat official account for more lecture information.

 
