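Dr. Chengchun Shi (史成春) is an Associate Professor in the Department of Statistics at the London School of Economics and Political Science. He received his PhD in statistics from North Carolina State University. His research focuses on reinforcement learning, particularly its application and optimization in policy evaluation, causal inference, and semi-supervised learning. Dr. Shi has received awards including the Institute of Mathematical Statistics (IMS) Tweedie Award and the Royal Statistical Society (RSS) Research Prize.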
This talk considers policy evaluation with multiple data sources, focusing on scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly combine base policy value estimators constructed from the experimental and historical data, with weights chosen to minimize the mean squared error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and analyses based on real data from a ridesharing company demonstrate the superior performance of the proposed estimators.
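To make the idea of MSE-weighted linear integration concrete, the following is a minimal illustrative sketch, not the speaker's actual estimator. It assumes independent samples, scalar rewards, and a crude plug-in estimate of the squared reward shift between the historical and experimental control arms; the function names `combine_control_means` and `combined_ate` are hypothetical. For an unbiased experimental control mean with variance v_e and a historical mean with variance v_h and bias b, the combination w * mu_e + (1 - w) * mu_h has MSE w^2 v_e + (1 - w)^2 (v_h + b^2), which is minimized at w = (v_h + b^2) / (v_e + v_h + b^2).

```python
from statistics import fmean, variance


def combine_control_means(y_exp_control, y_hist):
    """Linearly combine experimental and historical control means.

    Assumption (illustrative only): the squared reward shift b^2 is
    replaced by the naive plug-in (mu_h - mu_e)^2.
    """
    mu_e = fmean(y_exp_control)  # unbiased for the current control value
    mu_h = fmean(y_hist)         # possibly biased by a reward shift
    v_e = variance(y_exp_control) / len(y_exp_control)  # variance of mu_e
    v_h = variance(y_hist) / len(y_hist)                # variance of mu_h
    shift_sq = (mu_h - mu_e) ** 2                       # plug-in for b^2
    denom = v_e + v_h + shift_sq
    # Weight on the experimental estimator that minimizes the estimated MSE.
    w = 1.0 if denom == 0 else (v_h + shift_sq) / denom
    return w * mu_e + (1 - w) * mu_h, w


def combined_ate(y_treat, y_exp_control, y_hist):
    """Treatment effect estimate using the combined control mean."""
    mu_c, w = combine_control_means(y_exp_control, y_hist)
    return fmean(y_treat) - mu_c, w
```

When the historical rewards closely match the experimental control arm, the weight shifts toward the larger historical sample; when the estimated shift is large, the weight on the experimental data approaches 1, so the historical data is effectively ignored. The pessimistic variant and the sequential extension mentioned in the abstract are not covered by this sketch.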