A fast and accurate estimator for large scale linear model via data averaging

Posted: 2021-06-08

Talk title: A fast and accurate estimator for large scale linear model via data averaging

Speaker: Xu Wangli (许王莉)

Abstract: This work is concerned with the estimation problem for the linear model when the sample size is extremely large and the data dimension can vary with the sample size. In this setting, the least squares estimator based on the full data is not feasible with limited computational resources. Many existing methods for this problem are based on sketching techniques. We derive fine-grained lower bounds on the conditional mean squared error for sketching methods. For sampling methods, our lower bound provides an attainable optimal convergence rate. We propose a new sketching method based on data averaging. The proposed method reduces the original data to a few averaged observations. These averaged observations still satisfy the linear model and are used to estimate the regression coefficients. The asymptotic behavior of the proposed estimation procedure is studied. Our theoretical results show that the proposed method can achieve a faster convergence rate than the optimal convergence rate for sampling methods. Theoretical and numerical results show that the proposed estimator has good statistical performance as well as low computational cost.
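To make the averaging idea concrete: if each observation satisfies y_i = x_i'β + ε_i, then averaging both sides over any group of rows gives ȳ = x̄'β + ε̄, so the averaged observations follow the same linear model with the same β. The sketch below is a minimal illustration of that idea only. It assumes a random partition into equal-size groups and no intercept, and the function name `averaged_least_squares` and the simulated data are hypothetical; the abstract does not specify how the speaker's method forms its groups, so this is not a reproduction of the proposed estimator.

```python
import numpy as np


def averaged_least_squares(X, y, n_groups, rng):
    """Fit least squares on group averages of (X, y).

    Assumption for illustration: rows are randomly partitioned into
    equal-size groups. The talk's actual grouping rule is not given
    in the abstract.
    """
    n = X.shape[0]
    group_size = n // n_groups
    # Shuffle the rows and drop the remainder so all groups have equal size,
    # which keeps the averaged errors homoscedastic.
    idx = rng.permutation(n)[: group_size * n_groups]
    # Average covariates and responses within each group.
    X_bar = X[idx].reshape(n_groups, group_size, -1).mean(axis=1)
    y_bar = y[idx].reshape(n_groups, group_size).mean(axis=1)
    # Ordinary least squares on the much smaller averaged data set.
    beta_hat, *_ = np.linalg.lstsq(X_bar, y_bar, rcond=None)
    return beta_hat


# Simulated data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p = 200_000, 5
beta = np.arange(1.0, p + 1.0)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

beta_hat = averaged_least_squares(X, y, n_groups=2_000, rng=rng)
print(beta_hat)
```

The final fit here involves 2,000 averaged rows instead of 200,000 raw ones. Random grouping is chosen only for simplicity and is not claimed to attain the convergence rate described in the abstract.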

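About the speaker: Xu Wangli is a professor of statistics and doctoral supervisor at Renmin University of China, and a teaching supervision expert at the same university. In 2010, Xu was selected for both the New Century Excellent Talents Program and the Beijing Nova Program for science and technology. In recent years, Xu's research has focused on statistical inference for model goodness-of-fit testing, high-dimensional data analysis, data missing at random, two-phase sampling data, and longitudinal data analysis. Xu has led four National Natural Science Foundation of China projects, as well as a major project of the Ministry of Education's Key Research Institutes of Humanities and Social Sciences, a key project of the Beijing Natural Science Foundation, a Ministry of Education humanities and social sciences fund project, and other research projects. Xu has published more than 70 papers in leading international statistics journals (including top-tier journals), and has published two books with Science Press: the co-authored 《非参数蒙特卡洛检验及其应用》 (Nonparametric Monte Carlo Tests and Their Applications) and the single-authored 《缺失数据的模型检验及其应用》 (Model Checking for Missing Data and Its Applications).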

Time: June 10, 2021, 19:00 (7:00 PM)

Format: Tencent Meeting

Meeting link: https://meeting.tencent.com/s/CiMtoqcYf3mr

Meeting ID: 796 242 590

Organizers: Office of Scientific Research / School of Mathematics and Statistics