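Education
2001.2-2003.12, Department of Statistics and Finance, University of Science and Technology of China (USTC), Ph.D. in Probability and Mathematical Statistics (advisors: Prof. Zhiliang Ying and Prof. Lincheng Zhao).
1997.9-2000.6, Department of Mathematics, USTC, M.S. in Applied Mathematics (advisor: Prof. Lincheng Zhao).
1993.9-1997.6, Department of Mathematics, USTC, B.S. in Computational Mathematics and Its Application Software.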
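Work Experience
2018.6-present, Department of Statistics and Finance, USTC, Professor and Ph.D. supervisor.
2011.2-2018.5, Institute of Biostatistics, Fudan University, Research Professor and Ph.D. supervisor.
2000.7-2011, Department of Statistics and Finance, USTC, Teaching Assistant / Lecturer.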
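Postdoctoral Research
2009.1-2010.12, National Cancer Institute, U.S. National Institutes of Health (NIH) (postdoctoral visiting fellow)
2006.8-2007.11, Yale University School of Medicine (postdoctoral associate)
2004.3-2005.4, Department of Statistics, George Washington University (postdoctoral scientist)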
Main Research Interests
Machine learning, causal inference, and biostatistics.
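Research Output
More than 100 papers published in statistics journals and conferences in China and abroad, including Biometrika, Biometrics, Bioinformatics, Annals of Applied Statistics, Statistics in Medicine, Statistica Sinica, Biostatistics, Proceedings of the National Academy of Sciences (PNAS), Human Molecular Genetics, NeurIPS, KDD, and MICCAI. As of March 2025, these papers had received 1628 citations, most of them from other authors (see my Google Scholar page).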
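Selected National Research Projects
Parent-of-origin effect analysis based on genome-wide association studies of nuclear families, National Natural Science Foundation of China (NSFC) General Program, 12171451, CNY 510,000, 2022.1.1-2025.12.31, Principal Investigator
Statistical learning methods for platform supply chains, NSFC Major Program (Project 2), 72091212, CNY 2.48 million, 2021.1-2025.12, Sub-project Leader
Statistical methodology for next-generation high-throughput sequencing data, NSFC General Program, 11771096, CNY 480,000, 2018.1.1-2021.12.31, Principal Investigator
New theory, methods, and applications of statistical genetics based on next-generation sequencing data, National 973 Program (Project 5), 2012CB316505, approx. CNY 9 million, 2012.1.1-2016.8.31, Key Member
Statistical methods for genome-wide DNA methylation studies, NSFC General Program, 11371101, CNY 500,000, 2014.1.1-2017.12.31, Principal Investigator
Selected problems in human genetic association analysis, NSFC Young Scientists Fund, 10701067, CNY 150,000, 2008.1.1-2010.12.31, Principal Investigator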
Graduate Student Supervision
As of 2024, I have supervised 7 Ph.D. graduates and 19 master's graduates.
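Graduate Admissions
Undergraduates with a passion for research and a solid foundation in mathematics are welcome to apply to be my Ph.D. students.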
Selected Publications (for more, see my ResearchGate page or Google Scholar page):
16. Cen M, Wang Z, Zhuang Z, Zhang H, Su D, Bao Z, Wei W, Magnier B, Yu L, Wang L. (2024). ORCGT: Ollivier-Ricci curvature-based graph model for lung STAS prediction. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. Lecture Notes in Computer Science, vol 15005. Springer, Cham. 553–563.
15. Yang S, Yuan H, Zhang X, Wang M, Zhang H, Wang H. (2024). Conversational dueling bandit in generalized linear models. KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3806–3817.
14. Cen M, Li X, Guo B, Jonnagaddala J, Zhang H#, Xu X# (2023). DPSeq: a novel and efficient digital pathology classifier for predicting cancer biomarkers using sequencer architecture. The American Journal of Pathology 193:12.
13. Zhang Y, Zhang X, Zhang H, Liu C. (2023). Low-rank latent matrix-factor prediction modeling for generalized high-dimensional matrix-variate regression. Statistics in Medicine 42(20):3616-3635.
12. Cheng Y, Feng S, Yang J, Zhang H, Liang Y. (2022). Provable benefit of multitask representation learning in reinforcement learning. Advances in Neural Information Processing Systems 35 (NeurIPS 2022) pp. 31741-31754. (Spotlight, ~5%)
11. Sun X, Wen S, Lu C, Zhou B, Curnoe D, Lu H, Li H, Wang W, Cheng H, Yi S, Jia X, Du P, Xu X, Lu Y, Lu Y, Zheng H, Zhang H, Sun C, Wei L, Han F, Huang J, Edwards RL, Jin L, Li H (2021). Ancient DNA and multimethod dating confirm the late arrival of anatomically modern humans in southern China. Proceedings of the National Academy of Sciences 118: e2019158118.
10. Zhang H#, Mukherjee B, Arthur V, Hu G, Hochner H, Chen J# (2020). An Efficient and Computationally Robust Statistical Method for Analyzing Case-Control Mother-Offspring Pair Genetic Association Studies. Annals of Applied Statistics 14: 560-584.
9. Ji Y, Yu C, Zhang H# (2020). contamDE-lm: Linear model based differential gene expression analysis using next-generation RNA-seq data from contaminated tumor samples. Bioinformatics 36: 2492-2499.
8. Lyu T, Ying Z, Zhang H# (2019). A new semiparametric transformation approach to disease diagnosis with multiple biomarkers. Statistics in Medicine 38:1386-1398.
7. Zhang H, Chatterjee N, Rader D, Chen J (2018). Adjustment of non-confounding covariates in case-control genetic association studies. Annals of Applied Statistics 12(1):200-221.
6. Shen Q, Hu J, Jiang N, Hu X, Luo Z, Zhang H# (2016). contamDE: Differential expression analysis of RNA-seq data for contaminated tumor samples. Bioinformatics 32(5): 705-712.
5. Zhang H#, Xu J, Jiang N, Hu X, Luo Z (2015). PLNseq: a multivariate Poisson lognormal distribution for high-throughput matched RNA-sequencing read count data. Statistics in Medicine 34: 1577-1589.
4. Zhang H, Qin J, Landi M, Caporaso N, Yu K (2013). A copula-model based semiparametric interaction test under the case-control design. Statistica Sinica. 23: 1505-1521.
3. Zhang H, Olschwang S, Yu K. (2010). Statistical inference on the penetrances of rare genetic mutations based on a case-family design. Biostatistics 11:519-532.
2. Chen K*, Ying Z*, Zhang H*, Zhao L* (2008). Analysis of least absolute deviation. Biometrika 95(1):107-122.
1. Zhang H, Zheng G, Li Z (2006). Statistical analysis for haplotype-based matched case-control studies. Biometrics 62:1124-1131.