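Speaker: Professor Liang Liu, Department of Statistics and Institute of Bioinformatics, University of Georgia
Title: Species delimitation using machine learning
Time: May 16, 2024, 9:30-10:30 a.m.
Venue: Room 620, Analysis and Testing Center
Hosts: School of Life Sciences; Provincial University Key Laboratory of Comparative Genomics; Jiangsu International Joint Research Center of Genomics; Academy of Science and Technology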
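About the speaker:
Liang Liu is a professor in the Department of Statistics and the Institute of Bioinformatics at the University of Georgia, USA. He is one of the founders of novel species-tree methods in the international field of molecular phylogenomics and received the 2008 outstanding research award of the Society of Systematic Biologists. He is a long-standing reviewer for international journals including Systematic Biology, Bioinformatics, Journal of Mathematical Biology, Molecular Biology and Evolution, and Molecular Ecology. He has published more than 80 papers in international journals such as Science, PNAS, Systematic Biology, Molecular Biology and Evolution, and Bioinformatics, with more than 35,000 total citations and more than 24,000 citations for his most-cited paper. He also serves as a second-round review panelist for the U.S. National Science Foundation.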
Abstract:
In the realm of biology, species are identified through a classification system that groups together organisms with shared traits and the ability to reproduce with each other. There's a keen interest in understanding how species are defined and whether their evolutionary roots can be traced through genetic sequences. Various techniques are employed for species identification, including Automated Barcode Gap Discovery, the Generalized Mixed Yule Coalescent method, and the Poisson Tree Process method. Yet, these methods come with drawbacks, such as time consumption and difficulty handling large datasets. In this talk, we delve into employing supervised machine learning techniques like CatBoost, XGBoost, Classification Tree, Support Vector Machine, and K-Nearest Neighbors. Additionally, we explore unsupervised machine learning with K-means Clustering and a deep learning approach using Neural Networks for species delimitation. We examine five distinct species trees as our test cases. Our findings reveal that supervised machine learning models exhibit superior accuracy compared to unsupervised machine learning and deep learning models.