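"Digital+" and Zhijiang Statistics Forum (Lecture 54): Lecture Announcement for Honglang Wang of Indiana University-Purdue University Indianapolis, June 21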

Posted: June 20, 15:52

Lecture title: Dominant-Set Based Clustering for Functional Data

Speaker: Honglang Wang

Time: Friday, June 21, 2024, 14:00-15:00

Venue: Conference Room 644, Comprehensive Building (综合楼)


About the speaker:

Dr. Honglang Wang is an Associate Professor of Statistics in the Department of Mathematical Sciences at Indiana University-Purdue University Indianapolis (IUPUI). He received his PhD in Statistics from Michigan State University in 2015. His research interests include statistical analysis of longitudinal and functional data, high-dimensional statistical inference and its applications, causal inference, machine learning and deep learning, nonparametric statistics, empirical likelihood methods and their applications, and statistical genetics and genomics.


Abstract:

Dominant-set based clustering sequentially partitions data to maximize within-cluster similarity using a concept from graph theory, which distinguishes it from common existing methods such as K-means clustering, agglomerative hierarchical clustering, and spectral clustering. We propose a hierarchical bipartition procedure under a penalized optimization framework, with the tuning parameter selected by maximizing the modularity of the resulting two clusters. The proposed dominant-set based hierarchical clustering method is applied to functional data clustering with a flexible choice of similarity measures between curves. It is robust both to imbalanced groups and to outliers, which overcomes a limitation of many existing clustering methods.
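
For readers unfamiliar with the two ingredients named above, the following is a minimal sketch and not the speaker's procedure. It extracts one dominant set with the classical replicator dynamics and scores a two-way split with Newman's modularity. The abstract does not spell out the penalized optimization framework, so replicator dynamics is swapped in here as a stand-in, and the function names, the toy similarity matrix, and the `cutoff` threshold are all illustrative assumptions.

```python
def dominant_set(A, iters=1000, tol=1e-10, cutoff=1e-5):
    """Extract one dominant set from a symmetric similarity matrix A
    (zero diagonal) using replicator dynamics:
        x_i <- x_i * (A x)_i / (x^T A x).
    Returns the indices whose weight stays above `cutoff` (an assumed threshold)."""
    n = len(A)
    x = [1.0 / n] * n  # start from the barycenter of the simplex
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        if xAx == 0.0:
            break
        new_x = [x[i] * Ax[i] / xAx for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if delta < tol:
            break
    return [i for i in range(n) if x[i] > cutoff]


def modularity(A, labels):
    """Newman modularity of a partition of a weighted, undirected graph."""
    n = len(A)
    k = [sum(row) for row in A]  # weighted degrees
    two_m = sum(k)               # total weight, each edge counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Toy similarity matrix (hypothetical): an imbalanced pair of groups,
# four tightly similar curves and two moderately similar ones.
group = [0, 0, 0, 0, 1, 1]
within = {0: 0.9, 1: 0.8}
A = [[0.0 if i == j else (within[group[i]] if group[i] == group[j] else 0.1)
      for j in range(6)] for i in range(6)]

members = dominant_set(A)
labels = [0 if i in members else 1 for i in range(6)]
print(members, modularity(A, labels))
```

On this toy matrix the extracted set is the group of four tightly similar items, and the resulting bipartition has higher modularity than a split that cuts through that group.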

We further propose a semi-supervised clustering method that learns the metric from the labeled portion of the data by maximizing modularity over linear combinations of candidate similarity metrics, and then performs hierarchical dominant-set based clustering tuned by modularity maximization. The proposed algorithm can learn not only a global metric but also an individual metric for each cluster, which permits clustering with overlapping clusters. This is a general clustering method, and it is particularly well suited to functional data, which by nature admit a variety of metrics for comparing curves. Simulation studies and real data applications demonstrate the advantages of the proposed methods.
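
The metric-learning idea can be illustrated with a small sketch, under strong simplifying assumptions: only two candidate similarity matrices, a single mixing weight `w`, and a plain grid search in place of whatever optimizer the speaker actually uses. The names `combine` and `learn_weight` and the toy matrices are hypothetical.

```python
def modularity(S, labels):
    """Newman modularity of a labeled partition under similarity matrix S."""
    n = len(S)
    k = [sum(row) for row in S]
    two_m = sum(k)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += S[i][j] - k[i] * k[j] / two_m
    return q / two_m


def combine(S1, S2, w):
    """Linear combination w * S1 + (1 - w) * S2 of two similarity matrices."""
    return [[w * a + (1.0 - w) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(S1, S2)]


def learn_weight(S1, S2, labels, steps=10):
    """Grid search (an assumed optimizer) for the weight that maximizes
    modularity of the known labels on the labeled portion of the data."""
    best_w, best_q = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        q = modularity(combine(S1, S2, w), labels)
        if q > best_q:
            best_w, best_q = w, q
    return best_w, best_q


# Four labeled curves in two groups (hypothetical data).
labels = [0, 0, 1, 1]
# Candidate 1 separates the groups; candidate 2 is uninformative.
S1 = [[0.0, 0.9, 0.1, 0.1],
      [0.9, 0.0, 0.1, 0.1],
      [0.1, 0.1, 0.0, 0.9],
      [0.1, 0.1, 0.9, 0.0]]
S2 = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]

best_w, best_q = learn_weight(S1, S2, labels)
print(best_w, best_q)
```

Here the search puts all of the weight on the informative candidate. With functional data, the candidates could be, for example, similarities computed from the curves themselves and from their derivatives, though that choice is an assumption and is not stated in the abstract.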