Topic Modeling in NLP: Study Curriculum
0. Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
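As a quick starting point for the LDA item, a minimal run with scikit-learn (the library choice and toy corpus are my assumptions; the curriculum names no implementation):

```python
# Minimal LDA sketch: bag-of-words counts in, document-topic proportions out.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models discover latent themes in text",
    "neural networks learn representations of text",
    "dirichlet priors govern topic and word distributions",
    "deep learning models train on large text corpora",
]

# Bag-of-words document-term matrix, the standard LDA input.
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)  # document-topic proportions; each row sums to 1
print(theta.shape)            # (4, 2)
```

gensim's `LdaModel` is an equally common choice and also exposes per-document topic proportions.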
1. Neural Topic Model (NTM)
1-1. ProdLDA (Neural-ProdLDA) (Srivastava and Sutton, 2017)
1-2. Combined TM (Bianchi et al., 2020)
1-3. ZeroshotTM (Bianchi et al., 2021)
2. Evaluation Metrics
C_v, Purity
Top-Purity and Normalized Mutual Information (Top-NMI) as metrics (Nguyen et al., 2018)
Apply k-means to the topic proportions z and score the clustered documents with purity (Km-Purity) and NMI (Km-NMI) (Zhao et al., 2020a)
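The Km-Purity / Km-NMI recipe above can be sketched in a few lines: cluster the document-topic proportions z with k-means, then score the clustering against gold labels. The toy z matrix and labels are illustrative assumptions:

```python
# Km-Purity / Km-NMI sketch on toy topic proportions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def purity(labels_true, labels_pred):
    # Each cluster is credited with its majority gold label.
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()
    return total / len(labels_true)

# Toy document-topic proportions z and gold class labels (assumptions).
z = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
gold = np.array([0, 0, 1, 1])

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)
print(purity(gold, pred))                        # 1.0 on this toy data
print(normalized_mutual_info_score(gold, pred))  # 1.0 on this toy data
```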
2.1 Topic Coherence
2.1.1 Normalized Pointwise Mutual Information (NPMI) (Lau et al., 2014)
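NPMI for a topic is usually reported as the mean normalized PMI over pairs of the topic's top words, estimated from document co-occurrence counts. A sketch with a tiny illustrative reference corpus (an assumption):

```python
# NPMI sketch: NPMI(wi, wj) = log(P(wi,wj) / (P(wi) P(wj))) / -log P(wi,wj),
# averaged over the topic's word pairs; ranges from -1 to 1.
import math
from itertools import combinations

def npmi(topic_words, docs, eps=1e-12):
    doc_sets = [set(d.split()) for d in docs]
    n = len(doc_sets)
    def p(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n
    scores = []
    for wi, wj in combinations(topic_words, 2):
        pij = p(wi, wj)
        if pij == 0:
            scores.append(-1.0)  # never co-occur: minimum NPMI
            continue
        pmi = math.log(pij / (p(wi) * p(wj)))
        scores.append(pmi / (-math.log(pij) + eps))
    return sum(scores) / len(scores)

docs = ["apple banana fruit", "apple fruit market", "car engine road"]
score = npmi(["apple", "fruit"], docs)
print(score)  # close to 1: the pair always co-occurs
```

In practice a large external reference corpus (e.g. Wikipedia) and a sliding window are used rather than whole-document co-occurrence; gensim's `CoherenceModel` with `coherence="c_npmi"` implements this.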
2.1.2 Word Embedding (WE) (Fang et al., 2016)
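Word-embedding (WE) coherence is commonly computed as the mean pairwise cosine similarity of a topic's top words in an embedding space. The toy vectors below stand in for pretrained embeddings (e.g. word2vec), which this outline does not fix:

```python
# WE coherence sketch: average pairwise cosine similarity of top-word vectors.
import numpy as np
from itertools import combinations

def we_coherence(topic_words, emb):
    sims = []
    for wi, wj in combinations(topic_words, 2):
        a, b = emb[wi], emb[wj]
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return sum(sims) / len(sims)

# Toy 2-d "embeddings" (an assumption, for illustration only).
emb = {"apple":  np.array([1.0, 0.1]),
       "banana": np.array([0.9, 0.2]),
       "car":    np.array([0.1, 1.0])}

score = we_coherence(["apple", "banana"], emb)
print(score)  # near 1: the words point in similar directions
```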
2.2 Topic Diversity
2.2.1 Topic Uniqueness (TU) (Dieng et al., 2020, "Topic Modeling in Embedding Spaces")
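One common formulation of TU counts, for each top word, how many topics it appears in, and averages the inverse counts; 1.0 means no top word is shared across topics. A sketch (the formulation and toy topics are my assumptions):

```python
# Topic uniqueness (TU) sketch: average inverse cross-topic frequency
# of each top word; 1.0 = fully unique top words, lower = more overlap.
from collections import Counter

def topic_uniqueness(topics):
    # topics: list of top-word lists, one per topic
    counts = Counter(w for topic in topics for w in topic)
    total = sum(1.0 / counts[w] for topic in topics for w in topic)
    return total / sum(len(t) for t in topics)

topics = [["apple", "fruit", "banana"],
          ["car", "engine", "road"],
          ["apple", "market", "price"]]
print(topic_uniqueness(topics))  # 8/9: "apple" is shared by two topics
```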
2.2.2 Inverted Rank-Biased Overlap (I-RBO) (Bianchi et al., 2021, "Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence")
3. References
3.1 C_v, Purity
3.2 Top-Purity and Normalized Mutual Information (Top-NMI) as metrics (Nguyen et al., 2018)
3.3 Apply k-means to the topic proportions z and score the clustered documents with purity (Km-Purity) and NMI (Km-NMI) (Zhao et al., 2020a)
3.4 Standard RBO (Webber et al., 2010; Terragni et al., 2021b)