
Topic Modeling in NLP: Study Curriculum

Seung-won Seo 2024. 1. 8. 23:45

 

0. Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
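
Before moving to the neural models, here is a minimal scikit-learn sketch of fitting LDA. The toy corpus, number of topics, and preprocessing are illustrative assumptions, not anything from the original paper.

```python
# Minimal LDA sketch with scikit-learn (toy corpus and settings).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets rose sharply today",
    "investors traded stocks and bonds",
]

# LDA takes bag-of-words counts as input.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)   # document-topic proportions (theta)
topic_word = lda.components_       # unnormalized topic-word weights (beta)

# Top words per topic.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(topic_word):
    top = [vocab[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
```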

 

 

1. Neural Topic Model (NTM) (see the sketch after 1-3)

 

1-1. ProdLDA (Neural-ProdLDA) (Srivastava and Sutton, 2017)

1-2. Combined TM (Bianchi et al., 2020)

1-3. ZeroshotTM (Bianchi et al., 2021)
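
To make the NTM family more concrete, below is a rough PyTorch sketch of a ProdLDA-style model: a VAE encoder produces logistic-normal topic proportions, and the decoder applies a softmax after mixing unnormalized topic-word weights (the product-of-experts decoder). Layer sizes and names are illustrative assumptions, and the prior is simplified; use the authors' reference implementations for real experiments.

```python
# Rough ProdLDA-style sketch (illustrative sizes; the prior here is a standard
# normal for simplicity, whereas the paper uses a Laplace approximation to the
# Dirichlet prior).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProdLDASketch(nn.Module):
    def __init__(self, vocab_size=2000, num_topics=20, hidden=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
        )
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)
        # Unnormalized topic-word weights; the softmax is applied *after*
        # mixing, which is ProdLDA's product-of-experts decoder.
        self.beta = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        theta = F.softmax(z, dim=-1)                           # topic proportions
        word_logits = self.beta(theta)
        recon = -(bow * F.log_softmax(word_logits, dim=-1)).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean()

model = ProdLDASketch()
loss = model(torch.rand(8, 2000))  # placeholder bag-of-words batch
loss.backward()
```

CombinedTM and ZeroshotTM keep essentially this decoder but feed contextualized sentence embeddings (e.g., SBERT) into the encoder, concatenated with the bag-of-words input (CombinedTM) or replacing it (ZeroshotTM).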

 

2. Evaluation Metrics

C_v, Purity

 

Top-Purity and Normalized Mutual Information (Top-NMI) as metrics (Nguyen et al., 2018)

Apply the KMeans algorithm to the topic proportions z and use the clustered documents to report purity (Km-Purity) and NMI (Km-NMI) (Zhao et al., 2020a) (see the sketch below)
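
As a reference for how these clustering-based scores are computed, here is a small scikit-learn sketch; the topic proportions z and the gold labels are random placeholders.

```python
# Km-Purity / Km-NMI sketch: cluster documents by their topic proportions z
# and compare the clusters against gold document labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
z = rng.dirichlet(alpha=np.ones(20), size=500)   # placeholder doc-topic proportions
labels = rng.integers(0, 5, size=500)            # placeholder gold labels

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(z)

def purity(clusters, labels):
    # Each cluster votes for its majority gold label.
    total = 0
    for c in np.unique(clusters):
        members = labels[clusters == c]
        total += np.bincount(members).max()
    return total / len(labels)

print("Km-Purity:", purity(clusters, labels))
print("Km-NMI:", normalized_mutual_info_score(labels, clusters))
```

For Top-Purity / Top-NMI, the KMeans step is typically replaced by assigning each document to its highest-probability topic, i.e. clusters = z.argmax(axis=1).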

 

 

2.1 Topic Coherence

 

2.1.1 Normalized Pointwise Mutual Information (NPMI) (Lau et al., 2014) (sketch below)

2.1.2 Word Embedding (WE) (Fang et al., 2016)
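
A self-contained sketch of NPMI coherence computed from document-level co-occurrence counts; the toy corpus and word list are placeholders, and in practice the score is usually computed over an external reference corpus with a sliding window (e.g., gensim's CoherenceModel with coherence='c_npmi').

```python
# NPMI coherence sketch: score a topic's top words by how often they
# co-occur across documents, using normalized pointwise mutual information.
import math
from itertools import combinations

def npmi_coherence(top_words, docs, eps=1e-12):
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in ds for w in words) for ds in doc_sets) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij, p_i, p_j = p(wi, wj), p(wi), p(wj)
        if p_ij == 0:
            scores.append(-1.0)  # never co-occur: minimum NPMI
            continue
        pmi = math.log(p_ij / (p_i * p_j + eps))
        scores.append(pmi / (-math.log(p_ij) + eps))
    return sum(scores) / len(scores)

# Toy usage with placeholder tokenized documents and one topic's top words.
docs = [["cat", "dog", "pet"], ["dog", "bone"], ["stock", "market"], ["market", "trade"]]
print(npmi_coherence(["cat", "dog", "pet"], docs))
```

The WE variant (2.1.2) instead averages pairwise cosine similarities between pretrained word embeddings of a topic's top words.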

 

2.2 Topic Diversity

 

2.2.1 Topic Uniqueness (TU) (Dieng et al., 2020, Topic Modeling in Embedding Spaces) (sketch below)

2.2.2 Inverted Rank-Biased Overlap (I-RBO) (Bianchi et al., 2021, Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence)
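
To ground the diversity metrics, here is a small sketch of two closely related scores computed from the topics' top-word lists: the proportion of unique words that Dieng et al. report as topic diversity, and a topic-uniqueness score that averages the inverse of how many topics each top word appears in. The topic lists are placeholders.

```python
# Topic diversity sketch: two related scores over the topics' top-word lists.
from collections import Counter

topics = [
    ["market", "stock", "trade", "price"],
    ["dog", "cat", "pet", "price"],
    ["game", "team", "score", "win"],
]

def proportion_unique_words(topics):
    # Dieng et al.-style diversity: fraction of top words that are unique.
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

def topic_uniqueness(topics):
    # Average inverse count of each top word across topics.
    counts = Counter(w for topic in topics for w in set(topic))
    per_topic = [sum(1 / counts[w] for w in topic) / len(topic) for topic in topics]
    return sum(per_topic) / len(per_topic)

print("diversity:", proportion_unique_words(topics))
print("TU:", topic_uniqueness(topics))
```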

 

3. References

3.1 C_v, Purity

 

3.2 Top-Purity and Normalized Mutual Information (Top-NMI) as metrics (Nguyen et al., 2018)

3.3 Apply the KMeans algorithm to the topic proportions z and use the clustered documents to report purity (Km-Purity) and NMI (Km-NMI) (Zhao et al., 2020a)

3.4 Standard RBO (Webber et al., 2010; Terragni et al., 2021b)
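
For reference, here is a small sketch of a truncated RBO between ranked top-word lists; I-RBO (2.2.2) is then 1 minus the average pairwise RBO over all topic pairs. The persistence parameter p and the word lists are illustrative, and the extrapolated variant from Webber et al. is omitted.

```python
# Truncated rank-biased overlap (RBO) between two ranked word lists.
# Higher RBO = more similar rankings; I-RBO averages (1 - RBO) over topic pairs.
from itertools import combinations

def rbo(list1, list2, p=0.9):
    depth = min(len(list1), len(list2))
    score = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(list1[:d]) & set(list2[:d])) / d  # agreement at depth d
        score += (p ** (d - 1)) * overlap
    return (1 - p) * score

def inverted_rbo(topics, p=0.9):
    pairs = list(combinations(topics, 2))
    return 1 - sum(rbo(a, b, p) for a, b in pairs) / len(pairs)

# Placeholder topics: the first and third are near-duplicates, lowering I-RBO.
topics = [
    ["market", "stock", "trade", "price"],
    ["dog", "cat", "pet", "animal"],
    ["market", "price", "trade", "stock"],
]
print(inverted_rbo(topics))
```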