Topic Modeling with Contrastive Learning papers
Study/Topic Modeling
A curated list of research papers that use, or are closely related to, contrastive learning in topic modeling:
- Contrastive Learning for Neural Topic Model (NeurIPS 2021)
- Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning (EMNLP 2022)
- Improving topic disentanglement via contrastive learning (Information Processing & Management 2023)
- Unified Neural Topic Model via Contrastive Learning and ..
Traditional Topic Model
1. Non-negative Matrix Factorization based topic model
2. (Bayesian) Probabilistic graphical model: LDA (Latent Dirichlet Allocation)
2-1. The Dirichlet distribution: the Dirichlet distribution generalizes the Beta distribution and is used in Bayesian models of multivariate random variables whose components lie between 0 and 1. Its probability density function (PDF) is p(x; α) = (1/B(α)) ∏_i x_i^(α_i − 1), where B(α) is the multivariate Beta function.
2-2. LDA Process (Generative process + Inference process): LDA is the classic topic modeling technique; our goal is to infer the per-topic word distribution (Topic-Word distr..
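The generative process sketched above can be illustrated with a minimal stdlib-only simulation. This is a toy sketch, not the original post's code: `sample_dirichlet`, `generate_doc`, the vocabulary, and the topic-word table `phi` are hypothetical names chosen for illustration.

```python
import random

def sample_dirichlet(alpha):
    """One draw from Dirichlet(alpha): independent Gamma draws, normalized."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def generate_doc(n_words, alpha, phi, vocab):
    """LDA generative process for one document:
    theta ~ Dir(alpha); for each word: z ~ Cat(theta), w ~ Cat(phi[z])."""
    theta = sample_dirichlet(alpha)  # document-topic distribution
    doc = []
    for _ in range(n_words):
        z = random.choices(range(len(theta)), weights=theta)[0]
        w = random.choices(vocab, weights=phi[z])[0]
        doc.append(w)
    return doc

# toy example: 2 topics over a 4-word vocabulary
vocab = ["ball", "game", "vote", "law"]
phi = [[0.45, 0.45, 0.05, 0.05],   # a "sports"-like topic
       [0.05, 0.05, 0.45, 0.45]]   # a "politics"-like topic
doc = generate_doc(20, alpha=[0.5, 0.5], phi=phi, vocab=vocab)
```

Inference (recovering theta and phi from documents alone) is the hard part LDA solves with variational inference or Gibbs sampling; the sketch only shows the forward direction.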
Preliminary for Topic Models
Preliminary: Mathematics, Statistics, ...
1. Statistics
1.1 Dirichlet Distribution: the Dirichlet distribution is a continuous probability distribution defined over k-dimensional real vectors whose components are all positive and sum to 1. It is used mainly because, in the Bayesian approach, it is the conjugate prior of the multinomial distribution. Since it acts as a prior for multinomial data, the posterior can be computed from it in closed form (by Bayes' theorem).
1.2 Joint Distribution ..
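The conjugacy described above has a very concrete form: observing multinomial counts simply adds those counts to the prior's concentration parameters. A minimal sketch (function names are illustrative, not from the post):

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts
    -> Dirichlet(alpha + counts) posterior, by Bayes' theorem."""
    return [a + c for a, c in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): the posterior-mean estimate of the
    multinomial parameters."""
    s = sum(alpha)
    return [a / s for a in alpha]

# symmetric prior over 3 categories, then observe counts [5, 1, 0]
post = dirichlet_posterior([1.0, 1.0, 1.0], [5, 1, 0])
# post -> [6.0, 2.0, 1.0]; posterior mean -> [6/9, 2/9, 1/9]
```

This closed-form update is exactly why LDA places Dirichlet priors on its document-topic and topic-word multinomials.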
Topic Modeling Task Overview
0. Preliminary for Topic Model
1. Traditional Topic Model
2. Neural Topic Model (NTM)
3. Clustering based Topic Model
4. Various (Neural) Topic Models
4-1. Hierarchical NTM
4-2. Short Text NTM
4-3. Cross-lingual NTM
4-4. Dynamic NTM
4-5. Correlated NTM
4-6. Lifelong NTM
5. Evaluation of Topic Models
6. Challenges of Topic Models
Automatic Evaluation Metrics for Topic Modeling
Automatic evaluation metrics assess the topic coherence and topic diversity of the models.
Topic Coherence Measures:
- NPMI (Lau et al., 2014) - Normalized Pointwise Mutual Information - Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality (EACL 2014)
- WE (Fang et al., 2016) - Word Embedding (WE) based coherence - Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data ..
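The NPMI measure listed above can be sketched from document-level co-occurrence frequencies. A minimal version for a single word pair (the function name is illustrative; real toolkits use sliding windows over a reference corpus and average over the top-N words of each topic):

```python
import math

def npmi(docs, w1, w2):
    """Normalized PMI between two words, estimated from document-level
    (co-)occurrence probabilities. Ranges from -1 (never co-occur)
    through 0 (independent) to +1 (always co-occur)."""
    n = len(docs)
    p1 = sum(1 for d in docs if w1 in d) / n
    p2 = sum(1 for d in docs if w2 in d) / n
    p12 = sum(1 for d in docs if w1 in d and w2 in d) / n
    if p12 == 0.0:
        return -1.0  # words never co-occur: minimum score
    return math.log(p12 / (p1 * p2)) / -math.log(p12)

docs = [{"ball", "game"}, {"ball", "game"}, {"vote", "law"}]
npmi(docs, "ball", "game")  # -> 1.0 (the pair always co-occurs)
npmi(docs, "ball", "vote")  # -> -1.0 (the pair never co-occurs)
```

A topic's coherence score is then the average NPMI over all pairs among its top words.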
Topic Modeling in NLP: Study Curriculum
0. Latent Dirichlet Allocation (LDA) (David M. Blei, 2003)
1. Neural Topic Model (NTM)
1-1. ProdLDA (Neural-ProdLDA) (Srivastava and Sutton, 2017)
1-2. Combined TM (Bianchi et al., 2020)
1-3. ZeroshotTM (Bianchi et al., 2021)
2. Evaluation Metrics: C_v, Purity, Top-Purity and Normalized Mutual Information (Top-NMI) as metrics (Nguyen et al., 2018). The KMeans algorithm is applied to topic proportions z and use..
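Purity, one of the metrics listed above (cluster documents, e.g. by running KMeans on the topic proportions z, then score the clusters against gold labels), can be sketched as follows. The function name is illustrative, not from the referenced papers:

```python
from collections import Counter

def purity(cluster_ids, gold_labels):
    """Purity: each cluster is credited with its majority gold label;
    the score is the fraction of documents matching their cluster's
    majority label (1.0 = every cluster is label-pure)."""
    assert len(cluster_ids) == len(gold_labels)
    correct = 0
    for c in set(cluster_ids):
        members = [g for cid, g in zip(cluster_ids, gold_labels) if cid == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(gold_labels)

# perfect clustering -> 1.0; one impure cluster lowers the score
purity([0, 0, 1, 1], ["a", "a", "b", "b"])  # -> 1.0
purity([0, 0, 0, 1], ["a", "a", "b", "b"])  # -> 0.75
```

Note purity is trivially maximized by putting every document in its own cluster, which is why it is usually reported alongside NMI.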