Johann Peter Gustav Lejeune Dirichlet:
A German mathematician in the 1800s who contributed widely to the field of modern mathematics. There is a probability distribution named after him:
the "Dirichlet distribution". This distribution was later used in LDA, i.e. Latent Dirichlet Allocation.
In 2003, LDA was first published as a graphical model for topic discovery in the Journal of Machine Learning Research.
Assumptions of LDA for topic modeling:
Documents with similar topics use similar groups of words. For example, documents about business tend to use words such as money, stock, and market.
Latent topics can then be found by searching for groups of words that frequently occur together in documents across the corpus.
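To make the idea of "words that frequently occur together" concrete, here is a minimal sketch that counts how often pairs of words appear in the same document. It only illustrates the intuition and is not LDA inference itself. The four toy documents and the names docs and pair_counts are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: two business-like and two food-like documents.
docs = [
    "money stock market money",
    "stock market crash",
    "apple bread recipe",
    "bread apple pie",
]

# Count, for every pair of distinct words, how many documents contain both.
pair_counts = Counter()
for doc in docs:
    vocab = sorted(set(doc.split()))  # unique words, in a stable order
    pair_counts.update(combinations(vocab, 2))

# Pairs that co-occur in the most documents hint at a shared topic.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

In this toy corpus the pairs (market, stock) and (apple, bread) each appear together in two documents, which is the kind of signal a topic model picks up on at a much larger scale.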
We can actually think of these two assumptions mathematically.
Documents are probability distributions over some underlying latent topics.
Topics themselves are probability distributions over words.
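Taken together, these two statements imply that the probability of seeing a word in a document is a weighted sum over topics: p(word | document) = sum over topics of p(topic | document) * p(word | topic). The sketch below works this out for one document. All numbers, and the names doc_topics, topic_words, and word_distribution, are illustrative assumptions rather than values from a fitted model.

```python
# Hypothetical document-topic distribution: p(topic | document).
doc_topics = {"business": 0.6, "politics": 0.2, "food": 0.2}

# Hypothetical topic-word distributions: p(word | topic).
topic_words = {
    "business": {"money": 0.5, "stock": 0.3, "market": 0.2},
    "politics": {"vote": 0.5, "senate": 0.3, "law": 0.2},
    "food": {"apple": 0.6, "home": 0.3, "bread": 0.1},
}

def word_distribution(doc_topics, topic_words):
    """Mix the topic-word distributions using the document's topic weights."""
    probs = {}
    for topic, p_topic in doc_topics.items():
        for word, p_word in topic_words[topic].items():
            probs[word] = probs.get(word, 0.0) + p_topic * p_word
    return probs

doc_words = word_distribution(doc_topics, topic_words)
for word, p in sorted(doc_words.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

Because each input distribution sums to 1, the resulting word probabilities for the document also sum to 1.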
LDA assumes that documents are produced in the following fashion:
Decide on the number of words N the document will have.
Choose a topic mixture for the document according to a Dirichlet distribution over the fixed set of K topics. For example, this document is 60% business, 20% politics, and 20% food.
Generate each word in the document by first picking a topic according to the multinomial distribution sampled in the previous step. So we pick words 60% of the time from the business topic, 20% from the politics topic, and 20% from the food topic.
Then use the chosen topic to generate the word itself, according to that topic's distribution over words. For example, if we selected the food topic, we might generate the word "apple" with 60% probability, "home" with 30% probability, and so on.
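The generative steps above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the topic-word probabilities in TOPICS, the Dirichlet parameters, and the helper names sample_dirichlet and generate_document are all hypothetical. The Dirichlet draw uses the standard trick of normalizing independent Gamma samples.

```python
import random

# Hypothetical topic-word distributions (each topic's probabilities sum to 1).
TOPICS = {
    "business": {"money": 0.5, "stock": 0.3, "market": 0.2},
    "politics": {"vote": 0.5, "senate": 0.3, "law": 0.2},
    "food": {"apple": 0.6, "home": 0.3, "bread": 0.1},
}

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alphas, rng):
    """Generate one document following the steps described above."""
    topic_names = list(TOPICS)
    # Choose the document's topic mixture from a Dirichlet distribution.
    mixture = sample_dirichlet(alphas, rng)
    words = []
    for _ in range(n_words):
        # Pick a topic for this word according to the document's mixture.
        topic = rng.choices(topic_names, weights=mixture, k=1)[0]
        # Pick the word itself from the chosen topic's word distribution.
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()), k=1)[0]
        words.append(word)
    return mixture, words

rng = random.Random(0)  # fixed seed so repeated runs give the same document
mixture, words = generate_document(10, [1.0, 1.0, 1.0], rng)
print({name: round(p, 2) for name, p in zip(TOPICS, mixture)})
print(" ".join(words))
```

Here the number of words N is simply passed in as n_words. Topic modeling with LDA runs this story in reverse: given only the observed words, it tries to infer the topic mixtures and topic-word distributions that could plausibly have generated them.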