dictionary of unique tokens
I didn't catch the need for a unique token dictionary. Could you illustrate it further, please?
And in the final result, we didn't assign the articles to a certain topic, or I didn't understand the final output...
Hi Abdulrahman!
The dictionary maps each unique word in the dataset to a numerical ID. This step is essential because topic modeling algorithms like LDA and LSA work with numbers, not raw text. The dictionary lets us convert each article into a bag-of-words representation, which counts how often each word appears. This is exactly the input the model needs to analyze patterns across documents.
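To make this concrete, here is a small stdlib-only sketch of the idea (the toy articles and the helper `doc2bow` are illustrative — in practice a library like gensim builds the dictionary and bag-of-words for you):

```python
from collections import Counter

# Toy tokenized articles
articles = [
    ["economy", "market", "growth", "market"],
    ["match", "goal", "team", "goal", "goal"],
]

# Build the dictionary: each unique token gets a numerical ID
token2id = {}
for doc in articles:
    for token in doc:
        if token not in token2id:
            token2id[token] = len(token2id)

# Convert an article to a bag-of-words: (token_id, count) pairs
def doc2bow(doc):
    counts = Counter(doc)
    return sorted((token2id[t], c) for t, c in counts.items())

bows = [doc2bow(doc) for doc in articles]
print(token2id)  # e.g. {'economy': 0, 'market': 1, ...}
print(bows)      # e.g. [[(0, 1), (1, 2), (2, 1)], ...]
```

Note how "market" appearing twice in the first article becomes a single `(id, 2)` entry — the model only sees IDs and counts, never the raw words.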
The final result shows the two topics discovered, along with the top 5 words for each topic and their importance weights within it. This gives a clear sense of the themes found in the article collection. You're right that this output doesn't assign each article to a topic; it summarizes the themes the model found in the dataset. Behind the scenes, though, every article contributes to the creation of these topics, and each article is modeled as a mixture of them.
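Here is a naive illustration of that "each article is a mix of topics" idea — the topic-word weights are made up, and scoring by summing word weights is *not* LDA's actual inference, just a way to see how a mixture per article could look (if you used gensim, `lda_model.get_document_topics(bow)` returns the real mixture):

```python
# Made-up topic-word weights, shaped like the "top words per topic" output
topics = {
    0: {"economy": 0.30, "market": 0.25, "growth": 0.20},
    1: {"match": 0.28, "goal": 0.35, "team": 0.18},
}

def topic_mixture(tokens):
    # Score each topic by summing the weights of the article's words,
    # then normalize so the mixture sums to 1. (Naive scoring for
    # illustration only, not real LDA inference.)
    scores = {t: sum(w.get(tok, 0.0) for tok in tokens)
              for t, w in topics.items()}
    total = sum(scores.values()) or 1.0
    return {t: s / total for t, s in scores.items()}

article = ["market", "growth", "goal"]
mix = topic_mixture(article)
print(mix)  # a blend of both topics, not a single hard label
```

The point is that the article isn't forced into one bucket: it gets a weight for every topic, and those weights sum to 1.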
Hope this helps.
Best,
Ivan