This article is the logical conclusion of our previous article on automatic topic labeling.
Because the task is highly subjective and there is no reference set of relevant labels or textual anomalies, standard evaluation metrics based on precision and recall are difficult to apply directly. For the topic labeling task we therefore took the dataset from David Blei's research paper [1], which contains multinomial word distributions with manually defined labels, and compared the output of our framework against these labels.
Within the experimental configuration defined in the previous subsection, each topic from the initial dataset was processed and labeled by our framework, and the output was compared to the manually defined topic names. We also applied several techniques with different parameters in order to provide an exhaustive overview of the performance and of the possible solutions.
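For concreteness, here is a minimal sketch of such a comparison step, assuming exact case-insensitive matching between generated and manual labels (the actual evaluation may also credit partially correct labels); `label_precision` is an illustrative name, not the framework's API.

```python
from typing import List

def label_precision(generated: List[str], gold: List[str]) -> float:
    """Fraction of topics whose generated label matches the manual one."""
    hits = sum(g.lower() == m.lower() for g, m in zip(generated, gold))
    return hits / len(gold)

# Toy example: one of the two labels matches, giving 0.5.
print(label_precision(["money", "arts"], ["Budget", "Arts"]))
```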
The initial dataset is as follows:
The table headers represent the topic labels; within each topic, terms are sorted in descending order of their marginal probabilities. The comparative analysis is presented in Table 1, where columns correspond to the applied techniques and rows to the generated labels. The first part covers the experiments based on hypernyms with different threshold values.
As can be seen from the table, setting the threshold to 2 gives more relevant results than setting it to 3. We then used the DBpedia knowledge graph to find the most coherent word in the distribution, the 'cover' word, and generated hypernyms for it.
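A minimal sketch of the hypernym-based step above, assuming NLTK's WordNet interface; `hypernym_label` and the threshold semantics (a hypernym must cover at least `threshold` of the top terms) reflect our illustrative reading, not necessarily the framework's exact implementation.

```python
from collections import Counter
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def hypernym_label(terms, threshold=2):
    """Label a topic with the direct hypernym shared by at least
    `threshold` of its top terms; return None if no candidate qualifies."""
    counts = Counter()
    for term in terms:
        hypernyms = set()  # count each hypernym at most once per term
        for synset in wn.synsets(term, pos=wn.NOUN):
            for hyper in synset.hypernyms():
                hypernyms.add(hyper.lemma_names()[0])
        counts.update(hypernyms)
    candidates = [(h, n) for h, n in counts.items() if n >= threshold]
    return max(candidates, key=lambda hn: hn[1])[0] if candidates else None

# A lower threshold admits more candidate hypernyms, which is why
# threshold=2 can produce a label where threshold=3 yields nothing.
print(hypernym_label(["movie", "film", "actor", "theater"], threshold=2))
```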
Finally, we apply the Linked Open Data (LOD) concept to the terms, since it can be useful to take the surrounding context into account. Through LOD we obtain a word's definition and apply TextRank to it in order to extract a multi-word label. We experiment with applying TextRank to the 'cover' word's LOD definition, to the combination of all the terms' definitions, and to a summary of summaries (we summarize each term's definition and apply TextRank to the aggregated summaries); the latter two variants are sketched below.
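The following sketch illustrates two of these LOD variants, assuming the public DBpedia SPARQL endpoint and the `summa` TextRank implementation (`pip install summa requests`); mapping a term to a resource via `capitalize()` is a naive placeholder, and all function names are illustrative rather than the framework's actual API.

```python
import requests
from summa import keywords, summarizer

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

def dbpedia_abstract(term: str) -> str:
    """Fetch the English DBpedia abstract for `term` (its LOD definition)."""
    query = f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract WHERE {{
          <http://dbpedia.org/resource/{term.capitalize()}> dbo:abstract ?abstract .
          FILTER (lang(?abstract) = "en")
        }}"""
    resp = requests.get(DBPEDIA_SPARQL,
                        params={"query": query,
                                "format": "application/sparql-results+json"})
    bindings = resp.json()["results"]["bindings"]
    return bindings[0]["abstract"]["value"] if bindings else ""

def combined_definitions_label(terms, top_n=2):
    """Run TextRank over the concatenation of all the terms' definitions."""
    text = " ".join(dbpedia_abstract(t) for t in terms)
    return keywords.keywords(text, words=top_n)

def summary_of_summaries_label(terms, top_n=2):
    """Summarize each definition first, then run TextRank over the
    aggregated summaries (the 'summary of summaries' variant)."""
    summaries = [summarizer.summarize(dbpedia_abstract(t), ratio=0.3)
                 for t in terms]
    return keywords.keywords(" ".join(summaries), words=top_n)
```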
As can be seen, the best result is obtained with hypernyms and the threshold set to 2, where we reach almost 25% precision. However, in Table 4 we observe that for the topic 'Budget' the framework usually generated 'money', which seems at least partly correct, whereas the hypernym approach with threshold 2 incorrectly labeled this topic as 'idea'. Summarization applied to all the LOD definitions without compression proved to perform poorly, so its results are not included in this section. To sum up, there is still room for improvement, and the overall performance could likely be improved further by combining the techniques.
[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.