Rob is interested in using NLP to discern the level of objectivity or bias in text. As an example, he took the transcripts of this year's presidential campaign debates. Here's part of what he did with them:
For more, have a look at the post on Semantic analysis of GOP debates.
- Wikipedia is a source of documents labeled as not objective.
- Movie reviews are a source of documents labeled by rating (number of stars).
- Topic cohesion measures how well a given document stays "on-topic" or even "on-message".
- KL divergence is an entropy-based measure of the relatedness of topics.
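
To make the KL divergence point a little more concrete, here is a minimal sketch of how it could be computed between the topic mixtures of two documents. This is not Rob's code: the function name `kl_divergence` and the three topic distributions are made up for illustration, and it assumes each document has already been reduced to a probability distribution over the same set of topics (for instance by a topic model).

```python
import math

def kl_divergence(p, q):
    """Return KL(p || q) in bits for two discrete distributions.

    p and q are sequences of probabilities over the same topics,
    each assumed to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p = 0 contributes nothing
        if qi == 0:
            return math.inf  # p puts mass on a topic that q rules out
        total += pi * math.log2(pi / qi)
    return total

# Hypothetical topic mixtures (illustrative numbers, not real data).
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.2, 0.7]

print(kl_divergence(doc_a, doc_b))  # small: similar topic mix
print(kl_divergence(doc_a, doc_c))  # larger: very different topic mix
```

A lower value means the two topic mixtures are closer. Note that KL divergence is not symmetric, so `kl_divergence(doc_a, doc_b)` and `kl_divergence(doc_b, doc_a)` generally differ.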
There was an interesting side discussion of the orthogonality of topic modeling and word embedding (word2vec).