The material (recorded video, workflows, data, and slides) used for the webinar can be downloaded from: http://www.knime.com/about/events/knime-text-mining-webinar-online-training-october-2013
The last part of the webinar was dedicated to sentiment analysis. An example workflow showed how to implement a sentiment analysis task. The sequence of steps is the usual one:
- data reading and document creation;
- tag enrichment using a sentiment corpus;
- filtering and stemming;
- bag of words creation;
- frequency calculation;
- conversion of the sentiment tag into an attitude measure (+1 = good; -1 = bad);
- calculation of the attitude measure for each document or group of documents as the mean attitude across all words;
- document binning with 3 bins built on the attitude distribution: negative documents, neutral documents, and positive documents.
For more details about this basic procedure, please refer to the webinar material linked above.
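As a rough illustration of the last three steps, here is a minimal Python sketch rather than the KNIME workflow itself. The tiny lexicon, the function names, and the fixed bin thresholds are all assumptions made for the example; the webinar workflow uses a real sentiment corpus and builds its bins from the observed attitude distribution. The sketch also averages over sentiment-tagged words only, which is one possible reading of "mean attitude across all words".

```python
from statistics import mean

# Hypothetical mini lexicon: +1 = good, -1 = bad (stand-in for a real sentiment corpus).
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}

def document_attitude(text):
    """Mean attitude across the sentiment-tagged words of one document."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    # Documents without any tagged word are treated as neutral (assumption).
    return mean(scores) if scores else 0.0

def bin_document(attitude, low=-0.25, high=0.25):
    """Assign one of three bins; the thresholds are illustrative only."""
    if attitude < low:
        return "negative"
    if attitude > high:
        return "positive"
    return "neutral"
```

For example, `bin_document(document_attitude("awful bad film"))` puts the document in the negative bin, while a document with no tagged words ends up in the neutral bin.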
This approach is based on single words. It does not take word co-occurrence into account. A very clear example of where such an approach fails is the case of negations. "Good" on its own is a positive word, but "not good" is a negative sequence of words: the negation "not" reverses the sentiment polarity of the word "good". Without going too deep into NLP processing, the effect of negations can be taken into account simply by switching the sentiment polarity of a word whenever it is preceded by a negation.
We therefore used a "Lag Column" node together with a "Rule Engine" node to produce a new data column named "changePolarity". The "Lag Column" node puts the current word (term) and the previous word (term(-1)) in the sentence side by side. The "Rule Engine" node then generates the "changePolarity" column, which is set to 1 by default for each word and changes value if:
- a negation (like "not") precedes the current word (changePolarity = -1);
- an enhancement adverb precedes the current word (changePolarity = 2).
Later on, a "Math Formula" node multiplies the word attitude by its changePolarity value to produce the corrected attitude of the two-word sequence (2-gram).
The figure below shows the "Rule Engine" configuration window.
The figure after that shows the workflow with the three new nodes introduced for attitude correction.
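The polarity correction described above can also be sketched in a few lines of Python. This is only an illustration of the logic, not the configuration of the KNIME nodes: the word lists and function names are hypothetical, and the actual rules are the ones defined in the "Rule Engine" node.

```python
# Hypothetical word lists, chosen for illustration only.
LEXICON = {"good": 1, "bad": -1}
NEGATIONS = {"not", "never", "no"}
ENHANCERS = {"very", "really", "extremely"}

def change_polarity(previous):
    """Rule Engine analogue: 1 by default, -1 after a negation, 2 after an enhancer."""
    if previous in NEGATIONS:
        return -1
    if previous in ENHANCERS:
        return 2
    return 1

def corrected_attitudes(words):
    """Return (word, corrected attitude) pairs for the sentiment-tagged words."""
    # Lag Column analogue: align each word with the word before it.
    lagged = [None] + words[:-1]
    # Math Formula analogue: attitude multiplied by changePolarity.
    return [
        (word, LEXICON[word] * change_polarity(prev))
        for word, prev in zip(words, lagged)
        if word in LEXICON
    ]
```

With these assumed lists, `corrected_attitudes("this is not good".split())` yields an attitude of -1 for "good", and `corrected_attitudes("very bad".split())` yields -2 for "bad". Note that only the immediately preceding word is inspected, so a sequence such as "not very good" would be treated as an enhancement rather than a negation.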