I had the chance to read Nick Beauchamp’s “Predicting and Interpolating State-Level Polls Using Twitter Textual Data”, published in AJPS in 2017, this morning. It uses Twitter data to predict candidate success (more specifically, voting intention for a candidate) at the state level. It is a neat application of Twitter data in that it does not just rely on descriptive sentiment analysis: Beauchamp insists that proper model training and out-of-sample validation are necessary to make inferences from data of this nature.
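To give a sense of what that train-and-validate workflow looks like, here is a minimal sketch of supervised text regression with a held-out test set. This is the general flavor of such approaches, not Beauchamp’s actual specification: the vocabulary, the poll numbers, and the linear relationship between word counts and vote share are all invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for state-level tweet corpora and matching poll numbers.
rng = np.random.default_rng(0)
pos_words = ["economy", "jobs", "momentum", "strong"]
neg_words = ["scandal", "gaffe", "weak", "taxes"]

docs, polls = [], []
for _ in range(200):
    n_pos = int(rng.integers(0, 12))
    n_neg = int(rng.integers(0, 12))
    tokens = list(rng.choice(pos_words, n_pos)) + list(rng.choice(neg_words, n_neg))
    docs.append(" ".join(tokens))
    # Poll share as a noisy linear function of word counts (purely invented).
    polls.append(50 + 2 * (n_pos - n_neg) + rng.normal(0, 1))

# Bag-of-words features, then a regularized linear model,
# evaluated only on documents the model never saw during fitting.
X = CountVectorizer().fit_transform(docs)
y = np.array(polls)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = Ridge(alpha=1.0).fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"held-out MAE: {mae:.2f} points")
```

The held-out error is the whole point: it estimates how the model would do on tweets it was not fitted on, which is exactly the kind of check that separates this approach from descriptive sentiment analysis.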
One question I was left with is whether the model needs to be reworked in light of online trolling and the outright manipulation of Twitter as a platform. The article tracks the 2012 election, when social media were (for lack of a better word) ‘innocent’ compared to what came after. Today we have far more ‘noise’ than even six years ago. It is of course possible to train another model on new data, but I kept wondering whether there can be a general model for making sense of Twitter data, or whether we will have to keep tweaking our models to match changing practices of social media use.