Data Skeptic podcast

Check out Data Skeptic’s newest podcast on Natural Language Processing (NLP):

The podcast covers the uses of NLP, from topic modeling and translation to speech recognition. Kyle Polich and Linh Da Tran address some of the criticism, too. Fred Jelinek’s “Every time I fire a linguist…” makes a paleo appearance during the discussion of what appears to be a less-than-friendly debate between computer scientists and linguists.

I particularly liked the segment where Lucy Park talks about the development of KoNLPy, a Python library for text processing in Korean.

Nick Beauchamp’s AJPS article

This morning I had the chance to read Nick Beauchamp’s “Predicting and Interpolating State‐Level Polls Using Twitter Textual Data”, published in AJPS in 2017. It uses Twitter data to predict candidate success (more specifically, voting intention for a candidate) at the state level. It is a neat application of Twitter data in that it does not rely on descriptive sentiment analysis alone. Beauchamp insists that proper model training and out-of-sample validation are necessary to make inferences from data of this nature.

One question I was left with is whether the model needs to be reworked in light of online trolling and the outright manipulation of Twitter as a platform. The article tracks the 2012 election, when social media was (for lack of a better word) ‘innocent’ compared to what came after. Today we have far more ‘noise’ than even six years ago. It is of course possible to train another model on new data, but I kept wondering whether there can be a general model for making sense of Twitter data, or whether we have to keep adjusting our models to changing practices of social media use.

Tasks for the week 1/2 – 1/6

There is less than a week before teaching starts again, so I will be putting in as much time as possible to learn more Python in the next four days. Here are my sources:

Michael Allen’s blog: meant for data science beginners in healthcare, but it is absolutely necessary for all Python novices in any knowledge domain

Paul Barry’s Head First Python (O’Reilly): a fun, relatively easy introduction to Python, but things get pretty complicated starting with the web development chapter

Ryan Mitchell’s Web Scraping with Python: Collecting Data from the Modern Web (O’Reilly): essential web scraping for a social scientist like myself

For more information on Python’s BeautifulSoup library, I recommend this video by Corey Schafer

Hello, world!

Hello everyone,

I have a strong intuition that the social sciences can benefit from recent advances in data processing and analysis. I am just not sure how to put that intuition to good use. I have recently started taking courses in computer science, and have learned some Java, C++ and SQL. Now it is time to connect the dots! So, this blog is meant to keep me disciplined about my education in data science, and to share insights with friends and colleagues.