Tuesday, July 19, 2011

Data Mining #NCDs Hashtags, Part II: PHP Streaming Twitter API

To plan for future research, I used the open source code in the Twitter database server module to start capturing tweets related to #NCDs from the Twitter Streaming API and saving them in my personal database. The database contains separate tables for tweets, tweet_tags, tweet_urls, tweet_mentions, and users.
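To make the five-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of the module's own MySQL setup. The column names, the `store_tweet` and `top_tags` helpers, and the tweet dictionary shape are all assumptions for illustration (loosely modelled on the tweet JSON of the time: `id_str`, `text`, `user`, and `entities` with `hashtags`, `urls`, `user_mentions`); this is not the schema the module actually ships.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not the module's own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id     TEXT PRIMARY KEY,
    screen_name TEXT
);
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   TEXT PRIMARY KEY,
    user_id    TEXT REFERENCES users(user_id),
    tweet_text TEXT,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS tweet_tags (tweet_id TEXT, tag TEXT);
CREATE TABLE IF NOT EXISTS tweet_urls (tweet_id TEXT, url TEXT);
CREATE TABLE IF NOT EXISTS tweet_mentions (tweet_id TEXT, screen_name TEXT);
"""


def store_tweet(conn, tweet):
    """Split one decoded tweet (a dict) across the five tables.

    Returns True if the tweet was stored, False if it was already present.
    """
    tweet_id = tweet["id_str"]
    # Skip tweets already stored so their tags/URLs/mentions are not duplicated.
    if conn.execute("SELECT 1 FROM tweets WHERE tweet_id = ?", (tweet_id,)).fetchone():
        return False
    user = tweet["user"]
    entities = tweet.get("entities", {})
    # One user posts many tweets, so an existing user row is left as-is.
    conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?)",
                 (user["id_str"], user["screen_name"]))
    conn.execute("INSERT INTO tweets VALUES (?, ?, ?, ?)",
                 (tweet_id, user["id_str"], tweet["text"], tweet.get("created_at")))
    conn.executemany("INSERT INTO tweet_tags VALUES (?, ?)",
                     [(tweet_id, h["text"]) for h in entities.get("hashtags", [])])
    conn.executemany("INSERT INTO tweet_urls VALUES (?, ?)",
                     [(tweet_id, u["url"]) for u in entities.get("urls", [])])
    conn.executemany("INSERT INTO tweet_mentions VALUES (?, ?)",
                     [(tweet_id, m["screen_name"])
                      for m in entities.get("user_mentions", [])])
    conn.commit()
    return True


def top_tags(conn, limit=10):
    """Most frequent hashtags, case-folded, ties broken alphabetically."""
    return conn.execute(
        "SELECT lower(tag) AS tag_lc, COUNT(*) AS n FROM tweet_tags "
        "GROUP BY tag_lc ORDER BY n DESC, tag_lc LIMIT ?", (limit,)).fetchall()


# Usage with a made-up tweet (every value here is invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sample = {
    "id_str": "1",
    "text": "Join the #NCDs #UNSummit discussion @ncdalliance http://example.org",
    "created_at": "Tue Jul 19 12:00:00 +0000 2011",
    "user": {"id_str": "42", "screen_name": "example_user"},
    "entities": {
        "hashtags": [{"text": "NCDs"}, {"text": "UNSummit"}],
        "urls": [{"url": "http://example.org"}],
        "user_mentions": [{"screen_name": "ncdalliance"}],
    },
}
first = store_tweet(conn, sample)
second = store_tweet(conn, sample)  # same id_str, so nothing is added
print(top_tags(conn))
```

Splitting hashtags, URLs, and mentions into their own tables means questions such as "which hashtags co-occur with #NCDs most often" become a simple GROUP BY rather than text parsing at query time.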
To download and store all #NCDs-related tweets, you must connect to the Twitter API server, keep that connection open indefinitely so that tweets arrive in real time, and reestablish the connection whenever a network error occurs. The complete list of search terms is below. I have already collected 100,000 #NCDs-related tweets and hope to reach 1 million sometime next week. With this data, what types of questions could we answer?
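The connect, stay connected, and reconnect cycle can be sketched as a loop with exponential backoff. This is a minimal sketch, not the module's code: `open_stream` is a hypothetical callable standing in for whatever opens the HTTP stream and yields decoded tweets, and the delay values are illustrative assumptions rather than Twitter's published numbers.

```python
import time


def consume_forever(open_stream, handle_tweet, max_retries=None,
                    initial_delay=1.0, max_delay=320.0, sleep=time.sleep):
    """Keep a streaming connection alive, reconnecting after it drops.

    open_stream: hypothetical zero-argument callable that opens the stream and
        returns an iterable of tweets; it raises OSError (e.g. ConnectionError)
        when the network fails, either on connect or mid-stream.
    handle_tweet: called once per received tweet, e.g. to store it.
    max_retries: give up after this many disconnects in a row with no tweet
        received in between (None means retry forever).
    sleep: injectable so the loop can be tested without real waiting.
    """
    delay = initial_delay
    failures = 0
    while True:
        try:
            for tweet in open_stream():
                handle_tweet(tweet)
                # A tweet arrived, so the connection is healthy: reset backoff.
                delay = initial_delay
                failures = 0
        except OSError:
            pass  # network error: fall through to the reconnect logic below
        # The stream ended (error or clean close): wait, then reconnect.
        failures += 1
        if max_retries is not None and failures > max_retries:
            return
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, up to a ceiling


# Simulated stream: fails, then delivers two tweets and drops, then keeps failing.
calls = {"n": 0}


def fake_open_stream():
    calls["n"] += 1
    if calls["n"] == 2:
        def gen():
            yield {"id_str": "1"}
            yield {"id_str": "2"}
            raise ConnectionError("simulated drop mid-stream")
        return gen()
    raise ConnectionError("simulated connection failure")


received, sleeps = [], []
consume_forever(fake_open_stream, received.append, max_retries=2, sleep=sleeps.append)
print(received, sleeps)
```

Doubling the wait after each failure keeps a long outage from turning into a flood of reconnect attempts, while resetting the delay as soon as a tweet arrives means a brief blip costs only a short pause.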

List of search terms: 'NCDs', '#NCDs', '#LIVESTRONG', '#UNSummit', '#tobacco', '#NCDChild', '@ncdaction', '@NCDs_PAHO', '#Diabetes', '@ncdalliance', '@HealthCaribbean', '#cancer', '#publichealth', '#smoking', '@globalhealthorg'.
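Before a term list like this is sent to the stream, it can help to tidy it: drop repeats case-insensitively while keeping the original order, and join the terms into one comma-separated string, which is the form the streaming API's `track` filter accepted. The helper names below are hypothetical, and `matches_any` is a deliberately simple local substring check (a swapped-in technique, not the server's own matching rules) that could serve as a second filter on stored tweets.

```python
def dedupe_terms(terms):
    """Drop repeated terms (case-insensitive), keeping first-seen order."""
    seen = set()
    unique = []
    for term in terms:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(term.strip())
    return unique


def track_param(terms):
    """Join the de-duplicated terms into a single comma-separated string."""
    return ",".join(dedupe_terms(terms))


def matches_any(text, terms):
    """Case-insensitive substring check of a tweet's text against the terms."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in dedupe_terms(terms))


# The list exactly as given in the post (including any repeats).
SEARCH_TERMS = ['NCDs', '#NCDs', '#LIVESTRONG', '#UNSummit', '#tobacco', '#NCDChild',
                '@ncdaction', '@NCDs_PAHO', '#Diabetes', '@ncdalliance',
                '@HealthCaribbean', '#cancer', '#publichealth', '#smoking',
                '@HealthCaribbean', '@globalhealthorg']

print(len(dedupe_terms(SEARCH_TERMS)))
print(track_param(SEARCH_TERMS))
print(matches_any("Tackling #ncds at the #UNSummit", SEARCH_TERMS))
```

Whether the server-side filter treats the `#` and `@` prefixes specially is not assumed here; checking stored tweets locally against the same list is a cheap way to spot off-topic matches before they pile up.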

Here is what accidentally downloading 10,000 tweets about Justin Bieber looks like: http://bit.ly/qT3EWV


