Tuesday, July 19, 2011

Data Mining #NCDs Hash Tags - Part II PHP Streaming Twitter API

To plan for future research, I used the open source code in the Twitter database server module to collect tweets related to #NCDs from the Twitter streaming API and store them in my personal database. The database contains separate tables for tweets, tweet_tags, tweet_urls, tweet_mentions, and users.
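For readers who want a feel for that layout, here is a rough sketch of the five tables and a function that files one tweet into them. It is written in Python with SQLite rather than the PHP module I actually used, and only the table names come from my database: the column names, the shape of the tweet dictionary, and the helpers open_db and store_tweet are assumptions made up for illustration.

```python
import sqlite3

# Only the five table names come from the post; every column is hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    user_id     INTEGER PRIMARY KEY,
    screen_name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    tweet_text TEXT NOT NULL,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS tweet_tags (
    tweet_id INTEGER NOT NULL REFERENCES tweets(tweet_id),
    tag      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tweet_urls (
    tweet_id INTEGER NOT NULL REFERENCES tweets(tweet_id),
    url      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tweet_mentions (
    tweet_id    INTEGER NOT NULL REFERENCES tweets(tweet_id),
    screen_name TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure all five tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_tweet(conn, tweet):
    """Store one tweet; return False if its id was already in the database.

    `tweet` is an assumed, already-parsed dict with the keys
    id, text, created_at, user (id, screen_name), tags, urls, mentions.
    """
    user = tweet["user"]
    tweet_id = tweet["id"]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?)",
            (user["id"], user["screen_name"]),
        )
        cur = conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (tweet_id, user["id"], tweet["text"], tweet.get("created_at")),
        )
        if cur.rowcount == 0:
            return False  # duplicate tweet, child rows already stored
        conn.executemany(
            "INSERT INTO tweet_tags VALUES (?, ?)",
            [(tweet_id, tag) for tag in tweet.get("tags", [])],
        )
        conn.executemany(
            "INSERT INTO tweet_urls VALUES (?, ?)",
            [(tweet_id, url) for url in tweet.get("urls", [])],
        )
        conn.executemany(
            "INSERT INTO tweet_mentions VALUES (?, ?)",
            [(tweet_id, name) for name in tweet.get("mentions", [])],
        )
    return True
```

Keeping tags, URLs, and mentions in their own tables means a question like "which hashtags appear most often alongside #NCDs" becomes a simple GROUP BY on tweet_tags instead of a text search over every tweet.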
To download and store all #NCDs-related tweets, you must connect to the Twitter API server and keep that connection open permanently: tweets arrive in real time, and the connection has to be reestablished whenever there is a network error. The complete list of search terms is below. I have already collected 100,000 #NCDs-related tweets and am hoping to reach 1 million by sometime next week. With this data, what types of questions could we possibly answer?
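The "stay connected and reconnect on errors" part can be sketched as a small loop. This is Python, not the PHP code I am running, and it is only an illustration of the idea: connect stands in for whatever opens the streaming connection and yields tweets, handle for whatever stores them, and the retry limit and delay values are assumptions rather than anything Twitter prescribes.

```python
import time


def consume_stream(connect, handle, max_failures=5,
                   base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Read tweets until the connection fails max_failures times in a row.

    connect: hypothetical callable that opens the stream and returns an
             iterable of tweets; it may raise OSError (which includes
             ConnectionError and TimeoutError) at any point.
    handle:  callable invoked once per received tweet.
    Returns the number of consecutive failures when the loop gave up.
    """
    failures = 0
    while failures < max_failures:
        try:
            for tweet in connect():
                failures = 0  # data arrived, so the connection is healthy
                handle(tweet)
            # The server closed the stream without an error: treat it as
            # a dropped connection and reconnect.
            failures += 1
        except OSError:
            failures += 1
        if failures >= max_failures:
            break
        # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
        sleep(min(max_delay, base_delay * 2 ** (failures - 1)))
    return failures
```

Backing off between attempts keeps the script from hammering the server while the network is down, and resetting the counter as soon as a tweet arrives means a single hiccup after hours of healthy streaming only costs a short pause.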

List of Search Terms: 'NCDs', '#NCDs', '#LIVESTRONG', '#UNSummit', '#tobacco', '#NCDChild', '@ncdaction', '@NCDs_PAHO', '#Diabetes', '@ncdalliance', '@HealthCaribbean', '#cancer', '#publichealth', '#smoking', '@globalhealthorg'.
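As far as I understand the streaming API's filter endpoint, the terms are sent as a single comma-separated track value and matched without regard to case. A minimal Python sketch (again, not the PHP I used) of turning a list of terms into that value might look like this; build_track is a hypothetical helper, and the case-insensitive de-duplication is my assumption about what is safe to drop.

```python
def build_track(terms):
    """Join search terms into one comma-separated string.

    Blank entries and repeats (compared case-insensitively) are skipped;
    the first spelling of each term is kept, in its original order.
    """
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return ",".join(unique)


# A few terms from the list above, with one repeated on purpose.
track = build_track(["NCDs", "#NCDs", "#tobacco", "@ncdaction", "#tobacco"])
```

Note that 'NCDs' and '#NCDs' are kept as separate terms here, since only exact repeats are removed.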

Here is what accidentally downloading 10,000 tweets about Justin Bieber looks like: http://bit.ly/qT3EWV

1 comment :

Sameer Manas said...

Hello Kurry,
I am Manas, the admin of technograte.com. Some of your posts are very informative for programming. Would you like to write a few articles for us about these techniques? We would love to have you as a part of our team.

If you are interested, then let us know via email: technograte@gmail.com

Kind Regards
Sameer Manas