Thursday, April 29, 2010

Twitter Python Mongo

...or how many buzzwords can you get into one title. Here is a shortish piece of code that pulls data from Twitter and inserts it into Mongo. Beyond the shortness of the code (given what it accomplishes!), what is remarkable is how easily the data moves from one system to the next, with a minimum of marshalling: Twitter can return its results as JSON, which is essentially Mongo's native format and which Python can work with after only a little tweaking (mostly to trim down the response from Twitter).
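To see what "a minimum of marshalling" means in practice, here is a tiny sketch (the database and collection names are made up for illustration): a document parsed with the json module is already a Python dict, and pymongo will store that dict as-is.

import json
from pymongo import Connection

# a JSON snippet shaped like one Twitter search result
doc = json.loads('{"id": 1, "from_user": "someone", "text": "hello"}')

coll = Connection().scratchdb.scratchbucket   # illustrative names only
coll.save(doc)                                # the dict goes in unchanged
print coll.find_one({"from_user": "someone"})

The full script below does exactly this, only against the live Twitter search API and in bulk.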


import urllib
import json
from pymongo import Connection

def runQuery(query, pp, pages):
    # fetch `pages` pages of `pp` results each from the Twitter search API
    ret = []
    for pg in range(1, pages + 1):
        print 'page...' + str(pg)
        p = urllib.urlopen('http://search.twitter.com/search.json?q=' + query +
                           '&rpp=' + str(pp) + '&page=' + str(pg))
        # json.load turns the response straight into Python dicts and lists;
        # JSON null comes back as Python None
        s = json.load(p)
        for result in s['results']:
            # keep only the fields we care about, substituting 'none'
            # where Twitter returned null or omitted the field
            ret.append({'id': result['id'],
                        'from_user': result.get('from_user') or 'none',
                        'created_at': result.get('created_at') or 'none',
                        'text': result.get('text') or 'none'})
    completeRet = {"results": ret}
    return completeRet

c = Connection()                  # connect to MongoDB on localhost
d = c.twitterdb                   # database
coll = d.postbucket               # collection

res = runQuery('Iran', 100, 15)
ptrData = res.get('results')
for item in ptrData:
    coll.save(item)               # each tweet dict is stored as a document
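
Once the script has run, the tweets come back out with ordinary pymongo queries; a quick sketch (the field values are whatever the search happened to return):

# count what was stored and peek at a few tweets
print coll.count()
for tweet in coll.find().limit(5):
    print tweet['from_user'], ':', tweet['text']

Note that the tweet's own id is stored as a plain field; Mongo assigns its own _id to each document, so re-running the script will insert duplicates unless you use the tweet id as _id or enforce a unique index.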