Julian @ Thales: Python


Tuesday, June 08, 2010

Python classes

Old-style:

class OldClass:
    def method(self):
        ...

Characterized by:

p = OldClass()
p.__class__ -> OldClass
type(p) -> <type 'instance'>

>>> class Test:
    def __init__(self):
        print 'Test initialized'
    def meth(self):
        self.member = 1
        print 'Test.member = ' + str(self.member)

>>> class TestKid(Test):
    "This is derived from Test; meth is overridden and so is member"
    def __init__(self):
        print 'Kid initialized'
    def meth(self):
        self.member = 2
        Test.meth(self)
        print 'Kid.member = ' + str(self.member)

The above shows how to override a method and call its parent implementation. The member attribute lives on the instance, so it is shared between the parent and child classes: calling a parent method that assigns it changes the value the child sees as well. The parent constructor (or any other overridden method) is not called by default.
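To make the behaviour concrete, here is a minimal sketch written so that it also runs on Python 3. It returns strings instead of printing them, and the log list is my own addition for illustration; neither is part of the classes above.

```python
class Test(object):
    def __init__(self):
        self.log = ['Test initialized']

    def meth(self):
        # Assigns the attribute on the instance, whichever class's code calls this.
        self.member = 1
        return 'Test.member = ' + str(self.member)


class TestKid(Test):
    def __init__(self):
        # The parent constructor only runs because it is called explicitly here.
        Test.__init__(self)
        self.log.append('Kid initialized')

    def meth(self):
        self.member = 2
        parent_line = Test.meth(self)  # the parent implementation resets member to 1
        return [parent_line, 'Kid.member = ' + str(self.member)]


kid = TestKid()
lines = kid.meth()
print(lines)
```

Both lines report 1, because there is only one member attribute and the parent's code wrote to it last.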

New-style:

>>> class Test(object):

type(p) would return the class itself, <class '__main__.Test'>, instead of 'instance'. New-style classes unify types and classes.

New-style classes have classmethods and staticmethods.

Also (for both old and new):

p = Test() ; calling a class object yields a class instance

p.__dict__ -> {'member': 1}

p.__dict__['member'] = 1 ; same as p.member

You can use properties (almost .NET-style) to access instance attributes with new-style classes.
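A small sketch of a property, a classmethod and a staticmethod on a new-style class. The Patient class and its method names are hypothetical, chosen only for illustration.

```python
class Patient(object):
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        # Read access: p.name looks like a plain attribute but runs this method.
        return self._name

    @name.setter
    def name(self, value):
        # Write access: p.name = ... is validated before being stored.
        if not value:
            raise ValueError('name must not be empty')
        self._name = value

    @classmethod
    def anonymous(cls):
        # Receives the class, so it can act as an alternative constructor.
        return cls('anonymous')

    @staticmethod
    def is_valid_name(value):
        # Receives neither the instance nor the class.
        return bool(value)


p = Patient('Amornrakot')
p.name = 'Iwata'
print(p.name, p.__dict__)
```

Note that the instance dictionary holds _name, not name: the property itself lives on the class.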

More.

Thursday, May 27, 2010

Images in SQL Server

A simple .NET class to dump image data into/from SQL Server. I'd like to explore a potential alternative to this using FILESTREAM.

In T-SQL:


update version set [file] = BulkColumn from
openrowset(bulk 'e:\....jpg', single_blob) as [file]
where ...;

In Python with MySQL, the same thing looks like this (the image column, image_data, is defined as BLOB in MySQL):


>>> import MySQLdb
>>> connection = MySQLdb.connect('','root','','RTest')
>>> blob = open('d:\\pic1.jpg', 'rb').read()
>>> sql = 'INSERT INTO rtest.mm_image(image_data, mm_person_id_mm_person) VALUES(%s, 1)'
>>> args = (blob,)
>>> cursor = connection.cursor()
>>> cursor.execute(sql, args)
>>> connection.commit()

blob is a plain string (str) here: in Python 2, reading a file opened in 'rb' mode returns a byte string.
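The same parameterized-insert pattern can be tried without a MySQL server. The sketch below swaps in the standard library's sqlite3 module (so the placeholder is ? rather than %s) and uses a few made-up bytes instead of a real JPEG; the table layout only loosely mirrors the one above.

```python
import sqlite3

# In-memory database standing in for the MySQL server (an assumption for the demo).
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE mm_image ('
    'id INTEGER PRIMARY KEY, image_data BLOB, mm_person_id_mm_person INTEGER)'
)

# Made-up payload: a JPEG start marker followed by filler bytes.
blob = b'\xff\xd8\xff\xe0' + b'not a real picture'

# The driver binds the bytes as a parameter; no string concatenation is involved.
connection.execute(
    'INSERT INTO mm_image(image_data, mm_person_id_mm_person) VALUES (?, 1)',
    (sqlite3.Binary(blob),),
)
connection.commit()

row = connection.execute(
    'SELECT image_data FROM mm_image WHERE mm_person_id_mm_person = 1'
).fetchone()
stored = bytes(row[0])
print(stored == blob)
```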

Thursday, May 20, 2010

Python ORM

A first shot at ORMing in Python.

The second, improved shot. Inheritance/polymorphism in dynamically typed languages such as Python is a bit hard to grasp at first. Anyway, this seems quite cool.

Class diagram: (I am not an expert @ UML)

Existing solutions:
- for PHP
- for Python

Monday, May 10, 2010

BLOBing in Mongo

SQL Server, Oracle, Cache, etc, all have binary streams, and offer various ways of storing binary data (such as images) directly into the database. I was curious to see how this would work with Mongo, and here is the result:


import pymongo
import urllib2
import wx
import sys
from pymongo import Connection

class Image:

    def __init__(self):
        self.connection = pymongo.Connection()
        self.database = self.connection.newStore
        self.collection = self.database.newColl
        self.imageName = "Uninitialized"
        self.imageData = ""

    def loadImage(self, imageUrl, imageTitle = "Undefined"):
        try:
            ptrImg = urllib2.Request(imageUrl)
            ptrImgReq = urllib2.urlopen(ptrImg)
            imageFeed = ptrImgReq.read()
            self.imageData = pymongo.binary.Binary(imageFeed, pymongo.binary.BINARY_SUBTYPE)
            self.imageName = imageTitle
            ptrImgReq.close()
       
        except Exception:
            # any failure (bad URL, network error) is recorded in the image name
            self.imageName = "Error " + str(sys.exc_info())
            self.imageData = None

    def persistImage(self):
        if self.imageData is None:
            print 'No data to persist'
        else:
            print 'Persisting ' + self.imageName
            self.collection.insert({"name":self.imageName, "data":self.imageData})
            self.imageData = None
   
    def renderImage(self, parm = None):
        # look the image up by the name passed in, or by the current name if none is given
        if parm is not None:
            self.imageName = parm
        self.imageData = self.collection.find_one({"name":self.imageName})

        ptrApp = wx.PySimpleApp()
        fout = file('d:/tmp.jpg', 'wb')
        fout.write(self.imageData["data"])
        fout.flush()
        fout.close()
        wximg = wx.Image('d:/tmp.jpg',wx.BITMAP_TYPE_JPEG)
        wxbmp = wximg.ConvertToBitmap()
        ptrFrame = wx.Frame(None, -1, "Show JPEG demo")
        ptrFrame.SetSize(wxbmp.GetSize())
        wx.StaticBitmap(ptrFrame, -1, wxbmp, (0,0))
        ptrFrame.Show(True)
        ptrApp.MainLoop()


img = Image()
img.loadImage('http://i208.photobucket.com/albums/bb82/julianzzkj/Acapulco/e614.jpg', 'Acapulco at night')
img.persistImage()
img.renderImage('Acapulco at night')

I have had some problems installing PIL, so this is certainly not optimal: I have to use wx for image rendering instead, and I have not found a way of feeding a JPG data stream to an image constructor, hence the ugly recourse to a temporary file. However, the idea was to test how the database can store an image, and that seems to work quite well, despite taking a few seconds to load a 300 KB file.
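One possible way around the temporary file is to wrap the bytes in an in-memory, file-like object. The sketch below uses only the standard library's io.BytesIO; whether a given image constructor accepts such a stream depends on the toolkit and version, so treat that part as an assumption to check. The helper names are hypothetical.

```python
import io


def as_stream(image_bytes):
    # Wrap raw bytes in a seekable, file-like object; nothing touches the disk.
    return io.BytesIO(image_bytes)


def looks_like_jpeg(stream):
    # JPEG data starts with the two bytes FF D8; rewind so the stream can be reused.
    header = stream.read(2)
    stream.seek(0)
    return header == b'\xff\xd8'


# Made-up payload standing in for the "data" field read back from the collection.
stream = as_stream(b'\xff\xd8\xff\xe0' + b'rest of the picture')
print(looks_like_jpeg(stream))
```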

A findOne query returns:


> db.newColl.findOne()
{
        "_id" : ObjectId("4be82f74c7ccc11908000000"),
        "data" : BinData type: 2 len: 345971,
        "name" : "Acapulco at night"
}
>

Thanks are due for some of the wx code.

Thursday, April 29, 2010

Twitter Python Mongo

...or how many buzzwords can you get in one title? Here is a shortish piece of code that pulls data from Twitter and inserts it into Mongo. Other than the shortness of the code (given what it accomplishes!), what is remarkable is how easily the data is passed around, with a minimum of marshalling: Twitter can return data in JSON, which is essentially Mongo's native format and which Python can use with a minimum of tweaking (mostly to trim down the response from Twitter).


import urllib
import json
from pymongo import Connection

def runQuery(query, pp, pages):
    # collect a trimmed-down copy of every result across the requested pages
    ret = []
    for pg in range(1, pages+1):
        print 'page...' + str(pg)
        p = urllib.urlopen('http://search.twitter.com/search.json?q=' + urllib.quote(query) + '&rpp=' + str(pp) + '&page=' + str(pg))
        # json.load already returns dicts and lists (null becomes None), so no eval() is needed
        s = json.load(p)
        p.close()
        for result in s['results']:
            ret.append( { 'id':result['id'], 'from_user':result['from_user'], 'created_at':result['created_at'], 'text': result['text'] } )
    return {"results": ret}

c = Connection()
d = c.twitterdb
coll = d.postbucket
res = runQuery('Iran', 100, 15)
ptrData = res.get('results')
for item in ptrData:
    coll.save(item)
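The trimming step can be tried offline. This is a minimal sketch that parses a made-up response with json.loads and keeps only the four fields used above. The sample payload and the trim_results name are assumptions for illustration, not real Twitter output.

```python
import json

# Hypothetical response, shaped like the 'results' list used in the post.
SAMPLE_RESPONSE = '''
{"results": [
  {"id": 1, "from_user": "alice", "created_at": "Thu, 29 Apr 2010 10:00:00 +0000",
   "text": "first", "geo": null, "iso_language_code": "en"},
  {"id": 2, "from_user": "bob", "created_at": "Thu, 29 Apr 2010 10:01:00 +0000",
   "text": "null is just a word here", "geo": null, "iso_language_code": "en"}
]}
'''

WANTED = ('id', 'from_user', 'created_at', 'text')


def trim_results(raw_json):
    # json.loads maps JSON null to None, so the text never has to be patched or eval()'d.
    parsed = json.loads(raw_json)
    return [dict((key, result.get(key)) for key in WANTED)
            for result in parsed['results']]


trimmed = trim_results(SAMPLE_RESPONSE)
print(trimmed[1]['text'])
```

Parsing rather than string replacement also leaves a tweet that contains the word null untouched.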

A Twitter Python web service

Taking the code from the previous post: here is a Python web service that reads the Twitter feed for a given query and returns a subset of the results in JSON:


import urllib
import json
from SimpleXMLRPCServer import SimpleXMLRPCServer
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler

def runQuery(query, pp, pages):
    # pp is the number of results per page, pages is the page number to fetch
    p = urllib.urlopen('http://search.twitter.com/search.json?q=' + urllib.quote(query) + '&rpp=' + str(pp) + '&page=' + str(pages))
    # json.load already returns dicts and lists, so no eval() is needed
    s = json.load(p)
    p.close()
    ret = []
    for result in s['results']:
        ret.append( { 'id':result['id'], 'from_user':result['from_user'], 'created_at':result['created_at'], 'text': result['text'] } )
    # hand the trimmed results back as a single JSON string
    return json.dumps({"results": ret})

class RequestHandler(SimpleXMLRPCRequestHandler):
    # rpc_paths must be a tuple, hence the trailing comma
    rpc_paths = ('/RPC2',)

server = SimpleXMLRPCServer(("localhost", 8000), requestHandler=RequestHandler)
server.register_introspection_functions()
server.register_function(runQuery, 'qry')
server.serve_forever()

More potential uses of this (including Google Apps, Mongo, or Processing) later. And here is how to use it (from Python):


>>> import xmlrpclib
>>> s = xmlrpclib.ServerProxy('http://localhost:8000')
>>> print s.qry('Bumrungrad', 10, 1)

The first numeric parameter is the number of records per page and the second is the page number (the maximums are 100 records per page and 15 pages).
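For anyone trying this on Python 3, the same modules now live under xmlrpc.server and xmlrpc.client. The sketch below is a self-contained round trip under a few assumptions of mine: the qry function returns canned data instead of calling Twitter, the server listens on a free port picked by the OS, and it runs in a background thread so the client can call it from the same script.

```python
import json
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCRequestHandler, SimpleXMLRPCServer


class RequestHandler(SimpleXMLRPCRequestHandler):
    rpc_paths = ('/RPC2',)  # a tuple, hence the trailing comma


def qry(query, per_page, page):
    # Canned stand-in for the Twitter call: per_page fake results for the given page.
    results = [{'id': i, 'text': '%s result %d' % (query, i)} for i in range(per_page)]
    return json.dumps({'results': results, 'page': page})


# Port 0 asks the OS for any free port; request logging is switched off for quiet output.
server = SimpleXMLRPCServer(('127.0.0.1', 0), requestHandler=RequestHandler,
                            logRequests=False)
server.register_function(qry, 'qry')
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

port = server.server_address[1]
proxy = ServerProxy('http://127.0.0.1:%d' % port)
reply = json.loads(proxy.qry('Bumrungrad', 3, 1))
print(len(reply['results']))

server.shutdown()
server.server_close()
```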

Wednesday, April 28, 2010

Twitter API

A bit of topical coding.... getting tweets regarding the situation in Bangkok:


>>> import urllib
>>> from xml.dom import minidom
>>> p=urllib.urlopen('http://search.twitter.com/search.atom?q=Bangkok')
>>> xml=minidom.parse(p)
>>> p.close()
>>> nodes=xml.getElementsByTagName('title')
>>> for node in nodes:
    print node.firstChild.nodeValue

This is the first time I have tried the Twitter API, and it seems simple enough!
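The parsing half can be exercised without the network. This sketch runs the same minidom calls over a small hand-written Atom fragment; the sample feed and the titles helper are made up for illustration.

```python
from xml.dom import minidom

# Hand-written stand-in for the search.atom response.
SAMPLE_ATOM = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>Bangkok - Twitter Search</title>'
    '<entry><title>First tweet</title></entry>'
    '<entry><title>Second tweet</title></entry>'
    '</feed>'
)


def titles(atom_text):
    doc = minidom.parseString(atom_text)
    try:
        # nodeValue (lower-case n) holds the text of the first child node.
        return [node.firstChild.nodeValue
                for node in doc.getElementsByTagName('title')
                if node.firstChild is not None]
    finally:
        doc.unlink()  # break the DOM's internal reference cycles


found = titles(SAMPLE_ATOM)
print(found)
```

The feed's own title comes back first, followed by one title per entry.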

Tuesday, April 27, 2010

Mongo and Cache

Some similarities:

- the system-generated row id (_id for Mongo)
- object references, and a kind of relationship definition in Mongo:


> x = {name:'Lab test'}
{ "name" : "Lab test" }
> db.second.save(x)
> pat = {name:'Amornrakot', test:[new  DBRef('second', x._id)]}
{
        "name" : "Amornrakot",
        "test" : [
                {
                        "$ref" : "second",
                        "$id" : ObjectId("4bd6d7c64e660000000f665a")
                }
        ]
}
> pat.test[0].fetch()
{ "_id" : ObjectId("4bd6d7c64e660000000f665a"), "name" : "Lab test" }

The similarities aren't surprising, perhaps; it is the differences that trouble me (in this case, Mongo's looseness, its lack of structure). SQLite was the first one to go down that path, by not enforcing strict data typing, and now Mongo doesn't even enforce schemas. A discussion on Mongo database design principles here.

For now I have a couple of other questions:

- is there a reporting tool that binds to JSON/Mongo natively?
- how do you update an existing JSON entry (just one key/value pair, not the entire record)? Some notes:
var p = db.coll.findOne();
p.member = "x";
Dot notation is supported: p is already an object, so there is no need to eval() it (say it was originally {member:"y"}). At this point p is disconnected from the collection, but db.coll.save(p) does update the record in place.

What is cool is that you can save JS objects (declared using the JS object notation):

function pobj(param){this.p1=param;}

var newObj = new pobj("test");

db.coll.save(newObj);

db.coll.find(); returns { "_id" : ObjectId("4bd722a6eb29000000007ac4"), "p1" : "test" }. You can even 'serialize' objects' methods, and then call the method for the objects deserialized using findOne. All of this might be JS-specific candy, I am curious how this ports over to other language drivers.

So you can view Mongo as a (JS) object-oriented database, with nothing in the way of SQL facilities though; a tuple serialization mechanism; a key-value pair list; a 'document'/hierarchical database using JSON as the document format (as opposed to xDB's XML), all of which are correct.

Another question: when you have an embedded object, var ptrUser = {name : "Mr Iwata", address : { city : "Tokyo" }}, how do you search by the inner object's properties? db.coll.find({address:{ city : "criteria" }}) does not seem to work. RTFM: the manual's answer is dot notation, db.coll.find({"address.city" : "criteria"}).
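Mongo's query language addresses embedded fields with a dotted path such as "address.city". As a rough illustration of what such a path means, here is a pure-Python sketch over plain dictionaries. It is not how Mongo implements queries, and get_path and find are hypothetical helper names.

```python
def get_path(document, dotted_path, default=None):
    # Walk one key at a time: 'address.city' means document['address']['city'].
    current = document
    for key in dotted_path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def find(documents, dotted_path, wanted):
    # Keep the documents whose value at the dotted path equals the wanted value.
    return [doc for doc in documents if get_path(doc, dotted_path) == wanted]


users = [
    {'name': 'Mr Iwata', 'address': {'city': 'Tokyo'}},
    {'name': 'No address on file'},
]
matches = find(users, 'address.city', 'Tokyo')
print([user['name'] for user in matches])
```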

Also, if you store objects with different structures in one collection, they can be inspected:


from pymongo import Connection
c = Connection()
d = c.clinical
coll = d.physician
for item in coll.find():
 itmkeys = []
 print item.get("_id")
 for ky in item.iterkeys():
  itmkeys.append(ky)
 print itmkeys
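The same inspection can be summarized by counting how many documents share each set of keys. The sketch below works on plain dictionaries standing in for what coll.find() would return; the sample physician records are made up.

```python
from collections import Counter

# Made-up documents with differing structures, as they might sit in one collection.
documents = [
    {'_id': 1, 'name': 'Dr A', 'specialty': 'cardiology'},
    {'_id': 2, 'name': 'Dr B'},
    {'_id': 3, 'name': 'Dr C', 'specialty': 'renal', 'pager': '123'},
    {'_id': 4, 'name': 'Dr D'},
]


def key_shapes(docs):
    # A sorted tuple of keys identifies a document's "shape" regardless of key order.
    return Counter(tuple(sorted(doc.keys())) for doc in docs)


shapes = key_shapes(documents)
for shape, count in shapes.most_common():
    print(count, shape)
```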

Lots of interesting info at the Wikipedia JSON page.

Monday, April 26, 2010

Very basic Google Chart

- create the URL
- you can then pull it in Python:


>>> import urllib
>>> p=urllib.urlopen('http://chart.apis.google.com/chart?chs=250x100&chd=t:60,40,90,20&cht=p3')
>>> data = p.read()
>>> f = file('d:\\file.png', 'wb')
>>> f.write(data)
>>> f.close()
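Creating the URL is plain string assembly. This is a minimal sketch that only covers the three parameters used above (chs, chd and cht); the chart_url helper is hypothetical.

```python
def chart_url(values, size=(250, 100), chart_type='p3'):
    # chs is WIDTHxHEIGHT, chd is 't:' followed by comma-separated values, cht is the chart type.
    data = 't:' + ','.join(str(value) for value in values)
    return ('http://chart.apis.google.com/chart?chs=%dx%d&chd=%s&cht=%s'
            % (size[0], size[1], data, chart_type))


url = chart_url([60, 40, 90, 20])
print(url)
```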


It's quite easy to build the URL based on the data in a Google Docs spreadsheet (code modified from Google's own documentation):


try:
  from xml.etree import ElementTree
except ImportError: 
  from elementtree import ElementTree
import gdata.spreadsheet.service
import gdata.service
import atom.service
import gdata.spreadsheet
import atom
import string

def main():
 gd_client = gdata.spreadsheet.service.SpreadsheetsService()
 gd_client.email = '______________@gmail.com'
 gd_client.password = '________'
 gd_client.source = 'SpreadSheet data source'
 gd_client.ProgrammaticLogin()

 print 'List of spreadsheets'
 feed = gd_client.GetSpreadsheetsFeed()
 PrintFeed(feed)

 key = feed.entry[string.atoi('0')].id.text.rsplit('/', 1)[1]

 print 'Worksheets for spreadsheet 0'
 feed = gd_client.GetWorksheetsFeed(key)
 PrintFeed(feed)

 key_w = feed.entry[string.atoi('0')].id.text.rsplit('/', 1)[1]

 print 'Contents of worksheet'
 feed = gd_client.GetListFeed(key, key_w)
 PrintFeed(feed)

 return

def PrintFeed(feed):
    for i, entry in enumerate(feed.entry):
        if isinstance(feed, gdata.spreadsheet.SpreadsheetsCellsFeed):
            print 'Cells Feed: %s %s\n' % (entry.title.text, entry.content.text)
        elif isinstance(feed, gdata.spreadsheet.SpreadsheetsListFeed):
            print 'List Feed: %s %s %s' % (i, entry.title.text, entry.content.text)
            print ' Contents:'
            for key in entry.custom:
                print '  %s: %s' % (key, entry.custom[key].text)
                print '\n',
        else:
            print 'Other Feed: %s. %s\n' % (i, entry.title.text)


if __name__ == "__main__":
    main()

Friday, April 23, 2010

NHS Choices on GoogleApps

Here is the Google Apps version of the (Python) NHS Choices application I discussed in the previous posts.

I can't even begin to say how cool this is. 3 hours in Notepad (hence the crudeness) and we get the hospitals in the UK, from anywhere. This is really amazing.

The source code.

Sunday, April 11, 2010

Searching in Python

There is perhaps a more Python-specific way of storing the data to be loaded into the Mongo database: a list of dictionaries. In this case, a dictionary is defined as {'name':__name__, 'service':__service__, 'web':__web__}. To add an element to the holding list (say, NHS): NHS.append({'name':'Wigan General', 'service':5, 'web':None}). Then a function can be defined which returns the list index of the first element whose value for a given key matches its parameter, i.e.:

>>> def idx(ky, val):
    for item in NHS:
        if item[ky] == val:
            return NHS.index(item)

Usage:

>>> print idx('name', 'Wigan General')
will yield Wigan's index in the list. I'm quite curious how fast this is with several thousand records! But Python's ability to easily make sense of a complex data structure is impressive.

Another way of searching, using list comprehensions:

>>> def idx2(ky, val):
    lstIdx = [item[ky] == val for item in NHS]
    return lstIdx.index(True)

It would also be interesting to know if the bytecode generated by Python is different between the two.
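The standard library's dis module can answer the bytecode question. Here is a sketch for Python 3 that defines both searches over a two-element sample list and collects their opcode names. The sample data, the enumerate-based loop and the opcodes helper are my own choices, and the exact opcodes vary between Python versions.

```python
import dis

# Small made-up holding list; the real one would hold the NHS providers.
NHS = [
    {'name': 'Thomas Linacre Outpatient Centre', 'service': 5, 'web': None},
    {'name': 'Wigan General', 'service': 5, 'web': None},
]


def idx(ky, val):
    # Explicit loop: stops at the first match, returns None when nothing matches.
    for position, item in enumerate(NHS):
        if item[ky] == val:
            return position
    return None


def idx2(ky, val):
    # List comprehension: always scans the whole list, raises ValueError when nothing matches.
    matches = [item[ky] == val for item in NHS]
    return matches.index(True)


def opcodes(func):
    # Names of the bytecode instructions of a function, in order.
    return [instruction.opname for instruction in dis.get_instructions(func)]


print(idx('name', 'Wigan General'), idx2('name', 'Wigan General'))
print(opcodes(idx) == opcodes(idx2))
```

The two differ in more than bytecode: the loop can stop early, while the comprehension builds a full list of booleans before searching it.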

Saturday, April 10, 2010

Mongo, Python, and NHS Choices

Using Python, NHS open data (NHS Choices), and Mongo: for example, getting the name and the web sites of all the providers in the Wigan area (why Wigan? No idea, just that their football team ~~seems to be pretty bad~~ recently defeated Arsenal, and the name stuck with me).

Start the database: go to the bin subdirectory of the install directory, and type mongod --dbpath .\

I will connect to the database using the Python API (pymongo).

NHS Choices offers several health data feeds:

News
Find Services
Live Well
Health A-Z (Conditions)
Common Health Questions

As mentioned, I will use the second; to access it, you need to get a password and a login (apply for one here).

The basic Python code to query for providers and extract their names and web addresses is this:

First, build a list of services, as per the NHS documentation (the service code and the location are two required parameters):

services = [[1, 'GPs'], [2,'Dentists'], etc]

Then, query the web service:

for code, label in services:
    # pass the NHS service code (not the list position) as the type parameter
    endpoint='http://www.nhs.uk/NHSCWS/Services/ServicesSearch.aspx?user=__login__&pwd=__password__&q=Wigan&type=' + str(code)
    usock=urllib.urlopen(endpoint)
    xmldoc=minidom.parse(usock)
    usock.close()
    nodes = xmldoc.getElementsByTagName("Service")
    for node in nodes:
        website = node.getElementsByTagName("Website")
        name = node.getElementsByTagName("Name")
        if website[0].firstChild is not None:
            # show the provider name next to its web site
            print name[0].firstChild.nodeValue, website[0].firstChild.nodeValue
    xmldoc.unlink()

The response will have a 3-item dataset, the service type, the provider name, and the web site (if one exists).

Mongo is a bit different in that the 'server' does not physically create a database until something is written to it. So from the console client (launch mongo from \bin\) you can connect to a database that does not exist yet: use NHS, in this case, will create the NHS database once something is saved (in effect, it will create files named NHS in the current directory).

Creating the 'table' (in Mongo terms, a document) from the console client: NHS = { service : "service", name : "name", website : "website" };

db.data.save(NHS); will create a collection named data (roughly the equivalent of a SQL table) and save the NHS document into it. The mongo client uses JavaScript as its language and JSON notation to define documents.

To access this collection in Python:

>>> from pymongo import Connection
>>> connection = Connection()
>>> db=connection.NHS
>>> storage=db.data
>>> post={"service" : 1, "name" : "python", "website" : "mongo" }
>>> storage.insert(post)

Here is the full code in Python to populate the database:

import urllib
from xml.dom import minidom
from pymongo import Connection

print "Building list of services..."
services = [[1, 'GPs'], [2,'Dentists'], [3, 'Pharmacists'], [4, 'Opticians'], [5, 'Hospitals'], [7, 'Walk-in centres'],[9, 'Stop-smoking services'], [10, 'NHS trusts'], [11, 'Sexual health services'], [12,' DISABLED (Maternity units)'], [13, 'Sport and fitness services'], [15, 'Parenting & Childcare services'], [17, 'Alcohol services'], [19, 'Services for carers'], [20, 'Renal Services'], [21, 'Minor injuries units'], [22, 'Mental health services'], [23, 'Breast cancer screening'], [24, 'Support for independent living'], [26, 'Memory problems'], [27, 'Termination of pregnancy (abortion) clinics'], [28, 'Foot services'], [29, 'Diabetes clinics'], [30, 'Asthma clinics'], [31,' Midwifery teams'], [32, 'Community clinics']]

print "Connecting to the database..."
connection = Connection()
db = connection.NHS
storage = db.data

print "Scanning the web service..."
for code, label in services:
    # use the NHS service code (not the list position) for both the query and the stored record
    print '*** ' + label + ' ***'
    endpoint='http://www.nhs.uk/NHSCWS/Services/ServicesSearch.aspx?user=__login__&pwd=__password__&q=Wigan&type=' + str(code)
    usock=urllib.urlopen(endpoint)
    xmldoc=minidom.parse(usock)
    usock.close()
    nodes = xmldoc.getElementsByTagName("Service")
    for node in nodes:
        website = node.getElementsByTagName("Website")
        name = node.getElementsByTagName("Name")
        namei = name[0].firstChild.nodeValue
        # providers without a web site get the placeholder 'none'
        if website[0].firstChild is not None:
            websitei = ' ' + website[0].firstChild.nodeValue
        else:
            websitei = 'none'
        post = { "service" : code, "name" : namei, "website" : websitei }
        storage.insert(post)
    xmldoc.unlink()
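The XML-to-document step can be tested on its own, without a login or a running database. In this sketch the sample response is hand-written: only the Service, Name and Website tag names come from the code above, while the Services wrapper and the second provider are assumptions. The helper names are hypothetical too.

```python
from xml.dom import minidom

# Hand-written stand-in for one response from the web service.
SAMPLE = (
    '<Services>'
    '<Service><Name>Thomas Linacre Outpatient Centre</Name>'
    '<Website>http://www.wiganleigh.nhs.uk/Internet/Home/Hospitals/tlc.asp</Website></Service>'
    '<Service><Name>Example Clinic</Name><Website></Website></Service>'
    '</Services>'
)


def first_text(node, tag, default='none'):
    # Text of the first <tag> element under node, or the default when missing or empty.
    elements = node.getElementsByTagName(tag)
    if elements and elements[0].firstChild is not None:
        return elements[0].firstChild.nodeValue
    return default


def to_posts(xml_text, service_code):
    # One dictionary per <Service>, ready to be handed to an insert call.
    doc = minidom.parseString(xml_text)
    try:
        return [{'service': service_code,
                 'name': first_text(node, 'Name'),
                 'website': first_text(node, 'Website')}
                for node in doc.getElementsByTagName('Service')]
    finally:
        doc.unlink()


posts = to_posts(SAMPLE, 5)
print(posts)
```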

To see the results from the Mongo client:

> db.data.find({service:5}).forEach(function(x){print(tojson(x));});

Will return all the hospitals inserted in the database (service for hospitals = 5); the response looks like this:

{
        "_id" : ObjectId("4bc062dbc7ccc10428000032"),
        "website" : " http://www.wiganleigh.nhs.uk/Internet/Home/Hospitals/tlc.asp",
        "name" : "Thomas Linacre Outpatient Centre",
        "service" : 5
}

Next, it might be interesting to try this using Mongo's REST API, and perhaps to build a GoogleApp to do so.

Thursday, January 21, 2010

Google App Engine and Python

Quick steps to develop with Google App Engine:

- download the SDK
- this will place a Google App Launcher shortcut on your desktop
- click on it
- File > Create New Application and choose the directory (a subdirectory with the app name will be created there)
- Edit and change the name of the main *.py file you'll be using (say, myapp.py)
- create your myapp.py file and save it in the subdirectory created 2 steps ago
- select the app in the Google App Launcher main window and click Run
- open a browser: http://localhost:808x (this is the default value, check in your application settings as defined in Google App Launcher)
- then you have to deploy it to your applications in your Google profile

More, later, this is just a quick vade mecum.

Saturday, December 05, 2009

Google App Engine

The Python dev environment. A very basic sample app that comes with the SDK - and which I managed to deploy without a lot of headache (and without having read the documentation!).

Python confusion

...at least for a newbie.

- lists: L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]. You can call L.append(), len(L), L.pop(), etc.
- tuples: T = 1, 2, 3, 4, 'this is a tuple element'. Or T1 = () for an empty tuple, and T2 = 'one element tuple', (note the trailing comma) for a one-element tuple. Tuples are immutable, so append() and pop() do not apply, although len(T) still works.
- sets: S = {1, 2, 3, 'set element'}. The items must be unique, and set functions are available. S = set(L) converts a list to a set (removing duplicates in the process), provided every element is hashable; the L above is not, because it contains a nested list.
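A short sketch that exercises the three containers. The variable names follow the list above, and the flags recording what failed are my own additions.

```python
L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]
L.append('new')
last = L.pop()  # removes and returns the element that was just appended

T = 1, 2, 3, 4, 'this is a tuple element'
T1 = ()
T2 = 'one element tuple',  # the trailing comma makes this a tuple

# Tuples have no append(): they cannot be changed after creation.
try:
    T.append(5)
    tuple_can_append = True
except AttributeError:
    tuple_can_append = False

# Building a set removes duplicates.
S = set(['a', 'b', 'a', 1, 1])

# set(L) fails here because L contains a list, and lists are not hashable.
try:
    set(L)
    list_converted = True
except TypeError:
    list_converted = False

print(len(L), len(T), len(T2), sorted(S, key=str))
```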
