Saturday, April 10, 2010

Mongo, Python, and NHS Choices


Using Python, NHS open data (NHS Choices), and Mongo: for example, getting the names and web sites of all the providers in the Wigan area (why Wigan? No idea, just that their football team, which seems to be pretty bad, recently defeated Arsenal, and the name stuck with me).

Start the database: go to the bin subdirectory of the install directory and type mongod --dbpath .\

I will connect to the database using the Python API (pymongo).

NHS choices offers several health data feeds:

  • News
  • Find Services
  • Live Well
  • Health A-Z (Conditions)
  • Common Health Questions


As mentioned, I will use the second; to access it, you need to get a login and a password (apply for one here).

The basic Python code to query for providers and extract their names and web addresses is this:

First, build a list of services, as per the NHS documentation (the service code and the location are two required parameters):

services = [[1, 'GPs'], [2, 'Dentists'], ...]

Then, query the web service:

for x in range(0, len(services)):

    # note: the type parameter takes the service code itself, not the loop index
    endpoint = 'http://www.nhs.uk/NHSCWS/Services/ServicesSearch.aspx?user=__login__&pwd=__password__&q=Wigan&type=' + str(services[x][0])
    usock = urllib.urlopen(endpoint)
    xmldoc = minidom.parse(usock)
    usock.close()

    nodes = xmldoc.getElementsByTagName("Service")
    for node in nodes:
        website = node.getElementsByTagName("Website")
        name = node.getElementsByTagName("Name")
        namei = name[0].firstChild.nodeValue
        if website[0].firstChild != None:
            websitei = ' ' + website[0].firstChild.nodeValue
        else:
            websitei = 'none'

    xmldoc.unlink()

Each Service element in the response carries three items of interest: the service type, the provider name, and the web site (if one exists).
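A sketch of extracting those items with minidom; the XML below is a hand-made sample shaped like the feed just described, not actual NHS output (real responses carry more fields):

```python
from xml.dom import minidom

# Hand-made sample data (assumption: only Name and Website matter here)
sample = """<Services>
  <Service>
    <Name>Wigan Infirmary</Name>
    <Website>http://www.example.org/wigan</Website>
  </Service>
  <Service>
    <Name>Some Clinic</Name>
    <Website/>
  </Service>
</Services>"""

xmldoc = minidom.parseString(sample)
for node in xmldoc.getElementsByTagName("Service"):
    name = node.getElementsByTagName("Name")[0].firstChild.nodeValue
    website_node = node.getElementsByTagName("Website")[0].firstChild
    # an empty <Website/> element has no child node
    website = website_node.nodeValue if website_node is not None else 'none'
    print(name, website)
xmldoc.unlink()
```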

Mongo is a bit different in that the server does not physically create a database until something is written to it. From the console client (launched with mongo in \bin\) you can therefore connect to a database that does not exist yet: use NHS will create the NHS database once data is saved (in effect, it will create files named NHS in the current directory).

Creating the 'table' from the console client: NHS = { service : "service", name : "name", website : "website" };

db.data.save(NHS); will create a collection named data (roughly the equivalent of a SQL table) and save the NHS document into it. The mongo client uses JavaScript as its language and JSON notation to define documents.

To access this collection in Python:

>>> from pymongo import Connection

>>> connection = Connection()

>>> db=connection.NHS

>>> storage=db.data

>>> post={"service" : 1, "name" : "python", "website" : "mongo" }

>>> storage.insert(post)


Here is the full code in Python to populate the database:

import urllib

from xml.dom import minidom

from pymongo import Connection


print "Building list of services..."

services = [[1, 'GPs'], [2,'Dentists'], [3, 'Pharmacists'], [4, 'Opticians'], [5, 'Hospitals'], [7, 'Walk-in centres'],[9, 'Stop-smoking services'], [10, 'NHS trusts'], [11, 'Sexual health services'], [12,' DISABLED (Maternity units)'], [13, 'Sport and fitness services'], [15, 'Parenting & Childcare services'], [17, 'Alcohol services'], [19, 'Services for carers'], [20, 'Renal Services'], [21, 'Minor injuries units'], [22, 'Mental health services'], [23, 'Breast cancer screening'], [24, 'Support for independent living'], [26, 'Memory problems'], [27, 'Termination of pregnancy (abortion) clinics'], [28, 'Foot services'], [29, 'Diabetes clinics'], [30, 'Asthma clinics'], [31,' Midwifery teams'], [32, 'Community clinics']]


print "Connecting to the database..."

connection = Connection()

db = connection.NHS

storage = db.data


print "Scanning the web service..."

for x in range(0, len(services)):

    print '*** ' + services[x][1] + ' ***'

    # the type parameter takes the service code itself (the codes are not
    # consecutive), not the loop index
    endpoint = 'http://www.nhs.uk/NHSCWS/Services/ServicesSearch.aspx?user=__login__&pwd=__password__&q=Wigan&type=' + str(services[x][0])
    usock = urllib.urlopen(endpoint)
    xmldoc = minidom.parse(usock)
    usock.close()

    nodes = xmldoc.getElementsByTagName("Service")
    for node in nodes:
        website = node.getElementsByTagName("Website")
        name = node.getElementsByTagName("Name")
        namei = name[0].firstChild.nodeValue
        if website[0].firstChild != None:
            websitei = ' ' + website[0].firstChild.nodeValue
        else:
            websitei = 'none'

        post = { "service" : services[x][0], "name" : namei, "website" : websitei }
        storage.insert(post)

    xmldoc.unlink()


To see the results from the Mongo client:


> db.data.find({service:5}).forEach(function(x){print(tojson(x));});


Will return all the hospitals inserted in the database (the service code for hospitals is 5); the response looks like this:


{
    "_id" : ObjectId("4bc062dbc7ccc10428000032"),
    "website" : " http://www.wiganleigh.nhs.uk/Internet/Home/Hospitals/tlc.asp",
    "name" : "Thomas Linacre Outpatient Centre",
    "service" : 5
}


Next, it might be interesting to try this using Mongo's REST API, and perhaps to build a Google App Engine app to do so.

Friday, April 09, 2010

Internet, 24/7


This is tiresome. I've complained about it before. Until there is a seamless Internet experience, where things work most of the time, I won't buy into the Internet platform. It's unreliable.


Mongo


Yet another non-relational database. There definitely seems to be a trend in that direction, a corner of the market that RDBMSs do not address convincingly. I would guess that for a general-purpose system, which might have power users or report writers, a full-fledged relational database makes sense, but for a turnkey system where speed is important and there is no need to expose too much of the internals to the users, this type of offering is useful.

Saturday, April 03, 2010

More Cache-specific features

Using dates in Cache: (an alternative to this)

  • Class Person… { DOB As %Date }
  • INSERT INTO Person(…, DOB) VALUES(…, TO_DATE('12 Jan 1989'))

Using queries in Cache:

Class HIS.Patient Extends %Persistent
{

Property Name As %String;

Property DOB As %Date;

Property Inpatient As %Boolean;

/// Class query
Query GetInpatients(Inpatient As %Boolean=1) As %SQLQuery(SELECTMODE="RUNTIME")
{
SELECT * FROM Patient
WHERE Inpatient = :Inpatient
ORDER BY Name
}

}


Insert some data in SQL, then:

  • S rset=##class(%ResultSet).%New()
  • S rset.ClassName="HIS.Patient"
  • S rset.QueryName="GetInpatients"
  • Do rset.Execute(1)
  • W rset.Next()
  • If the result is 1, w rset.Data("Name")

Cache Streams

Used to store large (>32k) amounts of data:

  • binary
  • character

Hence, streams are supersized binary (%Binary) or character (%String) types. The difference from other database systems is that streams in Cache can be stored either in external files or in database global variables.

The following code:

Class CM.Patient Extends %Persistent
{

Property PatientName As %String;

Property PatientData As %FileCharacterStream(LOCATION = "D:/");

}


Executed in Terminal:

> DO ^%CD

> S p = ##class(CM.Patient).%New()

> DO p.PatientData.Write("Test")


Will create a temporary file on drive D: where the "Test" string is written; when the Terminal session ends, the file is deleted as the object is garbage-collected. In order for the stream to persist, you have to %Save() the object.

SQL Server just recently started offering FILESTREAMs.

O'Reilly's take on the IOS

Interesting, the Internet Operating System.

Wednesday, March 31, 2010

Everything is searchable

Interesting things happening while I wasn't watching. Not only the previously mentioned Data.gov, or the newspapers moving in the same direction, but also ...

I'm really curious what effect this will have on databases. In theory, with the right authentication in place, everything could be exposed online and EDI would be vastly simplified: from sneakernet to HL7, everything would be replaced by REST calls.

At any rate, Freebase's attempt to organize everything is ambitious/stunning.

Wikipedia's own API, here.

Related: Talis.

Data.gov

US Government's open data initiative. Interesting, have to find out more about what's there. The potential for mashups and visualizations is great... if the data is trusted and current.
Here is the equivalent UK site.
I need to look into this some more, for now a lot of the data seems to be Excel files that can be downloaded. No universal REST/JSON access?

Sunday, March 21, 2010

Interface engines


Unbeknownst to myself, in a previous job I wrote an interface engine. It was an (initially) simple VB application which read files from a directory, copied them to another directory, then deleted them from the source. Then more source directories were added. Then, file endings had to determine the destination. Then, file contents (i.e., header- or template-based triggers) had to determine which files were copied and which were not. Finally, database tables became sources and destinations. The thing had grown immensely since the beginning and it was running 24/7, with a 30-second polling interval; a large number of customers were in fact relying on a rickety PC running a hopelessly spaghetti-coded application with manually-installed DLLs. If only they knew.
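The core of such a poller can be sketched in a few lines of Python (the directory names and the extension-based routing rule are invented for illustration; the real application also keyed off headers, templates, and database tables):

```python
import os, shutil, tempfile

def route(filename):
    """Pick a destination subdirectory from the file ending (invented rule)."""
    return 'hl7' if filename.endswith('.hl7') else 'other'

def poll_once(source, destination_root):
    """One polling pass: copy every file to its destination, then delete it."""
    moved = []
    for entry in sorted(os.listdir(source)):
        src = os.path.join(source, entry)
        if not os.path.isfile(src):
            continue
        dest_dir = os.path.join(destination_root, route(entry))
        os.makedirs(dest_dir, exist_ok=True)
        shutil.copy2(src, dest_dir)
        os.remove(src)               # delete from the source directory
        moved.append(entry)
    return moved

# Demonstration with temporary directories and empty files
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
open(os.path.join(src, 'adt.hl7'), 'w').close()
open(os.path.join(src, 'notes.txt'), 'w').close()
print(poll_once(src, dst))   # ['adt.hl7', 'notes.txt']
```

The real thing wrapped this in a 30-second timer loop; everything else is routing rules piled on top.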


I did not know at the time that this was an interface engine. The concept is somewhat restricted to healthcare and financial IT, and one wonders why it hasn't taken off more significantly in mainstream computing. In databases, for instance, interfaces could provide a strong and pervasive (even cross-platform) replication/synchronization mechanism.


Or perhaps the concept has been folded into workflow engines and transaction-based orchestration systems; Biztalk or Oracle Fusion can presumably do what interface engines do, and much more – but at a hefty footprint.


Mirthcorp's Java-based interface engine, below, running against a Cache database over JDBC/ODBC.


Message passing: [screenshot]

Channel definition: [screenshot]

A web-based interface engine? Perhaps using technology similar to YQL. Monitoring a queue of some kind and updating your Facebook status? Have to think about this one.

Saturday, March 20, 2010

Cache dates

Cache's multiple interfaces (RDBMS, OO) can cause confusion even for simple operations, such as inserting dates. Here is how that is done:

In SQL:

insert into CM.StrmClass(PatientAddress, PatientName, PatientDateOfBirth)

values('5 Soi Rangnam Bangkok, Thailand', 'TG', {d '1990-03-17'})

Rather ugly ODBC formatting for the date field. In Object Script:

Set p.PatientDateOfBirth = $ZDateh("12/21/1977")

By the way, if a %Save fails (the results do not commit to the database), there is likely a problem with data validation. Running the call from the Terminal should show the status (e.g. W p.%Save())




Another interesting feature of Cache, which accounts for its performance: 'Using a unique feature known as “subscript mapping,” you can specify how the data within one or more arrays is mapped to a physical database file. Such mapping is a database administration task and requires no change to class/table definitions or application logic. Moreover, mapping can be done within a specific sparse array; you can map one range of values to one physical location while mapping another to another file, disk drive, or even to another database server. This makes it possible to reconfigure Caché applications (such as for scaling) with little effort.'

Monday, March 15, 2010

New media uselessness

I'm not a new media expert but even I know that…

  • Calling an ad a 'welcome screen' is a pointless trickery. I understand you need to pay for content somehow, no need to sugarcoat it
  • Sometimes, more media is bad: if there is an article that lists the top 10 cities to live in, the 14 most troubled real estate markets, the top 10 moments in British TV history, etc., well, I take that as information. I can just read it; I don't need photos to go with it, or if there are photos, they should be optional. Making me scroll through 10 or 14 screens, each heavy on images but low on info, will simply result in me leaving an otherwise potentially interesting article

So maybe the death of print is largely exaggerated.

Updating Cache from Excel

Excel (2007) is great as a quick database front-end tool – reasonably versatile and easy to set up. This ease of use might make many want to update a database from Excel, and that is not so easy.

It has to be done in code. One way would be to link the Excel file to the database server (there are different mechanisms for doing this, e.g. OPENROWSET in SQL Server or the SQL Gateway in Cache). However, if you're running Office in 64-bit mode you soon discover that there is no 64-bit-compliant Excel ODBC driver.

A quick and more natural solution is to embed the database update code in Excel itself; in the case of a Cache backend, there are two ways to write this code:

  • Using ADO/ODBC

    Dim cm As Command

    Dim cn As Connection

    Dim cns As String


    cns = "DSN=CacheWeb User"


    Set cn = New Connection

    cn.Open cns


    Set cm = New Command

    cm.ActiveConnection = cn

    cm.CommandType = adCmdText

    ' @u, @i and @c below are placeholders for the actual values
    cm.CommandText = "INSERT INTO Income(UserId, Income, Country) VALUES('@u', @i, '@c')"

    cm.Execute


  • Using Cache ActiveX

A bit more interesting since it gets into some Cache internals:

Dim Factory As CacheObject.Factory

Set Factory = New CacheObject.Factory


If Factory.Connect("cn_iptcp:127.0.0.1[1972]:TESTDBNS") = False Then

MsgBox "Failed to connect"

Exit Function

End If


Dim Income As Object

Set Income = Factory.New("Income")

Income.UserId = "2"

Income.Income = 120

Income.country = "PAK"

Income.sys_Save

Income.sys_Close


Set Income = Nothing

Saturday, March 13, 2010

HealthSpace

It seems there is a government-run GoogleHealth-like service in the UK.



I am still miffed by the resistance people show to this type of project. After all, they bank online. Why not keep track of their health info online?

Thursday, January 21, 2010

Google App Engine and Python

Quick steps to develop with Google App Engine:

- download the SDK
- this will place a Google App Launcher shortcut on your desktop
- click on it
- File > Create New Application and choose the directory (a subdirectory with the app name will be created there)
- edit app.yaml and change the name of the main *.py file you'll be using (say, myapp.py)
- create your myapp.py file and save it in the subdirectory created two steps earlier
- select the app in the Google App Launcher main window and click Run
- open a browser: http://localhost:808x (this is the default value, check in your application settings as defined in Google App Launcher)
- then you have to deploy it to your applications in your Google profile

More, later, this is just a quick vade mecum.
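For reference, the URL-to-script mapping from the steps above lives in app.yaml in the application directory; a sketch of what the generated file looks like for the 2010-era Python runtime (the application name myapp is an assumption):

```yaml
application: myapp   # must match the app id registered in your Google profile
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*           # route every request to the main script
  script: myapp.py
```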

Tuesday, January 19, 2010

MySQL errno 150

InnoDB needs an index to exist on the column referenced by a foreign key. Interesting; I have to check whether this is a requirement of the relational model. I don't remember seeing it before, although I seem to recall that this might be one of those auto/hidden indexes created by SQL Server (__________ix#122200_____________, etc).

Monday, January 04, 2010

Interesting SQL

select * from #t1

returns (nearly) the same result as....

select * from #t1 t1 left join #t2 t2
on 1 = 2

(every row of #t1 comes back, with the #t2 columns all NULL, since the join condition never matches).

While:
select * from #t1 t1 left join #t2 t2
on 1 = 1

... is really a cross join.

This is acceptable:

select * from
(select 1 a, 2 b, 3 c) t1
full outer join
(select 2 a, 4 b, 5 c) t2
on t1.a = t2.a

Perhaps not very useful, but interesting to know nonetheless. And I really do like T-SQL's new MERGE statement.
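The two join identities above can be checked quickly with Python's built-in sqlite3 module (SQLite standing in for SQL Server, so the #t temp tables become plain t1/t2; the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (a INTEGER)")
cur.execute("CREATE TABLE t2 (b INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
cur.executemany("INSERT INTO t2 VALUES (?)", [(10,), (20,), (30,)])

# LEFT JOIN ... ON 1 = 2: every t1 row survives, t2 columns are all NULL
print(cur.execute("SELECT * FROM t1 LEFT JOIN t2 ON 1 = 2").fetchall())
# [(1, None), (2, None)]

# LEFT JOIN ... ON 1 = 1: behaves like a cross join (2 x 3 = 6 rows)
rows = cur.execute("SELECT * FROM t1 LEFT JOIN t2 ON 1 = 1").fetchall()
print(len(rows))   # 6
```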

Saturday, December 05, 2009

Google App Engine

The Python dev environment. A very basic sample app that comes with the SDK - and which I managed to deploy without a lot of headache (and without having read the documentation!).

Python confusion

...at least for a newbie.
  • lists: L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]. Can do L.append(), len(L), L.pop(), etc.
  • tuples: T = 1, 2, 3, 4, 'this is a tuple element'. Or T1 = () for an empty tuple, or T2 = 'one element tuple', (note the trailing comma). Tuples are immutable, so the mutating functions listed above do not apply (len() still works).
  • sets: S = {1, 2, 3, 'set element'}. The items must be unique and the usual set operations are available. S = set(L) converts a list to a set (removing duplicates in the process) – as long as every element is hashable.
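A minimal sketch of the distinctions above (the values are illustrative):

```python
L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]
L.append('x')              # lists are mutable
print(len(L))              # 6
print(L.pop())             # 'x' -- removes and returns the last element

T = 1, 2, 3, 4, 'this is a tuple element'
T1 = ()                    # empty tuple
T2 = 'one element tuple',  # one-element tuple: note the trailing comma
print(len(T2))             # 1 -- len() works; append()/pop() do not exist

print(len({1, 1, 2, 2, 3}))        # 3 -- duplicate set members collapse
print(sorted(set([3, 1, 2, 1])))   # [1, 2, 3] -- list converted to a set
```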

Sunday, November 08, 2009

Quick note on ORM

So it would appear that Cache uses the active record ORM pattern, but combined with a unit of work for related objects (swizzling, I believe they call it). I really need the time to look at Cache, the Entity Framework, and Python's ORMs in more detail. Interesting stuff.