
Friday, January 20, 2012

Database.com

I have started using database.com. Although the authentication (from Apex Data Loader) is a pain, I have to say I am quite impressed with how powerful the platform is. A couple of quick thoughts for now, with more to come, I hope, as I have just started developing an application on it:

  • interesting to see how many of the features (value and display lists, row id references) are similar to (for example) Caché's: basically, nothing in computing is ever new, it seems
  • its usability would be greatly increased, I suspect, if it supported disconnected recordsets, so that a client application could keep working without access to the cloud
This also proves that it does make sense, in certain instances, to maintain one's own platform/language/database stack.

More soon.

Wednesday, January 12, 2011

Yahoo + Koprol != Love


You would think that it would be easy to link your Yahoo and Koprol accounts.
If only.

Friday, June 25, 2010

Google Analytics API

Very basic GA API project (very poorly hosted too!)


Thursday, April 29, 2010

Twitter Python Mongo

...or how many buzzwords can you fit in one title. Here is a shortish piece of code that pulls data from Twitter and inserts it into Mongo. Beyond the brevity of the code (given what it accomplishes!), what is remarkable is how easily the data passes around, with a minimum of marshalling: Twitter returns JSON, which is Mongo's native format and which Python can consume with minimal tweaking (mostly to trim the response from Twitter).


import urllib
import json
from pymongo import Connection

def runQuery(query, pp, pages):
    ret = []
    for pg in range(1, pages + 1):
        print 'page...' + str(pg)
        p = urllib.urlopen('http://search.twitter.com/search.json?q=' + query +
                           '&rpp=' + str(pp) + '&page=' + str(pg))
        s = json.load(p)  # parse the JSON response directly; JSON nulls become None
        for result in s['results']:
            # keep only the fields we care about
            ret.append({'id': result['id'],
                        'from_user': result['from_user'],
                        'created_at': result['created_at'],
                        'text': result['text']})
    return {'results': ret}

c = Connection()
d = c.twitterdb        # database and collection are created lazily on first write
coll = d.postbucket
res = runQuery('Iran', 100, 15)
for item in res.get('results'):
    coll.save(item)
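
To double-check that the tweets actually landed in Mongo, here is a quick sketch (it assumes only the database and collection names used above):

from pymongo import Connection

coll = Connection().twitterdb.postbucket
print coll.count(), 'tweets stored'
for t in coll.find().limit(3):   # peek at a few of the saved documents
    print t['from_user'], ':', t['text']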

A Twitter Python web service

Taking the code from the previous post: here is a Python web service that reads the Twitter feed for a given query and returns a subset of the results in JSON:


import urllib
import json
from SimpleXMLRPCServer import SimpleXMLRPCServer
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler

def runQuery(query, pp, pages):
    p = urllib.urlopen('http://search.twitter.com/search.json?q=' + query +
                       '&rpp=' + str(pp) + '&page=' + str(pages))
    s = json.load(p)  # parse the JSON response directly
    ret = []
    for result in s['results']:
        ret.append({'id': result['id'],
                    'from_user': result['from_user'],
                    'created_at': result['created_at'],
                    'text': result['text']})
    return json.dumps({'results': ret})  # return the subset as a JSON string

class RequestHandler(SimpleXMLRPCRequestHandler):
    rpc_paths = ('/RPC2',)  # must be a tuple; ('/RPC2') is just a string

server = SimpleXMLRPCServer(("localhost", 8000), requestHandler=RequestHandler)
server.register_introspection_functions()
server.register_function(runQuery, 'qry')
server.serve_forever()



More potential uses of this (including Google Apps, Mongo, or Processing) later. And here is how to use it (from Python):


>>> import xmlrpclib
>>> s = xmlrpclib.ServerProxy('http://localhost:8000')
>>> print s.qry('Bumrungrad', 10, 1)

Where the first numeric parameter is the number of results per page and the second is the page number (the API allows at most 100 results per page and 15 pages).
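
Since qry returns its results as a JSON string, the caller can parse them straight back into a dictionary; a minimal sketch, using the service exactly as defined above:

import json
import xmlrpclib

s = xmlrpclib.ServerProxy('http://localhost:8000')
data = json.loads(s.qry('Bumrungrad', 10, 1))   # qry returns a JSON string
for tweet in data['results']:
    print tweet['from_user'] + ': ' + tweet['text']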

Friday, April 23, 2010

NHS Choices on GoogleApps


Here is the Google Apps version of the (Python) NHS Choices application I discussed in the previous posts.

I can't even begin to say how cool this is. 3 hours in Notepad (hence the crudeness) and we get the hospitals in the UK, from anywhere. This is really amazing.

The source code.

Saturday, April 10, 2010

Mongo, Python, and NHS Choices


Using Python, NHS open data (NHS Choices), and Mongo: for example, getting the names and web sites of all the providers in the Wigan area (why Wigan? No idea, except that their football team, which seems to be pretty bad, recently defeated Arsenal, and the name stuck with me).

Start the database: go to the bin subdirectory of the install directory and type mongod --dbpath .\

I will connect to the database using the Python API (pymongo).

NHS Choices offers several health data feeds:

  • News
  • Find Services
  • Live Well
  • Health A-Z (Conditions)
  • Common Health Questions


As mentioned, I will use the second; to access it, you need to get a password and a login (apply for one here).

The basic Python code to query for providers and extract their names and web addresses is this:

First, build a list of services, as per the NHS documentation (the service code and the location are two required parameters):

services = [[1, 'GPs'], [2, 'Dentists'], ...]  # the full list appears in the complete program below

Then, query the web service:

for service in services:
    # note: the query takes the NHS service *code* (services[x][0]), not the list index
    endpoint = ('http://www.nhs.uk/NHSCWS/Services/ServicesSearch.aspx'
                '?user=__login__&pwd=__password__&q=Wigan&type=' + str(service[0]))
    usock = urllib.urlopen(endpoint)
    xmldoc = minidom.parse(usock)
    usock.close()
    nodes = xmldoc.getElementsByTagName("Service")
    for node in nodes:
        website = node.getElementsByTagName("Website")
        name = node.getElementsByTagName("Name")
        if website[0].firstChild is not None:
            # provider has a web site: print name and address
            print name[0].firstChild.nodeValue, website[0].firstChild.nodeValue
    xmldoc.unlink()

The response is a three-item record: the service type, the provider name, and the web site (if one exists).
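
For reference, the response is shaped roughly like this, reduced to the elements the code actually reads (illustrative only - the wrapper element is a guess; the code relies just on Service, Name, and Website):

<Services>
  <Service>
    <Name>Thomas Linacre Outpatient Centre</Name>
    <Website>http://www.wiganleigh.nhs.uk/Internet/Home/Hospitals/tlc.asp</Website>
  </Service>
</Services>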

Mongo is a bit different in that the server does not create a database physically until something is written to it, so from the console client (launched from \bin\ with mongo) you can connect to a database that does not exist yet (use NHS in this case will create the NHS database - in effect, it will create files named NHS in the current directory).

Creating the 'table' from the console client: NHS = { service : "service", name : "name", website : "website" };

db.data.save(NHS); will create a collection (roughly the Mongo equivalent of a SQL table) and save the NHS document into it. The mongo client uses JavaScript as its language and JSON notation to define documents.

To access this collection in Python:

>>> from pymongo import Connection
>>> connection = Connection()
>>> db = connection.NHS
>>> storage = db.data
>>> post = {"service" : 1, "name" : "python", "website" : "mongo" }
>>> storage.insert(post)


Here is the full code in Python to populate the database:

import urllib
from xml.dom import minidom
from pymongo import Connection

print "Building list of services..."
services = [[1, 'GPs'], [2, 'Dentists'], [3, 'Pharmacists'], [4, 'Opticians'],
            [5, 'Hospitals'], [7, 'Walk-in centres'], [9, 'Stop-smoking services'],
            [10, 'NHS trusts'], [11, 'Sexual health services'],
            [12, 'DISABLED (Maternity units)'], [13, 'Sport and fitness services'],
            [15, 'Parenting & Childcare services'], [17, 'Alcohol services'],
            [19, 'Services for carers'], [20, 'Renal Services'],
            [21, 'Minor injuries units'], [22, 'Mental health services'],
            [23, 'Breast cancer screening'], [24, 'Support for independent living'],
            [26, 'Memory problems'], [27, 'Termination of pregnancy (abortion) clinics'],
            [28, 'Foot services'], [29, 'Diabetes clinics'], [30, 'Asthma clinics'],
            [31, 'Midwifery teams'], [32, 'Community clinics']]

print "Connecting to the database..."
connection = Connection()
db = connection.NHS
storage = db.data

print "Scanning the web service..."
for code, label in services:
    print '*** ' + label + ' ***'
    # query by the NHS service code, not the list index
    endpoint = ('http://www.nhs.uk/NHSCWS/Services/ServicesSearch.aspx'
                '?user=__login__&pwd=__password__&q=Wigan&type=' + str(code))
    usock = urllib.urlopen(endpoint)
    xmldoc = minidom.parse(usock)
    usock.close()
    for node in xmldoc.getElementsByTagName("Service"):
        website = node.getElementsByTagName("Website")
        name = node.getElementsByTagName("Name")
        namei = name[0].firstChild.nodeValue
        if website[0].firstChild is not None:
            websitei = ' ' + website[0].firstChild.nodeValue
        else:
            websitei = 'none'
        storage.insert({"service" : code, "name" : namei, "website" : websitei})
    xmldoc.unlink()


To see the results from the Mongo client:


> db.data.find({service:5}).forEach(function(x){print(tojson(x));});


This will return all the hospitals inserted in the database (the service code for hospitals is 5); the response looks like this:


{
    "_id" : ObjectId("4bc062dbc7ccc10428000032"),
    "website" : " http://www.wiganleigh.nhs.uk/Internet/Home/Hospitals/tlc.asp",
    "name" : "Thomas Linacre Outpatient Centre",
    "service" : 5
}
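
The same query from Python, as a short sketch (it assumes only the NHS database and data collection used throughout this post):

from pymongo import Connection

storage = Connection().NHS.data
for hospital in storage.find({"service" : 5}):
    print hospital["name"], hospital["website"]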


Next, it might be interesting to try this using Mongo's REST API, and perhaps to build a GoogleApp to do so.

Wednesday, March 31, 2010

Everything is searchable

Interesting things happening while I wasn't watching. Not only the previously mentioned Data.gov, or the newspapers moving in the same direction, but also ...

I'm really curious what effect this will have on databases. In theory, with the right authentication in place, everything could be exposed online, and EDI would be vastly simplified: everything from sneakernet to HL7 would be replaced by REST calls.

At any rate, Freebase's attempt to organize everything is ambitious/stunning.

Wikipedia's own API, here.
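
As a small taste of that REST-everywhere idea, Wikipedia's API answers a plain HTTP call; a minimal sketch (the opensearch action is one of the simplest, and the search term is arbitrary):

import urllib
import json

# MediaWiki's opensearch action returns search suggestions as JSON
url = 'http://en.wikipedia.org/w/api.php?action=opensearch&search=HL7&format=json'
print json.load(urllib.urlopen(url))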

Related: Talis.

Data.gov

US Government's open data initiative. Interesting, have to find out more about what's there. The potential for mashups and visualizations is great... if the data is trusted and current.
Here is the equivalent UK site.
I need to look into this some more; for now, a lot of the data seems to be Excel files that can be downloaded. No universal REST/JSON access?

Monday, March 15, 2010

Updating Cache from Excel

Excel (2007) is great as a quick database front-end tool – reasonably versatile and easy to set up. This ease of use makes many want to update a database directly from Excel, and that is not so easy.

It has to be done in code. One way would be to link the Excel file to the database server (there are different mechanisms for doing this, e.g. OPENROWSET in SQL Server or the SQL Gateway in Cache). However, if you're running Office in 64-bit mode, you soon discover that there is no 64-bit-compliant Excel ODBC driver.

A quick and more natural solution is to embed the database update code in Excel itself; in the case of a Cache backend, there are two ways to write this code:

  • Using ADO/ODBC

    Dim cm As Command
    Dim cn As Connection
    Dim cns As String

    ' DSN pointing at the Cache database
    cns = "DSN=CacheWeb User"

    Set cn = New Connection
    cn.Open cns

    Set cm = New Command
    cm.ActiveConnection = cn
    cm.CommandType = adCmdText
    ' @u, @i and @c stand for the values read from the worksheet cells
    cm.CommandText = "INSERT INTO Income(UserId, Income, Country) VALUES('@u', @i, '@c')"
    cm.Execute


  • Using Cache ActiveX

A bit more interesting since it gets into some Cache internals:

Dim Factory As CacheObject.Factory
Set Factory = New CacheObject.Factory

If Factory.Connect("cn_iptcp:127.0.0.1[1972]:TESTDBNS") = False Then
    MsgBox "Failed to connect"
    Exit Function
End If

' instantiate a new Income object on the server and set its properties
Dim Income As Object
Set Income = Factory.New("Income")
Income.UserId = "2"
Income.Income = 120
Income.country = "PAK"
Income.sys_Save
Income.sys_Close

Set Income = Nothing

Monday, September 28, 2009

Oracle and objects

Some quick notes regarding Oracle (11)'s OO features:

Create a custom type - which, besides data fields, can include member functions (defined in two parts: the specification, holding the data and function declarations, and the body, containing the function definitions).

Create the table:

CREATE TABLE object_table ( ..., pobject person_typ )

Inserting the data is done this way:

INSERT INTO object_table VALUES ( 'second insert',
person_typ (51, 'donald', 'duck', 'dduck@disney.com', '66-650-555-0125'));

Notice the implicit constructor.

To call a method:

SELECT o.pobject.get_idno() from object_table o

This is cool. But usually objects are used in code. So how is the chasm between client code and database objects bridged?

These objects should be stored alone, without relational data (row objects as opposed to column objects as in the example above).

CREATE TABLE person_obj_table OF person_typ;

Scanning the object table:

DECLARE
   person person_typ;
BEGIN
   SELECT VALUE(p) INTO person FROM person_obj_table p WHERE p.idno = 101;
   person.display_details();
END;

Pointers to objects are supported via the REF type.
You can use a SELECT INTO to load a specific row object into an object variable.

You can implement database functions, procedures, or member methods of an object type in PL/SQL, Java, C, or .NET as external procedures. This is a way to have the objects execute externally defined code. Only PL/SQL and Java code is stored in the database.

As far as consuming objects externally, one way is to use untyped structures; another is to use a wizard to create strongly typed (Java) classes:

Strongly typed representations use a custom Java class that corresponds to a particular object type, REF type, or collection type and must implement the interface oracle.sql.ORAData.

Object views, where you define a filter that interprets the rows in a table as an object, are an interesting innovation.

So does this really solve the impedance problem? It's not like you define an object in C#, persist it in the database, then deserialize it in the application again and call its methods. It's more like: you define an object in the database, and with some manual work you can map between it and a custom class you define in Java. You can define some of its methods in C# (using the Oracle Database Extensions for .NET) - how is that for multiple indirections?

The question is really where you want your code to execute. In the case discussed above (defining member functions in .NET), Oracle acts as a CLR host for the .NET runtime, not unlike the way SQL Server external procedures (written in C and compiled as DLLs) used to run in an external process space. So the code executes outside the (physical) database process, but still inside a (logical) database layer.

I still can't escape a nagging feeling that this is as database-centric a view of the application as they come. Usually the design of an application starts with actor modeling, etc., and the data layer does not come into play until the end. Ideally, from an application designer's perspective, as I mentioned above, you should be able to just persist an object to the database and instantiate/deserialize it from the data layer/the abstract persistence without too much fuss. In the case of Cache this is made easier by the fact that the application layer coexists with the database layer and has access to the native objects (at least if you use the Cache application development environment).

In the case of Oracle the separate spaces, database for storage/execution and application for execution pose the standard impedance discrepancy problem, which I am not sure is in any way eased by the OO features of the database.

An ideal solution? Maybe database functionality should be provided by the OS layer and the application development/execution environment should be able to take advantage of that.

Meanwhile, Microsoft's Entity Framework (actually, a rather logical development from ADO.NET) deals with this problem in the dev environment. What I have seen so far looks cool, just a couple of questions:


  • can you start with the entities and generate (forward engineer) the database tables

  • how is the schema versioned and how are evolutionary changes sync'ed

  • how does the (obvious) overhead perform when there are hundreds of tables, mappings, etc.



Incidentally, using the Oracle ODP.NET driver in Visual Studio yields a much better experience with an Oracle database than the standard MS drivers. You actually get results back (XML-formatted) when querying object tables (the MS driver reports an 'unsupported data type'), and you can interact with the underlying database much more: tuning advisor, deeper database object introspection, etc.

Even PostgreSQL (which I find quite cool, actually) portrays itself as having object/relational features - table structures can be inherited.

Saturday, September 26, 2009

More on globals and classes in Caché

Interesting - it seems that "dynamic languages" have been around for much longer than we Ruby users would think. Here's Caché's own version of it at work:



Class Definition: TransactionData


/// Test class - Julian, Sept 2009
Class User.TransactionData Extends %Persistent
{
Property Message As %String;
Property Token As %Integer;
}


Routine: test.mac

Set ^tdp = ##class(User.TransactionData).%New()
Set ^tdp.Message = "XXXX^QPR^JTX"
Set ^tdp.Token = 131

Write !, "Created: " _ ^tdp


Terminal:

USER> do ^test
... Created 1@User.TransactionData

Studio: Globals

tdp
^tdp = "1@User.TransactionData"
tdp.Message
^tdp.Message = "XXXX^QPR^JTX"
tdp.Token
^tdp.Token = 131


The order of creation is:
  1. create the class
  2. this will create the SQL objects
  3. populating the SQL table will instantiate the globals
  4. the globals are: classD for data, classI for index

Objects can be created (%New) or opened (%OpenId) from code, but to be saved (%Save, which updates the database) the constraints must be met (required properties, unique indexes, etc).

Also, I finally got the .NET gateway generator to work: it creates native .NET classes that can communicate with Cache objects. Here is a sample of the client code:

InterSystems.Data.CacheClient.CacheConnection cn =
    new InterSystems.Data.CacheClient.CacheConnection(
        "Server=Irikiki; Port=1972;" +
        "Log File = D:\\CacheNet\\DotNetCurrentAccess.log; Namespace = USER;" +
        "Password = ______; USER ID = ____");
cn.Open();
PatientInfo pi = new PatientInfo(cn);
pi.PatientName = "New Patient";
pi.PatientID = new byte[1] { 6 };
InterSystems.Data.CacheTypes.CacheStatus x = pi.Save();
Console.WriteLine(x.Message);

PatientInfo is a class defined in Cache, as follows:


Class User.PatientInfo Extends %Persistent
{

Property PatientName As %String [ Required ];
Property PatientDOB As %Date;
Property PatientID As %ObjectIdentity;

Method getVersion() As %String
{
Quit "Version 1.0"
}

Index IndexPatientName On PatientName;
Index IndexPatientId On PatientID [ IdKey, PrimaryKey, Unique ];

}

Easy enough; the getVersion() method is available to the C# code, as are the persistence and all the other methods natively available in ObjectScript. The generated code is here.

Wednesday, September 23, 2009

Ahead of the curve?

Some of the challenges I encountered while working on the AIR/GoogleHealth project:

- learning the Google Data API
- learning the Google Health API which rests on top of the Data API
- (re) figuring out some of AIR's limitations and features
- (re) figuring out some of JavaScript's limitations and features
- using the mixed AIR/JavaScript environment

In my experience this is pretty standard when dealing with new languages and platforms. 15 years on, still a struggle - but then probably one should be worried when one becomes too proficient in a language/platform because it's already obsolete by then.

Thursday, July 23, 2009

Open Source, Cloud-based Approach to Describing Solution Architectures

Mike Walker discusses in a recent issue of the Microsoft Architecture Journal a set of tools that can be used to document solution architectures - based, not surprisingly, on Microsoft tools. Together, these make up the Enterprise Architecture Toolkit.

Since I don't have a Windows Server to run Sharepoint (I could, presumably, use Azure), I came up with a similar application setup using open source or cloud-based tools:

[Diagram: the proposed setup - open-source/cloud-based equivalents of the EA Toolkit components, tied together by a gateway]

The only thing that needs to be built is the manager ("gateway", in the chart above) which can be a RIA application whose role is to tie everything together. Sounds simple enough?

Tuesday, July 14, 2009

XDb


Documentum/EMC offers an XML database named XDb. XProc is in fact designed to work with XDb.

Perhaps "XML database" is a misnomer. It is really a way of storing XML documents without (apparently) enforcing any relational integrity constraints other than those defined by the DTD (and perhaps XLink, although so far I don't know whether XLink is declarative only). XDb and XProc therefore work hand in hand: one provides storage for XML documents, the other manipulation of those documents (and perhaps in-place updates).

The logical design is therefore done at a higher level. The 'database' concept comes into play when the stored documents are manipulated as sets - XDb supports XQuery (preferred), as well as XPath and XPointer.

Each XML document is stored as a DOM.Document and can be manipulated using the standard methods (createAttribute, createTextNode, etc).
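
This is not XDb-specific, but the same standard W3C DOM calls are available in Python's minidom; a small illustration of the kind of manipulation meant here (the element names are made up for the example):

from xml.dom import minidom

doc = minidom.parseString('<chart/>')
node = doc.createElement('diagnosis')
node.appendChild(doc.createTextNode('asthma'))   # createTextNode, as mentioned above
node.setAttribute('code', 'J45')
doc.documentElement.appendChild(node)
print doc.toxml()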

I can see a possible usage in, for example, GoogleHealth - where XDb would store well-formatted templates for charts, diagnoses, allergies, vaccines, etc, which would be populated for each patient encounter and loaded into GH.

While in normal usage write contention should not be an issue, I am curious how XDb deals with document versioning and multiple writes against the same document - or is the R/W pipe single-throttled? (Later - here it is: clicking Refresh in the Db manager while an update process was underway yielded the following error:)

[Screenshot of the resulting error]

Interesting XML database reference information here.

Wednesday, June 17, 2009

HealthVault


Yes, it does have an API, and some sample apps. The .NET samples include a web application that talks to the service - however, I am giving up on it for the time being, as the utility that makes the certificates seems to crash all the time (a nice unhandled error, by the way; the crash seems related to the fact that the app is installed in Program Files as opposed to Documents, and Visual Studio doesn't have full rights to Program Files). I will come back to it later, but so far it is remarkably similar to Google Health.


One additional thing: the SDK features some device drivers that enable medical devices to talk directly to HV. Nice - as long as they don't cause any crashes...

Tuesday, April 14, 2009

Slideshare

Some of the academic presentations I have worked on can be found here.
