Saturday, December 05, 2009
Google App Engine
Python confusion
- lists: L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]. Can do L.append(x), len(L), L.pop(), etc.
- tuples: T = 1, 2, 3, 4, 'this is a tuple element'. Or T1 = () for an empty tuple, T2 = 'one element tuple', (the trailing comma is what makes it a tuple). Tuples are immutable, so the mutating functions above (append, pop) do not apply, though len(T) still works.
- sets: S = {1, 2, 3, 'set element'}. The items must be unique (and hashable) and set functions are available. S = set(L) converts a list to a set, removing duplicates in the process - though it would fail on the L above, whose nested list is unhashable. Note that the {...} literal syntax needs Python 2.7/3.x; on older versions use set([...]). A short sketch of all three follows.
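A minimal sketch of all three (set-literal syntax assumes Python 2.7+; the values are arbitrary):

L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]
L.append('new')                  # lists are mutable
print(len(L))                    # 6
print(L.pop())                   # 'new'

T = (1, 2, 3, 4, 'this is a tuple element')
T1 = ()                          # empty tuple
T2 = ('one element tuple',)      # the trailing comma makes it a tuple
print(len(T))                    # len() works; T.append(...) would raise AttributeError

S = {1, 2, 2, 3, 'set element'}  # duplicates collapse; elements must be hashable
S.add(4)
print(S)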
Sunday, November 08, 2009
Quick note on ORM
Saturday, October 31, 2009
Cloud, AIR, GoogleHealth
Monday, October 26, 2009
Wither SQL?
Friday, October 23, 2009
Wednesday, October 14, 2009
Friday, October 02, 2009
More on ORM; optimization?
Then the question becomes: where is this data saved? Perhaps in some raw extensions of the sparse arrays that hold the object member data.
Another interesting aspect (related to the sparse array storage system) is the kind of optimization, if any, that occurs at the SQL relational engine level. If there is optimization of any kind done at the I/O-sparse array level, this might conflict with the SQL optimization. Interesting stuff.
Which raises the question: is the optimization cottage industry a by-product of the relational model? I have always found Oracle's optimization 'strategies' (the thick books dealing with them) somewhat ludicrous and antiquated. To do that really well, you need a deep understanding both of the data and of sorting algorithms; with so many intervening layers (physical design, I/O patterns), even that understanding is corrupted. So if you can avoid a couple of grievous errors (e.g. multiple joins on non-indexed columns), you will do reasonably well. But then, the DBMS should be able to detect whether you're about to make a grievous error (or perhaps the reporting tool should, if you use one). So why a thick book on optimization?
AIR and GoogleHealth
The code requires a (sqlite) database, and of course the HTML forms. However, the most important functionality is encapsulated in the file, so that should be enough for a quick start.
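For the record, opening (or creating) the SQLite database from the AIR/JavaScript side looks roughly like this - a sketch only; the file name gh.db and the table are made up, the real ones live in the project file:

// assumes AIRAliases.js has been loaded, so the air.* aliases exist
var conn = new air.SQLConnection();
var dbFile = air.File.applicationStorageDirectory.resolvePath("gh.db");
conn.open(dbFile);   // synchronous open; creates the file if it does not exist

var stmt = new air.SQLStatement();
stmt.sqlConnection = conn;
stmt.text = "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT)";
stmt.execute();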
It's all, of course, ensconced in Subversion....
Wednesday, September 30, 2009
Subversion
I have not used source control systems much, and I am finding that setting one up on a Windows machine with open-source IDEs (specifically, not Eclipse) is more painful than it should be - the documentation somehow assumes you're using either Eclipse or a Unix system, or both. Here is what seems to work for me (a consolidated command sketch follows the list):
- install Subversion
- create (in DOS) a directory where you will store the files: dirX
- in DOS: svnadmin create dirX (e.g.: D:\svn)
- in DOS: set EDITOR=notepad.exe
- in DOS, D:\>svn mkdir file:///svn/python (if python is the subdirectory where you want to store a project); using a \ (e.g. svn\python) will cause svn to fail with a weird assertion
- do the initial load of the project in the subversion system: svn import D:\pythonsource\ file:///svn/python (assuming your project is in D:\pythonsource)
- you will get a log-message window in Notepad - close it, and choose [c] in DOS to continue the process of loading the directory into Subversion
- at this point you will have the original source and the Subversion copy; when the IDE checks out from Subversion it will create another project, so you can delete the initial source directory
- you might want to include only the source files in the initial load... and create the project to include everything; be careful here if you need additional libraries (e.g. developing Processing projects in the NetBeans IDE, which will need the additional core.jar added to the libraries)
- set up the IDE's:
- NetBeans:
- use the Team > Checkout menu option
- use the URL as below (Aptana)
- you will be asked to create a new project to which the files will be downloaded
- if you do, be careful not to create a new Main class (assuming you have a Java project)
- so ideally the workflow is
- create the initial project in the IDE
- only keep the SRC directory
- create the SVN structure as above
- create the new project in the IDE based on a SVN checkout
- Aptana:
- open the SVN view
- create new Repository Location (right click in the SVN window)
- the URL will be file:///d:/svn/python
- then back to the SVN view to check out the project into an active project (right click on the repository)
- you will manipulate the files through the Team context menu (right click the file in the solution explorer) in the main Aptana view (not Pydev, if you are using it for Python files) - update the file, update the directory, then check it in
- if you import it into a new project, e.g. AIR, you will be able to specify all the parameters again; so if you have some existing project parameters (e.g. the startup form), you will need to manually make the necessary adjustments (for AIR, change the application manifest, application.xml; you will also need to reimport the AIRAliases.js file)
- at this point the code is checked out and available to use; remember to update/commit it to the repository
- with AIR specifically, you shouldn't commit the icons to the repository (nor other generated files such as the .project file)
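For reference, the repository-side commands from the list above, consolidated (same example paths; a sketch, not a literal transcript):

rem one-time repository setup
svnadmin create D:\svn
set EDITOR=notepad.exe
rem note the forward slashes in the URL - a backslash makes svn fail with an assertion
svn mkdir file:///svn/python
rem initial import of the existing source tree
svn import D:\pythonsource\ file:///svn/python
rem optional sanity check
svn list file:///svn/python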
Alternatively (at least in NetBeans), once you have created the first SVN connection, you can check in a project without going through svn import. Just write the source, then right-click on it and choose Subversion > Commit to send it to the repository. You can still look at the history of changes between different versions - not sure how well this works in an environment with multiple users, though, since the original codebase is your own.
More details here. Notice that having Subversion running will show the hard drive where you have the repositories with a different icon in Windows Explorer.
Monday, September 28, 2009
Oracle and objects
Create a custom type - which, besides data attributes, can include member functions (defined in two parts: the specification, holding the attribute and function declarations, and the body, containing the function definitions).
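A sketch of such a two-part definition (the attribute names are assumed here, chosen to match the insert and the method calls below):

CREATE OR REPLACE TYPE person_typ AS OBJECT (
  idno        NUMBER,
  first_name  VARCHAR2(20),
  last_name   VARCHAR2(25),
  email       VARCHAR2(30),
  phone       VARCHAR2(25),
  MEMBER FUNCTION get_idno RETURN NUMBER,
  MEMBER PROCEDURE display_details ( SELF IN OUT NOCOPY person_typ )
);
/
CREATE OR REPLACE TYPE BODY person_typ AS
  MEMBER FUNCTION get_idno RETURN NUMBER IS
  BEGIN
    RETURN idno;
  END;
  MEMBER PROCEDURE display_details ( SELF IN OUT NOCOPY person_typ ) IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(idno || ' - ' || first_name || ' ' || last_name);
    DBMS_OUTPUT.PUT_LINE(email || ' - ' || phone);
  END;
END;
/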
Create the table:
CREATE TABLE object_table ( ..., pobject person_typ )
Inserting the data is done this way:
INSERT INTO object_table VALUES ( 'second insert',
person_typ (51, 'donald', 'duck', 'dduck@disney.com', '66-650-555-0125'));
Notice the implicit constructor.
To call a method:
SELECT o.pobject.get_idno() from object_table o
This is cool. But usually objects are used in code. So how is the client code / database object chasm bridged over?
Objects can also be stored alone, without relational data (row objects, as opposed to column objects as in the example above).
CREATE TABLE person_obj_table OF person_typ;
Scanning the object table:
DECLARE person person_typ;
BEGIN
SELECT VALUE(p) INTO person FROM person_obj_table p WHERE p.idno = 101;
person.display_details();
END;
/
Pointers to objects are supported via the REF type.
You can use a SELECT INTO to load a specific row object into an object variable.
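A small sketch of a REF in use (the contacts table is made up for illustration):

-- a column holding pointers to row objects in person_obj_table
CREATE TABLE contacts (
  contact       REF person_typ SCOPE IS person_obj_table,
  contact_date  DATE
);

DECLARE
  pref   REF person_typ;
  person person_typ;
BEGIN
  SELECT REF(p) INTO pref FROM person_obj_table p WHERE p.idno = 101;
  -- follow the pointer back to the object
  SELECT DEREF(pref) INTO person FROM dual;
  DBMS_OUTPUT.PUT_LINE(person.get_idno());
END;
/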
You can implement database functions, procedures, or member methods of an object
type in PL/SQL, Java, C, or .NET as external procedures. This is a way to have the objects execute code defined externally. Only PL/SQL and Java code is stored in the database.
As far as consuming objects externally, one way is by using untyped structures; another is by using a wizard to create strongly typed (Java) classes.
Object views, where you define a filter that interprets the rows in a table as an object, are an interesting innovation.
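A sketch of one, assuming a plain relational table named persons with matching columns:

CREATE TABLE persons (
  idno        NUMBER PRIMARY KEY,
  first_name  VARCHAR2(20),
  last_name   VARCHAR2(25),
  email       VARCHAR2(30),
  phone       VARCHAR2(25)
);

CREATE OR REPLACE VIEW persons_ov OF person_typ
  WITH OBJECT IDENTIFIER (idno) AS
  SELECT p.idno, p.first_name, p.last_name, p.email, p.phone
    FROM persons p;

-- the relational rows now behave like person_typ objects
SELECT o.get_idno() FROM persons_ov o;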
So does this really solve the impedance problem? It's not like you define an object in C# then persist it in the database, then deserialize it in the application again and call its methods. It's more like, you define an object in the database, and with some manual work you can map between it and a custom class you define in Java. You can define some of its methods in C# (using the Oracle Database Extensions for .NET) - how is that for multiple indirections?
The question is really, where do you want your code to execute. In the case discussed above (defining member functions in .NET), Oracle acts as a CLR host for the .NET runtime; not unlike the way SQL Server external procedures (written in C and compiled as DLLs) used to run in an external process space. So the code executes outside the (physical) database process, but still inside a (logical) database layer. I still can't escape a nagging feeling that this is as database-centric a view of the application as they come. Usually the design of an application starts with actor modeling, etc., and the data layer is something that does not come into play until the end. Ideally, from an application designer's perspective, as I mentioned above, you should be able to just persist an object somehow to the database, and instantiate/deserialize it from the data layer/the abstract persistence without too much fuss. In the case of Cache this is made easier by the fact that the application layer coexists with the database layer and has access to the native objects (at least if you use the Cache application development environment).
In the case of Oracle, the separate spaces - database for storage/execution and application for execution - pose the standard impedance mismatch problem, which I am not sure is in any way eased by the OO features of the database.
An ideal solution? Maybe database functionality should be provided by the OS layer and the application development/execution environment should be able to take advantage of that.
Meanwhile, Microsoft's Entity Framework (actually, a rather logical development from ADO.NET) deals with this problem in the dev environment. What I have seen so far looks cool, just a couple of questions:
- can you start with the entities and generate (forward engineer) the database tables?
- how is the schema versioned, and how are evolutionary changes synced?
- how does the (obvious) overhead perform when there are hundreds of tables, mappings, etc.?
Incidentally, using the Oracle ODP.NET driver in Visual Studio yields a much better experience with an Oracle database than using the standard MS drivers. You actually get results back (XML-formatted) when querying object tables (the MS driver reports them as an 'unsupported data type') and you can interact with the underlying database much more, including the tuning advisor, deeper database object introspection, etc.
Even PostgreSQL (which I find quite cool, actually) portrays itself as having object/relational features - table structures can be inherited.
Saturday, September 26, 2009
More on globals and classes in Caché
Class Definition: TransactionData
/// Test class - Julian, Sept 2009
Class User.TransactionData Extends %Persistent
{
Property Message As %String;
Property Token As %Integer;
}
Routine: test.mac
Set ^tdp = ##class(User.TransactionData).%New()
Set ^tdp.Message = "XXXX^QPR^JTX"
Set ^tdp.Token = 131
Write !, "Created: " _ ^tdp
Terminal:
USER> do ^test
... Created 1@User.TransactionData
Studio: Globals
tdp
^tdp = "1@User.TransactionData"
tdp.Message
^tdp.Message = "XXXX^QPR^JTX"
tdp.Token
^tdp.Token = 131
The order of creation is:
- create the class
- this will create the SQL objects
- populating the SQL table will instantiate the globals
- the globals are: classD for data, classI for index (here, ^User.TransactionDataD and ^User.TransactionDataI)
Objects can be created (%New) or opened (%OpenId) from code, but to be saved (%Save, which will update the database), the restrictions must be met (required properties, unique indexes, etc.).
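A minimal sketch against the TransactionData class above (the values are arbitrary):

 // create, populate, and save an instance
 Set obj = ##class(User.TransactionData).%New()
 Set obj.Message = "test message"
 Set obj.Token = 42
 // %Save returns a %Status; check it before assuming the row exists
 Set sc = obj.%Save()
 If $System.Status.IsError(sc) Do $System.Status.DisplayError(sc) Quit
 Write !, "Saved with ID ", obj.%Id()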
Also, I finally got the .NET gateway generator to work: it creates native .NET classes that can communicate with Cache objects. Here is a sample of the client code:
InterSystems.Data.CacheClient.CacheConnection cn = new InterSystems.Data.CacheClient.CacheConnection("Server=Irikiki; Port=1972;" +
"Log File = D:\\CacheNet\\DotNetCurrentAccess.log; Namespace = USER;" +
"Password = ______; USER ID = ____");
cn.Open();
PatientInfo pi = new PatientInfo(cn);
pi.PatientName = "New Patient";
pi.PatientID = new byte[1]{6};
InterSystems.Data.CacheTypes.CacheStatus x = pi.Save();
Console.WriteLine(x.Message);
PatientInfo is a class defined in Cache, as follows:
Class User.PatientInfo Extends %Persistent
{
Property PatientName As %String [ Required ];
Property PatientDOB As %Date;
Property PatientID As %ObjectIdentity;
Method getVersion() As %String
{
Quit "Version 1.0"
}
Index IndexPatientName On PatientName;
Index IndexPatientId On PatientID [ IdKey, PrimaryKey, Unique ];
}
Easy enough: the getVersion() method is available to the C# code, as are the persistence and all the other methods natively available in ObjectScript. The generated code is here.
Wednesday, September 23, 2009
Ahead of the curve?
- learning the Google Data API
- learning the Google Health API which rests on top of the Data API
- (re) figuring out some of AIR's limitations and features
- (re) figuring out some of JavaScript's limitations and features
- using the mixed AIR/JavaScript environment
In my experience this is pretty standard when dealing with new languages and platforms. 15 years on, still a struggle - but then probably one should be worried when one becomes too proficient in a language/platform because it's already obsolete by then.
Tuesday, September 22, 2009
Caché and ODBC
Monday, September 21, 2009
Sybase joins the healthcare fray
Their flagship product in the industry seems to be eBiz Impact, YAIP (yet another integration platform) in the vein of Ensemble, DBMotion, and perhaps even alert-online. I might have to revise my chart from a few posts ago.
Saturday, September 12, 2009
XMLite?
Thank you for the improved version
Ok enough ranting. Will be documenting the GH project next... update to follow.
Tuesday, September 08, 2009
Follow-ups
Tuesday, August 04, 2009
(very) Preliminary performance comparisons
Friday, July 31, 2009
Competition in high technology
I think this starts with the fact that the "IT industry" is in fact a multiple-headed beast, since so many other industries use it. So defining the industry in which these players compete is difficult in itself.

So basically some companies started in an industry vertical (Intersystems - healthcare) where they built a complete stack, which they then exported to other verticals (finance, for Intersystems) or to the "center", becoming integrated players (Cache portrays itself as a general-purpose "post-relational" database; my guess is that this moniker is an attempt to rebrand it as a mainstream competitor, reframing the "hierarchical database with roots in 60's healthcare software" description; BTW, there is nothing wrong with this description, and I think it is a cool product with remarkable performance characteristics).
The challenge in this case is convincing a mainstream audience that a niche product originating in a vertical is indeed a viable proposition. Tough, especially since the ecosystem (e.g., reporting tools) is built around standards that Cache, for example, works around (e.g., the SQL "pointers").
Secondly, there are the pure vertical players (which I haven't really talked much about here, such as Siemens and GE). They built their application portfolios by acquisition (so perhaps dbMotion is a potential acquisition target?) but they rely on the mainstream vendors from the "center" for the base technology (perhaps; e.g., Siemens uses MS SQL as the db engine for its HIS, but Epic uses Cache).
Then, there are the mainstream technology companies which are trying to move from the center (pure database platform) into verticals (Amalga). At this point they are obviously encroaching on the vertical vendors' territory, be they pure vertical players or integrated players. How companies will compete in one segment while collaborating in others remains to be seen (competition on the vertical industry offering, collaboration at the platform level).
Fourth, there are niche players (dbMotion, SQLite, FairCom) which operate either in the vertical or in the center, but offer solutions appropriate for a specific vertical (e.g. FairCom having found a reasonably comfortable place as the engine of choice for turnkey systems). As mentioned already, I would guess that at some point dbMotion's PE backers (if there are any) will be looking for an exit in the guise of a purchase by a vendor, either in the mainstream/center (more likely) or in the vertical, while SQLite or FairCom are likely, due to their more general (albeit niche) appeal, to survive on their own.
There are plenty of interesting companies I have not covered such as MCObject, db4o, Pervasive, OpenLINK, NeoTool, and perhaps even Progress. As time permits I might revisit this writeup to include them, and perhaps even do a nice boxes and arrows schema, as good strategy analysis always seems to demand!