Saturday, December 05, 2009

Google App Engine

The Python dev environment: a very basic sample app comes with the SDK, and I managed to deploy it without a lot of headache (and without having read the documentation!).

Python confusion

...at least for a newbie.
  • lists: L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]. Can do L.append(), len(L), L.pop(), etc.
  • tuples: T = 1, 2, 3, 4, 'this is a tuple element'. Or T1 = () for an empty tuple, and T2 = ('one element tuple',) for a single-element tuple (note the trailing comma). Tuples are immutable, so append() and pop() don't apply - though len() still works.
  • sets: S = {1, 2, 3, 'set element'}. The items must be unique and set functions are available. S = set(someList) converts a list to a set, removing duplicates in the process (items must be hashable, so it would fail on the L above, which contains a nested list).
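
A quick interactive sketch of the differences (my own example, not from the SDK):

# lists are mutable
L = ['a', 'b', 'this is another element', 1, [1, 2, 3]]
L.append('z')
print(len(L))      # 6
print(L.pop())     # 'z' - removes and returns the last element

# tuples are immutable: len() works, append()/pop() raise AttributeError
T = (1, 2, 3, 4, 'this is a tuple element')
print(len(T))      # 5

# sets: only hashable items, duplicates collapse
S = set(L[:4])     # set(L) itself would fail on the nested [1, 2, 3]
S.add(1)           # no-op, 1 is already in the set
print(S)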

Sunday, November 08, 2009

Quick note on ORM

So it would appear that Cache uses the active record ORM, combined with a unit of work for related objects (swizzling, I believe they call it). I really need the time to look at Cache, the Entity Framework, and Python's ORM in more detail. Interesting stuff.

Saturday, October 31, 2009

Cloud, AIR, GoogleHealth

My new article is online at developer.com. Wish I had added a few more images, though. What to write about next?

Monday, October 26, 2009

Wither SQL?

More on the rise of non-relational databases. Maybe it is time for me to do another 'strategy' post... something to tie Cache, CouchDb and all the others together.

Friday, October 23, 2009

Under the, err... hood?

Just love it when unsuspecting users are exposed to the gory details of their favorite web sites.



Wednesday, October 14, 2009

CouchDb

Yet another thing to look at.

Friday, October 02, 2009

More on ORM; optimization?

Thinking about Cache and its 'native' objects, I wonder whether, when those are persisted to disk, only the data is actually saved, or the functions as well - i.e., p-code compiled ObjectScript or Cache Basic member functions. Or perhaps even (class-query) SQL. All this code could be optimized for the given state (properties) of the object.

Then the question becomes, where is this data saved - perhaps in some raw extensions of the sparse arrays that hold the object member data.

Another interesting aspect (related to the sparse array storage system) is the kind of optimization, if any, that occurs at the SQL relational engine level. If there is optimization of any kind done at the I/O-sparse array level, this might conflict with the SQL optimization. Interesting stuff.

Which brings into question: is the optimization cottage industry a by-product of the relational model? I have always found Oracle's optimization 'strategies' (the thick books dealing with that) somewhat ludicrous and antiquated. In order to do that really well, you need a deep understanding both of data and of sorting algorithms; with so many intervening layers (physical design, I/O patterns), even that understanding is corrupted. So if you can avoid a couple of grievous errors (e.g. multiple joins on non-indexed columns), you will do reasonably well. But then, the DBMS should be able to detect if you're about to make a grievous error (or perhaps the reporting tool, if you use one). So, why a thick book on optimization?

AIR and GoogleHealth

Finally, I completed this project. Write-up coming soon - in the meantime, here is the Javascript/AIR code amalgamation. Briefly, this is a client for GoogleHealth written in Adobe AIR/Javascript; it lets you query a GH profile and update it (via the Atom protocol). GH documentation is spotty and occasionally incorrect, so this wasn't as pain-free as it should have been. Neither is the code production-ready or elegant. It is just a prototype - a working one.

The code requires a (sqlite) database, and of course the HTML forms. However, the most important functionality is encapsulated in the file, so that should be enough for a quick start.

It's all, of course, ensconced in Subversion....

Wednesday, September 30, 2009

Subversion

I have not used source control systems much, and I am finding that setting one up on a Windows machine with open source IDEs (other than Eclipse, that is) is more painful than it should be - the documentation somehow assumes you're using Eclipse, or a Unix system, or both. Here is what seems to work for me (a consolidated command transcript follows the list):

  • install Subversion
  • create (in DOS) a directory where you will store the files: dirX
  • in DOS: svnadmin create dirX (e.g.: D:\svn)
  • in DOS: set EDITOR=notepad.exe
  • in DOS, D:\>svn mkdir file:///svn/python (if python is the sub directory where you want to store a project); using a \ (eg svn\python) will cause svn to fail with a weird assertion
  • do the initial load of the project in the subversion system: svn import D:\pythonsource\ file:///svn/python (assuming your project is in D:\pythonsource)
  • you will get a message in Notepad - close it, and choose [c] in DOS to continue the process of loading the directory into subversion
  • at this point you will have the original source and the subversion source, and when the IDE checks out from subversion it will create another project - so you can delete the initial source directory
  • you might want to only include the source files from the initial load... and create the project to include everything; have to be careful here if you need additional libraries (eg developing Processing projects in the NetBeans IDE, which will need the additional core.jar added to libraries)
  • set up the IDEs:
      • NetBeans:
          • use the Team > Checkout menu option
          • use the URL as below (Aptana)
          • you will be asked to create a new project into which the files will be downloaded
          • if you do, be careful not to create a new Main class (assuming you have a Java project)
          • so ideally the workflow is:
              • create the initial project in the IDE
              • only keep the src directory
              • create the SVN structure as above
              • create the new project in the IDE based on an SVN checkout
      • Aptana:
          • open the SVN view
          • create a new Repository Location (right-click in the SVN window)
          • the URL will be file:///d:/svn/python
          • then back to the SVN view to check out the project into an active project (right-click on the repository)
          • you will manipulate the files through the Team context menu (right-click the file in the solution explorer) in the main Aptana view (not Pydev, if you are using it for Python files) - update the file, update the directory, then check it in
          • if you import it into a new project, eg AIR, you will be able to specify all the parameters again, so if you have existing project parameters (eg startup form) you will need to manually make the necessary adjustments (for AIR, change the application manifest, application.xml; you will also need to reimport the AIRAliases.js file)
  • at this point the code is checked out and available to use; remember to update/commit it to the repository
  • with AIR specifically, you shouldn't commit the icons to the repository (nor other artifacts such as the .project file)
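
Putting the DOS steps above together (assuming, as above, D:\svn as the repository and D:\pythonsource as the project; the drive-qualified URL form file:///d:/svn/python from the Aptana setup works here too):

D:\> svnadmin create D:\svn
D:\> set EDITOR=notepad.exe
D:\> svn mkdir file:///svn/python
D:\> svn import D:\pythonsource\ file:///svn/python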

Alternatively (at least in NetBeans), once you have created the first SVN connection, you can check in a project without going through svn import. Just write the source, then right-click on it and choose Subversion > Commit to send it to the repository. You can still look at the history of changes between different versions - not sure how well this works in an environment with multiple users though, since the original codebase is your own.

More details here. Notice that having Subversion running will show the hard drive where you have the repositories with a different icon in Windows Explorer.

Monday, September 28, 2009

Oracle and objects

Some quick notes regarding Oracle (11)'s OO features:

Create a custom type - which, besides data attributes, can include member functions (defined in two parts: the spec, with the attributes and function declarations, and the body, containing the function definitions).
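
For reference, such a two-part definition looks roughly like this (a sketch; the attribute names and sizes are my assumption, chosen to match the insert below):

CREATE TYPE person_typ AS OBJECT (
  idno        NUMBER,
  first_name  VARCHAR2(30),
  last_name   VARCHAR2(30),
  email       VARCHAR2(50),
  phone       VARCHAR2(25),
  MEMBER FUNCTION get_idno RETURN NUMBER,
  MEMBER PROCEDURE display_details );
/

CREATE TYPE BODY person_typ AS
  MEMBER FUNCTION get_idno RETURN NUMBER IS
  BEGIN
    RETURN idno;
  END;
  MEMBER PROCEDURE display_details IS
  BEGIN
    DBMS_OUTPUT.PUT_LINE(idno || ': ' || first_name || ' ' || last_name);
  END;
END;
/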

Create the table:

CREATE TABLE object_table ( pobject person_typ, ... )

Inserting the data is done this way:

INSERT INTO object_table VALUES ( 'second insert',
person_typ (51, 'donald', 'duck', 'dduck@disney.com', '66-650-555-0125'));

Notice the implicit constructor.

To call a method:

SELECT o.pobject.get_idno() from object_table o

This is cool. But usually objects are used in code. So how is the client code/database object chasm bridged?

These objects can also be stored alone, without relational data (row objects, as opposed to the column objects in the example above).

CREATE TABLE person_obj_table OF person_typ;

Scanning the object table:

DECLARE
  person person_typ;
BEGIN
  SELECT VALUE(p) INTO person FROM person_obj_table p WHERE p.idno = 101;
  person.display_details();
END;
/

Pointers to objects are supported via the REF type.
You can use a SELECT INTO to load a specific row object into an object variable.
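
A sketch of the REF usage (table and column names are mine):

CREATE TABLE contacts (
  contact_ref   REF person_typ SCOPE IS person_obj_table,
  contact_date  DATE );

INSERT INTO contacts
  SELECT REF(p), SYSDATE FROM person_obj_table p WHERE p.idno = 101;

SELECT DEREF(c.contact_ref), c.contact_date FROM contacts c;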

You can implement database functions, procedures, or member methods of an object
type in PL/SQL, Java, C, or .NET as external procedures. This is a way to have the objects execute code defined externally. Only PL/SQL and Java code is stored in the database.

As far as consuming objects externally, one way is by means of untyped structures; another is by using a wizard to create strongly typed (Java) classes:

Strongly typed representations use a custom Java class that corresponds to a particular object type, REF type, or collection type and must implement the interface oracle.sql.ORAData.

Object views - where you define a filter that interprets the rows in a table as objects - are an interesting innovation.

So does this really solve the impedance problem? It's not like you define an object in C# then persist it in the database, then deserialize it in the application again and call its methods. It's more like, you define an object in the database, and with some manual work you can map between it and a custom class you define in Java. You can define some of its methods in C# (using the Oracle Database Extensions for .NET) - how is that for multiple indirections?

The question is really, where do you want your code to execute. In the case discussed above, (defining member functions in .NET) Oracle acts as a CLR host for the .NET runtime; not unlike the way SQL Server external procedures (written in C and compiled as DLL's) used to run in an external process space. So the code executes outside the (physical) database process, but still inside a (logical) database layer. I still can't escape a nagging feeling that this is as database-centric a view of the application as they come. Usually the design of an application starts with actors modeling, etc, and the data layer is something that does not come into play until the end. Ideally, from an application designer's perspective, as I mentioned above, you should be able to just persist an object somehow to the database, and instantiate/deserialize it from the data layer/the abstract persistence without too much fuss. In the case of Cache this is made easier by the fact that the application layer coexists with the database layer and has access to the native objects (at least, if you use the Cache application development environment).

In the case of Oracle, the separate spaces - database for storage/execution and application for execution - pose the standard impedance discrepancy problem, which I am not sure is in any way eased by the OO features of the database.

An ideal solution? Maybe database functionality should be provided by the OS layer and the application development/execution environment should be able to take advantage of that.

Meanwhile, Microsoft's Entity Framework (actually, a rather logical development from ADO.NET) deals with this problem in the dev environment. What I have seen so far looks cool, just a couple of questions:


  • can you start with the entities and generate (forward engineer) the database tables?
  • how is the schema versioned, and how are evolutionary changes sync'ed?
  • how does the (obvious) overhead perform when there are hundreds of tables, mappings, etc.?



Incidentally, using the Oracle ODP.NET driver in Visual Studio yields a much better experience with an Oracle database than using the standard MS drivers. You actually get results back (XML-formatted) when querying object tables (the MS driver reports an 'unsupported data type') and can interact with the underlying database much more, including the tuning advisor, deeper database object introspection, etc.

Even PostgreSQL (which I find quite cool, actually) portrays itself as having object/relational features - table structures can be inherited.

Saturday, September 26, 2009

More on globals and classes in Caché

Interesting - it seems that "dynamic languages" have been around for much longer than (us) Ruby users would think. Here's Caché's own version of it at work:



Class Definition: TransactionData


/// Test class - Julian, Sept 2009
Class User.TransactionData Extends %Persistent
{
Property Message As %String;
Property Token As %Integer;
}


Routine: test.mac

Set ^tdp = ##class(User.TransactionData).%New()
Set ^tdp.Message = "XXXX^QPR^JTX"
Set ^tdp.Token = 131

Write !, "Created: " _ ^tdp


Terminal:

USER> do ^test
... Created 1@User.TransactionData

Studio: Globals

tdp
^tdp = "1@User.TransactionData"
tdp.Message
^tdp.Message = "XXXX^QPR^JTX"
tdp.Token
^tdp.Token = 131


The order of creation is:
  1. create the class
  2. this will create the SQL objects
  3. populating the SQL table will instantiate the globals
  4. the globals are: classD for data, classI for index

Objects can be created (%New) or opened (%OpenId) from code, but to be saved (%Save, which updates the database), the restrictions must be met (required properties, unique indexes, etc).
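
A minimal ObjectScript sketch of that lifecycle (my example, using the TransactionData class from above):

Set obj = ##class(User.TransactionData).%New()
Set obj.Message = "hello"
Set obj.Token = 42
Set sc = obj.%Save()                            ; %Status: 1 on success
If 'sc Write $System.Status.GetErrorText(sc)    ; e.g. a violated required property
Write !, "saved with ID ", obj.%Id()            ; the ID is assigned at save time
Set obj2 = ##class(User.TransactionData).%OpenId(obj.%Id())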

Also, I finally got the .NET gateway generator to work: it creates native .NET classes that can communicate with Cache objects. Here is a sample of the client code:

// connect to the Cache server (connection string from the post; credentials elided)
InterSystems.Data.CacheClient.CacheConnection cn = new InterSystems.Data.CacheClient.CacheConnection(
    "Server=Irikiki; Port=1972;" +
    "Log File = D:\\CacheNet\\DotNetCurrentAccess.log; Namespace = USER;" +
    "Password = ______; USER ID = ____");
cn.Open();
// PatientInfo is the proxy class generated by the gateway for the Cache class below
PatientInfo pi = new PatientInfo(cn);
pi.PatientName = "New Patient";
pi.PatientID = new byte[1] { 6 };
// Save() round-trips to the server; the returned CacheStatus carries any error text
InterSystems.Data.CacheTypes.CacheStatus x = pi.Save();
Console.WriteLine(x.Message);

PatientInfo is a class defined in Cache, as follows:


Class User.PatientInfo Extends %Persistent
{

Property PatientName As %String [ Required ];
Property PatientDOB As %Date;
Property PatientID As %ObjectIdentity;

Method getVersion() As %String
{
Quit "Version 1.0"
}

Index IndexPatientName On PatientName;
Index IndexPatientId On PatientID [ IdKey, PrimaryKey, Unique ];

}

Easy enough: the getVersion() method is available to the C# code, as are the persistence and all the other methods natively available in ObjectScript. The generated code is here.

Wednesday, September 23, 2009

Ahead of the curve?

Some of the challenges I encountered while working on the AIR/GoogleHealth project:

- learning the Google Data API
- learning the Google Health API which rests on top of the Data API
- (re) figuring out some of AIR's limitations and features
- (re) figuring out some of JavaScript's limitations and features
- using the mixed AIR/JavaScript environment

In my experience this is pretty standard when dealing with new languages and platforms. 15 years on, it's still a struggle - but then one should probably be worried when one becomes too proficient in a language/platform, because by then it's already obsolete.

Tuesday, September 22, 2009

Caché and ODBC

And yes, reporting tools do indeed allow you to use Caché-specific SQL:



Above, Microsoft Report Builder 2.0 with Caché-tinged SQL.

Monday, September 21, 2009

Sybase joins the healthcare fray

Sybase now has a set of solutions for healthcare. Which is interesting, as previously they were known for their financial industry focus. So indeed it would appear that healthcare-oriented IT is poised to grow to the same prominence as that hitherto enjoyed by finance-IT.

Their flagship product in the industry seems to be eBiz Impact, YAIP (yet another integration platform) in the vein of Ensemble, DBMotion, and perhaps even alert-online. I might have to revise my chart from a few posts ago.

Saturday, September 12, 2009

XMLite?

So, just for fun, I decided to code the GoogleHealth client in Adobe AIR. Using the embedded sqlite database allows for nice persistence of the 'session' variables, but here we also run into a small issue. Communication with GoogleHealth is done via XML - which, BTW, demands feeding the XMLHttpRequest output into DOMParser for more natural processing - and there are large XML documents to be passed between the cloud and the client (e.g., the notice template and the other CCR data), so it would make a lot of sense to use an XML database such as xDB. But AIR is JS-based, hence no easy JDBC access, so the only solution would seem to be using SQLite's Virtual Tables as a gateway into xDB. Not sure it is doable - it probably is, but not worth the effort (VTables need C-API coding, xDB is Java-based... etc). Just another example of the impedance mismatch problem, this time at the data communication layer, and an illustration of how far global data connectivity still is from being achieved.
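
The XMLHttpRequest-into-DOMParser step looks like this (a sketch; profileUrl and authHeader are placeholders for the GH feed URI and the AuthSub token header):

var xhr = new XMLHttpRequest();
xhr.open("GET", profileUrl, false);
xhr.setRequestHeader("Authorization", authHeader);
xhr.send(null);
// responseXML is unreliable here; parsing responseText explicitly is the workaround
var doc = new DOMParser().parseFromString(xhr.responseText, "text/xml");
var entries = doc.getElementsByTagName("entry"); // the Atom entries carrying the CCR data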

Thank you for the improved version

Getting back to working on the GoogleHealth demo. Code that used to work a month ago doesn't anymore - the only change is that I upgraded Firefox, which caused some problems with XMLHttpRequest. D'oh - the code works just fine in Internet Explorer, save for the (useless) ActiveX warnings. Why, why, why, Firefox, do you return a '0' status? Changes/bugs/whatever-they-are such as this are very annoying, a waste of time, and a serious productivity drain. Not to mention that Firefox doesn't seem to render this very site correctly.

Ok enough ranting. Will be documenting the GH project next... update to follow.

Tuesday, September 08, 2009

Follow-ups

Interesting link related to my previous posts on Intersystems, HL7, etc.

And SPARQL, something I should look into.

Tuesday, August 04, 2009

(very) Preliminary performance comparisons


Ok, I hope to finish this before I tire of it, but here are the comparisons between INSERT ops for Cache and C-Tree. x - # records, y - milliseconds.

Friday, July 31, 2009

Competition in high technology

One of the purposes of the last posts was to see how high-tech firms actually compete. Industry analysis is usually presented in a nice compartmentalized way, where the value chain is clearly identified and the five forces framework can be neatly applied. I think this post will show that for large firms - with offerings across different segments of an industry, or even crossing industry boundaries - the analysis is a bit more complex.

I think this starts with the fact that the "IT industry" is in fact a multiple-headed beast, since so many other industries use it. So defining the industry in which these players compete is difficult in itself.


So basically some companies started in an industry vertical (Intersystems - healthcare) where they built a complete stack, which they then exported to other verticals (finance, for Intersystems), or to the "center", becoming integrated players (Cache portrays itself as a general-purpose "post-relational" database; my guess is that this moniker is an attempt to rebrand it as a mainstream competitor, reframing the "hierarchical database with roots in 60's healthcare software" description; BTW, there is nothing wrong with that description, and I think it is a cool product with remarkable performance characteristics).

The challenge in this case is convincing a mainstream audience that a niche product originating in a vertical is indeed a viable proposition. Tough, especially since the ecosystem (e.g., reporting tools) is built around standards that Cache, for example, works around (e.g., the SQL "pointers").

Secondly, there are the pure vertical players (which I haven't really talked much about here such as Siemens and GE). They built their applications portfolios by acquisitions (so perhaps dbMotion is a potential acquisition target?) but they rely on the mainstream vendors from the "center" for the base technology (perhaps; e.g., Siemens uses MS SQL as the db engine for its HIS, but Epic uses Cache).

Then there are the mainstream technology companies, which are trying to move from the center (pure database platform) into verticals (Amalga). At this point they are obviously encroaching on the vertical vendors' territory, be they pure vertical players or integrated players. How companies will compete in one segment while collaborating in others remains to be seen (competition in the vertical industry offering, collaboration at the platform level).

Fourth, there are niche players (dbMotion, SQLite, FairCom) which operate either in the vertical or in the center, but offer solutions appropriate for a specific vertical (e.g. FairCom, having found a reasonably comfortable place as the engine of choice for turnkey systems). As mentioned already, I would guess that at some point dbMotion's PE backers (if there are any) would be looking for an exit in the guise of a purchase by a vendor, either in the mainstream/center (more likely) or in the vertical, while SQLite and FairCom are likely, due to their more general (albeit niche) appeal, to survive on their own.

There are plenty of interesting companies I have not covered, such as McObject, db4o, Pervasive, OpenLink, NeoTool, and perhaps even Progress. As time permits I might revisit this writeup to include them, and perhaps even do a nice boxes-and-arrows schema, as good strategy analysis always seems to demand!

dbMotion


  • similar to: partially, Intersystems


dbMotion presents itself as a SOA-based platform enabling medical information interoperability and HIE. It is made up of several layers (from data integration, the lowest, to presentation, the highest) which are tied together and present to the exterior a 'unified medical schema', a patient-centric medical record. A business layer does data aggregation.

There are also a few other components, such as shared services (which deals with, among other things, unique patient identification). The UMS is based on the HL7 V3 Reference Information Model. Other features include custom views into the data, data-triggered events, and an EMR gateway.



As I understand it, without having seen an actual deployment, dbMotion's offering is similar to Intersystems' Ensemble without the underlying infrastructure (no Cache included - it relies on the user's database), but with the HealthShare component (so it offers healthcare-specific application infrastructure, whereas Intersystems' offerings are more segmented). What would be the benefit, compared to Ensemble? It does not require a whole Cache installation, so it might (?) be cheaper, and the dev skills might be more widespread; it is also more mainstream-RAD. It seems to be a solution for patching together an existing infrastructure, whereas my feeling about Ensemble is that it would perhaps work best with a brand new setup.

Interestingly enough, dbMotion is developed using the Microsoft stack, and the company is in fact a Microsoft partner.

What I don't quite get from the description is how HL7 interfacing works with dbMotion - the UMS is (perhaps logically) based on the (XML-based) HL7 v3 RIM, but is there a conversion mechanism for the other versions? How about v2 endpoints?

Oracle

  • similar to: IBM, Microsoft

As far as I can tell, other than platform offerings, Oracle's only healthcare-specific product is Transaction Base, an IHE solution. While the full spec is here, my initial assessment is that it would make sense in an environment with an already significant Oracle investment. There is a life sciences product as well (Argus Safety Suite), which I believe Oracle just purchased; the other life sciences product is Clinical Data Management, which deals with managing clinical trials data.

Interesting, but apparently not as exhaustive as some of the other products discussed here.

Microsoft

  • similar to: Intersystems, Oracle/IBM

Through acquisitions, Microsoft has built an impressive array of offerings in the healthcare space:

  • HIS/PACS-RIS
  • LifeSciences
  • Unified Intelligence System

HIS is pretty clear - direct competition to the Intersystems TrakCare discussed here.

UIS is a data aggregator and is somewhat similar to dbMotion and Ensemble. It integrates with HealthVault as an EMR solution.

LifeSciences is similar to Oracle and IBM offerings in that it is a superstructure built on an existing pure technology platform, targeted at the needs of life sciences.

Like Oracle and IBM, Microsoft has arrived at the healthcare apps arena from the pure-tech end - leveraging a platform into a specific vertical - quite the opposite of Intersystems, which started with an industry-specific application and then moved (more or less) downstream as a general-purpose platform.

FairCom C-Tree

  • similar to: SQLite

FairCom is not an illogical choice to follow InterSystems; both companies' databases claim to be among the fastest on the market.

Also, both are "developers'" platforms, designed less with a general-purpose audience in mind and more with a techie audience. Both originate from successful companies that have been in business for a long time, and yet are not so well known outside tech circles.

So what are the differences?

What most people like about FairCom cTree is the access they get to the source code, which allows them to interact with the database through various interfaces - native, ADO, ODBC, etc. I guess this is also possible with mySQL, SQLite, and perhaps PostgreSQL; FairCom predates (or is a contemporary of) most of these products.

Where FairCom differs from Intersystems is that its product is even less open, the cTree Ace SQLExplorer tool notwithstanding. It takes minimal admin effort and seems targeted at turnkey or embedded systems developers, with its heavy emphasis on C application-layer programming. You can certainly access cTree from C#, but the product is written in C and has a C developer audience in mind first; if performance is its main selling point (which makes sense: connecting from a JVM through a JDBC/ODBC bridge to, say, a remote Cache gateway which will in turn translate the code to native requests is probably akin to entering virtual machine hell), then staying close to the core system is compulsory. More on performance later.

Another thing that Cache and C-Tree have in common (but where they also differ) is that they provide different "views" into the database engine: hierarchical/sparse arrays/B-Trees in the case of Cache, C-Trees with ISAM and SQL interfaces in the case of C-Tree. Relational databases are based on, if memory serves, B-Trees (or B+ trees). However, SQL Server for example, keeps the relational engine very close to the B-Tree structure (time to review those Kalen Delaney books); in fact, I found the whole interaction between the set-based SQL and row-based processing engine quite fascinating.

Both Cache and C-Tree take a slightly different approach; the various interfaces into their storage engines are clearly provided for convenience only. Back in the day, as far as I recall, Db-Lib was the library of choice for SQL Server as well (makes you wonder where TDS lives now?). The bottom line is that if you are going to use Cache or C-Tree, you should use the native interfaces; there is no other reason why you would choose C-Tree over a mainstream product such as SQL Server or Oracle, or even mySQL.

C-Tree uses ISAM as its innermost data structure; this harkens back to the mainframe days, and what it means is that data is accessed directly through indexes, as opposed to letting a query optimizer decide which indexes to use (as in a relational database).

As per Wikipedia, ISAM data is fixed-length. Indexes are stored separately, not in the leaves of the data tables. MySQL functions on the same principle. A relational mechanism can exist on top of the ISAM structures. A more detailed presentation of the technicalities of working with the system can be found here.

You can see more details of the structure here - how each table corresponds to a data/index file pair.


The reason I am likening it to SQLite is that it is a niche product that caters to a well-defined group: developers of embedded or turnkey systems (not dissimilar to those SQLite targets - remember that it is the db of choice for iTunes and Adobe AIR).

Middleware, continued (Intersystems HealthShare/TrakCare)

HealthShare is an extension of Caché and Ensemble in the healthcare vertical. It is called a 'health information exchange', and I am going to try to see if this matches what HealthVault and GoogleHealth are attempting to be; terminology can be tricky. 'EHR' comes up frequently when describing HealthShare, but is it just an EHR data platform? Or a HIS?

A consortium called IHE exists, affiliated with HIMSS, which attempts to establish interconnectivity standards for the healthcare IT industry. It documents how established standards (DICOM, HL7) should be used to exchange information between clinical applications. HealthShare operates along those lines.

IHE does things such as this:


A cardiology use case

Or as this:



Actors and transactions involved in an electrocardiography IHE "profile"

(images from the IHE Cardiology Profile documentation)

HealthShare organizes data from clinical providers and makes it accessible to clinicians via a web browser. Although it does store some data locally and performs some data transformations, essentially it is a repository of clinical data from one/multiple providers. A similar product that I can think of is Amalga UIS. Some of the components first introduced in Ensemble (gateways) are in this case used to provide connectivity to the various clinical information sources. HealthVault would be the equivalent of the HealthShare Edge Cache Repository, a store of shared data defined at each clinical data provider's level.

Another component is the Hub, developed in Ensemble, which connects all the data sources together and, among other things, performs patient identification - something I am all too familiar with. I am curious how the Hub is updated (event-based, or a day-end process?)

Edge Cache can replicate some or all of the clinical data from the original sources. At the minimum, it requests data through the gateways of the original sources, at the request of the Hub. It therefore serves another role that I am quite familiar with, that of a backup system for the HIS or practice management system.


(image from the official HealthShare docs)




TrakCare is a web-based HIS; (un?)surprisingly, just like Amalga, it is not available in the US. It covers both financial and clinical apps. It is built on top of Ensemble. Since it is a full-fledged HIS, its description is beyond the scope of this post, but can be found here.

The whole Intersystems portfolio of applications can be depicted as follows:



I will try to use this model when dealing with other vendors as well.

A few concluding remarks:
  • this is an integrated stack; you just need the OS, and it gives you a storage system and an application development environment
  • however, the app dev environment isn't for the faint of heart, the VB-inspired offering notwithstanding; and some of the other languages offered are somewhat unusual by today's standards - but this is a throwback to the system's 60's roots; it must perform quite well in fact, since it has not gone the way of COBOL! (does anyone really use Object-COBOL?)
  • the above makes it less known than, say, MS Visual Studio - but the environment is in fact targeted at specialized business developers, not at a mass audience
  • in the verticals that it targets (healthcare, finance) it seems to do quite well - Intersystems, the flagbearer for MUMPS, has been in business for over three decades
  • my question would be why there isn't an offering for finance (similar to the healthcare solutions) - perhaps that industry is much more fragmented than healthcare?
  • so the vendor's strategy in this case (Intersystems) is to offer a platform, a development environment, and a foray into an industry vertical. I am not sure which came first (apparently, all at the same time, if you read the history behind MUMPS!), while, as we will see, other vendors' routes have been different.

Thursday, July 30, 2009

Middleware, continued (Intersystems Ensemble)

Incidentally, I started a project that will test different databases' performance. The (as yet incomplete) version is here, and the C# class that specifically deals with Caché interaction is here (a subclass of the main database class). The I/O ops are very simple, dealing with only 2 fields.




Ensemble is a RAD platform; it allows users to create workflows, RIAs, and rule-based logic (hence it can work as an interface engine). It contains an application server, workflow server, and document server all in one - not surprising, given the relatively wide array of offerings of the Caché platform on which Ensemble is based. As far as I can tell without having seen the product, it is really a set of extensions built in the Caché environment to provide messaging, workflow, and portal services, with some industry-specific features such as HL7/EDI, and endpoints for BPEL applications, database access, and other EAI connectors. Ensemble also offers data transformation (SSIS-style? not too difficult to understand, and resulting in federated databases, as already implemented in the Caché application server external database gateway) and object transformation (Java --> .NET ORBing? I am not sure how this is done; I assume by instantiating VMs for each of the supported platforms and marshaling between the objects).

I assume that messaging is implemented in the Caché application server - not entirely different from the original MSMQ.

As for RAD capabilities (so far I have mostly talked about infrastructure), Ensemble offers some graphical code generators for BPM; I assume it also supports the Caché development environment (ObjectScript, MVBasic and Basic).

In Microsoft terms, Ensemble is basically VS + SQLServer + SSIS + WCF + WWF + BPEL parser + BizTalk + customizations. In bold, the middleware stack.


On closer inspection, it appears that the inter-VM object conversion is in fact introspection- and proxy-based instantiation of .NET objects, which are made available by Ensemble to Caché's native VMs. Ensemble runs a .NET VM which can execute .NET objects natively through worker threads. I am curious whether this requires a Windows Server to be available at runtime - not sure how distributed the Ensemble installation can be.

Middleware (Intersystems Cache)

...as in connected systems, the original title to which this blog just reverted. I just noticed lately that there has been a bewildering proliferation of offerings in this space, some having to do with the Cloud, some having to do with verticals such as health care. So I will try to make sense of some of these things next.

Intersystems offers a database platform (Caché), a RAD toolset (Ensemble), a BI system (DeepSee), and two products specifically targeted at healthcare, an information exchange platform (HealthShare) and a web based IS (TrakCare). So if I was to put everything in a matrix comparing different vendors, it would almost have to be a 3D one - one dimension would have to cover the platform, and another (the depth) would have to cover the vertical (healthcare), as for example, Microsoft offers both the platform and the vertical app.

Caché is a combination of several products, some of which originate in MUMPS, a healthcare-specific programming language developed in the 1960's. MUMPS used hierarchical databases and was an underpinning of some of the earliest HIS developments (Wikipedia is our friend); at some point it ran on the PDP-11, which incidentally was the first computer I ever saw.

It makes one wonder what would have happened had MUMPS become the database standard, as opposed to what would become Oracle - MUMPS predates R2 (and C, by the way). But the close connection between the language and the database, which might strike some today as strange, goes back to its origins.

Caché's performance stems from its sparse record architecture, and from its hierarchical (always-sorted) structure.

Caché has been modernized to provide an ODBC-compliant interface (and derivatives: ADO.NET) and an object-oriented 'view' of the data and functionality embedded in MUMPS (ObjectScript). The development environment also offers a BASIC-type programming language and a website construction toolkit - quite a lot for one standalone package.

It seems that Caché is a document-oriented database, which would make it similar to an XML database in some ways - the main 'entities' are arrays in one case, nodes in the other, as opposed to relational tables.

At the same time, for a hierarchical database, Intersystems somewhat confusingly portrays it as an "object" database, which is probably not technically incorrect, since one of the views of the data is "object"-based as I mentioned above.

Creating a class in Caché also creates a table accessible in SQL (via the database interfaces, or through the System Management Portal). The table has a few additional fields on top of the class' properties - an ID and a class name (used for reflection, I assume). The System Management Portal also provides a way to execute SQL statements against the database, although at first sight I cannot seem to create a new data source in Visual Studio - and have to access the data programmatically.

One way of using the database from Microsoft Visual Studio requires a stub creator app - CacheNetWizard, which failed every time I tried to use it. The other is to use the Caché ADO.NET provider:

// assumes: using InterSystems.Data.CacheClient; cnCache is an open CacheConnection,
// sql holds the query text, and noRecs caps the number of rows to read (0 = no cap)
CacheCommand command = new CacheCommand(sql, cnCache);
CacheDataReader reader = command.ExecuteReader();
int noRecsRead = 0;
while (reader.Read())
{
    noRecsRead++;
    if (noRecs > 0 && noRecsRead >= noRecs)
        break;
}

Running a large operation (a DELETE, in this case) from one client seems to spawn multiple CACHE.EXE processes.

There are several ways of exporting data from Caché - exporting classes, which only exports the class definition (in fact, the table definition) and exporting the table itself to text, which exports the contents.

The multidimensional array view of Caché reminds me somewhat of dictionary and array types in languages such as Ruby and Python, while the untyped data elements are also used in SQLite. Arrays can be embedded together to provide a sort of materialized view (in SQL terms) in effect.

Ultimately, the gateway to Caché's hierarchical engine is the Application Server, which takes care of the virtual machines for each of the supported languages, of the SQL interface, of the object views, and of the web/SOAP/XML accessibility, as well as providing communication mechanisms with other Caché instances and other databases (via ODBC and JDBC). The VM's access Caché tables as if they were variables.

When it comes to languages, Caché offers a BASIC variant and ObjectScript. The BASIC can be accessed from the (integrated) Studio (used for writing code) or from a (DOS-based) Terminal (used for launching code). It operates within the defined namespaces (class namespaces or table schemas). A difference from other variants of the language, which is due to the tight connection with the Caché engine, is the presence of disk-stored "global" variables, whose name is prefixed by ^; BASIC function names are actually global variables. Another difference is the presence of multidimensional arrays, similar to Python or Ruby, but which in this case are closely related to the Caché database engine (to which they are a core native feature - hierarchical databases' tables are ordered B-Trees, and these B-Trees provide the actual implementation of arrays; the SQL "tables" and OO "classes" are just views into these B-Trees/arrays); they do not have to be declared.

The array "index" is nothing else than a notation for the key of the node of the B-Tree. Non-numeric indexes are therefore possible.

Architecturally, I would be curious to know if these trees are always stored on disk, or they are cached in memory and some lazy-writer process at some point commits them to disk.



The image above - which I stole from the official docs, and modified - shows the structure of the tree; 4 is a value that the official example stored in all nodes, but the value of any node can be anything.

It can be seen that this "array" implementation does not actually need d1 × d2 × d3 × ... × dn storage for an n-dimensional array.

This lack of structure allows for small size but it also can create problems at run time, especially if the consumer of the array and the producer are different; the consumer might not be aware of all the indexes/dimensions of the array. A function exists, traverse(), which can be called recursively to yield all existing subscripts.

If called with the same number of arguments, traverse() does a sibling search. An increase in the number of arguments will make it go down one level; an empty argument will yield the first index of the child (quite naturally, since you don't know what that might be at runtime). However, I am still not sure how you can fully discover an array with a potentially unlimited number of dimensions, so the application must enforce at least some structure on the arrays/tables.
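
A rough Python analogy (mine - not Caché code) of that discovery process, with the sparse global modeled as a dict keyed by subscript tuples:

# sparse 'global': keys are subscript paths, values are node data (None = no data)
g = {
    ("Julian",): None,
    ("Julian", 36): None,
    ("Julian", 36, 8): 125000.75,
    ("Scott",): 98000.00,
}

def children(tree, prefix):
    """The next-level subscripts under prefix - roughly what repeated
    traverse() calls with one extra (empty) argument would return."""
    n = len(prefix)
    seen = set()
    for key in sorted(tree):
        if len(key) > n and key[:n] == prefix and key[n] not in seen:
            seen.add(key[n])
            yield key[n]

def walk(tree, prefix=()):
    """Recursively discover every subscript path, depth first."""
    for sub in children(tree, prefix):
        path = prefix + (sub,)
        yield path
        for deeper in walk(tree, path):
            yield deeper

for path in walk(g):
    print(path, g.get(path))

Note that the walk never needs to know the number of dimensions up front - which is exactly the open-endedness that worries me above.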

Now that the actual storage is better understood, it is interesting to see how these features show up in the table/class structure. What is the mechanism that allows for arbitrary indices to pop up at runtime?

A ^global variable is a persistent object of a class and a row in a SQL table; the latter two are OO/relational "views" of the B-Tree/array. To answer a question from above: instantiating a new object creates it in memory; opening it (via its ID property) loads it into memory from disk. It is important to understand that an object is a row in a table, i.e. a sub-structure of the tree/array. E.g., in ^SALARY("Julian", 36, 8) = 125000.75, ^SALARY is the entire structure, while ^SALARY("Scott") represents a different person's salary - and a different row in the table.

Does the tree's dynamic indexing mean that classes are effectively dynamic as well and can be changed at runtime? Not really. Neither does the SQL table structure change to reflect changes in the underlying array.

As can be seen, the value of the global (^USER) is a pointer to the index of the first element, which is also the $ID column of that row.




Interestingly, adding ^USERD(2, "Dummy") creates an empty record in the table, while adding ^USERD(2) actually populates the record. So the second level in ^USERD(2) does not actually show in the table at all. Is this child the next table in the hierarchy?

Mapping the other concepts, the class' package does become the database schema. Creating a table or a class does not instantiate the ^global (array), that only happens when data populates the array. The array's name becomes package.classNameD.



ObjectScript is another language supported by Caché. It is available from the Terminal (one of the ways of interacting with Caché, besides the Studio and the System Management Portal), where you can directly issue ObjectScript commands - and use ObjectScript to launch Basic routines stored in the system. Commands can be abbreviated, which unfortunately makes for unreadable code, as the MUMPS example at Wikipedia shows (it compiled fine in Studio!).

ObjectScript is also an untyped language, allowing for interesting effects such as this:

> set x = 2 --> 2
> set x = 2 + "1a" --> 3, since "1a" is interpreted as 1

System routine names are preceded by %, and routine names are always preceded by ^ as they are globals. Routines can be called from specific (tagged) entry points by executing DO tag^routine. The language is case- AND space-sensitive.

Creating a class also creates ObjectScript routines which, as far as I can tell, deal with the database persistence operations of the class. for allows argument-list iteration (similar to Ruby?). The language supports regular expressions (through the ? pattern operator), has fairly robust support for lists, and an interesting type named bit-string (similar to BCD?).

Routines are saved with the .mac extension.

Creating a ^global variable in ObjectScript in Terminal makes it visible in the System Management Portal under "Globals". However, this does not create a table available in SQL.

"Writing" a global only renders that particular node, e.g. ^z is not the same as ^z(1) (the zwrite command does that). However, killing ^z removes the whole tree.

It can be seen that, not unlike with XML (node values vs. attributes), data can be stored in nodes (^global(subscript) = value), or in the subscripts themselves.
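
A quick Terminal sketch of the write/zwrite/kill behavior (my example):

USER> set ^z = "root"
USER> set ^z(1) = "child"
USER> write ^z         ; renders only the root node: "root"
USER> zwrite ^z        ; dumps the whole tree, ^z and ^z(1)
USER> kill ^z          ; removes the root and every descendant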

There are a couple of handy packages that let you run Oracle/SQLServer DDL to create Caché tables.

There is a lot more about the OO/relational features of Caché that I have not covered; e.g., it is possible to create object hierarchies in ObjectScript, or to have array properties of classes, which become flattened tables or related tables in SQL. More details here, with the caveat that Reference properties appear to be a referential integrity mechanism of sorts, which could perhaps have been implemented more "relationally" through foreign keys (supported by Caché - but Caché SQL also supports a pointer-dereferencing notation, e.g. SELECT User->Name; I am not sure how useful that is, since most SQL is actually generated by reporting tools, and I don't think Crystal Reports can generate this Caché-specific SQL; I might be wrong, perhaps this is dealt with in the ODBC/ADO.NET layer).

More on MUMPS' hierarchical legacy here. On OO, XML, hierarchical (and even relational!) databases, here.

This is just a brief overview of several aspects of the Caché platform. Next I will go over the rest of Intersystems' offerings.

Thursday, July 23, 2009

Open Source, Cloud-based Approach to Describing Solution Architectures

Mike Walker discusses in a recent issue of the Microsoft Architecture Journal a set of tools that can be used to document solution architectures - based, not surprisingly, on Microsoft tools. Together, these make up the Enterprise Architecture Toolkit.

Since I don't have a Windows Server to run Sharepoint (I could, presumably, use Azure), I came up with a similar application setup using open source or cloud-based tools:




The only thing that needs to be built is the manager ("gateway", in the chart above) which can be a RIA application whose role is to tie everything together. Sounds simple enough?

Sunday, July 19, 2009

Mobile EHR

It makes sense for mobile carriers to get involved in the EHR arena. However, the way I see this done is via cloud-stored EHR info that is accessible through a handset; how else would you carry the record should you decide to move to another carrier?

I still think it is far fetched for a mobile carrier to roll out an entire HIS application though. There are so many verticals (all, practically) that make use of mobile communications one way or another, should mobile communications providers create solutions for everything?

And a 'global' mEHR, while a nice idea indeed, will I think always be hindered by competing standards and lack of acceptance - after all, even the mobile infrastructure worldwide is fragmented: CDMA, GSM, etc. Why would the application layer be any different?

Worth keeping an eye on though.

Wednesday, July 15, 2009

Google Maps knows where you are

This is way cool: if you connect to the Internet using WiFi, Google Maps 'knows' where you are and shows your location by default.

Slowly it is all coming together - the 'cloud' means that you can keep your data (and processes!) in one place and access it (via WiFi) from anywhere, even using a lightweight client. Also, both the client and the cloud backend 'know' where you are, so functionality can be tailored to the time/location.

I'm not sure how much computing power is needed on the (portable) client - probably only enough for rich media rendering. Other than specialized applications, most of what an average user really needs should be easily done using a client that combines media/communication/lightweight computing services. I don't think the iPhone is there yet (as the all-purpose 'client'), but perhaps a combination of iPhone and Kindle, three versions from now, might become just that.

Tuesday, July 14, 2009

XDb


Documentum/EMC offers an XML database named XDb. XProc is in fact designed to work with XDb.

Perhaps XML database is a misnomer. It really is a way of storing XML documents without (apparently) enforcing any relational integrity constraints other than those defined by the DTD (and perhaps XLink, although so far I don't know if XLink is declarative only). Therefore XDb and XProc work hand in hand: one allows for the storage of XML documents, the other for the manipulation of those documents (and perhaps in-place updates).

The logical design is therefore done at a higher level. The 'database' concept appears to come into play when various stored documents are manipulated as sets - XDb supports XQuery (preferred), as well as XPath and XPointer.

Each XML document is stored as a DOM.Document and can be manipulated using the standard methods (createAttribute, createTextNode, etc).
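
XDb's DOM API is Java, but as a generic illustration of those calls, here is the same kind of manipulation in Python's minidom (a stand-in, not XDb code):

from xml.dom.minidom import parseString

doc = parseString("<chart/>")                      # a stand-in clinical document
allergy = doc.createElement("allergy")
allergy.setAttribute("severity", "high")           # cf. createAttribute
allergy.appendChild(doc.createTextNode("penicillin"))
doc.documentElement.appendChild(allergy)
print(doc.toxml())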

I can see a possible usage in, for example, GoogleHealth - where XDb would store well-formatted templates for charts, diagnoses, allergies, vaccines, etc, which would be populated for each patient encounter and loaded into GH.

While in normal usage write contention should not be an issue, I am curious how XDb deals with document versioning and multiple writes against the same document - or is the R/W pipe simply single-throttled? (later - here it is: clicking Refresh in the Db manager while an update process was underway yielded the following error:)



Interesting XML database reference information here.

Sunday, July 12, 2009

XProc

Documentum's XProc XDesigner - a first step towards what I see as a full online development environment, although this is more similar to Yahoo Pipes. The technology is there (web-based GUI + cloud for compilation, and possibly even for deployment); I think it's only a matter of tool developers finding a way to monetize it. Is Microsoft really making money on Visual Studio, though?


Wednesday, July 08, 2009

Over mashed-up

It occurred to me that I started this blog to, well, blog about my thoughts on various aspects of computing. A while ago, though, it became my testing ground for various mashups, widgets, embedded code, and so on - mainly because Blogspot/Blogger allows all kinds of code to be inserted, which Wordpress (free hosted Wordpress, that is) doesn't. Anyway, this doesn't make for a nice reading experience, so perhaps it is time I refocused on writing and moved the coding elsewhere.

An interesting experiment

And worth reading... if I can find the time.

FREE (full book) by Chris Anderson

Tuesday, July 07, 2009

Platform Convertor Strategy Analysis

A work in progress, a consulting project that discusses the positioning of a platform convertor.

Sunday, July 05, 2009

Visualization, again

Visualization of data seems to be the new new wave of BI. I already mentioned IBM's offering a few months ago, but there are quite a few other players in this space: startups with interesting products, such as Tableau Desktop and Omniscope; SAP has a product as well (Xcelsius); and so do research institutions (Stanford, with Protovis).

What can I say: Tufte meets SQL. And perhaps Processing should get in the game - surprised I haven't seen any rich visualization libraries for it - yet.

Thursday, June 18, 2009

Modeling

The RUP (Rational Unified Process) is very nice, and so is UML. For smaller projects, though, the following will do:



I would really like to know how much code is written according to diagrams. The mental image that programmers have of a problem's universe is a fascinating topic indeed - and a far-reaching one, since how a software system works determines, ultimately, how a user has to work to accommodate the system.

Wednesday, June 17, 2009

HealthVault


Yes, it does have an API, and some sample apps. The .NET samples include a web application that talks to the service - however, I am giving up on it for the time being, as the certificate-making utility seems to crash all the time (nice unhandled error, by the way; the crash seems related to the fact that the app is installed in Program Files as opposed to Documents, and Visual Studio doesn't have full rights to PF). Will come back to it later, but so far it looks remarkably similar to Google Health.


One additional thing: the SDK features some device drivers that enable medical devices to talk directly to HV. Nice - as long as they don't cause any crashes...

Tuesday, June 16, 2009

AIR and Google Health

Recently I've been tinkering with AIR and Google Health (GH). It's been surprisingly easy, if one can overlook the endless stream of XML returned by GH - but there is no other way; HL7 would be just as nasty looking. I don't know yet how it returns the file/image data that can be attached to the GH account.

AIR seems an ideal environment to build a desktop client to front a GH cloud-based application: it's lightweight, Javascript-compatible, and portable across platforms.



Speaking of which, it seems that AIR will be ported to mobiles as well. I would argue that the paragraph above (and not just the stronger OO features found in Flash and available to AIR) is a strong reason to do this port, although I am not sure how easy it is to develop and maintain AIR applications, nor how well these applications perform, given the several layers of virtual environments they have to execute in.

Will write more thoughts as soon as I finish the small-scale project I am working on right now: 4-5 forms of reduced complexity (but with a significant amount of functionality built in; the underpinnings of this relatively simple project are amazingly complex and would have been hard to imagine only a few years ago).

I haven't looked at Google Tables yet, but I have read some things about YQL and can see a scenario where medical (and other) personal information (e.g. reverse phone lookups, credit history, white and yellow pages) is queryable over the web via a SQL-type language, with the right security in place. In fact, the infrastructure already exists! So it would be just a matter of connecting the pipes.

And, I haven't even started to look at HealthVault's API (if it has one).

Wednesday, June 10, 2009

Monday, June 01, 2009

LinkedIn widget


My LinkedIn Profile.

Donkey Kong

I never really played this game in the 80's - it seemed to be available only on computers I did not own, such as the C64. Now I finally can - will someone make a Sentinel widgety game available, please?



Whatever one thinks of video games, I find it amazing that today you can run what was once a significant programming effort in a 'virtual' OS, through several layers of interpreted code (widget > flash > browser > OS process > ...). I wonder how similar the machine code ultimately generated on the OS is to the machine code of the original program :)

Tuesday, April 14, 2009

Slideshare

Some of the academic presentations I have worked on can be found here.

LiveEarth

Sunday, April 05, 2009

More tweets visualizations

For quite some time now I have been finding visualizations cool. There is a whole list of blogs and web sites dedicated to this rather obscure area of - computing? Web 2.0? It's an Edward Tufte-meets-open-source type of thing... and even (SF) author Bruce Sterling is in on the game. And now even IBM: they too are visualizing tweets (real-time Internet seems to be the in thing now). Still unsure about the usefulness of it all, but it makes for nice mind-map-like charting.

Thursday, March 26, 2009

Visualizing tweets live

I'm not sure how useful it is, but it's certainly cool: Twittervision.

Sunday, March 15, 2009

Virtual Worlds in Asia

Check out this SlideShare Presentation:

Sunday, February 08, 2009

About me....

Or rather, about embedded Google presentations:

Friday, February 06, 2009

Flight Stats widget


Tuesday, February 03, 2009

BBC World Music Widget

Who knew that the BBC was so cool?