Friday, July 31, 2009
Competition in high technology
I think this starts with the fact that the "IT industry" is in fact a multiple-headed beast, since so many other industries use it. So defining the industry in which these players compete is difficult in itself.
First, some companies started in an industry vertical (Intersystems in healthcare), where they built a complete stack that they then exported to other verticals (finance, for Intersystems) or to the "center", becoming integrated players. Cache portrays itself as a general-purpose "post-relational" database; my guess is that this moniker is an attempt to rebrand it as a mainstream competitor, reframing the "hierarchical database with roots in 60's healthcare software" description. BTW, there is nothing wrong with that description, and I think it is a cool product with remarkable performance characteristics.
The challenge in this case is convincing a mainstream audience that a niche product originating in a vertical is indeed a viable proposition. That is tough, especially since the ecosystem (e.g., reporting tools) is built around standards that Cache, for example, works around (e.g., the SQL "pointers").
Secondly, there are the pure vertical players (such as Siemens and GE, which I haven't really talked much about here). They built their application portfolios through acquisitions (so perhaps dbMotion is a potential acquisition target?) but they rely on the mainstream vendors from the "center" for the base technology (perhaps; e.g., Siemens uses MS SQL as the db engine for its HIS, but Epic uses Cache).
Then, there are the mainstream technology companies which are trying to move from the center (pure database platform) into verticals (Amalga). At this point they are obviously encroaching on the vertical vendors' territory, be they pure vertical players or integrated players. How companies will compete on one segment while collaborating on others remains to be seen (competition on vertical industry offerings, collaboration at the platform level).
Fourth, there are niche players (dbMotion, SQLite, FairCom) which operate either in the vertical or in the center, but offer solutions appropriate for a specific vertical (e.g., FairCom has found a reasonably comfortable place as the engine of choice for turnkey systems). As mentioned already, I would guess that at some point dbMotion's PE backers (if there are any) will be looking for an exit in the guise of a purchase by a vendor, either in the mainstream/center (more likely) or in the vertical, while SQLite and FairCom are likely, due to their more general (albeit niche) appeal, to survive on their own.
There are plenty of interesting companies I have not covered such as MCObject, db4o, Pervasive, OpenLINK, NeoTool, and perhaps even Progress. As time permits I might revisit this writeup to include them, and perhaps even do a nice boxes and arrows schema, as good strategy analysis always seems to demand!
dbMotion
- similar to: partially, Intersystems
dbMotion presents itself as a SOA-based platform enabling medical information interoperability and HIE. It is made up of several layers (from data integration, the lowest, to presentation, the highest) which are tied together and present to the exterior a 'unified medical schema', a patient-centric medical record. A business layer does data aggregation.
There are also a few other components such as shared services (which deals with, among others, unique patient identification). UMS is based on HL7 V3 Reference Information Model. Other features include custom views into the data, data-triggered events, and an EMR gateway.
As I understand it, without having seen it in an actual deployment, dbMotion's offering is similar to Intersystems' Ensemble, without the underlying infrastructure (no Cache included, it relies on the user's database), but with the HealthShare component (so it offers healthcare-specific application infrastructure, whereas Intersystems' offerings are more segmented). What would be the benefit, compared to Ensemble? It does not take a whole Cache installation so it might (?) be cheaper, and the dev skills might be more widespread; it is also more mainstream-RAD. It seems to be a solution for patching together an existing infrastructure, whereas my feeling about Ensemble is that it would perhaps work best with a brand new setup.
Interestingly enough, dbMotion is developed using the Microsoft stack, and the company is in fact a Microsoft partner. What I don't quite get from the description is how HL7 interfacing works with dbMotion - UMS is (perhaps logically) based on the (XML-based) HL7 v3 RIM, but is there a conversion mechanism to the other versions? How about v2 endpoints?
Oracle
- similar to: IBM, Microsoft
As far as I can tell, other than platform offerings, Oracle's only specific healthcare product is Transaction Base, an IHE solution. While the full spec is here, my initial assessment is that it would make sense in an environment with an already significant Oracle investment. There is a life sciences product as well (Argus Safety Suite) which I believe Oracle just purchased; the other life sciences product is Clinical Data Management, which deals with managing clinical trials data.
Interesting, but apparently not as exhaustive as some of the other products discussed here.
Microsoft
- similar to: Intersystems, Oracle/IBM
Through acquisitions, Microsoft has built an impressive array of offerings in the healthcare space:
- HIS/PACS-RIS
- LifeSciences
- Unified Intelligence System
HIS is pretty clear - direct competition to the Intersystems TrakCare discussed here.
UIS is a data aggregator and is somewhat similar to dbMotion and Ensemble. It integrates with HealthVault as an EMR solution.
LifeSciences is similar to the Oracle and IBM offerings in that it is a superstructure built on an existing pure technology platform and targeted at the needs of life sciences.
Same as Oracle and IBM, Microsoft has arrived at the healthcare apps arena from the pure tech extreme - leveraging a platform into a specific vertical, quite the opposite of Intersystems, which started with an industry-specific application which it then moved (more or less) downstream as a general-purpose platform.
FairCom C-Tree
- similar to: SQLite
FairCom is not an illogical choice to follow InterSystems; both companies' databases claim to be among the fastest on the market.
Also, both are "developers'" platforms, designed less with a general-purpose audience in mind and more with a techie audience. Both originate from successful companies that have been in business for a long time, and yet are not so well known outside tech circles.
So what are the differences?
What most people like about FairCom cTree is the access they get to the source code, which allows them to interact with the database through various interfaces: native, ADO, ODBC, etc. I guess that this is also possible with MySQL, SQLite, and perhaps PostgreSQL as well. FairCom predates (or is a contemporary of) most of these products.
Where FairCom differs from Intersystems is that its product is even less open, the cTree Ace SQLExplorer tool notwithstanding. It takes minimal admin effort and it seems targeted at turnkey or embedded systems developers, with its heavy emphasis on C application-layer programming. You can certainly access cTree from C#, but the product is written in C and has a C developer audience in mind first. If performance is its main selling point, then staying close to the core system is compulsory. That makes sense: connecting from a JVM through a JDBC/ODBC bridge to, say, a remote Cache gateway which will in turn translate the code to native requests is probably akin to entering virtual machine hell. More on performance later.
Another thing that Cache and C-Tree have in common (but where they also differ) is that they provide different "views" into the database engine: hierarchical/sparse arrays/B-Trees in the case of Cache, C-Trees with ISAM and SQL interfaces in the case of C-Tree. Relational databases are based on, if memory serves, B-Trees (or B+ trees). SQL Server, for example, keeps the relational engine very close to the B-Tree structure (time to review those Kalen Delaney books); in fact, I found the whole interaction between the set-based SQL and the row-based processing engine quite fascinating.
Both Cache and C-Tree take a slightly different approach; the various interfaces into their storage engines are clearly provided for convenience only. Back in the day, as far as I recall, Db-Lib was the library of choice for SQL Server as well (makes you wonder where TDS lives now). The bottom line is that if you are going to use Cache or C-Tree, you should use the native interfaces; there is no other reason why you would choose C-Tree over a mainstream product such as SQL Server or Oracle, or even MySQL.
C-Tree uses ISAM as its innermost data structure; this harkens back to the mainframe days, and what it means is that data is accessed directly through indexes chosen by the application, as opposed to allowing a query optimizer to decide which indexes to use (as in a relational database).
As per Wikipedia, ISAM data is fixed-length. Indexes are stored in separate tables and not in the leaves of the data tables. MySQL (with its MyISAM engine) functions on the same principle. A relational mechanism can exist on top of the ISAM structures. A more detailed presentation of the technicalities of working with the system can be found here.
You can see more details of the structure here - how each table corresponds to a data/index file pair.
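To make the data/index file pair idea concrete, here is a minimal Python sketch of an ISAM-style table. It is not FairCom's implementation or API: the record layout, the IsamTable class and its method names are all made up for illustration, and the two "files" are just in-memory structures.

```python
import bisect
import struct

# Hypothetical fixed-length record layout: 4-byte id, 16-byte name, 8-byte balance.
RECORD = struct.Struct("<i16sd")


class IsamTable:
    """Toy ISAM table: a flat data 'file' plus a separate sorted index."""

    def __init__(self):
        self.data = bytearray()   # stands in for the data file
        self.index = []           # stands in for the index file: sorted (key, record_no)

    def insert(self, key, name, balance):
        record_no = len(self.data) // RECORD.size
        self.data += RECORD.pack(key, name.encode("ascii"), balance)
        bisect.insort(self.index, (key, record_no))

    def read(self, key):
        # The caller goes through the index explicitly; there is no query optimizer.
        pos = bisect.bisect_left(self.index, (key, -1))
        if pos == len(self.index) or self.index[pos][0] != key:
            return None
        record_no = self.index[pos][1]
        offset = record_no * RECORD.size   # fixed length makes the offset trivial
        rec_key, name, balance = RECORD.unpack_from(self.data, offset)
        return rec_key, name.rstrip(b"\x00").decode("ascii"), balance


table = IsamTable()
table.insert(30, "Scott", 99000.0)
table.insert(10, "Julian", 125000.75)
print(table.read(10))
```

The point of the sketch is that the index lives apart from the records and that a lookup is a direct index probe followed by an offset calculation, which is what fixed-length records buy you.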
The reason I am likening it to SQLite is that it is a niche product that caters to a well-defined group: developers of embedded or turnkey systems (which is not dissimilar to who SQLite targets - remember that it is the db of choice for iTunes and Adobe AIR).
Middleware, continued (Intersystems HealthShare/TrakCare)
A consortium called IHE exists, affiliated with HIMSS, which attempts to establish interconnectivity standards for the healthcare IT industry. It documents how established standards (DICOM, HL7) should be used to exchange information between clinical applications. HealthShare operates along those lines.
IHE does things such as this:
Or as this:
HealthShare organizes data from clinical providers and makes it accessible to clinicians via a web browser. Although it does store some data locally and performs some data transformations, essentially it is a repository of clinical data from one/multiple providers. A similar product that I can think of is Amalga UIS. Some of the components first introduced in Ensemble (gateways) are in this case used to provide connectivity to the various clinical information sources. HealthVault would be the equivalent of the HealthShare Edge Cache Repository, a store of shared data defined at each clinical data provider's level.
Another component is the Hub, developed in Ensemble, which connects all the data sources together and, among other things, performs patient identification - something which I am all too familiar with. I am curious how the Hub is updated (event-based, day-end process?)
Edge Cache can replicate some or all of the clinical data from the original sources. At the minimum, it requests data through the gateways of the original sources, at the request of the Hub. It therefore serves another role that I am quite familiar with, that of a backup system for the HIS or practice management system.
TrakCare is a web-based HIS; (un?)surprisingly, just like Amalga, it is not available in the US. It covers both financial and clinical apps. It is built on top of Ensemble. Since it is a full-fledged HIS, its description is beyond the scope of this post, but can be found here.
The whole Intersystems portfolio of applications can be depicted as follows:
I will try to use this model when dealing with other vendors as well.
A few concluding remarks:
- this is an integrated stack; you just need the OS and it gives you a storage system and an application development environment
- however, the app dev environment isn't for the faint of heart, the VB-inspired offering notwithstanding; and some of the other languages offered are somewhat unusual by today's standards - but this is a throwback to the system's 60's roots; it must perform quite well in fact, since it has not gone the way of COBOL! (does anyone really use Object-COBOL?)
- the above makes it less known than, say, MS Visual Studio - but the environment is in fact targeted at specialized business developers and not at a mass audience
- in the verticals that it targets (healthcare, finance) it seems to do quite well - Intersystems, the flagbearer for MUMPS, has been in business for over 3 decades
- my question would be why there isn't an offering for finance (similar to the healthcare solutions) - perhaps the industry is much more fragmented than healthcare?
- so the vendor's strategy in this case (Intersystems) is to offer a platform, a development environment, and a foray into an industry vertical. I am not sure which came first (apparently, all at the same time! if you read the history behind MUMPS), while, as we will see, other vendors' route has been different.
Thursday, July 30, 2009
Middleware, continued (Intersystems Ensemble)
Ensemble is a RAD platform; it allows users to create workflows, RIAs, and rule-based logic (hence, it can work as an interface engine). It contains an application server, workflow server, and document server all in one - not surprising, given the relatively wide array of offerings of the Caché platform on which Ensemble is based. As far as I can tell without having seen the product, it is really a set of extensions built in the Caché environment to provide messaging, workflow, and portal services, with some industry-specific features such as HL7/EDI, and endpoints for BPEL applications, database access, and other EAI connectors. Ensemble also offers data transformation (SSIS-style? not too difficult to understand, and resulting in federated databases, as already implemented in the Caché application server external database gateway) and object transformation (Java --> .NET ORBing? I am not sure how this is done; I assume through instantiating VMs for each of the supported platforms and performing marshaling between the objects).
I assume that messaging is implemented in the Caché application server - not entirely different from the original MSMQ.
As far as RAD capabilities go (so far I have mostly talked about infrastructure), Ensemble offers some graphical code generators for BPM; I am assuming it also supports the Caché development environment (ObjectScript, MVBasic and Basic).
In Microsoft terms, Ensemble is basically VS + SQLServer + SSIS + WCF + WWF + BPEL parser + BizTalk + customizations. In bold, the middleware stack.
On closer inspection, it appears that the inter-VM object conversion is in fact introspection- and proxy-based instantiation of .NET objects, which Ensemble makes available to Caché's native VMs. Ensemble runs a .NET VM which can execute .NET objects natively through worker threads. I am curious whether this requires a Windows Server to be available at runtime - I am not sure how distributed the Ensemble installation can be.
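As a rough illustration of what "introspection- and proxy-based" access to a foreign object can look like, here is a small Python sketch. It says nothing about how Ensemble actually does it: RemoteObjectProxy and PatientService are hypothetical names, and the marshaling across the VM boundary is reduced to a plain local call.

```python
class RemoteObjectProxy:
    """Forwards method calls to a wrapped object whose shape is discovered at runtime."""

    def __init__(self, target):
        self._target = target
        # Introspection step: record the public callables the target exposes.
        self._methods = {
            name for name in dir(target)
            if not name.startswith("_") and callable(getattr(target, name))
        }

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails on the proxy itself.
        if name.startswith("_") or name not in self._methods:
            raise AttributeError(name)
        method = getattr(self._target, name)

        def call(*args):
            # A real gateway would marshal the arguments across the VM boundary here.
            return method(*args)

        return call


class PatientService:
    """Hypothetical stand-in for a class living in the foreign (.NET) runtime."""

    def lookup(self, patient_id):
        return {"id": patient_id, "name": "Test Patient"}


proxy = RemoteObjectProxy(PatientService())
print(proxy.lookup(42))
```

The caller never sees the wrapped class, only the proxy, which is the property that would let a native VM use objects it cannot instantiate itself.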
Middleware (Intersystems Cache)
Intersystems offers a database platform (Caché), a RAD toolset (Ensemble), a BI system (DeepSee), and two products specifically targeted at healthcare: an information exchange platform (HealthShare) and a web-based IS (TrakCare). So if I were to put everything in a matrix comparing different vendors, it would almost have to be a 3D one - one dimension would have to cover the platform, and another (the depth) would have to cover the vertical (healthcare), since, for example, Microsoft offers both the platform and the vertical app.
Caché is a combination of several products, some of which originate in MUMPS, a healthcare-specific programming language developed in the 1960's. MUMPS used hierarchical databases and was an underpinning of some of the earliest HIS developments (Wikipedia is our friend); at some point it ran on the PDP-11, which incidentally was the first computer I ever saw.
It makes one wonder what would have happened had MUMPS become the database standard as opposed to what would become Oracle, as MUMPS predates R2 (and C, by the way). But the close connection between the language and the database, which might strike some today as strange, goes back to its origins.
Caché's performance stems from its sparse record architecture, and from its hierarchical (always-sorted) structure.
Caché has been modernized to provide an ODBC-compliant interface (and derivatives: ADO.NET) and an object-oriented 'view' of the data and functionality embedded in MUMPS (ObjectScript). The development environment also offers a BASIC-type programming language and a website construction toolkit, quite a lot for one standalone package.
It seems that Caché is a document-oriented database, which would make it similar to an XML database in some ways - the main 'entities' are arrays in one case and nodes in the other, as opposed to relational tables.
At the same time, for a hierarchical database, Intersystems somewhat confusingly portrays it as an "object" database, which is probably not technically incorrect, since one of the views of the data is "object"-based as I mentioned above.
Creating a class in Caché also creates a table accessible in SQL (via the database interfaces, or through the System Management Portal). The table has a few additional fields on top of the class' properties - an ID and a class name (used for reflection, I assume). The System Management Portal also provides a way to execute SQL statements against the database, although at first sight I cannot seem to create a new data source in Visual Studio - and have to access the data programmatically.
One way of using the database from Microsoft Visual Studio requires a stub creator app, CacheNetWizard, which failed every time I tried to use it. The other is to use the Caché ADO.NET provider:
    // Assumes "using InterSystems.Data.CacheClient;" and that cnCache is an open CacheConnection.
    command = new CacheCommand(sql, cnCache);
    CacheDataReader reader = command.ExecuteReader();
    while (reader.Read())
    {
        noRecsRead++;
        // Stop early once the optional row cap (noRecs > 0) has been reached.
        if (noRecs > 0 && noRecsRead >= noRecs)
            break;
    }
    reader.Close();
Running a large operation (a DELETE, in this case) from one client seems to spawn multiple CACHE.EXE processes.
There are several ways of exporting data from Caché - exporting classes, which only exports the class definition (in fact, the table definition) and exporting the table itself to text, which exports the contents.
The multidimensional array view of Caché reminds me somewhat of dictionary and array types in languages such as Ruby and Python, while the untyped data elements are also used in SQLite. Arrays can be embedded together to provide a sort of materialized view (in SQL terms) in effect.
Ultimately, the gateway to Caché's hierarchical engine is the Application Server, which takes care of the virtual machines for each of the supported languages, of the SQL interface, of the object views, and of the web/SOAP/XML accessibility, as well as providing communication mechanisms with other Caché instances and other databases (via ODBC and JDBC). The VM's access Caché tables as if they were variables.
When it comes to languages, Caché offers a BASIC variant and ObjectScript. The BASIC can be accessed from the (integrated) Studio (used for writing code) or from a (DOS-based) Terminal (used for launching code). It operates within the defined namespaces (class namespaces or table schemas). One difference from other variants of the language, due to the tight connection with the Caché engine, is the presence of disk-stored "global" variables, whose names are prefixed by ^; BASIC function names are actually global variables. Another difference is the presence of multidimensional arrays, similar to those in Python or Ruby, which do not have to be declared. In this case they are closely tied to the Caché database engine, of which they are a core native feature: a hierarchical database's tables are ordered B-Trees, and these B-Trees provide the actual implementation of arrays; the SQL "tables" and OO "classes" are just views into these B-Trees/arrays.
The array "index" is nothing else than a notation for the key of the node of the B-Tree. Non-numeric indexes are therefore possible.
Architecturally, I would be curious to know if these trees are always stored on disk, or they are cached in memory and some lazy-writer process at some point commits them to disk.
The image above - which I stole from the official docs, and modified - shows the structure of the tree; 4 is a value that the official example stored in all nodes, but any value in any node can be anything.
It can be seen that this "array" implementation does not actually need d1 * d2 * d3 * ... * dn storage for an n-dimensional array; only the nodes that have been set take up space.
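The storage argument can be sketched in a few lines of Python, modeling each node by its full subscript tuple. This is only an analogy for the B-Tree-backed arrays (a dict plus sorting, not Caché's engine), and the dense dimensions used for comparison are arbitrary numbers picked for the example.

```python
# Each node is keyed by its full subscript tuple; nothing is allocated for
# subscript combinations that were never set.
salary = {}
salary[("Julian", 36, 8)] = 125000.75
salary[("Scott",)] = 99000.0
salary[("Scott", "bonus")] = 5000.0   # non-numeric subscripts are fine

stored_nodes = len(salary)

# A dense 3-dimensional array sized for, say, 1000 x 100 x 10 subscripts
# would reserve a slot for every combination up front.
dense_slots = 1000 * 100 * 10

# Sorting the keys mimics the always-ordered traversal a B-Tree provides.
ordered = sorted(salary)
print(stored_nodes, dense_slots, ordered[0])
```

Three stored nodes versus a million reserved slots is the whole argument; the price, as discussed next, is that nothing tells a consumer which subscripts exist.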
This lack of structure allows for small size but it also can create problems at run time, especially if the consumer of the array and the producer are different; the consumer might not be aware of all the indexes/dimensions of the array. A function exists, traverse(), which can be called recursively to yield all existing subscripts.
If called with the same number of arguments, traverse() does a sibling search. An increase in the number of arguments will make it go down one level; an empty argument will yield the first index of the child (quite naturally, since you don't know what that might be at runtime). However, I am still not sure how you can fully discover an array with a potentially unlimited number of dimensions, so the application must enforce at least some structure on the arrays/tables.
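Here is how the sibling/child search described above could be modeled in Python. The function names (next_subscript, walk) are mine, not Caché's, and the tree is again just a dict keyed by subscript tuples, so take it as a sketch of the traversal idea rather than of traverse() itself.

```python
def next_subscript(nodes, prefix, current=""):
    """Return the next subscript after `current` among the children of `prefix`.

    An empty `current` yields the first child; None means there are no more siblings.
    """
    depth = len(prefix)
    children = sorted({key[depth] for key in nodes
                       if len(key) > depth and key[:depth] == prefix})
    for child in children:
        if current == "" or child > current:
            return child
    return None


def walk(nodes, prefix=()):
    """Recursively yield every subscript path that exists under `prefix`."""
    child = next_subscript(nodes, prefix)
    while child is not None:
        path = prefix + (child,)
        yield path
        yield from walk(nodes, path)                   # one more argument: go down a level
        child = next_subscript(nodes, prefix, child)   # same depth: sibling search


nodes = {("a",): 4, ("a", "x"): 4, ("a", "y"): 4, ("b", "z"): 4}
print(list(walk(nodes)))
```

In this toy model, recursion does reach every level without knowing the depth in advance, but the consumer still learns nothing about what each level means, which is why some application-level convention seems unavoidable.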
Now that the actual storage is better understood, it is interesting to see how these features show up in the table/class structure. What is the mechanism that allows for arbitrary indices to pop up at runtime?
A ^global variable is a persistent object of a class and a row in a SQL table; the latter are OO/relational "views" of the B-Tree/array. To answer a question from above, instantiating a new object creates it in memory; opening it (via its ID property) loads it in memory from disk. It is important to understand that an object is a row in a table. This is a sub-structure of the tree/array, e.g. ^SALARY("Julian", 36, 8) = 125000.75: ^SALARY is the entire structure, and ^SALARY("Scott") represents a different person's salary, and a different row in the table.
Does the tree's dynamic indexing mean that classes are effectively dynamic as well and can be changed at runtime? Not really. Neither does the SQL table structure change to reflect changes in the underlying array.
As it can be seen, the value of the global (^USER) is a pointer to the index of the first element, which also is the $ID column of that row.
Interestingly, adding a ^USERD(2, "Dummy") creates an empty record in the table, and adding a ^USERD(2) actually populates the record. So the second level in the ^USERD(2) does not actually show in the table at all. Is this child the next table in the hierarchy?
Mapping the other concepts, the class' package does become the database schema. Creating a table or a class does not instantiate the ^global (array), that only happens when data populates the array. The array's name becomes package.classNameD.
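The behavior observed above (a first-level node populates a row, a deeper node only creates an empty one) can be mimicked with a small Python projection. This is my reading of the experiment, not Caché's storage logic; project_rows and the global name ^User.PersonD are hypothetical, the latter simply following the package.classNameD pattern.

```python
def project_rows(global_nodes):
    """Build table rows from a global: one row per distinct first subscript."""
    rows = {}
    for key, value in global_nodes.items():
        if not key:
            continue                      # the root node holds an ID pointer, not a row
        row_id = key[0]
        rows.setdefault(row_id, None)     # deeper nodes create an (empty) row...
        if len(key) == 1:
            rows[row_id] = value          # ...but only the first level fills it
    return dict(sorted(rows.items()))


# Hypothetical contents of ^User.PersonD, mirroring the experiment above.
user_d = {
    (): 1,                                   # root value (an ID pointer, per the post)
    (1,): ["Julian", 36],
    (2, "Dummy"): "not shown in the table view",
}
print(project_rows(user_d))
```

Row 2 comes out empty because nothing was stored at its first level, which matches what the table showed; where the second-level data is supposed to surface remains the open question.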
ObjectScript is another language supported by Caché. It is available from the Terminal (one of the ways of interacting with Caché, besides the Studio and the System Management Portal), where you can directly issue ObjectScript commands - you use ObjectScript to launch Basic routines stored in the system. Commands can be abbreviated, which unfortunately makes for unreadable code, as the MUMPS example at Wikipedia shows (it compiled fine in Studio!).
ObjectScript is also an untyped language, allowing for interesting effects such as this:
> set x = 2 --> 2
> set x = 2 + "1a" --> 3, since "1a" is interpreted as 1
System routine names are preceded by %, and routine names are always preceded by ^ as they are globals. Routines can be called from specific (tagged) entry points by executing DO tag^routine. The language is case- AND space-sensitive.
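The "1a" effect shown earlier boils down to keeping the leading numeric part of a string and ignoring the rest. A simplified Python model of that rule follows; to_number is a made-up helper, and it deliberately leaves out cases such as exponent notation.

```python
import re

# Optional sign, then digits with an optional fraction (or a bare fraction).
_NUMERIC_PREFIX = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def to_number(value):
    """Interpret a value numerically by keeping only its leading numeric part."""
    if isinstance(value, (int, float)):
        return value
    match = _NUMERIC_PREFIX.match(value)
    if not match:
        return 0                       # no numeric prefix at all counts as zero
    number = float(match.group(0))
    return int(number) if number.is_integer() else number


print(2 + to_number("1a"))
```

Convenient at the prompt, but it also means a typo in a numeric field silently becomes a smaller number instead of an error.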
Creating a class also creates ObjectScript routines, which, as far as I can tell, deal with the database persistence operations of the class. The for command allows iteration over an argument list (similar to Ruby?). The language supports regular expressions (through the ? pattern operator), has fairly robust support for lists, and offers an interesting type named bit-string (similar to BCD?).
Routines are saved with the .mac extension.
Creating a ^global variable in ObjectScript in Terminal makes it visible in the System Management Portal under "Globals". However, this does not create a table available in SQL.
"Writing" a global only renders that particular node, e.g. ^z is not the same as ^z(1) (the zwrite command does that). However, killing ^z removes the whole tree.
It can be seen that, not unlike with XML (node values vs. attributes), data can be stored in nodes (^global(subscript) = value), or in the subscripts themselves.
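The difference between rendering a single node, rendering the whole tree, and killing a node can be modeled as follows. The three Python functions are stand-ins I named after the behavior described above, not wrappers around the real commands.

```python
def write_node(nodes, key):
    """Return only the value stored at exactly this node (None if unset)."""
    return nodes.get(key)


def write_tree(nodes, prefix):
    """Return the node and all of its descendants, in sorted order."""
    return sorted((k, v) for k, v in nodes.items() if k[:len(prefix)] == prefix)


def kill(nodes, prefix):
    """Remove the node and the whole subtree below it."""
    for key in [k for k in nodes if k[:len(prefix)] == prefix]:
        del nodes[key]


# One global, keyed by subscript tuples; () is the unsubscripted root node.
z = {(): "root", (1,): "one", (1, 2): "one-two"}
print(write_node(z, ()))         # just the root value, not its children
print(len(write_tree(z, ())))    # the whole tree
kill(z, (1,))
print(z)
```

Reading is per node unless you ask for the tree, while deleting is always per subtree, an asymmetry worth remembering before killing anything near the root.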
There are a couple of handy packages that let you run Oracle/SQLServer DDL to create Caché tables.
There is a lot more about the OO/Relational features of Caché that I have not covered; e.g., it is possible to create objects with hierarchies in ObjectScript, or have array properties of classes, that become flattened tables or related tables in SQL. More details here, with the caveat that Reference properties appear to be a referential integrity mechanism of sorts which could perhaps have been implemented more "relationally" through foreign keys (supported by Caché, but Caché SQL also supports a pointer dereferencing type of notation, e.g. SELECT User->Name; I am not sure how useful that is since most SQL is actually generated by reporting tools - and I don't think Crystal Reports can generate this Caché-specific SQL; I might be wrong, perhaps this is dealt with in the ODBC/ADO.NET layer).
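For comparison, this is what the same lookup looks like when written "relationally", using Python's built-in sqlite3 module. The table and column names are invented, and I am assuming the pointer notation behaves like an outer join on the reference column; SQLite itself has no such notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visit (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id),   -- plays the role of the reference property
        reason TEXT
    );
    INSERT INTO app_user VALUES (1, 'Julian'), (2, 'Scott');
    INSERT INTO visit VALUES (10, 1, 'checkup'), (11, NULL, 'walk-in');
""")

# Roughly what a pointer-style "SELECT reason, user->name FROM visit" asks for,
# spelled out as a standard outer join that any reporting tool can generate.
rows = conn.execute("""
    SELECT v.reason, u.name
    FROM visit AS v
    LEFT OUTER JOIN app_user AS u ON u.id = v.user_id
    ORDER BY v.id
""").fetchall()
print(rows)
```

The join version is more verbose, but it is portable SQL, which is exactly the concern with tool-generated queries.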
More on MUMPS' hierarchical legacy here. On OO, XML, hierarchical (and even relational!) databases, here.
This is just a brief overview of several aspects of the Caché platform. Next I will go over the rest of Intersystems' offerings.
Thursday, July 23, 2009
Open Source, Cloud-based Approach to Describing Solution Architectures
Since I don't have a Windows Server to run Sharepoint (I could, presumably, use Azure), I came up with a similar application setup using open source or cloud-based tools:
The only thing that needs to be built is the manager ("gateway", in the chart above) which can be a RIA application whose role is to tie everything together. Sounds simple enough?
Sunday, July 19, 2009
Mobile EHR
I still think it is far-fetched for a mobile carrier to roll out an entire HIS application though. So many verticals (practically all) make use of mobile communications one way or another; should mobile communications providers create solutions for everything?
And a 'global' mEHR, while a nice idea indeed, will I think always be hindered by competing standards and lack of acceptance - after all, even the mobile infrastructure worldwide is fragmented (CDMA, GSM, etc.). Why would the application layer be any different?
Worth keeping an eye on though.
Wednesday, July 15, 2009
Google Maps knows where you are
Slowly it is all coming together - the 'cloud' means that you can keep your data (and processes!) in one place, and you can access it (via WiFi) from anywhere, even using a lightweight client. Also both the client and the cloud backend 'know' where you are so functionality can be tailored to the time/location.
I'm not sure how much computing power is needed on the (portable) client - probably, only enough for rich media rendering. Other than specialized applications, most that an average user really needs should be easily done using a client that combines media/communication/lightweight computing services. I don't think iPhone is there yet (as the all-purpose 'client'), but perhaps a combination of iPhone and Kindle, three versions from now, might become just that.
Tuesday, July 14, 2009
XDb
Documentum/EMC offers an XML database named XDb. XProc is in fact designed to work with XDb.
Perhaps XML database is a misnomer. It really is a way of storing XML documents, without (apparently) enforcing any relational integrity constraints other than those defined by the DTD (and perhaps XLink, although so far I don't know if XLink is declarative only). Therefore XDb and XProc work hand in hand, one allowing for the storage of XML documents, the other allowing for manipulation of those documents (and perhaps, in-place updates).
The logical design is therefore done at a superior level. The 'database' concept appears to function when various stored documents are manipulated as sets - XDb supports XQuery (preferred), also XPath and XPointer.
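To show what "manipulating stored documents as sets" means in practice, here is a small Python sketch that runs one path expression over several documents. It uses the standard library's ElementTree, which supports only a subset of XPath and no XQuery, and the chart documents are invented; XDb's own query interface is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored documents; in XDb these would live in the database.
documents = [
    "<chart patient='p1'><allergy>penicillin</allergy><allergy>latex</allergy></chart>",
    "<chart patient='p2'><vaccine>tetanus</vaccine></chart>",
    "<chart patient='p3'><allergy>peanuts</allergy></chart>",
]

# Treat the stored documents as a set and run one path expression over all of them.
allergies = [
    (root.get("patient"), node.text)
    for root in map(ET.fromstring, documents)
    for node in root.findall("./allergy")
]
print(allergies)
```

There is no schema tying the three documents together; the "table" only exists for the duration of the query.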
Each XML document is stored as a DOM.Document and can be manipulated using the standard methods (createAttribute, createTextNode, etc).
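The standard DOM methods mentioned here look the same in most languages. A minimal Python sketch with xml.dom.minidom follows; the element names are made up, and nothing in it is XDb-specific.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, "encounter", None)   # root element <encounter>
root = doc.documentElement

# Build <allergy severity="high">penicillin</allergy> with the standard DOM calls.
allergy = doc.createElement("allergy")
severity = doc.createAttribute("severity")
severity.value = "high"
allergy.setAttributeNode(severity)
allergy.appendChild(doc.createTextNode("penicillin"))
root.appendChild(allergy)

xml_text = root.toxml()
print(xml_text)
```

This is the kind of template population I have in mind for the usage scenario below: the structure is fixed and only the text nodes and attributes change per encounter.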
I can see a possible usage in, for example, GoogleHealth - where XDb would store well-formatted templates for charts, diagnoses, allergies, vaccines, etc, which would be populated for each patient encounter and loaded into GH.
While in normal usage write contention should not be an issue, I am curious how XDb deals with document versioning and multiple writes against the same documents - or is the R/W pipe single-throttled? (Later: here it is. Clicking Refresh in the Db manager while an update process was underway yielded the following error:)
Interesting XML database reference information here.
Sunday, July 12, 2009
XProc
Wednesday, July 08, 2009
Over mashed-up
Tuesday, July 07, 2009
Platform Convertor Strategy Analysis
Sunday, July 05, 2009
Visualization, again
What can I say: Tufte meets SQL. And perhaps Processing should get in the game - surprised I haven't seen any rich visualization libraries for it - yet.