Monday, August 28, 2006

GoogleMaps

Since I will soon have my 'summer' vacation, here is a link to my first GoogleMaps app: a list of places I have been to. It's a V1 thing, hampered by my inadequate JavaScript skills, which I promise to improve.

I'm kind of surprised, though, that the Google Geocoder does not seem to recognize (cities in) Japan and China. I even tried their online demo and it cannot find these two locations.

The API is here.

Of databases and connections

A few notes to self regarding SqlClient and OleDb connections:
  • a connection is the link between a program and its data

  • opening a connection is expensive

  • hence connection pooling: ADO.NET does not destroy the underlying connection even after you close it

  • the connection is kept in a pool

  • it is destroyed after a timeout interval (60 seconds => disconnect in SQL Trace)

  • a pooled connection is reusable only by a request whose connection details (data store, user name, password) match

  • to turn connection pooling off in OLE DB, append 'OLE DB SERVICES = -2' to the ConnectionString:

    • significant differences: 6 seconds for 100 000 connections to ....?

    • without this setting, the connection stays logged in to SQL Server from the time of the initial logon, even after it is closed and reopened in code

    • the connection disappears from Activity Monitor when the program exits

    • however, if a connection is closed while the program is still running, it disappears from Activity Monitor after a while (after the timeout)

    • if the connection is not closed, it stays open in Activity Monitor

    • each Connection.Open opens a separate connection, even for the same authentication and data store

  • setting the connection to null/Nothing clears it from the pool (? it does not seem to affect Activity Monitor)

    • if the connection is set to Nothing without closing it, it still shows in Activity Monitor

    • not closing the connection prevents it from timing out, even when set to Nothing

    • it is not clear what effect setting the connection to Nothing, or Dispose-ing it, has in OleDb

  • in ODBC, pooling is configured from the Control Panel (how do you turn it off there?)

  • using SqlClient instead of OleDb shows the application in Activity Monitor as .Net SqlClient Data Provider

  • SqlClient seems to keep the connection alive after it is closed for longer than OleDb does (does it ever time out?)

  • OleDb shows the application as the exe, not as the OleDb Data Provider
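To make the pooling behavior in these notes concrete, here is a toy sketch in plain Python (not ADO.NET; the class and names are made up) of what a pooling provider does under the covers: closed connections go back into a pool keyed by their connection details, a matching open reuses them, and idle ones expire after a timeout.

```python
import time

class Pool:
    """Toy connection pool: connections are keyed by their connection
    string, reused on open if the details match, and considered dead
    after sitting idle past the timeout."""

    def __init__(self, timeout=60):
        self.timeout = timeout
        self.idle = {}     # connection string -> (connection, time it was closed)
        self.created = 0   # how many physical (expensive) connections were made

    def open(self, conn_string):
        conn, closed_at = self.idle.pop(conn_string, (None, 0.0))
        if conn is not None and time.time() - closed_at < self.timeout:
            return conn                    # reuse: details match, not timed out
        self.created += 1                  # otherwise open a new physical connection
        return "connection #%d" % self.created

    def close(self, conn_string, conn):
        # 'closing' just parks the connection in the pool with a timestamp
        self.idle[conn_string] = (conn, time.time())
```

A reuse check: opening, closing, and reopening with the same connection string hands back the same object and creates only one physical connection, while a different connection string forces a second one.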

OK, that is a lot of notes to self. I'm investigating this stuff because, while it's fairly well known, when you have to debug performance problems every little detail counts and you have to be considerably more careful reading the fine print.

Which reminds me: each OS should provide some kind of relational/transactional storage service. Mac OS X already ships one - SQLite - and it is readily available on Unix/Linux.

Wednesday, August 23, 2006

Web Operating System?

Various web cognoscenti have been ballyhooing the 'OS' concept in a web context: Google OS. This has more to do with the coolness factor of any new software development arena than with actual functionality provided by the respective software.

The Google suite offers the following:

  • GoogleDesktop (supposedly at the core of the 'OS', and a resource hog to boot!)

  • Audacity (audio editing)

  • Orkut (social networking)

  • GoogleTalk

  • GoogleVideo

  • GoogleCalendar

  • Writely (word processing)

  • Gdrive (internet data storage)

Other than Gdrive, none of the above belong to an OS.

Leaving coolness aside, there are genuinely innovative Web-based applications whose complexity comes close to that of desktop applications. For example, computadora.de's shell is not that different from Windows 95's shell in terms of the basic functionality it offers. Flash plays the role of Win32 in this case.

On the middle layer, salesforce.com is a good example of application-domain functionality provided as a Web-based layer. It should be entirely possible to offer, say, a payroll processing service the same way.

And yes, I have a computadora.de account. You can even upload MP3s there and play them using the integrated MP3 player – which I did, with an Alejandro Fernández song, in keeping with the Mexican origin of the software.

SQL 2005 Endpoints

SQL Server can act as an application server by means of endpoints (listeners). These can be defined over TCP or HTTP, and support SOAP, T-SQL, Service Broker, and database mirroring payloads.

To create a SOAP endpoint, first create the stored procedures or functions that provide the functionality, then run a CREATE ENDPOINT ... AS HTTP ... FOR SOAP. The important parameters are SITE and the WEBMETHOD clauses.

A SOAP request returns an object array or a DataSet object. The default SOAP wrapper is created by Visual Studio.

This is quite nice. If you need a data-centric web service, just create one directly in SQL Server. To use it, define the Web reference in the VS IDE; this generates a proxy type named after your SITE parameter, with a member for the endpoint, through which you can access the data exposed by SQL Server (e.g. if your SITE parameter was set to 'mySite' and the endpoint was named 'myEndPoint', you get a mySite object with a myEndPoint member, which exposes the functions/stored procedures defined on the SQL Server).
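To make the wire format concrete, here is a rough Python sketch of the kind of SOAP envelope a generated proxy posts to such an endpoint. This is an illustration only: the method name 'GetCustomers', its parameter, and the namespace URN are made-up examples, not part of any SQL Server API.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(method, ns, params):
    """Build a minimal SOAP 1.1 envelope: Envelope > Body > method call,
    with one child element per parameter."""
    env = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(env, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (ns, method))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (ns, name)).text = str(value)
    return ET.tostring(env, encoding="unicode")

# hypothetical web method exposed by a CREATE ENDPOINT ... FOR SOAP
request = build_soap_request("GetCustomers", "urn:my-sql-endpoint",
                             {"region": "West"})
```

The endpoint maps the method element in the Body to the WEBMETHOD you declared, runs the underlying stored procedure, and serializes the results back in the SOAP response.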

Monday, August 21, 2006

getUserPhotos part II

Zuardi in the previous post means Fabricio Zuardi, the author of a Flickr API kit: a REST-based implementation of the Flickr API client in Actionscript. For some reason, he overlooked one of the methods in the API, getUserPhotos.

Friday, August 18, 2006

Finally, getUserPhotos

I finally completed the missing flickr.urls.getUserPhotos from Zuardi's Flickr library for Actionscript. It was easier than I thought. Here it is:

public function getUserPhotos(api_key:String, user_id:String):Void {
    var method_url:String = _rest_endpoint +
        "?method=flickr.urls.getUserPhotos&api_key=";
    // keep a reference to the enclosing object: inside onLoad,
    // 'this' refers to the XML response object, not to FlickrUrls
    var flickrUrlsObjPointer:FlickrUrls = this;

    if (!api_key) {
        throw new Error("api_key is required");
    } else {
        this._api_key = api_key;
    }

    method_url += this._api_key;

    if (user_id) {
        method_url += "&user_id=" + user_id;
    } else {
        method_url += "&user_id=" + this._user_id;
    }

    this._response.onLoad = function(success:Boolean) {
        var error:String = "";

        if (success) {
            if (this.firstChild.nodeName == "rsp") {
                /* got a valid REST response */
                if (this.firstChild.firstChild.nodeName == "user") {
                    /* got a usable return */
                    flickrUrlsObjPointer._user_photos_url =
                        this.firstChild.firstChild.attributes['url'];
                } else if (this.firstChild.firstChild.nodeName == "err") {
                    /* got an error */
                    error = "ERROR CODE: " +
                        this.firstChild.firstChild.attributes['code'] +
                        " msg: " +
                        this.firstChild.firstChild.attributes['msg'];
                }
            } else {
                error = this.firstChild.attributes['code'] +
                    " msg: " + this.firstChild.attributes['msg'];
            }
        } else {
            error = "Cannot load user photos: " + method_url;
        }

        flickrUrlsObjPointer.onGetUserPhotos(error, flickrUrlsObjPointer);
    }; // end onLoad

    this._response.load(method_url);
} // end function



Some of his original coding is a bit grating to a perfectionist such as me :) However, I find the self-reference (var flickrUrlsObjPointer:FlickrUrls) that enables him to reach the parent object in onLoad a nice touch. Actionscript 2 is still a mess as far as readability goes, though.

For the uninitiated, all we are trying to do is capture, via an XML object, the output from a REST call such as this: http://www.flickr.com/services/rest/?method=flickr.urls.getUserPhotos&api_key=404d98e10174604c8050f4f732e2162e&user_id=66489324%40N00

The expected response is something like this:

<rsp stat="ok">
  <user nsid="66489324@N00" url="http://www.flickr.com/photos/zzkj/" />
</rsp>
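As a cross-check, the same response handling can be sketched in Python (an illustration only; the actual client is the Actionscript above) using the standard library's XML parser:

```python
import xml.etree.ElementTree as ET

def parse_get_user_photos(xml_text):
    """Mimic the onLoad handler: return (url, error) from a Flickr
    REST response for flickr.urls.getUserPhotos."""
    root = ET.fromstring(xml_text)
    if root.tag != "rsp":
        return None, "not a REST response"
    child = root[0]
    if child.tag == "user":
        return child.attrib["url"], ""           # usable return
    if child.tag == "err":
        return None, "ERROR CODE: %s msg: %s" % (
            child.attrib["code"], child.attrib["msg"])
    return None, "unexpected response"

sample = ('<rsp stat="ok">'
          '<user nsid="66489324@N00" url="http://www.flickr.com/photos/zzkj/" />'
          '</rsp>')
url, err = parse_get_user_photos(sample)
```

Running it on the sample response extracts the user photos URL with an empty error string.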

Tuesday, August 15, 2006

I hate embedded Crystal

These last days I had the dubious pleasure of setting up an integrated Crystal Reports XI solution. Reporting is certainly one of the less glamorous but more crucial aspects of corporate computing. This was a simple RPT-to-PDF converter, yet the compiled distributable clocked in at a hefty 70 MB, mostly due to the dreaded merge modules (a 150+ MB separate download; at least they seem to have fixed the annoying KeyCode bug). Not to mention that in order to get it to work with Visual Studio 2005 I had to download Release 2 – two downloads of 700 and 300 MB, respectively.

Clearly, this is unacceptable. MS Reporting Services all of a sudden makes sense although that is no walk in the park either.

Most of CR's (now Business Objects') heft comes, I think, from the fact that it covers the entire processing workflow – connecting to data, parsing the data streams, rendering the results. It connects to a dizzying array of data sources and may well have its own internal query processor for all I can tell. In fact, there would be a nice architectural solution to this if one considers that reporting is really just the basic processing of a firehose data stream. If vendors could come up with and agree on an ODBC-type interface, the life of report-tool creators (and of their programmer users) would be so much easier. Everything that Crystal connects to today, for example, could be a data store supporting a 'reporting' interface. Querying the data store via this interface (if binary objects can describe their supported methods via IUnknown, surely data can describe its own structure!) would produce XML output that could ideally be streamed directly to a XAML visual layer.

It would be nice.

Wednesday, August 09, 2006

A web crawl algorithm

For a while now I have been very intrigued by web crawlers. After all it’s the stuff of hacker movies… So a few nights ago I came up with a nice little algorithm.

Starting with a given URL, we want to open the web page found at that URL, build the list of pages it references, and do the same for each page referenced therein. Of course, since this could potentially scan the entire www, I decided to limit the actual exploration to pages in the same domain (the other pages will be terminal nodes). And, I wanted to build an unordered list of references; so the first output would be a list of all the pages found this way, and the second a list of page pairs (unordered: if page A href’s page B and page B href’s page A, I wanted only one A,B pair to be shown).

It’s done like this: we start with two (empty) collections, one for the pages (indexed by the page’s link), the other for the page pairs (the first one is an object collection, the second, an object reference collection).

We add the root to the object collection. Then we call the root’s discover method, which:

  • builds a list of links in the document (this is the weak, or rather inelegant, point of the algorithm: I am using regular expressions to extract the links, and problems stem from the variety of ways in which an href can be coded in HTML – href=www, href='www' – and from links being relative or absolute);
  • for each link, if the respective address already exists in the object collection (remember, this collection is indexed by the page object's link), adds the pair (the current page – at this step, the root – and the page identified by the link) to the pairs collection, provided the link does not refer to the current node (to avoid self-referential links) and the pair does not already exist in the reverse order;
  • if the address does not exist, creates a new page object, adds it to the objects collection along with the new pair in the pairs collection, and calls this object's discover method (unless the page points to a different domain, or to a non-readable object such as a jpeg).
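The steps above can be sketched in Python (not the original .NET code; the fetch_links callback stands in for the regex-based link extraction, which is where the real work hides):

```python
def crawl(url, domain, fetch_links, pages=None, pairs=None):
    """Recursively discover pages and unordered link pairs, recursing
    only into pages on the given domain (others are terminal nodes)."""
    if pages is None:
        pages, pairs = {}, set()     # pages indexed by link; unordered pairs
        pages[url] = True            # the root goes in first
    for link in fetch_links(url):
        if link == url:
            continue                 # skip self-referential links
        if link in pages:
            if (link, url) not in pairs:   # avoid the reversed duplicate
                pairs.add((url, link))
        else:
            pages[link] = True
            pairs.add((url, link))
            if link.startswith(domain):    # only recurse within the domain
                crawl(link, domain, fetch_links, pages, pairs)
    return pages, pairs

# a fake three-page site standing in for real HTML fetching
site = {
    "http://ex.com/a": ["http://ex.com/b", "http://ex.com/c"],
    "http://ex.com/b": ["http://ex.com/a"],
    "http://ex.com/c": ["http://other.com/x"],
}
pages, pairs = crawl("http://ex.com/a", "http://ex.com",
                     lambda u: site.get(u, []))
```

Note how the back-link from b to a is suppressed because (a, b) is already in the pairs collection, and the off-domain page shows up as a terminal node but is never crawled.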

Nice object recursion. Again, most of the code deals with reading the HTML stream, parsing the hrefs, etc.; the algorithm itself takes a mere 30 lines or so. I implemented it in .NET and, after banging my head against the wall with a few limit cases, I got it to work very nicely; a Java implementation would be just a transcription. I'll look into porting it to Actionscript next.

Actually I could have used only one collection. Instead of inspecting the objects collection I could have inspected the pairs collection (after making it a full object collection). This would have been a more awkward and time consuming search, since each page object could be in that collection multiple times, whereas in the current object collection it is found only once.

I am not sure how else you would do a crawler (probably the Google search index algorithm employs a similar crawling methodology) without being able to DIR the web server. Which raises the question: would a page that is never referenced by any other page ever be found by Google?

Here's hoping this impresses Sandra Bullock (The Net) or Angelina Jolie (Hackers).

Saturday, August 05, 2006

REST

There is a whole philosophy behind REST. This is worth a read. The world of software is not free of philosophies... GOTO, Unix, open source, and now remote procedure calls.