November 26, 2007

Can Open Source achieve meaningful market penetration?

Let me state for the record that I applaud the Open Source Movement (OSM). The existence of independently developed competing software solutions to business needs helps to motivate all the players, Microsoft included, to innovate continuously and to keep prices down. In turn, competing software brings its own innovations and product designs to the marketplace and offers alternatives that accommodate different work styles and preferences. The OSM has delivered some impressive products and potentially represents a real choice in the software industry.

Having said that, let me also state for the record that my experience in actually trying to use an Open Source product has been, if not an unmitigated disaster, then at least a severe and costly (in terms of my time) disappointment.

There are two categories of people who will try an open source product for the first time: those who are developer/programmers, and those who are not. They all share the same basic motivation, I think, which is to seek out a cost-effective alternative software solution to a business problem. In my case, the problem was to simplify the creation and maintenance of ETL routines for a data warehouse with multiple heterogeneous data sources. There are products designed for this purpose by Informatica, Oracle, Business Objects, and of course, Microsoft. You get Microsoft’s offering with the SQL Server database. The others charge an arm and a leg for their respective products. I wanted to look at a product that was not Microsoft’s and was less expensive than the others.

Enter Pentaho Data Integration (PDI). This package was an open source project called Kettle (Kettle Extract, Transform, Transport, and Load Environment) that is now packaged and marketed by Pentaho Open Source Business Intelligence. It is a neat looking product, with a GUI called Spoon (the complete metaphor also includes a Pan and a Kitchen) that adheres to a visual convention for modeling a transformation or a job similar to its competitors. It works with or without a central repository. It converts a transformation model or a job to XML files that can be used by the Kettle suite or other software. The objects with which you create a model seem at first glance very intuitively designed.

Unfortunately, when intuition fails, the documentation (such as it is) is not much help. There is a user guide that incompletely describes each icon and menu option, but doesn’t provide any practical tutorials. There are some example transformation files and job files that you can examine to get a clearer picture of how some features work, but they still leave a lot of uncharted territory. This gap is supposed to be made up by the “online community” – forums in which you can post questions and get answers, often from the original designer of the program.

It seems clear from the postings on the PDI forum that most of the people using this software are programmers who have the time and the budget to do their own debugging, and to read from and post to blogs and bulletin boards. The rest of us don’t really have that luxury. For us, the alternative is to hire Pentaho for a consulting gig, and that’s where the money comes in.

Pentaho is following the same business model with Kettle that Red Hat has followed with Linux. They coordinate the efforts of independent programmers who volunteer to create enhancements to the Kettle software; assemble the results, and what documentation there is, as a free downloadable package; and offer consulting assistance and training to entities that want to implement the product in their environments. The gaps in information and the bugs that surface are all income opportunities.

Here’s my complaint: If you brand a product that is hard to use; includes limited, incomplete, or inaccurate documentation; and has basic features that don’t work, you undermine the credibility of the product, the brand, and the whole Open Source approach, and you limit its ability to achieve serious market penetration. If the plan is to bring in those consulting dollars to plug the gaps, it doesn’t seem like a very good plan since, if the product lacks or loses credibility out of the box, the consulting potential goes to zero!

A truly fundamental function of any software is to save the work you have done with it. But I couldn’t save my work using PDI version 2.5.2. In response to my post on the forum, the creator of Kettle informed me that “a workaround” had been included in version 3. It may have been just an unfortunate choice of words, but “workaround” implies to me that the problem is pretty serious and has not been fully resolved.

On a more complex level, consider the basic creation of statistics. Suppose a table has an attribute with a finite set of values v1, v2, v3 … vN. An ETL procedure has to denormalize these attributes: a target table has a set of columns colv1, colv2, colv3…colvN, and for each occurrence of an attribute value in the original table with key KN, the count in the corresponding column of the target table is incremented by 1. In a procedure, this is handled with a CASE statement, such as:

“case source.attrcol when v1 then target.colv1 = target.colv1+1 when v2 then target.colv2 = target.colv2+1…when vN then target.colvN = target.colvN+1 end”

Granted there is a lot going on here, but it is very simple to program in this way. PDI’s denormalizer tool seems to be able to put the value vN into target.colvN, but not to simply count the number of times vN occurred with this key. There doesn’t seem to be any practical help for this in the documentation or the examples. The only option seems to be to pay Pentaho either for consulting or for a training class. Not likely to happen on spec.
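For readers who want to see the counting logic spelled out, here is a minimal sketch in Python. It is not a PDI recipe, and it is not how Kettle does anything; the row data, the `values` list, and the `count_by_key` helper are all made up for illustration. It simply shows the behavior I was after: one target row per key, with one counter column per attribute value.

```python
from collections import defaultdict

# Hypothetical source rows: (key, attribute value). Illustrative data only.
source_rows = [
    ("K1", "v1"),
    ("K1", "v2"),
    ("K1", "v1"),
    ("K2", "v3"),
    ("K2", "v3"),
]

# The finite set of attribute values; each one becomes a target column "col<value>".
values = ["v1", "v2", "v3"]


def count_by_key(rows, values):
    """Denormalize (key, value) rows into one row of counts per key."""
    # Each new key starts with every counter column set to zero.
    target = defaultdict(lambda: {f"col{v}": 0 for v in values})
    for key, value in rows:
        # Values outside the known set are skipped, like a CASE with no ELSE branch.
        if value in values:
            target[key][f"col{value}"] += 1
    return dict(target)


target_table = count_by_key(source_rows, values)
for key, counts in sorted(target_table.items()):
    print(key, counts)
```

In SQL, the same result is usually written as a GROUP BY on the key with one SUM(CASE WHEN ... THEN 1 ELSE 0 END) expression per target column.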

If all OSM projects are marketed this way, I can’t see them achieving much in the way of market penetration. A few committed OSM geeks in big corporate IT departments with deep pockets will probably launch “pilot” implementations that get institutionalized without enjoying wide adoption. The SMB market, where cost-effective software alternatives are needed most, will remain largely undeveloped because small and midsize businesses can’t afford to invest their own resources in “not ready for primetime” software, and would consider spending on consulting a high-risk low-return proposition. OSM vendors should reconsider at least this aspect of their business.

May 28, 2007

Software as a Service and the Personal Computer

Many years ago, I looked at the amount of disk storage on the computer I was using, and the amount of RAM, and the size of the word processing program it was running at the time, and I thought: this is so out of line! How much of that program did I actually use at any given time, and how much was taking up most of my available memory? And then I thought, this program is over five years old; there is a newer version available, but I don’t want to spend the money to upgrade for a few marginally useful enhancements.

And then I thought, one day no one will buy personal productivity software ever again. They will rent it by the session from their ISP for next to nothing. They will always get the latest version; they will never have to worry about the effect of upgrading their operating system. Hell, they won’t even have to worry whether they have the necessary capacity on their computer, because the computer will have no capacity at all! We will return to the dumb terminal of the 1970s, but with WYSIWYG displays instead of green screens.

I think it’s coming. Really. There is a trend among some specialty software products to provide programs online as a service. A prominent one is Salesforce.com, which has long provided contact management tools over the World Wide Web. Once upon a time, Microsoft gave away its contact management software with every copy of Windows.

Earlier this week, Intel showed off a concept laptop that was as thin as the Motorola Razr phone. They did it by replacing the hard drive with flash memory. Look, Ma: no moving parts! What if this were followed through to its logical conclusion: no memory, either! A basic terminal that can connect with the World Wide Web, either wirelessly or through your house electrical system. These things would be dirt cheap, disposable, and potentially ubiquitous.

One drawback: personal storage. ISPs would be more than happy to sell you all the gigabytes of storage you want, but I’m not sure I want all my documents to be kept on a public utility.

Another drawback: offline functionality. You may find yourself in a time or place where you can’t connect with the internet. Or maybe you just don’t want to. We all want to be alone, sometimes. A limited function, no-memory terminal is useless without that connectivity. You’re better off with an old typewriter if you have letters to write.

But the fact remains, such computers would substantially reduce the total cost of computing for the average consumer. An inexpensive, maintenance free gateway to the world would get a lot of people on line who cannot get there now, both in developing nations and in the developed world. Further, the stripped down terminal would be an ecological boon. Where, after all, do old computers go to die? The stuff they are made of today is pretty toxic. Imagine a biodegradable terminal.

Will software developers and hardware makers get on board with this? Does Bill Gates’ vision of the Internet Appliance include the demise of personal copies of Microsoft products? And what about the ISPs? Could they economically upgrade their capacity with enough application servers to obviate the need for any capacity on the client?

The development of personal computing today is about at the level of the automobile in the 1920s. The complexity of the machines had grown, but manufacturing technology had reached the point where the amount of mechanical knowledge an individual required to keep his car running was greatly reduced. The car was more reliable overall, and there were specialists to fix it when it wasn’t. And the highway infrastructure was growing, though it didn’t get its really big push until the 1950s.

Computing is evolving as a convergence of many information technologies: radio, TV, the Internet. There are some functions you want to keep separate from your TV. But do you really want or need to spend up to a thousand dollars every couple of years to do word processing or Internet searches?

What do you think?

January 6, 2007

The Better Choice

We all make choices, at one time or another, that we come to regret, or at least to feel we should have taken an available alternative. We make choices based on limited information about the benefits and risks. As time passes and we begin to assess the actual benefits that accrue, we may feel our expectations have not been met.

The reason for Business Intelligence systems is to provide as much objective information as possible to facilitate a meaningful risk/benefit analysis. It is something of an irony, therefore, that there is no reliable way to adequately assess the risks and benefits of investing in one Business Intelligence tool versus another. This is not to suggest that any one BI product is generally superior to any other, but rather that the attributes of some products will be more appropriate in a given context than attributes of other products.

As with any tool, one size or configuration of Business Intelligence product does not fit all circumstances. But unless a decision maker has direct experience with the available options, he will select one over another based on his comfort level with what he is being told and by whom.

There may be sources one could turn to for information to fill the gaps, but they are awfully hard to find. I’ve tried. For example, this article from IT ToolBox (click here) gives a brief rundown of some key features for about half a dozen products. CAMagazine (click here) published this feature-comparison grid in PDF format comparing eleven products sold in Canada, including Cognos, Hyperion and Informatica but excluding Business Objects and Oracle. Both of these examples date from 2004, and competition being what it is, much may have changed in the intervening three years. Neither is sufficiently comprehensive to be of much use.

I know I would find it tremendously helpful if there were a website I could go to that provides up-to-date comparisons of product features for all the current BI software offerings. It would also be helpful to find comments from the user community, including both IT and non-IT decision makers who have used two or more products, that contain insights into why one product turned out to be a better choice for them than any other. I’d like to know what made them choose the original product, what made them decide to change, why they chose the second product, and so on.

Since I haven’t found any place I can go for that, I’m opening up the floor in this blog. If any of my readers know of a good source for this information, please post the URL. If you have any experiences to relate, please post them.

I’m not looking for product endorsements; “best/worst product ever” just isn’t going to fly. A tool that didn’t work out well for one decision maker may be perfect for another, for both objective and subjective reasons. And frankly, all the tools on the market today are thoughtfully designed, powerful, and effective, and I have no desire to disparage any of them, implicitly or explicitly. None, however, is perfect for every user, and I’d like to see if there are identifiable guidelines for making the right choice.

This is not a contest. There is no prize for the most compelling story. The only reward I can promise is that the experiences you describe here may be helpful to someone else. In order to do that, postings should contain the following minimal information, along with any commentary you feel is beneficial.

  • A brief description of the project for which the tools were selected
  • A brief description of the project environment, including operating system, database, the industry, and the target audience.
  • The tool that was selected first, and the criteria on which the selection was based
  • The tool(s) selected later, and the reasons for those selections
  • A summary assessment of the results to date.

In the summary assessment, it would be helpful to know if any of the original selection criteria were later determined to be invalid or irrelevant. This would include, for example, mid-project changes in focus or direction, or other environmental changes, as well as those determinations resulting from experience with the product(s).