An Interview with Mike Olson on XQuery and Database Technologies

Mike Olson is the President and CEO of Sleepycat Software, the makers of Berkeley DB, the most widely used open source developer database software in the world, with more than 200 million deployments. Mr. Olson has held several positions since joining Sleepycat in 1998: he was one of the original authors of the Berkeley DB software, and he served as Vice President of Marketing before being appointed CEO in 2000. Last year, Sleepycat Software released Berkeley DB XML, a new edition of its developer database that stores and retrieves data as native XML.

Stylus Studio® is the leading XML IDE for XML data integration, featuring advanced support for XQuery development, including XQuery editing, mapping, debugging and performance profiling, as well as tools for working with XSLT, XPath and XML Schema. Ivan Pedruzzi, Stylus Studio®'s Senior Product Architect and editor of The Stylus Scoop newsletter, recently had the opportunity to meet up with Mike on behalf of the Stylus Studio® developer community. The two chatted about the impact of XML technologies on database technologies and other topics related to XML data integration.



Ivan Pedruzzi: Hi Mike. Thanks for taking the time to meet with The Stylus Scoop today. As the CEO of the company responsible for producing the most widely deployed database product in the world, can you give us your take on emerging XML technologies such as XQuery?

Mike Olson: Hi Ivan, I appreciate the opportunity to chat with the Stylus Studio® user community. To answer your question, I would say that the database community is tremendously energized about XQuery and XML standards. We feel that the XQuery standard has become something of a rallying point for XML and database developers and product vendors alike. In talking with our customers, we get the overall sense that XQuery is perceived to be flexible enough for them to work with on a number of different kinds of new applications.

Personally, I tend to view XQuery as the beginning of a possible shift beyond SQL since it is designed in such a way that people can use it to query any data — of course it works great for XML, but it has application for relational data, legacy data sources, or anything else. XQuery is particularly beneficial in the data aggregation area, for example, in creating views of distributed data. The challenge for us as a database product company is to design optimized solutions for all of this advanced query processing.
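To make the data-aggregation point concrete, here is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. It is not Berkeley DB XML code; the two sources (`ORDERS_XML`, `CUSTOMERS_XML`) and the function `customer_totals_view` are hypothetical. It joins two separate XML documents and builds a per-customer totals view, roughly what an XQuery FLWOR expression over both documents would return (an illustrative XQuery equivalent is shown in the comment).

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources, e.g. exported by two different systems.
ORDERS_XML = """<orders>
  <order id="1" customer="c1" total="120.00"/>
  <order id="2" customer="c2" total="75.50"/>
  <order id="3" customer="c1" total="30.00"/>
</orders>"""

CUSTOMERS_XML = """<customers>
  <customer id="c1"><name>Acme</name></customer>
  <customer id="c2"><name>Globex</name></customer>
</customers>"""

# Illustrative XQuery for roughly the same view:
#   <customerTotals>{
#     for $c in doc("customers.xml")/customers/customer
#     let $t := sum(doc("orders.xml")/orders/order[@customer = $c/@id]/@total)
#     return <customer name="{$c/name}" total="{$t}"/>
#   }</customerTotals>


def customer_totals_view(orders_xml, customers_xml):
    """Join two XML documents and build an aggregated per-customer view."""
    orders = ET.fromstring(orders_xml)
    customers = ET.fromstring(customers_xml)
    view = ET.Element("customerTotals")
    for cust in customers.findall("customer"):
        cid = cust.get("id")
        # ElementTree supports this simple [@attr='value'] predicate.
        matching = orders.findall(f"order[@customer='{cid}']")
        total = sum(float(o.get("total")) for o in matching)
        ET.SubElement(view, "customer", name=cust.findtext("name"),
                      total=f"{total:.2f}")
    return view


view = customer_totals_view(ORDERS_XML, CUSTOMERS_XML)
print(ET.tostring(view, encoding="unicode"))
```

The join and grouping logic lives in one place, independent of where each document came from, which is the property that makes XQuery attractive for views over distributed data.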

IP: And is that thinking behind the decision to create Berkeley DB XML?

MO: In part, sure. But in addition to reading the tea leaves [laughs], we talk to, and listen to, our customers. Sleepycat has a very large and loyal community of developers who know what they like and know what they need. So, ultimately, our decision was based on customer feedback, which helped us identify a natural evolution of the product line that, we hope, anticipates future demand.

IP: What kind of new applications do you see developers building with new XML-enabled database technologies and XML query facilities?

MO: Interest in XQuery is across the board. For example, from an industry perspective, we see activity in financial services, telecommunications, bioinformatics, software product companies (ISVs)... even bloggers [Web loggers — Ed.] are using Berkeley DB XML to store and syndicate their Web content. We also find that developers working with XQuery come from a broad technological background, encompassing both the Microsoft and Java developer communities. Having such a diverse developer base makes it difficult to generalize the use of XQuery technologies, but I would say that developers are increasingly turning to XQuery to provide much more 'elegant' solutions to problems that were previously tremendously messy if not completely impossible to work out, in areas of data integration, Web services and content management. Technically, I'm not surprised given that these are natural strengths for XQuery, but it's exciting for us as a database product company to witness the demand and enthusiasm around XQuery grow from up close.

IP: A common criticism of XML by database folks is that the perceived overhead needed to parse and process the XML data is too great. Can you respond to this?

MO: Well, I guess the first point I'd make is that XML and XQuery are data definition and data manipulation languages (DDL and DML), just like the DDL and DML components of SQL. It's not fundamentally harder to process XQuery than it is to process SQL. Now, if you have a relational database with an XQuery front end, then you need to process the XQuery, and then you need to process the SQL that you generate, and of course that's less efficient. If your data is in XML, you want to use a natural XML-based language to operate on it natively.

It's important to understand that using XML simply implies a certain amount of overhead to parse and process XML. Same deal as processing SQL. However, when you put things in perspective, the overhead of processing XML is often a small price to pay given the tremendous benefits like increased interoperability and flexibility that XML has to offer — as I mentioned a moment ago, you can use it to tackle, elegantly and easily, applications that would have you coding in knots using SQL. And, if I can blow our horn for a moment —

IP: Of course ...

MO: ...The overhead is greatly reduced by the extensive optimization techniques employed in Berkeley DB XML.

IP: What advice can you give to an engineer thinking about using Berkeley DB XML versus a more traditionally packaged relational database? What are the advantages?

MO: We're often asked similar questions by developers trying to figure out whether XQuery is right for their application. If you're struggling with the decision on whether to choose the XML route, think about the following. First, did your data start its life as XML, or are you working with structured or semi-structured content? In either case, consider using a native XML database. Of course, more often than not, people will find themselves working with non-XML data sources, and if this is the case they are probably also writing increasing amounts of complex code for constructing and deconstructing XML fragments. This is another telltale sign that you ought to be considering a native XML database. After all, you don't take your car apart every time you put it in the garage, so why do that with your data?
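The "constructing and deconstructing" cost is easy to underestimate. The following sketch (Python, standard library only; the table layout and the functions `shred` and `rebuild` are hypothetical) takes an XML order apart into two relational tables with `sqlite3` and then rebuilds the fragment. Every new element or attribute in the XML would require changes to both functions and to the table definitions, which is the maintenance burden a native XML store avoids by keeping the document whole.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER_XML = "<order id='42'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"


def create_tables(conn):
    # Hypothetical relational layout for one kind of XML document.
    conn.executescript("""
        CREATE TABLE orders(id INTEGER PRIMARY KEY);
        CREATE TABLE items(order_id INTEGER, sku TEXT, qty INTEGER);
    """)


def shred(conn, xml_text):
    """Deconstruct an <order> document into rows."""
    order = ET.fromstring(xml_text)
    oid = int(order.get("id"))
    conn.execute("INSERT INTO orders(id) VALUES (?)", (oid,))
    for item in order.findall("item"):
        conn.execute(
            "INSERT INTO items(order_id, sku, qty) VALUES (?, ?, ?)",
            (oid, item.get("sku"), int(item.get("qty"))),
        )
    return oid


def rebuild(conn, oid):
    """Reconstruct the <order> fragment from the rows, in insertion order."""
    order = ET.Element("order", id=str(oid))
    rows = conn.execute(
        "SELECT sku, qty FROM items WHERE order_id = ? ORDER BY rowid", (oid,)
    )
    for sku, qty in rows:
        ET.SubElement(order, "item", sku=sku, qty=str(qty))
    return order


conn = sqlite3.connect(":memory:")
create_tables(conn)
oid = shred(conn, ORDER_XML)
restored = rebuild(conn, oid)
print(ET.tostring(restored, encoding="unicode"))
```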

Alternatively, if your data started out as, say, relational data, but switching to a native XML database is somehow not an option, think about employing Berkeley DB XML as a kind of forward cache to alleviate performance bottlenecks.
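One way to read the "forward cache" suggestion is as a read-through cache: an XML view is built from the relational source on first request and served from the XML store afterwards. The sketch below assumes that pattern. A plain dict stands in for the XML container, SQLite stands in for the relational system, and `XmlForwardCache` is a hypothetical name, not a Berkeley DB XML API.

```python
import sqlite3
import xml.etree.ElementTree as ET


class XmlForwardCache:
    """Hypothetical read-through cache: serve XML documents, building each
    one from the relational source only on a cache miss."""

    def __init__(self, conn):
        self.conn = conn
        self.store = {}  # stand-in for an XML container: id -> XML text
        self.misses = 0

    def get_customer(self, cid):
        if cid not in self.store:
            self.misses += 1
            row = self.conn.execute(
                "SELECT id, name FROM customers WHERE id = ?", (cid,)
            ).fetchone()
            if row is None:
                return None  # nothing to cache
            doc = ET.Element("customer", id=row[0])
            ET.SubElement(doc, "name").text = row[1]
            self.store[cid] = ET.tostring(doc, encoding="unicode")
        return self.store[cid]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers(id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES ('c1', 'Acme')")

cache = XmlForwardCache(conn)
first = cache.get_customer("c1")   # miss: built from SQL
second = cache.get_customer("c1")  # hit: served from the XML store
```

Invalidation when the relational rows change is deliberately left out; a real deployment would need a policy for keeping the cached XML in step with its source.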

Overall, it often simply boils down to value — what do you demand of your database? If you're only interested in transaction processing in the pure relational sense, then XQuery is probably not for you. But if you require more sophisticated features for data aggregation or more advanced query facilities, then chances are XQuery will help.

IP: To what extent does Berkeley DB XML provide support for XML Schema?

MO: We can optionally perform an XML Schema validation prior to inserting XML into the database. XML Schema is wonderful because it addresses so many of the shortcomings of its predecessor, document type definitions [DTD — Ed.] — in particular, DTDs didn't support the concept of data types, and this was a real stumbling block for most database developers trying to use XML with relational database technologies. But XML Schema addresses this and adds so much more in terms of data modeling flexibility, and using Stylus Studio®'s XML Schema Editor (which, by the way, we happen to think is the best one out there) makes it very easy to visually create advanced XML data models for representing just about anything. Who knows, XML Schema could potentially become a new de facto standard in data modeling in the same way that entity relationship or unified modeling language diagrams are so widely used.
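The idea of optionally validating a document before it is inserted can be sketched in a few lines. Real XML Schema validation needs an XSD processor, which Python's standard library does not include, so this hypothetical example replaces the schema with a dict mapping child element names to type converters, and `Container` is a toy store, not the Berkeley DB XML class of that name. It illustrates the point about types: a DTD can require that a `<price>` element exist, but only a typed schema can reject `<price>cheap</price>`.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical, drastically simplified "schema": child element -> type converter.
# Real XML Schema (XSD) is far richer; this only shows why typed content matters.
PRODUCT_SCHEMA = {"name": str, "price": float, "released": date.fromisoformat}


class ValidationError(ValueError):
    pass


def validate(elem, schema):
    """Require each declared child and check that its text converts to the declared type."""
    for child_name, convert in schema.items():
        child = elem.find(child_name)
        if child is None or child.text is None:
            raise ValidationError(f"missing <{child_name}>")
        try:
            convert(child.text)
        except ValueError as exc:
            raise ValidationError(f"<{child_name}> has bad value {child.text!r}") from exc


class Container:
    """Toy document store with optional validation before insert."""

    def __init__(self, schema=None):
        self.schema = schema
        self.docs = []

    def insert(self, xml_text):
        elem = ET.fromstring(xml_text)
        if self.schema is not None:
            validate(elem, self.schema)  # raises before anything is stored
        self.docs.append(elem)


store = Container(schema=PRODUCT_SCHEMA)
store.insert("<product><name>Widget</name><price>9.99</price>"
             "<released>2004-06-01</released></product>")
try:
    store.insert("<product><name>Gadget</name><price>cheap</price>"
                 "<released>2004-06-01</released></product>")
except ValidationError as err:
    print("rejected:", err)
```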

As a database product company, we're always thinking about speed, and that's another reason why we're so excited about XML Schema. XML Schema provides a wealth of useful information that our product engineers can use to optimize overall database performance. For example, analyzing the shape of an XML document, the data types, etc. — there are a lot of clever things you can do to make XML query processing more efficient, which we definitely plan to roll out in the future. Overall we think XML Schema is definitely worth looking into.
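One concrete way schema information can help the engine: if the schema declares `price` as a number, an index can store numeric keys, so range lookups follow numeric order. The sketch below (plain Python, hypothetical data) contrasts a string-keyed index with a typed one using `bisect`. It illustrates the general technique only, not how Berkeley DB XML builds its indexes.

```python
import bisect

# Hypothetical <price> values as they appear in the XML text.
prices = ["9.99", "10.50", "120.00", "75.00"]

# Untyped index: keys are strings, so order is lexicographic:
# "10.50" < "120.00" < "75.00" < "9.99"
string_index = sorted(prices)

# Typed index: a schema declaring price as numeric lets keys be stored as numbers.
typed_index = sorted(float(p) for p in prices)


def range_query(index, low, high):
    """Return the keys k with low <= k <= high, by binary search on a sorted index."""
    start = bisect.bisect_left(index, low)
    end = bisect.bisect_right(index, high)
    return index[start:end]


print(range_query(string_index, "50", "130"))  # string comparison misses both matches
print(range_query(typed_index, 50.0, 130.0))   # numeric comparison finds 75.0 and 120.0
```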

IP: Can you explain to our users the relationship between Berkeley DB and Berkeley DB XML?

MO: Berkeley DB XML uses Berkeley DB for transactional processing and storage. Essentially we've layered XML query processing and indexing services on top of our core storage engine which disappears into the application; the end user never has to know it's there, and you don't have to hire a database administrator. One additional advantage of this layered architecture is that you can use different instances of Berkeley DB and Berkeley DB XML under the same transaction.
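The layering described here, with an XML layer sharing the core engine's transactions, can be illustrated with a small stand-in. In this hypothetical sketch SQLite plays the role of the transactional storage engine, a `kv` table stands in for a plain key/value database, and `XmlLayer` writes documents plus a simple value index through the same connection, so one transaction covers both. It is not the Berkeley DB or Berkeley DB XML API.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_engine():
    """Stand-in for the transactional core storage engine."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE kv(key TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE docs(name TEXT PRIMARY KEY, xml TEXT);
        CREATE TABLE idx(tag TEXT, value TEXT, doc TEXT);
    """)
    return conn


class XmlLayer:
    """Hypothetical XML layer: stores each document and indexes element text,
    inside whatever transaction the caller has open on the shared engine."""

    def __init__(self, conn):
        self.conn = conn

    def put_document(self, name, xml_text):
        root = ET.fromstring(xml_text)  # parse first so malformed XML fails early
        self.conn.execute("INSERT INTO docs VALUES (?, ?)", (name, xml_text))
        for elem in root.iter():
            if elem.text and elem.text.strip():
                self.conn.execute(
                    "INSERT INTO idx VALUES (?, ?, ?)",
                    (elem.tag, elem.text.strip(), name),
                )


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = open_engine()
xml_layer = XmlLayer(conn)

# Attempt 1: fails mid-transaction, so both layers roll back together.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO kv VALUES ('doc_count', '1')")
        xml_layer.put_document("a.xml", "<doc><title>Hello</title></doc>")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass
after_failure = [count(conn, t) for t in ("kv", "docs", "idx")]

# Attempt 2: succeeds, so both layers commit together.
with conn:
    conn.execute("INSERT INTO kv VALUES ('doc_count', '1')")
    xml_layer.put_document("a.xml", "<doc><title>Hello</title></doc>")
after_success = [count(conn, t) for t in ("kv", "docs", "idx")]
```

Because the failure is raised inside the `with` block, the connection rolls back the key/value write and the XML writes together; on the second attempt all three tables are committed at once.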

IP: What XML querying standards does Berkeley DB XML support?

MO: The current version of Berkeley DB XML can be queried via XPath 1.0. We provide C++ and Java APIs, as well as APIs for scripting languages such as Perl and Python, for programmatically invoking XML queries and handling the results. We're currently in the final stages of rolling out Berkeley DB XML 2.0, which features a new XQuery interface as well as updated support for XPath 2.0. We're making good progress and expect to release Berkeley DB XML 2.0 before the end of 2004.

IP: In time for the holidays!

MO: [laughs] Exactly! We're also working to expand XML querying scenarios by rolling out new APIs for PHP and Tcl, so as you can see there are a lot of new developments as far as querying technologies are concerned. But we're particularly impressed by the new integrated XQuery and XPath facilities in Stylus Studio® for Berkeley DB XML, which are a great help in visualizing the data, building and testing XML queries, deciding what queries to use, and how to optimize them. Being able to access, update, and create Berkeley DB XML containers from within Stylus Studio® is pretty cool, too.

IP: Do you have any plans to support XQJ? [XQJ is the XML Query API for Java, for programmatically invoking XQuery expressions against a data source and handling the results, much like how developers currently use JDBC or ADO — Ed.]

MO: We like XQJ for several reasons. When it is released, XQJ will provide a reasonably well-thought-out Java API. Moreover, XQJ will likely prove important in establishing the critical mass needed to increase the adoption of XQuery on the Java platform. So, while we don't have any published plans to implement XQJ, it's safe to say that we're watching it with great interest.

IP: What do you think about the state of XQuery tools?

MO: Well, for serious XQuery work there's only one choice, and that's clearly Stylus Studio®. Your robust XQuery editor, along with an integrated XQuery debugger, visual XQuery mapper and XQuery performance profiler, provides an enormous productivity boost for so many aspects of developing advanced XQuery applications. But XQuery applications are more complex than XQuery alone: developers must deal with XML Schemas, XPath, Web services, and so on, and Stylus Studio® provides tools for working with all of these technologies. My favorite aspect of Stylus Studio® is that developers can quickly and easily build and deploy sophisticated data-driven applications that would previously have required substantially more complex code or expensive third-party integration frameworks. This is great news for software developers and data integration architects!

IP: As an embeddable XML database, Berkeley DB XML has quite the reputation for being the fastest on the market. Can you elaborate on your performance optimization efforts?

MO: As we've already discussed, Berkeley DB XML avoids the overhead of a SQL layer and client-server inter-process communication, so those are key architectural advantages. In terms of query performance optimization, we've done several different things. First and foremost, we provide the ability to create different indexes, for example by element/attribute value, by the presence of an element/attribute, by the location of a node within a document, etc. Having multiple available indexes enables faster look-up by allowing individual collections to be indexed differently, resulting in faster processing of XPath or XQuery expressions. A cost-based query optimizer considers the indexes that exist, the data volume that a query is likely to produce, and the cost of computation and disk I/O to select a query plan with the lowest run-time cost. I could go on for quite a while on this subject as it is a central focus of our work, but in a nutshell Berkeley DB XML is the fastest product in its class.
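The two ideas in this answer, several kinds of index and a cost-based choice between plans, can be sketched together. This is a toy in plain Python with hypothetical documents and a made-up cost model (fixed per-document scan and fetch costs); it is not Berkeley DB XML's index format or optimizer. It builds a presence index (which documents contain a node at a given path) and a value index (path plus text value), then chooses an index probe or a full scan depending on which the cost model rates cheaper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents in one container.
DOCS = {
    "d1": "<book><title>XQuery Basics</title><year>2004</year><lang>en</lang></book>",
    "d2": "<book><title>Berkeley DB</title><year>2001</year><lang>en</lang><isbn>123</isbn></book>",
    "d3": "<book><title>Native XML</title><year>2004</year><lang>en</lang></book>",
}


def paths(elem, prefix=""):
    """Yield (path, element) pairs such as ('/book/title', <title>)."""
    path = f"{prefix}/{elem.tag}"
    yield path, elem
    for child in elem:
        yield from paths(child, path)


parsed = {name: ET.fromstring(text) for name, text in DOCS.items()}
presence = defaultdict(set)  # path -> docs containing a node at that path
value = defaultdict(set)     # (path, text) -> docs with that value at that path
for name, root in parsed.items():
    for path, elem in paths(root):
        presence[path].add(name)
        text = (elem.text or "").strip()
        if text:
            value[(path, text)].add(name)

# Made-up cost model: a scan reads every document; an index probe costs a
# fixed lookup plus one fetch per matching document.
SCAN_COST_PER_DOC = 1.0
PROBE_COST = 0.5
FETCH_COST_PER_DOC = 1.0


def choose_plan(path, val):
    """Return 'index' or 'scan', whichever the cost model rates cheaper."""
    matches = len(value.get((path, val), ()))
    index_cost = PROBE_COST + FETCH_COST_PER_DOC * matches
    scan_cost = SCAN_COST_PER_DOC * len(parsed)
    return "index" if index_cost < scan_cost else "scan"


def run(path, val):
    """Evaluate path = val with the chosen plan; return (plan, sorted doc names)."""
    plan = choose_plan(path, val)
    if plan == "index":
        hits = value.get((path, val), set())
    else:
        hits = {name for name, root in parsed.items()
                if any(p == path and (e.text or "").strip() == val
                       for p, e in paths(root))}
    return plan, sorted(hits)


print(sorted(presence["/book/isbn"]))     # presence index lookup
print(run("/book/title", "Berkeley DB"))  # 1 match: probe is cheaper
print(run("/book/lang", "en"))            # 3 matches: scan is cheaper
```

A production optimizer would estimate match counts from statistics rather than reading them exactly from the index, and would weigh disk I/O separately from computation, but the shape of the decision is similar.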

IP: Where can XML developers learn more about Berkeley DB XML and XQuery?

MO: Our Sr. Software Engineer, George Feinberg, is doing a talk on Native XML Databases at the upcoming XML Conference and Expo this November in Washington, DC. Then there's our Web site, www.sleepycat.com. We have numerous developer resources including a mailing list, a blog maintained by our developers, and various other technical materials. The full product with source code, documentation and sample code is available for free download and open-ended evaluation. And of course, downloading the free trial copy of Stylus Studio® 6 XML Professional Edition further reduces the learning curve!

IP: I enjoyed talking with you today. Another benefit of our companies' partnership, and one that doesn't get much press. [laughs] Thanks for taking the time to talk with us, Mike. I'm looking forward to future collaborations.

MO: Me, too, Ivan. This has been a pleasure. Thanks.


Editor's Note: XML Tech Talks are a regular feature of The Stylus Scoop. If you liked this interview, consider subscribing to our XML developer newsletter today! An archive of past interviews with other XML technology gurus is available.
