An Interview With Dr. Daniela Florescu, Editor of the W3C XQuery Specification

Dr. Daniela Florescu is one of the editors of the standard XML Query Language, XQuery. She has been involved with the XML Query Working Group in W3C since its inception, and she authored two XQuery precursors, XML-QL and Quilt. Today, Dr. Florescu represents Oracle Corporation in the Working Group. Inside Oracle, she is helping the XQuery team develop their products in Oracle10g. Prior to Oracle, Daniela was founder and CTO of XQRL, a startup whose goal was to build a streaming XQuery engine. After BEA Systems acquired XQRL, Daniela became the architect of the XQuery engine used in BEA's Liquid Data and WebLogic products. For many years now she's been working on a side research project familiar to most in the XML community: trying to extend XQuery into a full programming language for Web services!

Stylus Studio® is the leading XML IDE for XML data integration, featuring advanced support for XQuery development, including XQuery editing, mapping, debugging, and performance profiling, as well as tools for working with XML Schema and XSLT. Ivan Pedruzzi, Stylus Studio®'s Senior Product Architect and editor of The Stylus Scoop newsletter, recently met with Dr. Florescu on behalf of the Stylus Studio® developer community. The two chatted about her top-secret XQuery mission and more — read all about it here!


Ivan Pedruzzi: Hi, Daniela. Thanks for taking the time to meet with the Stylus Scoop! I might as well cut to the chase, since our readers will want to know: can you tell us more about this XQuery research "side-project" you've been working on for the past 4 years?

Daniela Florescu: Well, since you asked ... but please be careful, it's easy to get me started about XQuery, the hard part is stopping me [laughs]. Here is my view: I believe strongly in the role that XQuery will play in the future of information processing. Otherwise I could not have worked on it for so many long (and hard…) years. But while I'm passionate about XQuery and its possibilities, I also believe that XQuery is not the end of the game; we have much more work to do before we can say that we really address the problems related to information processing. In particular, I believe that XQuery is just one component, albeit a vital one, in a bigger architecture that involves, among other technologies, a declarative language specially designed for Service Oriented Architectures, or Web services, if you prefer. XQuery is a fundamental component, there is no doubt about that, but it is still only a component, one piece in a larger framework.

IP: But what's wrong with current Web services technologies in your eyes? And what are you proposing?

DF: First, the idea behind service oriented architectures — enabling distributed access to data and services, etc. — is great. I agree completely with the underlying philosophy. But Web services (which you can consider as an implementation paradigm of the SOA architecture) have been around for nearly 5 years now and have yet to take off in the way many predicted they would. I think that one of the reasons why the Web services adoption curve has been so gradual is simply that they are difficult to develop. To build a Web service today, you'd need at least these three components:

  1. Communication with the external world: this implies XML and all the associated technologies (e.g. XML Schema validation, XML forms, and XSLT or XQuery for data transformation).
  2. An imperative programming language — something like Java or C# to code business logic and use and/or manipulate the data.
  3. Persistence and transactions — your data, whatever your program does, needs to be persisted in a transactional (safe) manner.

There are good reasons for each of the layers above, but dealing with the "three-legged animal" (XML, objects, and tuples), as some people call this architecture, is undoubtedly hard, and poses both performance and productivity problems. And yet, as Pat Helland from Microsoft noted at a recent CIDR, people continue to use this solution despite its inherent difficulties — and there are good reasons for that.

IP: And yet, there's always something better: a better solution, a better technology...

DF: Yes! And that's why it's going to be very interesting to see how things evolve over the next couple of years. And when I look into the future ...

IP: Madame Florescu, seer!

DF: [laughs] ... I see people programming within a single framework built around XML as a first class citizen. That's at least what I would personally like to see happening in the future, being both a programmer myself, but also a consumer of information.

I believe that we can design a programming language that manipulates information in a native XML format, and design it with Web services architectures in mind. This implies native support for event-based programming, and asynchronicity and parallelism, among others, so it won't be trivial, to say the least. Java and C# don't offer such primitives natively, and they are needed. Microsoft's effort around C-omega is another attempt to solve this problem, by adding such primitives to C#, together with more XML support.

IP: But you're undeterred, in spite of this?

DF: Well, the problem that we are trying to solve is complex enough; I don't think that there will be a unique solution to it. Personally, I would start building a solution from the other end: start with XML and build a solution around it, not the other way around. I believe that XML as an information model will survive much longer than any of the programming languages that we know today, and many people agree with this analysis. If this is so, I think that it is more natural to build a programming language centered around XML, instead of adding XML support around existing programming languages. So not only do I believe that the information exchanged between programs will be XML, but I also believe that the information will be internally manipulated as XML. In other words, there won't be a distinction between "data on the outside" and "data on the inside" of a Web Service.

IP: That makes a lot of sense. Where do you start? What are the technologies or building blocks that we have today?

DF: There's so much available already: we have XML itself, of course, as a standard syntax for representing information; XML Schema to describe structural constraints; a standard type system; and a standard (soon, I understand) abstract XML data model and a query and manipulation language (XQuery).

IP: All this, and still you've been at it for over four years!

DF: [laughs] It's a good start, but they're still not powerful enough to fulfill the goal I described earlier. XQuery is only a read-only language — there's not yet a standard update mechanism, but there are many good ideas for this and it will be one of the first tasks we tackle as soon as XQuery 1.0 is out the door. Also, we need to develop the programming language or control flow capabilities of XQuery in a similar manner to the way stored procedures extend the simple "select-from-where" in the relational world.

These features would enable developers to describe the end-to-end logic of a Web service (or as much of it as possible) using a single paradigm and a single framework. Such a programming language, specially designed for XML and Web Services (one that would include native support for event-based programming, parallelism, and asynchronicity, and be built as an extension of a declarative kernel like XQuery), would make Web services programming much simpler, and would offer much better opportunities for automatic optimization.

Together with my colleague Donald Kossmann and his students, we even built a prototype of such a system, and it is working great. If people are interested, more information can be found at: http://xl.informatik.uni-heidelberg.de.

Building such a system convinced me that this is a path definitely worth exploring. But I have become aware that not everyone agrees with this: at the CIDR conference I mentioned earlier, I received the award for the "idea the world is least ready for". The statue that came with the award is sitting now on my desk [laughs].

IP: How many legs does it have? [laughs] So, XQuery can do all this?

DF: Well, that's my vision at least. But I often find that developers misunderstand, or don't fully appreciate, XQuery's potential — they don't fully understand its capabilities and where it is useful. They just don't have a good idea of where XQuery belongs and what its key benefits are; you know, the big picture.

IP: Coming soon to a W3C recommendation near you? I think it's fair to characterize what you described previously as your long-term vision. What about the short-term utility of XQuery?

DF: Today I see XQuery as a declarative XML-to-XML mapping language, an expression language. And it's XQuery's declarative aspect that is key: in XQuery, the logic of the mapping, or processing, is completely dissociated from the where and how the processing happens. In some sense this makes it very similar to SQL: a query in SQL describes in a declarative fashion the mapping between a set of input tables, and the output table, the result. I think the familiarity of this SQL paradigm can only help speed the adoption of XQuery.

IP: Beyond that, what are XQuery's capabilities?

DF: In practical terms, an XQuery program takes as input several XML information sources, which it then selects, filters, transforms, joins, and aggregates, eventually authoring a new piece of XML information. This is powerful stuff. XQuery can be used as a basic tool in a variety of environments, and for different goals: as a transformation language, as a data integration language, as a generic query language for large document repositories, as a publish/subscribe language specification, etc.
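
Editor's Note: To make the select, filter, join, aggregate, and construct steps described above concrete, here is a minimal sketch. It is written in Python with the standard library rather than in XQuery, so it is the hand-coded equivalent of what a single declarative query would express. The two input documents, their element and attribute names, and the `build_report` function are all hypothetical and invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input sources; every name below is invented for illustration.
CUSTOMERS = """<customers>
  <customer id="c1"><name>Ada</name></customer>
  <customer id="c2"><name>Grace</name></customer>
</customers>"""

ORDERS = """<orders>
  <order customer="c1" total="120.00"/>
  <order customer="c1" total="30.50"/>
  <order customer="c2" total="15.00"/>
  <order customer="c2" total="4.00"/>
</orders>"""


def build_report(customers_xml: str, orders_xml: str, min_total: float) -> ET.Element:
    """Join customers to their orders, drop small orders, and total the rest."""
    customers = ET.fromstring(customers_xml)
    orders = ET.fromstring(orders_xml)

    report = ET.Element("report")
    for customer in customers.findall("customer"):       # select
        cid = customer.get("id")
        totals = [
            float(order.get("total"))
            for order in orders.findall("order")
            if order.get("customer") == cid               # join on customer id
            and float(order.get("total")) >= min_total    # filter
        ]
        if not totals:
            continue  # customers with no qualifying orders are left out
        # construct a new piece of XML for this customer
        entry = ET.SubElement(report, "customer", {"name": customer.findtext("name")})
        entry.set("orders", str(len(totals)))             # aggregate: count
        entry.set("sum", f"{sum(totals):.2f}")            # aggregate: sum
    return report


if __name__ == "__main__":
    result = build_report(CUSTOMERS, ORDERS, min_total=10.0)
    print(ET.tostring(result, encoding="unicode"))
```

Note that this sketch fixes the evaluation strategy (a nested loop over both documents) in the code itself. In a declarative language the same logic would be stated once, and the choice of strategy would be left to the query processor.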

IP: Can you give us examples of use cases?

DF: You know, I was hoping you would ask me that! [laughs]. Ok, briefly: As an XML transformation language, XQuery is a perfect processing language for the middle tier. Of course you could also use XSLT for this, as some do, but I think XQuery will work better. The reason is that XQuery supports typed data (which will be supported only in XSLT 2.0), but more importantly, XQuery plans to add support for update capability, whereas XSLT 2.0 plans to remain a pure transformation language. In my experience, people also need to update their data in the middle tier.

IP: Ok. Data integration?

DF: When data comes from multiple sources (relational databases, XML files, application servers, Web services), XML is the natural language for uniformly expressing all this data, and XQuery is the best way to aggregate it. This sort of XQuery usage is fundamental to products like BEA's Liquid Data and DataDirect XQuery.

And now that so much information is stored in XML, a generic XML Query language is required for accessing or processing large volumes of XML data, in the same way SQL is used to access relational data. That's why Oracle is making huge investments into Oracle10g, extending XML storage and query facilities. This is possibly one of the biggest future uses for XQuery in the long term, but I guess this is also the one that will develop more slowly, yet steadily, as it takes time to convert or wrap all the information into XML, and then build tools to manipulate it.

IP: And what about the benefits? Why should a software engineer be using XQuery?

DF: That's easy — in the short term for the productivity; in the longer term for both the productivity and the performance advantages. Also, I think it's only a matter of time before XQuery becomes a fundamental component in many other standards that need XML processing, like SQL/XML, content management standards, BPEL, and others.

IP: But there's a perception in the developer community of a "performance overhead" when it comes to using XML and XQuery?

DF: Yes, I think that's another big XQuery misconception. Let's summarize what's going on today: we have a large volume of information, and for various reasons, many people want to represent this information as XML. At the same time, many enterprise applications need to be able to process that information at a rate of millions of messages per second. Right now many people are of the mindset that unless you hand-code the XML processing, you'll never be able to get the performance or scalability that is demanded of these applications.

It might be true that today it is hard (not impossible) to find an off-the-shelf product that is capable of efficiently handling a high volume of XML data and executing complex XQueries. However, this situation is about to change: gradually over the next few months, and dramatically over time. We just have to trust the open source implementers and the vendors. Many XQuery-based products are being released in 2005 (Oracle10g, DataDirect XQuery); they boast significant performance gains, and I am confident that their performance will only continue to increase over time. Just remember how the performance of the SQL engines, or of the Java virtual machines, was the subject of so much criticism in their first years, and how much it improved over time. I expect the same thing will happen with XQuery.

IP: We haven't yet talked about one of the things that makes XQuery so appealing — its logical and physical data independence. This makes possible faster, and more scalable applications.

DF: You're right. The problem with hand-coded XML processing code has been that it was vulnerable to changes in the data, the schema, the hardware, and to any kind of change in the environment or even the intended application usage. You were effectively writing throw-away code. Logical/physical data and program independence is a golden principle in computing: the "what" has to be separate from the "how", and the compiler or the optimizer will take care of figuring out the best implementation strategy. So, in the case of XQuery, the separation of the query and the query processor over time will lead to greater application performance.

IP: What can an XQuery compiler do to increase the performance?

DF: Oh, lots! Automatic data streaming and memory minimization, data partitioning and code parallelization, data access pattern selection, code rewriting and simplification based on structural constraints and integrity constraints, and so on. Those techniques are very well understood in the SQL context; they just need to be adapted to the reality of XML and XQuery. Of course, I expect that many original optimization techniques will be developed for XQuery or XSLT, too. The ones I cite above are only the ones we know today from the state of the art in relational optimization and traditional compilation.
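
Editor's Note: As a rough illustration of what "data streaming and memory minimization" means in practice, the sketch below processes an XML document incrementally instead of loading it whole. It is hand-written Python using the standard library's `ElementTree.iterparse`, not the output of any XQuery compiler; the point made above is that a query processor could apply this kind of strategy automatically. The `<order total="...">` document shape and the `sum_totals_streaming` function are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def sum_totals_streaming(source) -> float:
    """Sum the 'total' attribute of every <order>, one element at a time.

    `source` is a filename or a binary file-like object. Elements are
    handled as soon as their closing tag is parsed.
    """
    total = 0.0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "order":
            total += float(elem.get("total"))
            # Drop the element's attributes and children once it has been
            # used. The emptied element itself stays attached to its parent.
            elem.clear()
    return total


if __name__ == "__main__":
    # Hypothetical sample data standing in for a much larger document.
    sample = io.BytesIO(
        b'<orders>'
        b'<order total="120.00"/><order total="30.50"/>'
        b'<order total="15.00"/><order total="4.00"/>'
        b'</orders>'
    )
    print(sum_totals_streaming(sample))
```

The streaming decision here is baked into the code. If the data or the question changed, the loop would have to be rewritten by hand, which is the fragility of hand-coded XML processing discussed earlier in the interview.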

IP: How does a database developer get started working with XQuery? What about all those complaints about XQuery being too complicated?

DF: Yes, XQuery is perceived as being a complex language. But the reality is that the problems it tries to solve are inherently difficult, and to manage them requires the creation of a broader information querying and processing language than any that exists today. We need to handle simultaneously the dual aspect of information: lexical and binary, and this is hard. Moreover, we need to handle a spectrum of information that runs from purely textual to completely structured (and everything in between). We need to handle data with and without schemas, and the language needs to work on any data, in so many different use cases.

However, we shouldn't exaggerate those complexities either. Any new technology requires some time to get acquainted with. After gaining some familiarity with the language, people usually start to enjoy it a lot, especially when they realize what they can do with it. I have taught XQuery to students, and they love it.

As to how to get started with XQuery, I recommend downloading an implementation and playing around with it. One suggestion would of course be Stylus Studio®.

IP: I'd like to note that your last comment was given freely. [laughs]

DF: I am nothing if not impartial. Honestly, though, people seem to grasp XQuery concepts much faster when they have a tool that lets them, say, specify the data input sources, edit and debug an XQuery expression, then execute the query and examine the results. Your XQuery editor does all that, using a GUI that is intuitive, but not a toy; it lets you do real work, and learn while you do it. Stylus Studio®'s XML, XPath and XML Schema editors are a big help, too, since XQuery leverages those technologies.

IP: Well, thanks for that! Readers interested in learning more about Stylus Studio® can visit our Web site. Given this discussion, these pages in particular might be of interest:

DF: Commercial interruption?

IP: We have to pay the bills ... So, Daniela, what are you setting your sights on when you wrap up the XQuery project?

DF: Well, mapping the human genome is on my short list. And then immediately after that, real estate on the Amalfi coast sounds just perfect.

IP: [laughs] Daniela, thank you so much for your time and insights. We at Stylus Studio® are very eager to see XQuery get into full flower.

DF: Spring isn't far off...


Editor's Note: If you liked this interview, subscribe to The Stylus Scoop (our bi-monthly XML developer newsletter), or email this article to a friend! Learn more about Stylus Studio®'s XQuery tools or join the XQuery discussion group today!
