[Yanel-dev] [Fwd: The rationale behind the Abstract Query Model [was: Xpath deprecated] apache]

Mon Jul 23 09:33:31 CEST 2007

Hi

This might be interesting for Yanel as well and in particular for Yarep.

Cheers

Michi

-------- Original Message --------
Subject: 	The rationale behind the Abstract Query Model [was: Xpath 
deprecated] apache
Date: 	Sun, 22 Jul 2007 16:49:09 +0200
From: 	David Nuescheler <david at day.com>
Reply-To: 	dev at jackrabbit.apache.org
To: 	dev at jackrabbit.apache.org

Hi All,

I would like to try to make the argument for the AQM and explain
why it is not about reinventing the wheel.

I personally hate long emails, so please let me apologize. If you are
interested in the topic though, I can guarantee that this will certainly
save you some time in future discussions.

In the JSR-283 Expert Group we have representation of just about
every large content repository / content management vendor that is
active in the Java space.

Most of these vendors have been in the content repository space for
decades and have a very good understanding of what their existing
customers' requirements are. They have very large install bases and
the infrastructure they provide should be considered a significant
long-term investment.

When it comes to the "Query Facility" in a content repository
the members of the Expert Group came to the conclusion that
there is a certain set of functionality that we can support
across many actual real-life repository implementations.
And this is what should be mandated by the specification.

It is important to understand that this "feature set" is well negotiated.
There is not a lot of wiggle room from a functionality standpoint
due to the existing implementations.

I believe that the members of the JSR-283 expert group are the
right people to judge what is implementable in a reasonable
timeframe in their respective content repositories.

To illustrate the simplified query functionality landscape as
I see it, I tried to explain it visually:
http://www.day.com/o.file/aqm1.png?get=08c5075f4f07b12ae1a9269044658cc1

As mentioned above, I think the question of how large the "black circle"
should be, to still allow for a reasonable adoption of a standard,
is something that should not be subject to this discussion here.

I guess we would all agree that in a perfect world we would all love
to expose the most feature rich query interface, which I agree would
probably (as of today) be a full XQuery implementation.

If we go back to the real world, we are still stuck with the problem
from a specification perspective, how to describe the "black circle"
in the most precise, clear and concise way.

The specification needs to allow a repository vendor to know exactly
what features they need to expose to comply with the specification and
also what a user can expect from a query perspective from a
content repository.

In JCR v1.0 (aka JSR-170) we decided to use a subtractive way of
specifying the query feature set.
I tried to visualize that in the following chart.
http://www.day.com/o.file/aqm2.png?get=e0532f1c6f2e6ca93ed9bf713eb3b6fe

So we specified XPath 2.0 as the basis and then tried to identify
everything that cannot be mandated to a content repository based
on real-world implementations.
Which turned out to be a lot.

On the other hand, we found ourselves in the situation that features
like full text search, full text syntax, ranking, ordering,
projections and so on were not standardized by XPath 2.0.
So we ended up specifying a fuzzy subset plus sizable additions.

Please keep in mind that defining the feature subset is not
trivial since many repositories have for example limitations
on path queries.

Based on this experience we needed to look for something that
was extensible for the future and would allow us to describe in
a much crisper way how the "query facility" works.
To get a clear picture of what the content repository needs to be
able to provide we defined the Abstract Query Model.

I am convinced that the Abstract Query Model provides a very
clear and concise description of the query facilities of a
content repository and therefore personally am a big supporter
of specifying it this way.

Since JCR always intended to leave it up to the repository
user which query language they preferred, we already
introduced in JCR v1.0 a mechanism that allowed extending
support for query languages on a per-repository basis.

It is evident that there is no single query language that
is the best suited for all types of queries for all use cases.

In JSR-283 we enhanced this extensibility. Instead of just
allowing the content repository to expose multiple query
languages, we now even allow the developer to use different
query language parsers that are content repository vendor
independent.

http://www.day.com/o.file/aqm3.png?get=97e829a34161d813fbd2fe3c94f9ec01
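
As a rough illustration of this layering (not taken from the JSR-283
draft; every class, record and method name below is hypothetical, and
the two statement syntaxes are deliberately tiny), two independent
parsers can target one abstract model, so that the repository only
ever has to evaluate the model:

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AqmSketch {
    // Hypothetical abstract query: one node type plus one equality constraint.
    public record AbstractQuery(String nodeType, String property, String value) {}

    // Hypothetical stand-in for a repository node.
    public record Node(String type, Map<String, String> props) {}

    private static final Pattern SQL_LIKE = Pattern.compile(
        "SELECT \\* FROM ([\\w:]+) WHERE ([\\w:]+) = '([^']*)'");
    private static final Pattern XPATH_LIKE = Pattern.compile(
        "//element\\(\\*, ([\\w:]+)\\)\\[@([\\w:]+) = '([^']*)'\\]");

    // Parser 1 (assumed syntax): SELECT * FROM <type> WHERE <prop> = '<value>'
    public static AbstractQuery parseSqlLike(String statement) {
        return toQuery(SQL_LIKE.matcher(statement), statement);
    }

    // Parser 2 (assumed syntax): //element(*, <type>)[@<prop> = '<value>']
    public static AbstractQuery parseXPathLike(String statement) {
        return toQuery(XPATH_LIKE.matcher(statement), statement);
    }

    private static AbstractQuery toQuery(Matcher m, String statement) {
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported statement: " + statement);
        }
        return new AbstractQuery(m.group(1), m.group(2), m.group(3));
    }

    // The "repository" side: it sees only the abstract model, never the syntax.
    public static List<Node> evaluate(AbstractQuery q, List<Node> nodes) {
        return nodes.stream()
            .filter(n -> n.type().equals(q.nodeType()))
            .filter(n -> q.value().equals(n.props().get(q.property())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("nt:file", Map.of("title", "AQM")),
            new Node("nt:file", Map.of("title", "other")),
            new Node("nt:folder", Map.of("title", "AQM")));
        AbstractQuery fromSql = parseSqlLike("SELECT * FROM nt:file WHERE title = 'AQM'");
        AbstractQuery fromXPath = parseXPathLike("//element(*, nt:file)[@title = 'AQM']");
        System.out.println("same abstract query: " + fromSql.equals(fromXPath));
        System.out.println("matches: " + evaluate(fromSql, nodes));
    }
}
```

In this sketch a new query language only needs a new parser that emits
the same abstract query; the evaluation code stays untouched.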

I am sure that there are a number of discussions regarding
query, and I would like to make sure that we do not confuse
some of the issues at hand.

Please consider the following three "thread topics" as
a new subject for your post, when commenting:

---
(1) JCR should offer more powerful query facilities:

"I would like the content repository vendors to agree
on broader functionality so it maps better to something
like XQuery."

[ this should be addressed or cc'd to jsr-283-comments at jcp.org
I would like to inform you though that we had this discussion
pretty much for the last 6 years in the expert group and the
current consensus is well negotiated. ]

---
(2) Jackrabbit should offer more powerful query facilities:

"I need better XQuery or XPath or JOIN or fulltext support in Jackrabbit"

[ this should be addressed to dev at jackrabbit.apache.org and I am
convinced that we will be eager to learn about your use cases and
find out how we can address them. ]

---
(3) The "black circle" can be specified in a simpler fashion

"I think I can come up with a shorter way that is easier to
understand to express the exact feature set agreed upon by the
vendors and specified by the AQM"

[ this should be addressed or cc'd to jsr-283-comments at jcp.org and
I can guarantee that we will be thrilled to read your proposal and
have you as a future member of the expert group ;) ]

---

Please feel free to comment and let me know your thoughts.

regards,
david

-- 
Michael Wechner
Wyona      -   Open Source Content Management - Yanel, Yulup
http://www.wyona.com
michael.wechner at wyona.com, michi at apache.org
+41 44 272 91 61