This project has retired. For details please refer to its Attic page.
Apache Chemistry - Query Integration

OpenCMIS Query Integration

The CMIS standard defines a powerful query language, modeled on a subset of SQL, that supports full-text and relational metadata queries. Many repositories need to integrate with this query interface, and OpenCMIS provides support to make that integration easier. This article explains the hooks available for this purpose; they offer different trade-offs between convenience and flexibility. OpenCMIS includes a query parser that uses ANTLR as its parsing engine. However, there is no hard dependency on ANTLR: if you prefer a different parsing tool, you can use it instead.

There are four levels at which you can integrate queries:

  1. Implement query in the discovery service
  2. Use the built-in ANTLR and ANTLR CMISQL grammar
  3. Use OpenCMIS CMISQL grammar and integrate into ANTLR query walker
  4. Use predefined query walker and integrate into interface PredicateWalker.

Implement query in the discovery service

The first way is to implement the query() method like any other service method on your own. This gives you the maximum flexibility including using a parser tool of your choice and extensions of the query grammar as you like. This is also the method with the highest implementation effort.

Use built-in ANTLR and ANTLR CMISQL grammar

OpenCMIS comes with a built-in integration of ANTLR and provides a grammar file for CMISQL. You can reuse this grammar file, modify or extend it, and integrate queries by using the ANTLR mechanisms for parsing and walking the abstract syntax tree. Please refer to the ANTLR documentation for further information. This is the right level to use if you need custom parse tree transformations or would like to extend the grammar with your own constructs. For demonstration purposes, OpenCMIS provides an extended grammar as an example.

Use OpenCMIS CMISQL grammar and integrate into ANTLR query walker

If the standard CMISQL grammar is sufficient for you, there is another level of integration. Many repositories share common tasks when processing queries: the columns of the SELECT part need to be evaluated and mapped to type and property definitions, the FROM part needs to be mapped to type definitions, and parts of the WHERE clause again refer to properties of types. In addition, all aliases defined in the statement need to be resolved and many validations have to be performed. OpenCMIS provides a class that performs these common tasks. You can make use of the resolved types, properties and aliases and walk the resulting abstract syntax tree (AST) to evaluate the query. You are free to walk the AST as many times as you need and in the order you prefer. The basic idea is that OpenCMIS processes the SELECT and FROM parts and you are responsible for the WHERE part. The InMemory server provides an example of this level of integration: for each object in the repository, the tree is traversed to check whether the object matches the current query. You can use the InMemory code as a starting point if you choose this integration level.
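
The per-object matching described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the InMemory server's code: the classes `MatchNode` and `InMemoryMatchSketch`, the operator names and the methods `matches()` and `query()` are hypothetical stand-ins for an ANTLR tree and a walker, and repository objects are modeled as plain maps from query name to value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an AST node; OpenCMIS hands you ANTLR Tree nodes instead.
class MatchNode {
    final String op;                 // "AND", "OR", "EQ", "LT", "COL" or "LIT"
    final Object value;              // query name for COL, literal value for LIT, else null
    final List<MatchNode> children;

    MatchNode(String op, Object value, MatchNode... children) {
        this.op = op;
        this.value = value;
        this.children = Arrays.asList(children);
    }
}

class InMemoryMatchSketch {

    // Evaluates the WHERE tree against one object, modeled as query name -> value.
    static boolean matches(MatchNode n, Map<String, Object> object) {
        switch (n.op) {
        case "AND":
            return matches(n.children.get(0), object) && matches(n.children.get(1), object);
        case "OR":
            return matches(n.children.get(0), object) || matches(n.children.get(1), object);
        case "EQ":
        case "LT": {
            // Assumption: the left child is a column and the right child is a literal.
            Object left = object.get((String) n.children.get(0).value);
            Object right = n.children.get(1).value;
            if (left == null) {
                return false; // property not set: treat the comparison as not matching
            }
            if (n.op.equals("EQ")) {
                return left.equals(right);
            }
            return ((Number) left).longValue() < ((Number) right).longValue();
        }
        default:
            throw new IllegalArgumentException("Unsupported operator: " + n.op);
        }
    }

    // Traverses the tree once per stored object and collects the matching objects.
    static List<Map<String, Object>> query(MatchNode where, List<Map<String, Object>> store) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> object : store) {
            if (matches(where, object)) {
                result.add(object);
            }
        }
        return result;
    }
}
```

The cost of this approach grows with the number of stored objects, since every object is tested against the whole tree. That is acceptable for an in-memory test repository but is usually not how a repository backed by a database would evaluate a query.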

Use predefined query walker

For some repositories a simple one-pass query traversal is sufficient, for example if your query needs to be translated into a SQL statement. Because ANTLR brings some complexity, OpenCMIS provides a predefined walker that performs a simple one-pass, depth-first traversal. If this is sufficient, the walker hides most of the complexity of ANTLR: all you have to do is implement a Java interface (PredicateWalker). You can refer to the InMemory server for example code (InMemoryWhereClauseWalker).
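
To illustrate what a one-pass, depth-first translation into SQL can look like, here is a minimal sketch. It does not use the OpenCMIS PredicateWalker interface: `SqlNode`, `SqlWalkerSketch`, the node type names and the quoting of column names are assumptions made only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an AST node; not part of OpenCMIS or ANTLR.
class SqlNode {
    final String type;               // "AND", "OR", "EQ", "LT", "COL", "STR" or "NUM"
    final String text;               // column name or literal text; null for operators
    final List<SqlNode> children;

    SqlNode(String type, String text, SqlNode... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

class SqlWalkerSketch {
    private final StringBuilder sql = new StringBuilder();

    // Visits every node exactly once, depth-first and left to right.
    void walk(SqlNode n) {
        switch (n.type) {
        case "AND": binary(n, " AND "); break;
        case "OR":  binary(n, " OR ");  break;
        case "EQ":  binary(n, " = ");   break;
        case "LT":  binary(n, " < ");   break;
        case "COL":
            // Assumption: columns carry the CMIS query name as a quoted SQL identifier.
            sql.append('"').append(n.text.replace("\"", "\"\"")).append('"');
            break;
        case "STR":
            // Double embedded single quotes so the literal stays well-formed.
            sql.append('\'').append(n.text.replace("'", "''")).append('\'');
            break;
        case "NUM":
            // parseLong rejects anything that is not an integer literal.
            sql.append(Long.parseLong(n.text));
            break;
        default:
            throw new IllegalArgumentException("Unsupported node type: " + n.type);
        }
    }

    // Emits "(left operator right)" for a node with two children.
    private void binary(SqlNode n, String operator) {
        sql.append('(');
        walk(n.children.get(0));
        sql.append(operator);
        walk(n.children.get(1));
        sql.append(')');
    }

    String getSql() {
        return sql.toString();
    }
}
```

Because each callback only appends to a buffer, a single traversal is enough. A production walker would more likely collect bind parameters for a prepared statement than embed literals in the SQL text.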

AbstractPredicateWalker implements the interface PredicateWalker and provides common functionality that is useful for traversing the tree. For example, it handles parsing literals such as "abc" or -123 into Java objects such as String and Integer.
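
The kind of literal conversion mentioned above might look roughly like the following sketch. It is not the code of AbstractPredicateWalker: the class `LiteralSketch` and its method `parse()` are hypothetical, and the sketch assumes single-quoted string literals with backslash escapes, integers within the int range, decimals and boolean literals.

```java
import java.math.BigDecimal;

class LiteralSketch {

    // Converts the text of a literal token into a Java object.
    static Object parse(String text) {
        if (text.length() >= 2 && text.startsWith("'") && text.endsWith("'")) {
            // String literal: drop the surrounding quotes and resolve backslash escapes.
            StringBuilder sb = new StringBuilder();
            int end = text.length() - 1;
            for (int i = 1; i < end; i++) {
                char c = text.charAt(i);
                if (c == '\\' && i + 1 < end) {
                    c = text.charAt(++i); // take the escaped character literally
                }
                sb.append(c);
            }
            return sb.toString();
        }
        if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false")) {
            return Boolean.valueOf(text);
        }
        if (text.matches("[+-]?\\d+")) {
            // Values outside the int range are not handled in this sketch.
            return Integer.valueOf(text);
        }
        // Anything else is treated as a decimal; invalid input throws NumberFormatException.
        return new BigDecimal(text);
    }
}
```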

If the interface of the predefined walker, PredicateWalker, does not fit your needs, you can define your own interface. The code generated by ANTLR makes no assumptions about how you design the walking of your tree. The only dependency is the interface PredicateWalkerBase, which consists of a single method. If you define your own walker, you have to implement or extend PredicateWalkerBase. For an example, see the class QueryConditionProcessor in the unit tests for the InMemory server.

Note: There is currently no predefined walker for JOIN statements. If you need to support JOINs you have to build your own walker for this part as outlined in the previous section.

Using QueryObject

The class QueryObject provides the basic functionality for resolving types and properties and performs common validation tasks. It processes the SELECT and FROM parts as well as all property references in the WHERE part. It maintains a list of Java objects and an interface that you can use to access the property and type definitions for your current position in the statement. For an example, refer to the method query() of the class StoreManagerImpl in the InMemory server. To do its work, QueryObject needs access to the types contained in your repository. For this purpose you pass an implementation of the interface TypeManager as an input parameter. Your code will typically look like this:

    public class MyWalker extends AbstractPredicateWalker {
        // alternatively implement the interface PredicateWalker
        // or the interface PredicateWalkerBase
        // . . .
    }

    TypeManager tm = new MyTypeManager(); // implements interface TypeManager
    MyWalker myWalker = new MyWalker();
    QueryObject queryObj = new QueryObject(tm);
    QueryUtil queryUtil = new QueryUtil();

    CmisQueryWalker queryProcessor = queryUtil.traverseStatementAndCatchExc(statement, queryObj, myWalker);

queryUtil will then process the statement and call the interface methods of your walker (note: this code is part of OpenCMIS, you don't have to implement it yourself):

    try {
        walker = getWalker(statement);
        walker.query(queryObj, pw);
        return walker;
    } catch (RecognitionException e) {
        String errorMsg = queryObj.getErrorMessage();
        throw new CmisInvalidArgumentException("Walking of statement failed with RecognitionException error: \n   " + errorMsg);
    } catch (CmisBaseException e) {
        throw e;
    } catch (Exception e) {
        throw new CmisInvalidArgumentException("Walking of statement failed with exception: \n   " + e);
    }

After this method returns you may for example ask your walker object myWalker for the generated SQL string.

Processing a node and referencing types and properties

While traversing the tree you will often need to access the property and type definitions that are referenced in the WHERE clause. The QueryObject provides the necessary information for resolving the references. For example the statement

`... WHERE x < 123`

will result in calling the method walkLessThan() in your walker callback implementation:

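    public Boolean walkLessThan(Tree ltNode, Tree leftNode, Tree rightNode) {

        // The right child is a literal; convert it to a Java object.
        Object rVal = walkLiteral(rightNode);

        // Resolve the left child through the token index of its node.
        CmisSelector sel = queryObj.getColumnReference(leftNode.getTokenStartIndex());

        if (null == sel) {
            throw new CmisInvalidArgumentException("Unknown property query name "
                    + leftNode.getChild(0));
        } else if (!(sel instanceof ColumnReference)) {
            // sel is a function such as SCORE(), not a property reference
            throw new CmisInvalidArgumentException("Property reference expected but found "
                    + leftNode.getText());
        }
        ColumnReference colRef = (ColumnReference) sel;

        TypeDefinition td = colRef.getTypeDefinition();
        PropertyDefinition<?> pd =
                td.getPropertyDefinitions().get(colRef.getPropertyId());

        // process the statement, for example append it to a WHERE
        // in your generated SQL statement.

        // Placeholder: return the evaluation result if your walker evaluates the predicate.
        return Boolean.FALSE;
    }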

The right child node is a literal, and you will get an Integer object with the value 123. The left node is a reference to a property, and getColumnReference() will give you either a function (currently the only supported function is SCORE()) or a reference to a property in a type of your type system. The query object maintains several maps to resolve references. The key of each map is always the token index in the incoming token stream (an integer value). You can get the token index for each node by calling getTokenStartIndex() on the node.
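
The idea of resolving references through the token index can be illustrated with a small stand-alone sketch. The classes `TokenIndexLookupSketch` and `ResolvedColumn` and the methods `register()` and `resolve()` are hypothetical and only mirror the principle; they are not the QueryObject API.

```java
import java.util.HashMap;
import java.util.Map;

class TokenIndexLookupSketch {

    // Hypothetical stand-in for a resolved property reference.
    static class ResolvedColumn {
        final String typeQueryName;
        final String propertyId;

        ResolvedColumn(String typeQueryName, String propertyId) {
            this.typeQueryName = typeQueryName;
            this.propertyId = propertyId;
        }
    }

    // Key: index of the token in the incoming token stream.
    private final Map<Integer, ResolvedColumn> columnsByTokenIndex = new HashMap<>();

    // Called while the statement is parsed: remember what the token at this index refers to.
    void register(int tokenStartIndex, ResolvedColumn column) {
        columnsByTokenIndex.put(tokenStartIndex, column);
    }

    // Called while walking the tree: returns null if nothing was registered for the index.
    ResolvedColumn resolve(int tokenStartIndex) {
        return columnsByTokenIndex.get(tokenStartIndex);
    }
}
```

One plausible benefit of keying by position rather than by name is that the same property name can appear several times in a statement and still resolve to different types, for example when two aliases are involved.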

Building the result list

After processing the query an ObjectList has to be returned containing the requested properties and function results. You can ask the query object for the requested information:

    Map<String, String> props = queryObj.getRequestedProperties();
    Map<String, String> funcs = queryObj.getRequestedFuncs();

The key of each map is the query name. The value is the alias if one was used in the statement, or the query name otherwise.
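
Given such a map from query name to alias, one result row could be assembled along the following lines. This is only a sketch: the class `ResultRowSketch` and its method `project()` are hypothetical, and a row is modeled as a plain map instead of the CMIS object data structures that a real ObjectList contains.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ResultRowSketch {

    // requested: query name -> alias (or the query name itself if no alias was given)
    // objectProps: all properties of one matching object, keyed by query name
    static Map<String, Object> project(Map<String, String> requested,
                                       Map<String, Object> objectProps) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : requested.entrySet()) {
            // Only requested properties are copied, and each is exposed under its alias.
            row.put(entry.getValue(), objectProps.get(entry.getKey()));
        }
        return row;
    }
}
```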

Limitations

Currently the query parser does not include the full-text search part of the grammar. Support for JOIN is limited. This will be enhanced in a future version.