A pythonic query language

This package provides tools to implement query languages in Python. The query language is based on Python’s generator expressions. A query in this language looks like this:

>>> from xotl.ql.core import get_query_object, this

>>> query = get_query_object(
...    child
...    for parent in this
...    if parent.children and parent.age > 32
...    for child in parent.children
...    if child.age < 6
... )

The result of the get_query_object callable is a query object that “describes” at the syntactical level the query expression above.

What’s new in 0.3.1.dev20170210?

Core Contents:

Overview

The main goal of xotl.ql is to provide tools that allow writing queries in a pythonic way. In this regard, xotl.ql is similar in outlook to LINQ queries in C#.

A query expression takes the form of a Python generator expression:

>>> from xotl.ql import this
>>> parents = (parent for parent in this if len(parent.children) > 2)

The previous query is readable as it stands: get all the parents that have more than 2 children.

More complex queries are allowed, for instance:

>>> parents = (parent
...            for parent in this
...            if parent.children
...            if all(child.age > 10 for child in parent.children))

This would retrieve every “parent” whose children are all more than 10 years old (assuming age is measured in years).

Notice, however, that those interpretations of the queries match only our intuitions about them; xotl.ql does not enforce any particular meaning on a query. xotl.ql is only concerned with writing queries that have this particular syntactical form.

The role of the query language and query translators

In the previous introduction we showed how the syntax of the query language looks and indicated the intended meaning of the constructions. However, xotl.ql does not enforce any particular interpretation of the queries, since the whole meaning of a query depends on the semantics of the object model in place. For instance, consider a data model that honors transitive relations such as “is (physically) located in” between places: if “B is located in A” and “C is located in B”, then querying for every place located in A should return both B and C.

One might encode such a query in a program like the following:

>>> locations = (place for place in this if place in A)

It’s expected that such a query will look up the whole containment tree derived from the located-in relation, fetching all places that are inside A either directly or indirectly.

In this model, the use of in A would imply a recursive computation; and such knowledge comes only from the object/store model, not from the query language itself. Other models (for instance, the relational model) might find only directly related objects.
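
To make the contrast concrete, below is a minimal sketch of an object model in which the located-in relation is transitive; the Place class, its located_in attribute, and the __contains__ method are hypothetical names used only for illustration, not part of xotl.ql:

class Place(object):
    # Hypothetical object model with a transitive "located in" relation.
    def __init__(self, name, located_in=None):
        self.name = name
        self.located_in = located_in

    def __contains__(self, other):
        # ``other in self`` walks the located-in chain, so indirectly
        # contained places are found as well as directly contained ones.
        parent = other.located_in
        while parent is not None:
            if parent is self:
                return True
            parent = parent.located_in
        return False

A = Place('A')
B = Place('B', located_in=A)
C = Place('C', located_in=B)

assert B in A and C in A   # both direct and indirect containment hold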

That’s why, in order to execute queries, one must use a query translator with enough knowledge of the object model and of the system configuration (especially how to communicate with the storage system).

xotl.ql won’t provide production-quality translators. Instead, other packages will be released that implement translators and assist their configuration into other frameworks. Nevertheless, the module xotl.ql.translation.py does contain an implementation of a translator that fetches objects from the Python process’s memory. We also provide utilities for translation in xotl.ql.translation.

Retrieving objects

Assuming you have a translator, like py, you may simply pass a query to it to obtain a query execution plan:

>>> from xotl.ql.translation import py
>>> query = py(parent for parent in this)

Query execution plans are iterable:

>>> for which in query:          
...    print(which)

A plan is required to be reusable, so that you may run the same query more than once while avoiding the translation phase. This does not mean that you will always get the same results from the reused execution plan, since the underlying data source might have changed.
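
For example (a sketch; what is actually retrieved depends on the objects alive in memory at each execution):

>>> from xotl.ql.translation import py
>>> from xotl.ql.core import this

>>> plan = py(parent for parent in this)
>>> first = list(plan)    # translation happened only once, when building the plan
>>> second = list(plan)   # reuses the plan; results may differ if the data changed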

See the document about translators for more information.

Query expressions v. query objects

So far we have seen how queries are expressed in our code. A query as the Python expression we see in our code (or as the generator object it implies) is more precisely referred to as a query expression.

On the other hand, translators need a data structure that describes the query. Since we can’t actually provide translators with the query expression (what we see is a Python generator object), we need another object that precisely captures the query. This is the query object. In many cases the distinction between these terms is not important, but in internal documents it matters a great deal. Translators will mostly deal with query objects. Getting a query object from a query expression is what xotl.ql is supposed to do well.

The function xotl.ql.core.get_query_object() takes a query expression (i.e. a generator object) and returns a query object.

Translating query objects

Translation is the process that transforms a query object into a form feasible for execution in a data store and/or object model. The result of translating a query is a query execution plan.

Query translators are the components responsible for translation. xotl.ql does not provide production-quality translators, but other packages are planned to provide implementations of translators. Nevertheless, the module xotl.ql.translation.py provides an implementation of a naive translator that matches the Python object model and fetches objects from the current process memory.

General requirements about translators

Take query expressions and query objects

Translators are required to take either a query expression or a query object. For this, translators could use xotl.ql.core.normalize_query().
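
For instance, a translator’s entry point might normalize its argument as in the following sketch; MyTranslator and MyPlan are hypothetical names, not part of xotl.ql:

from xotl.ql.core import normalize_query

class MyPlan(object):
    # A trivial, hypothetical execution plan: it keeps the query object
    # around and yields nothing when executed.
    def __init__(self, query, **kwargs):
        self.query = query

    def __call__(self, **options):
        return iter([])

    def __iter__(self):
        return self()

class MyTranslator(object):
    # A hypothetical translator: accept query expressions or query objects.
    def __call__(self, query, **kwargs):
        query = normalize_query(query)
        return MyPlan(query, **kwargs)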

Re-usability of execution plans

Translators must allow the reuse of execution plans; i.e. once a query is translated you may execute the plan several times to fetch the objects that match the query at the time the plan is executed.

This way you may use the translator only once per query and then reuse the plan for several calls to retrieve objects.

This does not mean that each time you execute the plan it will return the same objects, since the underlying data storage system might have changed between calls.

This does not imply that plans should be cached either. If the translation process is executed many times for the same query, translators may return different (although equivalent) plans each time.

Documentation requirements

Translator authors are encouraged to provide as much documentation as necessary, so that application writers have the guidance they need for writing queries.

The following information is required for a translator’s documentation to be complete:

  • A list of the supported expression operations. For instance, a translator might reject operations involving max() or min(), or allow them only in some cases.
  • A list of additionally supported operations, and their related documentation. Packages that implement translators may provide non-standard functions (e.g. a coalesce function available for SQL translators).
  • Documentation of the functions and classes that application writers may use to access the translator functionality directly if they have to.

See also

The documentation for the module xotl.ql.translation.py as an example.

Writing translators

Writing a translator is much like writing a compiler. You get a query syntax tree and you need to produce a program that operates according to a well-established semantics. In this regard, writing a translator involves giving denotational and/or operational semantics to the queries.

In this section we provide some pointers for this task, without aiming to be comprehensive.
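
As a starting point, most translators will need to traverse the query syntax tree. The following sketch assumes that QST nodes expose ast-like _fields attributes (as the printed tree in the core API section below suggests); the walk helper is hypothetical and not part of xotl.ql:

from xotl.ql.core import get_query_object, this

def walk(node):
    # Yield every node of the tree, assuming ast-like ``_fields`` attributes.
    yield node
    for field in getattr(node, '_fields', ()):
        value = getattr(node, field, None)
        children = value if isinstance(value, (list, tuple)) else [value]
        for child in children:
            if hasattr(child, '_fields'):
                for sub in walk(child):
                    yield sub

query = get_query_object(parent for parent in this if parent.age > 32)
print([type(node).__name__ for node in walk(query.qst)])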

xotl.ql.core – The core API

The module xotl.ql.core provides the high-level API for obtaining a query object from a query expression.

xotl.ql.core.this

This is an object whose meaning is the entire universe of objects when used as a generator inside a query expression. Its precise semantics varies with the object model. The documentation of query translators must give the precise meaning of this object.

xotl.ql.core.get_query_object(generator, query_type='xotl.ql.core.QueryObject', frame_type=None, **kwargs)

Get the query object from a query expression.

This function expects a query expression in the form of a generator object and returns an object that complies with the interface xotl.ql.interfaces.QueryObject.

This function works by inspecting the byte-code of the generator object to obtain the Query Syntax Tree, and it uses the generator’s gi_frame attribute to build the frame object needed by query objects.

Nested sub-queries are not expanded automatically:

>>> from xotl.ql.core import this, get_query_object
>>> query = get_query_object(y for y in (x for x in this))

>>> print(query.qst)
<ast: Expression>
   body: <ast: GeneratorExp>
      elt: <ast: Name>
         id: 'y'
         ctx: <ast: Load>
      generators[0]: <ast: comprehension>
         target: <ast: Name>
            id: 'y'
            ctx: <ast: Store>
         iter: <ast: Name>
            id: '.0'
            ctx: <ast: Load>

The sub-query (x for x in this) is simply encoded as a variable ‘.0’.

If no frame_type is provided, the attribute frame_type of the query object type is used.

Additional keyword arguments are passed unchanged when instantiating the query object.

xotl.ql.core.normalize_query(which, **kwargs)

Ensure a query object.

If which is a query expression (more precisely a generator object) it is passed to get_query_object() along with all keyword arguments.

If which is not a query expression it must be a query object; other types raise a TypeError.
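
For example (a sketch of the expected behavior):

>>> from xotl.ql.core import normalize_query, this

>>> query = normalize_query(parent for parent in this)   # from a query expression
>>> query = normalize_query(query)                        # a query object is accepted as well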

class xotl.ql.core.QueryObject(qst, frame, **kwargs)

A query object implementation.

Instances of this class implement the interface xotl.ql.interfaces.QueryObject and this class itself complies with xotl.ql.interfaces.QueryObjectType.

class xotl.ql.core.Frame(locals, globals)

Instances of this class implement the interface xotl.ql.interfaces.Frame and the class itself complies with xotl.ql.interfaces.FrameType.

The f_locals and f_globals attributes are immutable mapping views that support the whole collections.Mapping interface.

In order for the view concept to work, we keep references to the original locals and globals.

Additional attributes and methods:

auto_expand_subqueries

When trying to get the name ‘.0’ from either view, if the current value is a generator object obtained via a generator expression, we actually return the result of calling get_query_object() on the current value.

You may suppress this behavior by setting this attribute to False. The default is True.

Warning

Notice this will use the default query object type and frame type.

Example:

>>> from xotl.ql.core import this, get_query_object
>>> query = get_query_object(y for y in (x for x in this))

>>> query.locals['.0']  
<xotl.ql.core.QueryObject...>
xotl.ql.core.thesefy(target, make_subquery=True)

Allow an object to participate in queries.

Example as a wrapper:

class People(object):
    # ...
    pass

query = (who for who in thesefy(People))

Example as a decorator:

@thesefy
class People(object):
    pass

query = (who for who in People)

If target already supports the iterable protocol (i.e. implements __iter__), return it unchanged.

If make_subquery is True, then the query shown above will be equivalent to:

query = (who for who in (x for x in this if isinstance(x, People)))

If make_subquery is False, thesefy injects an __iter__() that simply returns the same object and a next() method that immediately stops the iteration.

Notice that in order to use make_subquery you call thesefy() as a decorator-returning function:

class Person(object):
    pass

query = (x for x in thesefy(make_subquery=False)(Person))

# or simply as a decorator

@thesefy(make_subquery=False)
class Person(object):
    pass

xotl.ql.interfaces – The core interfaces

Interfaces that describe the major types used in the Query Language API, and some internal interfaces as well.

Notice that we only aim for documentation of the types and not implementation.

We also explicitly separate the type of the objects from the type of the object constructors. Given that types in Python are callable, the type of an object constructor can be provided via the __init__ method.

interface xotl.ql.interfaces.QueryObject

The required API-level interface for query objects.

Query objects provide access to the QST for the query.

qst

The Query Syntax Tree

locals

A MappingView for the locals in the query scope. See get_name()

globals

A MappingView for the globals in the query scope. See get_name()

get_name(name, only_globals=False)

Give the value for the name.

Queries are defined in a scope where they could access any name (e.g. variables). The translator may need to access the value of such names.

get_name() prefers locals over globals unless only_globals is True.
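
For example, a translator could resolve a name referenced by the query (a sketch; MIN_AGE is just an illustrative variable):

>>> from xotl.ql.core import get_query_object, this

>>> MIN_AGE = 21
>>> query = get_query_object(p for p in this if p.age > MIN_AGE)
>>> query.get_name('MIN_AGE')
21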

interface xotl.ql.interfaces.QueryObjectType

A QueryObject factory.

frame_type

Either a FrameType object or the fully qualified name of such an object. This object is used to build the frames of the query objects created by this type.

__call__(qst, frame)

Return an instance of a QueryObject.

Different implementations of the QueryObject may require or support additional keyword arguments. For instance, the type of a PartionableQueryObject may allow for a partition argument.

interface xotl.ql.interfaces.Frame

An object that represents a Python stack frame.

This is an unavoidable consequence of the query expression being a generator object that may access names which are not known to the translator.

The attributes f_locals and f_globals are required to be mappings (usually immutable) or mapping views that give access to the values of locals and globals of a stack frame.

f_locals

A mapping of the locals of this frame. Though not required, this could be a mapping view provided it has the mapping interface.

f_globals

A mapping of the globals of this frame. Though not required, this could be a mapping view provided it has the mapping interface.
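
For instance, a frame can be built from a generator’s own stack frame (a sketch using the Frame class from xotl.ql.core; in CPython the only local of a fresh generator expression is the implicit ‘.0’ iterator):

>>> from xotl.ql.core import Frame, this

>>> gen = (parent for parent in this)
>>> frame = Frame(gen.gi_frame.f_locals, gen.gi_frame.f_globals)
>>> '.0' in frame.f_locals
True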

interface xotl.ql.interfaces.FrameType

A Frame factory.

__call__(locals, globals)

Return an instance of a Frame object.

Parameters:
  • locals – A mapping that provides access to locals.
  • globals – A mapping that provides access to globals.
interface xotl.ql.interfaces.QueryTranslator

A query translator.

Note

Since Python classes are callable, you may implement a translator/execution plan in a single class:

>>> class ExecutionPlan(object):
...     def __init__(self, query, **kwargs):
...         pass
...
...     def __call__(self, **options):
...         pass
...
...     def __iter__(self):
...         return self()

However, this may hinder some extensions. For instance, below we describe a couple of possible extensions for translators and plans which are not easily implemented in a single unit of code.

__call__(query)

Return an execution plan for the given query.

Parameters: query – The query to be translated. Translators must allow this object to be either a query expression or a query object that complies with the interface QueryObject.

Translators are allowed to provide other keyword-only arguments. Translator authors are encouraged to properly document those arguments.

Returns: The query execution plan.
Return type: QueryExecutionPlan
interface xotl.ql.interfaces.QueryExecutionPlan

Required API-level interface for a query execution plan.

query

The original query object this plan was built from. Even if the translator was given a query expression directly, like in most of our examples, this must be a query object.

__call__()

Execution plans are callable.

Return an iterator. The returned iterator must produce the objects retrieved from the query. Though the plan itself is reusable and can be called several times, the iterator obtained from this method will be exhausted.

Translators are required to properly document the optional keyword arguments. Positional arguments are not allowed. All arguments must be optional.

__iter__()

Execution plans are iterable.

This is exactly the same as calling the plan without any arguments: plan().
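
For example (a sketch reusing the py translator shown earlier):

>>> from xotl.ql.translation import py
>>> from xotl.ql.core import this

>>> plan = py(parent for parent in this)
>>> by_call = list(plan())   # call the plan, then iterate the returned iterator
>>> by_iter = list(plan)     # iterate the plan directly; same effect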

interface xotl.ql.interfaces.QueryTranslatorExplainExtension

Extends: xotl.ql.interfaces.QueryTranslator

This interface documents optional methods for query translators that are deemed required to provide interactive access to the translator.

explain(query)

Prints information about how the query might be translated/executed.

The signature should allow the same arguments as the __call__ method of translators.

interface xotl.ql.interfaces.QueryExecutionPlanExplainExtension

Extends: xotl.ql.interfaces.QueryExecutionPlan

explain()

Prints information about this plan of execution.

The details of the information are specific to the kind of plan.

Terms and glossary

AST
Abstract Syntax Tree

A tree structure that represents a program in source form as a tree of syntactical elements, but omits many overly concrete elements of the syntax; for instance, in an AST statement separators are often removed and each individual statement appears as its own subtree.

See more on http://en.wikipedia.org/Abstract_Syntax_Tree

byte-code

Refers to the low-level code into which the Python interpreter compiles the source code.

For instance, given the query expression (generator object):

>>> from xotl.ql.core import this
>>> query = (parent for parent in this)

The byte code of the generator object is (in Python 2.7):

|\x00\x00]\x0b\x00}\x01\x00|\x01\x00V\x01q\x03\x00d\x00\x00S

Often the byte-code is shown in an expanded form that eases the task of reading it. The module dis prints this expanded form:

>>> import dis
>>> dis.dis(query.gi_code)                          
2           0 LOAD_FAST                0 (.0)
      >>    3 FOR_ITER                11 (to 17)
            6 STORE_FAST               1 (parent)
            9 LOAD_FAST                1 (parent)
           12 YIELD_VALUE
           13 POP_TOP
           14 JUMP_ABSOLUTE            3
      >>   17 LOAD_CONST               0 (None)
           20 RETURN_VALUE
object model

An object model is an object-oriented model that describes which objects may exist and how they may relate to each other.

This includes the relational model: in such a model an object is a single collection of named scalars that belongs to a single entity. Relations are just foreign keys, and the semantics associated with relations is that of referential integrity.

A relational database is a kind of storage that uses the relational model as its object model (usually with some variations).

xotl.ql does not provide an API for expressing object models, but it assumes that a translator exists with enough knowledge to deal with such an object model.

Todo

Wouldn’t the semantics of an object model be captured by category theory?

The authors of [coSQL2011] point out that this is possible; but I haven’t studied that much yet ;)

QST
Query Syntax Tree

A type of abstract syntax tree. It describes the syntactical structure of a query expression.

Since the introduction of the revenge module, which uses compiler techniques to reverse-engineer Python byte-code, the term AST has been used both for the inner structures and for the main structure consumed by query translators. To disambiguate, the term QST specifically describes the AST that xotl.ql produces as its output, whereas AST is a more generic term that covers all AST structures but most of the time refers to intermediate structures.

query

The term query is used in this documentation with two meanings that depend on the context:

  1. The generator expression, as seen in the code, that expresses what is intended to be fetched from the storage(s).

    In most of this documentation the term query will refer to this sense of the word. However, to disambiguate, we’ll use the term query expression for this sense of the word when needed.

  2. The (internal) data structure that represents the query (as in item 1) to the program.

    We prefer the term query object for this sense of the word, but sometimes it just does not matter.

query execution plan

When a query object is processed by a query translator it produces an execution plan. Such a plan is a sort of a compiled form of the query.

The execution plan should include instructions to retrieve the objects expected. An execution plan may be as simple as:

just execute the SQL query SELECT * FROM sometable [WHERE ... ] [ORDER BY ...] [OFFSET ...] against the default relational database;

then, return an iterator for instances of those objects created by the factory class SomeModel.

Or it can be one that checks an index stored in a SQL database, but fetches objects from a remote system through a REST interface.

In the API documentation this means any object that complies with the interface xotl.ql.interfaces.QueryExecutionPlan.
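
As an illustration only, a plan of the first kind could be as small as the following sketch; SelectAllPlan, sometable, and the factory argument are made-up names, not an API provided by xotl.ql:

class SelectAllPlan(object):
    # Hypothetical execution plan: run a fixed SQL statement against a
    # DB-API connection (e.g. a sqlite3 connection) and yield instances
    # built by a factory class such as SomeModel.
    def __init__(self, query, connection, factory):
        self.query = query            # the query object this plan was built from
        self.connection = connection
        self.factory = factory

    def __call__(self):
        cursor = self.connection.execute('SELECT * FROM sometable')
        return (self.factory(*row) for row in cursor)

    def __iter__(self):
        return self()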

query expression
This term is used solely to distinguish a query as the construction expressed in the (Python) language from the internal data structure (query object).
query object

This term is used solely to distinguish a query as an internal data structure, in contrast to the language construction (i.e. the first meaning of the term query) that implies such a structure.

In the API documentation this means any object that complies with the interface xotl.ql.interfaces.QueryObject.

query translator
translator

In the general design, a query translator is a component that receives a query object and produces a query execution plan. The execution plan depends on the translator, for it encompasses the knowledge about both the object model and the object storage. A CouchDB translator, for instance, may simply translate the whole query to a CouchDB view and return a plan that just involves querying that view.

In the API documentation this means any object that complies with the interface xotl.ql.interfaces.QueryTranslator.

reverse engineering

Refers to the (intellectual) activities, processes, and techniques used to obtain the original Python source code given a byte-code string.

Depending on the compiler this is not always possible, or it may result in code that is not 100% identical to the original but that would produce the same byte-code. For instance, the following two query expressions produce the same byte-code:

>>> g1 = (parent
...      for parent in this
...      if parent.age > 1
...      if parent.children)

>>> g2 = (parent
...      for parent in this
...      if parent.age > 1 and parent.children)

>>> g1.gi_code.co_code  == g2.gi_code.co_code
True
storage
object storage

A software component that allows one to “persist” objects. Most of the time the storage relates to a single object model. For instance, relational databases use the relational model.

In general, a storage is a place from which one can draw objects. We may then relax the “persistence” requirement for a component to be considered a storage. For instance, a memcached server could be considered a key-value storage that a query translator might target.

thread-local object
A thread-local object is an instance of the threading.local class. An instance of this class acts like a global variable, but it holds values local to a given thread; so, each thread has its own “global” variable. Please refer to Python’s documentation for more information.
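
For example, using the standard library directly:

>>> import threading

>>> data = threading.local()
>>> data.value = 1   # other threads will not see this attribute
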
transformation
The process of transforming a query object into another one.

Changelog

Unreleased. Release 0.3.0

Release 0.2.1

  • Updates to the latest xoutil release that introduces changes in xoutil.context API.

  • A lot of fixes to the xotl.ql.translation.py module. The core translation algorithm is now reasonably tested.

    The sub-query interpretation for functions like all_ and others like it is now implemented and partially tested.

    We have also introduced a class-level protocol for instances so that the search space for objects is reduced, in the hope of making this translator usable for single-user, short-lived applications.

Release 0.2.0

  • Another round of redesign has been completed: the old and clunky QueryPart concept was removed; now just expressions, bound terms, and generator tokens are needed.

    However a new protocol was introduced.

  • Compatible with Python 3.2 out-of-the-box, no need to use the 2to3 script.

  • Hooray! We now have a test-bed translator partially implemented. It’s quite new and under-tested, and sub-query functions (like xotl.ql.expressions.all_) are not yet translated.

    Although PyPy is not fully supported, it passes all tests of the core language but fails in translation. Nevertheless, xotl.ql.translation.py is not meant to be used in production.

2012/12/20 - Release 0.1.8

  • Fixed a bug discovered while cleaning up the implementation. Arguments for N_ARITY functions were not being properly handled.

    This was actually fixed by simplifying xotl.ql.core.QueryPart to implement the target protocol to extract its expressions.

  • Improves and updates documentation.

  • Provides a “wish list” for future releases in Next releases goals.

2012/12/18 - Release 0.1.7

  • Fixes a pending bug that made tests fail randomly. Now this is deemed stable enough!

    Start developing translators!

  • Proposing to release a version 0.2, to mark the current level of maturity.

2012/12/08 - Release 0.1.6

  • Fixes several bugs. But there’s still a pending non-determinacy bug.
  • Improved the explanation of internal details of the current implementation.
  • Starts to comply more closely with PEP 8: use a single blank line to separate class-level definitions (we used 2); use two blank lines to separate module-level definitions.

2012/11/05 - Release 0.1.5

  • Huge revamp of the design (again). Introduced the metaphor of a “particles bubble” to capture the query expression more precisely.

    A draft of the description of “Internal details...” is provided.

2012/10/22 - Release 0.1.4

  • Huge revamp of design. Now I’m proud to say the query language is almost done in its first stage.

    Introduces QueryPart; loads of documentation have been updated. Tests for the design are now almost done, etc.

    You’re encouraged to try it!

Contributors and acknowledgment

The following are the main contributors so far to this release:

  • Manuel Vázquez has got his hands dirty by programming this bunch of releases.
  • Medardo Rodríquez provided most of the conceptual framework, came up with the idea of using comprehension syntax, and has mentored the programmer.

Licence: GNU General Public License version 3

Version 3, 29 June 2007

Copyright (c) 2007 Free Software Foundation, Inc. http://fsf.org/

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program -to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it.

For the developers’ and authors’ protection, the GPL clearly explains that there is no warranty for this free software. For both users’ and authors’ sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users’ freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS

0. Definitions.

“This License” refers to version 3 of the GNU General Public License.

“Copyright” also means copyright-like laws that apply to other kinds of works, such as semiconductor masks.

“The Program” refers to any copyrightable work licensed under this License. Each licensee is addressed as “you”. “Licensees” and “recipients” may be individuals or organizations.

To “modify” a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a “modified version” of the earlier work or a work “based on” the earlier work.

A “covered work” means either the unmodified Program or a work based on the Program.

To “propagate” a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.

To “convey” a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays “Appropriate Legal Notices” to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion.

1. Source Code.

The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work.

A “Standard Interface” means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language.

The “System Libraries” of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A “Major Component”, in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.

The “Corresponding Source” for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work’s System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work.

The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source.

The Corresponding Source for a work in source code form is that same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.

Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program’s source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:

  1. The work must carry prominent notices stating that you modified it, and giving a relevant date.
  2. The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to “keep intact all notices”.
  3. You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.
  4. If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so.

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation’s users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:

  1. Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange.
  2. Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge.
  3. Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b.
  4. Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.
  5. Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d.

A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work.

A “User Product” is either (1) a “consumer product”, which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, “normally used” refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product.

“Installation Information” for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM).

The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying.

7. Additional Terms.

“Additional permissions” are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:

  1. Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or
  2. Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or
  3. Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or
  4. Limiting the use for publicity purposes of names of licensors or authors of the material; or
  5. Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or
  6. Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors.

All other non-permissive additional terms are considered “further restrictions” within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying.

If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11).

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License.

An “entity transaction” is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party’s predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it.

11. Patents.

A “contributor” is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor’s “contributor version”.

A contributor’s “essential patent claims” are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, “control” includes the right to grant patent sublicenses in a manner consistent with the requirements of this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor’s essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version.

In the following three paragraphs, a “patent license” is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To “grant” such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party.

If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. “Knowingly relying” means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient’s use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it.

A patent license is “discriminatory” if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law.

12. No Surrender of Others’ Freedom.

If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program.

13. Use with the GNU Affero General Public License.

Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such.

14. Revised Versions of this License.

The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License “or any later version” applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation.

If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy’s public statement of acceptance of a version permanently authorizes you to choose that version for the Program.

Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version.

15. Disclaimer of Warranty.

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. Limitation of Liability.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

17. Interpretation of Sections 15 and 16.

If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the “copyright” line and a pointer to where the full notice is found.

one line to give the program's name and a brief idea of what it does.
Copyright (C) year name of author

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.  You should have received a copy of the GNU General
Public License along with this program.  If not, see
http://www.gnu.org/licenses/.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode:

program Copyright (C) year name of author

This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
This is free software, and you are welcome to redistribute it under
certain conditions; type 'show c' for details.

The hypothetical commands ‘show w’ and ‘show c’ should show the appropriate parts of the General Public License. Of course, your program’s commands might be different; for a GUI interface, you would use an “about box”.

You should also get your employer (if you work as a programmer) or school, if any, to sign a “copyright disclaimer” for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see http://www.gnu.org/licenses/.

The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read http://www.gnu.org/philosophy/why-not-lgpl.html.

Known issues

At release 0.3.0

  • Pypy support is not complete. Expressions like (x for x in this if not p(x) or z(x) or not h(x)) fail to be recognized.

  • xotl.ql.core.this may be hidden from the xotl.ql.revenge machinery by ‘enclosing’ it inside a function call:

    ((i, obj) for i, obj in enumerate(this))
    

    This is because the generator only ‘sees’ the result of calling enumerate(this) and cannot reach the this within.

    Our current solution path is to enclose the entire query inside a lambda:

    lambda: ((i, obj) for i, obj in enumerate(this))
    

    Alternatively we may choose to explicitly make the universe an argument:

    lambda this: ((i, obj) for i, obj in enumerate(this))
    

    The second approach will make our query testable in pure Python code, and if we succeed in translation, it will also work in production. However, it will require annotations of the arguments.
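
    For illustration, a minimal sketch (plain Python, no translator involved) of how the argument-style query can be exercised against ordinary data:

    q = lambda this: ((i, obj) for i, obj in enumerate(this))
    assert list(q(['a', 'b'])) == [(0, 'a'), (1, 'b')]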

Additional documents:

Comparison with Pony_

Date: Tue Apr 30, 2013 (2013-04-30)
Last Updated: Mon Nov 9, 2015 (2015-11-09)
Document status: Final.
Author: Manuel Vázquez Acosta (@mvaled)
Summary: Describes and compares PonyORM and xotl.ql. Proposes to explore bytecode disassembling as a way to implement xotl.ql.

Pony_ is an ORM implementation that shares our vision of writing queries using the generator expressions of Python. Their queries look like:

persons = select(p for p in Person if 'o' in p.name)

The project seems to have been started in 2006, but I have only discovered it today. I like some of the external features it exposes, such as the use of the true logical and and or operators; everything we have had to circumvent in xotl.ql so far.

In this document I would like to describe how Pony_ is similar to and different from xotl.ql, and how it might influence our future work.

Note

Update 2015-11-07.

I updated my Pony clone today, and realized they’re now targeting both Python 2 and Python 3. So they’re still an interesting project to learn from.

What Pony is that xotl.ql is not

Pony is an ORM

Pony is an ORM; meaning it concerns itself with interacting with a relational database like SQLite, MySQL and PostgreSQL.

xotl.ql does not aim to target any specific object model, not even the relational model, and thus it does not aim to target any specific database system. This job is left to query translators to perform.

In this regard xotl.ql does what the pony.decompiling module does; only differently.

This is the only true difference in the broader aim that Pony and xotl.ql have. However, they differ a lot in design and implementation.

Pony disassembles the Python bytecode

Python does not make it easy to hook the logical operators and, or and not: there are no protocols for them. Additionally, the protocol for the in containment test operator always coerces the result to a boolean. These rules make it impossible to create a complete expression language using normal Python code.
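
A quick, self-contained illustration of the coercion (a sketch, not part of Pony or xotl.ql):

class Expr(object):
    def __contains__(self, item):
        # Try to return a rich "expression" object instead of a boolean...
        return ('contains', self, item)

# ...but the `in` operator coerces whatever __contains__ returns to a bool,
# so the expression object is lost:
print(3 in Expr())   # -> True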

Pony overcomes this difficulty by inspecting the low-level bytecode the Python interpreter generates for the generator object. This is how they can reconstruct an AST that is semantically equivalent [1] to the original expression.

The xotl.ql way is fully explained in Reverse Engineering using an Earley parser. Basically we tried to keep our implementation abstracted from Python’s implementation details like the bytecode. Full disclosure: although we knew of the existence of the bytecode, we did not know “the CPython bytecode”. Furthermore, we thought it wasn’t needed and should probably be avoided to gain interoperability between Python implementations (xotl.ql works the same on Python 2.7 as on Python 3.2).

Perhaps we were wrong. Perhaps what we should have done is study the several bytecodes for our target Python implementations and have implementations for those. Which leads me to:

How might Pony influence our future work?

Perhaps the most appealing feature we would love to have is writing our queries with true Python operators, instead of having & for and, all_ for all(), and the like.

This would transform the usage of our expression language for queries; the expression language for predicates, though, would probably suffer. Nevertheless we would still need it, so we might just require predicates to be lambdas (which would actually be cool).

But we would keep our main goal of not being a data-accessing layer. So what would change and what wouldn’t change if we pursue this avenue:

  • We would keep the concept of a query translator. These will always return a (probably changed) xotl.ql.interfaces.IQueryObject with the AST of the query.
  • Syntactical pairing of query expressions and query objects would be lost. However, semantics would be kept.
  • Whether or not the Python ast module is a fit for our query/expression language is still not clear. See UnQL, SQL, and NoSQL (coSQL), especially the [coSQL2011] reference. Probably Python’s AST serves as an internal intermediate language, but the AST exposed to translators would probably resemble the monadic query language. At this moment I just don’t know.
Next steps

In the next weeks I’ll be doing the following:

  1. Study the Python 2.7 bytecode as explained in the dis standard module documentation and other public Internet sources.

    I can use the pony.decompiling as a starting point. See the tweets.

  2. Do the same for Python 3.2 and probably Python 3.3.

  3. Propose a new API in an experimental branch.

Footnotes

[1]

Syntactical equivalence might not be possible this way since Python uses the same bytecode for different syntactical constructions.

For example, the following generators, which are semantically (but not syntactically) equivalent, generate the same bytecode:

this = iter([])
g1 = (parent
      for parent in this
      if parent.age > 1
      if parent.children)

g2 = (parent
      for parent in this
      if parent.age > 1 and parent.children)

Thoughts on Query Languages

This document is not part of the official documentation. You may skip it entirely. It contains topics on UnQL and coSQL that were taken as inspiration for this work; but the information here has no other practical effect on xotl.ql.

This document is rather a thinking pad I use when I need to think about query languages in general and how they apply to xotl.ql.

The goal of xotl.ql is not to provide a full implementation of a query language since that implies executing those queries. At the same time, we can’t restrict our interpretation of the language to exactly match any of the languages we find on the referenced articles. Our goal is to provide a foundation on the grounds of the syntactical layout of comprehensions for others to actually implement a full query language for a particular object store.

UnQL, SQL, and NoSQL (coSQL)

There’s a good article [UnQL] that describes several features of UnQL (Unstructured Query Language) that are of interest to this module. Another article exposes the relation between NoSQL and SQL, and renames the former to coSQL, following the categorical tradition, since NoSQL is dual to SQL [coSQL2011] [coSQL2012].

In this section we shall explore those articles and try to relate them to our work on xotl.ql. First, we’ll give a brief review of the work of Buneman et al. on UnQL, and then explore the ideas of Meijer and Bierman about NoSQL.

The [UnQL] paper uses an edge-labeled rooted directed graph (although they call it a labeled tree) to represent the data. In this model all the “real values” [1] are encoded in the labels of the graph. The following figure is extracted from the paper:

[Figure unql-data.png: sample data as an edge-labeled tree, from the UnQL paper]

One may read this graph as:

  • It has many “Entries” which may be either “Movies” or “TV Shows”.
  • Following the branch to the left of the tree, it has an Entry, which is a Movie. Such a movie has:
    • A Title, which is “Casablanca”.
    • A Cast, which includes “Bogart”, and “Bacall”.
    • A Director, whose attributes are not shown in the image.

How does one tell whether the label of the edge is an attribute name or value? There’s no such thing as attribute name or attribute value in this setting. One may tell a terminal label because the node it points to has no outgoing edges.

In Python, the object model is more elaborate in this regard, but we can picture it as objects, which have attributes, whose values are other objects. This is very similar to the edge-labeled graph; but in Python there’s no such thing as a single root. To overcome this, the function get_objects() from the gc module may be used to get all the objects in the Python VM; it may take the place of the root, and the objects returned may be the level-one objects [2].
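
A minimal sketch of that idea (the choice of dict as the level-one type is arbitrary):

import gc

# Every live object in the VM stands in for the UnQL root...
roots = gc.get_objects()
# ...and the instances of a chosen type play the role of the level-one objects.
level_one = [obj for obj in roots if isinstance(obj, dict)]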

Although there’s no fixed structure (for the graph), there may be types that restrict links to/from objects. For instance, it’s highly unlikely (or bizarre) that there will be a third edge “down” from the node an edge with label “Title” points to; i.e. the following schema is not likely to happen:

* -------> * -------------> * ---------> *
   Title       "Casablanca"      what?

This is unlikely since we don’t expect strings to have attributes [3]. However, there’s nothing in the UnQL paper that prevents it but our own common sense.

The following figure shows with color-layers how the movie database may be interpreted:

[Figure unql-data-layers.png: the movie database with color layers showing one interpretation]

The language UnQL uses variable binding and pattern matching. The very first query they offer is the following (I included the braces for better readability):

select t
where {R1 => \t} <- DB

The query selects all trees t which are below an edge with label R1 from the root of the DB. If we fix that level-1 labels are actually types, this query may be written in xotl.ql like this:

(t for t in this if isinstance(t, R1))

If we don’t make the assumption that level-1 labels are types, then the other option is to assume R1 is an attribute name:

(x.R1 for x in this)

A query with partial selection:

select {Tup => {A => x, B => y}}
where {R1 => Tup => {A =>\x, B => \y, C => 3}} <- DB

Because we can’t do the pattern matching in Python, our query is a bit bigger:

({"Tup": {"A": tup.A, "B": tup.B}}
for tup in this
if isinstance(tup, R1) & tup.A & tup.B & (tup.C == 3))

One of the most problematic queries they propose is the following:

select {Tup => {Actor => x, Title => y}}
where {Entry => Movie => {Title => \y, Cast => \z}} <- DB,
      {\x => _} <- z ∪ (select u where _ => \u <- z), isstring(x)

Our query would be the union of two queries:

from itertools import chain as union
build_tup = lambda actor, title: {"Tup": {"Actor": actor, "Title": title}}
union((build_tup(actor, movie.title)
       for movie in this
       if is_instance(movie, Movie)
       for actor in movie.cast if is_instance(actor, basestring)),

      (build_tup(actor, movie.title)
       for movie in this if is_instance(movie, Movie)
       for actor_group in movie.cast
       for actor in actor_group if is_instance(actor, basestring))
)

Warning

We’re abusing our query language here: chain can’t be used directly over the generator expressions.

In [coSQL2011] the authors focus only on key-value stores for NoSQL databases, although they claim that:

While we don’t often think of it this way, the RAM for storing object graphs is actually a key-value store where keys are addresses (l-values) and values are the data stored at some address in memory (r-values). Languages such as C# and Java make no distinction between r-values and l-values, unlike C or C++, where the distinction is explicit. In C, the pointer dereference operator *p retrieves the value stored at address p in the implicit global store.

In fact, this model is quite suitable to represent the labeled tree model of [UnQL]. Notice that the type of the labeled trees is informally described as:

a set of pairs of labels and trees.

We can see that labels may be the keys, and the trees may be encoded as references.
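
For illustration only, a tiny sketch of that encoding in Python, where each label maps to the tree it points to and terminal labels map to empty trees (the data comes from the figure above):

movie = {
    "Title": {"Casablanca": {}},
    "Cast": {"Bogart": {}, "Bacall": {}},
    "Director": {},
}
db = {"Entry": {"Movie": movie}}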

Generator Token

A generator token is related to the <- DB in the UnQL syntax, and to the FROM clause in SQL and LINQ. It represents where the objects are drawn from. SQLAlchemy’s expression language has a similarity with xotl.ql’s Query API: its select() function does not require an explicit declaration of FROM, because it gathers the table from the SELECT-ed columns.

This is quite similar to the idea of having the expressions in the selection

Footnotes

[1]Of course, the edges (not their labels) carry very important information: from which object such a label is drawn and to which object it points. In this sense the labeled edge carries all the information, and if the nodes are somehow identified, it carries the same information as a single triplet in an RDF store.
[2]Since they are all the objects in the VM, we actually get a one-level only tree with edges between the siblings. But we can search for objects of specific types to be the level one objects.
[3]I know, I know... Python’s strings do have attributes; but what’s the point of bringing them into this debate?

Reverse Engineering using an Earley parser

The goal of the package xotl.ql.revenge is to transform pieces of byte-code to a QST that is amenable to both transformation and translation.

The package is further explained in:

From byte-code to a query syntax tree

This document describes the core of the revenge package for getting a query syntax tree from the byte-code of generator expressions. It spans several highly coupled modules:

  • The xotl.ql.revenge.scanners module,
  • the xotl.ql.revenge.parsers module,
  • the xotl.ql.revenge.walkers module,
  • and (parts of) the xotl.ql.revenge.qst module.

Though we have tried to make a linear exposition of the matter, the high degree of coupling between those phases imposes a challenge to justify decisions made in the first layers to accommodate the actions of the later ones.

This package derives from the work of John Aycock and others on the uncompyle2 package. In fact, this document was produced at the same time we were trying to grasp how that package worked and how we could modify it to better suit our needs. This has led to a pattern of exposition that may seem awkward at times, because we have had to revise our previous assumptions as we analyzed further. Nevertheless, we have striven to remove those issues as best we could.

Since the goal of the original package is much more ambitious than ours we have stripped big parts of the original grammar and the scanner has been almost entirely reimplemented.

uncompyle2 handles the problem of getting Python source from the byte-code (reverse engineering the byte-code) by casting it as a parsing problem. Thus compiler techniques are applied.

The byte-code is passed to a scanner that produces the tokens. Then a parser recognizes whether the stream of tokens can be generated by a given grammar; if so, it produces a syntax tree that captures the syntactical structure of the provided stream of tokens. Finally a walker transforms the syntax tree into source code by traversing it. xotl.ql.revenge differs in the last step, where the walker produces a Query Syntax Tree (QST) instead.

In this document we try to expose how those components work together to reverse engineer the byte-code of a valid Python expression into a structure we call a Query Syntax Tree.

Module xotl.ql.revenge.scanners

This module provides the Scanner that gets the byte-code and produces a stream of Tokens. Each token represents either an actual byte-code instruction with its arguments or a virtual instruction.

This document begins with an exploration of the byte-code for Python expressions and then describes the objects in the scanners module and how the tokens are produced. At the same time, we give proposals for the grammar that may be used to recognize higher level structures from those tokens.

Let’s inspect how simple expressions are compiled in different versions of Python.

Basic expressions

Let’s call basic expressions those composed only of literals, variables (attributes, subscripts), function calls, and binary and unary operators, excluding conditional and boolean expressions, generator expressions and other types of comprehensions, and lambdas – which we’ll discuss later. Yield expressions are also out of the question because they can only occur within the body of a non-lambda function (i.e., not an expression).

The following are all valid basic expressions along with the disassembled byte-code [3]:

  • a

    dis.dis(compile('a', '', 'eval'))
    1           0 LOAD_NAME                0 (a)
                3 RETURN_VALUE
    
  • a + 1:

    dis.dis(compile('a + 1', '', 'eval'))
      1           0 LOAD_NAME                0 (a)
                  3 LOAD_CONST               0 (1)
                  6 BINARY_ADD
                  7 RETURN_VALUE
    
  • a(b[c**e/2:d], *y, **kw).at:

    dis.dis(compile('a(b[c**e/2:d], *y, **kw).at', '', 'eval'))  # doctest: +SKIP
     1           0 LOAD_NAME                0 (a)
                 3 LOAD_NAME                1 (b)
                 6 LOAD_NAME                2 (c)
                 9 LOAD_NAME                3 (e)
                12 BINARY_POWER
                13 LOAD_CONST               0 (2)
                16 BINARY_DIVIDE
                17 LOAD_NAME                4 (d)
                20 SLICE+3
                21 LOAD_NAME                5 (y)
                24 LOAD_NAME                6 (kw)
                27 CALL_FUNCTION_VAR_KW     1
                30 LOAD_ATTR                7 (at)
                33 RETURN_VALUE
    

Note

To understand the output of the dis.dis() function we refer the reader to the dis module of the Python standard library. We’ll assume a basic level of familiarity with it over the rest of the document. Nevertheless, we’ll link to the definition of a byte-code name the first time we use it.

The most important feature of basic expressions is that they don’t have any kind of jump in the generated byte-code. The byte-code for basic expressions resembles (with infrequent exceptions) postfix notation, where operands come before the operator.

Our parser can have several rules for recognizing this kind of expressions:

expr        ::=  atom
atom        ::=  identifier | literal
identifier  ::=  LOAD_FAST | LOAD_NAME | LOAD_GLOBAL | LOAD_DEREF
literal     ::=  LOAD_CONST
                
expr        ::=  binary_expr
binary_expr ::=  expr expr binary_op
binary_op   ::=  BINARY_ADD | BINARY_MULTIPLY | BINARY_AND |
                 BINARY_OR | BINARY_XOR | BINARY_SUBTRACT | BINARY_DIVIDE |
                 BINARY_TRUE_DIVIDE | BINARY_FLOOR_DIVIDE | BINARY_MODULO |
                 BINARY_LSHIFT | BINARY_RSHIFT | BINARY_POWER

... and so on.

Notice, however, that the last instruction in all the basic expressions shown so far is RETURN_VALUE, and that is not recognized by the grammar presented. Before trying to find a solution for this issue, let’s inspect conditional expressions.

Conditional and boolean expressions

Conditional and boolean expressions, on the other hand, do produce jumps in the byte-code: they don’t run straightforwardly from top to bottom.

For instance, the simplest conditional expression “a if x else y”, in Python 2.7 and 3.4 compiles to:

dis.dis(compile('a if x else y', '', 'eval'))
  1           0 LOAD_NAME                0 (x)
              3 POP_JUMP_IF_FALSE       10
              6 LOAD_NAME                1 (a)
              9 RETURN_VALUE
        >>   10 LOAD_NAME                2 (y)
             13 RETURN_VALUE

In Pypy 2.7.3:

dis.dis(compile('a if x else y', '', 'eval'))  # doctest: +SKIP
  1           0 LOAD_NAME                0 (x)
              3 POP_JUMP_IF_FALSE       12
              6 LOAD_NAME                1 (a)
              9 JUMP_FORWARD             3 (to 15)
        >>   12 LOAD_NAME                2 (y)
        >>   15 RETURN_VALUE

The difference is at offset 9: where CPython uses a RETURN_VALUE, Pypy uses a JUMP_FORWARD.

It’s easy to see we can do a normalization step for Pypy byte-code so that the final byte-code is the same as that of CPython. Actually, later on we’ll find out it’s best to do it the other way around and keep a single RETURN_VALUE.

After the normalization step, we get the same byte-code and our analysis can proceed ignoring the differences.

We may think, then, that conditionals might be recognized by the following rules:

conditional ::= condition_expression POP_JUMP_IF_FALSE
                then_result else_result

then_result ::= returning_expression
else_result ::= returning_expression
returning_expression  ::= expr RETURN_VALUE

condition_expression := expr

expr ::= conditional

However, let’s first see how ‘nested’ conditionals interact: a if (a1 if x1 else y1) else y2. Inspecting the actual byte-code we see there’s an issue with the then_result inside the condition expression, since it does not end with the required RETURN_VALUE:

dis.dis(compile('a if (a1 if x1 else y1) else y2', '', 'eval'))
1           0 LOAD_NAME                0 (x1)
            3 POP_JUMP_IF_FALSE       12
            6 LOAD_NAME                1 (a1)
            9 JUMP_FORWARD             3 (to 15)
      >>   12 LOAD_NAME                2 (y1)
      >>   15 POP_JUMP_IF_FALSE       22
           18 LOAD_NAME                3 (a)
           21 RETURN_VALUE
      >>   22 LOAD_NAME                4 (y2)
           25 RETURN_VALUE

The inner conditional expression cannot return the value of a1 or y1; it leaves it on the top of the stack (TOS) for further inspection by the outer expression. Surely we’re going to see this pattern even when combining conditional expressions with non-conditional ones, as in:

dis.dis(compile('(a if x else y) + 1', '', 'eval'))
  1           0 LOAD_NAME                0 (x)
              3 POP_JUMP_IF_FALSE       12
              6 LOAD_NAME                1 (a)
              9 JUMP_FORWARD             3 (to 15)
        >>   12 LOAD_NAME                2 (y)
        >>   15 LOAD_CONST               0 (1)
             18 BINARY_ADD
             19 RETURN_VALUE

Notice that in both cases the JUMP_FORWARD targets the byte-code that handles (or leaves alone) the result of the conditional expression. So inner conditional expressions end with neither a jump nor a RETURN_VALUE, but leave the TOS as it is.

This is actually the same issue as basic expressions having a final RETURN_VALUE.

And it means that expr should not be considered a standalone expression but one that always occurs inside a bigger structure; so if we change our rules to:

conditional          ::=  condition_expression POP_JUMP_IF_FALSE
                          then_result else_result

condition_expression ::=  expr

expr                 ::=  conditional

else_result          ::=  expr
then_result          ::=  expr JUMP_FORWARD
then_result          ::=  expr JUMP_ABSOLUTE

without the trailing RETURN_VALUE for the else result; now they work for recognizing the expressions shown so far.

Since our expr cannot include the trailing RETURN_VALUE, that token needs to be matched by a higher-level structure. We’ll call that ‘returning an expression sentence’; in fact we already have the rule:

returning_expression ::=  expr RETURN_VALUE

So instead of trying to recognize an expr we should strive to get a sentence that returns an expression.

Yet, those production rules are not sufficient. They alone would make these two programs produce the same AST:

dis.dis(compile('a if x else (o if i else n)', '', 'eval'))
1           0 LOAD_NAME                0 (x)
            3 POP_JUMP_IF_FALSE       10
            6 LOAD_NAME                1 (a)
            9 RETURN_VALUE
      >>   10 LOAD_NAME                2 (i)
           13 POP_JUMP_IF_FALSE       20
           16 LOAD_NAME                3 (o)
           19 RETURN_VALUE
      >>   20 LOAD_NAME                4 (n)
           23 RETURN_VALUE

dis.dis(compile('o if (a if x else i) else n', '', 'eval'))
 1           0 LOAD_NAME                0 (x)
             3 POP_JUMP_IF_FALSE       12
             6 LOAD_NAME                1 (a)
             9 JUMP_FORWARD             3 (to 15)
       >>   12 LOAD_NAME                2 (i)
       >>   15 POP_JUMP_IF_FALSE       22
            18 LOAD_NAME                3 (o)
            21 RETURN_VALUE
       >>   22 LOAD_NAME                4 (n)
            25 RETURN_VALUE

If we disregard the target of the jumps (as our grammar does) we can’t distinguish between those two programs. In fact, the many RETURN_VALUE instructions we see are only produced after an optimization step because we’re compiling the expression with 'eval'. Otherwise, they only show up in lambdas:

dis.dis(compile('a if x else (o if i else n)', '', 'exec'))
1           0 LOAD_NAME                0 (x)
            3 POP_JUMP_IF_FALSE       12
            6 LOAD_NAME                1 (a)
            9 JUMP_FORWARD            15 (to 27)
      >>   12 LOAD_NAME                2 (i)
           15 POP_JUMP_IF_FALSE       24
           18 LOAD_NAME                3 (o)
           21 JUMP_FORWARD             3 (to 27)
      >>   24 LOAD_NAME                4 (n)
      >>   27 POP_TOP
           28 LOAD_CONST               0 (None)
           31 RETURN_VALUE

dis.dis(lambda: a if x else (o if i else n))
 1           0 LOAD_GLOBAL              0 (x)
             3 POP_JUMP_IF_FALSE       10
             6 LOAD_GLOBAL              1 (a)
             9 RETURN_VALUE
       >>   10 LOAD_GLOBAL              2 (i)
            13 POP_JUMP_IF_FALSE       20
            16 LOAD_GLOBAL              3 (o)
            19 RETURN_VALUE
       >>   20 LOAD_GLOBAL              4 (n)
            23 RETURN_VALUE

We’ll see the same problem in our analysis of boolean expressions, so let’s start with it.

Boolean expressions

Boolean expressions produce byte-code that “short-circuits” the evaluation as soon as the result is known:

dis.dis(compile('a and b', '', 'eval'))
 1           0 LOAD_NAME                0 (a)
             3 JUMP_IF_FALSE_OR_POP     9
             6 LOAD_NAME                1 (b)
       >>    9 RETURN_VALUE


dis.dis(compile('a or b', '', 'eval'))
 1           0 LOAD_NAME                0 (a)
             3 JUMP_IF_TRUE_OR_POP      9
             6 LOAD_NAME                1 (b)
       >>    9 RETURN_VALUE

Here the JUMP_IF_FALSE_OR_POP and JUMP_IF_TRUE_OR_POP do the evaluation and stop it if the result is known. However, when combined things change:

dis.dis(compile('x and a or y', '', 'eval'))
 1           0 LOAD_NAME                0 (x)
             3 POP_JUMP_IF_FALSE       12
             6 LOAD_NAME                1 (a)
             9 JUMP_IF_TRUE_OR_POP     15
       >>   12 LOAD_NAME                2 (y)
       >>   15 RETURN_VALUE

Now the and is implemented by POP_JUMP_IF_FALSE. This is because the jumps are optimized in several ways by Python VMs. Optimization changes the layout of the code. We have collected some of the optimization steps performed by the CPython VM below. If unoptimized we would’ve seen the same instructions. You may confirm that Pypy 2.7 doesn’t use the POP_JUMP_IF_FALSE for the expression above, but keeps the JUMP_IF_FALSE_OR_POP.

We’ll see that changing the precedence by grouping doesn’t change the instructions but the target of the first jump:

dis.dis(compile('x and (a or y)', '', 'eval'))  # doctest: +SKIP
 1           0 LOAD_NAME                0 (x)
             3 JUMP_IF_FALSE_OR_POP    15
             6 LOAD_NAME                1 (a)
             9 JUMP_IF_TRUE_OR_POP     15
            12 LOAD_NAME                2 (y)
       >>   15 RETURN_VALUE

Again not taking jump targets into account hurts our ability to properly interpret the byte-code.

But first let’s take more samples:

dis.dis(compile('x and a and b', '', 'eval'))  # doctest: +SKIP
 1           0 LOAD_NAME                0 (x)
             3 JUMP_IF_FALSE_OR_POP    15
             6 LOAD_NAME                1 (a)
             9 JUMP_IF_FALSE_OR_POP    15
            12 LOAD_NAME                2 (b)
       >>   15 RETURN_VALUE

dis.dis(compile('x or a or b', '', 'eval'))  # doctest: +SKIP
 1           0 LOAD_NAME                0 (x)
             3 JUMP_IF_TRUE_OR_POP     15
             6 LOAD_NAME                1 (a)
             9 JUMP_IF_TRUE_OR_POP     15
            12 LOAD_NAME                2 (b)
       >>   15 RETURN_VALUE

These cases show that several and and or together are simply changed to target the same “final” offset.

The first observation we can make is that jumps in conditional and boolean expressions target the first instruction that deals with the result of the boolean sub-expression. Pictorially, for a and b this is like:

expr JUMP_IF_FALSE_OR_POP  ..... <SOME_INSTRUCTION>
                       |         ^
                       +--------/

The entire program after the jump and before its target contains the instructions for the other operand of the boolean operator. The instruction just before the target is not necessarily the last instruction of the other operand; it may be another jump or a RETURN_VALUE, like in:

dis.dis(compile('a and b or c', '', 'eval'))   # doctest: +SKIP
 1           0 LOAD_NAME                0 (a)
             3 POP_JUMP_IF_FALSE       12
             6 LOAD_NAME                1 (b)
             9 JUMP_IF_TRUE_OR_POP     15
       >>   12 LOAD_NAME                2 (c)
       >>   15 RETURN_VALUE

Furthermore, jumps may skip several operands at once (recall a and b and c).

It seems that the key to solving this issue is to keep track of the jumps and their targets and produce virtual codes that allow the structure to be retrieved by the parser.

Keeping track of jumps and targets in the scanner

Our first approach is to place a simple <COME_FROM> [1] virtual token at the most likely end of the sub-expression after a conditional jump. Our definition of the most likely end is rather intuitive and we’ll extend it as we analyze further.

In the simplest cases, it will be placed just before the RETURN_VALUE at the end:

from xotl.ql.revenge.scanners import xdis
xdis(compile('a and b', '', 'eval'))   # doctest: +SKIP
1         0 LOAD_NAME            a    (a)
          3 JUMP_IF_FALSE_OR_POP 9    (9)
          6 LOAD_NAME            b    (b)
        9_0 COME_FROM            3    (from 3)
    >>    9 RETURN_VALUE

For the sake of readability COME_FROM indicates the offset from which it is reached; its virtual offset is composed of the offset of the target and an index – since many jumps may target the same offset:

xdis(compile('a or b or c', '', 'eval'))   # doctest: +SKIP
1         0 LOAD_NAME            a    (a)
          3 JUMP_IF_TRUE_OR_POP  15   (15)
          6 LOAD_NAME            b    (b)
          9 JUMP_IF_TRUE_OR_POP  15   (15)
         12 LOAD_NAME            c    (c)
       15_0 COME_FROM            3    (from 3)
       15_1 COME_FROM            9    (from 9)
    >>   15 RETURN_VALUE

In the first case the COME_FROM is at the end of the right-hand operand. In the second case we see that the first jump spans over the second.

Note

This is as far as we’ve gotten in this document. Below is just material that needs to be edited and integrated.

EXTENDED_ARG removal

The EXTENDED_ARG opcode only appears when the argument of the next opcode is too big to fit into 2 bytes.

The Bytecode knows how to deal with extended arguments and the produced Instruction set has the arguments fully computed. The instruction set keeps the EXTENDED_ARG, but we can easily skip it and never produce a token for it.
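
A minimal sketch of that, using the standard dis module (an illustration, not the actual scanner code):

import dis

def tokens_without_extended_arg(code):
    # dis already folds EXTENDED_ARG into the argument of the following
    # instruction, so we can simply drop the EXTENDED_ARG entries.
    for instr in dis.get_instructions(code):
        if instr.opname != 'EXTENDED_ARG':
            yield instr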

Virtual codes

Most of the time the scanner will produce tokens that wrap byte-code instructions. But for byte-codes that take arguments which indicate how many items to take from the stack (BUILD_LIST, BUILD_TUPLE, BUILD_SLICE, etc.) the scanner transforms them into customized versions depending on the argument.

For instance the instruction BUILD_LIST 3 yields a token <BUILD_LIST_3> [1]. The scanner actually returns the stream of tokens and a map from customized code names to arguments. This means that a customized code must be produced per argument value found in the original instruction set.
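
A hedged sketch of that customization step (the names are hypothetical; the real scanner differs):

CUSTOMIZED = {'BUILD_LIST', 'BUILD_TUPLE', 'BUILD_SET', 'CALL_FUNCTION'}

def customize(opname, arg):
    # Fold the argument into the token name so the parser can emit one
    # production rule per arity.
    return '%s_%d' % (opname, arg) if opname in CUSTOMIZED else opname

assert customize('BUILD_LIST', 3) == 'BUILD_LIST_3'
assert customize('LOAD_NAME', 0) == 'LOAD_NAME'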

The program shown below produces two customized tokens <BUILD_LIST_3> and <BUILD_LIST_2>:

dis.dis(compile('[1, 2, 3] + [9, 0]', '', 'eval'))
 1           0 LOAD_CONST               0 (1)
             3 LOAD_CONST               1 (2)
             6 LOAD_CONST               2 (3)
             9 BUILD_LIST               3
            12 LOAD_CONST               3 (9)
            15 LOAD_CONST               4 (0)
            18 BUILD_LIST               2
            21 BINARY_ADD
            22 RETURN_VALUE

Before the parser begins to recognize the program it should build new rules from those customized tokens. For instance for recognizing a <BUILD_LIST_3> it could add the rule:

list_expression ::= expr expr expr BUILD_LIST_3

See also

The parsers module.

The byte-codes that need to be customized are: BUILD_LIST, BUILD_TUPLE, BUILD_SET, BUILD_SLICE, UNPACK_SEQUENCE, MAKE_FUNCTION, CALL_FUNCTION, MAKE_CLOSURE, CALL_FUNCTION_VAR, CALL_FUNCTION_KW, CALL_FUNCTION_VAR_KW, DUP_TOPX, and RAISE_VARARGS – this last one is only included for completeness, not because we’ll find it in expressions.

JUMP_BACK

A JUMP_ABSOLUTE that targets a previous instruction yields a virtual code <JUMP_BACK>.

This allows the parser to recognize loops inside comprehensions more easily. The JUMP_ABSOLUTE in a comprehension always jumps to the FOR_ITER to get the next item.
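
A minimal sketch of the renaming (assuming, as in the listings in this document, that the argument of JUMP_ABSOLUTE is the absolute target offset):

def token_name(instr):
    # A JUMP_ABSOLUTE whose target offset is before the jump itself becomes
    # the virtual <JUMP_BACK> token; everything else keeps its opname.
    if instr.opname == 'JUMP_ABSOLUTE' and instr.argval < instr.offset:
        return 'JUMP_BACK'
    return instr.opname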

Generator expressions, set and dictionary comprehensions

Since generator expressions are the construct for expressing queries, we need to make sure we can recognize them. However, all of these comprehensions are compiled using the same general algorithm, so we need to analyze them together.

Generator expressions work by creating a nested function where the loop is actually done, and calling it immediately with the iterable as its single argument:

dis.dis(compile('(a for a in this)', '', 'eval'))  # doctest: +SKIP
 1           0 LOAD_CONST               0 (<code object <genexpr> ...>)
             3 MAKE_FUNCTION            0
             6 LOAD_NAME                0 (this)
             9 GET_ITER
            10 CALL_FUNCTION            1
            13 RETURN_VALUE

A set comprehension is indistinguishable from a generator expression:

dis.dis(compile('{a for a in this}', '', 'eval'))  # doctest: +SKIP
 1           0 LOAD_CONST               0 (<code object <setcomp> ...>)
             3 MAKE_FUNCTION            0
             6 LOAD_NAME                0 (this)
             9 GET_ITER
            10 CALL_FUNCTION            1
            13 RETURN_VALUE

The easiest way to distinguish whether we’re looking at a set comprehension, a dictionary comprehension or a generator expression is by realizing that the code object has a unique name for each: “<genexpr>” for generator expressions, “<setcomp>” for set comprehensions and “<dictcomp>” for dictionary comprehensions [2].
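
For instance, a small sketch that extracts the nested code object and prints its name:

import types

def inner_code(source):
    outer = compile(source, '', 'eval')
    # The nested function's code object is stored among the constants.
    return next(c for c in outer.co_consts if isinstance(c, types.CodeType))

print(inner_code('(a for a in x)').co_name)      # <genexpr>
print(inner_code('{a for a in x}').co_name)      # <setcomp>
print(inner_code('{a: 1 for a in x}').co_name)   # <dictcomp>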

This fact is used by the scanner to generate virtual tokens for each kind of comprehension: LOAD_GENEXPR, LOAD_DICTCOMP, LOAD_SETCOMP and LOAD_LAMBDA. The last one is not related to comprehensions, but it is easy to show the similarity:

dis.dis(compile('(lambda t: None)(this)', '', 'eval')) # doctest: +SKIP
 1           0 LOAD_CONST               0 (<code object <lambda> ...>)
             3 MAKE_FUNCTION            0
             6 LOAD_NAME                0 (this)
             9 CALL_FUNCTION            1
            12 RETURN_VALUE

This shows that the parser needs to recognize this simple structure for comprehensions:

_comprehension ::= MAKE_FUNCTION_0 expr GET_ITER CALL_FUNCTION_1

genexpr  ::= LOAD_GENEXPR  _comprehension
setcomp  ::= LOAD_SETCOMP  _comprehension
dictcomp ::= LOAD_DICTCOMP _comprehension

The nested function can be inspected:

dis.dis(compile('(a for a in this)', '', 'eval').co_consts[0])
 1           0 LOAD_FAST                0 (.0)
       >>    3 FOR_ITER                11 (to 17)
             6 STORE_FAST               1 (a)
             9 LOAD_FAST                1 (a)
            12 YIELD_VALUE
            13 POP_TOP
            14 JUMP_ABSOLUTE            3
       >>   17 LOAD_CONST               0 (None)
            20 RETURN_VALUE

Note

Support for generator expressions from Python 2.7 and 3.2+.

Generator expressions were introduced in Python 2.4, but the byte-code was actually cleaned up in Python 2.7, when block-related byte-code was removed. In the related function in ‘Python/compile.c’ we read:

This code *knows* that the loop cannot contain break, continue, or
return, so it cheats and skips the SETUP_LOOP/POP_BLOCK steps used in
normal loops.

We rely on this fact.

List comprehensions

Since (at least) Python 3.2, list comprehensions take the form of the other comprehensions, thus we’ll need a virtual code LOAD_LISTCOMP and a rule:

listcomp ::= LOAD_LISTCOMP _comprehension

Yet the code object of a list comprehension needs to be analyzed. We’ll show how to deal with Python 2.7 list comprehensions first and then see how the nested function in Python 3.2+ is to be integrated.

In Pypy there’s a BUILD_LIST_FROM_ARG opcode to speed list comprehensions up. This is basically an informed BUILD_LIST: it peeks at the TOS and tries to guess the length of the list it will build; if it can’t, it behaves as a BUILD_LIST 0.

A list comprehension “[a for a in b]” will look like:

1           0 LOAD_NAME                0 (b)
            3 BUILD_LIST_FROM_ARG      0
            6 GET_ITER
      >>    7 FOR_ITER                12 (to 22)
           10 STORE_NAME               1 (a)
           13 LOAD_NAME                1 (a)
           16 LIST_APPEND              2
           19 JUMP_ABSOLUTE            7
      >>   22 RETURN_VALUE

while in Python 2.7 the same comprehension yields:

1           0 BUILD_LIST               0
            3 LOAD_NAME                0 (b)
            6 GET_ITER
      >>    7 FOR_ITER                12 (to 22)
           10 STORE_NAME               1 (a)
           13 LOAD_NAME                1 (a)
           16 LIST_APPEND              2
           19 JUMP_ABSOLUTE            7
      >>   22 RETURN_VALUE

Notice that the only change is that Pypy doesn’t use the BUILD_LIST 0 but adds a BUILD_LIST_FROM_ARG after loading the iterable.

We can’t simply swap the codes in this case; their order is also affected.

We think this issue should be addressed by the parser and not the scanner since the iterable can be any expression, i.e. not a single LOAD_NAME.
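
One possible shape for such rules (a sketch only, not the actual grammar):

list_iterable ::= BUILD_LIST_0 expr GET_ITER           (CPython 2.7)
list_iterable ::= expr BUILD_LIST_FROM_ARG GET_ITER    (Pypy)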

[1](1, 2) In this document we use angular brackets to distinguish tokens from byte-code instruction names.
[2]

Since CPython 3.3, MAKE_FUNCTION will actually be preceded by two LOAD_CONST, the second one with the function qualified name. But we can ignore that fact for the analysis.

It will be easily treated by overriding the rule _comprehension:

_comprehension ::= LOAD_CONST MAKE_FUNCTION_0 expr GET_ITER
                   CALL_FUNCTION_1
Terms used in this document
absolute jump
Any jump instruction that takes the actual target offset it jumps to: JUMP_ABSOLUTE and all the conditional jumps.
conditional jump

An instruction that only jumps conditionally: JUMP_IF_FALSE_OR_POP, JUMP_IF_TRUE_OR_POP, POP_JUMP_IF_FALSE, and POP_JUMP_IF_TRUE.

Note

Despite the fact that FOR_ITER only jumps when the iterator is exhausted, we consider it neither a conditional jump nor an unconditional jump.

jump
jump instruction
Any instruction that jumps: conditional jump, unconditional jump, relative jump and absolute jump.
relative jump

Any jump instruction that takes a delta argument to calculate the offset of the target: FOR_ITER, and JUMP_FORWARD.

Note

We don’t list all the instructions but only those that show up in expressions.

target
This term is used loosely to mean either the offset (after resolution for relative jumps), or the instruction at that offset.
unconditional jump
Any of the instructions JUMP_FORWARD and JUMP_ABSOLUTE.
Optimizations
  • If the target of JUMP_IF_FALSE_OR_POP (JUMP_IF_TRUE_OR_POP) is a conditional jump of the same truth-testing kind then update the target to be that of the previous target.

    This is the rule that collapses several and and or together.

  • If the target of JUMP_IF_*_OR_POP is a conditional jump of the opposite truth-testing kind, change the instruction to POP_JUMP_IF_* targeting the next instruction of the previous target.

    This is what happens for x and a or b and also for (x or a) and b:

    dis.dis(compile('x and a or b', '', 'eval'))
     1           0 LOAD_NAME                0 (x)
                 3 POP_JUMP_IF_FALSE       12
                 6 LOAD_NAME                1 (a)
                 9 JUMP_IF_TRUE_OR_POP     15
           >>   12 LOAD_NAME                2 (b)
           >>   15 RETURN_VALUE
    
    dis.dis(compile('(x or a) and b', '', 'eval'))
     1           0 LOAD_NAME                0 (x)
                 3 POP_JUMP_IF_TRUE        12
                 6 LOAD_NAME                1 (a)
                 9 JUMP_IF_FALSE_OR_POP    15
           >>   12 LOAD_NAME                2 (b)
           >>   15 RETURN_VALUE
    

... and others.

API of the module
xotl.ql.revenge.scanners.keep_single_return(instructions)

Transform the instructions so that a single RETURN_VALUE is kept at the bottom.

If the original instructions don’t have a RETURN_VALUE at the very end, return the same instruction set.

Otherwise all but the last RETURN_VALUE will be replaced by a JUMP_FORWARD that targets the RETURN_VALUE at the end of the instructions.

Offsets and jump targets are updated.

Parameters:instructions (iterable) – The original instructions.
Return type:list
xotl.ql.revenge.scanners.without_nops(instructions)

Return the same instruction set with NOPs removed.

The jump targets are properly updated.

Loosely based on the same algorithm in Python/peephole.c.

Parameters:instructions – An iterable of instructions.
Returns:A generator of instructions.
xotl.ql.revenge.scanners.normalize_pypy_conditional(instructions)

Apply the pypy normalization rule.

The pypy normalization rule states that:

If the target of a JUMP_FORWARD (or JUMP_ABSOLUTE) is a RETURN_VALUE replace the JUMP with the following instructions:

RETURN_VALUE
NOP
NOP

This rule does not only apply when using Pypy; the name simply comes from the fact that Pypy compiles conditional expressions using JUMPs.

Parameters:instructions – An iterable of instructions.
Returns:A generator of instructions.
class xotl.ql.revenge.scanners.Token(name, arg=None, argval=None, argrepr=None, offset=-1, starts_line=False, instruction=None)

Class representing a byte-code token.

A byte-code token is equivalent to the contents of one line as output by dis.dis().

Footnotes
[3]Many of the examples in this document won’t look the same when you execute them on different versions or implementations of Python.

Query Syntax Tree

A Query Syntax Tree (QST) is just an enriched, but restricted to expressions, variant of the Abstract Syntax Tree (AST) provided by the standard ast module.

The change in name obeys the following rationale:

  • We don’t use the ast objects directly since they don’t support comparison. Nevertheless, we’ll use the names in that module to express the same syntactical element in Query Syntax Trees.
  • We don’t need any construction that does not appear in expressions; e.g. ast.FunctionDef, ast.While and others won’t appear in any Query Syntax Tree. Although ast.Yield is an expression, it is not valid within a Query.
  • We’d like to emphasize we describe queries and not any kind of program.

Monad Comprehensions

Monad comprehensions.

Warning

As noted in [QLFunc], algebra operators are an “abstraction of the algorithms implemented by target query engines.” Therefore, the following implementations of such operators are not designed to provide an efficient representation of those algorithms and data structures.

Algebra operator as abstractions

To stress the idea of abstraction of algorithms, notice that engines may proceed differently to obtain the result of \({\bf map}_f l\).

If we can sensibly split \(l\) into \(l_1, l_2, ..., l_n\) then:

\[{\bf map}_f l = {\bf map}_f (l_1 \oplus l_2 \oplus{} ... \oplus l_n) = \ {\bf map}_f {\bf join}([l_1, l_2, ..., l_n]) = \ {\bf join}([{\bf map}_f l_1, {\bf map}_f l_2, ..., {\bf map}_f l_n])\]

In fact, many times \(l\) is actually stored in chunks; it may be stored distributed across several nodes or in a single node but split in several size-bounded buckets.

Therefore, instead of getting all the pieces of \(l\) and then performing the map over the entire collection, we can perform the map over the pieces and then join them to get the result. The engine can make any choice it thinks is best.
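
A concrete (if trivial) check of that equality in Python:

from itertools import chain

f = lambda x: x * 2
l1, l2 = [1, 2], [3, 4]

# Mapping over the whole collection...
whole = list(map(f, l1 + l2))
# ...gives the same result as joining the maps over the pieces.
pieces = list(chain(map(f, l1), map(f, l2)))
assert whole == pieces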

Refer to [MCQL] for more about catamorphisms and catamorphism fusion. See note on foldr notation.

Internal representation

class xotl.ql.translation._monads.Empty

Any empty collection.

class xotl.ql.translation._monads.Cons(x, xs)

This is an abstract representation of the “\(x :^\tau xs\)” operation as referred to in [QLFunc]. It’s not meant to be efficient or to be used as a true collection for Python programs. It will recursively build all ‘Cons’ when instantiated, so it may hit the maximum recursion depth very soon.

The xs must be a collection or another Cons object. If a collection is passed it will be converted to a Cons object.

There’s no built-in concept of equality or equivalence since that would require a specific type. The proposition:

Cons(1, Cons(2, Empty())) == Cons(2, Cons(1, Empty()))

would be True for sets and bags but not for lists.

Cons support the extraction of head and tail we see in functional languages:

>>> from xotl.ql.translation._monads import Cons
>>> head, tail = Cons(1, [2, 3])
>>> head
1
>>> tail
Cons(2, Cons(3, Empty()))

Notice that unless you build the structure with Cons itself, the value of the head may not be well defined:

>>> _, (head, tail) = Cons(1, {128, 90})   # Is head 128 or 90?

There’s no direct support for walking the Cons besides its ability to extract the head and tail. Walking is easily defined by recursion or iteration:

>>> def walk(c):
...    h, t = c
...    while t:
...       yield h
...       h, t = t
...    yield h
>>> list(walk(Cons(1, [2, 3])))
[1, 2, 3]

Any of the arguments may take the value xoutil.Undefined to “partialize” the constructor. Using this feature you may declare the monadic Unit operator as:

>>> from xoutil import Undefined
>>> Unit = Cons(Undefined, [])

And then use it like:

>>> Unit(1)
Cons(1, Empty())

You can’t walk a partial Cons:

>>> head, tail = Unit                             
Traceback (most recent call last):
...
TypeError: Cons as a partial function cannot be iterated
class xotl.ql.translation._monads.Foldr(operator, initial, collection)

The structural recursion operator.

foldr is defined by:

\[\begin{eqnarray} {\bf foldr}^\tau & :: & (\alpha \rightarrow \beta \rightarrow \beta) \rightarrow \beta \rightarrow \tau\ \alpha \rightarrow \beta \\ \\ {\bf foldr}^\tau (\oplus) z []^\tau & = & z \\ {\bf foldr}^\tau (\oplus)\ z\ (x :^\tau xs) & = & x \oplus ({\bf foldr}^\tau (\oplus)\ z\ xs) \end{eqnarray}\]

The foldr operation is also known as the reduce function. In fact Foldr(func, initial, coll)() returns the same result as reduce(func, coll.asiter(), initial):

>>> import operator
>>> from xotl.ql.translation._monads import Foldr
>>> Foldr(operator.add, 0, Cons(1, [2]))()
3
>>> from functools import reduce
>>> reduce(operator.add, Cons(1, [2]).asiter(), 0)
3

Foldr instances have the following attributes:

operator

The operator applied in each step of the computation.

arg

The initial value z.

collection

The collection of values.

Any of the attributes can be Undefined to make the instance a partial definition.

Calling a non-partial Foldr instance traverses recursively the collection and applies the given operator. If the collection is large enough this may hit the maximum recursion limit.

Calling a partial Foldr instance returns another Foldr instance if not enough arguments were provided to render it non-partial. If enough arguments are provided it behaves as if it were non-partial. This makes it easy to build ‘functions’ based on the first two arguments, as in all = Foldr(operator.and_, True).

As noted in [QLFunc] the Cons operator (\(:^\tau\)) needs to be further specified for sets and bags. Also the “\(\oplus\)” infix operator needs to be commutative if \(:^\tau\) is left-commutative, and idempotent if \(:^\tau\) is left-idempotent.

For the purposes of xotl.ql this class is only meant for description and not functionality. So these questions are not directly addressed.

class xotl.ql.translation._monads.Union(xs, ys)

The Union operation.

Unions are defined over Cons instances. Union instances are callables that perform the union when called.

Creating a Union:

>>> from xotl.ql.translation._monads import Union, Cons
>>> whole = Union(Cons(1, []), Cons(2, []))
>>> whole
Union(Cons(1, Empty()), Cons(2, Empty()))

Calling the union instance performs the union:

>>> whole()
Cons(1, Cons(2, Empty()))

A Union may be also be a partial by leaving one of its arguments Undefined:

>>> partial = Union(Undefined, Cons(1, []))

Calling a partial union returns the same object if no arguments are passed, or a performed union:

>>> partial() is partial
True
>>> partial(Cons(2, []))
Cons(2, Cons(1, Empty()))
Sorting

Sorting is only sensibly defined over lists. Our Cons constructor can be regarded as a list constructor. The problem of sorting a list can be defined with Foldr as well.

We simply need to define the \(x :^{\tau}_{\small <} xs\) operator that inserts x into xs in the “right” position assuming xs is already sorted.

\(x :^{\tau}_{\small <} xs\) is easily defined as:

\[\begin{eqnarray} :^\tau_< \quad & :: & \alpha \rightarrow \tau\alpha \rightarrow \tau\alpha \\ \\ x :^\tau_< [] & = & x :^\tau [] \\ x :^\tau_< (y :^\tau ys) & = & {\bf if}\, x < y\,\, {\bf then}\, x :^\tau (y :^\tau ys)\,\, {\bf else}\, y :^\tau (x :^\tau_< ys) \end{eqnarray}\]

Now sorting can be achieved by:

\[\begin{split}{\bf sort}^\tau_< = {\bf foldr} :^\tau_< []\end{split}\]

Defining \(:^\tau_>\) is just as easy, and then \({\bf sort}^\tau_>\) can be defined as well.

Sorting is only well defined if \(<\) (or \(>\)) are also properly defined.

Yet, sorting can be expensive and engines are allowed to take shortcuts. For instance, Google’s MapReduce [MAPRED] always sorts the result of a map by key (all values in Google’s MapReduce are key-value pairs).

Nevertheless, it can be useful to include sorting in our algebraic definition so that ordering instructions can be explicitly represented.

Notice, however, that Python generator expressions don’t directly support expressing ordering. Other query languages like SQL do support it.

class xotl.ql.translation._monads.SortedCons(order)

The sorted insertion operation.

Parameters:order

The ordering function. It may be one of the strings ‘<’, ‘<=’, ‘>’, ‘>=’ or any callable that accepts two arguments x, y and returns True if x is in the right order with regards to y.

For instance, operator.lt() is a valid argument – in fact, ‘<’ is just an alias for it.

SortedCons won’t sort a list. It will simply “push” its first argument until it matches the order function:

>>> from xotl.ql.translation._monads import SortedCons, Empty
>>> SortedCons('<')(49, Cons(30, Cons(50, Cons(10, [-1]))))
Cons(30, Cons(49, Cons(50, Cons(10, Cons(-1, Empty())))))

Using Foldr we obtain a sort function:

>>> Foldr(SortedCons('<'), Empty())(Cons(30, Cons(49, Cons(50, Cons(-1, Empty())))))
Cons(-1, Cons(30, Cons(49, Cons(50, Empty()))))

>>> Foldr(SortedCons('>'), Empty())(Cons(30, Cons(49, Cons(50, Cons(-1, Empty())))))
Cons(50, Cons(49, Cons(30, Cons(-1, Empty()))))

Large collections

Although this module is not meant for execution of these operations, and thus truly large collections are out of the question, we have a LazyCons class that allows representing large collections. However, the current definition of Foldr makes it impractical to really work with them.

class xotl.ql.translation._monads.LazyCons(x, xs)

A Cons that does not iterate over its arguments until needed.

Cons can’t represent a collection that exceeds the recursion limit of the underlying representation. LazyCons can represent such collections:

>>> from xotl.ql.translation._monads import LazyCons
>>> from xoutil.eight import range
>>> lc = LazyCons(1, range(10**6))
>>> lc                                            
LazyCons(1, ...range(...1000000))
>>> len(lc.aslist())
1000001

As with Cons the standard operation is to extract head and tail. The tail is always a LazyCons instance:

>>> head, tail = lc
>>> head
1
>>> tail                                         
LazyCons(0, <range...iterator...>)

It may even represent unbounded collections:

>>> import itertools
>>> lc = LazyCons(1, itertools.count(2))
>>> lc
LazyCons(1, count(2))

However, you must be careful while iterating over such a collection:

>>> len(list(itertools.takewhile(lambda x: x < 100, lc.asiter())))
99

Warning

LazyCons is not provided for performance or efficiency.

Like all the objects in this module, LazyCons is not meant to be used in production but to test expectations about the execution plans you may be using, based on monad comprehensions.

We needed a lazy version of Cons in the xotl.ql.translation.py module.

Memento for mathematical terms

monoid

An algebraic structure with an associative operator ++ having a distinguished element zero (0) as left and right identity:

0 ++ x = x
x ++ 0 = x

https://en.wikipedia.org/wiki/Monoid
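
A concrete instance, checked in Python with lists and concatenation:

x = [1, 2]
assert [] + x == x == x + []                       # identity
assert ([1] + [2]) + [3] == [1] + ([2] + [3])      # associativity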

I don’t know why in [MCQL] they claim ([], :) is a monoid. The structure of all lists with the concatenation operator (++) is a monoid. Since all lists and the operator ++ are fully defined using [] and :, it’s plausible to believe they refer to the monoid generated by those constructors.

Just another scratch pad

Notes on [Burstall1977]

  • Laws. Well expressed as known properties about the objects: associativity, commutativity, etc.

    1. i + j — Does not commute on strings, but it does associate.
    2. A * M — Commutes if A is a scalar and M a matrix, but does not commute if both are matrices. It associates for scalars and matrices alike, even when they are mixed.

    In xotl.ql, this kind of information can only be provided by isinstance() inside a query expression, or by the object model (known only to the translator).

Notes on [MCQL]

  • Catamorphisms. They express the property of the spine transformers as:

    (z; +) ⋅ cata n c  =  cata z +
    

    where + is the general list-appending (or set/multiset member inclusion) operator.

    This means that if we can prove that cata is a catamorphism, we can optimize by applying it to the results (a plain-Python illustration follows).
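
The practical payoff can be sketched in plain Python (an illustration of fold fusion in general, not the exact law of [MCQL]): a fold composed with a pass that only rebuilds the spine can be fused into a single fold over the source:

from functools import reduce

xs = [1, 2, 3, 4]

# Two passes: transform the elements, then fold the result ...
two_passes = reduce(lambda acc, x: acc + x, [2 * x for x in xs], 0)

# ... fused into a single fold over the original list.
one_pass = reduce(lambda acc, x: acc + 2 * x, xs, 0)

assert two_passes == one_pass == 20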

References

[UnQL] Peter Buneman, Susan Davidson, Gerd Hillebrand, and Dan Suciu. “A Query Language and Optimization Techniques for Unstructured Data”. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.2802 http://dl.acm.org/citation.cfm?id=233368 http://homepages.inf.ed.ac.uk/opb/papers/SIGMOD1996.pdf
[coSQL2011] Erik Meijer and Gavin Bierman. “A co-Relational Model of Data for Large Shared Data Banks”. Comm. ACM, 54(4), April 2011. http://queue.acm.org/detail.cfm?id=1961297
[coSQL2012] Maarten Fokkinga. “SQL versus coSQL — a compendium to Erik Meijer’s paper”. Jan 2012. http://wwwhome.ewi.utwente.nl/~fokkinga/mmf2011p.pdf
[MCQL] Torsten Grust. “Monad Comprehensions: A Versatile Representation for Queries”. University of Konstanz, Department of Computer and Information Science, 78457 Konstanz, Germany.
[Peyton1987] Peyton Jones, Simon. “The Implementation of Functional Programming Languages”. Prentice-Hall, 1987. Available at: http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/start.htm
[Burstall1977] Burstall, R. M. and Darlington, John. “A Transformation System for Developing Recursive Programs”. Journal of the Association for Computing Machinery, Vol. 24, No. 1, January 1977, pp. 44-67. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.4684
[QLFunc] Torsten Grust and Marc H. Scholl. “How to comprehend queries functionally”. University of Konstanz, Department of Computer and Information Science, 78457 Konstanz, Germany.
[MAPRED] Dean, Jeffrey and Ghemawat, Sanjay. “MapReduce: Simplified Data Processing on Large Clusters”. Google Inc. Available at ...
[Hutton1999] Hutton, Graham. “A tutorial on the universality and expressiveness of fold”. University of Nottingham, UK.

Notes

foldr notation

[1] Notational differences for the same concept: whereas in [QLFunc] we see \({\bf foldr}^\tau (\oplus)\, z\, []^\tau\), in [MCQL] we see \((|\, z; \oplus\, |)\).

We choose a notation that’s easy to read in Python code comments and inline documentation. In this document (especially the parts extracted from source code) you’ll see foldr z + and (z; +) instead.

[2] Why does [MCQL] say \({\small ([], \uparrow)}\) is a monoid if \({\small \uparrow :: a \times [a] \rightarrow [a]}\) and \({\small x \uparrow [] = [x] \neq x}\)?

Index of the change proposals

This page contains change proposals that might have an impact on the external API (i.e., the API of the expression language and everything that is directly or indirectly accessible from a query object; this is the API on which query translators depend).

The objective of this index is to record all the proposals (or mumblings), whether or not they end up being introduced into this package. We believe this will help clarify some of the rationale behind the design decisions that have been made.

Index of the proposals:

Simpler xotl.ql.interfaces.ITerm interface

This proposes to simplify xotl.ql.interfaces.ITerm by removing the parent attribute. The main reason is to make the query object closer to the syntactical structure of the expression.

XEP: 1 – Simpler xotl.ql.interfaces.ITerm interface
Status: Obsolete, Removed
Author: Manuel Vázquez Acosta
Branch: none
Rationale

Currently, the interface for ITerm has a parent attribute. That attribute relates a term to another in the same way a getattr operation would do.

So it may be argued that this is actually an operation expressed by the dot operator, and not a term hierarchy. In fact, at the syntactical level there is no such thing as a parent-child hierarchy for terms; only a <term> DOT <term> construction is present at that level.

This XEP proposed that the expression:

parent.child

be represented equivalently to:

getattribute(parent, 'child')
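
A hypothetical sketch of the two shapes being discussed (all names below are invented for illustration; they are not the actual xotl.ql classes):

from collections import namedtuple

Term = namedtuple('Term', ['name', 'parent'])
GetAttr = namedtuple('GetAttr', ['target', 'attr'])

# Current shape: the child term carries a reference to its parent term.
hierarchical = Term('child', parent=Term('parent', parent=None))

# Proposed shape: a plain getattribute operation over a term and a name.
operational = GetAttr(Term('parent', parent=None), 'child')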
Inclusion rationale
Changes in the API

There are two changes in the API:

  • The removal of parent since it won’t be needed any more.
  • The generalization of xotl.ql.interfaces.IGeneratorToken to support expressions.

Currently a translator must have a look-up table from a token’s terms to the current objects (names), and must take into account that every term in the filters is bound to a token.

Let’s see how this might work in a relational setting for the query:

these(child
      for parent in this
      if parent.children & parent.age > 34
      for child in parent.children
      if child.age < 6)
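
A hypothetical sketch of what a relational translator might derive from this query (the table and column names are invented; xotl.ql itself emits nothing of the sort): child is correlated with parent because it is drawn from parent.children, which suggests a JOIN:

SQL = '''
SELECT child.*
FROM person AS parent
JOIN person AS child ON child.parent_id = parent.id
WHERE parent.age > 34 AND child.age < 6
'''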
Non-inclusion rationale

The current implementation matches an AST more closely than transforming the DOT into a getattribute would.

If a translator is best understood (or implemented) by doing that transformation, it may do so internally.

It’s probably easier with the current ITerm to see whether we need to correlate two terms (i.e., emitting a JOIN in an SQL query) by looking at the terms’ parents. And it should not be complicated to know how to fetch an attribute from a term, given that the binding is inherited from its parents.

Use Python byte-code reverse engineering

This proposal removes the entire API in favor of byte-code reverse engineering.

XEP: 002 – Use Python byte-code reverse engineering
Status: Approved, In progress.
Author: Manuel Vázquez Acosta
Branch: develop
Rationale

Trying to guess the AST of a query has proven cumbersome, requiring several hacks like the “particles bubble” to capture events that happened long before.

Furthermore, the queries needed to be specially crafted for the query language, i.e. we could not use the and, or, in, is and other keywords in the queries. The same happened for built-in functions like all(), any(), sum(), max() and min(). They were all being replicated in the expression language.
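
A small generic-Python illustration (not xotl.ql code) of why the keywords had to be replaced: & can be intercepted through operator overloading, while and cannot:

class Expr:
    # A toy expression node that records the operations applied to it.
    def __init__(self, name):
        self.name = name
    def __and__(self, other):
        return Expr('({0} AND {1})'.format(self.name, other.name))

a, b = Expr('a'), Expr('b')
print((a & b).name)   # prints '(a AND b)': the expression was captured
result = a and b      # just `b`: `and` short-circuits and cannot be overloaded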

Since then, the uncompyle2 package has shown an Earley grammar that’s fairly easy to modify to build a recognizer for Python’s byte-code and extract an AST. Previous experimentation proved it would be practical to adapt it to the query language’s needs.

Besides, the article [MCQL] provides a language for queries. This language could be used to bootstrap an AST for queries in Python, taking into account several facts proven in the article that don’t apply to Python.

Inclusion rationale
Changes in the API

This is a total rewrite of the module, so the API is heavily changed. The following remarks are not the only things that changed, but some highlights of the changes:

  • We are not using zope.component anymore. It is deemed outside the scope of an AST for query languages.
  • The object this is becoming more nominal, since it won’t play a major role. Most of the time, it will only be used to bootstrap comprehensions.
  • The expression language is Python, so the module xotl.ql.expressions will be rethought as utilities for expressions or removed entirely.
Non-inclusion rationale

Not given.

What does xotl mean?

The word “xotl” is a Nahuatl word that means foundation or base. The xotl package comprises the foundation for building reliable systems, frameworks, and libraries. It also provides an object model that allows building complex systems.

It is expected that xotl will use xotl.ql to:

  • Express predicates defining object relationships
  • Query the object store (of course!)
  • Update the object store.