Monad Comprehensions

Monad comprehensions.

Warning

As noted in [QLFunc] algebra operators are an “abstraction of the algorithms implemented by target query engines.” Therefore, the following implementation of such operators are not designed to provide an efficient representation of those algorithms and data structures.

Algebra operator as abstractions

To stress the idea of abstraction of algorithms, notice that engines may proceed differently to obtain the result of \({\bf map}_f l\).

If we can sensibly split \(l\) into \(l_1, l_2, ..., l_n\) then:

\[{\bf map}_f l = {\bf map}_f (l_1 \oplus l_2 \oplus{} ... \oplus l_n) = \ {\bf map}_f {\bf join}([l_1, l_2, ..., l_n]) = \ {\bf join}([{\bf map}_f l_1, {\bf map}_f l_2, ..., {\bf map}_f l_n])\]

In fact, many times \(l\) is actually stored in chunks; it may be stored distributed across several nodes or in a single node but split in several size-bounded buckets.

Therefore, instead of getting all the pieces of \(l\) and then performing the map over the entire collection, we can perform the map over the pieces and then join them to get the result. The engine can make any choice it thinks it’s the best.

Refer to [MCQL] for more about catamorphisms and catamorphism fusion. See note on foldr notation.

Internal representation

class xotl.ql.translation._monads.Empty

Any empty collection.

class xotl.ql.translation._monads.Cons(x, xs)

This is an abstract representation of the “\(x :^\tau xs\)” operation as referred in [QLFunc]. It’s not meant to be efficient or to be used as true collection for Python programs. It will recursively build all ‘Cons’ when instantiated so it may hit the maximum recursion depth very soon.

The xs must be a collection or another Cons object. If a collection is passed it will be converted to a Cons object.

There’s no built-in concept of equality or equivalence since that would require a specific type. The proposition:

Cons(1, Cons(2, Empty())) == Cons(2, Cons(1, Empty()))

would be True for sets and bags but not for lists.

Cons support the extraction of head and tail we see in functional languages:

>>> from xotl.ql.translation._monads import Cons
>>> head, tail = Cons(1, [2, 3])
>>> head
1
>>> tail
Cons(2, Cons(3, Empty()))

Notice that unless you build the structure with Cons itself it may not be well defined the value of the head:

>>> _, (head, tail) = Cons(1, {128, 90})   # Is head 128 or 90?

There’s no direct support for walking the Cons besides its ability to extract the head and tail. Walking is easily defined by recursion or iteration:

>>> def walk(c):
...    h, t = c
...    while t:
...       yield h
...       h, t = t
...    yield h
>>> list(walk(Cons(1, [2, 3])))
[1, 2, 3]

Any of the arguments may take the value be xoutil.Undefined to “partialize” the constructor. Using this feature you may declare the monadic Unit operator as:

>>> from xoutil import Undefined
>>> Unit = Cons(Undefined, [])

And then use it like:

>>> Unit(1)
Cons(1, Empty())

You can’t walk a partial Cons:

>>> head, tail = Unit                             
Traceback (most recent call last):
...
TypeError: Cons as a partial function cannot be iterated
class xotl.ql.translation._monads.Foldr(operator, initial, collection)

The structural recursion operator.

foldr is defined by:

\[\begin{eqnarray} {\bf foldr}^\tau & :: & (\alpha \rightarrow \beta \rightarrow \beta) \rightarrow \beta \rightarrow \tau\ \alpha \rightarrow \beta \\ \\ {\bf foldr}^\tau (\oplus) z []^\tau & = & z \\ {\bf foldr}^\tau (\oplus)\ z\ (x :^\tau xs) & = & x \oplus ({\bf foldr}^\tau (\oplus)\ z\ xs) \end{eqnarray}\]

The foldr operation is also known as the reduce function. In fact Foldr(func, initial, coll)() returns the same result as reduce(func, coll.asiter(), initial):

>>> import operator
>>> from xotl.ql.translation._monads import Foldr
>>> Foldr(operator.add, 0, Cons(1, [2]))()
3
>>> from functools import reduce
>>> reduce(operator.add, Cons(1, [2]).asiter(), 0)
3

Foldr instances have the following attributes:

operator

The operator applied in each step of the computation.

arg

The initial value z.

collection

The collection of values.

Any of the attributes can be Undefined to make the instance a partial definition.

Calling a non-partial Foldr instance traverses recursively the collection and applies the given operator. If the collection is large enough this may hit the maximum recursion limit.

Calling a partial Foldr instance returns another Foldr instance if not enough arguments were provided in order to render it non-partial. If enough arguments are provided it behaves as if it were a non-partial. This makes it easy to build ‘functions’ based on the first two arguments as in all = Foldr(operator.and, True).

As noted in [QLFunc] the Cons operator (\(:^\tau\)) needs to be further specified for set and bags. Also the “\(\oplus\)” infix operator needs to be commutative if \(:^\tau\) is left-commutative and idempotent if \(:^\tau\) is left-idempotent.

For the purposes of xotl.ql this class is only meant for description and not functionality. So these questions are not directly addressed.

class xotl.ql.translation._monads.Union(xs, ys)

The Union operation.

Unions are defined over Cons instances. Unions instances are callables that perform the union when called.

Creating a Union:

>>> from xotl.ql.translation._monads import Union, Cons
>>> whole = Union(Cons(1, []), Cons(2, []))
>>> whole
Union(Cons(1, Empty()), Cons(2, Empty()))

Calling the union instance performs the union:

>>> whole()
Cons(1, Cons(2, Empty()))

A Union may be also be a partial by leaving one of its arguments Undefined:

>>> partial = Union(Undefined, Cons(1, []))

Calling partial unions will return the same object is no arguments are passed, or a performed union:

>>> partial() is partial
True
>>> partial(Cons(2, []))
Cons(2, Cons(1, Empty()))

Sorting

Sorting is only sensibly defined over lists. Our Cons constructor can be regarded as a list constructor. The problem of sorting a list can be defined with Foldr as well.

We simply need to define the \(x :^{\tau}_{\small <} xs\) operator that inserts x into xs in the “right” position assuming xs is already sorted.

\(x :^{\tau}_{\small <} xs\) is easily defined as:

\[\begin{eqnarray} :^\tau_< \quad & :: & \alpha \rightarrow \tau\alpha \rightarrow \tau\alpha \\ \\ x :^\tau_< [] & = & x :^\tau [] \\ x :^\tau_< (y :^\tau ys) & = & {\bf if}\, x < y\,\, {\bf then}\, x :^\tau (y :^\tau ys)\,\, {\bf else}\, y :^\tau (x :^\tau_< ys) \end{eqnarray}\]

Now sorting can be achieved by:

\[\begin{split}{\bf sort}^\tau_< = {\bf foldr} :^\tau_< []\end{split}\]

Defining \(:^\tau_>\) is just as easy, and then \({\bf sort}^\tau_>\) can be defined as well.

Sorting is only well defined if \(<\) (or \(>\)) are also properly defined.

Yet, sorting can be expensive and engines are allowed to take shortcuts. For instance, Google’s MapReduce [MAPRED] always sort the result of a map by keys (all values in Google’s MapReduce are a pair of key, value.)

Nevertheless, it can be useful to include sorting in our algebraic definition so that ordering instruction can be explicitly represented.

Notice, however, Python generator expressions don’t directly support expressing ordering. Other query languages like SQL do support them.

class xotl.ql.translation._monads.SortedCons(order)

The sorted insertion operation.

Parameters:order

The ordering function. It may be one of the strings ‘<’, ‘<=’, ‘>’, ‘>=’ or any callable that accepts two arguments x, y and returns True if x is in the right order with regards to y.

For instance, operator.lt() is a valid argument – in fact, ‘<’ is just an alias for it.

SortedCons won’t sort a list. It will simply “push” its first argument until it matches the order function:

>>> from xotl.ql.translation._monads import SortedCons, Empty
>>> SortedCons('<')(49, Cons(30, Cons(50, Cons(10, [-1]))))
Cons(30, Cons(49, Cons(50, Cons(10, Cons(-1, Empty())))))

Using Foldr we obtain a sort function:

>>> Foldr(SortedCons('<'), Empty())(Cons(30, Cons(49, Cons(50, Cons(-1, Empty())))))
Cons(-1, Cons(30, Cons(49, Cons(50, Empty()))))

>>> Foldr(SortedCons('>'), Empty())(Cons(30, Cons(49, Cons(50, Cons(-1, Empty())))))
Cons(50, Cons(49, Cons(30, Cons(-1, Empty()))))

Large collections

Although this module is not meant for execution of these operations, and thus truly large collections are out the question, we have a LazyCons class that allows to represent large collections. However, the current definition of Foldr makes it unpractical to really work with them.

class xotl.ql.translation._monads.LazyCons(x, xs)

A Cons that does not iterate over its arguments until needed.

Cons can’t represent a collection that exceeds the recursion limit of the underlying representation. LazyCons can represent such collections:

>>> from xotl.ql.translation._monads import LazyCons
>>> from xoutil.eight import range
>>> lc = LazyCons(1, range(10**6))
>>> lc                                            
LazyCons(1, ...range(...1000000))
>>> len(lc.aslist())
1000001

As with Cons the standard operation is to extract head and tail. The tail is always a LazyCons instance:

>>> head, tail = lc
>>> head
1
>>> tail                                         
LazyCons(0, <range...iterator...>)

It may even represent unbounded collections:

>>> import itertools
>>> lc = LazyCons(1, itertools.count(2))
>>> lc
LazyCons(1, count(2))

However you must be careful while iterating over such as collection:

>>> len(list(itertools.takewhile(lambda x: x < 100, lc.asiter())))
99

Warning

LazyCons is not provided for performance or efficiency.

Like all the objects in this module, LazyCons is not meant to be used in production but to test expectations about the execution plans you may be using based on monads comprehensions.

We needed a lazy version of Cons in out the xotl.ql.translation.py module

Memento for mathematical terms

monoid

An algebraic structure with an associative operator ++ having a distinguished element zero (0) as left and right identity:

0 ++ x = x
x ++ 0 = x

https://en.wikipedia.org/wiki/Monoid

I don’t know why in [MCQL] they claim ([], :) is a monoid. The structure of all lists with the concatenation operator (++) is a monoid. Since all lists and the operator ++ are fully defined using [] and :, it’s plausible to believe they refer to the monoid generated by those constructors.