Monad Comprehensions¶
Monad comprehensions.
Warning
As noted in [QLFunc] algebra operators are an “abstraction of the algorithms implemented by target query engines.” Therefore, the following implementation of such operators are not designed to provide an efficient representation of those algorithms and data structures.
Algebra operator as abstractions¶
To stress the idea of abstraction of algorithms, notice that engines may proceed differently to obtain the result of \({\bf map}_f l\).
If we can sensibly split \(l\) into \(l_1, l_2, ..., l_n\) then:
\[{\bf map}_f l = {\bf map}_f (l_1 \oplus l_2 \oplus{} ... \oplus l_n) = \ {\bf map}_f {\bf join}([l_1, l_2, ..., l_n]) = \ {\bf join}([{\bf map}_f l_1, {\bf map}_f l_2, ..., {\bf map}_f l_n])\]
In fact, many times \(l\) is actually stored in chunks; it may be stored distributed across several nodes or in a single node but split in several size-bounded buckets.
Therefore, instead of getting all the pieces of \(l\) and then performing the map over the entire collection, we can perform the map over the pieces and then join them to get the result. The engine can make any choice it thinks it’s the best.
Refer to [MCQL] for more about catamorphisms and catamorphism fusion. See note on foldr notation.
Internal representation¶
-
class
xotl.ql.translation._monads.
Empty
¶ Any empty collection.
-
class
xotl.ql.translation._monads.
Cons
(x, xs)¶ This is an abstract representation of the “\(x :^\tau xs\)” operation as referred in [QLFunc]. It’s not meant to be efficient or to be used as true collection for Python programs. It will recursively build all ‘Cons’ when instantiated so it may hit the maximum recursion depth very soon.
The xs must be a collection or another Cons object. If a collection is passed it will be converted to a Cons object.
There’s no built-in concept of equality or equivalence since that would require a specific type. The proposition:
Cons(1, Cons(2, Empty())) == Cons(2, Cons(1, Empty()))
would be True for sets and bags but not for lists.
Cons support the extraction of head and tail we see in functional languages:
>>> from xotl.ql.translation._monads import Cons >>> head, tail = Cons(1, [2, 3]) >>> head 1
>>> tail Cons(2, Cons(3, Empty()))
Notice that unless you build the structure with Cons itself it may not be well defined the value of the head:
>>> _, (head, tail) = Cons(1, {128, 90}) # Is head 128 or 90?
There’s no direct support for walking the Cons besides its ability to extract the head and tail. Walking is easily defined by recursion or iteration:
>>> def walk(c): ... h, t = c ... while t: ... yield h ... h, t = t ... yield h
>>> list(walk(Cons(1, [2, 3]))) [1, 2, 3]
Any of the arguments may take the value be
xoutil.Undefined
to “partialize” the constructor. Using this feature you may declare the monadic Unit operator as:>>> from xoutil import Undefined >>> Unit = Cons(Undefined, [])
And then use it like:
>>> Unit(1) Cons(1, Empty())
You can’t walk a partial Cons:
>>> head, tail = Unit Traceback (most recent call last): ... TypeError: Cons as a partial function cannot be iterated
-
class
xotl.ql.translation._monads.
Foldr
(operator, initial, collection)¶ The structural recursion operator.
foldr is defined by:
\[\begin{eqnarray} {\bf foldr}^\tau & :: & (\alpha \rightarrow \beta \rightarrow \beta) \rightarrow \beta \rightarrow \tau\ \alpha \rightarrow \beta \\ \\ {\bf foldr}^\tau (\oplus) z []^\tau & = & z \\ {\bf foldr}^\tau (\oplus)\ z\ (x :^\tau xs) & = & x \oplus ({\bf foldr}^\tau (\oplus)\ z\ xs) \end{eqnarray}\]The foldr operation is also known as the reduce function. In fact
Foldr(func, initial, coll)()
returns the same result asreduce(func, coll.asiter(), initial)
:>>> import operator >>> from xotl.ql.translation._monads import Foldr >>> Foldr(operator.add, 0, Cons(1, [2]))() 3
>>> from functools import reduce >>> reduce(operator.add, Cons(1, [2]).asiter(), 0) 3
Foldr
instances have the following attributes:-
operator
¶ The operator applied in each step of the computation.
-
arg
¶ The initial value z.
-
collection
¶ The collection of values.
Any of the attributes can be
Undefined
to make the instance a partial definition.Calling a non-partial
Foldr
instance traverses recursively the collection and applies the given operator. If the collection is large enough this may hit the maximum recursion limit.Calling a partial
Foldr
instance returns anotherFoldr
instance if not enough arguments were provided in order to render it non-partial. If enough arguments are provided it behaves as if it were a non-partial. This makes it easy to build ‘functions’ based on the first two arguments as inall = Foldr(operator.and, True)
.As noted in [QLFunc] the
Cons
operator (\(:^\tau\)) needs to be further specified for set and bags. Also the “\(\oplus\)” infix operator needs to be commutative if \(:^\tau\) is left-commutative and idempotent if \(:^\tau\) is left-idempotent.For the purposes of xotl.ql this class is only meant for description and not functionality. So these questions are not directly addressed.
-
-
class
xotl.ql.translation._monads.
Union
(xs, ys)¶ The Union operation.
Unions are defined over
Cons
instances. Unions instances are callables that perform the union when called.Creating a Union:
>>> from xotl.ql.translation._monads import Union, Cons >>> whole = Union(Cons(1, []), Cons(2, [])) >>> whole Union(Cons(1, Empty()), Cons(2, Empty()))
Calling the union instance performs the union:
>>> whole() Cons(1, Cons(2, Empty()))
A Union may be also be a partial by leaving one of its arguments Undefined:
>>> partial = Union(Undefined, Cons(1, []))
Calling partial unions will return the same object is no arguments are passed, or a performed union:
>>> partial() is partial True
>>> partial(Cons(2, [])) Cons(2, Cons(1, Empty()))
Sorting¶
Sorting is only sensibly defined over lists. Our Cons
constructor
can be regarded as a list constructor. The problem of sorting a list can be
defined with Foldr
as well.
We simply need to define the \(x :^{\tau}_{\small <} xs\) operator that inserts x into xs in the “right” position assuming xs is already sorted.
\(x :^{\tau}_{\small <} xs\) is easily defined as:
Now sorting can be achieved by:
Defining \(:^\tau_>\) is just as easy, and then \({\bf sort}^\tau_>\) can be defined as well.
Sorting is only well defined if \(<\) (or \(>\)) are also properly defined.
Yet, sorting can be expensive and engines are allowed to take shortcuts. For
instance, Google’s MapReduce [MAPRED] always sort the result of a map by
keys (all values in Google’s MapReduce are a pair of key, value
.)
Nevertheless, it can be useful to include sorting in our algebraic definition so that ordering instruction can be explicitly represented.
Notice, however, Python generator expressions don’t directly support expressing ordering. Other query languages like SQL do support them.
-
class
xotl.ql.translation._monads.
SortedCons
(order)¶ The sorted insertion operation.
Parameters: order – The ordering function. It may be one of the strings ‘<’, ‘<=’, ‘>’, ‘>=’ or any callable that accepts two arguments x, y and returns True if x is in the right order with regards to y.
For instance,
operator.lt()
is a valid argument – in fact, ‘<’ is just an alias for it.SortedCons
won’t sort a list. It will simply “push” its first argument until it matches the order function:>>> from xotl.ql.translation._monads import SortedCons, Empty >>> SortedCons('<')(49, Cons(30, Cons(50, Cons(10, [-1])))) Cons(30, Cons(49, Cons(50, Cons(10, Cons(-1, Empty())))))
Using Foldr we obtain a sort function:
>>> Foldr(SortedCons('<'), Empty())(Cons(30, Cons(49, Cons(50, Cons(-1, Empty()))))) Cons(-1, Cons(30, Cons(49, Cons(50, Empty())))) >>> Foldr(SortedCons('>'), Empty())(Cons(30, Cons(49, Cons(50, Cons(-1, Empty()))))) Cons(50, Cons(49, Cons(30, Cons(-1, Empty()))))
Large collections¶
Although this module is not meant for execution of these operations, and thus
truly large collections are out the question, we have a LazyCons
class that allows to represent large collections. However, the current
definition of Foldr
makes it unpractical to really work with them.
-
class
xotl.ql.translation._monads.
LazyCons
(x, xs)¶ A Cons that does not iterate over its arguments until needed.
Cons
can’t represent a collection that exceeds the recursion limit of the underlying representation. LazyCons can represent such collections:>>> from xotl.ql.translation._monads import LazyCons >>> from xoutil.eight import range >>> lc = LazyCons(1, range(10**6)) >>> lc LazyCons(1, ...range(...1000000))
>>> len(lc.aslist()) 1000001
As with
Cons
the standard operation is to extract head and tail. The tail is always aLazyCons
instance:>>> head, tail = lc >>> head 1
>>> tail LazyCons(0, <range...iterator...>)
It may even represent unbounded collections:
>>> import itertools >>> lc = LazyCons(1, itertools.count(2)) >>> lc LazyCons(1, count(2))
However you must be careful while iterating over such as collection:
>>> len(list(itertools.takewhile(lambda x: x < 100, lc.asiter()))) 99
Warning
LazyCons is not provided for performance or efficiency.
Like all the objects in this module, LazyCons is not meant to be used in production but to test expectations about the execution plans you may be using based on monads comprehensions.
We needed a lazy version of Cons in out the
xotl.ql.translation.py
module
Memento for mathematical terms¶
- monoid
An algebraic structure with an associative operator ++ having a distinguished element zero (0) as left and right identity:
0 ++ x = x x ++ 0 = x
https://en.wikipedia.org/wiki/Monoid
I don’t know why in [MCQL] they claim ([], :) is a monoid. The structure of all lists with the concatenation operator (++) is a monoid. Since all lists and the operator ++ are fully defined using [] and :, it’s plausible to believe they refer to the monoid generated by those constructors.