Definite clause grammars and symbolic differentiation

Let's talk about calculus. It's an extremely important field for almost all branches of science, and you've probably seen at least some aspects of it during high school or possible further education. Applications range from physics, chemistry, and electrical engineering to finance, medicine, meteorology, and much more.

Fundamentally, calculus is the mathematical field concerned with studying changes in continuous quantities. (Discrete quantities are those that are in some sense "countable". E.g., one apple, two apples, et cetera. Whereas continuous quantities are not countable, and can take any value in an entire range. E.g., a bottle of water may be 0% filled, 100% filled, or any value in between.) In simple terms, the two key problems calculus answers are, given some mathematical function:

(We'll only look at the first question in this article.)

How quickly a function $f$ rises at a given point is a sort of "instanteneous rate of change" (you can think of it as the slope of the tangent line at that point). In fact, if we map every value from the domain of our function to the rate of change at that coordinate, we get a new function, called the derivative of $f$ (denoted by $\frac{d f}{d x}$ ). It is defined as:

\frac{d f}{d x} (x) = lim_{h \to 0} \frac{f (x + h) - f (x)}{h},

where you can interpret $h$ as an extremely small quantity that is almost identically zero. (Formally, we're taking the limit of the inner expression as $h$ approaches zero.) So the derivative at $x$ equals the rate of change in the function $f$ between two very nearby points $x$ and $x + h$ . Informally, you can evaluate the limit from the right-hand side of this equation as follows: manipulate the expression such that $h$ no longer appears in the denominator, then treat all remaining $h$ 's as zero. For example, the derivative with respect to $x$ of the function $2 x^{2}$ can be found as follows:

\begin{matrix} \frac{d}{d x} (2 x^{2}) & = & lim_{h \to 0} \frac{2 {(x + h)}^{2} - 2 x^{2}}{h} \\ = & lim_{h \to 0} \frac{2 (x^{2} + 2 x h + h^{2}) - 2 x^{2}}{h} \\ = & lim_{h \to 0} \frac{4 x h + 2 h^{2}}{h} \\ = & lim_{h \to 0} (4 x + 2 h) \\ = & 4 x . \end{matrix}

So the slope of the tangent line to $f$ at $x = 0$ equals $0$ , at $x = 1$ it equals $4$ , at $x = 2$ it equals $8$ , and in general the slope equals $4 x$ :

\frac{d f}{d x}

(

) =

While it is possible to determine every derivative using just the above definition, it is a pretty cumbersome process, and it turns out that there's just a few really general rules that we need to be able to apply to differentiate (to take the derivative of) most functions:

\begin{matrix} \frac{d}{d x} (A f (x)) & = & A \frac{d f}{d x} (x), & (constant factor rule) \\ \frac{d}{d x} (f (x) + g (x)) & = & \frac{d f}{d x} (x) + \frac{d g}{d x} (x), & (sum rule) \\ \frac{d}{d x} (f (x) * g (x)) & = & \frac{d f}{d x} (x) * g (x) + f (x) * \frac{d g}{d x} (x), & (product rule) \\ \frac{d}{d x} (f (g (x))) & = & \frac{d f}{d x} (g (x)) * \frac{d g}{d x} (x) . & (chain rule) \end{matrix}

Last time we took a short look at using Prolog for software development - today we'll leverage some of it's language features to write an elegant program to differentiate a given function! Breaking down the problem:

the program should parse an input string representing a mathematical function;
determine its corresponding derivative, and;
simplify the expression.

Definite clause grammars

The first language feature we will use are so-called definite clause grammars (DCGs). A native feature of most Prolog systems to be able to parse a string based on a set of rules that describes what a conforming string looks like.

Imagine we want to recognize strings where each character belonging to some kind of opening bracket "(", "[", "{", and "<" is, at some point, followed by a corresponding closing bracket character: ")", "]", "}", and ">" respectively, and each closing bracket needs to "close" the first unclosed opening bracket of the same type preceding it (e.g. "<[]()>" is valid, but "[(])" is invalid). Most imperative programming languages would require us to write some kind of parser designed for just this specific task.

The development of string and language parsers is a very rich field of software development, but it's not something most programmers are familiar with or interested in, let alone use on a day-to-day basis. The existence of popular "parser generator" tools like ANTLR is a testament to this. Luckily, compared to writing them in imperative language, writing parsers in Prolog is almost trivial! We only need to describe how accepted strings look.

While for normal Prolog rules, the rule body was a comma-separated list of predicates that all needed to hold. For DCG grammar rules, the body is a comma-separated list of strings and DCG grammar rules for which the order is significant: the order indicates in which order the DCG terms must occur in accepted strings.

The following DCG describes strings that adhere to the earlier specification:

:- set_prolog_flag(double_quotes,chars).

paired_braces --> "".
paired_braces --> "(", paired_braces, ")", paired_braces.
paired_braces --> "[", paired_braces, "]", paired_braces.
paired_braces --> "{", paired_braces, "}", paired_braces.
paired_braces --> "<", paired_braces, ">", paired_braces.

(The first line is some configuration that says that we will use double quotes to denote lists of characters. This makes it a bit easier to read and input data, but it's not part of the DCG.)

The code can be read as: we consider a string paired_braces if it's either an empty string, or if it is the characters "(" and ")" wrapped around paired_braces followed by more paired_braces, or similar for the other types of brackets. The way you can interact with definite clause grammars is through the phrase predicate - check the example queries below.

You can play around with the code on SWISH. (SWISH doesn't work on all mobile devices. There should be a blue button titled "Run!" in the bottom right corner; this button does not always appear. Remember to always end your query with a dot.) For example, you can check that the following strings are all valid paired_braces:

"()[]{}<>", checked through phrase(paired_braces, "()[]{}<>").;
"(())", checked through phrase(paired_braces, "(())")., and;
"(<>)[]", checked through phrase(paired_braces, "(<>)[]")..

Other things you can do:

phrase(paired_braces, "([)]"). to verify that the string "([)]" is not a valid paired_braces string;
length(Braces, 6), phrase(paired_braces, Braces). to generate all 320 different valid paired_braces strings of length 6, and;
phrase(paired_braces, ['(' | Remaining]). to generate (infinitely many) valid paired_braces strings starting with '('.

An equivalent way to write the previous DCG is:

:- set_prolog_flag(double_quotes,chars).

paired_braces --> "".
paired_braces --> nonempty_paired_braces, paired_braces.
nonempty_paired_braces --> "(", paired_braces, ")".
nonempty_paired_braces --> "[", paired_braces, "]".
nonempty_paired_braces --> "{", paired_braces, "}".
nonempty_paired_braces --> "<", paired_braces, ">".

Where we have used a helper predicate nonempty_paired_braces to simplify the grammar somewhat. You can play around with the code on SWISH. Can you describe in words what the above grammar means?

But this is just scratching the surface - DCGs are a lot more powerful than simple string recognition machines: we can treat the terms in a grammar similar to normal Prolog predicates, to allow us to reason about the string we are parsing. For example, let's say we want to check whether the brackets occurring in a string are correctly paired, and only if they are, to gather the remaining nonbracket characters in order. We can achieve this by extending the grammar from above as follows:

:- set_prolog_flag(double_quotes,chars).
:- use_module(library(dif)).

paired_braces([]) --> "".
paired_braces(Text) -->
  nonempty_paired_braces(Text1),
  paired_braces(Text2),
  { append(Text1, Text2, Text) }.

nonempty_paired_braces(Text) --> "(", paired_braces(Text), ")".
nonempty_paired_braces(Text) --> "[", paired_braces(Text), "]".
nonempty_paired_braces(Text) --> "{", paired_braces(Text), "}".
nonempty_paired_braces(Text) --> "<", paired_braces(Text), ">".
nonempty_paired_braces([Character]) -->
  % The following line matches any `Character`, and then
  % ensures it's not an opening or closing bracket.
  [Character],
  {
    dif(Character, '('),
    dif(Character, ')'),
    dif(Character, '['),
    dif(Character, ']'),
    dif(Character, '{'),
    dif(Character, '}'),
    dif(Character, '<'),
    dif(Character, '>')
  }.

Where we've changed the code in two major ways.

Firstly, each DCG term now tracks the nonbracket characters as its argument: the empty string has no nonbracket characters, the "composition" rule states that nonbracket characters from both composed parts are simply appended together (code within curly braces in a DCG rule is interpreted as normal Prolog code), and each of the bracket-recognizing rules simply transiently passes the gathered text.

Secondly, we've added a DCG rule that recognizes nonbracket characters (the dif predicate relates to things to another whenever they are not the same). This rule essentially reads: match any Character, and ensure it's not any of the listed bracket characters.

You can play around with the code on SWISH. Try out the following queries:

phrase(paired_braces(Text), "t(e<s>)[t]"). to get the nonbracket characters from "t(e<s>)[t]", and;
length(Braces, 8), phrase(paired_braces("test"), Braces). to generate all strings of length 8 that contain the nonbracket characters "test" interspersed by paired braces.

Expression parsing

A mathematical expression parser written in Prolog essentially works the same as the above paired_braces parser, except that, instead of collecting nonbracket characters while parsing the string, we will build up an abstract syntax tree (AST) while parsing the string. This implicitly leverages another language feature of Prolog, showing it is especially suited for what we're trying to do: Prolog's basic data structure, the compound term, is tree-like, making it ideal for representing ASTs.

There are really just two things we need to be mindful of when writing our grammar:

operator precedence (i.e. the order of operations), which we account for by having multiple "layers" in our grammar (deeper layers parse operators with higher precedence), and;
operator associativity (i.e. the grouping of operations), especially left-associativity, which we'll discuss in a bit.

With regards to precedence: consider the expression $x^{2} + 1$ (represented as the string "x^2+1"), according to "PEMDAS" it should be converted to the following AST (diagram generated using Mermaid):

              flowchart TD
                  op1((\+))
                  op2((\^))
                  x((x))
                  two((2))
                  one((1))
                  op1 --> op2
                  op2 --> x
                  op2 --> two
                  op1 --> one

                classDef default fill:#f9f,stroke:#333;

Which can be represented using Prolog terms as +(^(x, 2), 1), or, in infix notation, as (x ^ 2) + 1. These representations are equivalent, and they also do not evaluate to anything - they are simply compound terms: in this case, they are just a way to organise data.

The "sub-AST" for the exponentiation is nested in the total tree. It helps to think of it as the recognition of exponentiation occurring deeper than addition. As we will see, this is exactly how our grammar will be built up.

With regards to associativity: consider the expression $x - y - z$ (represented as the string "x-y-z"). Which of the following ASTs best represents the meaning of this expression? (It helps to mentally convert the ASTs back to the expressions they represent to answer this question.)

                flowchart TD
                  op1((\-))
                  op2((\-))
                  x1((x))
                  y1((y))
                  z1((z))
                  op1 --> op2
                  op2 --> x1
                  op2 --> y1
                  op1 --> z1 ~~~ hidden

                  op3((\-))
                  x2((x))
                  y2((y))
                  z2((z))
                  op4((\-))
                  op3 --> x2 ~~~ hidden
                  op3 --> op4
                  op4 --> y2
                  op4 --> z2

                classDef default fill:#f9f,stroke:#333;
                classDef hide display:none;
                class hidden hide;

By looking at the operands of the root operation in both ASTs, we find that the first AST corresponds to $(x - y) - z$ , while the second corresponds to $x - (y - z)$ . After simplification, we see that the second AST actually represents the expression: $x - y + z$ , which is not at all what we wanted! So the first AST is correct.

Generally speaking, there is a strict way in which expressions should be interpreted, because subtraction (and division) are both left-associative, meaning we should always start "grouping" operands from the left. As we will shortly see, the effect is that the grammar rule responsible for parsing subtraction (and division) will contain a reference to itself as its first term: we get what's known as left recursion.

For a lot of parsers, left recursion leads to nontermination, which is a big problem. Due to the self-reference within the grammar rule, you get stuck in an infinite loop when trying to parse an expression according to that rule: if we want to read a left recursive grammar rule from left to right, we are required to be able to read that same rule from left to right since it occurs as its own first term, which is a circular argument. We'll table that problem for later, and just write our grammar rules with left recursion for now.

A first version of our parser could look like this:

:- set_prolog_flag(double_quotes,chars).
:- use_module(library(dcg/basics)).

% Describes expressions
expression(A+B) --> expression(A), "+", term(B).
expression(A-B) --> expression(A), "-", term(B).
expression(E)   --> term(E).

% Describes expressions that have multiplication,
% division, or a higher precedence operator at their
% root.
term(A*B) --> term(A), "*", factor(B).
term(A/B) --> term(A), "/", factor(B).
term(E)   --> factor(E).

% Describes expressions that have exponentiation, or
% a higher precedence operator at their root.
factor(A^B) --> unary(A), "^", factor(B).
factor(E)   --> unary(E).

% Describes the remaining cases: unary operators,
% function calls, use of parenthesis, numbers, and
% variables.
unary(-E)     --> "-", expression(E).
unary(E)      --> "(", expression(E), ")".
unary(log(E)) --> "log(", expression(E), ")".
unary(exp(E)) --> "exp(", expression(E), ")".
unary(sin(E)) --> "sin(", expression(E), ")".
unary(cos(E)) --> "cos(", expression(E), ")".
unary(tan(E)) --> "tan(", expression(E), ")".
unary(number(N))   --> number(N).
unary(variable(V)) --> letters(V).

% Helper DCG rules
letters([L])    --> letter(L).
letters([L|Ls]) --> letter(L), letters(Ls).

letter(L) --> [L], { char_type(L, alpha) }.

There are a few things that are good to note:

we use the argument of each DCG term to build up the AST as we parse the expression string. As noted before, the first arguments, for example A+B and log(E), are just compound terms in this context, they don't evaluate to anything and are just a data representation;
the layering of the DCG rules (expression → term → factor → unary → expression) ensures operator precedence rules are respected. For our parser, the top level node will be addition or subtraction if possible, then multiplication or division if possible, etc. This ensures that higher precedence operators occur deeper in the AST, which is exactly what we want;
there is left recursion in both the expression and term rules. So as is, if we would check strings against this grammar, the query would not terminate;
there are just two non-recursive rules, which correspond to recognition of the leaves of our ASTs: numbers and variables. library(dcg/basics) defines the DCG rule number for recognizing numbers, and casting its string representation to a number. We introduce the helper predicates letter and letters to respectively recognize a single alphabetical character (checked through char_type), and multiple letters.

There are multiple ways to tackle the left recursion issue. We could try to rewrite our grammar with the left recursion in these rules somehow eliminated, we could amend the AST after we've generated it, or - the preferred solution - rather than changing the logic of our grammar, we could alter the execution of our program, so the grammar is parsed in a different way. The left recursion accurately models the left associative behaviour of some operators, so leaving it alone and not complicating the grammar seems like a logical choice. To do so, all that's required is to add the declaration:

:- table expression//1, term//1.

And the issue is fixed! Under the hood, this changes the execution behaviour for the indicated predicates to tabled execution, which is comparable to "memoization", except that it also helps escape nontermination in case a cycle is detected.

You can play around with the parser on SWISH. Here are a few queries to try out:

phrase(expression(X), "x^2+1"), write_canonical(X). to parse the first example expression, and to print the AST in "canonical form", to print the terms in prefix notation;
phrase(expression(X), "x-y-z"), write_canonical(X). to parse the second example expression, and print the AST in canonical form.

Differentiation

Given a compound term representing an AST from our parser, taking the derivative of the associated expression becomes relatively easy. We will define the predicate derivative(Variable, Function, Derivative) to mean that Derivative is the derivative of Function with respect to Variable. For example, the sum rule can then be defined as:

% Sum rule for derivatives
derivative(X, A+B, DA+DB) :-
  derivative(X, A, DA),
  derivative(X, B, DB).

I.e., given an AST with a + at its root, the derivative can be found by taking the derivatives of both branches, and adding those together.

In a similar fashion, we can define differentiation rules for all possible root nodes:

% Difference rule for derivatives
derivative(X, A-B, DA-DB) :-
  derivative(X, A, DA),
  derivative(X, B, DB).

% Product rule for derivatives
derivative(X, A*B, (DA*B)+(A*DB)) :-
  derivative(X, A, DA),
  derivative(X, B, DB).

% Quotient rule for derivatives
derivative(X, A/B, (DA*B-A*DB)/(B^number(2))) :-
  derivative(X, A, DA),
  derivative(X, B, DB).

% Power rule for derivatives
derivative(X, A^B, A^(B-number(1))*(B*DA + A*log(A)*DB)) :-
  derivative(X, A, DA),
  derivative(X, B, DB).

% Special case of constant factor rule for derivatives
derivative(X, -E, -DE) :-
  derivative(X, E, DE).

% Logarithmic derivative (+ chain rule)
derivative(X, log(E), DE/E) :-
  derivative(X, E, DE).

% Exponential derivative (+ chain rule)
derivative(X, exp(E), DE*exp(E)) :-
  derivative(X, E, DE).

% Sine derivative (+ chain rule)
derivative(X, sin(E), DE*cos(E)) :-
  derivative(X, E, DE).

% Cosine derivative (+ chain rule)
derivative(X, cos(E), -DE*sin(E)) :-
  derivative(X, E, DE).

% Tangent derivative (+ chain rule)
derivative(X, tan(E), DE/(cos(E)^number(2))) :-
  derivative(X, E, DE).

And, most notably:

% Constant derivative
derivative(_, number(_), number(0)).

% Variable derivative
derivative(X, variable(X), number(1)).
derivative(X, variable(OtherX), number(0)) :-
  X \= OtherX.

In words: it doesn't matter with respect to which variable we take the derivative of a number, the answer is always zero. And the derivative of a $x$ with respect to $x$ is $1$ , but the derivative of $x$ with respect to a different variable $y$ is $0$ .

This already fully handles the differentiation part of our problem. You can play around with the combined code of the parser and differentiation here on SWISH. Some queries to try out:

phrase(expression(X), "x-y-z"), derivative("x", X, Derivative). to find the derivative of $x - y - z$ with respect to $x$ . Note the "redundant" parts of the AST of the derivative, there is some room for simplification! And;
phrase(expression(X), "x^2+1"), derivative("x", X, Derivative). to find the derivative of $x^{2} + 1$ with respect to $x$ . The complexity of the resulting ASTs are getting out of hand pretty quickly!

Simplification

This turns out to be the hardest part. We need to define what it means for one AST to be "simpler" than another - what does simpler actually mean?

I've chosen to define a set of simplifications, where an AST is considered simplified if no simplification can be applied. If a simplification can be applied, we do so, and then further simplify the resulting AST. We're using a cut (denoted by !), to prevent trying alternative simplifications after we've already applied one. As a consequence, the first simplification that can be applied will be applied.

% The first applicable simplification that is found
% is applied, and no alternatives are explored. The
% resulting expression is further simplified if
% possible.
simplify(E1, E3) :-
  simplification(E1, E2),
  !, % This prevents searching for alternatives
  simplify(E2, E3).

% If there is no applicable simplification rule,
% then the expression E can not be simplified
% further.
simplify(E, E) :- \+ simplification(E, _).

There are a lot of logical simplifications, such as removing zeroes and ones where possible, identifying repeated sub-ASTs, moving around minus signs, actually evaluating ASTs whose leaves consist only of numbers, et cetera.

These are the simplifications I came up with:

% Rules in order of most to least preferred simplification
% 1. Additive and multiplicative identities
simplification(A+number(0), A).
simplification(number(0)+A, A).
simplification(A-number(0), A).
simplification(number(0)-A, -A).
simplification(number(0)*_, number(0)).
simplification(_*number(0), number(0)).
simplification(A*number(1), A).
simplification(number(1)*A, A).
simplification(A/number(1), A).
simplification(_^number(0), number(1)).
simplification(A^number(1), A).

% 2. Generic simplifications
simplification(A+A, number(2)*A).
simplification(A-A, number(0)).
simplification(A*A, A^number(2)).
simplification(A*(A^B), A^(B+number(1))).
simplification((A^B)*A, A^(B+number(1))).
simplification((A^B)*(A^C), A^(B+C)).
simplification(log(A)*log(B), log(A+B)).
simplification(log(A)/log(B), log(A-B)).

% 3. Unary minuses
simplification(A*number(Negative), -(A*number(Positive))) :-
  Negative < 0,
  Positive is -Negative.
simplification(number(Negative)*A, -(number(Positive)*A)) :-
  Negative < 0,
  Positive is -Negative.
simplification(A/number(Negative), -(A/number(Positive))) :-
  Negative < 0,
  Positive is -Negative.
simplification(number(Negative)/A, -(number(Positive)/A)) :-
  Negative < 0,
  Positive is -Negative.
simplification((-A)*B, -(A*B)).
simplification(A*(-B), -(A*B)).
simplification((-A)/B, -(A/B)).
simplification(A/(-B), -(A/B)).
simplification(-(-A), A).
simplification(-(A-B), B-A).
simplification((-A)-B, -(A+B)).
simplification(A+ -B, A-B).
simplification(-A + B, B-A).

% 4. Constant folding
simplification(number(N1)+number(N2), number(N3)) :- N3 is N1 + N2.
simplification(number(N1)-number(N2), number(N3)) :- N3 is N1 - N2.
simplification(number(N1)*number(N2), number(N3)) :- N3 is N1 * N2.
simplification(number(N1)/number(N2), number(N3)) :- N3 is N1 / N2.
simplification(number(N1)^number(N2), number(N3)) :- N3 is N1 ^ N2.
simplification(-number(N1), number(N2)) :- N2 is -N1.
simplification(log(number(N1)), number(N2)) :- N2 is log(N1).
simplification(exp(number(N1)), number(N2)) :- N2 is exp(N1).
simplification(sin(number(N1)), number(N2)) :- N2 is sin(N1).
simplification(cos(number(N1)), number(N2)) :- N2 is cos(N1).
simplification(tan(number(N1)), number(N2)) :- N2 is tan(N1).

% 5. Distributive laws
simplification(A*(B+C), A*B+A*C).
simplification(A*(B-C), A*B-A*C).
simplification((A+B)*C, A*C+B*C).
simplification((A-B)*C, A*C-B*C).

% 6. Recursive cases
simplification(A+B, SA+B) :- simplification(A, SA).
simplification(A-B, SA-B) :- simplification(A, SA).
simplification(A*B, SA*B) :- simplification(A, SA).
simplification(A/B, SA/B) :- simplification(A, SA).
simplification(A^B, SA^B) :- simplification(A, SA).
simplification(A+B, A+SB) :- simplification(B, SB).
simplification(A-B, A-SB) :- simplification(B, SB).
simplification(A*B, A*SB) :- simplification(B, SB).
simplification(A/B, A/SB) :- simplification(B, SB).
simplification(A^B, A^SB) :- simplification(B, SB).
simplification(-A, -SA) :- simplification(A, SA).
simplification(log(A), log(SA))  :- simplification(A, SA).
simplification(exp(A), exp(SA))  :- simplification(A, SA).
simplification(sin(A), sin(SA))  :- simplification(A, SA).
simplification(cos(A), cos(SA))  :- simplification(A, SA).
simplification(tan(A), tan(SA))  :- simplification(A, SA).

Which finally allows us to define the differentiate predicate as:

differentiate(Variable, String, SimplifiedDerivative) :-
  phrase(expression(Term), String),
  derivative(Variable, Term, Derivative),
  simplify(Derivative, SimplifiedDerivative).

The full program can be found on SWISH, and is only about 200 lines of code (including comments and blank lines). Here are a few queries to try out:

differentiate("x", "x-y-z", Derivative).
differentiate("x", "x^2*sin(x)", Derivative).
differentiate("x", "x-y-z", Derivative).
differentiate("y", "x^2*sin(y)", Derivative).

Wrapping up

Definite clause grammars are a very powerful tool for processing language/strings/expressions, and are often nothing more than a very concise, natural description of the thing you are processing. Regular Prolog code can be inserted within a DCG rule, making it very easy to construct complex structures as you parse a string. This is what allowed us to build an abstract syntax tree of the expression encoded in the string we were parsing - a much nicer approach than building a parser from scratch in an imperative programming language! Finally, Prolog's basic data structure, the compound term, can easily encode abstract syntax trees, and defining expression simplification rules is rather easy. All of which makes Prolog a great fit for a problem such as this!

Next time, we finish off the Prolog series (for now), with another application written in Prolog - so stay tuned!