Introduction
Background
Newick tree format is a way of representing graph-theoretical trees with edge lengths using parentheses and commas.
Yet not very efficient, Newick is a simple and popular format for representing trees, and provides a rather useful abstraction in coalescence theory for representing the shared history of populations (species trees, population trees) or gene genealogies that coalesce into these trees.
Quetzal provides some utilities to parse, generate and manipulate such format.
Grammar
Whitespace here refer to any of the following: spaces, tabs, carriage returns, and linefeeds.
- Whitespace within number is prohibited.
- Whitespace outside of node names is ignored.
- Grammar characters (semicolon, parentheses, comma, and colon) are prohibited
- Comments are enclosed in square brackets.
Objectives
In this tutorial section you will learn how to:
- Parse a Newick string to a Quetzal k-ary tree class,
- Customize the default Quetzal tree properties to describe your data and problem correctly,
- Parse a Newick string to your own legacy tree class
To a Quetzal k-ary tree
Default properties
In this example we will use the default (and simplistic) properties of the graph synthetized by the parsing. By properties we mean the type the data structures used to store arbitrary information along the vertices and edges of a graph.
For general graphs it could be anything, but since we are parsing Newick formats, we can reasonably expect ad minima:
- a
std::string
to describe a vertex (the name of the node)
- a
double
to describe an edge (the distance to a parent node).
Quetzal defaults to these minima when it tries to build a tree from a Newick string.
Input
1#include "quetzal/quetzal.hpp"
9 std::string s1 =
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;";
10 std::string s2 =
"(,,(,));";
13 auto [tree1, root1] = newick::to_k_ary_tree<>(s1);
14 auto [tree2, root2] = newick::to_k_ary_tree<>(s2);
16 std::cout <<
"These graphs are " << (tree1.is_isomorphic(tree2) ?
"toootally" :
"not") <<
" isomorphic!"
Output
5These graphs are toootally isomorphic!
Custom properties
In the previous example we used default types to represent the properties of a vertex and an edge. These defaults are useful, but not always sufficient. For example, you may want to visit your tree graph and trigger specific events when specific nodes or edges are visited.
In this case, you would need to define your own small structures to represent the graph properties you want:
- the vertex should be constructible from a
std::string
- the edge should be constructible from a
double
- the event would be to print to
std::cout
with a pretty display
Input
1#include "quetzal/quetzal.hpp"
17static inline auto &operator<<(std::ostream &os,
vertex_info const &v)
19 return os << quoted(v.name);
29 void operator()(
auto stage,
auto vertex)
33 case boost::visit::pre:
35 std::cout <<
"Discover vertex " << vertex <<
" (" << _tree[vertex] <<
")\n";
37 case boost::visit::in:
39 case boost::visit::post:
42 throw std::invalid_argument(
"received invalid DFS order");
52 std::string s =
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;";
56 auto [tree, root] = newick::to_k_ary_tree<vertex_info, edge_info>(s);
60 tree.depth_first_search(root, visitor);
Definition newick_parser_2.cpp:24
Definition geography_dispersal_kernel_4.cpp:13
Definition coalescence_binary_tree_2.cpp:5
Output
Interfacing legacy code
If you have been coding for a while, you may already depend heavily on your own class to describe a tree graph, and not feel very excited about refactoring your whole project to switch to Quetzal trees. Fair enough.
This section shows how to use Quetzal parser to populate your own legacy class and then forget about Quetzal.
- You will need to parse Newick strings into an AST (Abstract Syntax Tree): this is a temporary data structure that represents the syntactic structure of the tree.
- This AST can then be iterated over to convert it into the data structure of your choice.
- Note
- Quetzal does not know anything about your own classes, so you will need a bit of extra effort to write a recursion on the AST data
- But at least you don't have to worry about the parsing logic anymore!
- If the task becomes tedious, don't hesitate to ask for help !
- On most compilers but Apple Clang, you could simplify the example code by writing:
subtree.left_child = std::make_unique<MyNode>(ast.name, ast.distance_to_parent);
in the recursion function.
Input
1#include "quetzal/quetzal.hpp"
10 std::unique_ptr<MyNode> left_child;
11 std::unique_ptr<MyNode> right_child;
19 std::string s =
"(A:0.1,(C:0.3,D:0.4)B:0.5)E;";
22 using ast_type = newick::ast::node;
28 auto ok = quetzal::parse(begin(s), end(s), newick::parser::tree, ast);
31 std::cout << quoted(s) <<
"\t " << (ok ?
"Parsed" :
"Failed");
33 std::cout <<
"\n\nAbstract Tree Syntax:\n\n" << ast << std::endl;
36 static const auto populate = [](
auto &tree_root,
const auto &ast_root) {
37 static auto recurse = [](
auto &subtree,
const auto &ast,
auto &self)
mutable ->
void {
39 if (ast.children.size() == 2)
42 subtree.left_child = std::unique_ptr<MyNode>(
new MyNode{ast.name, ast.distance_to_parent});
43 self(*subtree.left_child, ast.children.front(), self);
45 subtree.right_child = std::unique_ptr<MyNode>(
new MyNode{ast.name, ast.distance_to_parent});
46 self(*subtree.right_child, ast.children.back(), self);
49 recurse(tree_root, ast_root, recurse);
53 MyNode my_root{.id = ast.name, .length = ast.distance_to_parent};
56 populate(my_root, ast);
58 std::cout << (my_root.left_child ==
nullptr ?
"Failed" :
"Populated") << std::endl;
59 std::cout << (my_root.right_child ==
nullptr ?
"Failed" :
"Populated") << std::endl;
Definition newick_parser_3.cpp:7
Output
1"(A:0.1,(C:0.3,D:0.4)B:0.5)E;" Parsed