Quetzal-CoaTL
The Coalescence Template Library
Loading...
Searching...
No Matches
Reading spatial variables from rasters

Reading a single variable

Landscapes, along with their spatial and temporal heterogeneity, play a significant role in shaping the dynamics of populations and lineages. In computational models, these landscapes are often represented as geospatial grids or rasters.

In most analyses, it is common for users to establish connections between local growth processes and a habitability raster obtained from Ecological Niche Models. Additionally, they may associate dispersal or connectivity with a Digital Elevation Model.

The Quetzal library offers a quetzal::geography::raster class with a streamlined interface to incorporate these spatial grids into georeferenced coalescence simulations.

By utilizing quetzal::geography::raster, you can define a single variable that encompasses:

  • Spatial heterogeneity: Rasters divide a geographical space into cells identified by their row and column coordinates.
  • Temporal heterogeneity: Rasters have a depth, meaning multiple bands (layers) that are utilized to model the temporal dimension.

When reading a variable using the from_file() function, there are a few considerations to keep in mind:

  1. You need to decide on the raster template argument time_type. In most cases, it would be an integer representing a band mapped to a year (e.g., 2023 for the latest band). However, you can opt for a more complex type such as a time period (e.g., 7000–3000 BCE).
  2. Keep in mind that duplicating identical bands is unnecessary and can make the raster bulky. Instead, you can virtualize repeated bands using the VRT (Virtual Dataset) format. See the Python Quetzal-CRUMBS helper library to do so.
  3. There are multiple ways to identify locations within a spatial grid, and the choice depends on the specific usage context. The quetzal::geography::raster interface allows for switching between them:
    • lonlat or latlon: Decimal longitude and latitude values, which may or may not fall within the spatial extent of the grid. Genetic samples and other user inputs will generally use these real-world coordinate formats and may require projection into a location_descriptor.
    • colrow or rowcol: The most intuitive way to index the cells of a grid.
    • location_descriptor: A one-dimensional index representing the grid cells. It can be thought of as mapping 0 to the top-left cell (grid origin) and width × height - 1 to the bottom-right cell. Since it's a simple integer, this 1D system is computationally efficient and designed for intensive simulations. Converting a location_descriptor to latlon or lonlat will provide the coordinates of the cell centroid. The to_centroid function is also available for this purpose.
  4. When reading the value of a variable at a specific location, it may or may not exist. In the latter case, it is considered a NA (Not Available) value, and an empty optional is returned. The actual representation of NA in the dataset can be obtained using the NA() function.

Input

1#include "quetzal/geography.hpp"
2#include <cassert>
3#include <filesystem>
4
5using namespace quetzal;
6
7// Expected dataSet structure
8//
9// * origin at Lon -5, Lat 52.
10// * 9 cells
11// * 10 layers
12// * pixel size (-5, 5)
13// * East and South limits ARE NOT in spatial extent
14//
15//
16// (origin) -5 0 5 10
17// \ / / / /
18// \ / / / /
19// 52 * ------ * ----- * ---- *
20// | . | . | .
21// | c0 | c1 | c2
22// 47 * ------ * ----- * ---- *
23// | . | . | .
24// | c3 | c4 | c5
25// 42 * ------ * ----- * ---- *
26// | . | . | .
27// | c6 | c7 | c8
28// 37 * * * *
29//
30//
31// 90
32// | (+)
33// 0 --------------> 180 X size positive in decimal degree (east direction positive)
34// | Y size negative in decimal degree (south direction negative)
35// |
36// |
37// |
38// (-) v
39// 0
40
41int main()
42{
43 using time_type = int;
44 using raster_type = geography::raster<time_type>;
45
46 // The raster has 10 bands that we will assign to 2001 ... 2011.
47 std::vector<time_type> times(10);
48 std::iota(times.begin(), times.end(), 2001);
49
50 auto file = std::filesystem::current_path() / "data/bio1.tif";
51
52 // Read the raster
53 auto bio1 = raster_type::from_file(file, times);
54
55 std::cout << bio1 << std::endl;
56
57 // Check there are 10 bands/layers/time periods
58 assert(std::ranges::distance(bio1.times()) == 10);
59
60 // There are 9 cells/spatial coordinates
61 assert(std::ranges::distance(bio1.locations()) == 9);
62
63 // You will typically have georeferenced sampling points
64 using latlon = typename raster_type::latlon;
65 using lonlat = typename raster_type::lonlat;
66
67 auto point_1 = latlon(52., -5.);
68 auto point_2 = lonlat(-5., 52.);
69
70 assert(point_1 == point_2);
71
72 // Defines a lambda expression for checking extent
73 auto check = [&](auto x) {
74 std::cout << "Point " << x << " is" << (bio1.contains(x) ? " " : " not ") << "in bio1 extent" << std::endl;
75 };
76
77 check(point_1);
78 auto point_3 = lonlat(-99., 99);
79 check(point_3);
80
81 // Computing distance will come handy for spatial graphs
82 std::cout << point_1 << " is " << point_1.great_circle_distance_to(point_3) << " km away from " << point_3
83 << std::endl;
84
85 // Coordinates
86 auto x = bio1.to_descriptor(bio1.origin());
87 auto t = bio1.times().front();
88
89 // 1D location descriptors can be converted to 2D coordinate systems
90 std::cout << "All equivalent:\n\t" << x << "\n\t" << bio1.to_rowcol(x) << "\n\t" << bio1.to_colrow(x) << "\n\t"
91 << bio1.to_latlon(x) << "\n\t" << bio1.to_lonlat(x) << std::endl;
92
93 // Retrieve the raster value, that may be defined.
94 std::optional<double> maybe_bio1 = bio1.at(x, t);
95
96 // It may be a NA, let's check for it:
97 std::cout << "bio1(" << x << "," << t << ") is " << (maybe_bio1.has_value() ? "" : "not") << "defined."
98 << std::endl;
99
100 // If value is not defined (empty optional), let's use the raster NA value.
101 std::cout << maybe_bio1.value_or(bio1.NA()) << std::endl;
102}
Discrete spatio-temporal variations of an environmental variable.
Definition raster.hpp:38
Simulation of coalescence-based models of molecular evolution.
Definition coalescence.hpp:21

Output

1Origin: (Lat: 52, Lon: -5)
2Width: 3
3Height: 3
4Depth: 10
5Resolution:
6 Lat: -5
7 Lon: 5
8Extent:
9 Lat min: 37
10 Lat max: 52
11 Lon min: -5
12 Lon max: 10
13NA value: -1.7e+308
14
15Point (Lat: 52, Lon: -5) is in bio1 extent
16Point (Lat: 99, Lon: -99) is not in bio1 extent
17(Lat: 52, Lon: -5) is 4.25619e+06 m km away from (Lat: 99, Lon: -99)
18All equivalent:
19 0
20 (Column: 0, Row: 0)
21 (Column: 0, Row: 0)
22 (Lat: 49.5, Lon: -2.5)
23 (Lat: 49.5, Lon: -2.5)
24bio1(0,0) is defined.
25104.602

Reading multiple variables

Aligning rasters refers to a spatial data representation technique used in Geographic Information Systems (GIS). In this context, raster data layers are aligned and registered to a common coordinate system and grid structure. This ensures that the cells or pixels in the raster layers are spatially aligned and correspond to the same geographic locations.

Alignment is achieved in quetzal::geography::raster objects by defining a consistent grid structure, such as a uniform cell size and orientation, for all raster layers.

In situations where multiple GIS variables that change over time are involved (such as a suitability and an elevation across geological times), an additional alignment mechanism is required. The quetzal::geography::landscape class addresses this need by combining multiple raster datasets, each potentially having multiple layers, into a single cohesive object. It ensures that all spatial grids are properly aligned and maintains the temporal dimension represented by multiple layers. This allows for accurate overlay, simulation, and composition of multiple datasets within the GIS environment.

Input

1#include "quetzal/quetzal.hpp"
2#include <cassert>
3#include <filesystem>
4
5using namespace quetzal;
6
7int main()
8{
9 auto file1 = std::filesystem::current_path() / "data/bio1.tif";
10 auto file2 = std::filesystem::current_path() / "data/bio12.tif";
11
12 // The raster have 10 bands that we will assign to 2001 ... 2010.
13 std::vector<int> times(10);
14 std::iota(times.begin(), times.end(), 2001);
15
16 // Initialize the landscape: for each var a key and a file, for all a time series.
17 using landscape_type = quetzal::geography::landscape<>;
18 auto env = quetzal::geography::landscape<>::from_file({{"bio1", file1}, {"bio12", file2}}, times);
19 std::cout << env << std::endl;
20
21 // We indeed recorded 2 variables: bio1 and bio12
22 assert(env.num_variables() == 2);
23
24 // The semantic shares strong similarities with a raster
25 landscape_type::latlon Bordeaux(44.5, 0.34);
26
27 assert(env.contains(Bordeaux));
28 assert(env.contains(env.to_centroid(Bordeaux)));
29
30 // These little function-objects will soon be very handy to embed the GIS variables
31 // into the simulation with quetzal::expressive
32 const auto &f = env["bio1"].to_view();
33 const auto &g = env["bio12"].to_view();
34
35 // But for now just print their raw data if defined or any other value if undefined
36 for (auto t : env.times())
37 {
38 for (auto x : env.locations())
39 {
40 f(x, t).value_or(0.0);
41 g(x, t).value_or(env["bio12"].NA());
42 }
43 }
44}
Discrete spatio-temporal variations of a set of environmental variables.
Definition landscape.hpp:42
static landscape from_file(const std::map< key_type, std::filesystem::path > &files, const std::vector< time_type > &times)
Read the rasters from a set of input geotiff files.
Definition landscape.hpp:135

Output

1Landscape of 2 aligned rasters:
2bio1 bio12
3Origin: (Lat: 52, Lon: -5)
4Width: 3
5Height: 3
6Depth: 10
7Resolution:
8 Lat: -5
9 Lon: 5
10Extent:
11 Lat min: 37
12 Lat max: 52
13 Lon min: -5
14 Lon max: 10

Next steps

Geospatial datasets play a crucial role in coalescence by providing valuable information about the demographic process. Once a raster is successfully parsed, users typically have three primary objectives they may want to pursue:


Demes on a regular spatial grid

The rasters grid structure can be employed to construct a spatial graph of demes that is fully connected and utilized for simulating the demographic process. This is pertinent when looking at very larged continuous land masses. In this context:

  • Vertices correspond to demes (populations) and are represented by the centroids of the raster cells.
  • Edges represent the distances between demes, and their connectivity can be modified using dispersal kernels.

Demes on a variable spatial graph

In archipelagos or clusters of loosely connected islands, the fluctuations in sea levels have a notable impact on the dynamics of species. These variations cause the intermittent connection and separation of land masses and their corresponding populations. Similarly, climatic shifts in sky-island complexes can have similar effects. As a result, the assumption of a completely interconnected graph representing population units (demes) is no longer applicable in these situations.

To address this, Digital Elevation Models (DEMs) can be utilized to establish a relationship between the dynamics of sea levels and species. In these cases, the connectivity of the graph at any given time is determined by the current elevation, and changes in elevation directly impact the level of connectivity between different areas.


Inform local processes

Ecological Niche Models (ENMs) play a crucial role in predicting how populations might respond to changes in climate, both in the past and the future.

Typically, these models produce raster outputs that represent estimates of suitability. These suitability values are then utilized to determine growth rates and carrying capacity.

By utilizing the values within the raster, it becomes possible to derive demographic quantities through the construction of mathematical expressions that incorporate both time and space. This approach is exemplified in the quetzal::expressive framework.

When encountering NA values, such as cells representing oceanic areas or those lying outside the spatial extent of the raster, the std::optional mechanism is employed. This means that these values may or may not have a defined value, leaving it to the user to decide how to handle such cases.