Reading a single variable

Landscapes, along with their spatial and temporal heterogeneity, play a significant role in shaping the dynamics of populations and lineages. In computational models, these landscapes are often represented as geospatial grids or rasters.

Users commonly link local growth processes to a habitability raster obtained from Ecological Niche Models, and dispersal or connectivity to a Digital Elevation Model.

The Quetzal library offers a quetzal::geography::raster class with a streamlined interface to incorporate these spatial grids into georeferenced coalescence simulations.

By utilizing quetzal::geography::raster, you can define a single variable that encompasses:

- Spatial heterogeneity: Rasters divide a geographical space into cells identified by their row and column coordinates.
- Temporal heterogeneity: Rasters have a depth, meaning multiple bands (layers) that are used to model the temporal dimension.

When reading a variable using the from_file() function, there are a few considerations to keep in mind:

- You need to decide on the raster template argument time_type. In most cases, it will be an integer representing a band mapped to a year (e.g., 2023 for the latest band). However, you can opt for a more complex type such as a time period (e.g., 7000–3000 BCE).
- Duplicating identical bands is unnecessary and can make the raster bulky. Instead, you can virtualize repeated bands using the VRT (Virtual Dataset) format. See the Python Quetzal-CRUMBS helper library to do so.
- There are multiple ways to identify locations within a spatial grid, and the choice depends on the specific usage context. The quetzal::geography::raster interface allows for switching between them:
  - lonlat or latlon: Decimal longitude and latitude values, which may or may not fall within the spatial extent of the grid. Genetic samples and other user inputs will generally use these real-world coordinate formats and may require projection into a location_descriptor.
  - colrow or rowcol: Column and row indices, the most intuitive way to index the cells of a grid.
  - location_descriptor: A one-dimensional index representing the grid cells. It can be thought of as mapping 0 to the top-left cell (grid origin) and width × height - 1 to the bottom-right cell. Since it is a simple integer, this 1D system is computationally efficient and designed for intensive simulations. Converting a location_descriptor to latlon or lonlat will provide the coordinates of the cell centroid. The to_centroid function is also available for this purpose.
- When reading the value of a variable at a specific location, the value may or may not exist. In the latter case, it is considered an NA (Not Available) value, and an empty optional is returned. The actual representation of NA in the dataset can be obtained using the NA() function.

Input

#include "quetzal/geography.hpp"
#include <cassert>
#include <filesystem>
#include <iostream>
#include <numeric>
#include <optional>
#include <vector>
 
using namespace quetzal;
 
//  Expected dataSet structure
//
//  * origin at Lon -5, Lat 52.
//  * 9 cells
//  * 10 layers
//  * pixel size (-5, 5)
//  * East and South limits ARE NOT in spatial extent
//
//
// (origin)     -5        0       5      10
//         \   /         /       /      /
//          \ /         /       /      /
//      52   * ------ * ----- * ---- *
//           |    .   |   .   |   .
//           |   c0   |  c1   |  c2
//      47   * ------ * ----- * ---- *
//           |    .   |   .   |   .
//           |   c3   |  c4   |  c5
//      42   * ------ * ----- * ---- *
//           |   .    |   .   |   .
//           |   c6   |  c7   |  c8
//      37   *        *       *      *
//
//
//          90
//          |         (+)
//      0 --------------> 180    X size positive in decimal degree (east direction positive)
//          |                    Y size negative in decimal degree (south direction negative)
//          |
//          |
//          |
//     (-)  v
//          0
 
int main()
{
    using time_type = int;
    using raster_type = geography::raster<time_type>;
 
    // The raster has 10 bands that we will assign to 2001 ... 2010.
    std::vector<time_type> times(10);
    std::iota(times.begin(), times.end(), 2001);
 
    auto file = std::filesystem::current_path() / "data/bio1.tif";
 
    // Read the raster
    auto bio1 = raster_type::from_file(file, times);
 
    std::cout << bio1 << std::endl;
 
    // Check there are 10 bands/layers/time periods
    assert(std::ranges::distance(bio1.times()) == 10);
 
    // There are 9 cells/spatial coordinates
    assert(std::ranges::distance(bio1.locations()) == 9);
 
    // You will typically have georeferenced sampling points
    using latlon = typename raster_type::latlon;
    using lonlat = typename raster_type::lonlat;
 
    auto point_1 = latlon(52., -5.);
    auto point_2 = lonlat(-5., 52.);
 
    assert(point_1 == point_2);
 
    // Defines a lambda expression for checking extent
    auto check = [&](auto x) {
        std::cout << "Point " << x << " is" << (bio1.contains(x) ? " " : " not ") << "in bio1 extent" << std::endl;
    };
 
    check(point_1);
    auto point_3 = lonlat(-99., 99);
    check(point_3);
 
    // Computing distances will come in handy for spatial graphs
    std::cout << point_1 << " is " << point_1.great_circle_distance_to(point_3) << " km away from " << point_3
              << std::endl;
 
    // Coordinates
    auto x = bio1.to_descriptor(bio1.origin());
    auto t = bio1.times().front();
 
    // 1D location descriptors can be converted to 2D coordinate systems
    std::cout << "All equivalent:\n\t" << x << "\n\t" << bio1.to_rowcol(x) << "\n\t" << bio1.to_colrow(x) << "\n\t"
              << bio1.to_latlon(x) << "\n\t" << bio1.to_lonlat(x) << std::endl;
 
    // Retrieve the raster value, which may or may not be defined.
    std::optional<double> maybe_bio1 = bio1.at(x, t);
 
    // It may be a NA, let's check for it:
    std::cout << "bio1(" << x << "," << t << ") is " << (maybe_bio1.has_value() ? "" : "not ") << "defined."
              << std::endl;
 
    // If value is not defined (empty optional), let's use the raster NA value.
    std::cout << maybe_bio1.value_or(bio1.NA()) << std::endl;
}

Output

Origin: (Lat: 52, Lon: -5)
Width: 3
Height: 3
Depth: 10
Resolution: 
    Lat: -5
    Lon: 5
Extent:
    Lat min: 37
    Lat max: 52
    Lon min: -5
    Lon max: 10
NA value: -1.7e+308
 
Point (Lat: 52, Lon: -5) is in bio1 extent
Point (Lat: 99, Lon: -99) is not in bio1 extent
(Lat: 52, Lon: -5) is 4.25619e+06 m km away from (Lat: 99, Lon: -99)
All equivalent:
    0
    (Column: 0, Row: 0)
    (Column: 0, Row: 0)
    (Lat: 49.5, Lon: -2.5)
    (Lat: 49.5, Lon: -2.5)
bio1(0,0) is defined.
104.602

Reading multiple variables

Aligning rasters refers to a spatial data representation technique used in Geographic Information Systems (GIS). In this context, raster data layers are aligned and registered to a common coordinate system and grid structure. This ensures that the cells or pixels in the raster layers are spatially aligned and correspond to the same geographic locations.

Alignment is achieved in quetzal::geography::raster objects by defining a consistent grid structure, such as a uniform cell size and orientation, for all raster layers.

In situations where multiple GIS variables that change over time are involved (such as a suitability and an elevation across geological times), an additional alignment mechanism is required. The quetzal::geography::landscape class addresses this need by combining multiple raster datasets, each potentially having multiple layers, into a single cohesive object. It ensures that all spatial grids are properly aligned and maintains the temporal dimension represented by multiple layers. This allows for accurate overlay, simulation, and composition of multiple datasets within the GIS environment.

Input

#include "quetzal/quetzal.hpp"
#include <cassert>
#include <filesystem>
#include <iostream>
#include <numeric>
#include <vector>
 
using namespace quetzal;
 
int main()
{
    auto file1 = std::filesystem::current_path() / "data/bio1.tif";
    auto file2 = std::filesystem::current_path() / "data/bio12.tif";
 
    // The rasters have 10 bands that we will assign to 2001 ... 2010.
    std::vector<int> times(10);
    std::iota(times.begin(), times.end(), 2001);
 
    // Initialize the landscape: for each var a key and a file, for all a time series.
    using landscape_type = quetzal::geography::landscape<>;
    auto env = landscape_type::from_file({{"bio1", file1}, {"bio12", file2}}, times);
    std::cout << env << std::endl;
 
    // We indeed recorded 2 variables: bio1 and bio12
    assert(env.num_variables() == 2);
 
    // The semantics share strong similarities with those of a raster
    landscape_type::latlon Bordeaux(44.5, 0.34);
 
    assert(env.contains(Bordeaux));
    assert(env.contains(env.to_centroid(Bordeaux)));
 
    // These little function-objects will soon be very handy to embed the GIS variables
    // into the simulation with quetzal::expressive
    const auto &f = env["bio1"].to_view();
    const auto &g = env["bio12"].to_view();
 
    // But for now just read their raw data if defined, or fall back to another value if undefined
    for (auto t : env.times())
    {
        for (auto x : env.locations())
        {
            f(x, t).value_or(0.0);
            g(x, t).value_or(env["bio12"].NA());
        }
    }
}

Output

Landscape of 2 aligned rasters:
bio1 bio12 
Origin: (Lat: 52, Lon: -5)
Width: 3
Height: 3
Depth: 10
Resolution: 
    Lat: -5
    Lon: 5
Extent:
    Lat min: 37
    Lat max: 52
    Lon min: -5
    Lon max: 10

Next steps

Geospatial datasets play a crucial role in coalescence by providing valuable information about the demographic process. Once a raster is successfully parsed, users typically have three primary objectives they may want to pursue:

Demes on a regular spatial grid

The raster's grid structure can be used to construct a fully connected spatial graph of demes on which the demographic process is simulated. This is pertinent when looking at very large continuous land masses. In this context:

- Vertices correspond to demes (populations) and are represented by the centroids of the raster cells.
- Edges represent the distances between demes, and their connectivity can be modified using dispersal kernels.

Demes on a variable spatial graph

In archipelagos or clusters of loosely connected islands, fluctuations in sea level have a notable impact on species dynamics: they intermittently connect and separate land masses and their corresponding populations. Climatic shifts in sky-island complexes can have comparable effects. In these situations, the assumption of a fully connected graph of population units (demes) no longer holds.

To address this, Digital Elevation Models (DEMs) can be utilized to establish a relationship between the dynamics of sea levels and species. In these cases, the connectivity of the graph at any given time is determined by the current elevation, and changes in elevation directly impact the level of connectivity between different areas.

Inform local processes

Ecological Niche Models (ENMs) play a crucial role in predicting how populations might respond to changes in climate, both in the past and the future.

Typically, these models produce raster outputs that represent estimates of suitability. These suitability values are then utilized to determine growth rates and carrying capacity.

By utilizing the values within the raster, it becomes possible to derive demographic quantities through the construction of mathematical expressions that incorporate both time and space. This approach is exemplified in the quetzal::expressive framework.

When encountering NA values, such as cells representing oceanic areas or those lying outside the spatial extent of the raster, the std::optional mechanism is employed. This means that these values may or may not have a defined value, leaving it to the user to decide how to handle such cases.