Unit 5: More Use Case Examples

The previous unit demonstrated how to retrieve configuration details for a dataflow, stored as reference metadata, and iterate over these attributes to drive process steps.

These metadata can be utilized to customize the behavior of diverse statistical processes, including data collection, validation, and mapping.

In this unit, we'll take a closer look at this configuration process and at the ways in which pysdmx can assist in creating the physical data model for a dataflow, validating data, mapping data, and generating the filesystem structure, along with the metadata required in each case. We'll conclude by going over the process of using vtlengine, a Validation and Transformation Language (VTL) engine, for validation.

Configure your processes

In a scenario where we receive a data submission for validation, mapping, and integration, each step can be configured differently.

Configuration options

Configuration options may depend on the ingested data or business unit practices. For instance, consider validation:

The data received might include only what has changed compared to the previous submission. Alternatively, it could be a complete dataset, requiring different validation approaches.
In case of validation problems, businesses may choose to quarantine only the invalid data, proceeding to the next step with the subset of valid data. Others may prefer quarantining the entire submission.

Configuration steps

These configuration options can be captured using SDMX reference metadata.

To do this:

Create a Metadata Structure Definition with the configuration options using concepts and coded concepts.
Define the type(s) of attachment targets (e.g., a dataflow, a provision agreement).
Define a metadataflow (e.g., with the ID DCO for Dataflow Configuration Options) for which metadata reports will be provided.
Provide metadata reports (metadatasets) attached to the desired targets, defining their different configuration options.

Dataflow example

For example, for the BIS_MACRO dataflow maintained by BIS, options could include:

partial_update set to the boolean value true (indicating acceptance of only new or updated data).
on_validation_error set to code F (Fail), signifying that the entire submission must be quarantined in case of validation issues.
structure_map set to the URN of the structure map to be used for mapping data from the BIS_MACRO dataflow structure to its target structure.
on_mapping_error set to code I (Ignore), as only a subset of data is mapped.
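
As a rough illustration of how such options could drive a process step, the sketch below stores the BIS_MACRO options in a plain Python object and applies the quarantine rule. The DataflowConfig class, the rows_to_forward helper, the placeholder structure map URN, and the code P (assumed here to mean partial quarantine) are inventions for this example; they are not part of pysdmx or of the actual metadataflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataflowConfig:
    """Hypothetical container for the options read from a metadata report."""

    partial_update: bool
    on_validation_error: str  # "F" (Fail) is from the example; "P" is assumed
    structure_map: str
    on_mapping_error: str  # "I" (Ignore) is from the example


def rows_to_forward(config, valid_rows, invalid_rows):
    """Return the rows allowed to proceed to the next step."""
    if invalid_rows and config.on_validation_error == "F":
        # Fail: the entire submission is quarantined.
        return []
    # Any other code: only the invalid rows are quarantined.
    return list(valid_rows)


bis_macro = DataflowConfig(
    partial_update=True,
    on_validation_error="F",
    structure_map="urn:placeholder:structure-map",  # placeholder, not a real URN
    on_mapping_error="I",
)

print(rows_to_forward(bis_macro, ["row 1", "row 2"], ["row 3"]))
```

With on_validation_error set to F, a single invalid row is enough for nothing to be forwarded; with any other code, the valid rows proceed.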

Create physical data model

Pysdmx can assist in creating the physical data model for a dataflow in a metadata-driven fashion, relying solely on the metadata stored in an SDMX Registry. For this scenario, Data Structure metadata are needed in the SDMX Registry.

The basic steps to follow to create the physical data model are:

Connect to the registry
Retrieve the schema information
Create the physical data model
Map the SDMX data types
Fine tune the physical model

A description of each of these steps, along with Python code, can be found on the pysdmx site.

Data Structure

A data structure describes the expected structure of data, including various components (dimensions, attributes, or measures) relevant for a statistical domain. It also provides component data types (string, integer, dates, etc.) and specifies whether these components are mandatory. In short, the data structure contains all the information needed to create a physical data model.
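
To make the type-mapping and model-creation steps more concrete, here is a minimal sketch that turns component definitions into a SQL table definition. It does not call pysdmx: the Component class is a hypothetical stand-in for the schema information retrieved from the registry, and the SQL_TYPES mapping and the sample components are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for one component of a retrieved schema."""

    id: str
    dtype: str
    required: bool


# Assumed mapping from SDMX data type names to SQL column types.
SQL_TYPES = {
    "String": "VARCHAR",
    "Integer": "INTEGER",
    "Double": "DOUBLE PRECISION",
    "Boolean": "BOOLEAN",
}


def create_table_sql(table, components):
    """Build a CREATE TABLE statement from the component definitions."""
    columns = []
    for comp in components:
        sql_type = SQL_TYPES.get(comp.dtype, "VARCHAR")  # fall back to text
        not_null = " NOT NULL" if comp.required else ""
        columns.append(f"    {comp.id} {sql_type}{not_null}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


components = [
    Component("FREQ", "String", True),
    Component("TIME_PERIOD", "ObservationalTimePeriod", True),
    Component("OBS_VALUE", "Double", False),
]
print(create_table_sql("BIS_MACRO", components))
```

Mandatory components become NOT NULL columns, and unmapped data types fall back to a text column, which is one possible starting point before fine-tuning the model.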

Validate your data

Pysdmx can be used to facilitate data validation in a metadata-driven approach, relying solely on the metadata stored in an SDMX Registry.

There are various types of validation, and we'll focus on structural validation in this scenario. Structural validation ensures that the structure of data meets the expectations.

For this scenario, the necessary metadata depends on the desired thoroughness of validation. At a minimum, we need the data structure information. However, for more comprehensive validation, we may consider additional constraints from the dataflow or provision agreement.

We'll examine more about the required metadata on the next screen.

Required metadata

Let's look at each type of metadata needed for the scenario presented on the previous screen.

Data structure

A data structure describes the expected structure of data, including component types, data types, and whether components are mandatory. If components are coded, the allowed values are also specified. This is the minimum required for structural validation.

Dataflows

Dataflows allow defining one or more sets of data sharing the same data structure. For example, if we have a data structure about locational banking statistics, we might want to define one dataflow representing the locational banking statistics by country (residence) and another representing the locational banking statistics by nationality. If we have a data structure representing bilateral foreign exchange reference rates, we might want to create a dataflow for the subset of exchange rates published on a website on a daily basis.

Expanding on this last example, we could define this subset of data using constraints, i.e. setting the frequency dimension to "daily" and restricting the currency codes to those published on a daily basis (e.g. CHF, CNY, EUR, JPY, USD), and then "attach" these constraints to the dataflow. Taking these additional constraints into account makes the validation stricter.
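
The effect of such a constraint can be pictured as an intersection between the codes allowed by the data structure and the codes allowed by the dataflow. The sketch below uses plain Python sets; the code values and the effective_codes helper are illustrative assumptions, not the pysdmx API.

```python
# Illustrative codes allowed by the data structure (the full codelists).
structure_codes = {
    "FREQ": {"A", "D", "M", "Q"},
    "CURRENCY": {"AUD", "CHF", "CNY", "EUR", "GBP", "JPY", "USD"},
}

# Illustrative constraint attached to the daily dataflow.
dataflow_constraint = {
    "FREQ": {"D"},
    "CURRENCY": {"CHF", "CNY", "EUR", "JPY", "USD"},
}


def effective_codes(codes, constraint):
    """Restrict each component to the codes the constraint also allows."""
    return {
        comp: allowed & constraint.get(comp, allowed)
        for comp, allowed in codes.items()
    }


allowed = effective_codes(structure_codes, dataflow_constraint)
print(sorted(allowed["FREQ"]))
print("GBP" in allowed["CURRENCY"])
```

A value such as GBP is valid against the data structure alone but is rejected once the dataflow constraint is taken into account.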

Provisioning metadata

Provision agreements and data providers indicate which providers supply data for a dataflow. Constraints, such as expecting a provider to supply data only for its own country, can be applied.

Critical artifacts and steps

In summary, the following SDMX artifacts need to be in the SDMX Registry: AgencyScheme, Codelist, ConceptScheme, and DataStructure. For more thorough validation, additional artifacts are needed: DataConstraint, Dataflow, DataProviderScheme, and ProvisionAgreement.

The basic steps to follow are:

Connect to the registry
Retrieve the schema information
Validate the data, checking in turn: the components, the data types, the facets, the coded components, and the mandatory components

A description of each of these steps, along with Python code, can be found on the pysdmx site.
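
The validation checks listed above can be sketched in a few lines of plain Python. This is not the pysdmx implementation: the Component class is a hypothetical stand-in for the schema retrieved from the registry, max_length stands for a single facet, and the sample schema and rows are made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for one component of a retrieved schema."""

    id: str
    dtype: type = str
    required: bool = True
    codes: Optional[FrozenSet[str]] = None  # allowed values if coded
    max_length: Optional[int] = None  # one example of a facet


def validate_row(row, schema):
    """Return the list of structural errors found in one row."""
    known = {comp.id for comp in schema}
    # Components: nothing outside the schema is allowed.
    errors = [f"unexpected component: {name}" for name in row if name not in known]
    for comp in schema:
        value = row.get(comp.id)
        if value is None or value == "":
            if comp.required:  # mandatory components
                errors.append(f"missing mandatory component: {comp.id}")
            continue
        if not isinstance(value, comp.dtype):  # data type
            errors.append(f"wrong data type for {comp.id}")
            continue
        if comp.max_length is not None and len(str(value)) > comp.max_length:
            errors.append(f"value too long for {comp.id}")  # facet
        if comp.codes is not None and value not in comp.codes:
            errors.append(f"invalid code for {comp.id}: {value}")  # coded
    return errors


schema = [
    Component("FREQ", codes=frozenset({"D", "M"}), max_length=1),
    Component("CURRENCY", codes=frozenset({"CHF", "EUR", "USD"}), max_length=3),
    Component("OBS_VALUE", dtype=float),
    Component("OBS_STATUS", required=False),
]

good = {"FREQ": "D", "CURRENCY": "CHF", "OBS_VALUE": 1.05}
bad = {"FREQ": "X", "OBS_VALUE": "n/a", "EXTRA": "1"}
print(validate_row(good, schema))
for error in validate_row(bad, schema):
    print(error)
```

Returning a list of errors per row, rather than raising on the first problem, makes it straightforward to quarantine either the invalid rows or the whole submission, depending on the configured option.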

Map your data

Pysdmx facilitates mapping data in a metadata-driven fashion, relying solely on the metadata stored in an SDMX Registry.

SDMX supports various mapping rules, ranging from simple mappings (e.g., mapping a list of non-standard country codes to ISO 3166 2-letter country codes) to more complex ones (e.g., many-to-many and time-dependent mapping rules). To support the definition of mapping rules, SDMX offers structure maps, component maps, representation maps, fixed value maps, epoch maps, and date pattern maps. Pysdmx supports all these types except epoch maps.

The basic steps to follow are:

Connect to the registry
Retrieve simple code mappings
Apply structure maps
Copy values
Set fixed values
Map codes
Reformat dates

A description of each of these steps, along with Python code, can be found on the pysdmx site.
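
To give a feel for the last four steps (copying values, setting fixed values, mapping codes, and reformatting dates), the sketch below applies hand-written rules to one record. The rule tables, the source column names, and the map_row helper are assumptions for illustration; in practice the rules would come from the structure map retrieved from the registry.

```python
from datetime import datetime

# Hypothetical mapping rules, keyed by source and target component.
COPY = {"CCY": "CURRENCY"}  # copy the value as-is
FIXED = {"FREQ": "D"}  # set a fixed value in the target
CODE_MAPS = {("CTRY", "REF_AREA"): {"UK": "GB", "SUI": "CH"}}
DATE_MAPS = {("DATE", "TIME_PERIOD"): ("%d/%m/%Y", "%Y-%m-%d")}


def map_row(row):
    """Map one source record to the target structure."""
    target = dict(FIXED)
    for source, dest in COPY.items():
        target[dest] = row[source]
    for (source, dest), codes in CODE_MAPS.items():
        target[dest] = codes[row[source]]  # KeyError if the code is unmapped
    for (source, dest), (in_fmt, out_fmt) in DATE_MAPS.items():
        target[dest] = datetime.strptime(row[source], in_fmt).strftime(out_fmt)
    return target


source_row = {"CCY": "CHF", "CTRY": "SUI", "DATE": "31/01/2024"}
print(map_row(source_row))
```

An unmapped code raises a KeyError in this sketch; a real process would consult an option such as on_mapping_error to decide whether to ignore the record or fail.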

Required metadata

For our example, the objective is to store data in folders organized by dataflows. In each dataflow folder, we want to have sub-folders by data providers. Access to folders should be granted via appropriate roles with access requests approved by the manager of the organizational unit owning the dataflow.

Pysdmx can aid in generating the filesystem structure in a metadata-driven fashion, relying solely on metadata stored in an SDMX Registry.

Dataflows

Dataflows define the first level of the filesystem. Dataflows, related artifacts, and provisioning metadata are needed to create roles for data access.

Provisioning Metadata

Provision agreements and data providers indicate which providers supply data for a dataflow, defining the second level of the filesystem.

Agencies

Agencies define the organizational unit owning the data. Contacts associated with agencies define the person in charge of approving (or denying) requests to access the data.

Category Schemes

Category Schemes define the dataflows to be considered when creating the filesystem structure. Dataflows are attached to categories of the category scheme via categorizations.

The basic steps to follow are:

Connecting to a Registry
Creating the Dataflow Folders

Get the list of dataflows to consider when creating the filesystem. This information is captured in a category scheme and related categorizations. Iterate over the categories (and their sub-categories) to find the dataflows attached to them.
Creating the Providers Folders

Get the list of respective dataflows for each provider. For the matching dataflows, create the provider folders.
Creating the Roles

Get the dataflow-specific role information from the dataset metadata, get the agency role information, and assign the defined roles.

More information on using pysdmx to create a filesystem layout, organize dataflows, and grant access via dedicated roles can be found in the pysdmx documentation.
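
The two folder-creation steps can be sketched with the standard library alone. The example assumes the list of providers per dataflow has already been derived from the registry metadata; the create_layout helper and the provider identifiers are hypothetical, and role creation is left out because it depends on the access-management system in use.

```python
import tempfile
from pathlib import Path


def create_layout(root, providers_by_dataflow):
    """Create one folder per dataflow and one sub-folder per provider."""
    created = []
    for dataflow, providers in providers_by_dataflow.items():
        dataflow_dir = Path(root) / dataflow  # first level: dataflows
        dataflow_dir.mkdir(parents=True, exist_ok=True)
        for provider in providers:
            provider_dir = dataflow_dir / provider  # second level: providers
            provider_dir.mkdir(exist_ok=True)
            created.append(provider_dir)
    return created


# Illustrative input; in practice it would be derived from registry metadata.
providers_by_dataflow = {"BIS_MACRO": ["PROVIDER_A", "PROVIDER_B"]}

with tempfile.TemporaryDirectory() as root:
    folders = create_layout(root, providers_by_dataflow)
    relative = [folder.relative_to(root).as_posix() for folder in folders]
    print(relative)
```

The example writes into a temporary directory so that it can be run safely; a real process would point root at the target storage location.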

Using VTL for Validation

Pysdmx supports reading data and metadata to generate and operate on datapoints using vtlengine. Numerous types of operations can be performed and the metadata requirements change depending upon the operations. Nevertheless, the steps remain the same:

Read the data
Extract the data and the data structure
Prepare the dictionary
Define the expressions and execution

More information on pysdmx and vtlengine integration is available here.
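
As a sketch of the "prepare the dictionary" and "define the expressions" steps, the example below builds a data structure dictionary and a VTL script for one dataset. The dictionary layout (a "datasets" list whose entries carry a "name" and a "DataStructure" list) is an assumption about the JSON-style input vtlengine expects and should be checked against the installed version; the dataset name, the components, and the to_vtl_structures helper are hypothetical.

```python
# Hypothetical components of a dataset named DS_1: (name, VTL type, role).
components = [
    ("REF_AREA", "String", "Identifier"),
    ("TIME_PERIOD", "String", "Identifier"),
    ("OBS_VALUE", "Number", "Measure"),
]


def to_vtl_structures(dataset_name, components):
    """Prepare the data structure dictionary for one dataset."""
    return {
        "datasets": [
            {
                "name": dataset_name,
                "DataStructure": [
                    {
                        "name": name,
                        "type": vtl_type,
                        "role": role,
                        # Identifiers may not be null; other roles may.
                        "nullable": role != "Identifier",
                    }
                    for name, vtl_type, role in components
                ],
            }
        ]
    }


data_structures = to_vtl_structures("DS_1", components)

# Illustrative VTL expression: keep the rows with a negative value.
script = "DS_r <- DS_1 [filter OBS_VALUE < 0];"

print(data_structures["datasets"][0]["DataStructure"][0])
```

With vtlengine installed, the script, this dictionary, and the datapoints for DS_1 would then be handed to the engine for execution.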

What do you know?

Let's complete one final question before concluding. Which of the following types of metadata indicates which providers supply data for a dataflow?

1. Data structure
2. Agency
3. Dataflows
4. Provisioning metadata

That's right.

Provision agreements and data providers indicate which providers supply data for a dataflow. Constraints, such as expecting a provider to supply data only for its own country, can be applied.

That's not right.

The correct answer is option 4.

Provision agreements and data providers indicate which providers supply data for a dataflow. Constraints, such as expecting a provider to supply data only for its own country, can be applied.
