Methodology

Purpose

On average, states receive more than 70 recommendations per year under the various human rights monitoring mechanisms of the UN. With 193 UN Member States, the Universal Periodic Review (UPR) alone has generated over 110,000 recommendations since its beginning in 2007. In order for governments to implement these recommendations in a systematic and efficient manner, it is necessary to categorize them into thematic clusters.

Following the adoption of the 2030 Agenda for Sustainable Development, there has been an increasing interest in linking these human rights recommendations to the Sustainable Development Goals (SDGs) and targets. Yet while existing thematic categories provide some cues, they do not fit neatly with the Goals and targets. Further, until now, this categorization has mostly been done by hand. Given their sheer number, categorizing existing and future human rights recommendations against the SDGs one at a time is a daunting task.

The Danish Institute for Human Rights has therefore set out to develop and train an algorithm for automatic classification of recommendations from UN human rights monitoring bodies, in collaboration with the social enterprise Specialisterne.

You can watch a presentation of Special Consultant Niels Jørgen Kjær explaining the methodology here. You can also find a technical note presenting the algorithm here.

The dataset

The dataset comprises all country-specific observations and recommendations from Treaty Bodies, Special Procedures under the Human Rights Council and the Universal Periodic Review that are currently available to us. It was created by compiling extracts of the datasets contained in the Universal Human Rights Index database operated by the Office of the High Commissioner for Human Rights as well as the Database of Recommendations maintained by UPR Info. The dataset will be updated periodically in the future, as new data becomes available.

Please be aware that data for individual mechanisms, countries and years may be incomplete due to processing backlogs at the source. In collaboration with OHCHR, we are working to identify and fill these gaps as soon as possible.

Metadata

“Metadata” is data that serves to describe or categorise other data, that is, “data about data”. The metadata contained in the SDG – Human Rights Data Explorer consists of the following two types:

  • Descriptive properties, informing, among other things, which human rights mechanism a given recommendation originates from, and which country it is addressed to;
  • Analytical categories that identify the rights-holder groups addressed in a given recommendation, and which SDGs and targets the recommendation is linked to.

The analytical categories are further explained in the sections below.

Categories of rights-holder groups

The SDG – Human Rights Data Explorer identifies the group of rights-holders that are addressed by a given recommendation. Recommendations may be linked to none, one, or multiple categories of rights-holders. The categories of rights-holders identified in the SDG – Human Rights Data Explorer are:

Women and girls
Children
Indigenous peoples
Persons with disabilities
Migrants
Refugees and asylum-seekers
Internally displaced persons
Ethnic and religious minorities
Human rights defenders
Lesbian, Gay, Bisexual, Transgender and Intersex (LGBTI)
Older persons
Youth

 

Sustainable Development Goals categories

The 169 targets under the 17 Sustainable Development Goals serve as categories for the classification of the recommendations. The current dataset reflects about 70 of the 169 targets. Recommendations are linked directly at the target level, with no residual categories at the Goal level. This means that a recommendation is only classified if it is linked to a specific target under one of the 17 Goals.
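As an illustration of target-level linking, the sketch below shows one possible way to represent a classified recommendation. The record layout, the field names and the helper functions `goal_of` and `is_classified` are hypothetical and not taken from the Data Explorer; the sample records and their labels are invented.

```python
# Hypothetical record layout: links are stored at the target level only.
def goal_of(target):
    """Derive the Goal number from a target identifier such as '5.2' or '3.b'."""
    return int(target.split(".")[0])


def is_classified(record):
    """A recommendation counts as classified only if it has at least one target."""
    return len(record["targets"]) > 0


# Invented sample records, for illustration only.
records = [
    {
        "text": "Adopt legislation criminalising domestic violence.",
        "groups": ["Women and girls"],
        "targets": ["5.2", "16.3"],
    },
    {
        "text": "Submit overdue reports to the treaty bodies.",
        "groups": [],
        "targets": [],
    },
]

for record in records:
    goals = sorted({goal_of(t) for t in record["targets"]})
    print(is_classified(record), goals)
```

Because the Goal is always derived from the target, there is no way in this layout to attach a recommendation to a Goal alone, which mirrors the absence of residual Goal-level categories.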

A list of all targets of the Sustainable Development Goals and their link to relevant human rights instruments and international labour standards can be accessed here: http://sdg.humanrights.dk/en/goals-and-targets

The analytical process

The human rights recommendations have been categorised through an analytical process using semi-supervised machine learning. In this process, an algorithm was initially trained to classify UPR recommendations based on a small set of training examples (classified by a human expert) and a large amount of unclassified data.

To prepare the machine learning analysis, an initial set of training examples was identified by a human expert for each of the 169 targets of the 2030 Agenda, where possible. This set of training examples is required to provide the algorithm with a basis to operate from (the so-called “ground truth”). To identify suitable training examples, the human expert used common quotes and terms that designated relevant human rights-related content under relevant targets of the SDGs. For example, search terms such as “violence against women” and “trafficking of women” were used to identify training examples for target 5.2, which calls for ending violence against women and girls, including trafficking and other types of exploitation.
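The search-term step can be sketched as follows. This is a minimal illustration, not the tooling actually used: the function name `find_candidates`, the `SEARCH_TERMS` mapping and the sample recommendations are hypothetical, and only the two search terms for target 5.2 come from the text. The candidates returned would still need to be reviewed by the human expert before serving as ground truth.

```python
# Hypothetical sketch: find candidate training examples by search term.
# Only the two terms for target 5.2 are taken from the text above.
SEARCH_TERMS = {
    "5.2": ["violence against women", "trafficking of women"],
}


def find_candidates(recommendations, search_terms=SEARCH_TERMS):
    """Return {target: [recommendation, ...]} for case-insensitive term hits."""
    candidates = {target: [] for target in search_terms}
    for text in recommendations:
        lowered = text.lower()
        for target, terms in search_terms.items():
            if any(term in lowered for term in terms):
                candidates[target].append(text)
    return candidates


# Invented sample recommendations, for illustration only.
sample = [
    "Strengthen measures to combat violence against women.",
    "Ratify the Optional Protocol to the Convention against Torture.",
]
print(find_candidates(sample))
```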

Following this, the machine learning analysis was conducted in a two-stage process.

In a first stage of supervised machine learning, the classification algorithm was used to identify recommendations similar to the training examples under each category by drawing on existing analytical metadata, namely thematic classifications contained in the existing databases of OHCHR and UPR Info. Both databases categorise recommendations based on a set of keywords that comprise a set of affected groups, as well as several dozen human rights issues commonly referred to in recommendations. In the combined dataset used for the analysis, all existing keywords were collected. The algorithm was then used to identify patterns and correlations with the SDG targets in the combination of keywords between training examples and unclassified data. Through human feedback, the precision of the algorithm was enhanced in the course of the process.
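To make the first stage more concrete, the sketch below compares the keyword set of an unclassified recommendation with the keyword sets of expert-labelled training examples. The text does not name the similarity measure that was used, so the Jaccard overlap here is an assumption, as are the function names, the threshold, and the invented keywords and labels.

```python
def jaccard(a, b):
    """Share of keywords two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_targets(keywords, training_examples, threshold=0.5):
    """Return targets whose best-matching training example reaches the threshold.

    training_examples: list of (keyword_set, target) pairs labelled by an expert.
    """
    best = {}
    for example_keywords, target in training_examples:
        score = jaccard(keywords, example_keywords)
        best[target] = max(best.get(target, 0.0), score)
    return {t: s for t, s in best.items() if s >= threshold}


# Invented keywords and labels, for illustration only.
training = [
    ({"women", "violence", "trafficking"}, "5.2"),
    ({"education", "children"}, "4.1"),
]
print(suggest_targets({"women", "violence"}, training))
```

Suggestions produced this way would be candidates for human feedback rather than final classifications, in line with the feedback loop described above.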


In a second stage, the algorithm was equipped to analyse text directly for links to the SDGs, rather than drawing on metadata. This feature was added to enhance precision and, with a view to future data analysis, to eliminate the need for human classification entirely. The text analysis draws on an “expert dictionary”, which is constructed by a human expert based on previously identified ground truth. The expert dictionary collects terms and expressions that are typical for a given analytical category and assigns each of them a weight. In combination with a standard dictionary of the English language and the training data, the algorithm then determines, for every recommendation in the sample, a probability value for a link to each of the analytical categories. Through human feedback, additional ground truth is identified for each of the categories. These additional training examples are used to continuously update the expert dictionary by adjusting the weights of terms and expressions.
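The idea of a weighted expert dictionary can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm itself: the dictionary entries, the weights, the `bias` parameter and the use of a logistic function to turn a score into a probability-like value are all hypothetical, and the sketch leaves out the standard English dictionary and the training data mentioned above.

```python
import math

# Hypothetical expert dictionary: category -> {term: weight}.
EXPERT_DICTIONARY = {
    "5.2": {"violence against women": 2.0, "trafficking": 1.5},
    "16.3": {"access to justice": 2.0, "rule of law": 1.5},
}


def link_probabilities(text, dictionary=EXPERT_DICTIONARY, bias=-1.0):
    """Score a text against every category and map each score into (0, 1)."""
    lowered = text.lower()
    probabilities = {}
    for category, terms in dictionary.items():
        # Sum the weights of all dictionary terms found in the text.
        score = bias + sum(w for term, w in terms.items() if term in lowered)
        # Logistic function (an assumption) squashes the score into (0, 1).
        probabilities[category] = 1.0 / (1.0 + math.exp(-score))
    return probabilities


probs = link_probabilities("Combat trafficking and violence against women.")
for category, p in sorted(probs.items()):
    print(category, round(p, 3))
```

In a setup like this, updating the expert dictionary after human feedback amounts to raising or lowering individual term weights.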

Figure: Text mining


The algorithm has since been used to analyse and categorise recommendations from human rights monitoring bodies other than the UPR, namely the Treaty Bodies as well as the Special Procedures under the Human Rights Council. This meant an expansion of the data from approximately 55,000 UPR recommendations to a total of approximately 200,000 recommendations and observations from the various bodies combined. The analysis setup is optimised and expanded on an ongoing basis.

To evaluate how well the algorithm is performing on each category, a performance measure is calculated. This measure indicates the likelihood of false positives and false negatives within a category, and helps determine whether the category needs refinement, more training data, or both.
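The text does not state which performance measure is used. As an assumption, the sketch below computes precision, recall and F1, which are common ways of summarising false positives and false negatives for a single category. The function name and the sample IDs are hypothetical.

```python
def category_performance(predicted, actual):
    """Precision, recall and F1 for one category.

    predicted, actual: sets of recommendation IDs assigned to the category
    by the algorithm and by a human expert, respectively.
    """
    true_positives = len(predicted & actual)
    false_positives = len(predicted - actual)
    false_negatives = len(actual - predicted)
    precision = (
        true_positives / (true_positives + false_positives) if predicted else 0.0
    )
    recall = (
        true_positives / (true_positives + false_negatives) if actual else 0.0
    )
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented IDs, for illustration only.
print(category_performance(predicted={1, 2, 3, 4}, actual={2, 3, 4, 5, 6}))
```

Low precision points to many false positives, which may call for refining the category, while low recall points to many false negatives, which may call for more training data.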
 
Full list of performance values: