Natural Language Processing Lab


Discovering Dutchness: Identity Formation in the context of Rising Nativism using Social Data Analysis

The project, situated at the department of Cultural Anthropology and the department of Methodology and Statistics, uses a combination of ethnographic and text mining methods.

Hyperparameter Optimization to Accelerate Active Learning Models

The goal of this project is to increase the performance of active learning for screening large amounts of textual data by optimizing the hyperparameters of learning algorithms in the ASReview open-source software. Users from social sciences should be able to select a set of hyperparameters optimized for textual data from their domain instead of the currently implemented values obtained from medical datasets.
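The idea of tuning hyperparameters against a simulated screening run can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the smoothed word-count scorer, the `alpha` hyperparameter, and the function names are hypothetical and are not taken from ASReview or from the project itself.

```python
import math
from collections import Counter

# Toy labelled records (1 = relevant, 0 = irrelevant); purely illustrative data.
RECORDS = [
    ("survey of voter attitudes and trust", 1),
    ("attitudes toward migration in survey data", 1),
    ("trust in institutions among voters", 1),
    ("randomized trial of a blood pressure drug", 0),
    ("drug dosage and patient outcomes", 0),
    ("patient recovery after surgery trial", 0),
    ("survey of patient satisfaction", 0),
    ("migration and public trust", 1),
]


def simulate_screening(alpha, records=RECORDS, prior_idx=(0, 3)):
    """Count how many records are screened before all relevant ones are found.

    alpha is a (hypothetical) smoothing hyperparameter and must be > 0.
    prior_idx holds the records whose labels are known before screening starts.
    """
    labelled = set(prior_idx)
    total_relevant = sum(label for _, label in records)
    found = sum(records[i][1] for i in labelled)
    screened = 0
    while found < total_relevant:
        # "Train": count words seen so far in relevant and irrelevant records.
        rel, irr = Counter(), Counter()
        for i in labelled:
            text, label = records[i]
            (rel if label else irr).update(text.split())

        def score(i):
            # Smoothed log-ratio of relevant to irrelevant word counts.
            return sum(
                math.log((rel[t] + alpha) / (irr[t] + alpha))
                for t in records[i][0].split()
            )

        # Query the unlabelled record that the model ranks highest.
        pool = [i for i in range(len(records)) if i not in labelled]
        pick = max(pool, key=score)
        labelled.add(pick)
        screened += 1
        found += records[pick][1]
    return screened


def grid_search(grid):
    """Evaluate each candidate alpha and return the one needing the least screening."""
    results = {alpha: simulate_screening(alpha) for alpha in grid}
    best = min(results, key=results.get)
    return best, results


GRID = [0.01, 0.1, 1.0, 10.0]
best_alpha, results = grid_search(GRID)
print(best_alpha, results)
```

In a realistic setting the objective would be averaged over several labelled datasets from the target domain, and the grid would be replaced by a more efficient search strategy.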

Project website:

Hiring & Discrimination in the times of AI

Algorithmic screening is increasingly used in talent acquisition across large and small corporations worldwide, with the ambitious promise of eliminating idiosyncratic human biases by using identical and consistent evaluation criteria. Nonetheless, very little is known about how these algorithms are trained across a myriad of behavioral dimensions, from texts in CVs and interviews to non-verbal features in interviews. This lack of evidence and transparency raises serious concerns among job seekers and policymakers, particularly when it comes to leveling the playing field for ethnic minorities and female job applicants. Importantly, there is virtually no evidence of how job seekers perceive and respond to these automated procedures, or of how professional recruiters, working in combination with algorithmic tools, assess the persuasion strategies of candidates from various backgrounds. This project seeks to address these pressing issues by conducting a series of field and online survey experiments using a combination of causal inference and NLP methods. Answering these questions would enable us to develop effective policies around AI in recruitment processes by opening the black box of the application and recruitment strategies that job seekers and HR professionals adopt in the face of AI.

Human-AI Collaboration Based on Explainable NLP (Application in Social Science)

In this project, we aim to develop a method that increases interpretability by drawing on different explainable learning models in the field of NLP, and to improve how humans interact with, understand, and use these models in collaboration.

Mining the Dutch Disposition towards Animals and Plants

In this project, we address the topic of the human disposition towards animals and plants. Our aim is to establish a better understanding of the history of both the knowledge about animals/plants and the cultural representation of animals and plants in the Netherlands. We study large sets of digitized texts and images produced and circulated in the Netherlands between 1550 and 2000.

Modeling Biases in News and Political Debates

Biases and stereotypes in society can be reflected in many areas of everyday life, including the work environment, education, and politics. News and political speeches are only two examples of textual content in which stereotypes are present. In this project, we focus on both gender and racial bias. Using approaches from NLP, we first explore how adjectives are used by female and male politicians (in particular when they refer to the male and female gender). Next, we model the evolution of biased language over the decades using a collection of debates from the UK House of Commons. Finally, we use sentiment analysis to analyze how different races are described in the news. Our results indicate that male and female politicians use adjectives differently and that bias exists across several decades. We also found that articles that discuss ethnic outgroups have a negative tone and emotion.

Social Bias Detection in Text Using NLP and their Associations with People’s Opinions

Although recent works have applied sophisticated NLP methods to quantify social bias in large corpora, there is still limited research on the associations between users’ beliefs and the social bias expressed in online text. In this project, we aim to fill this gap by exploring whether the implicit social bias expressed in different texts is associated with people’s beliefs on societal topics. In this way, we will attempt to understand whether, and to what extent, people consume information that confirms their beliefs. (Funded by UU Focus Area Applied Data Science)

Unblackboxing BERT for Text-Mining: Case of Mental Disorder Prediction

Recent studies have used supervised learning to classify individuals into various mental disorder conditions using social media text. However, few have tried to identify which features drive the prediction of specific disorders. While highly complex algorithms, such as BERT, tend to have state-of-the-art performance, this comes at the expense of model interpretability. Such high-performing but uninterpretable models are called black boxes. Unblackboxing BERT in terms of feature attributions (measures of individual predictor importance) remains a challenge.

The present work combines two existing feature attribution methods: LRP (AH + LN) and TransSHAP. Both methods are derivatives of commonly used attribution methods which have been adapted to work on transformer models such as BERT. Both methods will be adjusted to provide local and global explanations, and these explanations will be combined into a novel feature attribution method.
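The difference between local and global explanations, and one possible way to combine two attribution methods, can be sketched as follows. This is a rough illustration under stated assumptions: the text does not specify how the two methods are combined, so the averaging of normalized scores, the mean-absolute aggregation, the function names, and the example scores are all hypothetical and do not come from LRP, TransSHAP, or the project itself.

```python
from collections import defaultdict


def normalize(attr):
    """Scale token attributions so that their absolute values sum to 1."""
    total = sum(abs(v) for _, v in attr)
    return [(tok, v / total if total else 0.0) for tok, v in attr]


def combine_local(attr_a, attr_b):
    """Average two normalized attribution lists for the same token sequence.

    Averaging is an assumption made for this sketch, not the project's method.
    """
    a, b = normalize(attr_a), normalize(attr_b)
    if [t for t, _ in a] != [t for t, _ in b]:
        raise ValueError("both methods must attribute the same token sequence")
    return [(tok, (va + vb) / 2) for (tok, va), (_, vb) in zip(a, b)]


def global_importance(local_explanations):
    """Aggregate local explanations: mean absolute attribution per token."""
    sums, counts = defaultdict(float), defaultdict(int)
    for explanation in local_explanations:
        for tok, v in explanation:
            sums[tok] += abs(v)
            counts[tok] += 1
    return {tok: sums[tok] / counts[tok] for tok in sums}


# Hypothetical per-token scores for one post, standing in for the two methods.
scores_method_a = [("feel", 0.2), ("hopeless", 0.6), ("today", 0.2)]
scores_method_b = [("feel", 0.1), ("hopeless", 0.8), ("today", 0.1)]

local = combine_local(scores_method_a, scores_method_b)
global_scores = global_importance([local])
print(local)
print(global_scores)
```

A local explanation here is the per-token list for a single text, while the global explanation summarizes token importance across all explained texts.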