Natural Language Processing Group

Projects

Discovering Dutchness: Identity Formation in the context of Rising Nativism using Social Data Analysis

The project, situated at the department of Cultural Anthropology and the department of Methodology and Statistics, uses a combination of ethnographic and text mining methods.

NLP group members: Niek van de Pas

Hiring & Discrimination in the times of AI

Algorithmic screening is increasingly used in talent acquisition across large and small corporations worldwide, with the ambitious promise of eliminating idiosyncratic human biases by applying identical and consistent evaluation criteria. Nonetheless, very little is known about how these algorithms are trained across a myriad of behavioral dimensions, from texts in CVs and interviews to non-verbal features in interviews. This lack of evidence and transparency raises serious concerns among job seekers and policymakers, particularly when it comes to leveling the playing field for ethnic minorities and female job applicants. Importantly, there is virtually no evidence on how job seekers perceive and respond to these automated procedures, or on how professional recruiters, working in combination with algorithmic tools, assess persuasion strategies from candidates of various backgrounds. This project seeks to address these pressing issues by conducting a series of field and online survey experiments using a combination of causal inference and NLP methods. Answering these questions would enable us to develop effective policies around AI in recruitment processes, by opening the black boxes of the application and recruitment strategies that job seekers and HR professionals adopt in the face of AI.

NLP group members: Huyen Nguyen

Human-AI Collaboration Based on Explainable NLP (Application in Social Science)

In this project, we aim to develop a method that increases interpretability by drawing on different explainable learning models in the field of NLP, and to improve how humans interact with, understand, and use these models in collaboration.

NLP group members: Hadi Mohammadi


Modeling Biases in News and Political Debates

Biases and stereotypes in society can be reflected in different sectors of our everyday life, including the work environment, education, and politics. News and political speeches are only two examples of textual content in which stereotypes are present. In this project, we focus on both gender and racial bias. Using approaches from NLP, we first explore how adjectives are used by female and male politicians (in particular when they refer to men and women). Next, we model the evolution of biased language over the decades using a collection of debates from the UK House of Commons. Finally, we use sentiment analysis to analyze how different races are described in news. Our results indicate that male and female politicians use adjectives differently and that bias exists across several decades. We also found that articles that discuss ethnic outgroups have a negative tone and emotion.

Thesis project by Dimitri de Boer, Nina ten Pas, and Bruno Laiber de Pinho, supervised by Anastasia Giachanou

Social Bias Detection in Text Using NLP and their Associations with People’s Opinions

Although recent works have applied sophisticated NLP methods to quantify social bias in large corpora, there is still limited research on the associations between users’ beliefs and the social bias expressed in online text. In this project, we aim to fill this gap by exploring whether the implicit social bias that different texts express is associated with people’s beliefs on societal topics. In this way, we will attempt to understand whether, and to what extent, people consume information that confirms their beliefs.

Funded by UU Focus Area Applied Data Science (2022).

NLP group members: Anastasia Giachanou

Unblackboxing BERT for Text-Mining: Case of Mental Disorder Prediction


Recent studies have used supervised learning to classify individuals into various mental disorder conditions using social media text. However, few have tried to identify which features are driving the prediction of specific disorders. While highly complex algorithms, such as BERT, tend to have state-of-the-art performance, this comes at the expense of model interpretability. Such high-performing but uninterpretable models are called black boxes. Unblackboxing BERT in terms of feature attributions (measures of individual predictor importance) remains a challenge.

The present work combines two existing feature attribution methods: LRP (AH + LN) and TransSHAP. Both are derivatives of commonly used attribution methods that have been adapted to work on transformer models such as BERT. Both will be adjusted to provide local and global explanations, and these explanations will be combined into a novel feature attribution method.

NLP group members: Daniel Anadria