Natural Language and Text Processing Lab

Events

5th February 2024

๐ŸŒEvent Recap: Enhancing Cybercrime Detection through Predictive Text Mining

๐Ÿ”Overview of the Presentation:

The event focused on the use of predictive text mining to improve the detection of cyber and digitized crime in police registrations. It examined how current structured police data sources often give an incomplete picture of the amount and nature of cyber and digitized crime and of the suspects involved.

Highlights of Insights:

  • Bridging Data Gaps: Unstructured textual data from police registrations was highlighted as a way to fill the gaps left by structured data.
  • Quantifying Cybercrime: The presentation shared incidence estimates for eight types of cyber and digitized crime in the Netherlands. A multilabel classification model used word (lemma) counts, meta-textual characteristics, and NLP text characteristics as inputs. The model was applied to a sample of 100,000 registrations, and suspect characteristics were examined. Notably, adding meta-textual and NLP features did not significantly increase predictive accuracy beyond simpler features such as lemma uni- and bigrams. TF-IDF weighting also did not improve classification accuracy.
  • Suspect Profiling Limitations: The session also covered the challenges in obtaining accurate suspect characteristics. To describe the characteristics of suspects associated with online crimes, more stringent accuracy requirements for the machine learning model’s predictions were necessary. Consequently, detailed descriptions of suspects were only possible for three types of online crimes.
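
For readers less familiar with the features mentioned above, the following is a minimal illustrative sketch, not the presenter's actual pipeline. It shows how lemma uni- and bigram counts can be built from tokenized text, how TF-IDF reweights those counts (using the smoothed form ln((1 + N) / (1 + df)) + 1, one common variant assumed here), and how a stricter score threshold yields fewer but more reliable labels. The toy documents, label names, scores, and thresholds are all hypothetical.

```python
import math
from collections import Counter


def ngram_counts(lemmas, n_max=2):
    """Count lemma n-grams of length 1..n_max in one tokenized document."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(lemmas) - n + 1):
            counts[" ".join(lemmas[i:i + n])] += 1
    return counts


def tfidf_weight(doc_counts):
    """Multiply raw counts by a smoothed inverse document frequency."""
    n_docs = len(doc_counts)
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())  # each term counted once per document
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in doc_counts]


def assign_labels(label_scores, threshold):
    """Multilabel decision: keep every label whose score meets the threshold."""
    return {label for label, score in label_scores.items() if score >= threshold}


# Hypothetical, already lemmatized toy registrations (not real police data).
docs = [
    ["victim", "receive", "phishing", "email"],
    ["suspect", "send", "phishing", "email"],
]
raw = [ngram_counts(d) for d in docs]
weighted = tfidf_weight(raw)

# Hypothetical classifier scores for one registration.
scores = {"phishing": 0.95, "online_fraud": 0.60}
incidence_labels = assign_labels(scores, threshold=0.5)  # looser cut-off (assumed)
profiling_labels = assign_labels(scores, threshold=0.9)  # stricter cut-off (assumed)
```

In this toy example, terms that occur in every document (such as "phishing") keep their raw count under TF-IDF, while rarer terms (such as "victim") are weighted up. Raising the threshold from 0.5 to 0.9 drops the less certain label, which mirrors why stricter accuracy requirements can leave fewer crime types available for suspect descriptions.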

The presentation concluded with a reflection on the feasibility and challenges of using machine learning models for estimating the incidence of various types of online crime.

Presenter:
This insightful session was presented by Nikolaj Tollenaar.