Data Science Projects (Q1-2022)


In this first quarter of the 2021-2022 academic year, I am supervising five final-year projects in the Data Science Master's program at UOC. Three are in “general” domains: 1) argument mining, 2) mining of encyclopedic knowledge from Wikipedia, and 3) text anonymization. Two are in applied domains: 4) use of NLP in an online medical consultation application, and 5) detection of recurrent defects in aircraft safety reports. Below is a selection of the resources and references I give the students to get them started (sorry, the bibliographical citations are a bit sloppy).

1. Argument Mining

Argument mining resources page

1.1. Tutorials and surveys

1.2. Papers

  • Stance Classification of Context-Dependent Claims, Bar-Haim et al., 2017.
  • Towards an Argument Mining Pipeline Transforming Texts to Argument Graphs, Lenz et al., 2020.

1.3. Datasets + tools/APIs

2. Mining of Encyclopedic Knowledge from Wikipedia

The idea of this project is to extract encyclopedic knowledge from Wikipedia in order to expand a Wikipedia-based knowledge graph. Examples of encyclopedic knowledge include categorical information (e.g., “crew neckline” is a type of neckline but has no article of its own).
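To make the “crew neckline” example concrete, here is a minimal sketch of how such categorical statements could be spotted in plain text. It assumes a simple “X is a type of Y” surface pattern; the function name `extract_is_a` and the regular expression are illustrative assumptions, not the method used in the project, and a real pipeline would need far more robust patterns or a parser.

```python
import re

# Hypothetical surface pattern: "<term> is a type/kind of <category>".
# An optional leading article ("a", "an", "the") is skipped.
IS_A = re.compile(
    r"^(?:an?\s+|the\s+)?(?P<term>.+?)\s+is\s+an?\s+(?:type|kind)\s+of\s+"
    r"(?P<category>[\w\- ]+?)(?:[.,;]|$)",
    re.IGNORECASE,
)


def extract_is_a(text):
    """Return (term, category) pairs found in sentences of `text`."""
    pairs = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        match = IS_A.match(sentence.strip())
        if match:
            term = match.group("term").strip().lower()
            category = match.group("category").strip().lower()
            pairs.append((term, category))
    return pairs


if __name__ == "__main__":
    sample = "A crew neckline is a type of neckline. It is common on T-shirts."
    print(extract_is_a(sample))
```

Each extracted pair could then be checked against the existing knowledge graph to see whether the term already has an entry of its own.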

3. Text Anonymization

4. Gathering Insights from Medical Conversations

5. Identification of Recurrent Defects in Aircraft Maintenance Reports

Ontologies/Taxonomies

Datasets

  • MaintNet: the resource page and the 2020 paper by Akhbardeh et al., “MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources”.

NLP for aircraft maintenance