Nadjet's NLP Stuff

My cherry-picked highlights from the EMNLP'21 Conference

Posted on November 20, 2021 |

A couple of weeks ago I virtually attended the EMNLP 2021 conference. This was the first time I attended a conference virtually. I must admit it is not without its advantages: cheaper registration fees, no jet lag, the ability to get your brain blasted in the comfort of your own home, the ability to switch between sessions much more quickly than in a physical setting, and, in my experience, more “intimate” one-to-one interaction during the poster sessions.

On the negative side, technical network problems meant the presentations were often pre-recorded with no time for questions, and you are less likely to bump into someone you know, although I enjoyed “stalking” presenters to their poster rooms in “Gather Town”, the virtual conference venue :-)…

Here are some interesting talks I “attended” at the conference, organized into the following topics: Bias and Ethics, Language Models as Knowledge Bases, Pattern Exploiting Training (PET), Datasets, Q/A and Text Mining.

[Read More]

NLP conference

Data Science Projects (Q1-2022)

Posted on July 29, 2021 |

This first quarter of the 2021-2022 academic year, I am supervising 5 final-year projects in the Data Science Master of the UOC University. Three are in “general” domains: 1) argument mining, 2) mining of encyclopedic knowledge from Wikipedia, and 3) text anonymization. Two are in applied domains: 4) use of NLP in an online medical consultation application, and 5) detection of recurrent defects in aircraft safety reports. Below is a selection of the resources and references I give to the students to get them started (sorry, the bibliographical citations are a bit sloppy).

[Read More]

master argument mining text mining wikipedia anonymization aircraft maintenance telemedecine

Sentence similarity with Bert vs SBert

Posted on October 22, 2020 |

We can compute the similarity between two sentences by calculating the similarity between their embeddings. A popular approach is to apply mean or max pooling to the word embeddings of each sentence. Another approach, which is faster and performs better, is to use SBert models.
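As a rough illustration of the pooling idea, here is a minimal sketch in plain Python. The word vectors are made-up toy values, not real Bert embeddings, and the function names are my own for this example. It pools each sentence's word vectors into a single vector (mean or max) and then compares the two sentence vectors with cosine similarity.

```python
import math


def mean_pool(vectors):
    """Average the word vectors dimension by dimension."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def max_pool(vectors):
    """Keep the largest value of each dimension across the word vectors."""
    dim = len(vectors[0])
    return [max(v[i] for v in vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional word embeddings, one inner list per word.
# In practice these would come from a model such as Bert.
sentence_a = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
sentence_b = [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

mean_sim = cosine_similarity(mean_pool(sentence_a), mean_pool(sentence_b))
max_sim = cosine_similarity(max_pool(sentence_a), max_pool(sentence_b))

print("mean pooling similarity:", round(mean_sim, 4))
print("max pooling similarity:", round(max_sim, 4))
```

The two pooling strategies can give different scores for the same pair of sentences, which is one reason to compare them against a model trained to produce sentence embeddings directly.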

[Read More]

bert similarity

Data Science Projects (Q1-2021)

Posted on October 10, 2020 |

This first quarter of the 2020-2021 academic year, I am supervising 6 final-year projects in the Data Science Master of the UOC University, ranging from the development of a COVID-19 FAQ-based Q/A system to building knowledge graphs and performing end-to-end Natural Language Generation. Below is a list of resources and references I offer to get the students started.

[Read More]

master Q/A Knowledge graphs DBPedia NLG Bert

Aspect Based Sentiment Analysis

Posted on April 30, 2020 |

In this post I report on approaches to ABSA as a Bert Sentence Pair Classification (BERT-SPC) problem together with domain adaptation of the Bert language model (BERT-ADA). I also present my implementation of both BERT-SPC and BERT-ADA.

[Read More]

bert ABSA domain adaptation

Node similarity with graph embeddings using Node2Vec

Posted on March 22, 2020 |

I computed graph embeddings on Athletes in a graph modelling Olympic Games in Neo4j. I was then able to compute the most similar Athletes to any given Athlete using Node2Vec. I could then visually appreciate this similarity by looking at the corresponding subgraph subsuming the related nodes and their edges.

[Read More]

neo4j node2vec

Uploading pre-existing datasets in Neo4j: lessons learnt

Posted on March 19, 2020 |

Recently I developed a proof-of-concept to model and query data with Neo4j starting from a pre-existing dataset. I also ran a training session on loading and querying a Neo4j graph database from a pre-existing “toy” dataset. Here is what I learnt from those exercises.

[Read More]

cypher neo4j

My projects for Data Science Master Students

Posted on January 10, 2020 |

Here are the projects I propose to Data Science Master students for their final project this semester. They range from text mining, concept mining and question answering to natural language generation. Most of them will be based on DBPedia/Wikipedia datasets.

[Read More]

dbpedia master

Seq2seq Natural Language Generation

Posted on December 31, 2019 |

I implemented a seq2seq Natural Language Generator using the fastai library and the 2017 E2E NLG Challenge dataset.

[Read More]

NLG deep_learning fastai

Mining and Geovisualization of Domain Specific DBPedia Concepts

Posted on November 7, 2019 |

I extracted domain-specific concepts from DBPedia and showed them on an interactive map of the world. The geolocation was done using Geonames. This was a fun project to do.

[Read More]

dbpedia knowledge_graphs visualization text_mining