My cherry-picked highlights from the EMNLP'21 Conference

A couple of weeks ago I attended the EMNLP 2021 conference virtually. This was the first time I had attended a conference virtually. I must admit it is not without its advantages: cheaper registration fees, no jet lag, the ability to get your brain blasted in the comfort of your own home, the ability to switch between sessions much more quickly than in a physical setting, and, in my experience, a more “intimate” one-to-one interaction during the poster sessions.

On the negative side, technical network problems meant the presentations were often recorded with no time for questions, and you are less likely to bump into someone you know, although I enjoyed “stalking” presenters to their poster rooms in “Gather Town”, the virtual conference venue :-)

Here are some interesting talks I “attended” at the conference, organized into the following topics: Bias and Ethics, Language Models as Knowledge Bases, Pattern-Exploiting Training (PET), Datasets, Q/A and Text Mining.

[Read More]

Data Science Projects (Q1-2022)

In this first quarter of the 2021-2022 academic year, I am supervising 5 final-year projects in the Data Science Master of the UOC University. Three are in “general” domains: 1) argument mining, 2) mining of encyclopedic knowledge from Wikipedia, and 3) text anonymization. Two are in applied domains: 4) use of NLP in an online medical consultation application, and 5) detection of recurrent defects in aircraft safety reports. Below is a selection of the resources and references I give to the students to get them started (sorry, the bibliographical citations are a bit sloppy).

[Read More]

Sentence similarity with Bert vs SBert

We can compute the similarity between two sentences by calculating the similarity between their embeddings. A popular approach is to apply mean or max pooling to the word embeddings of each sentence. Another approach, which is faster and performs better, is to use SBert models.
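
As a rough illustration of the pooling approach, here is a minimal sketch in plain Python. The token vectors are made-up 3-dimensional toy values (an assumption for the example); in practice they would come from a BERT model, and the helper names `mean_pool`, `max_pool` and `cosine_similarity` are hypothetical, not taken from any library.

```python
import math

def mean_pool(token_vectors):
    """Average the token vectors dimension by dimension."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]

def max_pool(token_vectors):
    """Keep the largest value seen in each dimension."""
    return [max(dim) for dim in zip(*token_vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy token embeddings (assumed values): two tokens per sentence.
sentence_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sentence_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# Pool each sentence into a single vector, then compare the two vectors.
sim_mean = cosine_similarity(mean_pool(sentence_a), mean_pool(sentence_b))
sim_max = cosine_similarity(max_pool(sentence_a), max_pool(sentence_b))
print(sim_mean, sim_max)
```

With an SBert model the pooling step is handled by the model itself, so only the final cosine comparison between the two sentence vectors remains.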

[Read More]

Data Science Projects (Q1-2021)

In this first quarter of the 2020-2021 academic year, I am supervising 6 final-year projects in the Data Science Master of the UOC University, ranging from the development of a COVID-19 FAQ-based Q/A system to building knowledge graphs and performing end-to-end Natural Language Generation. Below is a list of resources and references I offer to get the students started.

[Read More]

Uploading pre-existing datasets in Neo4j: lessons learnt

Recently I developed a proof of concept to model and query data with Neo4j, starting from a pre-existing dataset. I also ran a training session on loading and querying a Neo4j graph database from a pre-existing “toy” dataset. Here is what I learnt from those exercises.

[Read More]