Artificial Intelligence and Data Analytics Research Group
Ghent University - IDLab
About AIDA
The Artificial Intelligence and Data Analytics (AIDA) group at Ghent University is led by Prof. Tijl De Bie. The group currently consists of 9 PhD students, 1 postdoctoral researcher, 1 assistant professor and 1 full-time professor. The group is also part of the larger Internet Technology and Data Science Lab (IDLab), a joint research initiative of the University of Antwerp and Ghent University.
The research at AIDA covers a wide range of topics in the areas of Artificial Intelligence, Machine Learning and Data Science. More specifically, we conduct research on graph and text mining, social media analysis, bioinformatics, data visualization and music information retrieval.
The group is currently funded by Prof. De Bie’s ERC Consolidator Grant FORSIED (“Formalizing Subjective Interestingness in Exploratory Data Mining”), the Flemish Government under the "Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen" programme, and several FWO projects.
If you are interested in joining our group, please check out our Contact section.
We currently have open PhD and postdoc positions. If you want to join our team, contact us at:
tijl.debie@ugent.be
Representation Learning
Network representation learning or graph embedding methods aim to learn low-dimensional representations of network nodes as vectors, typically in a Euclidean space. These representations are constructed such that 'similar' nodes in the graph are 'embedded' nearby in the Euclidean space. These methods generally follow a similar strategy: they define a notion of similarity between nodes (typically deeming nodes more similar if they are nearby in the network in some metric) and a distance measure in the embedding space, and then minimize a loss function that penalizes large distances between similar nodes and small distances between dissimilar nodes. The representations obtained with these methods can then be used for a variety of downstream prediction tasks such as link prediction, multi-label classification or community detection.
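A minimal sketch of the strategy described above, assuming the simplest possible choices: two nodes count as similar exactly when they share an edge, distance is Euclidean, and the loss is contrastive (squared distance for linked pairs, squared hinge max(0, margin - d)^2 for unlinked pairs). The function name embed_graph and its parameters are hypothetical; this is a toy illustration, not one of the group's published methods.

```python
import math
import random


def embed_graph(edges, n_nodes, dim=2, margin=2.0, lr=0.05, epochs=300, seed=0):
    """Toy contrastive graph embedding (hypothetical helper for illustration).

    Linked pairs are pulled together with loss d^2; unlinked pairs closer
    than `margin` are pushed apart with loss (margin - d)^2.
    """
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_nodes)]
    linked = {frozenset(e) for e in edges}
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - b for a, b in zip(emb[i], emb[j])]
            d = math.sqrt(sum(x * x for x in diff)) + 1e-9
            if frozenset((i, j)) in linked:
                coef = 2.0  # gradient of d^2 w.r.t. emb[i] is 2 * diff
            elif d < margin:
                coef = -2.0 * (margin - d) / d  # gradient of (margin - d)^2
            else:
                continue  # dissimilar pair already far enough apart: zero loss
            for k in range(dim):
                step = lr * coef * diff[k]
                emb[i][k] -= step  # emb[i] moves against its gradient
                emb[j][k] += step  # emb[j]'s gradient is the negative of emb[i]'s
    return emb


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
emb = embed_graph(edges, n_nodes=6)
linked = {frozenset(e) for e in edges}
all_pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]
linked_d = [dist(emb[i], emb[j]) for i, j in all_pairs if frozenset((i, j)) in linked]
unlinked_d = [dist(emb[i], emb[j]) for i, j in all_pairs if frozenset((i, j)) not in linked]
print(sum(linked_d) / len(linked_d), sum(unlinked_d) / len(unlinked_d))
```

On this small graph, linked pairs should end up much closer together on average than unlinked pairs. Practical methods replace the all-pairs loop with negative sampling, since it scales quadratically in the number of nodes.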
Data Visualization
Data visualization is the graphic representation of information. The goal is to produce images that communicate the relations among the represented data entities to the viewers.
Recommender Systems
Recommender systems aim to suggest relevant items to users. This is generally done by seeking the items a user would most likely give a good "rating" to.
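One common way to estimate the rating a user would give is latent-factor matrix factorization. The sketch below is a minimal, assumed example of that technique (not necessarily the approach used in our own work): it learns user and item vectors by stochastic gradient descent on the observed ratings and recommends the unrated item with the highest predicted score. The names factorize, predict and recommend, the toy ratings, and the hyperparameters are all hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on the squared error plus L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))


def recommend(P, Q, u, rated_items):
    """Return the unrated item with the highest predicted rating for user u."""
    candidates = [i for i in range(len(Q)) if i not in rated_items]
    return max(candidates, key=lambda i: predict(P, Q, u, i))


# (user, item, rating) triples on a 1-5 scale; users 0 and 1 like items 0 and 1.
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (0, 3, 1),
           (1, 0, 5), (1, 1, 5), (1, 2, 1), (1, 3, 1),
           (2, 0, 5), (2, 2, 1)]
P, Q = factorize(ratings, n_users=3, n_items=4)
print(recommend(P, Q, 2, rated_items={0, 2}))
```

User 2 rated items 0 and 2 exactly like users 0 and 1, so the learned factors should rank item 1 above item 3. A real system would typically add user and item bias terms and tune k and reg on held-out ratings.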
Exploratory Data Mining
Methods developed within exploratory data mining aim to find interesting and interpretable patterns in data while accounting for the data analyst's prior knowledge and beliefs about the data. The research conducted by our group in this area falls mostly under the umbrella of the ERC project FORSIED (Formalizing Subjective Interestingness in Exploratory Data Mining).
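As a rough illustration of how FORSIED formalizes interestingness, a pattern is scored by the information it conveys relative to a background distribution encoding the analyst's prior beliefs, divided by the effort needed to take it in (its description length). The sketch below assumes the simplest setting: a "tile" in a binary matrix (a set of rows and columns whose cells are all 1), a background model in which cells are independent Bernoulli variables with user-supplied probabilities, and a hypothetical description length a + b * (rows + columns). The function names and the constants a and b are assumptions for illustration.

```python
import math


def information_content(background, rows, cols):
    """Bits of surprise in learning that every cell of the tile is 1,
    assuming cells are independent with P(cell = 1) = background[r][c]."""
    return -sum(math.log2(background[r][c]) for r in rows for c in cols)


def subjective_interestingness(background, rows, cols, a=1.0, b=1.0):
    """Information content divided by a hypothetical description length."""
    description_length = a + b * (len(rows) + len(cols))
    return information_content(background, rows, cols) / description_length


# A 4 x 4 background: the analyst expects rows 0-1 x columns 0-1 to be dense
# (probability 0.9) and everything else to be a coin flip (probability 0.5).
background = [[0.9 if r < 2 and c < 2 else 0.5 for c in range(4)] for r in range(4)]
expected_tile = subjective_interestingness(background, rows=[0, 1], cols=[0, 1])
surprising_tile = subjective_interestingness(background, rows=[2, 3], cols=[2, 3])
print(f"{expected_tile:.3f} {surprising_tile:.3f}")  # prints 0.122 0.800
```

Because the dense block was already expected, revealing it carries little information, so it is less subjectively interesting than an all-ones block the analyst considered unlikely. In FORSIED the background model is then updated to incorporate each pattern shown, which is what makes the score depend on the individual analyst.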
Topological Data Analysis
In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.
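To make this concrete, the simplest TDA computation is 0-dimensional persistent homology, which tracks connected components. The sketch below is a minimal, assumed illustration (the function name h0_persistence is hypothetical): it connects two points once the scale parameter reaches the distance between them (the Vietoris-Rips filtration, parameterized here by edge length) and uses a union-find structure to record the scale at which each component merges into another. Long-lived components indicate well-separated clusters, while short-lived ones typically reflect noise.

```python
import itertools
import math


def h0_persistence(points):
    """(birth, death) pairs of 0-dimensional features in the Vietoris-Rips
    filtration, with an edge appearing at scale equal to its length."""
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, length))  # every point is born at scale 0; one dies here
    pairs.append((0.0, math.inf))  # the final component never dies
    return pairs


# Two tight pairs of points, 10 units apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(h0_persistence(points))  # [(0.0, 1.0), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]
```

The single finite death at 10.0, far above the others, signals two clusters. The all-pairs edge list makes this quadratic in the number of points; dedicated TDA libraries also compute higher-dimensional features (loops, voids) efficiently.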
Music Information Retrieval
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications.
We are soliciting applications for multiple fully-funded PhD student vacancies in the AIDA research group. Applications by top postdoctoral candidates will be considered as well.
Topics of interest:
Network and knowledge graph embedding.
Visual analytics, information visualization.
Representation learning.
Nonlinear dimensionality reduction (variants of UMAP, t-SNE,…).
Data mining algorithms on graphs / networks.
Topological data analysis.
Recommender systems.
Natural language processing (word embedding, conversational agents).
Deep learning approaches for these tasks.
Privacy, fairness, and explainability in machine learning and data science.
Applications to areas including biology (single-cell data), human resources (job-applicant matching), media and entertainment (catalog management).
Profile:
For PhD applicants:
Required:
A relevant Master’s degree (data science, artificial intelligence, computer science, physics, electrical engineering, mathematics).
Top-of-class performance (at least top 5% at top-100 institutes, at least top 1% at other institutes).
Three positive reference letters (solicited by us, no need to include these in your application).
Proven programming experience in Python.
English language level: C1 (CEFR), 95 (TOEFL), 7 (IELTS), or assessed as equivalent during the interview.
A solid background in at least three of the following: linear algebra; machine learning; deep learning; information theory; advanced algorithms; data mining; topological data analysis; information visualization / visual analytics; human-computer interaction; recommender systems; natural language processing; privacy, fairness, and explainability in data science and machine learning.
Desired:
Strong programming skills in Python and other languages as proven by open source software projects.
Experience with machine learning frameworks (PyTorch, Keras, etc.).
Experience with Spark.
Prior publications in top conferences or journals.
For Postdoctoral research positions:
Required:
Same as for PhD positions, additionally including:
A PhD in a relevant subject.
A strong publication track record (in top conferences and journals such as ICLR, KDD, NeurIPS, VIS, ICML, Data Mining and Knowledge Discovery, IEEE PAMI, JMLR, Machine Learning journal, etc.) on one of the relevant topics.
An eagerness to co-advise PhD and Master's student projects.
Context:
These positions are funded in the context of three large research projects:
A large (EUR 1.6M) European Research Council (ERC) Consolidator grant, FORSIED.
A large (EUR 2.1M) FWO Odysseus Group I grant.
A substantial investment (EUR 12M annually) by the Flemish government in AI research, within which we play a leading role.
The research team currently consists of around 10 researchers and is led by Prof. Tijl De Bie. Its reputation is evidenced by substantial prestigious funding, high-impact publications, and the organization of major international conferences in the area, including in the roles of general and program chair. The team is embedded within the IDLab of Ghent University, as well as the UGent Artificial Intelligence institute.
Ghent University:
Ghent University is the largest university in Belgium and a leading research-intensive university worldwide, consistently ranking within the top 100 in international rankings that value research.
The city of Ghent in Belgium, at the heart of Europe:
Belgium is a small country at the heart of Europe, benefiting from one of the best social security and healthcare systems in the world.
Ghent is a picturesque, historic yet vibrant city with an international mindset and an active startup scene. It is home to a large expat community, and you will find it easy to get around speaking only English.
Brussels (30 mins), Amsterdam (2 hours), Paris (2.5 hours) and London (3 hours) are all within a short train ride.
Application procedure:
Applications are welcome until all positions are filled. The first cut-off date is Sunday 23 June, with Skype interviews scheduled for the end of June and early July. To be considered, applications must include:
A concise CV (any format).
A motivation letter (~1 page).
A list of 3 referees with their contact details, and whether they may already be contacted.
For PhD applicants: a transcript of your Master's degree.
Applications should be sent by email to tijl.debie@ugent.be. IMPORTANT NOTE: the subject line of your email should be structured as follows (whichever applies):
“[PhD application 2019] first name, last name”
“[Postdoc application 2019] first name, last name”
Emails with different subject lines will probably be overlooked.
Vacancy: postdoc position
We currently have an open position for an outstanding postdoctoral researcher.
We have recently organized the 2020 edition of ECML-PKDD. ECML-PKDD is the premier European machine learning and data mining conference that builds upon over 18 years of successful events and conferences held across Europe. In 2020, ECML-PKDD was due to take place in Ghent, Belgium. However, owing to the COVID-19 pandemic, the conference was held virtually. The video presentations and papers are freely available on the official website.
We are co-organizing an international workshop on Applications of Topological Data Analysis, within the context of ECML-PKDD, on Monday 16 September 2019.
Workshop chairs:
Robin Vandaele (Ghent University)
Tijl De Bie (Ghent University)
John Harer (Duke University)
ECML-PKDD 2019
We are co-organizing an international workshop on Automating Data Science, within the context of ECML-PKDD, on Friday 20 September 2019.
Workshop chairs:
Tijl De Bie (UGent, Belgium)
Luc De Raedt (KU Leuven, Belgium)
Jose Hernandez-Orallo (Universitat Politecnica de Valencia, Spain)
ECML-PKDD 2019
We are co-organizing an international workshop on Graph Embedding and Mining, within the context of ECML-PKDD, on Monday 16 September 2019.
Workshop chairs:
Bo Kang (Ghent University)
Rémy Cazabet (Université de Lyon)
Christine Largeron (Université Jean Monnet)
Polo Chau (Georgia Institute of Technology)
Jefrey Lijffijt (Ghent University)
Tijl De Bie (Ghent University)
Dagstuhl 2018
We co-organized a Dagstuhl workshop on Automating Data Science, September 30-October 5, 2018.
Workshop chairs:
Tijl De Bie (Ghent University, BE)
Luc De Raedt (KU Leuven, BE)
Holger H. Hoos (Leiden University, NL)
Padhraic Smyth (University of California – Irvine, US)
KDD 2018
Opinions, Conflict, and Abuse in a Networked Society (OCEANS) is an ACM SIGKDD 2018 workshop that we are co-organizing. It is going to take place in London, UK, on August 20, 2018.
Description: Disruptive technological innovations affect what lies at the heart of the fabric of our society: how people interact with each other, what they think to be true or false, and what they value as right or wrong. Indeed, terms such as post-truth society, filter bubbles, and echo chambers are very recent, and online social interactions have proven more prone to abusive and anti-social behaviors than real-world interactions. While we are starting to see the challenges posed by the pervasive adoption of these technologies, society has no answers yet.
We contend that answers to these new challenges will require a transdisciplinary approach, involving social scientists, physicists, mathematicians, and computer scientists, as well as close collaboration between academic researchers, industry practitioners, and relevant government agencies. This workshop is targeted at the KDD community and aims to chart out the area from a KDD perspective, while bringing in insights from other areas and sharing state-of-the-art technologies and best practices.
Jefrey Lijffijt – Yesterday I organised the Workshop on Interactive Data Exploration and Analysis (IDEA) in Halifax, Canada, as a side-event of the ACM SIGKDD Conference, the world's premier conference in data mining and knowledge discovery. The workshop was a great success, with three keynote speakers and twelve presentations and posters about new research that we selected after peer review. The topic of the workshop aligns perfectly with the topic of my FWO Pegasus project, “Personalised, interactive, and visual exploratory mining of patterns in complex data”. This niche of data science sits at the interface of data mining, machine learning, human-computer interaction, and data visualisation, and is currently rapidly attracting more attention from the scientific community.
The major premise of this line of work is to transform the job of data scientists and data-intensive science, not by automating away analysis completely, but by building more intuitive, efficient, and effective tools to explore and analyse data. Ultimately, the ability to analyse data should also become accessible to laypersons, e.g., journalists or experts from other scientific domains who do not have years of training in programming and statistics. This aim is also referred to as the democratisation of data and analytics. Achieving it requires scientists from different fields to collaborate, because the open problems range from cognition and interface design to statistics and computational complexity. My own background is in the latter two.
I am really happy to have received a Pegasus grant. I intentionally applied with a high-risk proposal, to work on a topic that is completely new and still at the margin of my field. Now I have three years with almost no other obligations to grow my expertise into the related domains and to build prototypes of tools that show the potential of the ideas. I chose Ghent University as the host for the project because Prof. Tijl De Bie from the IDLab is a world-wide expert on the computational aspects of data exploration and analysis. His group is also well funded (ERC Consolidator and FWO Odysseus grants), so there are PhD students and other postdocs who can help me achieve these goals. I am deeply grateful for their support.
Data Mining: Beyond the Horizon was a workshop to discuss where the participants think the field of data mining should be headed, and to work towards that goal. Beyond the Horizon took place in Bristol, UK, from Wed 19 to Fri 21 November 2014.
The goal of the workshop was two-fold:
Open-ended: to exchange thoughts about the challenges and opportunities for our generation of data mining researchers, and to establish future-proof visions.
Goal-oriented: to facilitate short-, medium-, and long-term collaborations among the participants, including joint project proposals, papers, co-organized workshops, tutorials, etc.
Attendance at Beyond the Horizon was by invitation only. To keep it small and effective as a workshop, only a limited number of researchers was invited to participate.
Jefrey Lijffijt and Tijl De Bie will give a tutorial on mining ‘Subjectively Interesting’ patterns in data at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Dublin, September 2018.
Title:
Mining Subjectively Interesting Patterns in Data
Abstract:
The problem of formalizing interestingness of data mining results remains an important challenge in data mining research and practice. While it is widely recognized as a challenge in frequent pattern mining, in this tutorial we will explain that it also manifests itself in other data mining tasks such as dimensionality reduction, graph mining, clustering, and more. This tutorial aims to introduce the audience to a relatively new framework for addressing these challenges in a rigorous and generic manner. This framework is the result of the ERC project FORSIED (Formalizing Subjective Interestingness in Exploratory Data Mining), which has by now resulted in a body of work of sufficient maturity to make a well-rounded tutorial possible and useful to colleague researchers as well as practitioners.
Outline:
Part 1: Introduction and motivation (15mins)
Part 2: The FORSIED framework (40mins)
Part 3: Binary matrices, graphs, and relational data (45mins)
COFFEE BREAK (20mins)
Part 4: Numeric and mixed data (55mins)
Part 5: Advanced topics, outlook & conclusions (30mins)
Q&A (15mins)
ECML-PKDD 2015
Tutorial on ‘Making Sense of (Multi-) Relational Data’ at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Porto, Portugal 2015.
PART I: Mining relational data — an overview (40 mins)
Data types: Codd’s relational data model, triple stores (linked data), networks, n-ary relations
Pattern syntax
Algorithmic approach (e.g. exhaustive enumeration or not)
‘Supervised’ or not, or in between
Interestingness measures
PART II: Exploration through targeted modelling (20 mins)
Safarii
RDB-Krimp
(Probabilistic) ILP
PART III: Exploration by descriptive modelling – semi-relational local algorithms (20 mins)
Frequent itemset mining on the join
SMuRFIG
— BREAK
PART IV: Exploration by descriptive modelling – fully-relational local algorithms (40 mins)
N-set mining
RMiner & variants
Constraint programming for closed relational sets
Uncovering the plot
PART V: Exploration by descriptive modelling – fully-relational global algorithms (40 mins)
Joint matrix-tensor factorisations
PART VI: Perspectives (20 mins)
General conclusions and recommendations
Open problems and opportunities
KDD'18
The presentation given at IDEA’18 in August 2018 can be found here.
Facebook Research
The presentation given at Facebook research in August 2018 can be found here.
Trust In Data Science
The increasingly central role of data in today’s economy, as well as in data-driven research (e.g. the digital humanities, medicine, biology), has brought to the fore important questions around data ownership, data protection, privacy, and fairness of data-driven algorithms. This course covers the technical and legal aspects of how to ensure data science approaches can be trusted to treat individuals fairly and with consideration for privacy. Trust is achieved when ethical practices in data science are followed.
This is a specialist course organized in the context of the UGent doctoral schools of Engineering and Natural Sciences. It will be of interest to researchers in data science algorithm design as well as to researchers working with personal data; the target group includes computer scientists, electrical and biomedical engineers, bioinformaticians, neuroinformaticians, medical informaticians, statisticians, molecular biologists, and other researchers and developers. The summer school will be open to graduate students, PhD students, postdoctoral researchers and early-career professionals in any field related to data science.
Data science is rapidly growing in importance due to the growth of available data and computing power, and to recent algorithmic developments. This poses obvious risks to the privacy of the data subjects, and to data protection more generally. However, it also entails two complementary opportunities. First, it may result in more effective decisions based on, e.g., advanced machine learning techniques. Second, it makes it possible to formalize ethical constraints regarding, e.g., fairness and (paradoxically) also privacy, which can then be enforced on those data-driven decisions. Both of these opportunities are intimately tied to the increased accessibility of data, and are undeniably beneficial. For example, judges today still base their verdicts on a combination of the facts, a subset of the law and case law, and (often unconscious) personal biases. Doctors still make decisions based on their (limited) expert knowledge and the symptoms, combined with personal intuition. Data available well beyond personal anecdotal experience can not only reveal personal biases or intuition, but can also ensure that decisions comply with ethical and legal constraints, while improving those decisions by making them more evidence-based.
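As one example of what formalizing such a constraint can look like, demographic parity (a common fairness criterion, chosen here for illustration rather than taken from the course material) requires the rate of positive decisions to be roughly equal across groups. The hypothetical helper below measures the largest gap in positive-decision rates; a constraint would then require this gap to stay below a chosen tolerance, itself an assumed value.

```python
def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups.

    decisions: iterable of 0/1 outcomes; groups: group label for each decision.
    """
    counts = {}
    for decision, group in zip(decisions, groups):
        positives, total = counts.get(group, (0, 0))
        counts[group] = (positives + decision, total + 1)
    rates = {g: pos / tot for g, (pos, tot) in counts.items()}
    return max(rates.values()) - min(rates.values()), rates


decisions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(decisions, groups)
print(gap, rates)  # 0.5 {'a': 0.75, 'b': 0.25}
tolerance = 0.1  # hypothetical threshold
print("constraint satisfied:", gap <= tolerance)  # constraint satisfied: False
```

Other criteria, such as equalized odds, condition on the true outcome instead, and they can conflict with demographic parity.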
Formalizing ethical and legal constraints is, however, non-trivial, and research has only recently started to invest substantially in these questions. Yet providing a constructive answer to these questions is a prerequisite for data science approaches to deserve the trust of their users. This specialist course should be of interest to anyone performing or using data science research, broadly defined.