Webinar
Datasets through the L👀king-Glass
Datasets through the L👀king-Glass is a webinar series focusing on the data aspects of learning-based methods. Our aim is to build a community of scientists interested in understanding how the data we use affects algorithms and society as a whole, instead of only optimizing for a performance metric. We draw inspiration from a variety of topics, such as data curation for building datasets, metadata, shortcuts, fairness, ethics, and philosophy in AI.
All previous talks that the speakers have agreed to share can be found in our YouTube playlist.
Next webinar: Multimodal datasets
Date: 24th February 2025 at 3pm CET
Where: Zoom - https://itucph.zoom.us/meeting/register/hkoEzRitQ6eotd3SkkaeYw
Add to: Google Calendar / Outlook Calendar
Hava Chaptoukaev & Dr. Maria A. Zuluaga (EURECOM, France)
- Title: Acquiring, curating and releasing a multi-modal dataset for stress detection: ambitions, achievements, mistakes and lessons learned
- Abstract: In 2023, we released StressID, a multi-modal dataset and benchmark, to encourage the development of novel methods for stress detection and their validation within a reproducible setup. This talk introduces StressID and shares the challenges we faced along the way, from its acquisition to its maintenance and release.
- Bio: Maria A. Zuluaga is an associate professor at EURECOM with an affiliate position within the School of Biomedical Engineering & Imaging Sciences at King’s College London. Hava Chaptoukaev is a third-year PhD student in Maria’s research group. They focus on developing novel machine learning methods that learn from multimodal data and can be safely used to advance healthcare research and improve clinical practice. From an application standpoint, they aim to answer questions coming from neurovascular and cardiovascular imaging, as well as cancer research.
- Website: https://zuluaga.eurecom.io/
- Website: https://scholar.google.com/citations?user=b8P1k_oAAAAJ&hl=fr
Alice Jin (Massachusetts Institute of Technology, USA)
- Title: Fair Multimodal Checklists for Interpretable Clinical Time Series Prediction
- Abstract: Checklists are interpretable and easy-to-deploy models often used in real-world clinical decision-making. Prior work has demonstrated that checklists can be learned from binary input features in a data-driven manner by formulating the training objective as an integer programming problem. In this work, we learn diagnostic checklists for the task of phenotype classification with time series vitals data of ICU patients from the MIMIC-IV dataset. For 13 clinical phenotypes, we fully explore the empirical behavior of the checklist model in regard to multimodality, time series dynamics, and fairness. Our results show that the addition of the imaging data modality and the addition of shapelets that capture time series dynamics can significantly improve predictive performance. Checklist models optimized with explicit fairness constraints achieve the target fairness performance, at the expense of lower predictive performance.
- Bio: Alice Jin is a fourth-year Ph.D. student in the HealthyML Group at MIT, led by Professor Marzyeh Ghassemi. Her general research interests are in deep representation learning and generative modeling for healthcare applications. Specifically, she’s interested in exploring how expert medical knowledge can be incorporated into ML models, as well as how to ensure medical validity and clinical relevance during evaluation.
- Website: https://healthyml.org/author/qixuan-alice-jin/
Dr. Malihe Alikhani & Dr. Resmi Ramachandranpillai (Northeastern University, USA)
- Title: Towards Equity: Overcoming Fairness Challenges in Multimodal Learning
- Abstract: The world around us is both multimodal and intersectional. Human experiences are shaped by diverse streams of information—visual, textual, auditory, and structured data—all interacting in complex ways. Similarly, identities are layered and multifaceted, with factors like race, gender, and socioeconomic status intersecting to influence lived realities. In the context of healthcare, biases in automated clinical decision-making using Electronic Healthcare Records (EHR) exacerbate disparities in patient care and outcomes, particularly for intersectional subgroups. The multimodal nature of EHR data—integrating text, time series, tabular information, events, and images—adds complexity to bias mitigation, as the impact on minority groups varies across modalities. In this talk, we first discuss ways to learn unified representations from heterogeneous data using pre-trained models. Then, we will discuss the limitations of conventional bias mitigation strategies, highlighting their inability to address the nuanced biases affecting intersectional subgroups in multimodal settings. We will then explore the impact of bagging strategies on fairness improvement using extensive multimodal datasets, MIMIC-Eye and MIMIC-IV ED. We will conclude by examining the conditions and factors that drive the positive correlation between bagging and fairness improvements, showcasing scenarios where these strategies succeed. By advancing research in these directions, we aim to pave the way for more inclusive and equitable multimodal AI systems.
- Bio: Malihe Alikhani is an Assistant Professor at Northeastern University’s Khoury College of Computer Sciences and a member of the Northeastern Ethics Institute. Her research focuses on designing inclusive and equitable language technologies, with an emphasis on addressing biases in machine learning models for critical applications in education, health, and social justice. By integrating insights from cognitive science, neuroscience, philosophy, policy, and social sciences with advanced computational techniques, she develops systems that communicate effectively across diverse and underserved populations. Dr. Alikhani’s interdisciplinary collaborations with educators, healthcare experts, and community leaders aim to create technology-enabled experiences that promote inclusivity and equity. Her work is supported by substantial funding from organizations such as NSF, NIH, DARPA, Amazon, and Google, enabling impactful projects like the Alexa Prize TaskBot Challenge, multimodal affect detection, and initiatives to improve accessibility in STEM education for deaf students.
- Bio: Resmi Ramachandranpillai is a researcher specializing in Responsible Artificial Intelligence with a focus on social benefits, including fairness, explainability, utility, and robustness in AI and generative AI systems. Her expertise spans Large Language Models (LLMs), multimodal healthcare, mental health, financial data, timeseries generation and forecasting, and computer vision. Currently, she is a Postdoctoral Researcher at Northeastern University’s Institute of Experiential AI, working on interdisciplinary projects involving algorithmic fairness, privacy, and robust evaluation frameworks for high-stakes generative AI applications. Resmi has also contributed to the EU-funded Trustworthy AI project during her postdoctoral fellowship at Linköping University, Sweden, where she focused on scientific foundations for AI trustworthiness. Her research is driven by a commitment to advancing equitable and transparent AI systems across diverse domains.
- Website: https://www.malihealikhani.com/
- Website: https://ai.northeastern.edu/our-people/resmi-ramachandranpillai
Previous talks:
All previous abstracts can be found here.
- S01E01 - Dr. Roxana Daneshjou (Stanford University School of Medicine, Stanford, CA, USA). 27th February 2023. Challenges with equipoise and fairness in AI/ML datasets in dermatology
- S01E02 - Dr. David Wen (Oxford University Clinical Academic Graduate School, University of Oxford, Oxford, UK). 27th February 2023. Characteristics of open access skin cancer image datasets: implications for equitable digital health
- S01E03 - Prof. Colin Fleming (Ninewells Hospital, Dundee, UK). 27th February 2023. Characteristics of skin lesion datasets
- S02E01 - Prof. Amber Simpson (Queen’s University, Canada). 5th June 2023. The medical segmentation decathlon
- S02E02 - Dr. Esther E. Bron (Erasmus MC - University Medical Center Rotterdam, the Netherlands). 5th June 2023. Image analysis and machine learning competitions in dementia
- S02E03 - Dr. Ujjwal Baid (University of Pennsylvania, USA). 5th June 2023. Brain tumor segmentation challenge 2023
- S03E01 - Dr. Thijs Kooi (Lunit, South Korea). 18th September 2023. Optimizing annotation cost for AI based medical image analysis
- S03E02 - Dr. Andre Pacheco (Federal University of Espírito Santo, Brazil). 18th September 2023. PAD-UFES-20: the challenges and opportunities in creating a skin lesion dataset
- S04E01 - Dr. Jessica Schrouff (Google DeepMind, UK). 4th December 2023. Detecting shortcut learning for fair medical AI
- S04E02 - Rhys Compton and Lily Zhang (New York University, USA). 4th December 2023. When more is less: Incorporating additional datasets can hurt performance by introducing spurious correlations
- S04E03 - Dr. Enzo Ferrante (CONICET, Argentina). 4th December 2023. Building and auditing a large-scale x-ray segmentation dataset with automatic annotations: Navigating fairness without ground-truth
- S05E01 - Hubert Dariusz Zając and Natalia-Rozalia Avlona (University of Copenhagen, Denmark). 25th March 2024. Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI
- S05E02 - Dr. Annika Reinke (DKFZ, Germany). 25th March 2024. Why your Dataset Matters: Choosing the Right Metrics for Biomedical Image Analysis
- S05E03 - Alceu Bissoto and Dr. Sandra Avila (UNICAMP, Brazil). 25th March 2024. The Performance of Transferability Metrics does not Translate to Medical Tasks
Organizers
Amelia Jiménez-Sánchez, Théo Sourget & Veronika Cheplygina at the IT University of Copenhagen (Denmark), and Steff Groefsema at the University of Groningen (the Netherlands). This project has received funding from the Independent Research Fund Denmark - Inge Lehmann grant number 1134-00017B.
Newsletter
If you want to receive information about upcoming seminars, please sign up to our mailing list. We use the GDPR-compliant Brevo (formerly Sendinblue) as our mail provider. If you have any concerns about our data handling, please read our privacy notice.
Please be aware that our emails are sometimes tagged as junk, so the confirmation email might end up in your spam folder; double-check whether it is there. The sender will be PURRlab @ IT University of Copenhagen (amji @ itu.dk). Please add this sender to your contacts. If you have any problems subscribing to our mailing list, please contact Amelia.