IDA Opening the Black Box Seminar (15 May 2019)

The IDA Opening the Black Box seminar was given by Professor John Holmes and was titled ‘Explainable AI for the (Not-Always Expert) Clinical Researcher’. It was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 3:00PM. Slides can be found here.

John H. Holmes, PhD, is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is the Associate Director of the Institute for Biomedical Informatics, Director of the Master’s Program in Biomedical Informatics, and Chair of the Doctoral Program in Epidemiology, all at Penn. Dr. Holmes has been recognized nationally and internationally for his work on developing and applying new approaches to mining epidemiologic surveillance data, as well as his efforts at furthering educational initiatives in clinical research. Dr. Holmes’ research interests are focused on the intersection of medical informatics and clinical research, specifically evolutionary computation and machine learning approaches to knowledge discovery in clinical databases, deep electronic phenotyping, interoperable information systems infrastructures for epidemiologic surveillance, and their application to a broad array of clinical domains, including cardiology and pulmonary medicine. He has collaborated as the informatics lead on an Agency for Healthcare Research and Quality-funded project at Harvard Medical School to establish a scalable distributed research network, and he has served as the co-lead of the Governance Core for the SPAN project, a scalable distributed research network; he participates in the FDA Sentinel Initiative. Dr. Holmes has served as the evaluator for the PCORNet Obesity Initiative studies, where he was responsible for developing and implementing the evaluation plan and metrics for the initiative. Dr. Holmes is or has been a principal or co-investigator on projects funded by the National Cancer Institute, the National Library of Medicine, and the Agency for Healthcare Research and Quality, and he was the Penn Principal Investigator of the NIH-funded Penn Center of Excellence in Prostate Cancer Disparities.
Dr. Holmes is engaged with the Botswana-UPenn Partnership, assisting in building informatics education and clinical research capacity in Botswana. Dr. Holmes is an elected Fellow of the American College of Medical Informatics (ACMI), the American College of Epidemiology (ACE), and the International Academy of Health Sciences Informatics (IAHSI).

Abstract: Armed with a well-founded research question, the clinical researcher’s next step is usually to seek out the data that could help answer it, although the researcher can also use data to discover a new research question. In both cases, the data will already be available, and so either approach to inquiry can be appropriate and justifiable. However, the next steps (data preparation, analytics, and inference) are often thorny issues that even the most seasoned researcher must address, and sometimes not so easily. Traditional approaches to data preparation, which include such methods as frequency distribution and contingency table analyses to characterize the data, are themselves open to considerable investigator bias. In addition, there is considerable tedium in applying these methods: for example, how many contingency tables does it take to identify variable interactions? It is arguable that feature selection and construction are two tasks not to be left only to human interpretation. Yet we don’t see much in the way of novel approaches to “experiencing” data such that new, data-driven insights arise during the data preparation process. The same can be said for analysis, where even state-of-the-art statistical methods, informed or driven by pre-formed hypotheses and the results of feature selection processes, sometimes hamper truly novel knowledge discovery. As a result, inferences made from these analyses likewise suffer. However, new approaches to making AI explainable to users, in this case clinical researchers who do not have the time or inclination to develop a deep understanding of how this or that AI algorithm works, are critically important, and their dearth represents a gap that those of us in clinical research informatics need to fill. Yet, the uninitiated shy away from AI for the very lack of explainability.
This talk will explore some new methods for making AI explainable, one of which, PennAI, has been developed at the University of Pennsylvania. PennAI will be demonstrated using several sample datasets.

IDA Meeting (20 Feb 2019)

The IDA meeting was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 2:00PM.

Spotlight presentations from the following PhD students:

Afees Odebode: A sampling-based Clustering Scheme for Large Data Sets
Bashir Dodo: Level Set Segmentation of Retinal OCT Images
Ben Evans: Camera Trapping + ML
Joanna Pawlik: Extracting Predictive Models from Flora Free-Text Documents at The Royal Botanic Gardens, Kew, London
Khalipha Nuhu: Investigating user responses to mandatory IT-induced changes in organizations
Leila Yousefi: Opening the Black Box – Discovering Hidden Variables in Type II Diabetes Prediction and Patient Modelling
Mashael Al-Luhaybi: Predicting Academic Performance (Learning Dynamic Bayesian Networks)

There was also a talk from Dr Noureddin Sadawi titled ‘Embarrassingly Parallel’ (slides can be found here).

Dr Sadawi is a research fellow at the Department of Computer Science, College of Engineering, Design and Physical Sciences, Brunel University London. His scientific research focuses on the applications of machine learning and data mining in areas such as drug design and discovery, omics data, gesture recognition, financial data analysis and object recognition.


IDA Meeting (21st Nov 2018)

The IDA meeting was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 4:00PM, after the Opening the Black Box seminar series talk by Professor Niels Peek.

Talk from Leila Yousefi (Brunel University London)

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi, Stephen Swift, Mahir Arzoky, Lucia Saachi, Luca Chiovato and Allan Tucker

Abstract: Clinicians predict disease and related complications based on prior knowledge and each individual patient’s clinical history. The prediction process is complex because of the existence of unmeasured risk factors, the unexpected development of complications, and varying responses of patients to the disease over time. Exploiting hidden variables (i.e., unmeasured risk factors) can improve the modeling of disease progression, and being able to understand the semantics of the hidden variables will enable clinicians to focus on the early diagnosis and treatment of unexpected conditions among sufferers. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being ‘black box’ in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand the manner of disease progression in different patients while improving prediction. Here, we explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping (DTW) and hierarchical clustering to cluster patients based upon these hidden variables to begin to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus (T2DM) patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.

Presentation slides can be found here and the paper transcript can be found here.

IDA Meeting (4th Jul 2018)

The IDA meeting was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 3:00PM.

Talk from Natalia Viani, King’s College London

Electronic health records represent a great source of valuable information for both patient care and biomedical research. Despite the efforts put into collecting structured data, a lot of information is available only in the form of free-text. For this reason, developing natural language processing (NLP) systems that identify clinically relevant concepts (e.g., symptoms, medication) is essential. Moreover, contextualizing these concepts from the temporal point of view represents an important step.
Over the past years, many NLP systems have been developed to process clinical texts written in English and belonging to specific medical domains (e.g., intensive care unit, oncology). However, research for multiple languages and domains is still limited. During my PhD, I applied information extraction techniques to the analysis of medical reports written in Italian, with a focus on the cardiology domain. In particular, I explored different methods for extracting clinical events and their attributes, as well as temporal expressions. At the moment, I am working on the analysis of mental health records for patients with a diagnosis of schizophrenia, with the aim of automatically identifying symptom onset information from clinical notes.

Dr Viani is a postdoctoral research associate at the Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London. She received her PhD in Bioengineering and Bioinformatics from the Department of Electrical, Computer and Biomedical Engineering, University of Pavia, in January 2018. During her PhD, she spent six months as a visiting research scholar in the Natural Language Processing Laboratory at the Computational Health Informatics Program at Boston Children’s Hospital – Harvard Medical School. Her research interests are natural language processing, clinical and temporal information extraction, and biomedical informatics. She is especially interested in the reconstruction of clinical timelines starting from free-text.

Slides from the talk can be found here.

Machine Learning Reading Group (4th Jul 2018)

The Machine Learning Reading Group was held on 04/07/2018 at 1:30 PM (IDA/BSEL Lab). The core concept for the meeting was random forests, and the article discussed was “Prediction of the FIFA World Cup 2018 – A random forest approach with an emphasis on estimated team ability parameters”.

A short presentation on random forests can be found here.

Machine Learning Reading Group (12th Jun 2018)

The Machine Learning Reading Group was held on 12/06/2018 at 11:00 AM (IDA/BSEL Lab) on reinforcement learning. It was led by Dr Alina Miron.

The core concept for the meeting was reinforcement learning, and the article discussed was “Mastering the game of Go without human knowledge”.

A short presentation on reinforcement learning can be found here.

IDA Meeting (8th Feb 2018)

IDA meetings will now be held at our new IDA-BSEL Research Group Laboratory – WBB 208 (2nd floor of Wilfred Brown)

Today’s talks are from:

Samy Ayed, on an exploratory study of the inputs to an ensemble clustering technique, treated as a subset selection problem (PDF slides can be found HERE).

Leila Yousefi and Weibuo Liu, who both discussed deep learning and how latent variables are used in their PhDs (PDF slides can be found HERE).