IDA Opening the Black Box Seminar (04 Oct 2019)

Jaakko Hollmén, Stockholm University, Department of Computer and Systems Sciences, Sweden

Jaakko Hollmén is a faculty member at the Department of Computer and Systems Sciences at Stockholm University in Sweden (since September 2019). Prior to joining Stockholm University, he was a faculty member at the Department of Computer Science at Aalto University in Finland. His research interests include the theory and practice of machine learning and data mining, in particular in the context of health, medicine and environmental sciences. He has been involved in the organization of many IDA conferences over the past ten years. He is also the secretary of the IDA council.

Title of Talk: Diagnostic prediction in neonatal intensive care units

Abstract: Preterm infants, born before 37 weeks of gestation, are subject to many developmental issues and health problems. Very Low Birth Weight (VLBW) infants, with a birth weight under 1500 g, are the most afflicted in this group. These infants require treatment in the neonatal intensive care unit before they are mature enough for hospital discharge. The neonatal intensive care unit is a data-intensive environment, where multi-channel physiological data is gathered from patients using a number of sensors to construct a comprehensive picture of the patients’ vital signs. We have looked into the problem of how to predict neonatal in-hospital mortality and morbidities, using time series data collected from Very Low Birth Weight infants treated in the neonatal intensive care unit of Helsinki University Hospital between 1999 and 2013. Our results show that machine learning models based on time series data alone have predictive power comparable with standard medical scores, and that combining the two improves predictive ability. We have also studied the effect of observer bias on the recording of vital sign measurements in the neonatal intensive care unit, and conducted a retrospective cohort study on trends in the growth of Extremely Low Birth Weight (birth weight under 1000 g) infants during intensive care.

IDA Opening the Black Box Seminar (15 May 2019)

The IDA Opening the Black Box seminar was given by Professor John Holmes, titled ‘Explainable AI for the (Not-Always Expert) Clinical Researcher’. It was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 3:00PM. Slides can be found here.

John H. Holmes, PhD, is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is the Associate Director of the Institute for Biomedical Informatics, Director of the Master’s Program in Biomedical Informatics, and Chair of the Doctoral Program in Epidemiology, all at Penn. Dr. Holmes has been recognized nationally and internationally for his work on developing and applying new approaches to mining epidemiologic surveillance data, as well as his efforts at furthering educational initiatives in clinical research. Dr. Holmes’ research interests are focused on the intersection of medical informatics and clinical research, specifically evolutionary computation and machine learning approaches to knowledge discovery in clinical databases, deep electronic phenotyping, interoperable information systems infrastructures for epidemiologic surveillance, and their application to a broad array of clinical domains, including cardiology and pulmonary medicine. He has collaborated as the informatics lead on an Agency for Healthcare Research and Quality-funded project at Harvard Medical School to establish a scalable distributed research network, and he has served as the co-lead of the Governance Core for the SPAN project, a scalable distributed research network. He also participates in the FDA Sentinel Initiative. Dr. Holmes has served as the evaluator for the PCORNet Obesity Initiative studies, where he was responsible for developing and implementing the evaluation plan and metrics for the initiative. Dr. Holmes is or has been a principal or co-investigator on projects funded by the National Cancer Institute, the National Library of Medicine, and the Agency for Healthcare Research and Quality, and he was the Penn principal investigator of the NIH-funded Penn Center of Excellence in Prostate Cancer Disparities. Dr. Holmes is engaged with the Botswana-UPenn Partnership, assisting in building informatics education and clinical research capacity in Botswana. Dr. Holmes is an elected Fellow of the American College of Medical Informatics (ACMI), the American College of Epidemiology (ACE), and the International Academy of Health Sciences Informatics (IAHSI).

Abstract: Armed with a well-founded research question, the clinical researcher’s next step is usually to seek out the data that could help answer it, although the researcher can also use data to discover a new research question. In both cases, the data will already be available, and so either approach to inquiry can be appropriate and justifiable. However, the next steps (data preparation, analytics, and inference) are often thorny issues that even the most seasoned researcher must address, and sometimes not so easily. Traditional approaches to data preparation, which include methods such as frequency distribution and contingency table analyses to characterize the data, are themselves open to considerable investigator bias. In addition, applying these methods is tedious: for example, how many contingency tables does it take to identify variable interactions? It is arguable that feature selection and construction are two tasks that should not be left only to human interpretation. Yet we don’t see much in the way of novel approaches to “experiencing” data such that new, data-driven insights arise during the data preparation process. The same can be said for analysis, where even state-of-the-art statistical methods, informed or driven by pre-formed hypotheses and the results of feature selection processes, sometimes hamper truly novel knowledge discovery. As a result, inferences made from these analyses likewise suffer. At the same time, new approaches to making AI explainable to users (in this case, clinical researchers who do not have the time or inclination to develop a deep understanding of how this or that AI algorithm works) are critically important, and their dearth represents a gap that those of us in clinical research informatics need to fill. Yet the uninitiated shy away from AI precisely because of this lack of explainability.
This talk will explore some new methods for making AI explainable, one of which, PennAI, has been developed at the University of Pennsylvania. PennAI will be demonstrated using several sample datasets.

IDA Meeting (20 Feb 2019)

The IDA meeting was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 2:00PM.

Spotlight presentations from the following PhD students:

Afees Odebode: A sampling-based Clustering Scheme for Large Data Sets
Bashir Dodo: Level Set Segmentation of Retinal OCT Images
Ben Evans: Camera Trapping + ML
Joanna Pawlik: Extracting Predictive Models from Flora Free-Text Documents at The Royal Botanic Gardens, Kew, London
Khalipha Nuhu: Investigating user responses to mandatory IT-induced changes in organizations
Leila Yousefi: Opening the Black Box – Discovering Hidden Variables in Type II Diabetes Prediction and Patient Modelling
Mashael Al-Luhaybi: Predicting Academic Performance (Learning Dynamic Bayesian Networks)

A talk from Dr Noureddin Sadawi titled ‘Embarrassingly Parallel’ (slides can be found here).

Dr Sadawi is a research fellow at the Department of Computer Science, College of Engineering, Design and Physical Sciences, Brunel University London. His scientific research focuses on the applications of machine learning and data mining in areas such as drug design and discovery, omics data, gesture recognition, financial data analysis and object recognition.


IDA Meeting (21 Nov 2018)

The IDA meeting was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 4:00PM, following the Opening the Black Box seminar series talk by Professor Niels Peek.

Talk from Leila Yousefi (Brunel University London)

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi (Leila.Yousefi@brunel.ac.uk), Stephen Swift, Mahir Arzoky, Lucia Sacchi, Luca Chiovato and Allan Tucker

Abstract: Clinicians predict disease and related complications based on prior knowledge and each individual patient’s clinical history. The prediction process is complex because of the existence of unmeasured risk factors, the unexpected development of complications, and varying responses of patients to the disease over time. Exploiting hidden variables (i.e., unmeasured risk factors) can improve the modeling of disease progression, and being able to understand the semantics of the hidden variables will enable clinicians to focus on the early diagnosis and treatment of unexpected conditions among sufferers. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being ‘black box’ in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand the manner of disease progression in different patients while improving prediction. Here, we explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping (DTW) and hierarchical clustering to cluster patients based upon these hidden variables to begin to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus (T2DM) patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.

Presentation slides can be found here and the paper can be found here.
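The talk pairs Dynamic Time Warping (DTW) with hierarchical clustering to group patients by their hidden-variable trajectories. The sketch below is not the authors' implementation; it is a minimal, dependency-free illustration that assumes one numeric series per patient, an absolute-difference local cost, and average-linkage agglomerative clustering. The function names `dtw_distance` and `average_linkage_clusters` and the toy trajectories are hypothetical.

```python
from itertools import combinations
from math import inf


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def average_linkage_clusters(series, n_clusters):
    """Bottom-up hierarchical clustering on pairwise DTW distances,
    merging the closest pair (average linkage) until n_clusters remain."""
    dist = {(i, j): dtw_distance(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # average linkage: mean distance over all cross-cluster pairs
            link = sum(d(i, j) for i in clusters[x] for j in clusters[y]) / (
                len(clusters[x]) * len(clusters[y]))
            if best is None or link < best[0]:
                best = (link, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # x < y, so index x stays valid
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical hidden-variable trajectories for four patients
trajectories = [
    [0.1, 0.2, 0.8, 0.9, 0.9],
    [0.1, 0.1, 0.2, 0.8, 0.9],  # same rise as patient 0, shifted in time
    [0.9, 0.8, 0.3, 0.1, 0.1],
    [0.9, 0.9, 0.8, 0.2, 0.1],
]
print(average_linkage_clusters(trajectories, 2))  # [[0, 1], [2, 3]]
```

DTW lets two patients whose curves share a shape but differ in timing (the first two toy series) end up at distance zero, which a point-by-point comparison would not allow. The naive loops here scale poorly with the number of patients, so this is only suitable for small illustrative inputs.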

Paper accepted at IDEAL 2018 – Roja Ahmadi

Title: Intrusion Detection Using Transfer Learning in Machine Learning Classifiers Between Non-cloud and Cloud Datasets

Authors: Roja Ahmadi, Robert D. Macredie and Allan Tucker

Abstract: One of the critical issues in developing intrusion detection systems (IDS) in cloud-computing environments is the lack of publicly available cloud intrusion detection datasets, which hinders research into IDS in this area. There are, however, many non-cloud intrusion detection datasets. This paper seeks to leverage one of the well-established non-cloud datasets and analyze it in relation to one of the few available cloud datasets to develop a detection model using a machine learning technique. A complication is that these datasets often have different structures, contain different features and contain different, though overlapping, types of attack. The aim of this paper is to explore whether a simple machine learning classifier containing a small common feature set trained using a non-cloud dataset that has a packet-based structure can be usefully applied to detect specific attacks in the cloud dataset, which contains time-based traffic. Through this, the differences and similarities between attacks in the cloud and non-cloud datasets are analyzed and suggestions for future work are presented.

Conference: The 19th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2018), Madrid, Spain.

The paper will be published in the Springer LNCS/LNAI proceedings.
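The core idea in this abstract (restrict both datasets to the features they share, train a simple classifier on the non-cloud data, then apply it to the cloud data) can be sketched as follows. This is a hypothetical illustration, not the paper's method: the column names, the toy records, and the choice of a nearest-centroid classifier as the "simple classifier" are all assumptions, and real packet-based and time-based records would first need to be aggregated into comparable units.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def common_features(source_cols, target_cols):
    """Features present in both datasets, kept in source order."""
    target = set(target_cols)
    return [c for c in source_cols if c in target]


def project(rows, features):
    """Keep only the shared features of each record (dict -> tuple)."""
    return [tuple(r[f] for f in features) for r in rows]


class NearestCentroid:
    """Deliberately simple classifier: predicts the label whose mean
    feature vector (centroid) is closest in Euclidean distance."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for k, v in enumerate(x):
                acc[k] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [s / counts[lab] for s in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))
                for x in X]


# Hypothetical schemas: only "duration" and "src_bytes" exist in both
source_cols = ["duration", "src_bytes", "flag", "service"]
target_cols = ["duration", "src_bytes", "window_id"]
feats = common_features(source_cols, target_cols)

# Hypothetical labelled non-cloud (source) records
source = [
    {"duration": 0.0, "src_bytes": 0.0, "flag": 1, "service": 2},
    {"duration": 0.1, "src_bytes": 0.1, "flag": 1, "service": 2},
    {"duration": 5.0, "src_bytes": 9.0, "flag": 0, "service": 1},
    {"duration": 6.0, "src_bytes": 8.0, "flag": 0, "service": 1},
]
labels = ["attack", "attack", "normal", "normal"]

# Hypothetical unlabelled cloud (target) records
target = [
    {"duration": 0.2, "src_bytes": 0.0, "window_id": 7},
    {"duration": 5.5, "src_bytes": 8.5, "window_id": 8},
]

model = NearestCentroid().fit(project(source, feats), labels)
print(model.predict(project(target, feats)))  # ['attack', 'normal']
```

Features that exist in only one dataset (here `flag`, `service` and `window_id`) are simply discarded, which is the price of transferring a model across differently structured datasets.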

Paper accepted at IEEE BIBM 2018 – Leila Yousefi

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi (Leila.Yousefi@brunel.ac.uk), Stephen Swift, Mahir Arzoky, Lucia Sacchi, Luca Chiovato and Allan Tucker

Abstract: Clinicians predict disease and related complications based on prior knowledge and each individual patient’s clinical history. The prediction process is complex because of the existence of unmeasured risk factors, the unexpected development of complications, and varying responses of patients to the disease over time. Exploiting hidden variables (i.e., unmeasured risk factors) can improve the modeling of disease progression, and being able to understand the semantics of the hidden variables will enable clinicians to focus on the early diagnosis and treatment of unexpected conditions among sufferers. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being ‘black box’ in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand the manner of disease progression in different patients while improving prediction. Here, we explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping (DTW) and hierarchical clustering to cluster patients based upon these hidden variables to begin to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus (T2DM) patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.

Conference: IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2018)