csxxmma2 – IDA Research Brunel

New Publications

Congratulations to the following for their recent paper acceptance!

Biraja Ghoshal and Allan Tucker, “On Cost-Sensitive Calibrated Uncertainty in Deep Learning: An application on COVID-19 detection” to IEEE CBMS 2021

Seyed Erfan Sajjadi and Allan Tucker, “Exploiting Clinical Staging Data to Constrain Pseudo-Time Modelling of Disease Progression” to IEEE CBMS 2021

Ben Evans and Allan Tucker
Evans B.C., Tucker A., Wearn O.R., Carbone C. (2021). Reasoning About Neural Network Activations: An Application in Spatial Animal Behaviour from Camera Trap Classifications. In: Koprinska I. et al. (eds) ECML PKDD 2020 Workshops. ECML PKDD 2020. Communications in Computer and Information Science, vol 1323. Springer, Cham. https://doi.org/10.1007/978-3-030-65965-3_2

Lianghao Han
Zhihao Dai, Zhong Li and Lianghao Han. (2021). BoneBert: A BERT-based Automated Information Extraction System of Radiology Reports for Bone Fracture Detection and Diagnosis, IDA 2021.

Yue Shi, Liangxiu Han, Wenjiang Huang, Sheng Chang, Yingying Dong, Darren Dancey, Lianghao Han. (2021). A Biologically Interpretable Two-stage Deep Neural Network (BIT-DNN) For Vegetation Recognition From Hyperspectral Imagery, 10.1109/TGRS.2021.3058782, IEEE Transactions on Geoscience and Remote Sensing, IDA 2021.

Biraja Ghoshal and Allan Tucker
Ghoshal, B. and Tucker, A. (2021) Hyperspherical Weight Uncertainty in Neural Networks, IDA 2021.

Bjaveet Nagaria, Ben Evans, Ashley Mann and Mahir Arzoky (2021). ‘Using an Instant Visual and Text Based Feedback Tool to Teach Path Finding Algorithms: A Concept’. SEENG 2021 Third International Workshop on Software Engineering Education for the Next Generation. Virtual.

Biraja Ghoshal, Bhargab Ghoshal, Stephen Swift, Allan Tucker (2021). “Uncertainty Estimation in SARS-CoV-2 B-cell Epitope Prediction for Vaccine Development”, AIME 2021

Biraja Ghoshal, Stephen Swift, Allan Tucker (2021). Bayesian Deep Active Learning for Medical Image Analysis, AIME 2021

IDA Seminar (06 Nov 2019)

IDA meeting held at WLFB 207/208 (2nd floor of Wilfred Brown) at 3:00PM

Talks from:

Leila Yousefi: The Prevalence of Errors in Machine Learning Experiments (slides can be found here)

Marco Ortu: The Butterfly “Affect”: Impact of Development Practices on Cryptocurrency Prices (slides can be found here)

Gabriel Scali: Constraint Satisfaction Problems and Constraint Programming (slides can be found here)

IDA Opening the Black Box Seminar (04 Oct 2019)

Jaakko Hollmén, Stockholm University, Department of Computer and Systems Sciences, Sweden

Jaakko Hollmén is a faculty member at the Department of Computer and Systems Sciences at Stockholm University in Sweden (since September 2019). Prior to joining Stockholm University, he was a faculty member at the Department of Computer Science at Aalto University in Finland. His research interests include the theory and practice of machine learning and data mining, in particular in the context of health, medicine and environmental sciences. He has been involved in the organization of many IDA conferences for the past ten years. He is also the secretary of the IDA council.

Title of Talk: Diagnostic prediction in neonatal intensive care units

Abstract: Preterm infants, born before 37 weeks of gestation, are subject to many developmental issues and health problems. Very Low Birth Weight (VLBW) infants, with a birth weight under 1500 g, are the most afflicted in this group. These infants require treatment in the neonatal intensive care unit before they are mature enough for hospital discharge. The neonatal intensive care unit is a data-intensive environment, where multi-channel physiological data is gathered from patients using a number of sensors to construct a comprehensive picture of the patients’ vital signs. We have looked into the problem of how to predict neonatal in-hospital mortality and morbidities. We have used time series data collected from Very Low Birth Weight infants treated in the neonatal intensive care unit of Helsinki University Hospital between 1999 and 2013. Our results show that machine learning models based on time series data alone have predictive power comparable with standard medical scores, and combining the two results in improved predictive ability. We have also studied the effect of observer bias on recording vital sign measurements in the neonatal intensive care unit, and conducted a retrospective cohort study on trends in the growth of Extremely Low Birth Weight (birth weight under 1000 g) infants during intensive care.

IET: Human Motion Analysis for Healthcare Applications – Noureddin Sadawi

Dr Noureddin Sadawi presented two talks on 26/06/2019 at the Human Motion Analysis for Healthcare Applications conference at the Institution of Engineering and Technology.

Talk 1: Rehabilitation Movement Correctness Classification (slides can be found here)

Talk 2: Challenges encountered while developing MIRA (slides can be found here)

IDA Opening the Black Box Seminar (15 May 2019)

The IDA Opening the Black Box seminar was given by Professor John Holmes, titled ‘Explainable AI for the (Not-Always Expert) Clinical Researcher’. It was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 3:00PM. Slides can be found here.

John H. Holmes, PhD, is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is the Associate Director of the Institute for Biomedical Informatics, Director of the Master’s Program in Biomedical Informatics, and Chair of the Doctoral Program in Epidemiology, all at Penn. Dr. Holmes has been recognized nationally and internationally for his work on developing and applying new approaches to mining epidemiologic surveillance data, as well as his efforts at furthering educational initiatives in clinical research. Dr. Holmes’ research interests are focused on the intersection of medical informatics and clinical research, specifically evolutionary computation and machine learning approaches to knowledge discovery in clinical databases, deep electronic phenotyping, interoperable information systems infrastructures for epidemiologic surveillance, and their application to a broad array of clinical domains, including cardiology and pulmonary medicine. He has collaborated as the informatics lead on an Agency for Healthcare Research and Quality-funded project at Harvard Medical School to establish a scalable distributed research network, and he has served as the co-lead of the Governance Core for the SPAN project, a scalable distributed research network; he participates in the FDA Sentinel Initiative. Dr. Holmes has served as the evaluator for the PCORNet Obesity Initiative studies, where he was responsible for developing and implementing the evaluation plan and metrics for the initiative. Dr. Holmes is or has been a principal or co-investigator on projects funded by the National Cancer Institute, the National Library of Medicine, and the Agency for Healthcare Research and Quality, and he was the Penn Principal Investigator of the NIH-funded Penn Center of Excellence in Prostate Cancer Disparities. Dr. Holmes is engaged with the Botswana-UPenn Partnership, assisting in building informatics education and clinical research capacity in Botswana. Dr. Holmes is an elected Fellow of the American College of Medical Informatics (ACMI), the American College of Epidemiology (ACE), and the International Academy of Health Sciences Informatics (IAHSI).

Abstract: Armed with a well-founded research question, the clinical researcher’s next step is usually to seek out the data that could help answer it, although the researcher can also use data to discover a new research question. In both cases, the data will already be available, and so either approach to inquiry can be appropriate and justifiable. However, the next steps (data preparation, analytics, and inference) are often thorny issues that even the most seasoned researcher must address, and sometimes not so easily. Traditional approaches to data preparation, which include such methods as frequency distribution and contingency table analyses to characterize the data, are themselves open to considerable investigator bias. In addition, there is considerable tedium resulting from applying these methods; for example, how many contingency tables does it take to identify variable interactions? It is arguable that feature selection and construction are two tasks not to be left only to human interpretation. Yet we don’t see much in the way of novel approaches to “experiencing” data such that new, data-driven insights arise during the data preparation process. The same can be said for analysis, where even state-of-the-art statistical methods, informed or driven by pre-formed hypotheses and the results of feature selection processes, sometimes hamper truly novel knowledge discovery. As a result, inferences made from these analyses likewise suffer. However, new approaches to making AI explainable to users, in this case clinical researchers who do not have the time or inclination to develop a deep understanding of how this or that AI algorithm works, are critically important, and their dearth represents a gap that those of us in clinical research informatics need to fill. Yet, the uninitiated shy away from AI for the very lack of explainability.
This talk will explore some new methods for making AI explainable, one of which, PennAI, has been developed at the University of Pennsylvania. PennAI will be demonstrated using several sample datasets.

IDA Meeting (20 Feb 2019)

The IDA meeting was held at WLFB 207/208 (2nd floor of Wilfred Brown) at 2:00PM.

Spotlight presentations from the following PhD students:

Afees Odebode: A sampling-based Clustering Scheme for Large Data Sets
Bashir Dodo: Level Set Segmentation of Retinal OCT Images
Ben Evans: Camera Trapping + ML
Joanna Pawlik: Extracting Predictive Models from Flora Free-Text Documents at The Royal Botanic Gardens, Kew, London
Khalipha Nuhu: Investigating user responses to mandatory IT-induced changes in organizations
Leila Yousefi: Opening the Black Box – Discovering Hidden Variables in Type II Diabetes Prediction and Patient Modelling
Mashael Al-Luhaybi: Predicting Academic Performance (Learning Dynamic Bayesian Networks)

A talk from Dr Noureddin Sadawi titled ‘Embarrassingly Parallel’ (slides can be found here).

Dr Sadawi is a research fellow at the Department of Computer Science, College of Engineering, Design and Physical Sciences, Brunel University London. His scientific research focuses on the applications of machine learning and data mining in areas such as drug design and discovery, omics data, gesture recognition, financial data analysis and object recognition.

IDA Meeting (21 Nov 2018)

IDA meeting held at WLFB 207/208 (2nd floor of Wilfred Brown) at 4:00PM after Opening the Black Box seminar series talk by Professor Niels Peek.

Talk from Leila Yousefi (Brunel University London)

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi (Leila.Yousefi@brunel.ac.uk), Stephen Swift, Mahir Arzoky, Lucia Saachi, Luca Chiovato and Allan Tucker

Abstract: Clinicians predict disease and related complications based on prior knowledge and each individual patient’s clinical history. The prediction process is complex because of the existence of unmeasured risk factors, the unexpected development of complications, and varying responses of patients to the disease over time. Exploiting hidden variables (i.e., unmeasured risk factors) can improve the modeling of disease progression and being able to understand the semantics of the hidden variables will enable clinicians to focus on the early diagnosis and treatment of unexpected conditions among sufferers. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being ‘black box’ in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand the manner of disease progression in different patients while improving prediction. Here, we explore the use of a stepwise method for incrementally identifying hidden variables based on the Induction Causation (IC*) algorithm. We exploit Dynamic Time Warping (DTW) and hierarchical clustering to cluster patients based upon these hidden variables to begin to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus (T2DM) patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.
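
For readers unfamiliar with the clustering step mentioned in the abstract, the following is a minimal illustrative sketch of pairing Dynamic Time Warping with hierarchical clustering. It is not the authors’ implementation: the toy trajectories, the function names `dtw_distance` and `average_linkage_clusters`, the absolute-difference step cost and the choice of average linkage are all assumptions made for illustration.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic-programming DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again to the same b element
                cost[i][j - 1],      # b[j-1] matched again to the same a element
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def average_linkage_clusters(series, n_clusters):
    """Agglomerative clustering (average linkage, an assumption) on DTW distances."""
    # Precompute all pairwise DTW distances once.
    dist = {
        (i, j): dtw_distance(series[i], series[j])
        for i, j in combinations(range(len(series)), 2)
    }

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    # Start with one cluster per series and merge the closest pair repeatedly.
    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
            link = sum(d(i, j) for i, j in pairs) / len(pairs)
            if best is None or link < best[0]:
                best = (link, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical hidden-variable trajectories, one per patient (toy data).
trajectories = [
    [0, 0, 1, 1, 1],     # patient 0: late rise
    [0, 1, 1, 1],        # patient 1: same shape, different length
    [1, 1, 0, 0, 0],     # patient 2: early fall
    [1, 0, 0, 0, 0, 0],  # patient 3: same shape as patient 2, different length
]
clusters = average_linkage_clusters(trajectories, 2)
```

Because DTW aligns sequences of different lengths, patients whose trajectories share a shape but differ in timing end up with a small distance and fall into the same group, which is the property that makes it a natural fit for clustering patients on time-varying hidden variables.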

Presentation slides can be found here and the paper transcript can be found here.

Paper accepted in IDEAL 2018 – Roja Ahmadi

Title: Intrusion Detection Using Transfer Learning in Machine Learning Classifiers Between Non-cloud and Cloud Datasets

Authors: Roja Ahmadi, Robert D. Macredie and Allan Tucker

Abstract: One of the critical issues in developing intrusion detection systems (IDS) in cloud-computing environments is the lack of publicly available cloud intrusion detection datasets, which hinders research into IDS in this area. There are, however, many non-cloud intrusion detection datasets. This paper seeks to leverage one of the well-established non-cloud datasets and analyze it in relation to one of the few available cloud datasets to develop a detection model using a machine learning technique. A complication is that these datasets often have different structures, contain different features and contain different, though overlapping, types of attack. The aim of this paper is to explore whether a simple machine learning classifier containing a small common feature set trained using a non-cloud dataset that has a packet-based structure can be usefully applied to detect specific attacks in the cloud dataset, which contains time-based traffic. Through this, the differences and similarities between attacks in the cloud and non-cloud datasets are analyzed and suggestions for future work are presented.

Conference: The 19th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2018), Madrid, Spain.

The paper will be published in the Springer LNCS/LNAI proceedings.

Paper accepted in IEEE BIBM 2018 – Leila Yousefi

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi (Leila.Yousefi@brunel.ac.uk), Stephen Swift, Mahir Arzoky, Lucia Saachi, Luca Chiovato and Allan Tucker

Conference: IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2018)

Poster accepted in Intelligent Data Analysis 2018 – Mashael Al-Luhaybi

Title: Identification of Student “Types” from Online Self-Assessment Temporal Trajectories With Dynamic Time Warping for Performance Prediction

Authors: Mashael Al-Luhaybi, Leila Yousefi, Stephen Swift, Steve Counsell and Allan Tucker

Affiliations: Brunel University London (UK)

Conference website: https://ida2018.org

Spotlight Presentation Slides: Here