News

IDA Meeting (21st Nov 2018)

The IDA meeting was held in WLFB 207/208 (2nd floor of the Wilfred Brown building) at 4:00 PM, following Professor Niels Peek's talk in the Opening the Black Box seminar series.

Talk from Leila Yousefi (Brunel University London)

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi (Leila.Yousefi@brunel.ac.uk), Stephen Swift, Mahir Arzoky, Lucia Sacchi, Luca Chiovato and Allan Tucker

Abstract: Clinicians predict disease and related complications based on prior knowledge and each individual patient’s clinical history. The prediction process is complex because of the existence of unmeasured risk factors, the unexpected development of complications, and varying responses of patients to the disease over time. Exploiting hidden variables (i.e., unmeasured risk factors) can improve the modeling of disease progression, and being able to understand the semantics of the hidden variables will enable clinicians to focus on the early diagnosis and treatment of unexpected conditions among sufferers. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being ‘black box’ in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand the manner of disease progression in different patients while improving prediction. Here, we explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping (DTW) and hierarchical clustering to cluster patients based upon these hidden variables to begin to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus (T2DM) patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.
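
The abstract names DTW and hierarchical clustering without giving implementation details. The following is a minimal Python sketch, assuming each patient is summarised by a univariate time series of one inferred hidden variable. The function names, the absolute-difference cost, the choice of average linkage and the toy trajectories are all illustrative assumptions, not the authors' code or data.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance using an absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b
                                 cost[i][j - 1],      # b[j-1] repeats against a
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


def average_linkage(series, n_clusters):
    """Agglomerative clustering (average linkage) over pairwise DTW distances."""
    ids = list(series)
    dist = {frozenset((p, q)): dtw_distance(series[p], series[q])
            for p, q in combinations(ids, 2)}
    clusters = [[p] for p in ids]

    def linkage(c1, c2):
        # mean DTW distance between every member of c1 and every member of c2
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # merge the pair of clusters with the smallest average DTW distance (i < j)
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical hidden-variable trajectories of unequal length (made-up values).
trajectories = {
    "p1": [0.1, 0.2, 0.8, 0.9],
    "p2": [0.1, 0.1, 0.2, 0.8, 0.9],
    "p3": [0.9, 0.7, 0.3, 0.1],
    "p4": [0.8, 0.8, 0.6, 0.2, 0.1],
}
print(average_linkage(trajectories, n_clusters=2))
```

DTW is what lets "p1" and "p2" count as identical (distance 0) even though one trajectory is a time-shifted, longer version of the other; a plain Euclidean distance would need equal-length, aligned series.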

Presentation slides can be found here and the paper transcript can be found here.

Paper accepted in IDEAL 2018 – Roja Ahmadi

Title: Intrusion Detection Using Transfer Learning in Machine Learning Classifiers Between Non-cloud and Cloud Datasets

Authors: Roja Ahmadi, Robert D. Macredie and Allan Tucker

Abstract: One of the critical issues in developing intrusion detection systems (IDS) in cloud-computing environments is the lack of publicly available cloud intrusion detection datasets, which hinders research into IDS in this area. There are, however, many non-cloud intrusion detection datasets. This paper seeks to leverage one of the well-established non-cloud datasets and analyze it in relation to one of the few available cloud datasets to develop a detection model using a machine learning technique. A complication is that these datasets often have different structures, contain different features and contain different, though overlapping, types of attack. The aim of this paper is to explore whether a simple machine learning classifier containing a small common feature set trained using a non-cloud dataset that has a packet-based structure can be usefully applied to detect specific attacks in the cloud dataset, which contains time-based traffic. Through this, the differences and similarities between attacks in the cloud and non-cloud datasets are analyzed and suggestions for future work are presented.
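
As a rough illustration of the "small common feature set" idea, the sketch below keeps only the features shared by both datasets, trains on the non-cloud records and applies the model to the cloud records. The abstract does not name the classifier, so a nearest-centroid classifier is swapped in here as a stand-in; the feature names, records and helper functions are hypothetical.

```python
from statistics import mean


def common_features(source_rows, target_rows):
    """Feature names present in every record of both datasets, excluding the label."""
    shared = set(source_rows[0])
    for row in source_rows + target_rows:
        shared &= set(row)
    shared.discard("label")
    return sorted(shared)  # sorted for a stable feature order


def fit_centroids(rows, features):
    """Nearest-centroid stand-in classifier: one mean feature vector per class label."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append([row[f] for f in features])
    return {label: [mean(col) for col in zip(*vectors)]
            for label, vectors in by_label.items()}


def predict(centroids, row, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = [row[f] for f in features]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


# Hypothetical records: only "duration" and "src_bytes" appear in both datasets.
non_cloud = [
    {"duration": 0.1, "src_bytes": 900, "flag_count": 3, "label": "attack"},
    {"duration": 0.2, "src_bytes": 950, "flag_count": 2, "label": "attack"},
    {"duration": 5.0, "src_bytes": 100, "flag_count": 0, "label": "normal"},
    {"duration": 4.0, "src_bytes": 120, "flag_count": 1, "label": "normal"},
]
cloud = [
    {"duration": 0.3, "src_bytes": 880, "window": 60},
    {"duration": 4.5, "src_bytes": 110, "window": 60},
]

features = common_features(non_cloud, cloud)
model = fit_centroids(non_cloud, features)  # trained on non-cloud data only
print(features, [predict(model, row, features) for row in cloud])
```

The features are deliberately left unscaled to keep the sketch short, so `src_bytes` dominates the distance; a realistic pipeline would normalise each feature before training.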

Conference: The 19th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2018), Madrid, Spain.

The paper will be published in the Springer LNCS/LNAI proceedings.

Paper accepted in IEEE BIBM 2018 – Leila Yousefi

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

Authors: Leila Yousefi (Leila.Yousefi@brunel.ac.uk), Stephen Swift, Mahir Arzoky, Lucia Sacchi, Luca Chiovato and Allan Tucker

Abstract: Clinicians predict disease and related complications based on prior knowledge and each individual patient’s clinical history. The prediction process is complex because of the existence of unmeasured risk factors, the unexpected development of complications, and varying responses of patients to the disease over time. Exploiting hidden variables (i.e., unmeasured risk factors) can improve the modeling of disease progression, and being able to understand the semantics of the hidden variables will enable clinicians to focus on the early diagnosis and treatment of unexpected conditions among sufferers. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being ‘black box’ in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand the manner of disease progression in different patients while improving prediction. Here, we explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping (DTW) and hierarchical clustering to cluster patients based upon these hidden variables to begin to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus (T2DM) patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.

Conference: IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2018)

Poster accepted in Intelligent Data Analysis 2018 – Mashael Al-Luhaybi

Title: Identification of Student “Types” from Online Self-Assessment Temporal Trajectories With Dynamic Time Warping for Performance Prediction

Authors: Mashael Al-Luhaybi, Leila Yousefi, Stephen Swift, Steve Counsell and Allan Tucker

Affiliations: Brunel University London (UK)

Conference website: https://ida2018.org

Spotlight Presentation Slides: Here

Poster accepted in Intelligent Data Analysis 2018 – Leila Yousefi

Title: Opening the Black Box: Discovering and Explaining Hidden Variables in Patient Modelling

Authors: Leila Yousefi¹, Stephen Swift¹, Mahir Arzoky¹, Allan Tucker¹, Lucia Sacchi² and Luca Chiovato²

Affiliations: 1. Brunel University London (UK), 2. University of Pavia, Istituti Maugeri (Italy)

Conference website: https://ida2018.org

Spotlight Presentation Slides: Here

Poster accepted in Intelligent Data Analysis 2018 – Nicky Nicolson

Title: Interactive visualisation of field-collected botanical specimen metadata: supporting data mining process development

Authors: Nicky Nicolson (n.nicolson@kew.org)¹ ², Allan Tucker²

Affiliations: 1. Biodiversity Informatics & Spatial Analysis, RBG Kew (UK), 2. Department of Computer Science, Brunel University London (UK)

Abstract: This slide deck outlines the development and use of an interactive data visualisation tool, developed throughout a PhD-level research project. Originally designed to aid initial data exploration and gather expert input, the toolkit was further refined to support process design, quality assurance and refinement by viewing data mining results at known stages of a pipeline process, and to enable visualisation of the data aggregations used to define new features for predictive models. Newly defined features can be regarded as additional data, feeding back into data exploration and forming an iterative process. The toolkit has contributed to reproducible research by adding tool support and activity logging at one of the loosest stages of the research process.

Conference website: https://ida2018.org

Slides: http://bit.ly/nicolson-ida2018