Making Sense out of Software Engineering Data, and an Introduction to R
Prof Sandro Morasca, Università degli studi dell’Insubria, Italy
The FREE summer short course (funded by Erasmus+) was organised by Prof Martin Shepperd on 11-12 July, 2018 (13:00-17:00 in WLFB208).
The course addressed the techniques that can be sensibly used to extract knowledge out of Software Engineering data acquired via experiments or routine data collection in industrial contexts, to make it practically useful. The course described and critically discussed a number of data analysis techniques, by explaining their preconditions and their outcomes. The course illustrated both basic, traditional techniques and innovative ones, like those based on Robust Regression or machine learning. Also, it explained how the models obtained can be validated.
A big thank you to Sandro and Martin for running this fantastic short course.
Lecture slides can be found here.
IDA meeting held at WLFB 207/208 (2nd floor of Wilfred Brown) at 3:00PM
Talk from Natalia Viani, King’s College London
Electronic health records represent a great source of valuable information for both patient care and biomedical research. Despite the efforts put into collecting structured data, a lot of information is available only in the form of free text. For this reason, developing natural language processing (NLP) systems that identify clinically relevant concepts (e.g., symptoms, medication) is essential. Moreover, placing these concepts in their temporal context is an important step.
Over the past years, many NLP systems have been developed to process clinical texts written in English and belonging to specific medical domains (e.g., intensive care unit, oncology). However, research for multiple languages and domains is still limited. During my PhD, I applied information extraction techniques to the analysis of medical reports written in Italian, with a focus on the cardiology domain. In particular, I explored different methods for extracting clinical events and their attributes, as well as temporal expressions. At the moment, I am working on the analysis of mental health records for patients with a diagnosis of schizophrenia, with the aim of automatically identifying symptom onset information from clinical notes.
Dr Viani is a postdoctoral research associate at the Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London. She received her PhD in Bioengineering and Bioinformatics from the Department of Electrical, Computer and Biomedical Engineering, University of Pavia, in January 2018. During her PhD, she spent six months as a visiting research scholar in the Natural Language Processing Laboratory at the Computational Health Informatics Program at Boston Children’s Hospital – Harvard Medical School. Her research interests are natural language processing, clinical and temporal information extraction, and biomedical informatics. She is especially interested in the reconstruction of clinical timelines from free text.
Slides from the talk can be found here.
The Machine Learning Reading Group was held on 04/07/2018 at 1:30 PM (IDA/BSEL Lab). The core concept for this meeting was random forests, and the article discussed was “Prediction of the FIFA World Cup 2018 – A random forest approach with an emphasis on estimated team ability parameters” https://arxiv.org/pdf/1806.03208.pdf
A short presentation on Random forests can be found here.
Congratulations to Leila Yousefi who won best student paper at IEEE CBMS 2018. The paper is titled “Predicting Disease Complications Using a Step-Wise Hidden Variable Approach for Learning Dynamic Bayesian Networks”
Below is the abstract and full list of authors.
Predicting Diabetes Type 2 Mellitus (T2DM) complications such as retinopathy and liver disease is still a challenge despite being a growing public health concern worldwide. This is due to the complex interactions between complications and other features, as well as between the different complications, themselves. What is more, there are likely to be many unmeasured effects that impact the disease progression of different patients. Probabilistic graphical models such as Dynamic Bayesian Networks (DBNs) have demonstrated much promise in the modeling of disease progression and they can naturally incorporate hidden (latent) variables using the EM algorithm. Unlike deep learning approaches that attempt to model complex interactions in data by using a large number of hidden variables, we adopt a different approach. We are interested in models that not only capture unmeasured effects but are also transparent in how they model data so that knowledge about disease processes can be extracted and trust in the model can be maintained by clinicians. As a result, we have developed a step-wise hidden variable structure learning process that incrementally adds hidden variables based on the IC* algorithm. To the best of our knowledge, this is the first study for classifying disease complication using a step-wise learning methodology for identifying hidden and T2DM features with a DBN structure from clinical data. Our extensive set of experiments show that the proposed method improves classification accuracy, identifying the correct number of hidden variables, and targeting their precise location within the network structure.
Leila Yousefi, Allan Tucker, Mashael Al-luhaybi, Lucia Sacchi, Riccardo Bellazzi and Luca Chiovato.
Welly done Lilly!
The Machine Learning Reading Group was held on 12/06/2018 at 11:00 AM (IDA/BSEL Lab) on reinforcement learning. It was led by Dr Alina Miron.
The core concept for the meeting was Reinforcement learning and the article discussed was “Mastering the game of Go without human knowledge” https://www.nature.com/articles/nature24270.
A short presentation on reinforcement learning can be found here.
IDA meetings will now be held at our new IDA-BSEL Research Group Laboratory – WBB 208 (2nd floor of Wilfred Brown)
Today’s talks are from:
Samy Ayed on an exploratory study of the inputs for ensemble clustering technique as a subset selection problem (PDF Slides can be found HERE).
Leila Yousefi and Weibuo Liu both discussed Deep Learning and how latent variables are used in their PhDs (PDF Slides can be found HERE).
We are pleased to announce that Bashir Dodo’s paper “Graph-Cut Segmentation of Retinal Layers from OCT Images” has won the BIOIMAGING 2018 Best Student Paper Award.
Below is the abstract and full list of authors.
The segmentation of various retinal layers is vital for diagnosing and tracking progress of medication of various ocular diseases. Due to the complexity of retinal structures, the tediousness of manual segmentation and variation from different specialists, many methods have been proposed to aid with this analysis. However, image artifacts, in addition to inhomogeneity in pathological structures, remain a challenge, with negative influence on the performance of segmentation algorithms. Previous attempts normally pre-process the images or model the segmentation to handle the obstruction but it still remains an area of active research, especially in relation to the graph based algorithms. In this paper we present an automatic retinal layer segmentation method, which is comprised of fuzzy histogram hyperbolization and graph cut methods to segment 8 boundaries and 7 layers of the retina on 150 OCT B-Scans images, 50 each from the temporal, nasal and centre of foveal region. Our method shows positive results, with additional tolerance and adaptability to contour variance and pathological inconsistency of the retinal structures in all regions.
Bashir Isa Dodo, Yongmin Li, Khalid Eltayef and Xiaohui Liu.
IDA meeting held at WBB 208 (2nd floor of Wilfred Brown) – Weds 8th November at 3pm
Alina Miron on Exergames in healthcare.
Navid Dorudian on Moving object detection in challenging scenarios using colour and depth images.
The Sixteenth International Symposium on Intelligent Data Analysis (IDA 2017) was held between 26th October and 28th October, 2017 in London, UK.
– Opportunities and Challenges of Learning Health Systems by Professor Niels Peek
– Reverse engineering human decision making from high resolution analysis of behaviour by Dr A. Aldo Faisal
– Mining for Causal Results by Professor Paul Cohen
IDA this year was a great success!
Special thanks go to David Weston (General Chair), Niall Adams and Allan Tucker (Program Chairs), and Stephen Swift (Local Chair) and his team for the local arrangements.
Next year it will be held in Den Bosch, the Netherlands.
Updated on 3rd November 2017.
Nicky Nicolson attended the Biodiversity Information Standards annual conference (https://tdwg.github.io/conferences/2017/) in Ottawa, Canada. Biodiversity Information Standards is an association that develops standards to represent and interchange biodiversity information. The conference includes both practical work on standards development and documentation, and traditional conference presentations on research enabled by the provision of standardised data from heterogeneous sources. Recent conferences have shown greater take-up of intelligent data analysis techniques. Nicky co-authored a presentation on the development of large-scale analytics infrastructures as part of a collaboration with the Advanced Computing and Information Systems Laboratory at the University of Florida in the iDigBio (https://www.idigbio.org/) project.
Conference abstracts are available here: https://biss.pensoft.net/collection/25/
Updated on 18th October 2017.