Seminars – IDA Research Brunel

Upcoming Seminars

Part of the “Exploiting Simulation, AI & Knowledge to Improve Healthcare” Series (online):

5th June 2024 @ 3pm online –

STARS: Sharing Tools and Artefacts for Reproducible Simulation – Dr Alison Harper from the University of Exeter Business School

Abstract: STARS aims to improve the quality and quantity of shared discrete-event simulation models, tools, and other research artefacts in healthcare through the development of a framework for sharing computer models in health service design and health economics. We present an example of an early application of STARS for orthopaedic capacity planning using discrete-event simulation, with a focus on reusability. The allocation of NHS capital funds to increase capacity for managing elective waits has created planning and operational challenges for health services. An interactive web-based discrete-event simulation model was developed to support capacity planning of surgical activity and ward stay in a proposed new ring-fenced elective orthopaedic facility in Bristol. We will outline recent progress in STARS, and our plans to progress the work over the next two years to support reproducibility of healthcare simulation models. We welcome your input with regard to health economics models, reusability, and reproducibility challenges.

Bio: Alison Harper is a lecturer in operations and analytics at the Centre for Simulation, Analytics and Modelling, University of Exeter. Her research interests include applied health and social care modelling and simulation, real-time simulation, and open and reusable modelling in healthcare.

Validating, Monitoring and Interacting with model decisions – Dr Isabel Sasson from the Department of Computer Science, Brunel University London.

Abstract: As Machine Learning (AI) models underpin many of the decision tools used in day-to-day life, there is a need for such models to be validated and monitored over time. In some contexts there may also be added value in model decisions being not just explainable but open to discussion, through a facility for having a dialogue with the model decision or recommendation.

In this talk I will cover two projects. The first focuses on the application of computational argumentation to support dialogue with a model decision or recommendation, and I will explain a use case where a model provides a recommendation for medication. The second focuses on model validation and monitoring in the context of clinical quality outcomes benchmarking, specifically when volumes of data and validation cases are small. I will describe the work done on validating risk-adjustment models used in Quality and Outcomes in Oral and Maxillofacial Surgery (QOMS) and consider whether simulation can enhance this process.

Previous Seminars

28th February 2024 @ 3:30pm online – Synthetic Data & Simulation Uncertainty – Ana-Maria Cretu (EPFL) / Derek Groen (Brunel)

First Talk: Synthetic data and membership inference attacks

Ana-Maria Cretu (EPFL)

Abstract: Synthetic data is seen as a very promising solution to share individual-level data while limiting privacy risks. The promise of synthetic data lies in the fact that it is generated by sampling new values from a statistical model. Since the generated records are artificial, direct reidentification of individuals by singling out their record in the dataset is not possible. Synthetic data, if truly privacy-preserving, can be shared and used freely as it would no longer fall under the scope of data protection regulations such as the EU’s GDPR. Researchers have however shown that synthetic data is not automatically privacy-preserving. This is because the statistical models used to generate synthetic data, so-called generative models, are fitted on real data in order to approximate the true data distribution, and models can leak information about their training dataset.

In this talk, I will focus on membership inference attacks (MIA) which are the standard tool to evaluate privacy leakage in data releases, including machine learning models trained on sensitive datasets and, more recently, synthetic datasets. MIAs aim to infer whether a particular sample was part of the private data used to train the generative model. I will describe the challenges of MIAs and dive deeper into two of my recent works on the topic. First, I will describe a method to identify vulnerable records of the private dataset on which the generative model is trained, using MIA risk as a measure of vulnerability. Second, I will describe a new MIA which removes an assumption commonly made in previous works about the adversary’s background knowledge. More specifically, this MIA can be performed using only synthetic data to learn a distinguishing boundary between releases trained with or without a particular record of interest.

Bio: Ana-Maria Cretu is a postdoctoral researcher in the SPRING Lab at EPFL in Switzerland, where she works on privacy and security. She is a recipient of the CYD Distinguished Postdoctoral Fellowship of the Swiss Cyber-Defense Campus. She completed her PhD in 2023 at Imperial College London, where she was supervised by Dr. Yves-Alexandre de Montjoye. In her thesis, she studied privacy and security vulnerabilities in modern data processing systems, including machine learning models, query-based systems, and synthetic data, developing new methods for automated auditing of such systems. Through a rigorous study of privacy vulnerabilities, her research aims to inform the design of principled countermeasures that prevent them and, ultimately, allow data to be used safely. Ana-Maria holds an MSc in Computer Science from EPFL, Switzerland and a BSc and MSc from Ecole Polytechnique, France. She was a visiting researcher at the University of Oxford, where she worked on deep learning techniques for natural language processing. She did two internships at Google (2016 and 2017), one at Twitter (2020) and one at Microsoft (2022).

Second Talk: Dante’s Uncertainty: A story of AIngels and dAImons in COVID modelling

Derek Groen (Brunel)

Abstract: My goal during the pandemic was to review epidemiological COVID models, but as the need arose I ended up both reviewing and creating them. Navigating the Nine* Layers of COVID uncertainty, I discovered where my lack of AI expertise led to serious development issues, where the AIngels shone brightest, and where the efforts of the “dAImons” ruined the party for more serious modellers. When it comes to COVID forecasting, all of the models are wrong and often less than useful, but nevertheless as a community we ended up with a much better understanding of pandemics and pandemic preparation than before. In my talk I will try to explain why we actually did end up being more able to do forecasts, whilst also showing a fairly sobering perspective on the enormous uncertainty that surrounds epidemic modelling forecasts (and some AI and other tools to address that uncertainty).

*open to interpretation: two layers or forty layers could also be justifiable

Bio: Derek Groen is a Reader in Computer Science at Brunel University London, and a Visiting Lecturer at University College London. He has a PhD from the University of Amsterdam (2010) in Computational Astrophysics, and was a Post-Doctoral Researcher at UCL for five years prior to joining Brunel as a Lecturer. Derek has a strong interest in high performance simulations, multiscale modelling and simulation, and so-called VVUQ (verification, validation and uncertainty quantification). In terms of applications he is a lead developer on the Flee migration modelling code and the Flu And Coronavirus Simulator (FACS) COVID-19 model. He has also previously worked on applications in astrophysics, materials and blood flow. Derek has been PI for Brunel in two large and recent Horizon 2020 research projects (VECMA on uncertainty quantification, and HiDALGO on global challenge simulations) and he is currently the technical manager of the UK-funded SEAVEA project, which develops a VVUQ toolkit for large-scale computing applications (seavea-project.org). His most recent publication (at time of writing) is a software paper about the FabSim3 research automation toolkit, which was selected as a Feature Paper for Computer Physics Communications.

7th February 2024 @ 3pm in WBB207/208 – On the use of protein contact networks (PCNs) in biomedical health applications.

Pierangelo Veltri (University of Catanzaro, Italy)


Abstract – We report on protein interaction networks and their applications in different contexts, such as studying the relationship between protein structure and function and predicting biological mechanisms. We use PCNs and interaction models to detect pathways relating genes and comorbidities, and to study how gene expression changes with age and sex in dysmetabolic patients. Using the PCN framework, we detected interesting pathways related to type 2 diabetes whose genes change their expression with age. We also found many pathways related to insulin regulation and brain activities, which can be used to develop specific therapies [1]. PCNs have also found application in our studies of the SARS-CoV-2 Spike protein, where we scrutinize its topological properties and their association with molecular structure. We reported on the effect of sequence mutations on protein structure and function, particularly within the context of various SARS-CoV-2 strains [2]. Further, the study highlights the need for innovative models and algorithms to establish a connection between sequence evolution and functional alterations, especially for the SARS-CoV-2 Spike protein. An in-depth topological analysis has been conducted to investigate the interplay between protein stability, functionality, and sequence mutations, offering insights into the structural evolution of virus variants, including the XBB variant of SARS-CoV-2 [3].

[1] Guzzi, Pietro Hiram, et al. “Analysis of age-dependent gene-expression in human tissues for studying diabetes comorbidities.” Scientific Reports 13.1 (2023): 10372.

[2] Guzzi, Pietro Hiram, et al. “Computational analysis of the sequence-structure relation in SARS-CoV-2 spike protein using protein contact networks.” Scientific Reports 13.1 (2023): 2837.

[3] Guzzi, Pietro Hiram, et al. “Computational analysis of the sequence-structure relation in SARS-CoV-2 spike protein using protein contact networks.” Scientific Reports 13.1 (2023): 2837.

31st January 2024 @ 12 noon in WBB207/208 and online –

Artificial Intelligence and copyright in the creative industries

Hayleigh Bosher

Bio: Dr Hayleigh Bosher is a Reader in Intellectual Property Law and Associate Dean (Professional Development and Graduate Outcomes) at Brunel University London, where she also runs the Brunel Law School IP Pro Bono Service and is a member of the Brunel Centre for Artificial Intelligence: Social and Digital Innovation. Hayleigh’s research focuses on copyright and related laws and policy in the creative industries, particularly in the context of music, social media, and artificial intelligence. She is the author of Copyright in the Music Industry and the producer and host of the podcast Whose Song is it Anyway? She is well recognised in the field of intellectual property law, especially copyright law and the creative industries, and has attained an international reputation in music copyright. Her work in this area has been cited extensively in academic, practitioner and policy outputs and she is regularly interviewed by national and international media outlets, including the BBC, ITV, Sky News, Channel 5 News, The Guardian, The Times and The Wall Street Journal. Hayleigh has been invited to give evidence in Parliament on several occasions, including to the Science, Innovation and Technology Select Committee for its Inquiry on the Governance of Artificial Intelligence and to the House of Lords Communications and Digital Committee Inquiry on Large Language Models and IP.

Related links:

https://www.brunel.ac.uk/people/hayleigh-bosher

https://twitter.com/BosherHayleigh

https://www.linkedin.com/in/hayleigh-bosher/

https://podcasters.spotify.com/pod/show/whosesongisitanyway

Abstract: In this talk, Hayleigh will discuss the copyright implications of AI development, addressing three key questions:

Do you need permission to train AI on copyright-protected data?
Should AI-generated content be protected by copyright?
Can AI-generated content infringe copyright holders’ rights?

In doing so, she will explain what the current law says – or doesn’t say – about these questions, what the current policy landscape is with regard to changes in the law, and what AI developers are doing to mitigate risks in the meantime.

17th January 2024 @ 3pm in WBB207/208 –

Life course agent-based simulation for covid-19 considering comorbidities

Katie Mintram (Brunel)

Addressing Bias in Primary Healthcare Data and Reshaping Clinical Trials with High-Fidelity Synthetic Data

Barbara Draghi (MHRA)

29th November 2023 @ 4pm in WBB207/208

Introduction to Seminar Series: Exploiting Simulation, AI & Knowledge to Improve Healthcare

Allan Tucker and Anastasia Anagnostou, who lead the Intelligent Data Analysis and Modelling and Simulation groups respectively, will introduce this new seminar series, which aims to explore the synergies between simulation modelling and machine learning on health data. Key objectives of the seminar series will include:

To explore how simulations and synthetic data can augment machine learning to improve performance and explainability and to reduce bias in healthcare models
To investigate the privacy of computational methods including how synthetic data, simulations and federated learning can protect against privacy attacks
To survey how important the trust of AI-system users is in critical decision-making

10th May 2023 @ 3pm in WBB207/208 – 3 Talks:

The Impact of Bias on Drift Detection in AI Health Software, Asal Khoshravan Azar

Despite the potential of AI in healthcare decision-making, there are also risks to the public for different reasons. Bias is one risk: any data unfairness present in the training set, such as the under-representation of certain minority groups, will be reflected by the model resulting in inaccurate predictions. Data drift is another concern: models trained on obsolete data will perform poorly on newly available data. Approaches to analysing bias and data drift independently are already available in the literature, allowing researchers to develop inclusive models or models that are up-to-date. However, the two issues can interact with each other. For instance, drifts within under-represented subgroups might be masked when assessing a model on the whole population. To ensure the deployment of a trustworthy model, we propose that it is crucial to evaluate its performance both on the overall population and across under-represented cohorts. In this talk, we explore a methodology to investigate the presence of drift that may only be evident in sub-populations in two protected attributes, i.e., ethnicity and gender. We use the BayesBoost technique to capture under-represented individuals and to boost these cases by inferring cases from a Bayesian network. Lastly, we evaluate the capability of this technique to handle some cases of drift detection across different sub-populations.
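The masking effect described in this abstract can be illustrated with a small worked example. The sketch below is not the BayesBoost technique from the talk; it only compares accuracy between two batches, overall and per subgroup, using made-up counts. The group names, batch sizes and the 0.05 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def make_batch(spec):
    """Build (group, y_true, y_pred) records from {group: (n_total, n_correct)}."""
    records = []
    for group, (n_total, n_correct) in spec.items():
        records += [(group, 1, 1)] * n_correct              # correct predictions
        records += [(group, 1, 0)] * (n_total - n_correct)  # wrong predictions
    return records


def accuracy_by_group(records):
    """Return (overall accuracy, {group: accuracy}) for a batch of records."""
    totals, correct = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


def drifted_groups(before, after, threshold):
    """Groups whose accuracy dropped by more than the threshold."""
    return sorted(g for g in before if g in after and before[g] - after[g] > threshold)


THRESHOLD = 0.05  # hypothetical tolerance for an accuracy drop

# Illustrative counts only: the minority group degrades, the majority does not.
old_batch = make_batch({"majority": (950, 855), "minority": (50, 45)})
new_batch = make_batch({"majority": (950, 855), "minority": (50, 30)})

overall_old, groups_old = accuracy_by_group(old_batch)
overall_new, groups_new = accuracy_by_group(new_batch)

overall_flagged = (overall_old - overall_new) > THRESHOLD
flagged = drifted_groups(groups_old, groups_new, THRESHOLD)

print(f"overall accuracy: {overall_old:.3f} -> {overall_new:.3f}, flagged={overall_flagged}")
print(f"subgroups flagged: {flagged}")
```

With these counts the overall accuracy moves only from 0.900 to 0.885, which stays under the threshold, while the minority subgroup falls from 0.90 to 0.60 and is flagged. This is why a check on the whole population alone can miss subgroup drift.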

Privacy Assessment of Synthetic Patient Data, Ferdoos Hossein Nezhad

In this talk, we quantify the privacy gain of synthetic patient data drawn from two generative models, MST and PrivBayes, which is based on real anonymized primary care patient data. This evaluation is implemented for two types of inference attacks, namely membership and attribute inference attacks using a new toolbox, TAPAS. The aim is to quantitatively evaluate the privacy gain of each attack where these two differentially private generators and different threat models are used with a focus on black-box knowledge. The evaluation that was carried out in this paper demonstrates that vulnerabilities of synthetic patient data depend on the different attack scenarios, threat models, and algorithms used to generate the synthetic patient data. It was shown empirically that although the synthetic patient data achieved high privacy gain in most attack scenarios, it does not behave uniformly against adversarial attacks, and some records and outliers remain vulnerable depending on the attack scenario. Moreover, it was shown that the PrivBayes generator is the more robust generator in comparison to MST in terms of the privacy-preservation of synthetic data.

Creating Synthetic Geospatial Patient Data to Mimic Real Data Whilst Preserving Privacy, Dima Alattal

Synthetic Individual-Level Geospatial Data (SILGSD) offers a number of advantages in Spatial Epidemiology when compared to census data or surveys conducted on regional or global levels. The use of SILGSD will bring a new dimension to the study of the patterns and causes of diseases in a particular location while minimizing the risk of patient identity disclosure, especially for rare conditions. Additionally, SILGSD will help in building and monitoring more stable machine learning models in local areas, improving the quality and effectiveness of healthcare services. Finally, SILGSD will be highly effective in controlling the spread and causes of diseases by studying disease movement across areas through the commuting patterns of those affected. To our knowledge, no real or synthetic health records data containing geographic locations for patients has been published for research purposes so far. Therefore, in this talk we provide SILGSD by allocating synthetic patients to general practices (healthcare providers) in the UK using the prevalence of health conditions in each practice. The assigned general practice locations are used as physical geolocations for the patients because in reality patients are registered in the practice nearest to their homes. To generate high-fidelity data we allocate synthetic primary care patients from the Clinical Practice Research Datalink (CPRD), instead of real patients, to England’s general practices (GPs), using the publicly available GP health conditions statistics from the Quality and Outcomes Framework (QOF) without using more precise data. Further, the allocation relies on the similarities between patients in different locations without using the real location of the patients. We demonstrate that the Allocation Data is able to accurately mimic the real health conditions distribution in the general practices and also preserves the underlying distribution of the original primary care patient data from CPRD (Gold Standard).

Pavithra Rajendran – Information Extraction from Genomic Reports

Bio: Dr. Pavithra Rajendran currently works at the DRIVE unit within Great Ormond Street Hospital NHS Foundation Trust as the NLP Technical Lead. Previously, she worked at KPMG UK as an NLP Data Scientist, using both traditional and deep learning-based NLP techniques for various client projects in both public and private sectors, from Proof-of-Concept to Production (Healthcare, Oil and Gas, Travel, Finance etc.). She received her PhD degree in Computer Science from the University of Liverpool and her research interests include Natural Language Processing, applications of NLP within the healthcare domain, and Argument Mining.

Abstract: Genomic testing has the potential to deliver precision medicine by providing a greater understanding of diagnoses and treatments that can benefit patients. Often, genomic test reports are written by clinical scientists and stored as unstructured data in the form of PDFs, which makes secondary usage (e.g. research) and clinical decision making a challenge. In this talk, I will explain the end-to-end pipeline developed for extracting relevant information from genomic reports, with a brief overview of the NLP techniques used.

Dr. Tahmina Zebin is a Senior Lecturer in Computer Science. Prior to this post, she was a Lecturer in the School of Computing Sciences at the University of East Anglia and has led an On-device and Explainable AI Research Group. She completed her PhD studies in 2017, and an MSc in Digital Image and Signal Processing in 2012, at the University of Manchester. Following her PhD, Tahmina was employed as a postdoctoral research associate on the EPSRC-funded project Wearable Clinic: Self, Help and Care at the University of Manchester and was a Research Fellow in Health Innovation Ecosystem at the University of Westminster. Her research expertise includes Advanced Video and Signal Processing, Explainable and Inclusive AI, Human Activity Recognition, and Risk Prediction Modelling from Longitudinal Electronic Health Records using various statistical, machine learning, and deep learning techniques.

Matloob Khushi – Seeing is believing: pattern recognition in bioinformatics, imaging, text & finance

In this talk, Matloob will summarise his major research themes: i) DNA binding proteins called transcription factors (TFs) regulate various cell functions and play a key role in the development and progression of genetic diseases such as cancer. His work in bioinformatics and microscopy imaging identified DNA locations & TFs that are involved in various diseases including breast cancer. His bioinformatics expertise has won him £355K from NERC, UKRI this year. ii) Social media platforms such as Twitter and Reddit have become valuable sources of information for public health surveillance applications. Matloob will summarise some of his recent work in natural language processing. iii) Financial markets are very dynamic and (over)react to all types of economic news, making it difficult to predict the prices of financial instruments. Matloob will share some of his algorithms for dealing with the issue.

How to Deal with Privacy, Bias & Drift in AI Models of National Healthcare Data

Allan Tucker, Ylenia Rotalinti, Barbara Draghi, Awad Alyousef

Abstract: Primary healthcare data offers huge value in modelling disease and illness. However, this data holds extremely private information about individuals, and privacy concerns continue to limit the widespread use of such data, both by public research institutions and by the private health-tech sector. One possible solution is the use of synthetic data, which mimics the underlying correlational structure and distributions of real data but avoids many of the privacy concerns. Brunel University London has been working in a long-term collaboration with the Medicines and Healthcare products Regulatory Agency (MHRA) in the UK to construct a high-fidelity synthetic data generator using probabilistic models with complex underlying latent variable structures. This work has led to multiple releases of synthetic data on a number of diseases including covid and cardiovascular disease, which are available for state-of-the-art AI research. Two major issues have arisen from our synthetic data work: bias, even when working with comprehensive national data, and concept drift, where subsequent batches of data move away from current models, along with the impact this may have on regulation.

In this talk the Synthetic Data Team within the IDA group will discuss some of the key results of the collaboration: our experiences of synthetic data generation, the detection of bias and how to better represent the true underlying UK population, and how to handle concept drift when building models of healthcare data that evolves over time.

dunXai: DO-U-Net for Explainable (Multi-Label) Image Classification Applications to Biomedical Images.

Abstract. Artificial Intelligence (AI) and Machine Learning (ML) are becoming some of the most dominant tools in scientific research. Despite this, little is often understood about the complex decisions taken by the models in predicting their results. This disproportionately affects biomedical and healthcare research where explainability of AI is one of the requirements for its wide adoption. To help answer the question of what the network is looking at when the labels do not correspond to the presence of objects in the image but the context in which they are found, we propose a novel framework for Explainable AI that combines and simultaneously analyses Class Activation and Segmentation Maps for thousands of images. We apply our approach to two distinct, complex examples of real-world biomedical research, and demonstrate how it can be used to provide a global and concise numerical measurement of how distinct classes of objects affect the final classification. We also show how this can be used to inform model selection, architecture design and aid traditional domain researchers in interpreting the model results.

Two talks on Machine Learning in Industry:

Dr Zhihuo Wang: Human-decision control in the mining industry through distributional-value-based deep reinforcement learning

Abstract: Algorithms are increasingly applied to complex real-world situations and problems. Among them, reinforcement learning (RL) provides a mechanism for interacting fully with an environment and has achieved useful exploration over high-dimensional inputs in a variety of domains. Historically, RL algorithms have developed from model-based to model-free, and from simple to complex environments. However, the distributional nature of the big data used for prediction or optimisation, which determines the reasoning and informing process of a model, has been largely ignored in most RL research. The prediction emerging from big data and experience interaction should be a distribution of values instead of a single value. Our research has adopted deep reinforcement learning to solve the decision-making control problem in the mining industry, based on research into human decision-making experience and RL agent exploration; furthermore, the distributional perspective is used to optimise the decision-making control.

Biography: Zhihuo Wang has been a Research Fellow with the Department of Computer Science, Brunel University, since December 2020. He currently works on the EU Horizon 2020 DIG_IT project, focusing on big data analysis for the sustainable digital mine of the future. He received his PhD degree in Mechanical Engineering from the University of Hull in 2018, with research experience covering advanced control theory and engineering application in induction motor systems. He then joined Cranfield University as a research fellow in battery state estimation and worked on one EU H2020 project (ALISE project) and two Innovate UK projects (LIS:FAB project and ICP project), covering research on battery state (SOC, useful life cycles, thermal variation) estimation/control strategies and real-time deployment.

Dr Yiming Wang: A Transfer Learning-based Method for Defect Detection in Additive Manufacturing

Abstract: Transfer learning is one of the key techniques of deep learning. Modern deep learning methods require a large-scale dataset for training. Transfer learning methods take a deep model pre-trained on a large-scale dataset and transfer its knowledge to a small-scale target dataset for a customized task. This is the fastest and easiest way for users to build their own deep models. However, transfer learning requires that the customized dataset be similar to the original pretraining dataset. Furthermore, the architecture of the pre-trained model should be exactly the same as that of the model to be trained on the target dataset. Therefore, it is difficult to apply transfer learning methods to a specifically designed deep learning model whose architecture is well fitted to the target dataset but very different from the pre-trained model. In this talk, we will present how to transfer knowledge between two models with different architectures. We will 1) introduce the background of transfer learning, 2) provide a brief introduction to the real-world application of defect detection in additive manufacturing, 3) present a depth-connected method using a novel, specifically designed deep learning architecture, as well as a corresponding region proposal method, for defect detection, and 4) show how to transfer knowledge from a model pre-trained for a common computer vision task to the target model for the defect detection task.

Biography: Dr Yiming Wang is a Research Fellow in the Computer Science Department at Brunel University London. He currently works on the EU Horizon 2020 Integradde project, which focuses on developing novel computer vision methods for assisted quality assessment of additive manufacturing. He completed his Doctor of Philosophy (PhD) in the field of computer vision at the University of Portsmouth, UK, in 2018. His research interests include machine learning, computer vision and human machine interaction.

17th November @ 4pm – Two Post Docs Talk About Many Objective Optimisation:

Yani Xue – Many-objective optimization and its application in forced migration

Abstract: Many-objective optimization is core to both artificial intelligence and data analytics as real-world problems commonly involve multiple objectives which are required to be optimized simultaneously. A large number of evolutionary algorithms have been developed to search for a set of Pareto optimal solutions for many-objective optimization problems. It is very rare that a many-objective evolutionary algorithm performs well in terms of both effectiveness and efficiency, two key evaluation criteria. Some algorithms may struggle to guide the solutions towards the Pareto front, e.g., Pareto-based algorithms, while other algorithms may have difficulty in diversifying the solutions evenly over the front on certain problems, e.g., decomposition-based algorithms. Furthermore, some effective algorithms may become very computationally expensive as the number of objectives increases, e.g., indicator-based algorithms.

In this talk, we will investigate how to make evolutionary algorithms perform well in terms of effectiveness and efficiency in many-objective optimization. We will show 1) how to improve the effectiveness of conventional Pareto-based algorithms, 2) how to further enhance the effectiveness of leading many-objective evolutionary algorithms in general, 3) how to strike a balance between effectiveness and efficiency of evolutionary algorithms when solving many-objective optimization problems, and 4) how to apply evolutionary algorithms to a real-world case.
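As background to the abstract's notion of Pareto optimal solutions, the sketch below shows Pareto dominance and a brute-force non-dominated filter. It is not one of the evolutionary algorithms discussed in the talk. It assumes every objective is to be minimised, and the sample points are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Assumes all objectives are minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative two-objective points, e.g. (cost, time).
candidates = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(candidates)
print(front)  # [(1, 5), (2, 3), (4, 1)]
```

Here (3, 4) is dominated by (2, 3) and (5, 5) is dominated by (1, 5), so neither is on the front. With many objectives a large share of candidates tends to become mutually non-dominated, which is one reason Pareto-based selection alone struggles to guide the search.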

Biography: Dr Yani Xue is currently a research fellow in multi-objective optimization at Brunel University London, UK. She received the Ph.D. degree in computer science from Brunel University London, UK, in 2021. Her main research interests include evolutionary computation, multi-objective optimization, search-based software engineering, and engineering applications.

Futra Fadzil – Phase-wise, Constrained, Many Objective Optimization Problem in Mining Industry

Abstract: The digital mine process exhibits three distinct features, that is: (1) phase-wise evolution leading to dynamic changes of objective functions as the phase changes, (2) physical/resource constraints leading to feasibility challenges to the optimization problems, and (3) many-objective characteristics, including the design process, energy gains, platform, operational profile, mine management and finally the life-cycle costs. Traditionally, the employed optimization techniques include a constrained programming approach, ant colony optimization, fuzzy logic, evolutionary algorithms, and combinatorial techniques. However, all existing results have been limited to certain particular parts of the life-cycle of the digital mine process, and there has been very little effort devoted to the optimization on the full life-cycle of the mining process consisting of several phases over time that includes energy, waste, emissions, ventilation, routes, and cooling. The proposed concept of phase-wise, constrained, many objective optimization (PWCMOO) smart scheduling tool stems from mine production practice, is completely new and opens a new branch of research for both computational intelligence and engineering design communities, which demands novel approaches if we are to advance significantly beyond state of the art. Working across the system modelling, optimization and evolutionary computation, we are set to develop: 1) a novel algorithm for phase-wise optimization problem (POP) that caters for the switching phenomena between the phases; 2) a novel computational framework for phase-based evolutionary computation that balances between convergence and diversity among different phases.

In this talk, we formulate three optimization problems in mining activities: the open pit stability problem, the truck movement scheduling problem, and the water discharge monitoring problem. Each of these phases has its own objectives and constraints. Later, we integrate them as an optimal scheduling problem in the intelligence layer of a human-centred Internet of Things platform for the sustainable digital mine of the future.

Biography: Futra Fadzil is a Research Fellow at the Computer Science Department. He currently works on the Horizon 2020 DIG IT project, which focuses on many objective optimization and smart scheduling for the sustainable digital mine of the future. He received his PhD degree in Electrical Engineering and Electronic from Brunel University London in 2020. Over the years he has gained experience in the power industry and participated in numerous research projects in the following areas: electrical & instrumentation; operation and maintenance; project management; industrial data acquisition; real-time data analytics; system modelling; system optimization; machine learning; and the industrial internet of things (IIoT).

20th October @ 4pm – Two Doctoral Researcher Talks on Novel Deep Learning Architectures and Applications:

Ben Evans – Species Identification with Friends: Camera traps have become an increasingly popular method for ecologists to survey species occupancies and behaviour dynamics without the need for physical capture. The abundance of the technology has led to increasing dataset sizes, which in turn require further human support in labelling each of the images taken. We present a semi-automatic method of identifying species quickly, based on prior labelled datasets held within an organisation but also querying partner organisations’ databases without having to share the picture itself. This minimises risk where protected species and protected metadata, such as location stored within the image, cannot be shared. Further, we identify a method for quickly distinguishing species, with the aim of running on commodity hardware in the field.

Biraja Ghoshal – When the Machine Does Not Know: Measuring Uncertainty in Deep Learning Models: Deep learning has recently been achieving state-of-the-art performance in almost every field of science on a variety of pattern-recognition tasks, most notably visual classification problems such as medical image analysis. However, in spite of these successes, these methods focus exclusively on improving the accuracy of point predictions without delivering certainty estimates, or suffer from over- or under-confidence, i.e. are badly calibrated. So, there is a need to express the ambiguity of an image and unreliable predictions in the same way as a clinician may express uncertainty and ask for experts’ help. Different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in deep learning have been proposed. In this talk I will give an overview of uncertainty estimation in neural networks and recent advances in the field (a new theoretical framework casting dropweights training in neural networks as approximate Bayesian inference, different measures of uncertainty, and approaches for the calibration of neural networks), highlight current challenges, and identify potential research opportunities.
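
One common way to put a number on the kind of uncertainty described above is to run several stochastic forward passes (for example with dropout or dropweights left active at test time) and take the entropy of the averaged class probabilities. The sketch below assumes the per-pass softmax outputs are already available; the function name and the example numbers are illustrative and are not taken from the talk.

```python
import math

def predictive_entropy(passes):
    """Entropy (in nats) of the mean class distribution over stochastic passes.

    passes: list of per-pass probability vectors, each summing to 1.
    """
    n_classes = len(passes[0])
    # Average the class probabilities over all passes.
    mean = [sum(p[c] for p in passes) / len(passes) for c in range(n_classes)]
    # Shannon entropy of the averaged distribution; skip zero terms.
    return -sum(q * math.log(q) for q in mean if q > 0.0)

# Hypothetical softmax outputs from three stochastic passes on one image.
confident = [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]]
ambiguous = [[0.90, 0.10], [0.20, 0.80], [0.55, 0.45]]

print(predictive_entropy(confident) < predictive_entropy(ambiguous))
```

A prediction whose entropy approaches log(number of classes) is one that could be flagged for expert review rather than reported as a confident point prediction.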

23rd June 2021 @ 2pm – Dr James Westland Cain, Grass Valley

DeepFakes – What are they and why should I care?

Abstract: This talk will introduce DeepFakes – videos that have been created by AI, and shared on the Internet as if they were recordings of real events. It will describe their history and the technical process of creating a DeepFake. It will then explore the ways some of the Internet companies are trying to distinguish DeepFakes from real Content. The presenter will argue that all detection approaches are doomed to failure. As our very democracy is being eroded by lack of trust and Fake News, the talk will conclude with a discussion of what the media industry is doing about establishing truth and veracity in the age of Alternate Facts.

Bio: Dr James Westland Cain is Chief Software Architect at Grass Valley, where he develops innovative software to support collaborative workflows in News and Sports Television Production. Responsible for software architecture across the whole of Grass Valley, James is leading the transition to developing microservice based applications. His research interests include file systems innovation and browser based video production. He was granted a PhD in Advanced Software Engineering by Reading University and is a Visiting Research Fellow at Brunel University. He has nearly twenty international patents granted, has published dozens of refereed academic papers and regularly speaks at conferences.

9th June Panagiotis Papapetrou, Stockholm University, “Interpretable feature learning and classification: from time series counterfactuals to temporal abstractions in medical records”

Abstract: The first part of the talk will tackle the issue of interpretability and explainability of opaque machine learning models, with focus on time series classification. Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. This talk will formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, the objective is to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. Moreover, it will be shown that the problem is NP-hard. Two instantiations of the problem will be presented. The classifier under investigation will be the random shapelet forest classifier. Moreover, two algorithmic solutions for the two problem instantiations will be presented along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. The second part of the talk will focus on temporal predictive models and methods for learning from sparse Electronic Health Records. The main application area is the detection of adverse drug events by exploiting temporal features and applying different levels of abstraction, without compromising predictive performance in terms of AUC.

Bio: Panos Papapetrou is Professor at the Department of Computer and Systems Sciences and the Data Science Group at Stockholm University, Sweden and Adjunct Professor at the Department of Computer Science at Aalto University, Finland. He is also a Board member of the Swedish Artificial Intelligence Society (SAIS). His research interests focus on algorithmic data mining on large and complex data. Specifically he is interested in the following topics: time series classification, interpretable and explainable machine learning, searching and mining large and complex sequences, learning from electronic health records, and reinforcement learning for healthcare.

12th May 2021 – Stefano Vacca, Alkemy S.p.A (Milan, Italy):

Listening to social media through Natural Language Processing and Geo Intelligence

Abstract: Social Data Intelligence is a particular form of data analysis that is focused on social media data. On the web and especially on social networks, vast amounts of different kinds of data are produced. People use Twitter, Instagram, TikTok, Fortnite, Twitch, and write down their impressions, opinions, and feelings about political events, social phenomena, and famous people. This seminar aims to show, through practical use cases, how it is possible to use innovative data mining techniques to extract hidden information from social media to measure a specific phenomenon to help management make decisions around specific trends (and discover new opportunities) or how to use these techniques for research purposes.

Biography: Stefano Vacca has been a Data Scientist at Alkemy S.p.A (Milan, Italy) since 2018. He holds a bachelor’s degree in Economics and Finance, with a thesis focused on Smart Cities’ European legislation. In 2020 he obtained a master’s degree in Data Science, Business Analytics and Innovation at the University of Cagliari (Italy), with a thesis entitled “Hawkes processes for the identification of premonitory events in the cryptocurrency market”. In 2019 he worked on a project for Enlightenment.ai company (Lisbon, Portugal) to construct computer vision algorithms to recognise digits starting from counter meter images. Stefano regularly gives seminars for both academia and industry and he has published several research papers in the field of data mining for cryptocurrencies.

24th Feb Pedro Pereira Rodrigues (CINTESIS & University of Porto), “Estimating long-term cancer-related survival from multiple prophylactic strategies: a temporal Bayesian network simulation”

Abstract: Estimating comparative effectiveness of multiple prophylactic strategies used in clinical practice to prevent cancer-related mortality in patients with specific gene mutations is not something we can cleanly design in clinical trial studies. Future routinely collected electronic health records might present new ways of estimating such comparative effectiveness from real-world data, but if the target population under study is too specific, collecting data from a large enough sample to enable comparison of multiple strategies might prove to be impossible. To empower clinical decisions, aiming to develop a personalized risk management guideline, we have constructed a temporal Bayesian network model to simulate the expected overall mortality in patients who underwent different prevention strategies, taking into account the patient’s prognostic parameters and received treatment, allowing the long-term survival comparison of 9 prophylactic strategies. Transition probabilities were derived from literature after a critical review of studies published in PubMed, where all risk estimates were converted into yearly estimates by means of conditional probabilities, depending on the original metric published in literature with needed conversions. For each simulated patient, the first temporal node to be activated was identified, with survival being therefore computed for each patient. Overall survival of patients from each subgroup x policy combination was then plotted as Kaplan-Meier curves. We illustrate our approach with a specific real-world problem in breast-cancer survival analysis, simulating 2.5M patients across 144 subgroup cohorts and 9 different policies, during a 40-year follow up. The illustrated example was a result of joint work with Jelena Maksimenko (Riga Stradins University, Latvia) and Maria Joao Cardoso (Champalimaud Foundation, Portugal).
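
The conversion of published risks into yearly estimates, followed by per-patient simulation, can be illustrated with a very small sketch. It assumes the simplest possible case: a single event type and the same conditional risk in every year. That is an assumption made here for illustration and not the conversion used in the talk, and the 20% ten-year risk is a made-up input.

```python
import random

def yearly_probability(cumulative_risk, years):
    """Yearly event probability implied by a cumulative risk over `years`,
    assuming the same conditional risk in every year."""
    return 1.0 - (1.0 - cumulative_risk) ** (1.0 / years)

def simulate_survival(p_year, follow_up, n_patients, seed=0):
    """Fraction of simulated patients still event-free after each year."""
    rng = random.Random(seed)
    alive = n_patients
    curve = []
    for _ in range(follow_up):
        # Each patient still at risk has an independent yearly event chance.
        events = sum(1 for _ in range(alive) if rng.random() < p_year)
        alive -= events
        curve.append(alive / n_patients)
    return curve

# Hypothetical input: a 20% cumulative risk reported over 10 years.
p = yearly_probability(0.20, 10)
curve = simulate_survival(p, follow_up=10, n_patients=10000)
print(p)          # yearly probability implied by the 10-year risk
print(curve[-1])  # simulated event-free fraction at year 10
```

With no censoring, the simulated curve is the same step function a Kaplan-Meier estimator would produce, which is why the year-10 value should land near the 80% implied by the input risk.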

Bio: Pedro Pereira Rodrigues holds a PhD (2010) in Computer Science from the Faculty of Sciences of the University of Porto, and is currently an assistant professor in the Department of Community Medicine, Information and Decision in Health at the Faculty of Medicine of the University of Porto (FMUP), where he has taught since 2008. He is the current director of the Doctoral Program in Health Data Science at FMUP, of which he was the main promoter, and coordinator of the thematic line on Data and Decision Sciences and Technology at Information (with more than 100 researchers, including 40 PhDs), from the Center for Technology and Health Services Research (CINTESIS), a research unit with more than 500 researchers, which includes the research group on Artificial Intelligence in Healthcare, of which he is an integrated member. Having participated in several national and international projects, he was the coordinator at CINTESIS of the NanoSTIMA project (financed by NORTE2020 with more than € 1.1m just for the research unit), leading the line of research dedicated to Data Analysis and Decision in Health. International research and collaboration has led him to publish more than 100 complete articles in indexed journals and conference proceedings and to carry out more than 100 scientific communications, as well as to coordinate the scientific review team of several international events in the fields of data science and medical informatics. He is frequently invited as a speaker in international health data science panels and was the rapporteur for the Digital Health and Medical Technologies subtopic of the Portuguese Health Research Agenda. Having reviewed more than 300 articles for major journals and conferences, he also regularly serves as an expert project reviewer for National Science Foundations.
He was the coordinator of more than 50 editions of course units in support systems for clinical decision, medical informatics, data science, biostatistics and research methodology, supervises 9 PhD students (in digital health and clinical and health services research, with 4 alumni already), and has participated in more than 50 master’s and doctoral juries. He was director of the postgraduate course in Health Informatics and has been a member of the scientific committee of the Master in Medical Informatics at FMUP since 2012, and of the Integrated Master in Medicine since 2019.

27th Jan @4pm: Ben Parker, “Intelligent Data Analysis Needs Intelligent Design: Examples from Designing Experiments on Networks”

Abstract: The world is full of data, and certainly intelligent methods to analyse it are vitally needed. However, we need to remember some hard-learned principles from statistics, and consider how the data can be collected to allow us to do more efficient experiments.

This talk will explain how vital a good experimental design is, and consider some applications where the statistical principles of design of experiments (DOE) have been applied, and the benefits of using this methodology. We will talk particularly about some of our recent research involving experiments on networks, and about the vital importance of including the network structure in our model where it exists. We show that by not taking into account network structure, we can design experiments which have very low efficiency and/or produce biased results, and provide some guidelines for performing robust experiments on networked data.

Bio: Ben Parker (ben.parker@brunel.ac.uk) is a Senior Lecturer in Statistics at Brunel University London, within the Statistics and Data Analysis Group in the School of Mathematics. He has research interests in Design of Experiments, particularly optimal design; statistics of networks, specialising in data communications networks and social networks; statistical inference of queues; computer simulation; and computational statistics, particularly algorithms for design (https://www.brunel.ac.uk/people/ben-parker/research). He is involved with outreach for mathematics, and appears in the popular mathematics podcast “Maths at” (www.mathsat.co.uk).

14th Oct at 3pm: Lianghao Han, Biomechanically Informed Image Registration Framework for Organ Motion and Tissue Deformation Estimations in Medical Image Analysis and Image-Guided Intervention

Abstract: Organ motion and tissue deformation are two big challenges in medical image analysis and image-guided intervention. In this talk, I will introduce a biomechanically informed image registration framework for estimating organ motion and tissue deformation from multimodal medical images, in which biomechanical models are incorporated into image registration and simulation algorithms. I will also introduce several applications in breast cancer detection, lung cancer radiotherapy and prostate cancer biopsy.

Bio: Dr Lianghao Han is a Senior Research Fellow at the Department of Computer Science, Brunel University. He received his PhD degree from Cambridge University and has worked in the Medical Vision Lab at Oxford University and the Centre for Medical Image Computing at UCL. Before joining Brunel, he was a Professor at Tongji University (P.R. China).

Lianghao’s research interests are in Medical Image Analysis for Cancer Detection and Diagnosis (Lung, Breast, Liver and Prostate), Image-Guided Intervention, Biomechanics and Machine Learning.

30th Sept at 3pm: Lorraine Ayad, MARS: improving multiple circular sequence alignment using refined sequences

Abstract: A fundamental assumption of all widely-used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be totally arbitrary. This is relevant for instance, in the process of multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes, which have a circular molecular structure. A solution for these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program.
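
The idea of choosing a rotation (cyclic shift) for each circular sequence before alignment can be shown with a deliberately naive sketch. It is not the MARS algorithm: it simply tries every cyclic shift and keeps the one with the fewest mismatches against a reference of the same length. The function names and the example sequences are made up for illustration.

```python
def rotations(seq):
    """All cyclic shifts of seq, starting with the unshifted sequence."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def best_rotation(seq, reference):
    """Cyclic shift of seq with the fewest mismatches against reference.

    Assumes equal lengths; ties are resolved in favour of the smallest shift.
    """
    if len(seq) != len(reference):
        raise ValueError("sequences must have the same length")
    return min(rotations(seq), key=lambda r: hamming(r, reference))

reference = "ACGTTGCA"
# Hypothetical circular sequences whose start positions are arbitrary.
circular = ["TTGCAACG", "GCAACGTA"]
refined = [best_rotation(s, reference) for s in circular]
print(refined)
```

The refined sequences could then be passed to any standard multiple sequence alignment program. Practical tools need to handle sequences of different lengths and insertions or deletions, which this brute-force comparison does not.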

Bio: Dr Ayad completed her PhD at King’s College London in 2019 with a thesis titled “Efficient Sequence Comparison via Combining Alignment and Alignment-Free Techniques”. Her research interests lie within string algorithms for sequence analysis and information retrieval with applications in computational biology and image processing as well as bioinformatics.

She spent a year at King’s as Lecturer of Computer Science Education and then moved into industry working on next generation sequencing and helping to build pipelines for sequence alignments. She has recently been appointed as Associate Lecturer of Computer Science in Brunel.

She has published several research papers in journals such as Oxford Bioinformatics, BMC Genomics, Pattern Recognition Letters and Oxford Genome Biology and Evolution. She is also guest editor of a special issue of the journal of Theoretical Computer Science and is currently part of the program committee of the 27th International Symposium on String Processing and Information Retrieval.

3rd June at 3pm: Isabel Sassoon, Applications of Computational Argumentation in Data Driven Decision Support

Abstract: As automated reasoning models, core to decision support, continue to become part of day-to-day life, there is a requirement to make the decisions such models make explainable. This requirement can be even more challenging to achieve in the face of incomplete or conflicting data as inputs. In this talk I will briefly introduce Computational Argumentation, an AI technique that facilitates reasoning in such situations. I will describe some example applications from my research, explain the link to data science and discuss the potential role argumentation can play in supporting model explanations.

Bio: Isabel Sassoon is a Lecturer in Computer Science at Brunel University and visiting researcher at King’s College London. Before joining Brunel Isabel was Research Associate on the CONSULT (Collaborative Mobile Decision Support for Managing Multiple Morbidities) project. This project developed a collaborative mobile decision-support system to help patients suffering from chronic diseases to self-manage their treatment, by bringing together and reasoning with wellbeing sensor data, clinical guidelines and patient data. Prior to that Isabel was Teaching Fellow in the Department of Informatics in King’s College London, primarily on the Data Science MSc.

Isabel’s research interests are in data-driven automated reasoning, and its transparency and explainability. Her PhD research developed a computational argumentation based system to support the appropriate selection of a statistical model given a research objective and available data. Her current research continues to explore how computational argumentation can assist in model explainability and trust.

Prior to joining King’s College London Isabel worked for more than 10 years as a data science consultant in industry, including 8 years in SAS UK. Isabel read Statistics, Operations Research and Economics at Tel Aviv University and received her Ph.D. in Informatics from King’s College London.

15th April at 16:30 : We have 3 talks that will be given virtually at the IDA conference at the end of the month (https://ida2020.org/online-program/):

Toyah Overton: “DO-U-Net for Segmentation and Counting”

Biraja Ghoshal: “Estimating Uncertainty in Deep Learning for Reporting Confidence: An Application on Cell Type Prediction in Testes Based on Proteomics”

Yani Xue: “Angle-based Crowding Degree Estimation for Many-Objective Optimization”

Opening the Black Box (2018-2020)

Both the commercial and academic sectors are exploring the use of their state-of-the-art algorithms to make important decisions, for example in healthcare. These algorithms exploit a heterogeneous mix of on-body sensor data, clinical test results, socio-economic information, and digitised electronic health records. A major issue is that many of the algorithms on offer are often black box in nature (defined as a system which can be viewed in terms of its inputs and outputs without any knowledge of its internal workings). This is because the algorithms are often extremely complex with many parameters (such as deep learning) and also because the algorithms themselves are now valuable commodities. Not knowing the underlying mechanisms of these black box systems is a problem for two important reasons. Firstly, if the predictive models are not transparent and explainable, we lose the trust of experts such as healthcare practitioners. Secondly, without access to the knowledge of how an algorithm works we cannot truly understand the underlying meaning of the output. These problems need to be addressed if we are to gain new insights from data, for example into disease and health.

This seminar series will include talks from experts from statistics, computer science, psychology, and health / medicine in order to address this issue head-on. The seminar will focus on building a network of experts in state-of-the-art technologies that exploit the huge data resources available, while ensuring that these systems are explainable to domain experts. This will result in systems that not only generate new insights but are also more fully trusted.

19 March 2pm: Symbiosis Centre for Applied Artificial Intelligence

Prof. Ketan Kotecha: Head of Symbiosis Institute of Technology, Head of Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Pune University

Dr Rahee Walambe: Associate Professor and Faculty at Symbiosis Centre of Applied AI. She is also Faculty at Symbiosis Institute of Technology, Dept of ENTC.

Abstract: An exceptional scientific revolution in the field of AI has been instigated recently with the advent of deep learning and high-power graphical processing units. Exploiting such tremendous potential of AI as part of a digital transformation strategy will be a key component in creating the intelligent endeavours of the future. Artificial intelligence has seen an exponential rise in the past few years due to its applicability to various aspects of our daily lives. It has proved to play a significant role in helping enterprises re-imagine and re-design their products and services, increase their revenues, realize business efficiencies, and enrich customer experience. Moreover, AI has been able to find solutions to problems for which none were available until very recently. Some noteworthy examples are voice-based assistants such as SIRI and Alexa, autonomous vehicles such as Google Cars, autonomous drones for defense applications, and applications in medical and healthcare domains. With this backdrop, Symbiosis Centre for Applied Artificial Intelligence (SCAAI) was established in May 2019. The primary aim of this centre is to spearhead the advancement and novel developments in various application areas, and it will promote advanced research and education in the field of AI, attracting more corporates as well as students to Symbiosis. Further, it promotes interdisciplinary research to address pertinent issues that could span a huge spectrum, from finance and banking to healthcare.

The seminar will focus on the development and research projects currently underway at SCAAI. We will briefly discuss our collaborations with various organizations and the projects in identified thrust areas. Possible research funding opportunities between the UK and India will also be discussed.

22 January: Nadine Aburumman, Brunel University London.

Being with a Virtual Character: Nonverbal Communication in Virtual Reality

Nadine Aburumman is a lecturer in Computer Science at Brunel University London and a visiting researcher at the Institute of Cognitive Neuroscience in University College London. Her main academic interests are in the area of nonlinear deformable and particle-based simulation for time-critical applications like surgical simulation, physically based animation, virtual characters, industrial prototyping and design, VR/AR/XR, etc. Nadine obtained her PhD in Computer Science from Sapienza University of Rome in 2016 and worked as a postdoctoral research associate at the Institute of Multiscale and Simulation at Friedrich-Alexander-Universität Erlangen-Nürnberg (MSS, Germany), and Institut de Recherche en Informatique de Toulouse (IRIT, France).

Abstract: Real-time face-to-face social conversation involves complex coordination of nonverbal cues such as head movement (nodding), gaze (i.e. eye contact), facial expressions and gestures. In this research, we investigate how coordination in such subtle nonverbal communication can affect closeness and trust. There is increased research interest in how and why non-verbal social coordination occurs, and an increased need to generate realistic conversation behaviors in artificial characters for virtual reality. We focus our study on head nodding as a social signal, where the head-mounted display in the VR system allows us to detect the participants nodding during the interaction experiment. In our experiments, participants are represented as virtual humans, and engage in structured conversations with another programmed virtual character. The conversation consists of 4 trials, alternating turns between the participant and virtual character. The participants evaluate the virtual character’s personality and attractiveness, and rate their feelings of similarity, rapport and trust toward the virtual character. Our data reveals that synchrony at low frequency (0.2-1.1 Hz) nodding has positive coherence between the participant and the virtual character, compared to the situations of non-synchrony.

11 Dec 2019: Arianna Dagliati, Manchester University

Temporal phenotyping for precision medicine

Arianna is a Research Fellow in the Manchester Molecular Pathology Innovation Centre, and the Division of Informatics, Imaging & Data Sciences, University of Manchester. Her background is in Bioengineering and Bioinformatics, with broad experience in applying machine learning approaches to knowledge discovery, predictive modeling, temporal and process mining, and software engineering. At the Manchester Molecular Pathology Innovation Centre her research is dedicated to the discovery of novel biomarkers in autoimmune diseases, based on the integration of clinical and multi-omic data. She develops novel analysis pipelines for Precision Medicine and analytical approaches for identifying temporal patterns and electronic phenotypes in longitudinal clinical data, and for their exploitation in clinical decision support. Together with a multi-disciplinary team of researchers, developers and clinicians, her research is aimed at understanding how under-used health data can be re-purposed to improve health. Working on different research projects, she combines informatics technology with statistical models to answer scientific questions using data derived from EHR, cohort studies and public data resources. In the past she collaborated with the Harvard Medical School, Informatics for Integrating Biology and the Bedside team for its first implementation in oncologic care in Europe and with the Department of Biostatistics and Epidemiology at the University of Pennsylvania for the development of novel careflow mining approaches for enabling the recognition of temporal patterns and electronic phenotypes in longitudinal clinical data.

Abstract: A key trend in current medical research is a shift from one-size-fits-all to precision treatment strategies, where the focus is on identifying narrow subgroups of the population who would benefit from a given intervention. Precision medicine greatly benefits from algorithms and accessible tools that clinicians can use to identify such subgroups, and to generate novel inferences about the patient population they are treating. Complexity and variability of patients’ trajectories in response to treatment pose significant challenges in many medical fields, especially in those requiring long-term care. Longitudinal analytics methods, their exploitation in the context of clinical decisions, and their translation into clinical practice through accessible tools represent a potential for enabling precision healthcare.

The seminar will discuss challenges for Precision Medicine and approaches to exploit longitudinal data for subgroup discovery in Rheumatoid Arthritis and to identify trajectories representing different temporal phenotypes in Type 2 Diabetes.

11 Dec 2019: Lucia Sacchi, University of Pavia, Italy

Longitudinal data analytics for clinical decision support

Lucia Sacchi is Associate Professor at the Department of Electrical, Computer and Biomedical Engineering at the University of Pavia, Italy. She holds a Master’s Degree in Computer Engineering and a PhD in Bioengineering and Bioinformatics, both from the University of Pavia. She was a post-doctoral fellow at the University of Pavia, Senior Research Fellow at Brunel University London (UK), and Assistant Professor at the University of Pavia. Her research interests are related to data mining, with particular focus on temporal data, clinical decision support systems, process mining, and technologies for biomedical data analysis.

She is the Chair of the IMIA working group on Data Mining and Big Data Analytics, vice-chair of the board of the Artificial Intelligence in Medicine (AIME) Society, and member of the board of the Italian Society of Biomedical Informatics (SIBIM). She is part of the Editorial Board of BMC Medical Informatics and Decision Making, Artificial Intelligence in Medicine, and Journal of Biomedical Informatics (JBI), and she is Academic Editor for PLOS ONE. She has co-authored more than 90 peer-reviewed scientific publications in international journals and at international conferences.

Abstract: The increasing availability of time-dependent health-related data, both collected in Hospital Information Systems during clinical practice, and by patients who use wearable monitoring devices, offers some interesting research challenges. Among these, enriching clinical decision support systems with advanced tools for the analysis of longitudinal data is of paramount importance. Such tools can be useful to synthesise the patients’ conditions in between encounters to identify critical situations in advance, or to study temporal trajectories of chronic disease evolution to plan timely targeted interventions. This talk will introduce the problem of the analysis of temporal data coming from different sources, and will describe some methodologies that can be useful to analyse heterogeneous data. Moreover, it will present some examples on how such analysis has been integrated in real-world clinical decision support systems.

6th Nov 2019: In house speakers:

Gabriele Scali: Constraint Satisfaction Problems and Constraint Programming

Leila Yousefi: The Prevalence of Errors in Machine Learning Experiments

4th Oct 2019: Jaakko Hollmén, Stockholm University, Department of Computer and Systems Sciences, Sweden

Diagnostic prediction in neonatal intensive care units

Jaakko Hollmén is a faculty member at the Department of Computer and Systems Sciences at Stockholm University in Sweden (since September 2019). Prior to joining Stockholm University, he was a faculty member at the Department of Computer Science at Aalto University in Finland. His research interests include theory and practice of machine learning and data mining, in particular in the context of health, medicine and environmental sciences. He has been involved in the organization of many IDA conferences for the past ten years. He is also the secretary of the IDA council.

Abstract: Preterm infants, born before 37 weeks of gestation, are subject to many developmental issues and health problems. Very Low Birth Weight (VLBW) infants, with a birth weight under 1500 g, are the most afflicted in this group. These infants require treatment in the neonatal intensive care unit before they are mature enough for hospital discharge. The neonatal intensive care unit is a data-intensive environment, where multi-channel physiological data is gathered from patients using a number of sensors to construct a comprehensive picture of the patients’ vital signs. We have looked into the problem of how to predict neonatal in-hospital mortality and morbidities. We have used time series data collected from Very Low Birth Weight infants treated in the neonatal intensive care unit of Helsinki University Hospital between 1999 and 2013. Our results show that machine learning models based on time series data alone have predictive power comparable with standard medical scores, and combining the two results in improved predictive ability. We have also studied the effect of observer bias on recording vital sign measurements in the neonatal intensive care unit, and conducted a retrospective cohort study on trends in the growth of Extremely Low Birth Weight (birth weight under 1000 g) infants during intensive care.

May 15th: John Holmes, University of Pennsylvania

Explainable AI for the (Not-Always-Expert) Clinical Researcher

John H. Holmes, PhD, is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is the Associate Director of the Institute for Biomedical Informatics, Director of the Master’s Program in Biomedical Informatics, and Chair of the Doctoral Program in Epidemiology, all at Penn. Dr. Holmes has been recognized nationally and internationally for his work on developing and applying new approaches to mining epidemiologic surveillance data, as well as his efforts at furthering educational initiatives in clinical research. Dr. Holmes’ research interests are focused on the intersection of medical informatics and clinical research, specifically evolutionary computation and machine learning approaches to knowledge discovery in clinical databases, deep electronic phenotyping, interoperable information systems infrastructures for epidemiologic surveillance, and their application to a broad array of clinical domains, including cardiology and pulmonary medicine. He has collaborated as the informatics lead on an Agency for Healthcare Research and Quality-funded project at Harvard Medical School to establish a scalable distributed research network, and he has served as the co-lead of the Governance Core for the SPAN project, a scalable distributed research network; he participates in the FDA Sentinel Initiative. Dr. Holmes has served as the evaluator for the PCORNet Obesity Initiative studies, where he was responsible for developing and implementing the evaluation plan and metrics for the initiative. Dr. Holmes is or has been a principal or co-investigator on projects funded by the National Cancer Institute, the National Library of Medicine, and the Agency for Healthcare Research and Quality, and he was the Penn principal investigator of the NIH-funded Penn Center of Excellence in Prostate Cancer Disparities. Dr. Holmes is engaged with the Botswana-UPenn Partnership, assisting in building informatics education and clinical research capacity in Botswana. Dr. Holmes is an elected Fellow of the American College of Medical Informatics (ACMI), the American College of Epidemiology (ACE), and the International Academy of Health Sciences Informatics (IAHSI).

Abstract: Armed with a well-founded research question, the clinical researcher’s next step is usually to seek out the data that could help answer it, although the researcher can also use data to discover a new research question. In both cases, the data will already be available, and so either approach to inquiry can be appropriate and justifiable. However, the next steps (data preparation, analytics, and inference) are often thorny issues that even the most seasoned researcher must address, and sometimes not so easily. Traditional approaches to data preparation, which include such methods as frequency distribution and contingency table analyses to characterize the data, are themselves open to considerable investigator bias. In addition, there is considerable tedium resulting from applying these methods: for example, how many contingency tables does it take to identify variable interactions? It is arguable that feature selection and construction are two tasks not to be left only to human interpretation. Yet we don’t see much in the way of novel approaches to “experiencing” data such that new, data-driven insights arise during the data preparation process. The same can be said for analysis, where even state-of-the-art statistical methods, informed or driven by pre-formed hypotheses and the results of feature selection processes, sometimes hamper truly novel knowledge discovery. As a result, inferences made from these analyses likewise suffer. However, new approaches to making AI explainable to users, in this case clinical researchers who do not have the time or inclination to develop a deep understanding of how this or that AI algorithm works, are critically important, and their dearth represents a gap that those of us in clinical research informatics need to fill. Yet, the uninitiated shy away from AI for the very lack of explainability.
This talk will explore some new methods for making AI explainable, one of which, PennAI, has been developed at the University of Pennsylvania. PennAI will be demonstrated using several sample datasets.

March 13th: Mario Cannataro, Università degli Studi Magna Graecia di Catanzaro

Mario Cannataro is a Full Professor of Computer Engineering and Bioinformatics at University “Magna Graecia” of Catanzaro, Italy. He is the director of the Data Analytics research centre and the chair of the Bioinformatics Laboratory at University “Magna Graecia” of Catanzaro. His current research interests include bioinformatics, medical informatics, data analytics, and parallel and distributed computing. He is a member of the editorial boards of IEEE/ACM Transactions on Computational Biology and Bioinformatics, Briefings in Bioinformatics, High-Throughput, Encyclopaedia of Bioinformatics and Computational Biology, and Encyclopaedia of Systems Biology. He has been guest editor of several special issues on bioinformatics and serves as a program committee member of several conferences. He has published three books and more than 200 papers in international journals and conference proceedings. Mario Cannataro is a Senior Member of IEEE, ACM and BITS, and a member of the Board of Directors for ACM SIGBio.

Abstract: Recently, several factors have been moving biomedical research towards a (big) data-centred science: (i) the Volume of data in bioinformatics is exploding, especially in healthcare and medicine; (ii) new bioinformatics data is created at increasing Velocity due to advances in experimental platforms and increased use of IoT (Internet of Things) health monitoring sensors; (iii) increasing Variety and (iv) Variability of data (omics, clinical, administration, sensors, and social data are inherently heterogeneous) that may lead to wrong modelling, integration and interpretation; and finally (v) increasing Value of data in bioinformatics due to the costs of infrastructures to produce and analyze data, as well as the value of extracted biomedical knowledge. The emergence of this Big Data trend in Bioinformatics poses new challenges for computer science solutions, regarding the efficient storage, preprocessing, integration and analysis of omics (e.g. genomics, proteomics, and interactomics) and clinical (e.g. laboratory data, bioimages, pharmacology data, social network data, etc.) data, resulting in a main bottleneck of the analysis pipeline. To face those challenges, the main trends are: (i) use of high-performance computing in all steps of the analysis pipeline, including parallel processing of raw experimental data, parallel analysis of data, and efficient data visualization; (ii) deployment of data analysis pipelines and main biological databases on the Cloud; (iii) use of novel data models that combine structured (e.g. relational data) and unstructured (e.g. text, multimedia, biosignals, bioimages) data, with special focus on graph databases; (iv) development of novel data analytics methods such as Sentiment Analysis, Affective Computing and Graph Analytics, that integrate traditional statistical and data mining analysis; (v) particular attention to issues regarding privacy of patients, as well as permitted ways to use and analyze biomedical data.

After recalling the main omics data, the first part of the talk presents some experiences and applications related to the preprocessing and data mining analysis of omics, clinical and social data, conducted at University Magna Graecia of Catanzaro. Some case studies in the oncology (pharmacogenomics data) and paediatrics (sentiment analysis) domains are also presented. With the availability of large datasets, Deep Learning algorithms have proved to lead to state-of-the-art performance in many different problems, for example in text classification. However, deep models have the drawback of not being human-interpretable, raising various problems related to model interpretability. Model interpretability is another important aspect to be considered in order to develop a Clinical Decision Support System (CDSS) that clinicians can trust. In particular, an interpretable CDSS can ensure that: i) clinicians understand the system predictions (in the sense that predictions are required to be consistent with medical knowledge); ii) the decisions will not negatively affect the patient; iii) the decisions are ethical; iv) the system is optimized on complete objectives; and v) the system is accurate and sensitive patient data are protected. Therefore there is a need for new strategies for developing explainable AI systems for supporting medical decisions and, in particular, for presenting human-understandable explanations to clinicians that can also take into account sentiment analysis or, more generally, explainable text classification methodologies. Recently, the deep network architecture called Capsule Networks has gained a lot of interest, also showing intrinsic properties that can potentially improve model explainability in image recognition. However, to the best of our knowledge, whether Capsule Networks can improve explainability for text classification problems is a point that needs to be further investigated.
The second part of this talk will focus on a brief overview of proposed explainable models and then will present some discussion related to how Capsule Networks can be adapted to sentiment classification problems in order to improve explainability.

January 16th: Norman Fenton, Queen Mary, University of London

Norman Fenton is Professor of Risk Information Management at Queen Mary University of London and is also a Director of Agena, a company that specialises in risk management for critical systems. Norman is a mathematician by training whose current research focuses on critical decision-making and, in particular, on quantifying uncertainty using a ‘smart data’ approach that combines data with expert judgment. Applications include law and forensics (Norman has been an expert witness in major criminal and civil cases), health, security, software reliability, transport safety and reliability, finance, and football prediction. Norman has been PI in grants totalling over £10million. He currently leads an EPSRC Digital Health Technologies Project (PAMBAYESIAN) and a Leverhulme Trust grant (CAUSAL-DYNAMICS). In 2014 Norman was awarded a prestigious European Research Council Advanced Grant (BAYES-KNOWLEDGE) in which the ‘smart data’ approach evolved. Since June 2011 he has led an international consortium (Bayes and the Law) of statisticians, lawyers and forensic scientists working to improve the use of statistics in court. In 2016 he led a prestigious 6-month Programme on Probability and Statistics in Forensic Science at the Isaac Newton Institute for Mathematical Sciences, University of Cambridge, where he was also a Simons Fellow. He was appointed as a Fellow of The Turing Institute in 2018. In March 2015 Norman presented the award-winning BBC documentary Climate Change by Numbers.

Abstract: Misunderstandings about risk, statistics and probability often lead to flawed decision-making in many critical areas such as medicine, finance, law, defence, and transport. The ‘big data’ revolution was intended to at least partly address these concerns by removing reliance on subjective judgments. However, even where (relevant) big data are available there are fundamental limitations to what can be achieved through pure machine learning techniques. This talk will explain the successes and challenges in using causal probabilistic models of risk – based on a technique called Bayesian networks – in providing powerful decision-support and accurate predictions by a ‘smart data’ approach. This combines minimal data with expert judgment. The talk will provide examples in chronic diseases, forensics, terrorist threat analysis, and even sports betting.
December 12th 2018: Pearse Keane, Moorfields Eye Hospital, “Artificial Intelligence in Ophthalmology”
Pearse A. Keane, MD, FRCOphth, is a consultant ophthalmologist at Moorfields Eye Hospital, London and an NIHR Clinician Scientist, based at the Institute of Ophthalmology, University College London (UCL). Dr Keane specialises in applied ophthalmic research, with a particular interest in retinal imaging and new technologies. In April 2015, he was ranked no. 4 on a worldwide ranking of ophthalmologists under 40, published in “The Ophthalmologist” journal (https://theophthalmologist.com/the-power-list-2015/). In 2016, he initiated a formal collaboration between Moorfields Eye Hospital and Google DeepMind, with the aim of applying machine learning to automated diagnosis of optical coherence tomography (OCT) images. In August 2018, the first results of this collaboration were published in the journal Nature Medicine.

The Moorfields-DeepMind Collaboration – Reinventing the Eye Examination

Ophthalmology is among the most technology-driven of all the medical specialties, with treatments utilizing high-spec medical lasers and advanced microsurgical techniques, and diagnostics involving ultra-high resolution imaging. Ophthalmology is also at the forefront of many trailblazing research areas in healthcare, such as stem cell therapy, gene therapy, and – most recently – artificial intelligence. In July 2016, Moorfields announced a formal collaboration with the world’s leading artificial intelligence company, DeepMind. This collaboration involves the sharing of >1,000,000 anonymised retinal scans with DeepMind to allow for the automated diagnosis of diseases such as age-related macular degeneration (AMD) and diabetic retinopathy (DR). In my presentation, I will describe the motivation – and urgent need – to apply deep learning to ophthalmology, the processes required to establish a research collaboration between the NHS and a company like DeepMind, the initial results of our research, and finally, why I believe that ophthalmology could be the first branch of medicine to be fundamentally reinvented through the application of artificial intelligence.
November 21st 2018: Niels Peek, “Learning Health Systems”, University of Manchester
Niels Peek is Professor of Health Informatics and Strategic Research Domain Director for Digital Health at the University of Manchester. He has a background in Computer Science and Artificial Intelligence, and his research focuses on data-driven methods for health research, healthcare quality improvement, and computerised decision support. From 2013 to 2017 he was the President of the Society for Artificial Intelligence in Medicine (AIME). He is a member of the editorial boards of the Journal of the American Medical Informatics Association and the Artificial Intelligence in Medicine journal. In April 2017, he organised the Informatics for Health 2017 conference in Manchester, which was attended by more than 800 people from 30 countries. He also co-chaired the Scientific Programme Committee of MEDINFO-2017, the 16th World Congress on Health and Biomedical Informatics, which was held in Hangzhou, China, in August 2017. In 2018 he was elected a fellow of the American College of Medical Informatics and a fellow of the Alan Turing Institute.

My talk will introduce the concept of “Learning Health Systems” and focus on the role of clinical prediction models within these systems. Building on the distinction between explanatory and predictive models (which is commonly made in statistics and epidemiology but not in computer science) I will review the use of machine learning and statistical modelling in healthcare; discuss the role of model interpretation and transparency in explanatory and predictive models; and discuss the suitability of different analytical methods to facilitate interpretability and transparency.
October 17th: Allan Tucker, “Opening the Black Box”, Brunel University London

July 2018: “Temporal Information Extraction from Clinical Narratives”
Natalia Viani, King’s College London
Electronic health records represent a great source of valuable information for both patient care and biomedical research. Despite the efforts put into collecting structured data, a lot of information is available only in the form of free text. For this reason, developing natural language processing (NLP) systems that identify clinically relevant concepts (e.g., symptoms, medication) is essential. Moreover, contextualizing these concepts from the temporal point of view represents an important step.
Over the past years, many NLP systems have been developed to process clinical texts written in English and belonging to specific medical domains (e.g., intensive care unit, oncology). However, research for multiple languages and domains is still limited. During my PhD, I applied information extraction techniques to the analysis of medical reports written in Italian, with a focus on the cardiology domain. In particular, I explored different methods for extracting clinical events and their attributes, as well as temporal expressions.
At the moment, I am working on the analysis of mental health records for patients with a diagnosis of schizophrenia, with the aim of automatically identifying symptom onset information from clinical notes.
Dr Viani is a postdoctoral research associate at the Department of Psychological Medicine, NIHR Biomedical Research Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London. She received her PhD in Bioengineering and Bioinformatics from the Department of Electrical, Computer and Biomedical Engineering, University of Pavia, in January 2018. During her PhD, she spent six months as a visiting research scholar in the Natural Language Processing Laboratory at the Computational Health Informatics Program at Boston Children’s Hospital – Harvard Medical School.
Her research interests are natural language processing, clinical and temporal information extraction, and biomedical informatics. She is especially interested in the reconstruction of clinical timelines from free text.
March 2018: “Optimal Low-dimensional Projections for Spectral Clustering”
Nicos Pavlidis, Lancaster University
The presentation will discuss the problem of determining the optimal low-dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections which minimise the second eigenvalue of the graph Laplacian of the projected data, which corresponds to a non-convex, non-smooth optimisation problem. It can be shown that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum margin hyperplane through the data, as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation.
December 2016: “The value of evaluation: towards trustworthy machine learning”
Peter Flach, University of Bristol
Machine learning, broadly defined as data-driven technology to enhance human decision making, is already in widespread use and will soon be ubiquitous and indispensable in all areas of human endeavour. Data is collected routinely in all areas of significant societal relevance including law, policy, national security, education and healthcare, and machine learning informs decision making by detecting patterns in the data. Achieving transparency, robustness and trustworthiness of these machine learning applications is hence of paramount importance, and evaluation procedures and metrics play a key role in this.
In this talk I will review current issues in theory and practice of evaluating predictive machine learning models. Many issues arise from a limited appreciation of the importance of the scale on which metrics are expressed. I will discuss why it is OK to use the arithmetic average for aggregating accuracies achieved over different test sets but not for aggregating F-scores. I will also discuss why it is OK to use logistic scaling to calibrate the scores of a support vector machine but not to calibrate naive Bayes. More generally, I will discuss the need for a dedicated measurement theory for machine learning that would use latent-variable models such as item-response theory from psychometrics in order to estimate latent skills and capabilities from observable traits.
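The remark about averaging accuracies but not F-scores can be illustrated with a small sketch. It is not taken from the talk: the confusion-matrix counts are hypothetical, and the two test sets are assumed to be the same size. Under that assumption the arithmetic mean of the per-set accuracies equals the accuracy on the pooled data, whereas the mean of the per-set F1 scores does not equal the pooled F1.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn, tn):
    """F1 score, the harmonic mean of precision and recall (tn is unused)."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical confusion matrices (tp, fp, fn, tn) for two test sets of 100 examples each.
test_set_a = (45, 5, 5, 45)
test_set_b = (5, 5, 15, 75)

# Pool the two test sets by summing the counts cell by cell.
pooled = tuple(x + y for x, y in zip(test_set_a, test_set_b))

mean_accuracy = (accuracy(*test_set_a) + accuracy(*test_set_b)) / 2
mean_f1 = (f1(*test_set_a) + f1(*test_set_b)) / 2

print(round(mean_accuracy, 3), round(accuracy(*pooled), 3))  # 0.85 0.85
print(round(mean_f1, 3), round(f1(*pooled), 3))              # 0.617 0.769
```

With test sets of unequal size, the accuracies would need a size-weighted mean to match the pooled value; no such simple weighting recovers the pooled F1.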
October 2016: “On Models, Patterns, and Prediction”
Jaakko Hollmén, Aalto University, Helsinki
Pattern discovery has been the center of attention of data mining research for a long time, with pattern languages varying from simple to complex, according to the needs of the applications and the format of data. In this talk, I will take a view on pattern mining that combines elements from neighboring areas. More specifically, I will describe our previous research work in the intersection of three areas: probabilistic modeling, pattern mining and predictive modeling. Clustering in the context of pattern mining will be explored, as well as linguistic summarization patterns. Also, multiresolution pattern mining as well as semantic pattern discovery and pattern visualization will be visited. Time allowing, I will speak about patterns of missing data and their implications for predictive modeling.
Jaakko Hollmén is a faculty member at the Department of Computer Science at Aalto University in Espoo, Finland. He received his doctoral degree with distinction in 2000. His research interests include data analysis, machine learning and data mining, with applications in health and in environmental informatics. He has chaired several conferences in his areas of interest, including IDA, DS, and IEEE Computer-Based Medical Systems. Currently, he is co-chair of the Program Committee of ECML PKDD 2017, which is organized in Skopje, Macedonia during September 19-23, 2017. His publications can be found at: https://users.ics.aalto.fi/jhollmen/Publications/
May 2016: “Beyond Clinical Data Mining: Electronic Phenotyping for Research Cohort Identification”
John Holmes, University of Pennsylvania
The availability of ever-increasing amounts of highly heterogeneous clinical data poses both opportunities and challenges for the data scientist and clinical researcher. Electronic medical records are more prevalent than ever, and now we see that other data sources contribute greatly to the clinical research enterprise. These sources provide genetic, image, and environmental data, just to name three. Now, it is possible to investigate the effects of built environment, such as the availability of food markets, sidewalks, and playgrounds, coupled with clinical observations noted in the process of providing patient care, along with identified genetic variants that could predispose one to diabetes mellitus. Furthermore, these data could be used in a truly integrated sense to manage such patients more effectively than relying solely on the traditional medical record. The opportunity for enhanced clinical research is manifest in this expanding data and information ecosystem. The challenges are more subtly detected, but present nonetheless. Merging these heterogeneous data into an analyzable whole depends on the availability of a robust unique identifier that has yet to be created, at least in the US. As a result, researchers have developed various probabilistic methods of record matching, occasionally at the expense of data privacy and confidentiality. Another challenge is the sheer heterogeneity of the data; it is not easy to understand the clinical context of an image or waveform without their semantic integration with clinical observation data. In addition, there is the problem of ecologic fallacy, which arises from using data that have no real connection to a clinical record in the service of proposing or testing hypotheses. This problem is quite evident when coupling environmental and clinical data: just because there is a well-stocked market with a surfeit of inexpensive, healthy food options in a person’s neighborhood doesn’t mean that that person avails herself of these items. Finally, there is the problem of data quality. Much of the data we use, whether collected by us or obtained from another source, is replete with problems, such as missingness, contradictions, and errors in representation. We will explore in detail the opportunities and challenges posed to informatics and clinical researchers as they are faced with these seemingly endless sources of data. We will also discuss novel approaches to mining these complex, heterogeneous data for the purpose of constructing cohorts for research.
John Holmes is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is the Associate Director of the Penn Institute for Biomedical Informatics and is Chair of the Graduate Group in Epidemiology and Biostatistics. Dr. Holmes’ research interests are focused on several areas in medical informatics, including evolutionary computation and machine learning approaches to knowledge discovery in clinical databases (data mining), interoperable information systems infrastructures for epidemiologic surveillance, regulatory science as it applies to health information and information systems, clinical decision support systems, semantic analysis, shared decision making and patient-physician communication, and information systems user behavior. Dr. Holmes is a principal or co-investigator on projects funded by the National Cancer Institute, the Patient-Centered Outcomes Research Institute, the National Library of Medicine, and the Agency for Healthcare Research and Quality, and he is the principal investigator of the NIH-funded Penn Center of Excellence in Prostate Cancer Disparities. Dr. Holmes is engaged with the Botswana-UPenn Partnership, assisting in building informatics education and clinical research capacity in Botswana. He leads the evaluation of the National Obesity Observational Studies of the Patient-Centered Clinical Research Network. Dr. Holmes is an elected Fellow of the American College of Medical Informatics and the American College of Epidemiology.