Projects


Decision support tool for the management of severe trauma: TraumaMatrix.

In collaboration with Jean-Pierre Nadal, the Traumabase group and APHP (Public Assistance – Hospitals of Paris), in particular Tobias Gauss, Sophie Hamada and Jean-Denis Moyer, as well as passionate students and Capgemini Invent.

Official website of the project.

Short slide presentation. Video presentation of the project and of the collaboration with Capgemini Invent. Media: Pharmaradio podcast, press review, BioWorldMedTech article.

Context:
Major trauma is defined as any injury that endangers the life or the functional integrity of a person. The Global Burden of Disease working group of the WHO has recently shown that major trauma in its various manifestations, from road traffic accidents, interpersonal violence and self-harm to falls, remains a public health challenge and a major source of mortality and disability around the world. Fortunately, it has also been shown that managing major trauma with standardized, protocol-based care improves patient prognosis, especially for the two main causes of death in major trauma, i.e., hemorrhage and traumatic brain injury. The classic pathway of a trauma patient takes place in several stages: from the scene of the accident, where the patient is taken care of by the ambulance team, to the transfer to an intensive care unit (ICU) for immediate interventions, and finally to comprehensive care in the hospital. To be effective, patient management protocols require adjustments to the individual patient and clinical context on the one hand, and to the organizational context and trauma system on the other hand.
However, evidence shows that patient management, even in mature trauma systems, often exceeds acceptable time frames and that, despite existing guidelines, deviations from protocol-based care are frequent. These deviations lead to high variability in care and are associated with poor outcomes such as inadequate hemorrhage control or delayed transfusion. Two main factors explain these observations. First, decision-making in trauma care is particularly demanding, because it requires rapid and complex decisions under time pressure in a very dynamic, multi-player environment characterized by high levels of uncertainty and stress. Second, as a complex, multi-player process, trauma care is affected by fragmentation, which is often the result of loss or distortion of information. This disruptive influence prevents providers from engaging with each other and committing to the care process. In order to respond to this challenge, our program has set the ambitious goal of developing a trauma decision support tool, the TraumaMatrix. The program aims to provide clinicians with an integrative decision support and information management solution for the first 24 hours of major trauma management. This program is divided into three steps.
Based on a detailed, high-quality trauma database, Step 1 consists of developing the mathematical tools and models to predict trauma-specific outcomes and decisions. This step raises considerable scientific and methodological challenges.
Step 2 will apply these methods to develop the decision support tool in close cooperation with trauma care experts, together with a user-friendly, ergonomic interface for clinicians.
Step 3 will further develop the tool and interface and test, in real time, its impact on clinician decision-making and patient outcome.

Hypothesis: The global TraumaMatrix program rests on the hypothesis that an integrative, interactive decision support tool relying on advanced machine learning applied to detailed and heterogeneous clinical data can considerably improve patient care and survival in major trauma.

Objectives: The objective of the global project is to develop an integrative solution for trauma management during the first 24 hours, the TraumaMatrix. TraumaMatrix will be an adaptive information management platform providing ergonomic, real-time decision support to a broad range of clinicians. TraumaMatrix will make use of advanced statistical tools and machine learning algorithms and articulate them with existing clinical recommendations in order to enhance clinician-driven decision-making. The platform will streamline the care process to make it patient-centered and facilitate information sharing among all professionals involved (dispatchers, nurses, anesthetists, radiologists, surgeons, blood bank specialists, etc.). Such a tool is not intended to become a substitute for human decision-making, but to accompany clinicians and professionals and create a synergy with them.

Originality: Firstly, the proposal relies on unlimited access to a unique database: the Traumabase (www.traumabase.eu). With the objective of evaluating and improving the care of trauma patients, 20 French trauma centers have decided to collaborate to collect detailed, high-quality clinical data from the scene of the accident to discharge from the hospital. The resulting database, the Traumabase, has prospectively gathered data on more than 20,000 trauma admissions, and new cases are continuously recruited. The granularity of the collected data makes this observatory unique in Europe. The present consortium takes strategic advantage of this unrestricted access to propose an innovative response to the public health challenge of major trauma.

Secondly, to the best of our knowledge, such a trauma information platform does not currently exist. Developing and designing an interactive, real-time, probabilistic decision support and information management platform constitutes a major conceptual and scientific innovation. No proof-of-concept study exists that evaluates this approach on a large scale for complex medical decisions such as trauma care.

Thirdly, handling trauma patients requires complex, multi-player strategies, and the medical community recognizes the need to adopt and develop new methods to eliminate preventable deaths and disabilities. Thus, the community is willing to make use of large amounts of data for diagnosis, decision support and treatment.

Lastly, from the statistical point of view, the proposal will develop innovative methods to tackle the important scientific challenge of handling highly heterogeneous data with many missing values. Indeed, despite the high quality of the Traumabase, whose data are collected by data technicians, there are many missing values, which occur for different reasons (the measurement was impossible for technical reasons or because of the patient's state, there was no time to record it, etc.). Current data analysis tools and predictive models cannot be applied to such data without restrictions. Developing innovative methods that can exploit data with missing values, heterogeneous coding and complex structure is an important scientific contribution, and any development in this field will be useful and applicable to a large array of scientific sectors.

The project thus provides a unique opportunity for transdisciplinary research and collaboration, bringing together mathematical, methodological, technological, cognitive and medical expertise to design innovative methodological solutions to complex challenges and improve patient care.

Current work: We are actively working on Step 1. The subject feeds many statistical research problems. I am particularly interested in causal inference with missing values, and in exploratory analysis and predictive modeling on data with many missing values (with different codings: NA for Not Applicable, Imp for Impossible, NR for Not Recorded, NM for Not Made, etc.) and with both continuous and categorical variables. We have also started Steps 2 and 3.
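The different codings listed above carry information that is lost if they are all collapsed into a single missing-value marker. A minimal sketch in plain Python of how a raw column could be parsed into numeric values plus a missingness reason; the codes come from the text, while `MISSING_CODES`, `parse_column` and the example column are hypothetical and do not reflect the actual Traumabase export format.

```python
import math

# Codes taken from the text; the reason labels are a hypothetical naming choice.
MISSING_CODES = {
    "NA": "not_applicable",
    "Imp": "impossible",
    "NR": "not_recorded",
    "NM": "not_made",
}


def parse_column(raw):
    """Split raw entries into floats (NaN when missing) and a missingness reason."""
    values, reasons = [], []
    for entry in raw:
        entry = entry.strip()
        if entry in MISSING_CODES:
            values.append(math.nan)
            reasons.append(MISSING_CODES[entry])
        else:
            values.append(float(entry))
            reasons.append("observed")
    return values, reasons


# Hypothetical raw column, as it might appear in an export.
raw_column = ["88", "NR", "112", "Imp", "NA", " 95 "]
values, reasons = parse_column(raw_column)
print(values)
print(reasons)
```

Keeping the reason next to the value makes it possible, for instance, to treat "Not Applicable" (the quantity does not exist for this patient) differently from "Not Recorded" (the quantity exists but was not written down).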

Research papers can be found here. Below are some papers associated with the project, as well as presentations given at the SFAR (French Society of Anesthesia & Intensive Care Medicine) on projects, papers and internships carried out with interns and École polytechnique students.

We need support: This project is exciting and its potential to save lives is significant, but to implement it, we need support. Data collection is already ensured and maintained by the ARS (Regional Health Agency). We received the support of PEPS/AMIES and DataIA for postdocs and interns. The project is run by a team of dedicated people, and we are also fortunate to have students who help us and participate in this project in their spare time. Since January 2019, Capgemini, as part of a philanthropic action, has been providing us with human resources, which is crucial for this project. But the project is very ambitious, and it is important to have the means to finance PhD theses, internships, the creation of the interface, etc., so do not hesitate to contact us if you want to contribute and help us.

Distributed matrix completion for medical databases

This started as joint work with Geneviève Robin (CNRS Researcher), François Husson (Professor at Agrocampus Ouest), Balasubramanian Narasimhan (Senior Researcher at Stanford University) and Anqi Fu (PhD student at Stanford). Personalized medical care relies on comparing new patients' profiles to existing medical records, in order to predict patients' treatment response or risk of disease based on their individual characteristics, and to adapt medical decisions accordingly. The chances of finding profiles similar to a new patient, and therefore of providing better treatment, increase with the number of individuals in the database. For this reason, gathering the information contained in the databases of several hospitals promises better care for every patient. However, there are technical and social barriers to the aggregation of medical data. The size of combined databases often makes computation and storage intractable, while institutions are usually reluctant to share their data due to privacy concerns and proprietary attitudes. Both obstacles can be overcome by turning to distributed computation, which consists of leaving the data on site and distributing the calculations, so that hospitals only share intermediate results instead of raw data. This could solve the privacy problem and reduce the cost of computation by splitting one large problem into several smaller ones. The general project is described in Narasimhan et al. (2017).

As is often the case, the medical databases are incomplete. One aim of the project is to impute the data of one hospital using the data of the other hospitals. This could also be an incentive for hospitals to participate in the project and share summaries of their data. This project is continued with Claire Boyer.
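To illustrate the idea of sharing intermediate results instead of raw data, here is a minimal Python (NumPy) sketch for data split by rows across hospitals: each site sends only its number of patients, its column sums and its Gram matrix X_k^T X_k, from which a coordinator recovers the exact pooled mean and covariance, and hence the principal axes. This is a generic illustration assuming fully observed, row-partitioned data, not the distributed matrix completion algorithm of the project; the function names are hypothetical.

```python
import numpy as np


def local_summary(X):
    """Run inside a hospital: only these aggregates leave the site."""
    return X.shape[0], X.sum(axis=0), X.T @ X


def combine(summaries):
    """Run by the coordinator, which never sees individual records."""
    n = sum(s[0] for s in summaries)
    col_sum = sum(s[1] for s in summaries)
    gram = sum(s[2] for s in summaries)
    mean = col_sum / n
    # sum_i (x_i - mean)(x_i - mean)^T = X^T X - n * mean mean^T
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    return mean, cov


rng = np.random.default_rng(0)
# Three hypothetical hospitals with different numbers of patients and 4 variables.
sites = [rng.normal(size=(n_k, 4)) for n_k in (50, 80, 120)]
mean, cov = combine([local_summary(X_k) for X_k in sites])
eigvals, eigvecs = np.linalg.eigh(cov)  # principal axes of the pooled data

# Sanity check against the pooled data, which a real coordinator would not have.
pooled = np.vstack(sites)
print(np.allclose(mean, pooled.mean(axis=0)), np.allclose(cov, np.cov(pooled, rowvar=False)))
```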

ICUBAM: ICU Bed Availability Monitoring

One of the major challenges during the health crisis was the availability of intensive care beds equipped with ventilators. To deal with this problem, I created ICUBAM with colleagues from Inria and a team of computer scientists, clinicians and researchers, as an operational tool for intensive care physicians to monitor and visualize bed availability in real time. Physicians entered data on available beds, as well as data on deaths, admissions and transfers, directly on their mobile phones (in less than 10 seconds) and got an updated visualization on a map of France.

The project, described in an article, in an interview and in slides (Slides application, Slides models), grew out of a personal initiative of an intensive care physician (A. Kimmoun) from the Grand Est region, which was severely affected by the crisis.
In less than two weeks, all intensive care units in this region were overwhelmed and beds were created on the fly. Official information systems were unable to provide reliable, up-to-date information on bed availability, yet this information was crucial for the efficient allocation of patients and resources. Thanks to my collaborations with a network of intensive care physicians, I organized a meeting on March 22, 2020 with colleagues Gabriel Dulac Arnold, Olivier Teboule and A. Kimmoun, and we worked 7 days a week for several weeks on the project, building an interdisciplinary team (Laurent Bonnasse-Gahot (EHESS), Maxime Dénès (Inria), Sertan Girgin (Google Research), François Husson (CNRS, IRMAR), Valentin Iovene (Inria), François Landes (Université Paris-Saclay), Jean-Pierre Nadal (CNRS & EHESS), Romain Primet (Inria), Frederico Quintao (Google Research), Pierre Guillaume Raverdy (Inria), Vincent Rouvreau (Inria), Roman Yurchak). ICUBAM was deployed on March 25 with the agreement of the ARS Grand Est (regional health agency) and was used in 40 French départements by 130 intensive care units, covering more than 2,000 ICU beds.

The success of ICUBAM can be explained, on the one hand, by the quality of the data, collected directly at the bedside, which enabled a real-time inventory of beds and a modeling of the evolution of the epidemic to inform patients, health authorities and practitioners, and, on the other hand, by a team with complementary skills working closely with caregivers and institutions.

Missing Values: more resources on R-miss-tastic

The problem of missing values is ubiquitous in data analysis. The naive workaround, which consists of deleting observations with missing entries, is not an option in high dimensions: it can lead to the deletion of almost all the data and to substantial bias. The methods available to handle missing values depend on the aim of the analysis, the pattern of missing values and the mechanism that generates them. Rubin (1976) defined a widely used nomenclature for missing-value mechanisms: missing completely at random (MCAR), where missingness is independent of the data; missing at random (MAR), where the probability of being missing depends only on observed values; and missing not at random (MNAR), where the probability of missingness depends on the unobserved values themselves. A large part of the literature focuses on MCAR and MAR.
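To make the three mechanisms concrete, here is a minimal simulation sketch in Python (NumPy), assuming two correlated Gaussian variables where only x2 can be missing and missingness follows a logistic model; all parameters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
# Two correlated variables; x1 is always observed, x2 may be missing.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # Var(x2) = 1, corr(x1, x2) = 0.8


def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))


# MCAR: missingness is independent of the data.
miss_mcar = rng.uniform(size=n) < 0.3
# MAR: the probability of missingness depends only on the observed x1.
miss_mar = rng.uniform(size=n) < logistic(2.0 * x1 - 1.0)
# MNAR: the probability of missingness depends on x2 itself.
miss_mnar = rng.uniform(size=n) < logistic(2.0 * x2 - 1.0)

for name, miss in [("MCAR", miss_mcar), ("MAR", miss_mar), ("MNAR", miss_mnar)]:
    print(name, f"missing rate={miss.mean():.2f}", f"mean of observed x2={x2[~miss].mean():.3f}")
```

Under MCAR the mean of the observed x2 stays close to its true value 0, while under MAR and MNAR it is pulled downward because large values are missing more often. Under MAR the bias can be corrected using x1, which is always observed; under MNAR the observed data alone do not identify it without additional assumptions on the mechanism.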

Contributions on MNAR data (identifiability and estimation):

Many statistical methods have been developed to handle missing values (Little and Rubin, 2019; van Buuren, 2018) in an inferential framework, i.e., when the aim is to estimate parameters and their variance from incomplete data. One popular approach is imputation, which consists of replacing the missing values with plausible values to obtain a completed dataset that can be analyzed with any method. Mean imputation is the worst thing that can be done in an inferential framework, as it distorts the marginal and joint distributions (it shrinks variances and attenuates correlations). One can impute either according to a joint model or with a fully conditional modeling approach. Powerful methods include imputation by random forests (missForest package), imputation by low-rank methods (missMDA and softImpute packages) and, more recently, imputation using optimal transport.
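A quick simulation sketch in Python (NumPy) of how mean imputation distorts the distribution, assuming 40% of one variable is missing completely at random: the variance of the imputed variable shrinks by roughly the fraction of missing values, and its correlation with the other variable is attenuated as well. The data-generating model is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # Var(x2) = 1, corr(x1, x2) = 0.8

miss = rng.uniform(size=n) < 0.4                  # 40% of x2 missing completely at random
x2_imp = np.where(miss, x2[~miss].mean(), x2)     # mean imputation

print("var(x2):     ", round(x2.var(), 2), "->", round(x2_imp.var(), 2))
print("corr(x1, x2):", round(np.corrcoef(x1, x2)[0, 1], 2), "->", round(np.corrcoef(x1, x2_imp)[0, 1], 2))
```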

Contributions on imputation methods (SVD-based methods, optimal transport):

  • Muzellec, B., Josse, J., Boyer, C. & Cuturi, M. (2020). Missing Data Imputation using Optimal Transport. ICML 2020.
  • Mozharovskyi, P., Josse, J. & Husson, F. (2020). Nonparametric imputation by data depth. Journal of the American Statistical Association, 115(529), 241-253.
  • Husson, F., Josse, J., Narasimhan, B. & Robin, G. (2019). Imputation of mixed data with multilevel singular value decomposition. Journal of Computational and Graphical Statistics, 28(3).
  • Audigier, V., Husson, F. & Josse, J. (2016). A principal components method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1), 5-26.
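The low-rank imputation idea can be sketched in a few lines of Python (NumPy): starting from mean imputation, alternate between a soft-thresholded SVD of the completed matrix and replacing the missing entries with the low-rank fit, leaving observed entries untouched. This follows the SoftImpute principle behind the softImpute package and is close in spirit to the iterative PCA of missMDA, but it is a simplified sketch, not the implementation of either package; the threshold `lam` is an arbitrary value here and would normally be chosen by cross-validation.

```python
import numpy as np


def soft_impute(X, lam, n_iter=500, tol=1e-7):
    """Impute NaNs in X with an iterated soft-thresholded SVD (simplified sketch)."""
    mask = ~np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    Xc = X - col_means                         # center columns on observed means
    Z = np.where(mask, Xc, 0.0)                # initialisation = mean imputation
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)    # soft-threshold the singular values
        low_rank = (U * s_shrunk) @ Vt
        Z_new = np.where(mask, Xc, low_rank)   # observed entries are never changed
        delta = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if delta < tol:
            break
    return Z + col_means


# Usage on a simulated rank-2 matrix with 20% of entries missing completely at random.
rng = np.random.default_rng(1)
signal = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
X_full = signal + 0.1 * rng.normal(size=signal.shape)
X = X_full.copy()
missing = rng.uniform(size=X.shape) < 0.2
X[missing] = np.nan

X_soft = soft_impute(X, lam=5.0)
X_mean = np.where(missing, np.nanmean(X, axis=0), X)
rmse_soft = np.sqrt(np.mean((X_soft[missing] - X_full[missing]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[missing] - X_full[missing]) ** 2))
print(f"RMSE on missing entries: low-rank {rmse_soft:.3f}, mean imputation {rmse_mean:.3f}")
```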

A single imputation method can be interesting in itself if the aim is to predict the missing values as accurately as possible (matrix completion). Nevertheless, even if we manage to impute while preserving the joint and marginal distributions of the data as well as possible, a single imputation cannot reflect the uncertainty associated with the prediction of missing values. To achieve this goal, multiple imputation (MI; van Buuren, 2018, implemented in the mice package) consists of generating several plausible values for each missing value (to reflect the variance of prediction given the observed data and the imputation model), leading to several imputed datasets. The analysis is then performed on each imputed dataset, and the results are combined so that the final variance takes into account the additional variability due to missing values.
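Combining the results follows Rubin's rules: the pooled estimate is the average of the M estimates, and the total variance adds the average within-imputation variance W to the between-imputation variance B inflated by (1 + 1/M). A minimal sketch in Python; the five estimates and variances below are made-up numbers standing for the same model fitted on M = 5 imputed datasets.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)             # pooled point estimate
    w_bar = mean(variances)             # within-imputation variance W
    b = variance(estimates)             # between-imputation variance B (divides by m - 1)
    total = w_bar + (1 + 1 / m) * b     # total variance T = W + (1 + 1/M) B
    return q_bar, total


# Hypothetical results from fitting the same model on M = 5 imputed datasets.
est = [1.10, 1.25, 0.98, 1.17, 1.05]
var = [0.040, 0.038, 0.045, 0.041, 0.039]
q, t = pool_rubin(est, var)
print(f"pooled estimate {q:.3f}, standard error {math.sqrt(t):.3f}")
```

The term (1 + 1/M) B is precisely the extra variability due to missing values that a single imputation would ignore.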

Contributions on multiple imputation methods (based on low-rank methods):

  • Audigier, V., Husson, F. & Josse, J. (2017). MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing, 27(2), 501-518.
  • Audigier, V., Husson, F. & Josse, J. (2015). Multiple Imputation with Bayesian PCA. Journal of Statistical Computation and Simulation.
  • Josse, J., Husson, F. & Pagès, J. (2011). Multiple imputation in PCA. Advances in Data Analysis and Classification.

An alternative way to handle missing values is to modify estimation procedures so that they can be applied to incomplete data. For example, one can use the EM algorithm to obtain the maximum likelihood estimate (MLE) despite missing values. This is implemented, for instance, for linear and logistic regression in the R package misaem.
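As an illustration of the EM idea, here is a minimal Python (NumPy) sketch that estimates the mean and covariance of a multivariate Gaussian from incomplete data, assuming values are MCAR or MAR so that the missingness mechanism can be ignored. The E-step computes the conditional mean and covariance of the missing block given the observed one, and the M-step updates the parameters in closed form. This is a textbook Gaussian EM, not the SAEM-based algorithm used in misaem for logistic regression; `em_gaussian` and the simulated data are hypothetical.

```python
from collections import defaultdict

import numpy as np


def em_gaussian(X, n_iter=500, tol=1e-8):
    """EM estimate of the mean and covariance of a multivariate normal from data with NaNs."""
    n, p = X.shape
    observed = ~np.isnan(X)
    # Group rows sharing the same missingness pattern so each E-step is vectorized.
    groups = defaultdict(list)
    for i, row in enumerate(observed):
        groups[tuple(row)].append(i)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for pattern, idx in groups.items():
            o = np.array(pattern)
            m = ~o
            x_hat = X[idx].copy()
            cond_cov = np.zeros((p, p))
            if not o.any():  # fully missing rows: use the current model
                x_hat[:] = mu
                cond_cov = sigma.copy()
            elif m.any():
                # E-step: E[x_m | x_o] and Cov[x_m | x_o] under the current parameters.
                s_oo = sigma[np.ix_(o, o)]
                s_mo = sigma[np.ix_(m, o)]
                coef = np.linalg.solve(s_oo, s_mo.T).T  # S_mo S_oo^{-1}
                x_hat[:, m] = mu[m] + (x_hat[:, o] - mu[o]) @ coef.T
                cond_cov[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ s_mo.T
            sum_x += x_hat.sum(axis=0)
            sum_xx += x_hat.T @ x_hat + len(idx) * cond_cov
        # M-step: closed-form update from the expected sufficient statistics.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        shift = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if shift < tol:
            break
    return mu, sigma


# Usage: x2 is more often missing when x1 is large (MAR); true mean of x2 is 0.
rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])
X[rng.uniform(size=n) < 1 / (1 + np.exp(-(2 * x1 - 1))), 1] = np.nan

mu_hat, sigma_hat = em_gaussian(X)
cc_mean_x2 = np.nanmean(X[:, 1])  # complete-case mean, biased under MAR
print(f"EM mean of x2: {mu_hat[1]:.3f}, complete-case mean of x2: {cc_mean_x2:.3f}")
```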

Contributions on methods for inference with missing values (logistic regression, variable selection):

  • Bogdan, M., Jiang, W., Josse, J., Miasojedow, B. & Rockova, V. (2021). Adaptive Bayesian SLOPE – High dimensional Model Selection with Missing Values. In revision at JCGS.
  • Jiang, W., Lavielle, M., Josse, J. & Gauss, T. (2019). Logistic Regression with Missing Covariates – Parameter Estimation, Model Selection and Prediction within a Joint-Modeling Framework. CSDA.

Contributions on methods for exploratory data analysis with missing values, combining estimation and imputation (PCA with missing values):

Finally, for supervised learning with missing values, where the aim is to predict an outcome as well as possible rather than to estimate parameters as accurately as possible, the solutions are very different. With our group, we have suggested new approaches to tackle this issue. For instance, we showed that imputing the training and test sets with the means of the variables in the training set, although inappropriate for estimation, is consistent for prediction. We have also studied how to build random forests with missing values using the missing incorporated in attributes (MIA) criterion, implemented in scikit-learn's HistGradientBoosting estimators and in the R grf package. Lastly, we have developed new, theoretically justified methods to train neural networks with missing entries.

Contributions on supervised learning with missing values (SGD, random forests, neural nets):

  • Sportisse, A., Boyer, C., Dieuleveut, A. & Josse, J. (2020). Debiasing Stochastic Gradient Descent to handle missing values. NeurIPS 2020.
  • Le Morvan, M., Josse, J., Moreau, T., Scornet, E. & Varoquaux, G. (2020). NeuMiss networks: differential programming for supervised learning with missing values. NeurIPS 2020 (Oral).
  • Le Morvan, M., Prost, N., Josse, J., Scornet, E. & Varoquaux, G. (2020). Linear predictor on linearly-generated data with missing values: non consistency and solutions. AISTATS 2020.
  • Josse, J., Prost, N., Scornet, E. & Varoquaux, G. On the consistency of supervised learning with missing values. In revision at JMLR.