I am a senior researcher at Inria (National research center for digital science), the leader of the Inria – Inserm (National research center for health) PreMeDICaL (Precision Medicine by Data Integration and Causal Learning) team, and a member of Idesp, joint with the University of Montpellier. Previously, I was a Professor of Statistics at Ecole Polytechnique, in charge of a Master’s program in Data Science for Business with HEC business school, and a visiting researcher at Google Brain. I have been working for over 10 years with clinicians to personalize care through the development of machine learning solutions.

News: see also PreMeDICaL’s news
– I received the « Young Researcher » prize from the French Academy of Sciences! More info + video of the Coupole and the Republican Guard!
– My PhD student Margaux Zaffran has won the L’Oréal-UNESCO Young Talent Prize, my second student to receive this honor.

Project highlights: 
Traumatrix: AI-powered decision support tools embedded in ambulances to optimize patient triage, care, and resource planning for trauma cases. See video, slides, statistical challenges, team retreat, more info. Consortium: Capgemini Invent, Traumabase, EHESS, CNRS, Ecole Polytechnique, Inria.
ICUBAM: Development of an app for Bed Allocation Monitoring during COVID-19. Fork, slides.

Research talks:
Causal inference: Transporting treatment effects from clinical trials to broader populations, NIH Biostat. National Cancer Institute (slides)
Video at Online Causal Inference Seminar: « Leveraging incomplete RCT and observational data »
Missing values: EPFL 2023, AutoML 2022 ICML slides, video. A missing values tour: 2022 slides Les Diablerets PhD school, 2019 useR slides, 2019 video (start at 30′)

Researcher/Intern/PhD/Postdoc/Engineer positions available. Contact me: julie.josse[at]inria.fr

My research focuses on missing data methods (EM algorithms, imputation, supervised learning), causal inference (treatment effect estimation, integrating RCTs with observational data, survival analysis, policy learning), and uncertainty quantification. I also explore multi-modal data analysis, visualization through dimensionality reduction techniques (PCA, correspondence analysis, questionnaire analysis), and low-rank matrix estimation. My primary applications lie in the bio-sciences and health domains. Short CV, Detailed CV

Research

Selected publications:

  • Risk ratio, odds ratio, risk difference… Which causal measure is easier to generalize? (pdf) In revision. 2023.
  • Causal inference for combining randomized trials & observational data: a review (pdf). Statistical Science. 2020.
  • Doubly robust treatment effect estimation with incomplete confounders. (pdf) Annals of Applied Statistics. 2020.
  • What’s a good imputation to predict with missing values? (pdf) NeurIPS 2021. Spotlight.
  • On the consistency of supervised learning with missing values. (pdf) Statistical Papers. 2018-2024.
  • Bootstrap regularization for low-rank matrix estimation. (pdf) Journal of Machine Learning Research. 2016.
  • missMDA a package to handle missing values. (pdf) Journal of Statistical Software. 2015.

Current research collaborators: Judith Abecassis, Gosia Bogdan, Claire Boyer, Antoine Chambaz, Yaniv Romano, Erwan Scornet, Bertrand Thirion, Gael Varoquaux, Shu Yang.
Current collaborations with companies: EDF, Elixir Health, Quinten Health, Theremia, Sanofi, etc.
Current collaborations in health: APHP, CHU Nancy/Montpellier, Gustave Roussy, Traumabase, etc.
Associate Editor: Foundations and Trends® in Machine Learning. Past: Journal of Computational & Graphical Statistics, Journal of Statistical Software (7 years). AC for NeurIPS, ICLR.

A summary of my research contributions up to 2023 can be found here.
An overview of my research up to 2016 can be found in my Habilitation (slides).

Publications:
Year Authors Title Link
2024 Voinot, C. et al. Causal Survival Analysis: practical recommendations.
2024 Khellaf, R., Bellet, A. & Josse, J. Federated Causal Inference: Multi-Centric ATE Estimation beyond Meta-Analysis.
Submitted.
pdf
2024 Boughdiri, A., Josse, J. & Scornet, E. Quantifying Treatment Effects: Estimating Risk Ratios in Causal Inference.
Submitted.
pdf
2024 Gauss et al. Pilot deployment of a machine-learning enhanced prediction of need for hemorrhage resuscitation after trauma – the ShockMatrix pilot study.
BMC Medical Informatics and Decision Making.
pdf
2024 Näf, J. & Josse, J. What is a Good Imputation Under MAR Missingness?
Submitted.
pdf
slides
2024 Sussman, H., Chambaz, A., Josse, J., Aegerter, P., Wargon, M. & Bacry, E. Probabilistic Prediction of Arrivals and Hospitalizations in Emergency Departments in Ile-de-France.
International Journal of Medical Informatics.
pdf
2024 Zhao, P., Gatulle, N., James, A., Josse, J. & Chambaz, A. Learning, Evaluating and Analysing An Individualized Decision Support Rule with Application to Early Intervention in Intensive Care Unit.
2024 Stempfle, L., James, A., Josse, J., Gauss, T. & Johansson, F. Expert Study on Interpretable Machine Learning Models with Missing Data.
Machine Learning for Health (ML4H) symposium.
pdf
2023 Bénard, C., Näf, J. & Josse, J. MMD-based Variable Importance for Distributional Random Forest.
AISTATS 2024.
pdf
2023 Sussman, H., Chambaz, A. & Josse, J. AdaptiveConformal: an R package for adaptive conformal inference.
Computo.
pdf
2023 Zhao, P., Chambaz, A., Josse, J. & Yang, S. Positivity-free Policy Learning with Observational Data.
AISTATS 2024.
pdf
2023 Bénard, C. & Josse, J. Variable importance for causal forests: breaking down the heterogeneity of treatment effects.
Submitted.
pdf
2023 Colnet, B., Josse, J., Varoquaux, G. & Scornet, E. Risk ratio, odds ratio, risk difference... Which causal measure is easier to generalize?
Submitted.
pdf
2023 Zaffran, M., Josse, J., Dieuleveut, A. & Romano, Y. Conformal prediction with missing values.
ICML 2023.
pdf
poster
2023 Zhao, P., Josse, J. & Yang, S. Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data.
Submitted.
pdf
2022-24 Colnet, B., Josse, J., Varoquaux, G. & Scornet, E. Reweighting the RCT for generalization: finite sample analysis and variable selection.
JRSSA.
pdf
2022 Blet et al. Association between in-ICU red blood cells transfusion and one-year mortality in ICU survivors.
Critical Care.
pdf
2022 Colnet, B., Josse, J., Varoquaux, G. & Scornet, E. Generalizing a causal effect: sensitivity analysis and missing covariates.
Journal of Causal Inference.
pdf
slides
2022 Gauss et al. Is Early Norepinephrine Associated with 24-hour Mortality of Blunt Trauma Patients in Haemorrhagic Shock? An International Cohort Study.
JAMA Network.
pdf
2022 Garaix et al. Decision-making tools for healthcare structures in times of pandemic.
Anaesthesia Critical Care & Pain Medicine.
pdf
2022 Zaffran et al. Adaptive conformal prediction for time series.
ICML 2022.
pdf
slides
video
2022 Perez-Lebel et al. Benchmarking missing-values approaches for predictive models on health databases.
GigaScience.
pdf
2021 Le Morvan, M., Josse, J., Scornet, E. & Varoquaux, G. What’s a good imputation to predict with missing values?
NeurIPS 2021 (Spotlight).
pdf
video
slides
2021 Sportisse, A. et al. Model-based Clustering with Missing Not At Random Data.
Statistics and Computing.
pdf
2021 Mayer, I., Josse, J. & Traumabase. Transporting treatment effects with incomplete attributes.
Biometrical Journal.
pdf
2020-2023 Colnet, B. et al. Causal inference methods for combining randomized trials and observational studies: a review.
Statistical Science.
pdf
2020 Le Morvan, M., Josse, J., Moreau, T., Scornet, E. & Varoquaux, G. NeuMiss networks: differentiable programming for supervised learning with missing values.
NeurIPS 2020 (Oral).
pdf
slides
video
2020 Sbidian et al. Hydroxychloroquine with or without azithromycin and in-hospital mortality or discharge in patients hospitalized for COVID-19 infection: a cohort study of 4,642 in-patients in France.
Preprint.
pdf
2020 Consortium ICUBAM. ICU Bed Availability Monitoring and analysis in the Grand Est region of France during the COVID-19 epidemic.
Statistiques et Société.
pdf
slides
2020 Sportisse, A., Boyer, C. and Josse, J. Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data.
NeurIPS 2020.
pdf
slides
video
code
2020 A. Sportisse, C. Boyer, A. Dieuleveut, J. Josse. Debiasing Stochastic Gradient Descent to handle missing values.
NeurIPS 2020.
pdf
slides
2020 J.D. Moyer et al. Trauma reloaded: Trauma registry in the era of data science.
Anaesthesia Critical Care & Pain Medicine.
pdf
2020 Muzellec, B., Josse, J., Boyer, C. & Cuturi, M. Missing Data Imputation using Optimal Transport.
ICML 2020.
pdf
slides
videos
code
2019 Josse, J., Mayer, I. & Vert, J.P. MissDeepCausal: causal inference from incomplete data using deep latent variable models.
Preprint.
pdf
2020 Le Morvan, M., Prost, N., Josse, J., Scornet, E. & Varoquaux, G. Linear predictor on linearly-generated data with missing values: non consistency and solutions.
AISTATS 2020.
pdf
slides
2020 Descloux, P., Boyer, C., Josse, J., Sportisse, A. & Sardy, S. Robust Lasso-Zero for sparse corruption and model selection with missing covariates.
Scandinavian Journal of Statistics.
pdf
2022 Mayer, I., Sportisse, A., Josse, J., Vialaneix, N. & Tierney, N. R-miss-tastic: a unified platform for missing values methods and workflows.
R Journal.
pdf
2019-20 Mayer, I., Josse, J., Wager, S., Sverdrup, E., Moyer, J.D. and Gauss, T. Doubly robust treatment effect estimation with incomplete confounders.
Annals of Applied Statistics.
pdf
code
slides
videos
2019-21 M. Bogdan, W. Jiang, J. Josse, B. Miasojedow and V. Rockova. Adaptive Bayesian SLOPE – High dimensional Model Selection with Missing Values.
Journal of Computational and Graphical Statistics.
pdf
slides
2019-24 Josse, J., Prost, N., Scornet, E. & Varoquaux, G. On the consistency of supervised learning with missing values.
Statistical Papers.
pdf
slides
code
2019 G. Robin, O. Klopp, J. Josse, E. Moulines, and R. Tibshirani. Main effects and interactions in mixed and incomplete data frames.
Journal of the American Statistical Association.
pdf
Package
2019 Hamada, S. et al. Effect of Fibrinogen administration on early mortality in traumatic haemorrhagic shock: a propensity score analysis.
Journal of Trauma.
2019 Sportisse, A., Boyer, C. and Josse, J. Low-rank estimation with missing non at random data.
Statistics and Computing.
pdf
code
2018 Josse, J., Husson, F., Robin, G. and Balasubramanian, N. Imputation of mixed data with multilevel SVD.
Journal of Computational and Graphical Statistics.
pdf
slides
2018 Robin, G., Sardy, S., Moulines, E. and Josse, J. Low-rank model with covariates for count data with missing values.
Journal of Multivariate Analysis.
pdf
Package
code

2018 Jiang, W., Lavielle, M., Josse, J. and Gauss, T. Logistic Regression with Missing Covariates: Parameter Estimation, Model Selection and Prediction within a Joint-Modeling Framework.
CSDA.
pdf
slides
Package, code
2018 G. Robin, Hoi To Wai, J. Josse, O. Klopp and E. Moulines. Low-rank interactions and sparse additive effects model for large data frames.
NeurIPS 2018.
2018 Josse, J. and Reiter, J. Introduction to the Special Section on Missing Data.
Statistical Science.
pdf
2018 Seijo-Pardo, B., Alonso-Betanzos, A., Bennett, K. P., Bolón-Canedo, V., Josse, J., Saeed, M. & Guyon, I. Feature selection in the presence of missing data.
Neurocomputing, ESANN.
2017-2018 Mozharovskyi, P., Husson, F. and Josse, J. Nonparametric imputation by data depth.
Journal of the American Statistical Association.
pdf
slides
code
2017 Holmes, S. and Josse, J. 50 years of data-sciences, discussion.
Journal of Computational and Graphical Statistics.
pdf
2017 Bollmann, S., Cook, D., Dumas, J., Fox, J., Josse, J., Keyes, O., Strobl, C., Turner, H. and Debelak, R. A First Survey on the Diversity of the R Community.
R Journal.
pdf
slides
2017 G. Celeux, J. Jewson, J. Josse, J.M. Marin and C. P. Robert. Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig.

pdf
2017 Foulley, J.L., Celeux, G. and Josse, J. Empirical Bayes approaches to PageRank type algorithms for rating scientific journals.
Technical report.
pdf
slides
2016 Sobczyk, P., Bogdan, M. and Josse, J. PCA using penalized semi-integrated likelihood.
Journal of Computational and Graphical Statistics.
pdf
2016 Fithian, W. and Josse, J. Multiple Correspondence Analysis & the Multilogit Bilinear Model.
Journal of Multivariate Analysis.
pdf
slides
2016 Husson, F., Josse, J. and Saporta, G. Jan de Leeuw and the French school of data analysis.
Journal of Statistical Software.
pdf
2016-2017 Josse, J., Sardy, S. and Wager, S. denoiseR: a package for low rank matrix estimation.
Preprint.
pdf
Package
2016 Groenen, P. and Josse, J. Multinomial Multiple Correspondence Analysis.
Preprint.
pdf
2016 Fujii, H., Josse, J., Tanioka, M., Miyachi, Y., Husson, F. and Ono, M. Regulatory T cells in melanoma revisited by a computational clustering of FOXP3+ T cell subpopulations.
Journal of Immunology.
pdf
2015 Audigier, V., Husson, F. and Josse, J. MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis.
Statistics and Computing.
pdf
slides
2015-2016 Josse, J. and Wager, S. Bootstrap-Based Regularization for Low-Rank Matrix Estimation.
Journal of Machine Learning Research.
pdf
slides
2015 Josse, J. and Sardy, S. Adaptive Shrinkage of singular values.
Statistics and Computing.
pdf
2015 Josse, J. and Husson, F. missMDA a package to handle missing values in and with multivariate data analysis methods.
Journal of Statistical Software.
pdf
2015 Audigier, V., Husson, F. and Josse, J. Multiple Imputation with Bayesian PCA.
Journal of Statistical Computation and Simulation.
pdf
2015-2016 Josse, J. and Holmes, S. Measuring multivariate association.
Statistics Surveys.
pdf
2014 Audigier, V., Husson, F. and Josse, J. A principal components method to impute mixed data.
Advances in Data Analysis and Classification.
pdf
slides
2014 Josse, J., Wager, S. and Husson, F. Confidence areas for fixed-effects PCA.
Journal of Computational and Graphical Statistics.
pdf
slides
2014 Dray, S. and Josse, J. Principal component analysis with missing values: a comparative survey of methods.
Plant Ecology. 
pdf
2014 Josse, J., van Eeuwijk, F., Piepho, H-P. and Denis, J.B. Another look at Bayesian analysis of AMMI models for genotype-environment data.
Journal of Agricultural, Biological, and Environmental Statistics.
pdf
2013 Verbanck, M., Josse, J. and Husson, F. Regularized PCA to denoise and visualise data.
Statistics and Computing.  
pdf
2013 Josse, J., Timmerman, M.E. and Kiers, H.A.L. Missing values in multi-level simultaneous component analysis.
Chemometrics and Intelligent Laboratory Systems.
pdf
2013 Husson, F. and Josse, J. Handling missing values in Multiple Factor Analysis.
Food Quality and Preference.
pdf
2013 Josse, J. and Husson, F. Handling missing values in exploratory multivariate data analysis methods.
Journal de la SFdS. Paper written for the best PhD thesis prize awarded by the French Statistical Society.
pdf
2012 Josse, J., Chavent, M., Liquet, B. and Husson, F. Regularized Iterative Multiple Correspondence Analysis.
Journal of Classification.
pdf
2011 Josse, J. and Husson, F. Selecting the number of components in PCA using cross-validation approximations.
Computational Statistics and Data Analysis.
pdf
2011 Josse, J., Husson, F. and Pagès, J. Multiple imputation in PCA.
Advances in Data Analysis and Classification.
pdf
2010 Josse, J., Husson, F. and Pagès, J. Principal component methods - hierarchical clustering - partitional clustering: why would we need to choose for visualizing data?
Technical report.
pdf
2009 Josse, J., Husson, F. and Pagès, J. Analyse en Composantes Principales.
Journal de la SFdS.
pdf
2008 Josse, J., Husson, F. and Pagès, J. Testing the significance of the RV coefficient.
Computational Statistics and Data Analysis.
pdf
2008 Lê, S., Josse, J. and Husson, F. FactoMineR: an R package for multivariate analysis.
Journal of Statistical Software.
pdf

 

Software

I am an active member of the R software community and was elected to the R Foundation for Statistical Computing.

R Package Development

  • FactoMineR: Exploratory data analysis (PCA, MCA, multi-table analysis, etc.).
  • missMDA: Imputation (matrix completion) for continuous/categorical data and PCA with missing values.
  • denoiseR: Low-rank matrix estimation using regularized SVD and bootstrap.

My students have also developed packages linked to our work:

  • misaem: Logistic regression with missing values.
  • mimi: Generalized low-rank models for mixed/incomplete data.
  • lori: Contingency tables with missing values and covariates.
  • AdaptiveConformal: Adaptive conformal prediction for time series.

For causal inference with missing values, see grf and pipelines comparing estimators (IPW, AIPW, etc.).

Software Projects

  • R-miss-tastic: [paper] A static website providing resources on missing data (lectures, workflows, tutorials). Contributions welcome!
  • Causal Inference Taskview: Organizes R packages related to causal inference.

ICUBAM Development

I contributed to ICUBAM, an open-source tool for real-time visualization of ICU bed availability. Launched during the COVID-19 crisis, it aids ICU teams in managing patient flows, anticipating bed needs, and organizing transfers. Deployed in 130 ICUs across 40 départements, ICUBAM supports over 2,000 beds. Slides, Models, Paper, GitHub.

Community Engagement

  • Associate Editor, Journal of Statistical Software (2011–2017).
  • Founding member of RForwards, promoting diversity in the R community.
  • Member of the R Foundation Conference Committee; contributed to implementing the Code of Conduct.
  • Co-founder of the « French R Board » supporting Les Rencontres R.

Support R with a donation or through the R Consortium.

Teaching

My teaching activities span diverse topics, including statistics, machine learning, R programming, and data analysis, across various levels (bachelor, master’s, and PhD) and disciplines (mathematics, engineering, social sciences, agronomy, geography, and health). I have delivered courses on topics such as causal inference, (un)supervised learning, missing values, and experimental design at institutions like École Polytechnique, EHESS, Agrocampus Ouest, and Stanford. I’ve also developed hands-on labs, data challenges, and training programs for companies emphasizing practical applications of statistical methods. Beyond teaching, I have coordinated master’s programs, supervised student internships, and served on departmental and teaching committees. Tutorials.

Bio

Julie Josse is a senior researcher at Inria (France’s national research center in digital science), where she leads the PreMeDICaL team (Precision Medicine by Data Integration and Causal Learning) in collaboration with Inserm (the national research center in health). Her expertise spans missing data, causal inference, and machine learning techniques for healthcare applications. She specializes in developing statistical and computational tools to tackle challenges in personalized medicine, with a focus on integrating multi-source clinical data to improve decision-making. Julie’s journey began in the statistics department of an Agronomy school (Agrocampus Ouest), where she was trained in the « French data analysis school » and worked closely with interdisciplinary researchers, sparking her interest in transversal studies. She completed her PhD in 2010, for which she received the French Statistical Society’s award for the Best PhD in Applied Statistics. In 2013, she received a prestigious Marie Curie European Union grant to expand her research potential, spending 18 months at Stanford University. She joined Inria in 2020 after serving as a Professor of Statistics at École Polytechnique, where she also led the Master’s program in Data Science for Business in partnership with HEC Paris, and after a period as a visiting researcher at Google Brain Paris. Julie’s contributions include more than 80 published articles, three books in applied statistics, and open-source software tools like the R packages FactoMineR, missMDA, and denoiseR. She is deeply committed to reproducible research and contributes to the R community as a member of the R Foundation and RForwards, promoting diversity and inclusion. Driven by a passion for translational research, Julie’s mission is to advance methodological innovation and deliver impactful applications in bio-sciences and health.

Personal: I had the privilege of growing up in Africa and French Polynesia, experiences that shaped my curiosity and outlook on life. Later, I moved to Brittany, a stunning region of France, before discovering the vibrancy of Paris and now enjoying life in the sunny south of France. Beyond my passion for statistics, I love traveling (an adventure that began on horseback in my youth) and exploring the world’s beauty. I’m captivated by nature and science, avidly following resources like Science Friday and admiring initiatives like Wildlife Photographer of the Year. I also have a strong interest in humanitarian issues and aspire to apply my skills more directly to such causes in the future.

Outreach: Interview in Academie des technologies (French, English) – Interview in Montpellier – Interview in Medium.

Misc

Links:
SFdS French Statistical Society – Interested in data sciences? Join us!
Some historical references on the French school of data analysis.

Other projects:
Distributed computation with hospital data (with Balasubramanian Narasimhan)

Head of conference organization:
Leveraging Observational Data with Machine Learning 2021. 
Artemiss workshop on missing values at ICML 2020.
MissData conference on missing values and matrix completion, June 2015.
Correspondence Analysis and Related Methods (CARME) 2011.
The R conference useR! 2009.