Sept 2020: After having been a Professor of Statistics at Ecole Polytechnique and a visiting researcher at Google Brain IA Paris, I now hold a position of Advanced Researcher at Inria. I am a scientific collaborator associated with CMAP Polytechnique and I teach at Ecole Polytechnique, in particular causal inference.
News: Workshop 22-23 June. Leveraging observational data with ML.
– Talk at IAS Special Year on Optimization, Statistics & Theoretical Machine Learning: Supervised learning with missing values (presentation of neural networks with missing values!) slides, videos
– Talk at Mathematical Methods of Modern Statistics, CIRM: Causal inference with missing values slides, videos
– Talk at useR!2019: A missing values tour, slides, video available (start at 30′)
– ICUBAM: ICU Bed Availability Monitoring and analysis in the Grand Est region of France during the COVID-19 epidemic. Fork ICUBAM.

Intern/PhD/postdoc positions on causal inference and policy learning for personalized medicine. Contact me. An internship on predicting the need for intubation is available.
Website of TrauMatrix; Short Video of presentation of the project TrauMatrix  (Decision tool for Anesthesia & Intensive Care) – Podcast pharmaradio
R-miss-tastic website, a platform with missing values resources. Contribute!
Rforwards, dedicated to widening the participation of minorities in the community.

julie.josse[at]   – Office, 226,  INRIA Montpellier

My main research fields are: missing values, causal inference, selection bias, treatment effect estimation, visualization with dimensionality reduction (PCA, correspondence analysis), multi-block data, low rank matrix estimation, questionnaire analyses; main application in health for personalized medicine. Detailed CV


Projects and sustained collaboration:
Causal inference:
– Handling severe trauma patients – personalized medicine (with J.P. Nadal and Capgemini)
– Causal inference with missing values  (with Stefan Wager) 
– New collaborations: transporting causal effect – combining RCT and observational data (with S. Wang, JP Vert and IJ Dahabreh)
– Covid19: Application for bed allocation monitoring/Effect of hydroxychloroquine
Missing values:
– Missing Not At Random data (with Claire Boyer)
– Random Forests, Multilayer Perceptron with missing values (with Erwan Scornet, Gael Varoquaux)
– Variable selection to control the FDR with missing values (with Gosia Bogdan)
– New collaborations: with Jes Frellsen with a grant and J.P Vert at Google.
– Exploratory data analysis/software/Missing values (with F. Husson)
Distributed computation with hospital data (with Balasubramanian Narasimhan)

Students & group’s meeting and missing data and causal inference seminar

Associate Editor: JMLR. Past: Journal of Computational & Graphical Statistics; Journal of Statistical Software (7 years). AC for NeurIPS 2020, ICLR 2021.

An overview of my research up to 2016 can be found in my Habilitation.  (slides)

Year. Authors. Title. Venue. Link
2020. Colnet, B. et al. Causal inference methods for combining randomized trials and observational studies: a review. Submitted. pdf
2020. Le Morvan, M., Josse, J., Moreau, T., Scornet, E. & Varoquaux, G. NeuMiss networks: differentiable programming for supervised learning with missing values. NeurIPS 2020 (Oral). pdf
2020. Sbidian et al. Hydroxychloroquine with or without azithromycin and in-hospital mortality or discharge in patients hospitalized for COVID-19 infection: a cohort study of 4,642 in-patients in France. pdf
2020. Consortium ICUBAM. ICU Bed Availability Monitoring and analysis in the Grand Est region of France during the COVID-19 epidemic. pdf
2020. Sportisse, A., Boyer, C. and Josse, J. Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data.
2020. Sportisse, A., Boyer, C., Dieuleveut, A. and Josse, J. Debiasing Stochastic Gradient Descent to handle missing values. NeurIPS 2020. pdf
2020. Moyer, J.D. et al. Trauma reloaded: Trauma registry in the era of data science. Anaesthesia Critical Care & Pain Medicine. pdf
2020. Muzellec, B., Josse, J., Boyer, C. & Cuturi, M. Missing Data Imputation using Optimal Transport. ICML 2020. pdf
2019. Josse, J., Mayer, I. & Vert, J.P. MissDeepCausal: causal inference from incomplete data using deep latent variable models. pdf
2020. Le Morvan, M., Prost, N., Josse, J., Scornet, E. & Varoquaux, G. Linear predictor on linearly-generated data with missing values: non consistency and solutions. AISTATS 2020. pdf
2020. Descloux, P., Boyer, C., Josse, J., Sportisse, A. and Sardy, S. Robust Lasso-Zero for sparse corruption and model selection with missing covariates. Submitted.
2019. Mayer, I., Josse, J., Vialaneix, N. and Tierney, N. R-miss-tastic: a unified platform for missing values methods and workflows.
2019-20. Mayer, I., Josse, J., Wager, S., Sverdrup, E., Moyer, J.D. and Gauss, T. Doubly robust treatment effect estimation with incomplete confounders. Annals of Applied Statistics (AOAS).
2019. Bogdan, M., Jiang, W., Josse, J., Miasojedow, B. and Rockova, V. Adaptive Bayesian SLOPE – High dimensional Model Selection with Missing Values. In revision in JCGS.
2019. Josse, J., Prost, N., Scornet, E. & Varoquaux, G. On the consistency of supervised learning with missing values. In revision in JMLR.
2019. Robin, G., Klopp, O., Josse, J., Moulines, E. and Tibshirani, R. Main effects and interactions in mixed and incomplete data frames. Journal of the American Statistical Association.
2019. Hamada, S. et al. Effect of Fibrinogen administration on early mortality in traumatic haemorrhagic shock: a propensity score analysis. Journal of Trauma.
2019. Sportisse, A., Boyer, C. and Josse, J. Low-rank estimation with missing non at random data. Statistics and Computing.
2018. Josse, J., Husson, F., Robin, G. and Narasimhan, B. Imputation of mixed data with multilevel SVD. Journal of Computational and Graphical Statistics.
2018. Robin, G., Sardy, S., Moulines, E. and Josse, J. Low-rank model with covariates for count data with missing values. Journal of Multivariate Analysis.

2018. Jiang, W., Lavielle, M., Josse, J. and Gauss, T. Logistic Regression with Missing Covariates -- Parameter Estimation, Model Selection and Prediction within a Joint-Modeling Framework. Package, code
2018. Robin, G., Hoi To Wai, Josse, J., Klopp, O. and Moulines, E. Low-rank interactions and sparse additive effects model for large data frames. NeurIPS 2018.
2018. Josse, J. and Reiter, J. Introduction to the Special Section on Missing Data. Statistical Science.
2018. Seijo-Pardo, B., Alonso-Betanzos, A., Bennett, K.P., Bolón-Canedo, V., Josse, J., Saeed, M. and Guyon, I. Feature selection in the presence of missing data. Neurocomputing, ESANN.
2017-2018. Mozharovskyi, P., Husson, F. and Josse, J. Nonparametric imputation by data depth. Journal of the American Statistical Association.
2017. Holmes, S. and Josse, J. 50 years of data-sciences, discussion. Journal of Computational and Graphical Statistics.
2017. Bollmann, S., Cook, D., Dumas, J., Fox, J., Josse, J., Keyes, O., Strobl, C., Turner, H. and Debelak, R. A First Survey on the Diversity of the R Community. R Journal.
2017. Celeux, G., Jewson, J., Josse, J., Marin, J.M. and Robert, C.P. Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig.
2017. Foulley, J.L., Celeux, G. and Josse, J. Empirical Bayes approaches to PageRank type algorithms for rating scientific journals. Technical report.
2016. Sobczyk, P., Bogdan, M. and Josse, J. PCA using penalized semi-integrated likelihood. Journal of Computational and Graphical Statistics (JCGS).
2016. Fithian, W. and Josse, J. Multiple Correspondence Analysis & the Multilogit Bilinear Model. Journal of Multivariate Analysis.
2016. Husson, F., Josse, J. and Saporta, G. Jan de Leeuw and the French school of data analysis. Journal of Statistical Software.
2016-2017. Josse, J., Sardy, S. and Wager, S. denoiseR: a package for low rank matrix estimation. Journal of Statistical Software.
2016. Groenen, P. and Josse, J. Multinomial Multiple Correspondence Analysis. pdf
2016. Fujii, H., Josse, J., Tanioka, M., Miyachi, Y., Husson, F. and Ono, M. Regulatory T cells in melanoma revisited by a computational clustering of FOXP3+ T cell subpopulations. Journal of Immunology.
2015. Audigier, V., Husson, F. and Josse, J. MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis. Statistics and Computing.
2015-2016. Josse, J. and Wager, S. Bootstrap-Based Regularization for Low-Rank Matrix Estimation. Journal of Machine Learning Research.
2015. Josse, J. and Sardy, S. Adaptive Shrinkage of singular values. Statistics and Computing.
2015. Josse, J. and Husson, F. missMDA: a package to handle missing values in and with multivariate data analysis methods. Journal of Statistical Software.
2015. Audigier, V., Husson, F. and Josse, J. Multiple Imputation with Bayesian PCA. Journal of Statistical Computation and Simulation.
2015-2016. Josse, J. and Holmes, S. Measuring multivariate association. Statistics Surveys.
2014. Audigier, V., Husson, F. and Josse, J. A principal components method to impute mixed data. Advances in Data Analysis and Classification.
2014. Josse, J., Wager, S. and Husson, F. Confidence areas for fixed-effects PCA. Journal of Computational and Graphical Statistics.
2014. Dray, S. and Josse, J. Principal component analysis with missing values: a comparative survey of methods. Plant Ecology.
2014. Josse, J., van Eeuwijk, F., Piepho, H-P. and Denis, J.B. Another look at Bayesian analysis of AMMI models for genotype-environment data. Journal of Agricultural, Biological, and Environmental Statistics.
2013. Verbanck, M., Josse, J. and Husson, F. Regularized PCA to denoise and visualise data. Statistics and Computing.
2013. Josse, J., Timmerman, M.E. and Kiers, H.A.L. Missing values in multi-level simultaneous component analysis. Chemometrics and Intelligent Laboratory Systems.
2013. Husson, F. and Josse, J. Handling missing values in Multiple Factor Analysis. Food Quality and Preference.
2013. Josse, J. and Husson, F. Handling missing values in exploratory multivariate data analysis methods. Journal de la SFdS. Paper written for the best PhD thesis prize awarded by the French Statistical Society.
2012. Josse, J., Chavent, M., Liquet, B. and Husson, F. Regularized Iterative Multiple Correspondence Analysis. Journal of Classification.
2011. Josse, J. and Husson, F. Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis.
2011. Josse, J., Husson, F. and Pagès, J. Multiple imputation in PCA. Advances in Data Analysis and Classification.
2010. Josse, J., Husson, F. and Pagès, J. Principal component methods - hierarchical clustering - partitional clustering: why would we need to choose for visualizing data? Technical report.
2009. Josse, J., Husson, F. and Pagès, J. Analyse en Composantes Principales. Journal de la SFdS.
2008. Josse, J., Husson, F. and Pagès, J. Testing the significance of the RV coefficient. Computational Statistics and Data Analysis.
2008. Lê, S., Josse, J. and Husson, F. FactoMineR: an R package for multivariate analysis. Journal of Statistical Software.


Year. Title. Event. Slides/Video
2019. Doubly-robust treatment effect estimation with incomplete confounders. Seminar Frejus, I. Mayer. pdf
2019. Decision trees with missing values, selection and prediction. SFdS. pdf
2019. On the consistency of supervised learning with missing values. Cornell, Statlearn, Google Brain. pdf
2019. High dimensional variable selection with missing values with adaptive SLOPE. DagStat, Munich. pdf
2019. Low-rank estimation and imputation with MNAR data. DagStat, Munich. pdf
2019. Causal inference with missing values. Lyon. pdf
2018. Distributed multilevel matrix completion for medical data. France-Finland Workshop, Washington University Seminar. pdf
Stochastic Approximation EM for logistic regression with missing values. CMStat, London, 17 December. pdf
2017. Single Imputation with data depth. CMStat, London, 17 December. pdf
2017. R forwards to widen the participation of under-represented groups. useR!2017, 7 July. video
2017. Empirical Bayes approaches to PageRank type algorithms for rating scientific journals. French Stat Society, 29 May. pdf
2017. Low-rank log-linear models. Vienna, University of economics and business, 16 March; Telecom, Paris, 23 March.
2016. Meetup Machine Learning, Paris, France, 10 February.
2016. Imputation with data depth - Pavlo Mozharovskyi. French Stat Society, Montpellier, France, May. pdf
2016. MultiLogit bilinear model & MCA. Working group on model based clustering, Paris, France, 22 July; AgroParisTech, Paris, France, 13 June.
2015. Missing values and principal components methods. INRIA, Orsay, France, 12 October; Tesla Motor, Palo Alto, USA, 14 August.
2015. Multiple imputation for categorical data. Conference CARME (correspondence analysis & related methods), Napoli, Italy, 21 September; Journée de Statistiques, Rennes, France, 23 October.
2015. A flexible framework for regularized low rank matrix estimation. Los Alamos, USA, 15 February; Rotterdam University, Netherlands, 30 January; Adobe Research, San Jose, USA, 20 January.
2015. Visualization with regularized PCA. Stanford Biostatistics, Palo Alto, USA, 15 January.
2014. A flexible framework for regularized low rank matrix estimation. Montpellier 2 University, France, 20 October; Paris Descartes University, France, 24 October; Wroclaw University, Poland, 31 October.

Past talks:
MultiLogit bilinear model & MCA
Bootstrap approach for low rank estimation
A missing values tour with principal components
Visualization with regularized PCA and confidence ellipses
Exploratory data analysis: multi-block/3-way methods
Multiple imputation for categorical data
Imputation of mixed data: Random forest/PCA

Software – R

I am involved in the R software community and I am sincerely glad to have been elected as a member of the R Foundation for Statistical Computing. If R is helping you, please help us by supporting it with a donation.

Development of packages:
FactoMineR: visualization with principal components methods
missMDA: missing values (imputation continuous, categorical data) – matrix completion
For questions on the use of packages we have a google group.
denoiseR: low rank matrix estimation with regularized SVD and bootstrap

My students have also developed R packages associated with our work:
misaem: logistic regression with missing values
mimi: Generalized low-rank models for mixed and incomplete data frames.
lori: contingency table with missing values and covariates

If you want to do causal inference with missing values, you can use the R package grf, where a doubly robust method handling missing covariates is implemented, and see the pipeline to compare different estimators (IPW, DR) and strategies (imputation, etc.).

I served as an associate editor of the Journal of Statistical Software (2011-2017) and I am involved in Rforwards, which leads the R community forward in widening the participation of women and other under-represented groups. I am on the R Foundation conference committee and work on the implementation of the Code of Conduct.
With M. Chavent, S. Dray, R. Genuer, F. Husson, B. Liquet and J. Sarracco, we created the « French R board group » to support the organization of Les Rencontres R.

News: Video presentation of Rforwards. Blog posts and multivariate studies of the R community.
Support R with the R consortium.


As a French professor, I taught around 160 hours/year (lectures and computer labs, mainly with the R software) and supervised master's students' projects and their internships in industry. In addition, I give tutorials at different institutes and at conferences. Learn more. From Sept 2020, I will teach Causal Inference in the IPP (Institut Polytechnique de Paris) Master of Data Science at Polytechnique. For recent tutorials on missing values, see the R-miss-tastic platform.



Her first employment was in the statistics department of an Agronomy University (Agrocampus Ouest), where she was trained in « the French data analysis school » and had the opportunity to work closely with researchers from other departments, which increased her interest in transversal studies. In the meantime, she prepared her PhD, which was defended in 2010 and recognized by the French Statistical Society as the best PhD in applied statistics. She has specialized in missing data, visualization and the nonparametric analysis of complex data structures. Her work was rewarded by a Marie Curie European Union grant in 2013 to increase her research potential and to spend a year at Stanford University. She spent a year as a researcher at INRIA before joining Polytechnique in 2016 as a Professor of Statistics. At Polytechnique, she was responsible for a master's program in data science for business in collaboration with HEC. She was a visiting researcher at Google Brain Paris for a year (2 days a week) in 2019. In September 2020, she joined Inria as an advanced researcher to set up a team in data science for health. She has published over 50 articles and written 2 books in applied statistics. Her experience in dealing with incomplete data is recognized by the community: she organized an ICML workshop and the MissData conference, created the R-miss-tastic website, and is often invited to give lectures to share her experience. Her vocation is to push methodological innovation to bring useful applications of her research to users, in particular in biosciences and health. Her current research focuses on causal inference techniques for personalized medicine. Julie Josse is dedicated to reproducible research with the R statistical software: she has developed packages including FactoMineR, denoiseR and missMDA to transfer her work, and she is a member of the R Foundation and of Rforwards to increase the participation of minorities in the community.

Personal: I grew up in Africa and French Polynesia. Then I arrived in Brittany, a magnificent French region, and I had the chance to discover Paris and now the south of France. I am passionate about statistics but also about travelling (often on horseback) around the world. I am also fascinated by nature and science (fan of Wildlife Photographer of the Year). I have a particular interest in humanitarian issues and my long-term goal is to use more of my skills for these purposes.

Interview in medium.


– SFdS French Statistical Society – Interested in data sciences? Join us!
– Some historical references on French data analysis, data sciences.
– Roger Peng’s activities and Brian Caffo's YouTube channel.

Head of conference organization:
– Artemiss workshop at ICML 2020.
– The first MissData on missing values and matrix completion, June 2015.
– Correspondence analysis and related methods, CARME 2011. Videos. Let the data speak…. data analysis
– The R conference useR! 2009