I am interested in the analysis of incomplete data and its application to machine learning on remote sensing and biomedical data. My research focuses on two kinds of approaches to missing data: predictive and parametric. In the former, I use EOF analysis and EM algorithms to predict missing values. In the latter, the aim is to estimate statistical parameters from incomplete data using the EM algorithm. In a broader sense, I am interested in dimensionality reduction methods that handle missing data, classification/clustering with metrics induced by statistical models, and geophysical applications of remotely sensed data.
In the predictive approach, I mainly use several variants of Empirical Orthogonal Functions (EOF, EEOF) to decompose heterogeneous and incomplete signals. This decomposition, which is linked to the spectral representation of the signal, can be embedded in EM-type algorithms to iteratively predict missing values. Such methods extract temporal and/or spatial features of the data while interpolating them. Applications to various geophysical datasets from SAR and optical imagery (glacier velocity) confirm that highly incomplete data can be denoised and interpolated, which is a key step in facilitating their interpretation by geophysicists.
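As an illustration of the general idea only (not the exact algorithms used in this work), the following Python sketch embeds a truncated EOF decomposition in an EM-type loop: missing values are initialized, the field is decomposed by SVD, and the missing entries are re-predicted from the leading modes until they stabilize. The function name `em_eof_impute`, the NaN convention for missing values, the column-mean initialization and the fixed number of modes `rank` are all assumptions made for this example.

```python
import numpy as np

def em_eof_impute(X, rank, n_iter=200, tol=1e-8):
    """Fill NaN entries of a (time x space) array with a truncated-EOF reconstruction.

    Hypothetical helper: the number of retained modes `rank` is assumed known.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if not missing.any():
        return X.copy()
    # Initialization: replace each missing value by the mean of its column.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # EOF step: SVD of the anomalies (field minus its temporal mean).
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Reconstruction from the leading `rank` modes only.
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean
        # Update step: only missing entries are overwritten, observed ones are kept.
        change = np.max(np.abs(recon[missing] - filled[missing]))
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled
```

In practice the number of modes would have to be selected from the data, for example by cross-validation on artificially masked observations; that selection step is left out of this sketch.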
In the parametric approach, the aim is to build robust estimators of statistical parameters (such as the mean and the covariance matrix) from incomplete data using the EM algorithm. The data may follow Gaussian or non-Gaussian distributions, such as mixed-effects and/or elliptical distributions, sometimes with a low-rank structure constraint. Applications include covariance-based classification of electroencephalograms (EEG), imputation of satellite reflectances for crop monitoring and clustering of hyperspectral images.
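To make the parametric idea concrete, here is a minimal Python sketch of the classical EM algorithm for the mean and covariance matrix of incomplete data, restricted to the plain Gaussian case. It is a textbook baseline and not the robust, elliptical, mixed-effects or low-rank estimators mentioned above. The function name `em_gaussian_missing`, the NaN convention for missing values and the assumption that values are missing at random are choices made for this example.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=100, tol=1e-6):
    """EM estimates of the mean and covariance of Gaussian data with NaN entries.

    Hypothetical helper: assumes a Gaussian model and values missing at random.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Initialization from the observed values of each variable.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        x_hat = np.where(miss, 0.0, X)
        C = np.zeros((p, p))  # accumulated conditional covariances
        # E-step: conditional mean and covariance of the missing part of each row.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed: fall back on the current parameters.
                x_hat[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            K = np.linalg.solve(S_oo, S_mo.T).T  # equals S_mo @ inv(S_oo)
            x_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ S_mo.T
        # M-step: update the parameters from the completed sufficient statistics.
        mu_new = x_hat.mean(axis=0)
        d = x_hat - mu_new
        sigma_new = (d.T @ d + C) / n
        delta = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break
    return mu, sigma
```

The conditional covariance term `C` is what distinguishes this from simply imputing conditional means and computing a sample covariance, which would underestimate the variances.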