Faculty of Science and Engineering · University of Groningen · Groningen, The Netherlands
Figure 1. An overview of the proposed RGP-VAE. An input SPD matrix Xi is projected onto the tangent space at a reference point Pref via the logarithmic map (logPref), vectorized, and fed to the encoder. The encoder produces a latent distribution (μ, log σ²) from which a latent vector zi is sampled. The decoder reconstructs the tangent representation, which is mapped back onto the SPD manifold via the exponential map (expPref).
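The pipeline in the caption hinges on the logarithmic and exponential maps at the reference point Pref. A minimal sketch under the affine-invariant metric follows; the function names and the √2-weighted vectorization convention are illustrative assumptions, not the authors' code:

```python
import numpy as np

def _spd_pow(P, p):
    """P**p for an SPD matrix P via eigendecomposition (stays real, symmetric)."""
    w, V = np.linalg.eigh(P)
    return V @ np.diag(w ** p) @ V.T

def log_map(P_ref, X):
    """Project SPD matrix X onto the tangent space at P_ref:
    P^{1/2} logm(P^{-1/2} X P^{-1/2}) P^{1/2} (affine-invariant metric)."""
    sq, isq = _spd_pow(P_ref, 0.5), _spd_pow(P_ref, -0.5)
    inner = isq @ X @ isq
    w, V = np.linalg.eigh((inner + inner.T) / 2)  # symmetrize for stability
    return sq @ (V @ np.diag(np.log(w)) @ V.T) @ sq

def exp_map(P_ref, S):
    """Map a symmetric tangent matrix S back onto the SPD manifold."""
    sq, isq = _spd_pow(P_ref, 0.5), _spd_pow(P_ref, -0.5)
    inner = isq @ S @ isq
    w, V = np.linalg.eigh((inner + inner.T) / 2)
    return sq @ (V @ np.diag(np.exp(w)) @ V.T) @ sq

def tangent_vec(S):
    """Vectorize a symmetric tangent matrix: diagonal entries plus
    sqrt(2)-weighted upper-triangular entries (preserves the norm)."""
    iu = np.triu_indices(S.shape[0], k=1)
    return np.concatenate([np.diag(S), np.sqrt(2.0) * S[iu]])
```

By construction the two maps are inverses, so `exp_map(P_ref, log_map(P_ref, X))` recovers `X`, which makes the encode/decode round trip in Figure 1 geometry-consistent.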
Average balanced accuracy (%) across 12 subjects. Wilcoxon signed-rank test with Bonferroni correction (α=0.0083).
| Generator | Classifier | Baseline | Augmented | Δ Augmented | p-value | Synthetic-Only | Δ Synthetic | p-value |
|---|---|---|---|---|---|---|---|---|
| Prior | MDM | 59.5±5.5 | 58.9±5.4 | −0.59 | 0.092 | 58.4±5.0 | −1.16 | 0.043 |
| | KNN | 53.2±4.0 | 55.4±4.2 | +2.19 | 0.003 | 56.2±4.2 | +3.00 | 0.001 |
| | SVC | 60.7±5.3 | 57.4±6.3 | −3.24 | 0.016 | 56.8±6.4 | −3.92 | 0.002 |
| Posterior | MDM | 59.5±5.5 | 58.8±5.3 | −0.69 | 0.092 | 59.0±5.5 | −0.57 | 0.151 |
| | KNN | 53.2±4.0 | 55.6±4.1 | +2.45 | 0.002 | 56.7±4.1 | +3.49 | 0.002 |
| | SVC | 60.7±5.3 | 57.2±6.6 | −3.48 | 0.007 | 56.7±6.3 | −4.01 | 0.002 |
Across all folds, every synthetic EEG covariance matrix from both the prior and posterior generators passed symmetry and positive-definiteness checks, in contrast to the 40% failure rate of a standard Euclidean VAE.
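The validity criterion above can be checked cheaply: symmetry by comparison with the transpose, and positive-definiteness by attempting a Cholesky factorization. A minimal sketch (the function name is an illustrative assumption):

```python
import numpy as np

def is_valid_spd(M, tol=1e-10):
    """Return True iff M is a valid SPD covariance matrix:
    symmetric up to `tol`, and positive definite."""
    if not np.allclose(M, M.T, atol=tol):
        return False
    try:
        np.linalg.cholesky(M)  # succeeds iff M is positive definite
        return True
    except np.linalg.LinAlgError:
        return False
```

The Cholesky route is preferable to a full eigendecomposition here: it fails fast on non-PD inputs and costs roughly a third of an eigensolve.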
Posterior sampling yielded the largest classification gain for KNN (+3.49 % on average, p=0.002), and prior sampling was similarly beneficial (+3.00 % on average, p=0.001). Subject-level gains reached up to +7.8 %.
UMAP visualization reveals that latent codes from different subjects are heavily intermingled, indicating that the model learns generalised cross-subject representations via parallel transport alignment.
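Parallel transport alignment is commonly realised in MI-BCI by recentering each subject's covariance set so that its mean is transported to the identity. A hedged sketch of that recentering step, using the Euclidean mean as a simple stand-in for the Riemannian mean (the function name is hypothetical, not the authors' code):

```python
import numpy as np

def recenter_to_identity(covs):
    """Align one subject's SPD covariances by the congruence
    C -> M^{-1/2} C M^{-1/2}, where M is the subject's mean covariance
    (Euclidean mean here, as a simplification of the Riemannian mean).
    After alignment, the set's mean equals the identity, so data from
    different subjects share a common reference point."""
    M = np.mean(covs, axis=0)
    w, V = np.linalg.eigh(M)
    M_isq = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # M^{-1/2}, real and symmetric
    return np.array([M_isq @ C @ M_isq for C in covs])
```

Applying this per subject before tangent-space projection is what allows latent codes from different subjects to occupy a shared region of the latent space.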
With a noise scaling factor of σi=2.2 and γ=0.035, the statistical variance and geometric diversity of the synthetic data closely match those of the original data, without distorting SPD properties.
SVC performance degraded significantly with augmentation (up to −4.01 %, p=0.002), while MDM remained stable. The utility of data augmentation is therefore not universal: it depends on the classifier.
Beyond classification, synthetic EEG covariance generation enables privacy-preserving data sharing and pipeline scalability testing without requiring raw neural signal sharing.
Figure 2 — Prior Sampling Accuracy Improvements. Distribution of accuracy improvement for each classifier using the prior generator. The plot shows the percentage-point difference between the "Augmented" and "Synthetic-Only" conditions relative to the "Baseline" across all subjects. The red line marks the mean, whilst the blue line marks the median.
Figure 3 — Posterior Sampling Accuracy Improvements. Distribution of accuracy improvement for each classifier using the posterior generator, showing similar trends to the prior generator but with more pronounced fluctuations.
@inproceedings{polaka2026rgpvae,
title={Riemannian Geometry-Preserving Variational Autoencoder for {MI-BCI} Data Augmentation},
author={Poļaka, Viktorija and de Jong, Ivo Pascal and Sburlea, Andreea Ioana},
booktitle={Submitted to Graz Brain-Computer Interface Conference},
volume={10},
year={2026},
url={https://641e16.github.io/RGP-VAE/}
}