📄 GBCIC 2026

Riemannian Geometry-Preserving Variational Autoencoder
for MI-BCI Data Augmentation

Viktorija Poļaka  ·  Ivo Pascal De Jong  ·  Andreea Ioana Sburlea

Faculty of Science and Engineering · University of Groningen · Groningen, The Netherlands

Abstract

This paper addresses the challenge of generating synthetic electroencephalogram (EEG) covariance matrices for motor imagery brain-computer interface (MI-BCI) applications. We aim to develop a generative model capable of producing high-fidelity synthetic covariance matrices while preserving their symmetric positive-definite (SPD) nature. We propose a Riemannian Geometry-Preserving Variational Autoencoder (RGP-VAE) integrating geometric mappings with a composite loss function combining Riemannian distance, tangent space reconstruction accuracy and generative diversity. The model generates valid, representative EEG covariance matrices while learning a subject-invariant latent space. Synthetic data proves practically useful for MI-BCI, with its impact depending on the paired classifier. This work introduces and validates the RGP-VAE as a geometry-preserving generative model for EEG covariance matrices, highlighting its potential for signal privacy, scalability and data augmentation.

Model Architecture

RGP-VAE Architecture Diagram

Figure 1. An overview of the proposed RGP-VAE. An input SPD matrix Xi is projected onto the tangent space at a reference point Pref via the logarithmic map (logPref), vectorized, and fed to the encoder. The encoder produces a latent distribution (μ, log σ²) from which a latent vector zi is sampled. The decoder reconstructs the tangent representation, which is mapped back onto the SPD manifold via the exponential map (expPref).
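The logarithmic and exponential maps in Figure 1 can be sketched with plain NumPy under the affine-invariant metric. This is a minimal illustration, not the paper's implementation: the 4-channel toy matrix and the identity reference point are placeholders.

```python
import numpy as np

def _sym_fn(M, fn):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fn(w)) @ V.T

def log_map(X, P_ref):
    """Project the SPD matrix X onto the tangent space at P_ref."""
    P_isqrt = _sym_fn(P_ref, lambda w: w ** -0.5)
    P_sqrt = _sym_fn(P_ref, np.sqrt)
    return P_sqrt @ _sym_fn(P_isqrt @ X @ P_isqrt, np.log) @ P_sqrt

def exp_map(S, P_ref):
    """Map the tangent-space matrix S back onto the SPD manifold at P_ref."""
    P_isqrt = _sym_fn(P_ref, lambda w: w ** -0.5)
    P_sqrt = _sym_fn(P_ref, np.sqrt)
    return P_sqrt @ _sym_fn(P_isqrt @ S @ P_isqrt, np.exp) @ P_sqrt

# Round trip: mapping to the tangent space and back recovers the input matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4.0 * np.eye(4)      # toy SPD "covariance" matrix
P_ref = np.eye(4)                  # toy reference point
X_rec = exp_map(log_map(X, P_ref), P_ref)
print(np.allclose(X_rec, X))       # True
```

The tangent representation is then vectorized and fed to the encoder; the decoder's output is reshaped back into a symmetric matrix before the exponential map is applied.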

Quantitative Results

Average balanced accuracy (%) across 12 subjects. Significance assessed with a Wilcoxon signed-rank test under Bonferroni correction (α = 0.0083).

| Generator | Classifier | Baseline | Augmented | Δ Augmented | p-value | Synthetic-Only | Δ Synthetic | p-value |
|-----------|-----------|----------|-----------|-------------|---------|----------------|-------------|---------|
| Prior     | MDM | 59.5±5.5 | 58.9±5.4 | −0.59 | 0.092 | 58.4±5.0 | −1.16 | 0.043 |
| Prior     | KNN | 53.2±4.0 | 55.4±4.2 | +2.19 | 0.003 | 56.2±4.2 | +3.00 | 0.001 |
| Prior     | SVC | 60.7±5.3 | 57.4±6.3 | −3.24 | 0.016 | 56.8±6.4 | −3.92 | 0.002 |
| Posterior | MDM | 59.5±5.5 | 58.8±5.3 | −0.69 | 0.092 | 59.0±5.5 | −0.57 | 0.151 |
| Posterior | KNN | 53.2±4.0 | 55.6±4.1 | +2.45 | 0.002 | 56.7±4.1 | +3.49 | 0.002 |
| Posterior | SVC | 60.7±5.3 | 57.2±6.6 | −3.48 | 0.007 | 56.7±6.3 | −4.01 | 0.002 |

Key Findings

100% Valid SPD Matrices

Across all folds, every synthetic EEG covariance matrix from both the prior and posterior generators passed symmetry and positive-definiteness checks, in contrast to the 40% failure rate observed for a standard Euclidean VAE.
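A validity check of this kind can be implemented as a symmetry test plus a Cholesky factorisation, which succeeds exactly when a symmetric matrix is positive definite. This is a sketch; the function name and tolerance are our choices, not taken from the paper.

```python
import numpy as np

def is_valid_spd(M, tol=1e-10):
    """Return True iff M is symmetric and positive definite."""
    if not np.allclose(M, M.T, atol=tol):
        return False                       # not symmetric
    try:
        np.linalg.cholesky(M)              # fails iff M is not positive definite
        return True
    except np.linalg.LinAlgError:
        return False

A = np.random.default_rng(1).standard_normal((8, 8))
print(is_valid_spd(A @ A.T + 1e-3 * np.eye(8)))   # True: a genuine covariance
print(is_valid_spd(-np.eye(8)))                   # False: negative definite
```

Cholesky is a cheap and numerically robust positive-definiteness test, avoiding an explicit eigenvalue computation.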

KNN Benefits Significantly

Posterior sampling yielded the largest classification gain for KNN (+3.49 % on average, p=0.002), and prior sampling was similarly beneficial (+3.00 % on average, p=0.001). Subject-level gains reached up to +7.8 %.

Subject-Invariant Latent Space

UMAP visualization reveals that latent codes from different subjects are heavily intermingled, indicating that the model learns generalised cross-subject representations via parallel transport alignment.
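Under the affine-invariant metric, parallel transport alignment amounts to a congruence transform that moves each subject's covariances so that a common reference point sits at the identity. A minimal sketch follows; it uses the arithmetic mean as a cheap stand-in for the Riemannian mean, which the paper's actual alignment may compute differently.

```python
import numpy as np

def _sym_power(M, p):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.T

def recenter_subject(covs):
    """Transport a subject's SPD matrices so their mean lands at the identity.

    NOTE: uses the arithmetic mean as an illustrative stand-in; a faithful
    implementation would use the Riemannian (geometric) mean instead.
    """
    W = _sym_power(covs.mean(axis=0), -0.5)   # whitening transform M^{-1/2}
    return np.stack([W @ C @ W for C in covs])

# After re-centering, each subject's mean covariance equals the identity,
# so matrices from different subjects share a common origin on the manifold.
rng = np.random.default_rng(2)
covs = []
for _ in range(20):
    B = rng.standard_normal((6, 6))
    covs.append(B @ B.T + np.eye(6))       # toy per-trial covariances
covs = np.stack(covs)
aligned = recenter_subject(covs)
print(np.allclose(aligned.mean(axis=0), np.eye(6)))   # True
```

Because each transform is a congruence by an SPD matrix, the aligned outputs remain symmetric positive-definite.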

Realistic Diversity

With a noise scaling factor of σi = 2.2 and γ = 0.035, the statistical variance and geometric diversity of the synthetic data closely match those of the original data, without distorting SPD properties.
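In a VAE with reparameterised sampling, a noise scaling factor of this kind simply inflates the standard deviation of the latent draw. The sketch below is our assumption about where σi enters; γ is a loss weight in the paper and does not appear here.

```python
import numpy as np

def sample_latent(mu, log_var, noise_scale=2.2, rng=None):
    """Reparameterisation trick with an inflated noise scale:
    z = mu + noise_scale * sigma * eps, with eps ~ N(0, I).
    A larger noise_scale spreads samples further from the mean,
    increasing the diversity of the generated tangent vectors."""
    rng = rng if rng is not None else np.random.default_rng()
    eps = rng.standard_normal(np.shape(mu))
    return mu + noise_scale * np.exp(0.5 * np.asarray(log_var)) * eps

# With noise_scale = 1 this is standard VAE sampling; 2.2 widens the spread.
mu, log_var = np.zeros(16), np.zeros(16)
z = sample_latent(mu, log_var, noise_scale=2.2, rng=np.random.default_rng(3))
```

Because the scaled sample still lives in the tangent space, mapping it back through the exponential map always yields a valid SPD matrix, which is why diversity can be increased without breaking SPD properties.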

Classifier-Dependent Utility

SVC performance significantly degraded with augmentation (up to −4.01 %, p=0.002), while MDM remained stable. Data augmentation utility is not universal; it depends on the classifier.

Privacy & Scalability

Beyond classification, synthetic EEG covariance generation enables privacy-preserving data sharing and pipeline scalability testing without requiring raw neural signal sharing.

Results

Prior sampling accuracy improvements

Figure 2. Prior Sampling Accuracy Improvements. Distribution of accuracy improvement for each classifier using the prior generator. The plot shows the percentage-point difference of the "Augmented" and "Synthetic-Only" conditions relative to the "Baseline" across all subjects. The red line marks the mean and the blue line the median.

Posterior sampling accuracy improvements

Figure 3. Posterior Sampling Accuracy Improvements. Distribution of accuracy improvement for each classifier using the posterior generator, showing similar trends to the prior generator but with more pronounced fluctuations.

BibTeX

@inproceedings{polaka2026rgpvae,
  title={Riemannian Geometry-Preserving Variational Autoencoder for {MI-BCI} Data Augmentation},
  author={Poļaka, Viktorija and de Jong, Ivo Pascal and Sburlea, Andreea Ioana},
  booktitle={Submitted to Graz Brain-Computer Interface Conference},
  volume={10},
  year={2026},
  url={https://641e16.github.io/RGP-VAE/}
}