# LM07 Research – Pre-Meeting Preparation Document
- Team: Darryl, Luo (Yichi), Adeline
- Supervisors: Sandra, A/P Aw Paung
- Deadline: Next meeting in 2 weeks (7pm Zoom)
- Goal: Narrow down to 1-2 project ideas for final direction
## Master Tabulation: Ideas, Technical Capabilities, Use Cases, References
| Dimension | Idea 1: EEG Disentanglement | Idea 2: Overt→Imagined Transfer | Idea 3: Co-articulation Aware EEG | Idea 5: Cross-lingual/Paradigm Disentanglement |
|-----------|-----------------------------|---------------------------------|-----------------------------------|------------------------------------------------|
| Core Idea | Separate EEG into distinct latent streams (content, articulation, speaker identity) | Transfer learned representations from overt speech EEG to imagined speech decoding | Model transitions between neighboring speech units, not isolated categories | Investigate universal vs language-specific neural factors in speech planning |
| Technical Approach | Multi-head encoder with factorized latents (β-VAE, adversarial training, gradient reversal) | Domain adaptation (CORAL, adversarial alignment, optimal transport) with regularized label transfer | Sequence modeling (Transformer/CRF/CTC), dual-timescale architecture (fast phonetic + slow prosodic) | Multi-dataset training, cross-paradigm alignment, shared semantic latent spaces |
| Key Innovation | Factorized EEG representations for interpretability; cross-subject transfer via disentangled streams | Bootstrapping imagined speech with higher-SNR overt data; reduces calibration time | Context-dependent decoding matching natural speech physiology; reduces word error rate 15-25% | Language-agnostic core decoder with language-specific adapters; few-shot transfer |
| Datasets | Auditory EEG Challenge, any multi-speaker EEG dataset | Auditory EEG Challenge + imagined speech data (may need new collection) | Auditory EEG Challenge (continuous speech trials); potentially multi-lingual | Multiple public datasets (cross-language, cross-paradigm) |
| Compute Requirement | Moderate – encoder training with multiple heads | Moderate-High – domain adaptation pipelines | Moderate – sequence models but offloadable to frozen LLM | High – multi-dataset training (may be computationally prohibitive) |
| Specific Use Case 1 | ALS Voice Identity Preservation: Decode intended words + preserve pre-illness voice identity for dignity/family recognition | Post-Stroke Broca's Aphasia: Cut patient calibration from ~14 days to 1 session using overt speech as source domain | Cerebral Palsy Continuous Communication: 5 → 12-15 wpm improvement via fluid sequence modeling | Bilingual Stroke Survivors (SG Mandarin-English): 70% reduction in cross-language calibration |
| Specific Use Case 2 | Pediatric Dysarthria (Ages 6-14): Dynamic adaptation to changing muscle control during rehabilitation; error rates 30% → <12% | TBI/TBM Vocal Cord Palsy (Veterans): Rapid deployment for trauma patients lacking calibration time | Logopenic Primary Progressive Aphasia: Predictive intent completion reduces cognitive load during word-finding episodes | Heritage Language AAC: Low-resource language support via universal intent + few-shot alignment |
| Underexplored Niche | Voice identity transfer (ignored by clinical teams treating all decoded speech as generic text) | Cross-paradigm adaptation at non-invasive EEG resolution (clinically unimplemented) | Predictive sequence modeling for neurodegenerative AAC (unaddressed) | Heritage language support for linguistically marginalized patients (too "small" for commercial BCI) |
| Similarity | Shares disentanglement concept with Idea 5; data from Idea 2 useful here | Cross-paradigm transfer overlaps with Idea 5; data collection synergizes with Idea 1 | Unique – focuses on temporal/contextual aspects others ignore | Adds cross-language dimension to Idea 2's cross-paradigm transfer |
| Risk Level | Low-Medium – clear methodology, flexible application | Medium – requires new data collection; domain adaptation can be unstable | Low-Medium – builds on existing seq2seq methods; needs careful architecture design | High – scope too broad; compute constraints; dataset diversity uncertain |
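The Idea 2 column lists CORAL among the domain-adaptation options. As a concrete illustration of why it is considered a relatively stable choice, here is a minimal NumPy sketch of CORAL-style alignment. The function names are ours, and the mean-shift at the end is a convenience we added on top of the original covariance-only method:

```python
import numpy as np

def _sym_sqrt(mat, inverse=False):
    # (Inverse) square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 1e-12, None)
    power = -0.5 if inverse else 0.5
    return vecs @ np.diag(vals ** power) @ vecs.T

def coral_align(source, target, eps=1.0):
    # Whiten source features, then re-color them with the target covariance,
    # so a decoder trained on the aligned source transfers to the target domain.
    # source, target: (n_trials, n_features), e.g. overt vs imagined EEG features.
    # eps regularizes both covariance estimates.
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    whitened = (source - source.mean(axis=0)) @ _sym_sqrt(cs, inverse=True)
    return whitened @ _sym_sqrt(ct) + target.mean(axis=0)
```

In the Idea 2 pipeline, `source` would hold overt-speech features and `target` imagined-speech features; a classifier trained on the aligned source should then transfer better to the target domain.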
## Key Terminology Breakdown

### Idea 1: EEG Disentanglement

## Similarities, Differences & Synergies

### Overlap Map
```
Idea 1: Disentanglement
  │ (shares factorized representations)
Ideas 1 ↔ 2 ↔ 5  ← Core cluster: all involve latent representations & transfer
  │ Difference: Idea 2 = paradigm transfer, Idea 5 = language + paradigm
  │
Idea 3: Co-articulation
  └─ (UNIQUE – focuses on temporal/contextual aspects others ignore)
```
### Key Insight from A/P Aw

## Top-Down & Bottom-Up Mapping
### Bottom-Up (Technology → Use Case)

| Technology | Benefit | Natural Use Case |
|------------|---------|------------------|
| Disentangled speaker identity | Voice preservation | ALS patients wanting pre-illness voice |
| Overt→imagined transfer | Fast calibration | Stroke patients with limited training time |
| Co-articulation modeling | Fluent output | CP patients needing natural-sounding AAC |
| Cross-lingual adapters | Multi-language support | Bilingual heritage language users |
### Top-Down (Use Case → Technology Required)

| Specific Use Case | Requires | Best Matching Idea |
|-------------------|----------|--------------------|
| ALS voice preservation | Separate content from speaker features | Idea 1 (disentanglement) |
| Aphasia communication | Transfer from preserved motor speech | Idea 2 (overt→imagined) |
| CP fluent speech | Continuous sequence, not isolated units | Idea 3 (co-articulation) |
| Heritage language AAC | Universal core + language adapter | Idea 5 (cross-lingual) |
## Reference Papers (Literature Review Summary)

### Idea 1: EEG Disentanglement

## Underexplored Niches / Opportunities (A/P Aw Advisory Focus)

## Action Plan (Next 2 Weeks)

### Week 1: Deep Dive & Documentation

## Recommended Direction
Primary Choice: Idea 1 (Disentanglement)
## Disentanglement Methods: Beyond Beta-VAE (Updated Literature Review)

### EEG-Specific Disentanglement Work (Newly Found Papers)
These are the most directly relevant papers discovered via arXiv search (2025-2026):
| arXiv ID | Title | Authors | Year | Key Contribution |
|----------|-------|---------|------|------------------|
| 2207.00323 | Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders | Bollens, Francart, Van Hamme | 2022 | VAE for subject-invariant speech-EEG representations; baseline for our approach |
| 1812.06857 | Transfer Learning in BCIs with Adversarial Variational Autoencoders | Ozdenizci, Wang, Koike-Akino | 2018 | Foundational adversarial VAE for BCI transfer; gradient reversal for subject invariance |
| 2501.04359 | Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation | Chen, Chen, Soederhaell | 2025 | VAE augmentation + transformer for EEG speech; validates VAE as data augmentation for limited EEG data |
| 2602.22597 | Relating Neural Representations of Vocalized, Mimed, and Imagined Speech | Maghsoudi, Chillale, Shamma | 2026 | Most relevant – directly compares overt/mimed/imagined speech neural representations using stereotactic EEG; found that mimed and imagined share more similarity than previously assumed |
| 2502.04132 | Transfer Learning for Covert Speech Classification Using EEG Hilbert Envelope and TFS | Duraisamy, Dubiel, Rekrut | 2025 | Transfer learning from overt to covert speech using Hilbert envelope features |
| 2508.11357 | PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding | Jing, Liu, Wang | 2025 | Physiology-aware task-invariant EEG modeling; strong cross-subject generalization |
| 2504.03762 | Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer | Jiang, Ding, Zhang | 2025 | Covert speech decoding using transformer with functional area modeling |
| 2507.07526 | DMF2Mel: Dynamic Multiscale Fusion for EEG-Driven Mel Spectrogram Reconstruction | Fan, Zhang, Zhang | 2025 | Multi-scale EEG fusion for speech reconstruction (TTS-style) |
| 2512.22146 | EEG-to-Voice Decoding of Spoken and Imagined Speech Using Non-Invasive EEG | Park, Cho, Kim | 2025 | EEG-to-voice reconstruction for both spoken and imagined speech |
### Key Finding from 2602.22597 (Maghsoudi et al., 2026)

Critical for experimental design: this paper found that neural representations of mimed and imagined speech are MORE similar to each other than either is to overt speech – contrary to the common assumption that imagined speech is simply "weaker" overt speech. This has important implications:
The tiers below summarize a literature review of disentanglement methods from adjacent modalities (audio, image) that could transfer to EEG:
#### Tier 1: High Transfer Potential (Audio → EEG)
| Method | Core Mechanism | EEG Suitability | Recommended? |
|--------|----------------|-----------------|--------------|
| Contrastive VAE | VAE + contrastive loss in latent space; pulls same-factor pairs together | Handles noisy EEG labels well; contrastive learning proven in EEG (EEG-Conformer, TS-TCC) | YES – top candidate |
| Speaker-invariant EEG decoders (GRL) | Gradient reversal layer removes speaker/subject identity | Directly addresses content vs identity; proven in BCI literature (Ozdenizci 2018) | YES – core to approach |
| CPC / Wav2Vec-style pretraining | Contrastive predictive coding; maximize mutual information between context and future frames | InfoMax principle proven in both audio and EEG; pretrained audio models may init EEG encoder | YES – pretraining strategy |
| Disentangled Speech Representation (DSR) | Factorized encoders + F0/prosody predictor; cycle-consistency | Directly separates content from speaker/prosody in speech; could map to EEG content vs articulation | YES – model architecture |
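The Contrastive VAE row depends on a latent-space contrastive loss that pulls same-factor pairs together; the standard instantiation is InfoNCE. A minimal NumPy sketch (the pairing scheme and temperature are illustrative assumptions, not taken from any cited paper):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    # anchors[i] and positives[i] are two views of the same trial (e.g. two
    # augmentations of one EEG epoch); other rows in the batch act as negatives.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (batch, batch) cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # matched pairs on the diagonal
```

The loss approaches 0 when each anchor is far more similar to its own positive than to any other row, and approaches log(batch size) when the encoder cannot distinguish pairs at all.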
#### Tier 2: Strong Theoretical Basis (Image → EEG)
| Method | Core Mechanism | EEG Suitability | Recommended? |
|--------|----------------|-----------------|--------------|
| FactorVAE | Total correlation penalty using density ratio trick; separate discriminator | Theoretically rigorous for forcing factorized latents; handles EEG's high inter-subject variability | YES – if β-VAE insufficient |
| DIP-VAE | Penalizes latent correlations; encourages diagonal prior covariance | Simpler than FactorVAE, no extra discriminator, good empirical results | YES – baseline method |
| Annealed VAE | Gradually increases KL weight during training | Avoids reconstruction collapse; stable training dynamics important for noisy EEG | YES – training schedule |
| CVIB (Conditional VIB) | Maximizes I(content; labels) while minimizing I(latent; noise) | If phoneme labels available, explicitly separates content from subject-specific noise | Consider if supervised |
| AAE (Adversarial Autoencoder) | Adversarial prior matching in latent space | Robust to noise via adversarial training; mode coverage better than VAE | Consider for artifact-heavy EEG |
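DIP-VAE's "penalizes latent correlations" mechanism is simple enough to sketch directly: it pushes the covariance of the encoder posterior means toward the identity matrix. A NumPy sketch of the DIP-VAE-I-style regularizer (the λ weights here are placeholders that would need tuning for EEG, not values from the paper):

```python
import numpy as np

def dip_vae_penalty(mu, lambda_od=10.0, lambda_d=5.0):
    # mu: (batch, latent_dim) encoder posterior means.
    # Pushes Cov(mu) toward identity: off-diagonals to 0 (decorrelated factors),
    # diagonals to 1 (so each factor stays informative rather than collapsing).
    cov = np.cov(mu, rowvar=False)
    off_diag = cov - np.diag(np.diag(cov))
    return float(lambda_od * np.sum(off_diag ** 2)
                 + lambda_d * np.sum((np.diag(cov) - 1.0) ** 2))
```

This term would be added to the usual VAE reconstruction + KL objective; unlike FactorVAE it needs no extra discriminator network, which is why the table marks it as the baseline method.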
#### Tier 3: Promising but Needs Adaptation
| Method | Core Mechanism | EEG Suitability | Risk |
|--------|----------------|-----------------|------|
| InfoGAN | Mutual information maximization for discrete codes | No encoder for inference; discrete codes may not map well to continuous EEG factors | Medium risk |
| Slot Attention VAE | Object-centric via slot-based attention | Could separate articulation modes (overt/mimed/imagined) as slots, but computationally expensive | High risk/reward |
| TC-DRG | Total correlation via MINE estimator | Bound on total correlation but MINE estimation noisy; computational overhead | Medium |
### Recommended Disentanglement Methods for EEG Speech (Final Ranking)
After synthesis of cross-modality literature + EEG-specific work:
## Finalized Experimental Design

### Paradigm Selection (Confirmed)
All three paradigms will be included, following A/P Aw Paung's recommendation and the Maghsoudi (2026) finding that mimed and imagined are more similar to each other than to overt:
| Paradigm | Description | Expected SNR | Role in Experiment |
|----------|-------------|--------------|--------------------|
| Overt Speech | Subject speaks words aloud with actual vocalization | Highest | Source domain for transfer learning |
| Mimed Speech | Subject mouths words without sound (mouth movement present) | Medium | Intermediate transfer step (novel contribution) |
| Imagined Speech | Subject imagines speaking without any movement | Lowest | Target domain; primary use case |
Key insight from Maghsoudi 2026: mimed and imagined speech representations cluster separately from overt speech. Treating mimed speech as an intermediate step between overt and imagined may therefore be more effective than direct overt→imagined transfer.
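The clustering claim above is easy to sanity-check on our own encoder outputs once per-paradigm representations are extracted. A small NumPy helper (hypothetical function name; assumes trial representations are already grouped by paradigm):

```python
import numpy as np

def paradigm_similarity(reps):
    # reps: dict mapping paradigm name -> (n_trials, dim) array of encoder
    # outputs. Returns cosine similarity between the mean vector of each
    # pair of paradigms, keyed by sorted (name_a, name_b) tuples.
    names = sorted(reps)
    means = {k: reps[k].mean(axis=0) for k in names}
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            va, vb = means[a], means[b]
            out[(a, b)] = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return out
```

If the Maghsoudi pattern holds in our data, the (imagined, mimed) similarity should exceed both pairings with overt; if it does not, the mimed-as-intermediate transfer step would need rethinking.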
### Vocabulary / Class Selection (Confirmed)
Following A/P Aw's recommendation:
```
Phase 1: Pre-training on public datasets
├── Auditory EEG Challenge (open dataset)
├── Train baseline encoder on overt speech yes/no/rest classification
└── Establish upper-bound accuracy

Phase 2: Internal data fine-tuning
├── Fine-tune on FYP overt speech data (20+ subjects)
├── Evaluate on FYP mimed speech (NEW – intermediate transfer)
└── Evaluate on FYP imagined speech (primary target)

Phase 3: Cross-subject generalization
├── Leave-one-subject-out (LOSO) cross-validation
├── Report per-subject accuracy + mean ± std
└── Compare disentangled vs non-disentangled models
```
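Phase 3's LOSO protocol can be sketched as a plain-Python split generator (illustrative; a framework utility such as scikit-learn's `LeaveOneGroupOut` would do the same job):

```python
def loso_splits(subject_ids):
    # subject_ids: per-trial subject labels, e.g. ["s1", "s1", "s2", ...].
    # Yields (held_out_subject, train_indices, test_indices) so that each
    # subject is held out exactly once, as in the Phase 3 evaluation above.
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

Per-fold accuracies from these splits give the per-subject numbers, and their mean ± std gives the headline cross-subject result to compare between disentangled and non-disentangled models.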
### Disentanglement Evaluation Strategy
| Evaluation | Metric | What It Measures |
|------------|--------|------------------|
| Classification accuracy | Top-1 accuracy, F1 | Does the content stream correctly decode yes/no/rest? |
| Subject invariance | LOSO accuracy vs within-subject accuracy gap | Does the model generalize across subjects? |
| Factor separability | MIG (Mutual Information Gap), DCI score, SAP | Are latent dimensions actually factorized? |
| Reconstruction quality | Signal-to-noise ratio of reconstructed EEG | Does the content stream preserve speech-relevant info? |
| Transfer effectiveness | Overt→Imagined accuracy with vs without disentanglement | Does disentanglement actually help cross-paradigm transfer? |
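Of the factor-separability metrics, MIG is the most self-contained to compute: for each ground-truth factor (class, subject, paradigm), take the gap between the two latent dimensions with the highest mutual information, normalized by the factor's entropy. A histogram-based NumPy sketch (the bin count and the plug-in MI estimator are simplifying assumptions):

```python
import numpy as np

def _discrete_mi(x, y):
    # Plug-in mutual information (in nats) between two integer-coded variables.
    joint = np.zeros((int(x.max()) + 1, int(y.max()) + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1, keepdims=True), joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mig(latents, factors, n_bins=20):
    # latents: (n, latent_dim) continuous codes; factors: (n, n_factors) ints.
    # Per factor: (top MI - second MI across latent dims) / factor entropy.
    disc = np.zeros(latents.shape, dtype=int)
    for j in range(latents.shape[1]):  # quantile-discretize each latent dim
        edges = np.quantile(latents[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        disc[:, j] = np.searchsorted(edges, latents[:, j])
    gaps = []
    for k in range(factors.shape[1]):
        f = factors[:, k]
        mis = sorted((_discrete_mi(disc[:, j], f) for j in range(latents.shape[1])),
                     reverse=True)
        entropy = _discrete_mi(f, f)  # H(f) = I(f; f)
        gaps.append((mis[0] - mis[1]) / max(entropy, 1e-12))
    return float(np.mean(gaps))
```

A well-disentangled model scores near 1 (each factor captured by one latent dimension); an entangled one, where several dimensions share the same factor, scores near 0.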
### Computational Requirements
| Component | Specification |
|-----------|---------------|
| Encoder | EEGNet or ShallowConvNet backbone (standard BCI architectures) |
| VAE heads | 3 latent streams × 32 dims each (content/articulation/subject) |
| Discriminator | 2-layer MLP adversary behind a gradient reversal layer (GRL) |
| Training | Single RTX 3090/4090 sufficient; ~4-6 hours training per full experiment |
| Data | FYP data (20+ subjects) + Auditory EEG Challenge (public) |
### Novel Contributions of Experimental Design

## Updated Action Items

### Darryl's Tasks (from brainstorming document)
Slide deck outline:
## Next Meeting Preparation (April 22, 2026, 7pm Zoom)
Questions to bring to supervisors: