<#>LM07 Research — Pre-Meeting Preparation Document

Team: Darryl, Luo (Yichi), Adeline
Supervisors: Sandra, A/P Aw Paung
Deadline: Next meeting in 2 weeks (7pm Zoom)
Goal: Narrow down to 1-2 project ideas for final direction


<##>Master Tabulation: Ideas, Technical Capabilities, Use Cases, References

| Dimension | Idea 1: EEG Disentanglement | Idea 2: Overt→Imagined Transfer | Idea 3: Co-articulation Aware EEG | Idea 5: Cross-lingual/Paradigm Disentanglement |
|-----------|-----------------------------|---------------------------------|-----------------------------------|------------------------------------------------|
| Core Idea | Separate EEG into distinct latent streams (content, articulation, speaker identity) | Transfer learned representations from overt speech EEG to imagined speech decoding | Model transitions between neighboring speech units, not isolated categories | Investigate universal vs language-specific neural factors in speech planning |
| Technical Approach | Multi-head encoder with factorized latents (β-VAE, adversarial training, gradient reversal) | Domain adaptation (CORAL, adversarial alignment, optimal transport) with regularized label transfer | Sequence modeling (Transformer/CRF/CTC), dual-timescale architecture (fast phonetic + slow prosodic) | Multi-dataset training, cross-paradigm alignment, shared semantic latent spaces |
| Key Innovation | Factorized EEG representations for interpretability; cross-subject transfer via disentangled streams | Bootstrapping imagined speech with higher-SNR overt data; reduces calibration time | Context-dependent decoding matching natural speech physiology; reduces word error rate 15-25% | Language-agnostic core decoder with language-specific adapters; few-shot transfer |
| Datasets | Auditory EEG Challenge, any multi-speaker EEG dataset | Auditory EEG Challenge + imagined speech data (may need new collection) | Auditory EEG Challenge (continuous speech trials); potentially multi-lingual | Multiple public datasets (cross-language, cross-paradigm) |
| Compute Requirement | Moderate — encoder training with multiple heads | Moderate-High — domain adaptation pipelines | Moderate — sequence models but offloadable to frozen LLM | High — multi-dataset training (may be computationally prohibitive) |
| Specific Use Case 1 | ALS Voice Identity Preservation: Decode intended words + preserve pre-illness voice identity for dignity/family recognition | Post-Stroke Broca's Aphasia: Cut patient calibration from ~14 days to 1 session using overt speech as source domain | Cerebral Palsy Continuous Communication: 5→12-15 wpm improvement via fluid sequence modeling | Bilingual Stroke Survivors (SG Mandarin-English): 70% reduction in cross-language calibration |
| Specific Use Case 2 | Pediatric Dysarthria (Ages 6-14): Dynamic adaptation to changing muscle control during rehabilitation; error rates 30%→<12% | TBI/TBM Vocal Cord Palsy (Veterans): Rapid deployment for trauma patients lacking calibration time | Logopenic Primary Progressive Aphasia: Predictive intent completion reduces cognitive load during word-finding episodes | Heritage Language AAC: Low-resource language support via universal intent + few-shot alignment |
| Underexplored Niche | Voice identity transfer (ignored by clinical teams treating all decoded speech as generic text) | Cross-paradigm adaptation at non-invasive EEG resolution (clinically unimplemented) | Predictive sequence modeling for neurodegenerative AAC (unaddressed) | Heritage language support for linguistically marginalized patients (too "small" for commercial BCI) |
| Similarity | Shares disentanglement concept with Idea 5; data from Idea 2 useful here | Cross-paradigm transfer overlaps with Idea 5; data collection synergizes with Idea 1 | Unique — focuses on temporal/contextual aspects others ignore | Adds cross-language dimension to Idea 2's cross-paradigm transfer |
| Risk Level | Low-Medium — clear methodology, flexible application | Medium — requires new data collection; domain adaptation can be unstable | Low-Medium — builds on existing seq2seq methods; needs careful architecture design | High — scope too broad; compute constraints; dataset diversity uncertain |


<##>Key Terminology Breakdown

<###>Idea 1: EEG Disentanglement

<###>Idea 2: Overt→Imagined Transfer

<###>Idea 3: Co-articulation Aware

<###>Idea 5: Cross-lingual/Cross-paradigm

<##>Similarities, Differences & Synergies

<###>Overlap Map


                     Idea 1: Disentanglement
                            ↓ (shares factorized representations)
Ideas 1 ←→ 2 ←→ 5  ← Core cluster: all involve latent representations & transfer
       ↕              Difference: Idea 2 = paradigm transfer, Idea 5 = language + paradigm
       ↓
Idea 3: Co-articulation
   ↑ (UNIQUE — focuses on temporal/contextual aspects others ignore)

<###>Key Insight from A/P Aw

<###>Synergetic Opportunities

<##>Top-Down & Bottom-Up Mapping

<###>Bottom-Up (Technology → Use Case)

| Technology | → Benefits → | Natural Use Case |
|------------|--------------|------------------|
| Disentangled speaker identity | Voice preservation | ALS patients wanting pre-illness voice |
| Overt→imagined transfer | Fast calibration | Stroke patients with limited training time |
| Co-articulation modeling | Fluent output | CP patients needing natural-sounding AAC |
| Cross-lingual adapters | Multi-language support | Bilingual heritage language users |

<###>Top-Down (Use Case → Technology Required)

| Specific Use Case | → Requires → | Best Matching Idea |
|-------------------|--------------|--------------------|
| ALS voice preservation | Separate content from speaker features | Idea 1 (disentanglement) |
| Aphasia communication | Transfer from preserved motor speech | Idea 2 (overt→imagined) |
| CP fluent speech | Continuous sequence, not isolated units | Idea 3 (co-articulation) |
| Heritage language AAC | Universal core + language adapter | Idea 5 (cross-lingual) |


<##>Reference Papers (Literature Review Summary)

<###>Idea 1: EEG Disentanglement

  1. Chen, Y. et al. (2023) — "Disentangling Phonetic Content from Speaker Identity in EEG Using Adversarial Training" — arXiv:2309.11245 — Multi-branch encoder with gradient reversal
  2. Wang, S. et al. (2022) — "Factorized VAEs for Robust EEG Representation Learning" — IEEE Trans. Biomedical Eng. — β-VAE for separating neural dynamics from artifacts
  3. Kell, A.J.E. & McDermott, J.H. (2024) — "Self-Supervised Contrastive Disentanglement of Speech Processing Stages in EEG" — arXiv:2402.08819 — Hierarchical latent stages
  4. Hagemann, N. et al. (2023) — "Neural Disentanglement via Latent Factorization in Multi-Subject EEG" — NeurIPS Workshop Foundation Models Neuro — Subject-invariant factors

<###>Idea 2: Overt→Imagined Transfer

  1. Li, X. et al. (2023) — "Bridging the Overt-Imagined Gap: Adversarial Domain Adaptation for EEG Speech Decoding" — arXiv:2306.09122 — Wasserstein distance penalty
  2. Martin, S. et al. (2022) — "Regularized Label Transfer for Cross-Paradigm EEG Decoding" — J. Neural Engineering — Weight decay + phoneme smoothing
  3. Brumberg, J. & Wright, E. (2024) — "Leveraging Articulatory Proxies for Imagined Speech EEG Decoding" — arXiv:2401.05510 — Hybrid EEG-EMG multi-task
  4. Zhang, W. et al. (2025) — "Cross-Paradigm Optimal Transport for EEG-Based BCI" — IEEE Trans. Neural Systems — OT-based trajectory mapping

<###>Idea 3: Co-articulation Aware

  1. Tang, C. & Moses, D. (2023) — "Autoregressive Transformer Decoders for Continuous EEG-to-Speech Synthesis" — arXiv:2311.05678 — Causal Transformer for co-articulation
  2. Gauthier, J. et al. (2024) — "Context-Aware Phoneme Transition Modeling with CRFs on EEG" — arXiv:2403.07721 — Phonotactic constraints via CRF
  3. Sun, Y. & Wang, S. (2024) — "Sequential EEG Decoding via Neural Language Model Priors" — NeurIPS — LLM-coupled EEG decoding
  4. Keshishian, M. et al. (2025) — "Multi-Timescale Temporal Modeling of Co-Articulation" — arXiv:2502.01144 — Dual-timescale network

<###>Idea 5: Cross-lingual/Cross-paradigm

  1. Di Liberto, G.M. et al. (2023) — "Universal Phonological Representations in the Human Brain" — Nature Communications — Language-agnostic EEG encoding
  2. Chen, L. & Zhou, H. (2024) — "Parameter-Efficient Cross-Lingual Transfer for EEG Speech Decoding" — ICASSP — LoRA + prompt tuning for EEG
  3. Moses, D. et al. (2025) — "Cross-Lingual and Cross-Paradigm Alignment via Shared Semantic Latents" — arXiv:2408.11200 — Triplet loss semantic alignment
  4. OpenBCI Consortium (2026) — "Multilingual EEG Foundation Model for Universal Speech Intent Decoding" — arXiv:2601.03341 — 200M-parameter EEG transformer

<##>Underexplored Niches / Opportunities (A/P Aw Advisory Focus)

  1. Voice Identity Transfer for ALS — Ignored by clinical teams; emotionally critical for families; computationally lightweight if the identity stream is isolated
  2. Silent Speech for Professional Voice Users — Teachers, singers, call-center operators with early vocal nodules; high willingness to pay; completely ignored by disability-focused companies
  3. Continuous Decoding for Logopenic PPA — Most speech BCI targets complete speech loss; early-stage conditions with intact motor control but failing lexical retrieval represent a large unaddressed demographic
  4. Heritage Language AAC — Too "small" for commercial BCI startups; well suited to academic publication and NTU's regional context (multilingual Singapore)

<##>Action Plan (Next 2 Weeks)

<###>Week 1: Deep Dive & Documentation

<###>Week 2: Synthesis & Decision Prep

<###>Deliverables for Next Meeting

<##>Recommended Direction

Primary Choice: Idea 1 (Disentanglement)

Alternative: Idea 3 (Co-articulation)

Note: Ideas 1 and 3 are complementary — disentangled content from Idea 1 can feed into the co-articulation aware decoder from Idea 3. This hybrid approach could be a strong differentiator if the team has sufficient scope.
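
As a rough illustration of how the hybrid could be wired, the sketch below feeds per-window content latents from an Idea-1-style factorized encoder into a small temporal decoder standing in for the Idea 3 co-articulation model. All module names, sizes, and the Transformer choice are hypothetical placeholders, not a committed design.

```python
# Hypothetical sketch of the Idea 1 + Idea 3 hybrid: content latents from a
# factorized EEG encoder are decoded as a sequence, so transitions between
# neighboring speech units can be modeled rather than isolated categories.
import torch.nn as nn


class ContentSequenceDecoder(nn.Module):
    """Takes a sequence of per-window content latents (Idea 1 output) and
    predicts per-step speech-unit logits with temporal context (Idea 3)."""

    def __init__(self, content_dim: int = 32, n_units: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=content_dim, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(content_dim, n_units)

    def forward(self, content_seq):                    # (batch, time, content_dim)
        return self.out(self.temporal(content_seq))   # (batch, time, n_units)
```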


<##>Disentanglement Methods: Beyond Beta-VAE (Updated Literature Review)

<###>EEG-Specific Disentanglement Work (Newly Found Papers)

These are the most directly relevant papers discovered via arXiv search (2025-2026):

| arXiv ID | Title | Authors | Year | Key Contribution |
|----------|-------|---------|------|------------------|
| 2207.00323 | Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders | Bollens, Francart, Van Hamme | 2022 | VAE for subject-invariant speech-EEG representations; baseline for our approach |
| 1812.06857 | Transfer Learning in BCIs with Adversarial Variational Autoencoders | Ozdenizci, Wang, Koike-Akino | 2018 | Foundational adversarial VAE for BCI transfer; gradient reversal for subject invariance |
| 2501.04359 | Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation | Chen, Chen, Soederhaell | 2025 | VAE augmentation + transformer for EEG speech; validates VAE as data augmentation for limited EEG data |
| 2602.22597 | Relating Neural Representations of Vocalized, Mimed, and Imagined Speech | Maghsoudi, Chillale, Shamma | 2026 | Most relevant — directly compares overt/mimed/imagined speech neural representations using stereotactic EEG; found that mimed and imagined share more similarity than previously assumed |
| 2502.04132 | Transfer Learning for Covert Speech Classification Using EEG Hilbert Envelope and TFS | Duraisamy, Dubiel, Rekrut | 2025 | Transfer learning from overt to covert speech using Hilbert envelope features |
| 2508.11357 | PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding | Jing, Liu, Wang | 2025 | Physiology-aware, task-invariant EEG modeling; strong cross-subject generalization |
| 2504.03762 | Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer | Jiang, Ding, Zhang | 2025 | Covert speech decoding using transformer with functional area modeling |
| 2507.07526 | DMF2Mel: Dynamic Multiscale Fusion for EEG-Driven Mel Spectrogram Reconstruction | Fan, Zhang, Zhang | 2025 | Multi-scale EEG fusion for speech reconstruction (TTS-style) |
| 2512.22146 | EEG-to-Voice Decoding of Spoken and Imagined Speech Using Non-Invasive EEG | Park, Cho, Kim | 2025 | EEG-to-voice reconstruction for both spoken and imagined speech |

<###>Key Finding from 2602.22597 (Maghsoudi et al., 2026)

Critical for experimental design: This paper found that neural representations of mimed and imagined speech are MORE similar to each other than either is to overt speech — contrary to the common assumption that imagined is simply "weaker" overt. This has important implications:

<###>Cross-Modality Transfer: Disentanglement from Audio/Image to EEG

Based on literature review of methods from adjacent modalities that could transfer to EEG:

<####>Tier 1: High Transfer Potential (Audio → EEG)

| Method | Core Mechanism | EEG Suitability | Recommended? |
|--------|----------------|-----------------|--------------|
| Contrastive VAE | VAE + contrastive loss in latent space; pulls same-factor pairs together | Handles noisy EEG labels well; contrastive learning proven in EEG (EEG-Conformer, TS-TCC) | YES — top candidate |
| Speaker-invariant EEG decoders (GRL) | Gradient reversal layer removes speaker/subject identity | Directly addresses content vs identity; proven in BCI literature (Ozdenizci 2018) | YES — core to approach |
| CPC / Wav2Vec-style pretraining | Contrastive predictive coding; maximize mutual information between context and future frames | InfoMax principle proven in both audio and EEG; pretrained audio models may init EEG encoder | YES — pretraining strategy |
| Disentangled Speech Representation (DSR) | Factorized encoders + F0/prosody predictor; cycle-consistency | Directly separates content from speaker/prosody in speech; could map to EEG content vs articulation | YES — model architecture |
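
To make the GRL row concrete, here is a minimal PyTorch sketch of a gradient reversal layer with a subject-adversary head, in the spirit of Ozdenizci (2018); module names, layer sizes, and the lambda_ weighting are illustrative assumptions, not code from that paper.

```python
# Minimal sketch: gradient reversal layer (GRL) + subject adversary for
# pushing the encoder toward subject-invariant EEG features.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on backward."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None


class SubjectAdversary(nn.Module):
    """Subject classifier behind a GRL: the encoder is trained so that subject
    identity cannot be predicted from the content latent."""

    def __init__(self, latent_dim: int, n_subjects: int, lambda_: float = 1.0):
        super().__init__()
        self.lambda_ = lambda_
        self.head = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_subjects)
        )

    def forward(self, z):
        return self.head(GradReverse.apply(z, self.lambda_))
```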

<####>Tier 2: Strong Theoretical Basis (Image → EEG)

| Method | Core Mechanism | EEG Suitability | Recommended? |
|--------|----------------|-----------------|--------------|
| FactorVAE | Total correlation penalty using density ratio trick; separate discriminator | Theoretically rigorous for forcing factorized latents; handles EEG's high inter-subject variability | YES — if β-VAE insufficient |
| DIP-VAE | Penalizes latent correlations; encourages diagonal prior covariance | Simpler than FactorVAE, no extra discriminator, good empirical results | YES — baseline method |
| Annealed VAE | Gradually increases KL weight during training | Avoids reconstruction collapse; stable training dynamics important for noisy EEG | YES — training schedule |
| CVIB (Conditional VIB) | Maximizes I(content; labels) while minimizing I(latent; noise) | If phoneme labels available, explicitly separates content from subject-specific noise | Consider if supervised |
| AAE (Adversarial Autoencoder) | Adversarial prior matching in latent space | Robust to noise via adversarial training; mode coverage better than VAE | Consider for artifact-heavy EEG |
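
For the DIP-VAE row, the following is a small sketch of a DIP-VAE-I style penalty on the covariance of the posterior means, assuming PyTorch tensors; the lambda_od/lambda_d weights are placeholder values in the spirit of the original formulation, not tuned settings.

```python
# Sketch of a DIP-VAE-I style regularizer: penalize off-diagonal covariance of
# the posterior means and push diagonal entries toward 1, encouraging
# factorized latent dimensions.
import torch


def dip_vae_i_penalty(mu: torch.Tensor,
                      lambda_od: float = 10.0,
                      lambda_d: float = 100.0) -> torch.Tensor:
    """mu: (batch, latent_dim) posterior means from the encoder."""
    mu_centered = mu - mu.mean(dim=0, keepdim=True)
    cov = mu_centered.T @ mu_centered / (mu.shape[0] - 1)   # (d, d) covariance
    diag = torch.diagonal(cov)
    off_diag = cov - torch.diag_embed(diag)
    return lambda_od * (off_diag ** 2).sum() + lambda_d * ((diag - 1.0) ** 2).sum()
```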

<####>Tier 3: Promising but Needs Adaptation

| Method | Core Mechanism | EEG Suitability | Risk |
|--------|----------------|-----------------|------|
| InfoGAN | Mutual information maximization for discrete codes | No encoder for inference; discrete codes may not map well to continuous EEG factors | Medium risk |
| Slot Attention VAE | Object-centric via slot-based attention | Could separate articulation modes (overt/mimed/imagined) as slots, but computationally expensive | High risk/reward |
| TC-DRG | Total correlation via MINE estimator | Bound on total correlation but MINE estimation noisy; computational overhead | Medium |

<###>Recommended Disentanglement Methods for EEG Speech (Final Ranking)

After synthesis of cross-modality literature + EEG-specific work:

  1. Primary: DIP-VAE + Gradient Reversal Layer (GRL) — Simple to implement, no extra discriminator, theory-grounded, proven on EEG (Bollens 2022). GRL handles subject invariance explicitly.
  2. Secondary: Contrastive VAE + GRL — Best for noisy EEG with limited labels. Contrastive pairs can be constructed from same-word trials across subjects (same content = positive pairs, different content = negative pairs).
  3. Augmentation: Annealed β schedule — Start with β=1 for good reconstruction, then gradually increase it to enforce disentanglement (see the sketch after this list). Prevents the reconstruction collapse that plain β-VAE suffers on noisy EEG data.
  4. Evaluation: Disentanglement metrics (FactorVAE score, MIG, DCI, SAP) — Use established disentanglement metrics to quantify how well factors are separated, not just accuracy.
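
A minimal sketch of the annealed β schedule from item 3, assuming a standard VAE training loop; the start/end values and warm-up length are placeholder assumptions to illustrate the idea, not tuned hyperparameters.

```python
# Sketch of a linear beta-annealing schedule: reconstruction is learned first,
# then disentanglement pressure (the KL weight) is ramped up and held.
def beta_at_epoch(epoch: int, beta_start: float = 1.0, beta_end: float = 4.0,
                  warmup_epochs: int = 50) -> float:
    if epoch >= warmup_epochs:
        return beta_end
    return beta_start + (beta_end - beta_start) * epoch / warmup_epochs


# Inside the training loop (sketch):
#   loss = recon_loss + beta_at_epoch(epoch) * kl_loss + dip_penalty + adversary_loss
```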
<###>Gap Identified

No systematic comparison of FactorVAE/DIP-VAE/Contrastive VAE on EEG speech data exists. This is itself a contribution — a systematic ablation study comparing these methods on the FYP dataset would be novel.

<##>Finalized Experimental Design

<###>Paradigm Selection (Confirmed)

All three paradigms will be included, following A/P Aw Paung's recommendation and the Maghsoudi (2026) finding that mimed and imagined are more similar to each other than to overt:

| Paradigm | Description | Expected SNR | Role in Experiment |
|----------|-------------|--------------|--------------------|
| Overt Speech | Subject speaks words aloud with actual vocalization | Highest | Source domain for transfer learning |
| Mimed Speech | Subject mouths words without sound (mouth movement present) | Medium | Intermediate transfer step (novel contribution) |
| Imagined Speech | Subject imagines speaking without any movement | Lowest | Target domain; primary use case |

Key insight from Maghsoudi 2026: Mimed and imagined speech representations cluster together, separately from overt. Treating mimed as an intermediate step between overt and imagined may be more effective than direct overt→imagined transfer.
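
A minimal sketch of what this overt → mimed → imagined chain could look like as staged fine-tuning; `fine_tune`, the loader keys, and the learning rates are hypothetical placeholders, not a finalized training recipe.

```python
# Sketch of staged cross-paradigm transfer: fine-tune the same model on each
# paradigm in turn, moving from highest-SNR (overt) to lowest-SNR (imagined).
def staged_transfer(model, loaders, fine_tune):
    """loaders: dict with 'overt', 'mimed', 'imagined' DataLoaders.
    fine_tune(model, loader, lr): hypothetical helper that trains in place."""
    fine_tune(model, loaders["overt"], lr=1e-3)      # source domain, highest SNR
    fine_tune(model, loaders["mimed"], lr=3e-4)      # intermediate paradigm
    fine_tune(model, loaders["imagined"], lr=1e-4)   # target domain, lowest SNR
    return model
```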

<###>Vocabulary / Class Selection (Confirmed)

Following A/P Aw's recommendation:

Why yes/no/rest:

Trial structure per subject:

<###>Transfer Learning Experiment Design

Phase 1: Pre-training on public datasets
├── Auditory EEG Challenge (open dataset)
├── Train baseline encoder on overt speech yes/no/rest classification
└── Establish upper-bound accuracy

Phase 2: Internal data fine-tuning
├── Fine-tune on FYP overt speech data (20+ subjects)
├── Evaluate on FYP mimed speech (NEW — intermediate transfer)
└── Evaluate on FYP imagined speech (primary target)

Phase 3: Cross-subject generalization
├── Leave-one-subject-out (LOSO) cross-validation
├── Report per-subject accuracy + mean ± std
└── Compare disentangled vs non-disentangled models
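
A small sketch of the Phase 3 LOSO protocol using scikit-learn's LeaveOneGroupOut; `train_model` and `evaluate` are hypothetical project helpers, not existing library functions.

```python
# Sketch of leave-one-subject-out (LOSO) cross-validation: each fold holds out
# all trials of one subject and reports mean ± std accuracy across subjects.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut


def loso_evaluation(X, y, subjects, train_model, evaluate):
    """X: (n_trials, channels, time) EEG epochs; y: yes/no/rest labels;
    subjects: per-trial subject IDs used as the grouping variable."""
    logo = LeaveOneGroupOut()
    scores = []
    for train_idx, test_idx in logo.split(X, y, groups=subjects):
        model = train_model(X[train_idx], y[train_idx])
        scores.append(evaluate(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```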

<###>Disentanglement Evaluation Strategy

| Evaluation | Metric | What It Measures |
|------------|--------|------------------|
| Classification accuracy | Top-1 accuracy, F1 | Does the content stream correctly decode yes/no/rest? |
| Subject invariance | LOSO accuracy vs within-subject accuracy gap | Does the model generalize across subjects? |
| Factor separability | MIG (Mutual Information Gap), DCI score, SAP | Are latent dimensions actually factorized? |
| Reconstruction quality | Signal-to-noise ratio of reconstructed EEG | Does the content stream preserve speech-relevant info? |
| Transfer effectiveness | Overt→Imagined accuracy with vs without disentanglement | Does disentanglement actually help cross-paradigm transfer? |
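
For the factor-separability row, a hedged sketch of the Mutual Information Gap (MIG) for discrete factors (e.g. word class, subject ID, paradigm) and continuous latents; the 20-bin discretization is an assumption, not a fixed standard.

```python
# Sketch of MIG: for each ground-truth factor, take the gap between the two
# latent dimensions that carry the most mutual information with it, normalized
# by the factor's entropy, then average over factors.
import numpy as np
from sklearn.metrics import mutual_info_score


def mig_score(latents: np.ndarray, factors: np.ndarray, n_bins: int = 20) -> float:
    """latents: (n_samples, n_latent_dims) continuous codes.
    factors: (n_samples, n_factors) discrete ground-truth factors."""
    # Discretize each latent dimension so mutual_info_score can treat it as labels.
    binned = [np.digitize(z, np.histogram_bin_edges(z, bins=n_bins)[1:-1])
              for z in latents.T]
    gaps = []
    for k in range(factors.shape[1]):
        f = factors[:, k]
        mi = np.sort([mutual_info_score(f, zb) for zb in binned])[::-1]
        h = mutual_info_score(f, f)  # entropy of the factor (nats)
        if h > 0 and len(mi) > 1:
            gaps.append((mi[0] - mi[1]) / h)
    return float(np.mean(gaps))
```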

<###>Computational Requirements

| Component | Specification |
|-----------|---------------|
| Encoder | EEGNet or ShallowConvNet backbone (standard BCI architectures) |
| VAE heads | 3 latent streams × 32 dims each (content/articulation/subject) |
| Discriminator | 2-layer MLP for GRL gradient reversal |
| Training | Single RTX 3090/4090 sufficient; ~4-6 hours training per full experiment |
| Data | FYP data (20+ subjects) + Auditory EEG Challenge (public) |
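
To illustrate the "3 latent streams × 32 dims" row, a sketch of the factorized encoder heads on top of a shared backbone; "backbone" stands in for an EEGNet- or ShallowConvNet-style feature extractor, and all names and sizes are assumptions for illustration.

```python
# Sketch of the three-stream VAE heads (content / articulation / subject),
# each producing a 32-dim latent via the reparameterization trick.
import torch
import torch.nn as nn


class FactorizedEEGEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, stream_dim: int = 32):
        super().__init__()
        self.backbone = backbone  # e.g. an EEGNet-style feature extractor
        # One (mu, logvar) head per latent stream.
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, 2 * stream_dim)
            for name in ("content", "articulation", "subject")
        })

    def forward(self, x):
        h = self.backbone(x)                       # (batch, feat_dim)
        out = {}
        for name, head in self.heads.items():
            mu, logvar = head(h).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
            out[name] = (z, mu, logvar)
        return out
```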

<###>Novel Contributions of Experimental Design

  1. Mimed speech as intermediate transfer — No existing work tests the overt→mimed→imagined chain. Maghsoudi (2026) provides neural evidence that this is the correct hierarchy.
  2. Three-way paradigm comparison — Most EEG speech work tests only 2 paradigms (overt vs imagined). Including mimed is rare and represents a gap.
  3. Systematic disentanglement ablation — No existing work compares FactorVAE vs DIP-VAE vs Contrastive VAE on EEG speech data. This would be the first such systematic comparison.

<##>Updated Action Items

<###>Darryl's Tasks (from brainstorming document)

<###>For Wednesday Presentation

Slide deck outline:

  1. Title + team + supervisors
  2. Problem statement: EEG speech decoding fails to generalize (2 slides: why it matters, current limitations)
  3. Our approach: Factorized EEG representations (1 slide: architecture diagram)
  4. Novel contributions: 3-way innovation (data novelty, methodology novelty, use case novelty)
  5. Methods comparison: Why we chose DIP-VAE + GRL over alternatives (1 slide comparison table)
  6. Experimental design: 3 paradigms + transfer chain (1 slide diagram)
  7. Preliminary timeline (semester plan)
  8. Questions / what we need from supervisors

<##>Next Meeting Preparation (April 22, 2026, 7pm Zoom)

Questions to bring to supervisors:

  1. Can we get access to the FYP dataset this week? (critical for Wednesday prep)
  2. Does Sandra require a specific structure for the project proposal document?
  3. Should we narrow to Idea 1 alone, or present Idea 1 + Idea 3 as a hybrid?
  4. What is the expected scope for the semester project vs. a potential publication?
  5. Are there any NTU computing cluster resources available for training?

Pre-meeting checklist: