# LM07 Research – Pre-Meeting Preparation Document
- Team: Darryl, Luo (Yichi), Adeline
- Supervisors: Sandra, A/P Aw Paung
- Deadline: Next meeting in 2 weeks (7pm Zoom)
- Goal: Narrow down to 1-2 project ideas for final direction
## Master Tabulation: Ideas, Technical Capabilities, Use Cases, References
| Dimension | Idea 1: EEG Disentanglement | Idea 2: Overt→Imagined Transfer | Idea 3: Co-articulation Aware EEG | Idea 5: Cross-lingual/Paradigm Disentanglement |
|-----------|-----------------------------|---------------------------------|-----------------------------------|------------------------------------------------|
| Core Idea | Separate EEG into distinct latent streams (content, articulation, speaker identity) | Transfer learned representations from overt speech EEG to imagined speech decoding | Model transitions between neighboring speech units, not isolated categories | Investigate universal vs language-specific neural factors in speech planning |
| Technical Approach | Multi-head encoder with factorized latents (β-VAE, adversarial training, gradient reversal) | Domain adaptation (CORAL, adversarial alignment, optimal transport) with regularized label transfer | Sequence modeling (Transformer/CRF/CTC), dual-timescale architecture (fast phonetic + slow prosodic) | Multi-dataset training, cross-paradigm alignment, shared semantic latent spaces |
| Key Innovation | Factorized EEG representations for interpretability; cross-subject transfer via disentangled streams | Bootstrapping imagined speech with higher-SNR overt data; reduces calibration time | Context-dependent decoding matching natural speech physiology; reduces word error rate 15-25% | Language-agnostic core decoder with language-specific adapters; few-shot transfer |
| Datasets | Auditory EEG Challenge, any multi-speaker EEG dataset | Auditory EEG Challenge + imagined speech data (may need new collection) | Auditory EEG Challenge (continuous speech trials); potentially multi-lingual | Multiple public datasets (cross-language, cross-paradigm) |
| Compute Requirement | Moderate – encoder training with multiple heads | Moderate-High – domain adaptation pipelines | Moderate – sequence models but offloadable to frozen LLM | High – multi-dataset training (may be computationally prohibitive) |
| Specific Use Case 1 | ALS Voice Identity Preservation: Decode intended words + preserve pre-illness voice identity for dignity/family recognition | Post-Stroke Broca's Aphasia: Cut patient calibration from ~14 days to 1 session using overt speech as source domain | Cerebral Palsy Continuous Communication: 5 → 12-15 wpm improvement via fluid sequence modeling | Bilingual Stroke Survivors (SG Mandarin-English): 70% reduction in cross-language calibration |
| Specific Use Case 2 | Pediatric Dysarthria (Ages 6-14): Dynamic adaptation to changing muscle control during rehabilitation; error rates 30% → <12% | TBI/TBM Vocal Cord Palsy (Veterans): Rapid deployment for trauma patients lacking calibration time | Logopenic Primary Progressive Aphasia: Predictive intent completion reduces cognitive load during word-finding episodes | Heritage Language AAC: Low-resource language support via universal intent + few-shot alignment |
| Underexplored Niche | Voice identity transfer (ignored by clinical teams treating all decoded speech as generic text) | Cross-paradigm adaptation at non-invasive EEG resolution (clinically unimplemented) | Predictive sequence modeling for neurodegenerative AAC (unaddressed) | Heritage language support for linguistically marginalized patients (too "small" for commercial BCI) |
| Similarity | Shares disentanglement concept with Idea 5; data from Idea 2 useful here | Cross-paradigm transfer overlaps with Idea 5; data collection synergizes with Idea 1 | Unique – focuses on temporal/contextual aspects others ignore | Adds cross-language dimension to Idea 2's cross-paradigm transfer |
| Risk Level | Low-Medium – clear methodology, flexible application | Medium – requires new data collection; domain adaptation can be unstable | Low-Medium – builds on existing seq2seq methods; needs careful architecture design | High – scope too broad; compute constraints; dataset diversity uncertain |
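The Idea 2 column lists CORAL among the domain-adaptation options. As a concrete illustration of why it is considered a relatively stable choice, here is a minimal NumPy sketch of CORAL-style alignment. The function names are ours, and the mean-shift at the end is a convenience we added on top of the original covariance-only method:

```python
import numpy as np

def _sym_sqrt(mat, inverse=False):
    # (Inverse) square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 1e-12, None)
    power = -0.5 if inverse else 0.5
    return vecs @ np.diag(vals ** power) @ vecs.T

def coral_align(source, target, eps=1.0):
    # Whiten source features, then re-color them with the target covariance,
    # so a decoder trained on the aligned source transfers to the target domain.
    # source, target: (n_trials, n_features), e.g. overt vs imagined EEG features.
    # eps regularizes both covariance estimates.
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    whitened = (source - source.mean(axis=0)) @ _sym_sqrt(cs, inverse=True)
    return whitened @ _sym_sqrt(ct) + target.mean(axis=0)
```

In the Idea 2 pipeline, `source` would hold overt-speech features and `target` imagined-speech features; a classifier trained on the aligned source should then transfer better to the target domain.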
## Key Terminology Breakdown

### Idea 1: EEG Disentanglement

## Similarities, Differences & Synergies

### Overlap Map
```
Idea 1: Disentanglement
  │ (shares factorized representations)
Ideas 1 ↔ 2 ↔ 5  ← Core cluster: all involve latent representations & transfer
  │ Difference: Idea 2 = paradigm transfer, Idea 5 = language + paradigm
  │
Idea 3: Co-articulation
  └─ (UNIQUE – focuses on temporal/contextual aspects others ignore)
```
### Key Insight from A/P Aw

## Top-Down & Bottom-Up Mapping
### Bottom-Up (Technology → Use Case)

| Technology | Benefit | Natural Use Case |
|------------|---------|------------------|
| Disentangled speaker identity | Voice preservation | ALS patients wanting pre-illness voice |
| Overt→imagined transfer | Fast calibration | Stroke patients with limited training time |
| Co-articulation modeling | Fluent output | CP patients needing natural-sounding AAC |
| Cross-lingual adapters | Multi-language support | Bilingual heritage language users |
### Top-Down (Use Case → Technology Required)

| Specific Use Case | Requires | Best Matching Idea |
|-------------------|----------|--------------------|
| ALS voice preservation | Separate content from speaker features | Idea 1 (disentanglement) |
| Aphasia communication | Transfer from preserved motor speech | Idea 2 (overt→imagined) |
| CP fluent speech | Continuous sequence, not isolated units | Idea 3 (co-articulation) |
| Heritage language AAC | Universal core + language adapter | Idea 5 (cross-lingual) |
## Reference Papers (Literature Review Summary)

### Idea 1: EEG Disentanglement

## Underexplored Niches / Opportunities (A/P Aw Advisory Focus)

## Action Plan (Next 2 Weeks)

### Week 1: Deep Dive & Documentation

## Recommended Direction
Primary Choice: Idea 1 (Disentanglement)
## Disentanglement Methods: Beyond Beta-VAE (Updated Literature Review)

### EEG-Specific Disentanglement Work (Newly Found Papers)
These are the most directly relevant papers discovered via arXiv search (2025-2026):
| arXiv ID | Title | Authors | Year | Key Contribution |
|----------|-------|---------|------|------------------|
| 2207.00323 | Learning Subject-Invariant Representations from Speech-Evoked EEG Using Variational Autoencoders | Bollens, Francart, Van Hamme | 2022 | VAE for subject-invariant speech-EEG representations; baseline for our approach |
| 1812.06857 | Transfer Learning in BCIs with Adversarial Variational Autoencoders | Ozdenizci, Wang, Koike-Akino | 2018 | Foundational adversarial VAE for BCI transfer; gradient reversal for subject invariance |
| 2501.04359 | Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation | Chen, Chen, Soederhaell | 2025 | VAE augmentation + transformer for EEG speech; validates VAE as data augmentation for limited EEG data |
| 2602.22597 | Relating Neural Representations of Vocalized, Mimed, and Imagined Speech | Maghsoudi, Chillale, Shamma | 2026 | Most relevant – directly compares overt/mimed/imagined speech neural representations using stereotactic EEG; found that mimed and imagined share more similarity than previously assumed |
| 2502.04132 | Transfer Learning for Covert Speech Classification Using EEG Hilbert Envelope and TFS | Duraisamy, Dubiel, Rekrut | 2025 | Transfer learning from overt to covert speech using Hilbert envelope features |
| 2508.11357 | PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding | Jing, Liu, Wang | 2025 | Physiology-aware task-invariant EEG modeling; strong cross-subject generalization |
| 2504.03762 | Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer | Jiang, Ding, Zhang | 2025 | Covert speech decoding using transformer with functional area modeling |
| 2507.07526 | DMF2Mel: Dynamic Multiscale Fusion for EEG-Driven Mel Spectrogram Reconstruction | Fan, Zhang, Zhang | 2025 | Multi-scale EEG fusion for speech reconstruction (TTS-style) |
| 2512.22146 | EEG-to-Voice Decoding of Spoken and Imagined Speech Using Non-Invasive EEG | Park, Cho, Kim | 2025 | EEG-to-voice reconstruction for both spoken and imagined speech |
### Key Finding from 2602.22597 (Maghsoudi et al., 2026)

Critical for experimental design: this paper found that neural representations of mimed and imagined speech are MORE similar to each other than either is to overt speech – contrary to the common assumption that imagined speech is simply "weaker" overt speech. This has important implications:
The tiers below summarize a literature review of disentanglement methods from adjacent modalities (audio, image) that could transfer to EEG:
#### Tier 1: High Transfer Potential (Audio → EEG)
| Method | Core Mechanism | EEG Suitability | Recommended? |
|--------|----------------|-----------------|--------------|
| Contrastive VAE | VAE + contrastive loss in latent space; pulls same-factor pairs together | Handles noisy EEG labels well; contrastive learning proven in EEG (EEG-Conformer, TS-TCC) | YES – top candidate |
| Speaker-invariant EEG decoders (GRL) | Gradient reversal layer removes speaker/subject identity | Directly addresses content vs identity; proven in BCI literature (Ozdenizci 2018) | YES – core to approach |
| CPC / Wav2Vec-style pretraining | Contrastive predictive coding; maximize mutual information between context and future frames | InfoMax principle proven in both audio and EEG; pretrained audio models may init EEG encoder | YES – pretraining strategy |
| Disentangled Speech Representation (DSR) | Factorized encoders + F0/prosody predictor; cycle-consistency | Directly separates content from speaker/prosody in speech; could map to EEG content vs articulation | YES – model architecture |
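The Contrastive VAE row depends on a latent-space contrastive loss that pulls same-factor pairs together; the standard instantiation is InfoNCE. A minimal NumPy sketch (the pairing scheme and temperature are illustrative assumptions, not taken from any cited paper):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    # anchors[i] and positives[i] are two views of the same trial (e.g. two
    # augmentations of one EEG epoch); other rows in the batch act as negatives.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (batch, batch) cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))            # matched pairs on the diagonal
```

The loss approaches 0 when each anchor is far more similar to its own positive than to any other row, and approaches log(batch size) when the encoder cannot distinguish pairs at all.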
#### Tier 2: Strong Theoretical Basis (Image → EEG)
| Method | Core Mechanism | EEG Suitability | Recommended? |
|--------|----------------|-----------------|--------------|
| FactorVAE | Total correlation penalty using density ratio trick; separate discriminator | Theoretically rigorous for forcing factorized latents; handles EEG's high inter-subject variability | YES – if β-VAE insufficient |
| DIP-VAE | Penalizes latent correlations; encourages diagonal prior covariance | Simpler than FactorVAE, no extra discriminator, good empirical results | YES – baseline method |
| Annealed VAE | Gradually increases KL weight during training | Avoids reconstruction collapse; stable training dynamics important for noisy EEG | YES – training schedule |
| CVIB (Conditional VIB) | Maximizes I(content; labels) while minimizing I(latent; noise) | If phoneme labels available, explicitly separates content from subject-specific noise | Consider if supervised |
| AAE (Adversarial Autoencoder) | Adversarial prior matching in latent space | Robust to noise via adversarial training; mode coverage better than VAE | Consider for artifact-heavy EEG |
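DIP-VAE's "penalizes latent correlations" mechanism is simple enough to sketch directly: it pushes the covariance of the encoder posterior means toward the identity matrix. A NumPy sketch of the DIP-VAE-I-style regularizer (the λ weights here are placeholders that would need tuning for EEG, not values from the paper):

```python
import numpy as np

def dip_vae_penalty(mu, lambda_od=10.0, lambda_d=5.0):
    # mu: (batch, latent_dim) encoder posterior means.
    # Pushes Cov(mu) toward identity: off-diagonals to 0 (decorrelated factors),
    # diagonals to 1 (so each factor stays informative rather than collapsing).
    cov = np.cov(mu, rowvar=False)
    off_diag = cov - np.diag(np.diag(cov))
    return float(lambda_od * np.sum(off_diag ** 2)
                 + lambda_d * np.sum((np.diag(cov) - 1.0) ** 2))
```

This term would be added to the usual VAE reconstruction + KL objective; unlike FactorVAE it needs no extra discriminator network, which is why the table marks it as the baseline method.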
#### Tier 3: Promising but Needs Adaptation
| Method | Core Mechanism | EEG Suitability | Risk |
|--------|----------------|-----------------|------|
| InfoGAN | Mutual information maximization for discrete codes | No encoder for inference; discrete codes may not map well to continuous EEG factors | Medium risk |
| Slot Attention VAE | Object-centric via slot-based attention | Could separate articulation modes (overt/mimed/imagined) as slots, but computationally expensive | High risk/reward |
| TC-DRG | Total correlation via MINE estimator | Bound on total correlation but MINE estimation noisy; computational overhead | Medium |
### Recommended Disentanglement Methods for EEG Speech (Final Ranking)
After synthesis of cross-modality literature + EEG-specific work:
## Finalized Experimental Design

### Paradigm Selection (Confirmed)
All three paradigms will be included, following A/P Aw Paung's recommendation and the Maghsoudi (2026) finding that mimed and imagined are more similar to each other than to overt:
| Paradigm | Description | Expected SNR | Role in Experiment |
|----------|-------------|--------------|--------------------|
| Overt Speech | Subject speaks words aloud with actual vocalization | Highest | Source domain for transfer learning |
| Mimed Speech | Subject mouths words without sound (mouth movement present) | Medium | Intermediate transfer step (novel contribution) |
| Imagined Speech | Subject imagines speaking without any movement | Lowest | Target domain; primary use case |
Key insight from Maghsoudi 2026: mimed and imagined speech representations cluster separately from overt speech. Treating mimed speech as an intermediate step between overt and imagined may therefore be more effective than direct overt→imagined transfer.
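The clustering claim above is easy to sanity-check on our own encoder outputs once per-paradigm representations are extracted. A small NumPy helper (hypothetical function name; assumes trial representations are already grouped by paradigm):

```python
import numpy as np

def paradigm_similarity(reps):
    # reps: dict mapping paradigm name -> (n_trials, dim) array of encoder
    # outputs. Returns cosine similarity between the mean vector of each
    # pair of paradigms, keyed by sorted (name_a, name_b) tuples.
    names = sorted(reps)
    means = {k: reps[k].mean(axis=0) for k in names}
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            va, vb = means[a], means[b]
            out[(a, b)] = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return out
```

If the Maghsoudi pattern holds in our data, the (imagined, mimed) similarity should exceed both pairings with overt; if it does not, the mimed-as-intermediate transfer step would need rethinking.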
### Vocabulary / Class Selection (Confirmed)
Following A/P Aw's recommendation:
```
Phase 1: Pre-training on public datasets
├── Auditory EEG Challenge (open dataset)
├── Train baseline encoder on overt speech yes/no/rest classification
└── Establish upper-bound accuracy

Phase 2: Internal data fine-tuning
├── Fine-tune on FYP overt speech data (20+ subjects)
├── Evaluate on FYP mimed speech (NEW – intermediate transfer)
└── Evaluate on FYP imagined speech (primary target)

Phase 3: Cross-subject generalization
├── Leave-one-subject-out (LOSO) cross-validation
├── Report per-subject accuracy + mean ± std
└── Compare disentangled vs non-disentangled models
```
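Phase 3's LOSO protocol can be sketched as a plain-Python split generator (illustrative; a framework utility such as scikit-learn's `LeaveOneGroupOut` would do the same job):

```python
def loso_splits(subject_ids):
    # subject_ids: per-trial subject labels, e.g. ["s1", "s1", "s2", ...].
    # Yields (held_out_subject, train_indices, test_indices) so that each
    # subject is held out exactly once, as in the Phase 3 evaluation above.
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test
```

Per-fold accuracies from these splits give the per-subject numbers, and their mean ± std gives the headline cross-subject result to compare between disentangled and non-disentangled models.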
### Disentanglement Evaluation Strategy
| Evaluation | Metric | What It Measures |
|------------|--------|------------------|
| Classification accuracy | Top-1 accuracy, F1 | Does the content stream correctly decode yes/no/rest? |
| Subject invariance | LOSO accuracy vs within-subject accuracy gap | Does the model generalize across subjects? |
| Factor separability | MIG (Mutual Information Gap), DCI score, SAP | Are latent dimensions actually factorized? |
| Reconstruction quality | Signal-to-noise ratio of reconstructed EEG | Does the content stream preserve speech-relevant info? |
| Transfer effectiveness | Overt→Imagined accuracy with vs without disentanglement | Does disentanglement actually help cross-paradigm transfer? |
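Of the factor-separability metrics, MIG is the most self-contained to compute: for each ground-truth factor (class, subject, paradigm), take the gap between the two latent dimensions with the highest mutual information, normalized by the factor's entropy. A histogram-based NumPy sketch (the bin count and the plug-in MI estimator are simplifying assumptions):

```python
import numpy as np

def _discrete_mi(x, y):
    # Plug-in mutual information (in nats) between two integer-coded variables.
    joint = np.zeros((int(x.max()) + 1, int(y.max()) + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1, keepdims=True), joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mig(latents, factors, n_bins=20):
    # latents: (n, latent_dim) continuous codes; factors: (n, n_factors) ints.
    # Per factor: (top MI - second MI across latent dims) / factor entropy.
    disc = np.zeros(latents.shape, dtype=int)
    for j in range(latents.shape[1]):  # quantile-discretize each latent dim
        edges = np.quantile(latents[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        disc[:, j] = np.searchsorted(edges, latents[:, j])
    gaps = []
    for k in range(factors.shape[1]):
        f = factors[:, k]
        mis = sorted((_discrete_mi(disc[:, j], f) for j in range(latents.shape[1])),
                     reverse=True)
        entropy = _discrete_mi(f, f)  # H(f) = I(f; f)
        gaps.append((mis[0] - mis[1]) / max(entropy, 1e-12))
    return float(np.mean(gaps))
```

A well-disentangled model scores near 1 (each factor captured by one latent dimension); an entangled one, where several dimensions share the same factor, scores near 0.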
### Computational Requirements
| Component | Specification |
|-----------|---------------|
| Encoder | EEGNet or ShallowConvNet backbone (standard BCI architectures) |
| VAE heads | 3 latent streams × 32 dims each (content/articulation/subject) |
| Discriminator | 2-layer MLP adversary behind a gradient reversal layer (GRL) |
| Training | Single RTX 3090/4090 sufficient; ~4-6 hours training per full experiment |
| Data | FYP data (20+ subjects) + Auditory EEG Challenge (public) |
### Novel Contributions of Experimental Design

## Updated Action Items

### Darryl's Tasks (from brainstorming document)
Slide deck outline:
## Next Meeting Preparation (April 22, 2026, 7pm Zoom)
Questions to bring to supervisors: