EEG Time‑Frequency Images Predict SSRI and rTMS Response

03/10/2026

The authors describe a pretreatment resting-EEG deep-learning pipeline that converts baseline recordings into time–frequency EEG images and evaluates models that classify patients as responders or non-responders to SSRI and rTMS therapy.

Performance is presented at both the image level (treating each epoch-derived image as a sample) and the subject level (aggregating across images from the same participant), emphasizing that results depend on the unit of analysis. Overall, the work is framed as predicting depression-therapy response using only pretreatment resting EEG.

Across the two datasets, the paper describes 5-minute eyes-closed resting-state EEG acquired with 19 channels, followed by preprocessing with a 0.5–70 Hz band-pass filter and a 50 Hz notch filter. Artifact mitigation is reported using EEGLAB procedures and Multiscale Principal Component Analysis (MSPCA) denoising, with segmentation into non-overlapping 15-second epochs. From each channel and epoch, the authors generate three CNN-ready image modalities: continuous wavelet transform (CWT) scalograms; variational mode decomposition (VMD)–derived spectrogram images created by decomposing each epoch into K=20 intrinsic mode functions and accumulating their short-time Fourier transform representations; and pixel-wise fused images produced by averaging normalized CWT and VMD images.
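As a rough illustration of the epoching and fusion steps described above, the following sketch splits one channel into non-overlapping 15-second epochs and averages two normalized images pixel by pixel. The 250 Hz sampling rate, the 224×224 image size, the function names, and the min-max normalization are assumptions made for illustration; the article does not state which sampling rate or normalization the authors used, and random arrays stand in for real CWT and VMD images.

```python
import numpy as np

def segment_epochs(signal, fs, epoch_sec=15):
    """Split a 1-D signal into non-overlapping epochs; drop any trailing remainder."""
    samples_per_epoch = int(fs * epoch_sec)
    n_epochs = len(signal) // samples_per_epoch
    trimmed = signal[: n_epochs * samples_per_epoch]
    return trimmed.reshape(n_epochs, samples_per_epoch)

def minmax_normalize(image):
    """Scale an image to [0, 1]; a constant image maps to all zeros."""
    image = np.asarray(image, dtype=float)
    span = image.max() - image.min()
    if span == 0:
        return np.zeros_like(image)
    return (image - image.min()) / span

def fuse_images(cwt_image, vmd_image):
    """Pixel-wise average of two normalized images of the same shape."""
    if cwt_image.shape != vmd_image.shape:
        raise ValueError("images must have the same shape")
    return 0.5 * (minmax_normalize(cwt_image) + minmax_normalize(vmd_image))

fs = 250  # assumed sampling rate in Hz (not given in the article)
rng = np.random.default_rng(0)
channel = rng.standard_normal(fs * 300)  # 5 minutes of placeholder data
epochs = segment_epochs(channel, fs)
print(epochs.shape)  # (20, 3750)

# Placeholder images standing in for one epoch's CWT and VMD representations.
cwt_img = rng.random((224, 224))
vmd_img = rng.random((224, 224))
fused = fuse_images(cwt_img, vmd_img)
print(fused.shape)
```

With the reported 5-minute recordings and 15-second epochs, each channel yields 20 epochs, so 19 channels would give up to 380 images per participant per modality before any artifact rejection.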

For classification, the study fine-tunes multiple pretrained architectures end-to-end on the generated images, including ResNet-18, MobileNet-V3, EfficientNet-B0, and a TinyViT-Hybrid model. In the image-level evaluation, the top SSRI result is reported with ResNet-18 trained on CWT images (99.43% image-level accuracy), while the top rTMS result is reported with ResNet-18 trained on VMD images (98.77% image-level accuracy). The authors also compare model families and image modalities (including the fused representation) to describe how performance varies by representation and network. In their reported results, the most discriminative time–frequency representation differed between SSRI and rTMS.

Beyond image-based testing, the authors describe a stricter generalization assessment using subject-independent 6-fold cross-validation, where all images from a participant are held out together and a subject label is assigned by majority voting across that participant’s images. Under this subject-level approach, the paper reports accuracies of 82.50% for SSRI and 83.53% for rTMS, reflecting a different evaluation granularity than image-level discrimination. The study presents this split as approximating performance on previously unseen individuals rather than additional images from known subjects. In the authors’ results, subject-level generalization was more modest than image-level classification.
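The subject-level procedure can be sketched in a few lines. This is a minimal illustration, not the authors' code: the round-robin fold assignment, the tie-breaking behavior, the function names, and the toy predictions are all assumptions, and the sketch omits class stratification and model training entirely.

```python
from collections import Counter, defaultdict

def subject_folds(subject_ids, n_folds=6):
    """Assign each unique subject to exactly one fold (round-robin, an assumption)."""
    unique = sorted(set(subject_ids))
    return {subj: i % n_folds for i, subj in enumerate(unique)}

def majority_vote(image_subjects, image_preds):
    """Aggregate per-image predictions into one predicted label per subject."""
    votes = defaultdict(list)
    for subj, pred in zip(image_subjects, image_preds):
        votes[subj].append(pred)
    # Ties go to the label seen first; a real pipeline would need an explicit rule.
    return {s: Counter(p).most_common(1)[0][0] for s, p in votes.items()}

def subject_accuracy(subject_preds, subject_labels):
    """Fraction of subjects whose voted label matches the true label."""
    correct = sum(subject_preds[s] == subject_labels[s] for s in subject_preds)
    return correct / len(subject_preds)

# Toy data: three hypothetical subjects with three images each (1 = responder).
image_subjects = ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3"]
image_preds = [1, 1, 0, 0, 0, 1, 1, 0, 0]
subject_labels = {"s1": 1, "s2": 0, "s3": 1}

folds = subject_folds(image_subjects, n_folds=3)
voted = majority_vote(image_subjects, image_preds)
acc = subject_accuracy(voted, subject_labels)
print(folds)
print(voted)
print(round(acc, 4))
```

Because every image from a participant shares that participant's fold, no subject contributes images to both the training and test sides of a split, which is the property that separates this evaluation from image-level testing.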

To explore spatial contributions, the article reports per-channel analyses in which occipital electrodes (including O1 and O2) are described as most informative for SSRI prediction when using CWT-derived images, whereas frontotemporal channels are described as most informative for rTMS prediction when using VMD-derived images; any neurophysiologic alignment is presented as the authors’ interpretation. The authors note these channel-level patterns as potentially relevant when considering reduced-channel configurations, while treating the finding as exploratory within the reported experiments.

The discussion also highlights author-stated limitations and next steps, including the lack of an external independent validation cohort and proposals for replication in additional cohorts and added interpretability/explainability methods (e.g., Grad-CAM) to clarify which signal features drive model decisions. The paper closes by emphasizing continued development and validation of EEG-based deep-learning approaches for therapy-response prediction.

Key Takeaways:

The authors report a therapy-specific pattern in which the best-performing time–frequency representation differed for SSRI versus rTMS (CWT versus VMD in their best configurations).
Results are presented at two levels—image-based discrimination and a stricter subject-level generalization approach—highlighting how evaluation depends on whether epochs or participants are treated as samples.
Channel-level analyses are described as showing occipital prominence for SSRI (under CWT) and frontotemporal prominence for rTMS (under VMD), alongside author-stated next steps focused on external validation and improved interpretability.
