Interactive Data Preparation

Everything described in the preceding pages—filtering, referencing, artifact rejection, ICA decomposition—happens automatically. The pipeline makes reasonable default decisions at every stage. But reasonable defaults are not the same as correct decisions for this recording, this patient, this clinical question. A variance-based heuristic might flag a channel as bad when the unusual activity is genuine pathology. ICLabel might classify a component as brain when a clinician can see, at a glance, that the topography screams eye movement.

The interactive data preparation step is where the clinician takes ownership of the signal. It sits between automated preprocessing and automated analysis—the human checkpoint in an otherwise algorithmic pipeline. Nothing proceeds to spectral analysis, connectivity, or source localization until the clinician reviews the preprocessing results, makes adjustments, and signs off.

The data preparation interface centers on a full-recording waveform viewer—the entire EEG rendered as scrollable, zoomable channel traces. The display order follows clinical convention (anterior to posterior, left hemisphere first), matching the layout neurotherapists expect from WinEEG and similar tools.

The recording loads with pipeline-standard filtering already applied: 0.5 Hz high-pass, 45 Hz low-pass, 60 Hz notch. These are display filters for the preparation workflow, not the analysis filters—the raw data is preserved separately and never modified. What you see is what the automated pipeline would analyze if you changed nothing and signed off immediately.
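A minimal sketch of this split, assuming MNE-Python and a hypothetical file name: the display copy gets the viewing filters while the loaded raw data stays untouched.

```python
import mne

# Hypothetical recording; `raw` is preserved unmodified throughout preparation.
raw = mne.io.read_raw_fif("session_eyes_open_raw.fif", preload=True)

display = raw.copy()                      # viewing copy only
display.filter(l_freq=0.5, h_freq=45.0)   # 0.5 Hz high-pass, 45 Hz low-pass
display.notch_filter(freqs=60.0)          # 60 Hz line-noise notch
```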

The first thing the system does automatically is detect the amplifier settling artifact—the initial voltage swing that occurs when the EEG amplifier stabilizes after power-on. This transient is identified by scanning the first ten seconds for RMS amplitude that exceeds the stable baseline, and the affected segment is pre-excluded. You can adjust or remove this exclusion if you disagree.
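The idea behind the scan can be sketched as follows; the window length, scan span, and RMS ratio are illustrative values, not the pipeline's actual thresholds.

```python
import numpy as np

def detect_settling(data, sfreq, window_s=0.5, scan_s=10.0, ratio=3.0):
    """Illustrative settling-artifact detector.

    Scans the first `scan_s` seconds of `data` (n_channels x n_samples) in
    short windows and returns the time, in seconds, up to which windowed RMS
    exceeds `ratio` times the stable-baseline RMS of the rest of the recording.
    """
    n_scan = int(scan_s * sfreq)
    n_win = int(window_s * sfreq)
    baseline_rms = np.sqrt(np.mean(data[:, n_scan:] ** 2))
    exclude_until = 0.0
    for start in range(0, n_scan - n_win, n_win):
        win = data[:, start:start + n_win]
        if np.sqrt(np.mean(win ** 2)) > ratio * baseline_rms:
            exclude_until = (start + n_win) / sfreq
    return exclude_until  # seconds to pre-exclude from the start
```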

A bad channel is one where the electrode contact has failed—high impedance, bridging, or physical displacement. The signal from a bad channel doesn’t represent cortical activity; it represents noise that will contaminate every analysis that includes it.

The application offers an auto-detection heuristic based on channel variance: channels whose variance falls far outside the distribution of their neighbors are flagged as candidates. This is a starting point, not a verdict. The heuristic cannot distinguish between a genuinely noisy electrode and a channel that shows unusual activity because the underlying cortex is doing something unusual. A temporal channel with high variance might be picking up muscle artifact—or it might be capturing legitimate high-amplitude beta in a patient with anxiety. The clinician makes the call.
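The shape of such a heuristic, sketched here against the whole channel set rather than spatial neighbors for simplicity; the robust z-score and the threshold value are assumptions for illustration.

```python
import numpy as np

def flag_high_variance_channels(data, ch_names, z_thresh=3.0):
    """Flag channels whose log-variance is a robust outlier among channels."""
    log_var = np.log(np.var(data, axis=1))
    med = np.median(log_var)
    mad = np.median(np.abs(log_var - med)) * 1.4826 + 1e-12  # MAD -> std estimate
    z = (log_var - med) / mad
    return [name for name, zi in zip(ch_names, z) if abs(zi) > z_thresh]
```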

Channels marked as bad are interpolated during sign-off using spherical spline interpolation—their values are reconstructed from the surrounding electrodes’ spatial pattern. This preserves the channel count for downstream analysis while removing the contaminated signal. Bad channel decisions are recorded in the provenance log.

Not all of a recording is worth analyzing. The patient may have been restless for the first minute. A door slam at the three-minute mark may have caused a startle response. The last thirty seconds may show drowsiness artifacts as attention fades.

The segment tools let you draw directly on the waveform to define two kinds of regions:

Selected segment sets the time window for analysis. Everything outside this window is ignored. If you don’t set a selection, the full recording is used.

Excluded segments mark specific time regions within the selection as unusable. These become MNE BAD_manual annotations—downstream analysis stages skip them when extracting epochs or computing spectra.
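A hedged example of how an excluded region might be expressed with MNE annotations; the selection window, onset, and duration are hypothetical, and the selection itself is only recorded here because cropping happens later, at sign-off.

```python
import mne

selection = (30.0, 270.0)  # selected segment, applied as a crop at sign-off

# One excluded region inside the selection, marked so that downstream
# epoching and PSD estimation skip it.
raw.set_annotations(mne.Annotations(
    onset=[95.0],                 # seconds from the start of the recording
    duration=[4.0],
    description=["BAD_manual"],
))
```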

As you draw selections and exclusions, the interface shows feasibility estimates: how much clean data remains, and whether it meets the minimum durations for spectral analysis (which needs sustained segments for reliable PSD estimation) and connectivity analysis (which needs enough epochs for stable phase-lag statistics).
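One way such an estimate could be computed; the minimum durations and epoch length below are placeholders, since the pipeline's actual requirements are not stated here.

```python
def feasibility(selected_s, excluded_s, min_spectral_s=60.0,
                min_epochs=40, epoch_s=2.0):
    """Illustrative feasibility check on remaining clean duration."""
    clean_s = selected_s - excluded_s
    return {
        "clean_seconds": clean_s,
        "spectral_ok": clean_s >= min_spectral_s,          # sustained data for PSD
        "connectivity_ok": clean_s // epoch_s >= min_epochs,  # enough epochs for phase-lag stats
    }
```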

Once you’re satisfied with the channel and segment decisions, the next step is ICA. Clicking the Fit ICA button runs the same Picard decomposition described in the ICA documentation—extended mode, variance-based component count, 1.0 Hz high-pass copy for fitting stability—but applied to the data as you’ve prepared it. Bad channels are interpolated on a temporary copy before fitting so they don’t contaminate the decomposition. Common average reference is applied. The fit typically completes in seconds.
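A sketch of that fitting step with MNE's ICA class; the variance threshold, random seed, and bad-channel name are illustrative, and `fit_params=dict(ortho=False, extended=True)` is MNE's documented way of running Picard in an extended-Infomax-like mode.

```python
from mne.preprocessing import ICA

fit_copy = raw.copy()
fit_copy.info["bads"] = ["T3"]            # hypothetical bad channel from review
fit_copy.interpolate_bads()               # keep bad channels out of the decomposition
fit_copy.set_eeg_reference("average")     # common average reference
fit_copy.filter(l_freq=1.0, h_freq=None)  # 1.0 Hz high-pass copy for fitting stability

ica = ICA(n_components=0.99, method="picard",
          fit_params=dict(ortho=False, extended=True), random_state=97)
ica.fit(fit_copy)
```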

After fitting, every component is automatically classified by ICLabel and all component contributions are pre-computed. This pre-computation is what makes the interactive review feel instantaneous—when you hover over a component, the system doesn’t need to recompute anything. It just slices into a cached matrix.

The ICA browser splits the screen: waveform on the left, component grid on the right. Each component in the grid shows a thumbnail topomap, a spectral profile, its ICLabel classification badge, and the percentage of total signal variance it explains. Components are sorted by variance—the most influential components appear first, because those are the ones where a wrong decision costs the most.

This layout serves a specific clinical purpose. The waveform shows you the signal. The component grid shows you the sources. The interaction between them—hovering, toggling, inspecting—lets you build an intuition for what each source contributes to the signal you’re looking at.

Hover over any component in the grid and its contribution appears as an overlay on the waveform—a lighter trace drawn on top of the existing channels, showing exactly how much of the signal at each electrode comes from that single independent source.

This is not a simulation or an approximation. Each component’s contribution is computed directly from the ICA decomposition: the mixing matrix column for that component multiplied by its source activation time series. The result is the component’s footprint in channel space—what would disappear from the signal if you removed it.
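In matrix terms, assuming a cached mixing matrix of shape (n_channels × n_components) and source activations of shape (n_components × n_samples), the overlay for one component is a single outer product:

```python
import numpy as np

def component_footprint(mixing, sources, k):
    """Footprint of component k in channel space: the mixing column for k
    times its source activation time course, i.e. what would disappear from
    every channel if component k were removed."""
    return np.outer(mixing[:, k], sources[k, :])  # (n_channels, n_samples)
```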

The overlay updates as you scroll through the recording. A component that looks like pure artifact in one segment might show brain-like characteristics in another. The hover overlay lets you see the whole story before making a decision.

Click a component to open the full diagnostic panel. Four pieces of evidence are presented together:

Topomap shows the spatial distribution of the component—which electrodes it loads on most heavily. Eye components have frontal topography. Muscle components cluster at the periphery. Brain components show the smooth, focal patterns you’d expect from a cortical dipole.

Spectrum shows the frequency content of the component’s time course. Brain components have identifiable spectral peaks (alpha, beta). Muscle components show broadband high-frequency power without clear peaks. Line noise shows a sharp 60 Hz spike.

ICLabel probabilities show the classifier’s confidence across all seven categories—brain, muscle, eye, heart, line noise, channel noise, and other. A component with 0.92 brain probability is almost certainly neural. One with 0.45 brain and 0.35 muscle is the kind of borderline case that requires your judgment.

Time course shows the component’s activation over the full recording. Eye blink components show regular, large-amplitude deflections. Cardiac components show the QRS rhythm. Brain components show the complex, aperiodic fluctuation that characterizes cortical activity.

You can navigate between components with arrow keys and toggle each component’s exclusion status from the detail view. The waveform updates immediately when you change a decision.

When you toggle a component to “excluded,” the waveform doesn’t just dim that component’s overlay—it recomputes the entire signal from scratch using only the components you’ve kept. This is the EEGLAB-style reconstruction approach, and it’s worth understanding because it differs from some other implementations.

The standard MNE approach to ICA cleaning is subtractive: reconstruct the rejected components and subtract them from the original signal. Because the subtraction starts from the original, the PCA residual (the variance ICA never captured) survives in the cleaned output; only the rejected artifact components are removed.

The Coherence Workstation uses a different method—the same one EEGLAB uses. Instead of subtracting bad components from the original, it reconstructs the signal exclusively from the kept components. Mathematically: multiply the mixing matrix columns for kept components by their corresponding source activations. The cleaned signal is the sum of what you decided to keep. Nothing else.

Why does this distinction matter? Two reasons. First, the clinician sees exactly what will be analyzed—no hidden residual, no leftover variance from dimensions that didn’t decompose cleanly. The signal on screen is the signal that goes to spectral analysis. Second, if you exclude every component (which you’d never do in practice, but the logic should be clean), you get zeros—not noise. The reconstruction is honest about what’s left.
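The two styles side by side, as a sketch using the same cached matrices as above:

```python
import numpy as np

def reconstruct_kept(mixing, sources, excluded):
    """EEGLAB-style: rebuild the signal from kept components only.
    Excluding every component yields exact zeros, not noise."""
    keep = [k for k in range(mixing.shape[1]) if k not in excluded]
    return mixing[:, keep] @ sources[keep, :]

def subtract_rejected(original, mixing, sources, excluded):
    """MNE-style subtraction: original minus the rejected components,
    which leaves the PCA residual in place."""
    rejected = list(excluded)
    return original - mixing[:, rejected] @ sources[rejected, :]
```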

The practical effect: when you exclude an eye blink component, the frontal channels visibly flatten where the blinks were. When you exclude a muscle component, the temporal channels lose their high-frequency fuzz. You can see the cleaning happening, channel by channel, in real time.

Interactivity matters here. If toggling a component required a multi-second recomputation, clinicians would stop exploring. The system stays responsive through pre-computation: after ICA is fitted, the mixing matrix and all source activations are computed once and cached in memory. When you change which components are excluded, the reconstruction is a single matrix multiplication—kept columns of the mixing matrix times kept rows of the source matrix. On a 19-channel, 5-minute recording, this takes milliseconds.

The display pipeline runs in a specific order that matters for accuracy: raw working data → ICA reconstruction from kept components → display filtering → montage application → scaling to microvolts. ICA reconstruction happens before display filtering so that the filter operates on the cleaned signal, not on artifacts that are about to be removed.
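As a sketch of that ordering, reusing the `reconstruct_kept` helper from above; the filter callable and montage ordering are hypothetical parameters, not the application's actual interfaces.

```python
def render_for_display(mixing, sources, excluded, display_filter, montage_order):
    cleaned = reconstruct_kept(mixing, sources, excluded)  # 1. ICA reconstruction from kept components
    filtered = display_filter(cleaned)                     # 2. display filtering
    ordered = filtered[montage_order, :]                   # 3. montage application (clinical channel order)
    return ordered * 1e6                                   # 4. scale volts to microvolts
```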

When you’re satisfied with the preparation—channels reviewed, segments selected, ICA components classified—you sign off. This is a deliberate, explicit action, not something that happens automatically when you close the window.

Sign-off triggers a processing sequence: bad channels are interpolated via spherical splines, average reference is re-applied to the interpolated data, ICA exclusions are applied, the selected segment is cropped, and excluded segments are annotated. The result is saved as a preprocessed .fif file that the analysis pipeline reads as its input.
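Roughly what that sequence looks like with MNE primitives; channel names, component indices, times, and the file name are illustrative, and MNE's subtractive `ica.apply` stands in here for the kept-component reconstruction described earlier.

```python
import mne

signed = raw.copy()
signed.info["bads"] = ["T3"]                      # channels marked bad during review
signed.interpolate_bads(reset_bads=True)          # spherical spline interpolation
signed.set_eeg_reference("average")               # re-reference the interpolated data
ica.exclude = [0, 3]                              # components rejected during review
ica.apply(signed)                                 # apply ICA exclusions
signed.set_annotations(mne.Annotations(           # excluded segments as BAD_manual spans
    onset=[95.0], duration=[4.0], description=["BAD_manual"]))
signed.crop(tmin=30.0, tmax=270.0)                # crop to the selected segment
signed.save("eyes_open_preproc_raw.fif", overwrite=True)
```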

Critically, sign-off also writes a provenance record: who reviewed the data, when, which channels were marked bad, which segments were excluded, which ICA components were kept and rejected, and the ICLabel probabilities for every component. This is the audit trail. If a colleague questions a finding in the clinical report, the provenance record shows exactly what cleaning decisions produced the analyzed signal.
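The record might look something like this; the field names and values below are illustrative, not the application's actual schema.

```python
import json
from datetime import datetime, timezone

provenance = {
    "reviewer": "clinician_id",                      # who signed off
    "signed_off_at": datetime.now(timezone.utc).isoformat(),
    "bad_channels": ["T3"],
    "excluded_segments": [{"onset_s": 95.0, "duration_s": 4.0}],
    "ica": {
        "excluded_components": [0, 3],
        "iclabel_probabilities": {"IC0": {"eye": 0.97, "brain": 0.01}},
    },
}
with open("prep_provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)
```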

The sign-off is per-condition. If a session includes both eyes-open and eyes-closed recordings, each condition goes through its own data preparation and sign-off. If ERP data is present, the workflow chains into an epoch review step after the resting conditions are signed off.

Artifact Subspace Reconstruction can also be run interactively during data preparation, with the same parameters described in the Artifact Rejection page. There is one important constraint: ASR modifies the underlying data that ICA was fitted on. If you run ASR after fitting ICA, the decomposition is no longer valid—the unmixing matrix was computed on pre-ASR data and cannot be meaningfully applied to post-ASR data. The application warns you and requires re-fitting ICA if you apply ASR after a fit.

The recommended workflow is to review the raw data first, apply ASR if needed for large transient artifacts, and then fit ICA on the ASR-cleaned signal. This matches the two-layer architecture: ASR handles the gross contamination, ICA separates the subtler ongoing sources.

Data preparation is a cleaning step, not an analysis step. It does not compute spectra, connectivity, or any clinical metrics. It does not generate report content. Its sole output is a clean, signed-off .fif file with an accompanying provenance record.

This separation is deliberate. Cleaning decisions should be made without being influenced by their downstream effects. If a clinician could see that excluding a particular component changes a z-score from abnormal to normal, the temptation to rationalize keeping that component would be difficult to resist. By separating preparation from analysis, the pipeline ensures that cleaning decisions are made on signal quality grounds alone.

The analysis pipeline reads the .fif file and provenance record, applies the same processing stages to every recording, and produces the stage outputs that the dashboard visualizes. Every cleaning decision propagates forward—and every decision is traceable back to this step.