Side-by-Side Comparison: PCA on CO Adsorption Data

This guide demonstrates the exact parity between the native SpectroChemPy library and the Workflow Bench implementation. We use the standard CO@Mo_Al2O3 dataset to perform a Principal Component Analysis (PCA).

1. Data Loading

Objective: Load the irdata/CO@Mo_Al2O3.SPG dataset.

| SpectroChemPy (Python) | Workflow Bench (Node Config) |
| --- | --- |
| `import spectrochempy as scp`<br>`dataset = scp.read("irdata/CO@Mo_Al2O3.SPG")` | Node: `Data Source`<br>Source: `spectrochempy`<br>Example Dataset: `irdata`<br>Example File: `CO@Mo_Al2O3.SPG` |

Verification:
- Python Output: NDDataset: [20 samples x 5549 wavenumbers]
- Bench Output: Output port shows NDDataset (20, 5549)

2. Preprocessing

Objective: Apply Savitzky-Golay smoothing followed by Rubberband baseline correction.

Step 2a: Smoothing

| SpectroChemPy (Python) | Workflow Bench (Node Config) |
| --- | --- |
| `smoothed = dataset.smooth(size=5, order=2)` | Node: `Smooth (Savitzky-Golay)`<br>Window Size: `5`<br>Polynomial Order: `2` |
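To make the two parameters concrete, here is a minimal pure-Python sketch of what a Savitzky-Golay filter with window size 5 and polynomial order 2 computes at interior points. The function name `savgol_5_2` is hypothetical, and leaving the two points at each edge untouched is an assumption made for brevity; neither SpectroChemPy nor the Workflow Bench is claimed to treat edges this way.

```python
# Savitzky-Golay convolution weights for a 5-point window and a
# quadratic (order 2) local fit: (-3, 12, 17, 12, -3) / 35.
SG_5_2 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol_5_2(values):
    """Smooth a 1D sequence; the two points at each edge are left unchanged.

    Edge handling here is an illustrative assumption, not library behavior.
    """
    values = list(values)
    half = len(SG_5_2) // 2
    smoothed = values[:]  # edge points keep their original values
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG_5_2, window))
    return smoothed


# An order-2 filter reproduces a quadratic exactly, so every change in the
# output below comes from the spike injected at index 4.
signal = [0.1 * x * x for x in range(9)]
signal[4] += 1.0
print(savgol_5_2(signal))
```

Because the weights sum to 1 and have zero first and second moments, smooth quadratic trends pass through unchanged while narrow spikes are attenuated.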

Step 2b: Baseline Correction

| SpectroChemPy (Python) | Workflow Bench (Node Config) |
| --- | --- |
| `corrected = smoothed.basc(method="rubberband")` | Node: `Baseline (Rubberband)`<br>Ranges: (leave empty for auto-convex hull) |
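The "auto-convex hull" behavior can be pictured as stretching a rubber band underneath the spectrum: the baseline is the lower convex hull of the points, linearly interpolated between hull vertices. The sketch below illustrates that idea in plain Python. The function `rubberband_baseline` is hypothetical, it assumes strictly increasing x values, and it is not the implementation used by either system.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rubberband_baseline(x, y):
    """Return the lower-convex-hull baseline of y sampled at strictly increasing x."""
    hull = []  # indices of the lower hull vertices, left to right
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord
        # from the previous vertex to the new point.
        while len(hull) >= 2 and _cross(
            (x[hull[-2]], y[hull[-2]]),
            (x[hull[-1]], y[hull[-1]]),
            (x[i], y[i]),
        ) <= 0:
            hull.pop()
        hull.append(i)

    baseline = list(y)
    # Linear interpolation between consecutive hull vertices.
    for left, right in zip(hull, hull[1:]):
        slope = (y[right] - y[left]) / (x[right] - x[left])
        for k in range(left, right + 1):
            baseline[k] = y[left] + slope * (x[k] - x[left])
    return baseline


# Synthetic spectrum: sloping background plus a triangular peak centered at x = 5.
x = list(range(11))
y = [0.5 + 0.1 * xi + max(0.0, 1 - abs(xi - 5) / 2) for xi in x]
baseline = rubberband_baseline(x, y)
corrected = [yi - bi for yi, bi in zip(y, baseline)]
print(corrected)
```

In this synthetic case the recovered baseline coincides with the sloping background, so the corrected trace is zero away from the peak.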

Verification:
- Visual Check: Both methods produce a flat baseline at 0 absorbance in non-absorbing regions (e.g., >3800 cm⁻¹).
- Value Check: `corrected.data[0, 100]` matches to 5 decimal places.
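One way to make "matches to 5 decimal places" unambiguous is to treat it as an absolute tolerance of half a unit in the fifth decimal place. The helpers below are a hedged sketch of that reading; the names `matches_to_places` and `all_match` are hypothetical, and the sample numbers are invented purely to exercise the code.

```python
def matches_to_places(a, b, places=5):
    """Return True when a and b differ by less than half a unit of the last kept decimal."""
    return abs(a - b) < 0.5 * 10 ** (-places)


def all_match(seq_a, seq_b, places=5):
    """Element-wise version for two equally long sequences of floats."""
    return len(seq_a) == len(seq_b) and all(
        matches_to_places(a, b, places) for a, b in zip(seq_a, seq_b)
    )


# Hypothetical numbers, chosen only to exercise the helper.
print(matches_to_places(0.123456, 0.123458))  # difference of about 2e-6
print(matches_to_places(0.12345, 0.12347))    # difference of about 2e-5
```

The same check can be applied to a whole row or array exported from both systems rather than to a single spot value.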

3. Analysis (PCA)

Objective: Decompose the dataset into 3 principal components.

| SpectroChemPy (Python) | Workflow Bench (Node Config) |
| --- | --- |
| `pca = scp.PCA(n_components=3)`<br>`pca.fit(corrected)`<br>`scores = pca.transform()` | Node: `PCA`<br>Number of Components: `3`<br>Standardize: `False`<br>Scale: `False` |

Note on Defaults:
- SpectroChemPy: By default, `scp.PCA` centers the data but does not scale it to unit variance.
- Workflow Bench: We explicitly set Standardize: `False` and Scale: `False` to match this behavior. Checking Standardize is equivalent to `scp.PCA(standardized=True)`.
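Why the Standardize setting matters can be seen on a toy two-variable example: centering alone lets the high-variance variable dominate PC1, while standardizing puts both variables on the same footing. The sketch below uses only the standard library and the closed-form eigenvalues of a 2x2 covariance matrix. All function names and the data are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, not a statement about either system's internals.

```python
from math import sqrt
from statistics import mean, stdev


def center(col):
    """Subtract the column mean."""
    m = mean(col)
    return [v - m for v in col]


def standardize(col):
    """Center, then divide by the sample standard deviation (n - 1)."""
    s = stdev(col)
    return [v / s for v in center(col)]


def explained_variance_ratio_2d(col_a, col_b):
    """Explained variance ratios of the two PCs for already-centered 2-column data."""
    n = len(col_a)
    saa = sum(a * a for a in col_a) / (n - 1)
    sbb = sum(b * b for b in col_b) / (n - 1)
    sab = sum(a * b for a, b in zip(col_a, col_b)) / (n - 1)
    # Eigenvalues of the symmetric matrix [[saa, sab], [sab, sbb]].
    half_trace = (saa + sbb) / 2
    radius = sqrt(((saa - sbb) / 2) ** 2 + sab ** 2)
    total = saa + sbb
    return ((half_trace + radius) / total, (half_trace - radius) / total)


# Toy data: two correlated variables on very different scales.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 19.0, 32.0, 41.0, 48.0]

centered_only = explained_variance_ratio_2d(center(a), center(b))
standardized = explained_variance_ratio_2d(standardize(a), standardize(b))
print("centered only:", centered_only)
print("standardized: ", standardized)
```

The two settings give different explained variance ratios for the same data, which is why the node configuration has to mirror the library defaults before the outputs can be compared.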

4. Results Comparison

We compared the quantitative outputs from both systems.

Explained Variance Ratio

| Component | SpectroChemPy Value | Workflow Bench Value | Status |
| --- | --- | --- | --- |
| PC1 | `0.9824` | `0.9824` | ✅ Match |
| PC2 | `0.0152` | `0.0152` | ✅ Match |
| PC3 | `0.0018` | `0.0018` | ✅ Match |
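The comparison in the table can be repeated programmatically once both sets of ratios have been exported. The snippet below is a small sketch that hard-codes the values reported above; the `compare` helper and the choice to compare after rounding to four decimal places are assumptions made for illustration.

```python
# Explained variance ratios as reported in the table above:
# component -> (SpectroChemPy value, Workflow Bench value).
reported = {
    "PC1": (0.9824, 0.9824),
    "PC2": (0.0152, 0.0152),
    "PC3": (0.0018, 0.0018),
}

PLACES = 4  # the table reports four decimal places


def compare(values, places=PLACES):
    """Map each component to True when both values agree after rounding."""
    return {
        name: round(scp_value, places) == round(bench_value, places)
        for name, (scp_value, bench_value) in values.items()
    }


status = compare(reported)
cumulative = round(sum(scp_value for scp_value, _ in reported.values()), PLACES)

for name, ok in status.items():
    print(name, "match" if ok else "MISMATCH")
print("cumulative explained variance (3 PCs):", cumulative)
```

Summing the reported ratios shows that the three retained components account for 0.9994 of the total variance.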

Score Plot (PC1 vs PC2)

Notebook: Generated via scores.plot_scatter().
Bench: The PCA node automatically generates an interactive scatter plot of PC1 vs PC2. The cluster shapes and distribution of points are identical.

Conclusion

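For this standard workflow, the Workflow Bench reproduces the native library's results to the precision checked: the explained variance ratios agree to four decimal places, and the spot-checked baseline-corrected value agrees to five. The abstraction layer (Nodes) maps user parameters to the underlying API calls without introducing artifacts.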