Node Reference
This reference lists the available processing nodes in the Workflow Builder, including their parameters and default values.
Preprocessing Nodes
Baseline Correction
Baseline (ALS)
- Node Type: baseline.als
- Description: Asymmetric Least Squares baseline correction. Effective for removing broad, varying baselines while preserving peaks.
- Parameters:
  - lam (number, default: 100000): Smoothness parameter. Larger values give a smoother baseline. Range: 1e2 to 1e9.
  - p (number, default: 0.001): Asymmetry parameter. Smaller values allow the baseline to stay lower (under peaks). Range: 0.0001 to 0.1.
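The roles of lam and p can be illustrated with a minimal sketch of asymmetric least squares in the style usually attributed to Eilers and Boelens. This is not necessarily how baseline.als is implemented; the function name als_baseline and the n_iter argument are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.001, n_iter=10):
    """Estimate a baseline by asymmetric least squares (illustrative sketch)."""
    n = len(y)
    # Second-difference operator: lam penalizes curvature of the baseline.
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the baseline (peaks) get weight p, the rest 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic spectrum: sloping baseline plus one Gaussian peak.
x = np.linspace(0.0, 1.0, 200)
true_baseline = 2.0 + 3.0 * x
peak = np.exp(-((x - 0.5) ** 2) / (2 * 0.03 ** 2))
spectrum = true_baseline + peak
baseline = als_baseline(spectrum)
corrected = spectrum - baseline
```

A small p makes the fit nearly ignore points above the current estimate, which is why the baseline settles under the peaks; a large lam keeps it from bending into them.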
Baseline (Rubberband)
- Node Type: baseline.rubberband
- Description: Convex hull "rubberband" baseline correction.
- Parameters:
  - ranges (text): Optional spectral ranges to force baseline points (e.g., '4000:3800, 1800:1700').
Smoothing & Filtering
Smooth (Savitzky-Golay)
- Node Type: smooth.savitzky_golay
- Description: Polynomial smoothing filter. Reduces noise while preserving peak shapes better than moving averages.
- Parameters:
  - size (number, default: 11): Window size. Must be an odd number.
  - order (number, default: 2): Order of the polynomial fit.
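As a rough illustration of what this node does, the sketch below applies SciPy's savgol_filter with the default values listed above. Mapping size to window_length and order to polyorder is an assumption; the node's internals are not documented here.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.linspace(400.0, 4000.0, 721)
clean = np.exp(-((x - 1700.0) ** 2) / (2 * 40.0 ** 2))
noisy = clean + rng.normal(scale=0.02, size=x.size)

# Assumed mapping: size -> window_length, order -> polyorder.
smoothed = savgol_filter(noisy, window_length=11, polyorder=2)
```

The window must be longer than the polynomial order; wider windows remove more noise but start to flatten narrow peaks.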
Cosmic Ray Removal
- Node Type: preprocess.cosmic_ray
- Description: Removes spike-like outliers (cosmic rays) using local median statistics.
- Parameters:
  - window (number, default: 7): Window size for local statistics (must be odd).
  - zscore (number, default: 3.0): Z-score threshold for detection. Points deviating by more than zscore * MAD (median absolute deviation) are replaced.
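One plausible reading of this rule is sketched below: compare each point with a running median and replace it when the deviation exceeds zscore * MAD. The function name remove_spikes, the use of a single global MAD, and replacement by the local median are all assumptions; the actual node may differ.

```python
import numpy as np
from scipy.ndimage import median_filter


def remove_spikes(y, window=7, zscore=3.0):
    """Replace spike-like outliers with the local median (illustrative only)."""
    y = np.asarray(y, dtype=float)
    local_median = median_filter(y, size=window, mode="nearest")
    residual = y - local_median
    # Assumption: one global MAD of the residuals sets the noise scale.
    mad = np.median(np.abs(residual - np.median(residual)))
    if mad == 0.0:
        return y.copy()  # no measurable noise scale, leave data untouched
    spikes = np.abs(residual) > zscore * mad
    cleaned = y.copy()
    cleaned[spikes] = local_median[spikes]
    return cleaned


rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0.0, 3.0, 200)) + rng.normal(scale=0.01, size=200)
spiked = signal.copy()
spiked[50] += 10.0  # simulated cosmic ray hit
cleaned = remove_spikes(spiked)
```

With a threshold this tight some ordinary noisy points are also replaced by their local median, so in practice zscore trades spike sensitivity against unintended smoothing.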
Normalization & Scaling
Normalize (SNV)
- Node Type: normalize.snv
- Description: Standard Normal Variate. Subtracts the mean and divides by the standard deviation of each spectrum.
- Parameters: None.
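The row-wise operation described above can be written in a few lines of NumPy. The helper name snv and the choice of the population standard deviation (ddof=0) are assumptions for this sketch.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)  # assumption: ddof=0
    return (spectra - mean) / std


# The second spectrum is the first one scaled by 10; SNV removes that difference.
spectra = np.array([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
normalized = snv(spectra)
```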
Normalize (MSC)
- Node Type: normalize.msc
- Description: Multiplicative Scatter Correction. Corrects for light scattering effects.
- Parameters:
  - reference (select, default: mean): Reference spectrum to use (mean, median, first).
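MSC is commonly described as regressing each spectrum against the reference and then removing the fitted offset and slope. The sketch below follows that textbook formulation; the function name msc and its return values are assumptions, not the node's API.

```python
import numpy as np


def msc(spectra, reference="mean"):
    """Multiplicative scatter correction against a reference spectrum (sketch)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference == "mean":
        ref = spectra.mean(axis=0)
    elif reference == "median":
        ref = np.median(spectra, axis=0)
    elif reference == "first":
        ref = spectra[0]
    else:
        raise ValueError(f"unknown reference: {reference!r}")
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then undo both effects.
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected, ref


# Three copies of one shape with different additive and multiplicative effects.
base = np.array([0.1, 0.4, 0.9, 0.3, 0.2])
spectra = np.vstack([0.5 + 1.0 * base, 0.2 + 2.0 * base, -0.1 + 0.5 * base])
corrected, ref = msc(spectra)
```

In this idealized case every corrected spectrum collapses onto the reference, because the spectra differ only by an offset and a scale factor.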
Normalize (Scale)
- Node Type: normalize.scale
- Description: Scales spectra based on specific criteria.
- Parameters:
  - method (select, default: max): max, area, minmax.
Autoscaling
- Node Type: preprocess.autoscaling
- Description: Scale to unit variance (mean centering + standardization).
- Parameters:
  - center (boolean, default: True): Subtract mean before scaling.
Pareto Scaling
- Node Type: preprocess.pareto_scaling
- Description: Scale by square root of standard deviation (chemometrics standard).
- Parameters:
  - center (boolean, default: True): Subtract mean before scaling.
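The difference between autoscaling and Pareto scaling is only the divisor: the standard deviation versus its square root. A minimal sketch, assuming scaling is applied per variable (column) with the sample standard deviation (ddof=1); the function names are illustrative.

```python
import numpy as np


def autoscale(X, center=True):
    """Divide each column by its standard deviation, optionally centering first."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation
    if center:
        X = X - X.mean(axis=0)
    return X / std


def pareto_scale(X, center=True):
    """Divide each column by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    if center:
        X = X - X.mean(axis=0)
    return X / np.sqrt(std)


X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0], [4.0, 700.0]])
auto = autoscale(X)
pareto = pareto_scale(X)
```

Autoscaling gives every variable the same variance, while Pareto scaling only reduces the dominance of high-variance variables.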
Center Mean
- Node Type: preprocess.center_mean
- Description: Subtract the mean spectrum from all spectra.
- Parameters: None.
Scale to Max
- Node Type: preprocess.scale_max
- Description: Normalize each spectrum to a target maximum value.
- Parameters:
  - target_max (number, default: 1.0): Target maximum absorbance value.
Derivatives
1st Derivative
- Node Type: derivative.first
- Description: First derivative using Savitzky-Golay filter.
- Parameters:
  - size (number, default: 11): Window size.
  - order (number, default: 2): Polynomial order.
2nd Derivative
- Node Type: derivative.second
- Description: Second derivative using Savitzky-Golay filter. Useful for resolving overlapping peaks.
- Parameters:
  - size (number, default: 11): Window size.
  - order (number, default: 2): Polynomial order.
SG Derivative
- Node Type: preprocess.sg_derivative
- Description: Combined Savitzky-Golay smoothing and derivative.
- Parameters:
  - size (number, default: 11): Window size.
  - order (number, default: 2): Polynomial order.
  - deriv (select, default: 1): Derivative order (0, 1, 2).
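The three derivative nodes above share the same parameters. As a hedged sketch, SciPy's savgol_filter exposes equivalent arguments (window_length, polyorder, deriv); whether the nodes call it directly, and whether they pass the axis spacing as delta, is an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0.0, 2.0, 201)
dx = x[1] - x[0]
y = x ** 2

# deriv selects the derivative order; delta is the sample spacing on the x-axis.
first = savgol_filter(y, window_length=11, polyorder=2, deriv=1, delta=dx)
second = savgol_filter(y, window_length=11, polyorder=2, deriv=2, delta=dx)
```

The polynomial order must be at least the derivative order, otherwise the derivative of the fitted polynomial is identically zero.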
Advanced Correction
OSC Filter
- Node Type: preprocess.osc
- Description: Orthogonal Signal Correction. Removes variation in X that is uncorrelated with Y.
- Parameters:
  - n_components (number, default: 1): Number of orthogonal components.
  - tol (number, default: 1e-6): Convergence tolerance.
  - max_iter (number, default: 100): Maximum iterations.
EMSC
- Node Type: preprocess.emsc
- Description: Extended MSC with polynomial baseline correction.
- Parameters:
  - reference (select, default: mean): Reference spectrum (mean, median, first).
  - poly_order (number, default: 2): Order of polynomial baseline.
Utilities
Clip Range
- Node Type: preprocess.clip_range
- Description: Crops data to a specific wavenumber region.
- Parameters:
  - min_wavenumber (number, default: 400): Lower bound.
  - max_wavenumber (number, default: 4000): Upper bound.
Clip Floor
- Node Type: preprocess.clip_floor
- Description: Sets a minimum value floor (e.g., to remove negative absorbance).
- Parameters:
  - floor (number, default: 0.0): Minimum allowed value.
Wavenumber Align
- Node Type: preprocess.wavenumber_align
- Description: Align spectra to a common wavenumber grid via interpolation.
- Parameters:
  - method (select, default: pchip): pchip, linear, sinc.
  - merge_tolerance (number, default: 0.5): Tolerance for merging near-duplicate points.
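To show what aligning to a common grid involves, the sketch below resamples one spectrum with SciPy's PchipInterpolator, which corresponds to the pchip option. The helper align_to_grid and the sorting step are assumptions; merge_tolerance handling is not shown.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def align_to_grid(wavenumbers, intensities, target_grid):
    """Resample one spectrum onto target_grid with PCHIP interpolation (sketch)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # PCHIP needs strictly increasing x, but IR axes are often descending.
    order = np.argsort(wavenumbers)
    interpolator = PchipInterpolator(wavenumbers[order], intensities[order])
    return interpolator(target_grid)


wn = np.array([4000.0, 3000.0, 2000.0, 1000.0, 400.0])
absorbance = 0.001 * wn
grid = np.linspace(500.0, 3900.0, 8)
aligned = align_to_grid(wn, absorbance, grid)
```

PCHIP preserves monotonicity between samples, so it avoids the overshoot that an ordinary cubic spline can introduce near sharp peaks.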
Time Series Preprocessing
Moving Window
- Node Type: time_series.moving_window
- Description: Slides a window over time series data for batch analysis.
- Parameters:
  - window_size (number, default: 10): Number of consecutive spectra per window.
  - step_size (number, default: 1): Step between consecutive window starts (controls overlap).
  - aggregation (select, default: none): none, mean, median, std.
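The interaction of window_size, step_size, and aggregation can be sketched as follows. The function moving_windows and its output shapes are assumptions chosen for illustration, with spectra stored one per row in time order.

```python
import numpy as np


def moving_windows(spectra, window_size=10, step_size=1, aggregation="none"):
    """Slice consecutive spectra into windows, optionally aggregating each one."""
    spectra = np.asarray(spectra, dtype=float)
    starts = range(0, spectra.shape[0] - window_size + 1, step_size)
    windows = np.stack([spectra[s:s + window_size] for s in starts])
    if aggregation == "none":
        return windows  # shape: (n_windows, window_size, n_points)
    reducers = {"mean": np.mean, "median": np.median, "std": np.std}
    return reducers[aggregation](windows, axis=1)  # shape: (n_windows, n_points)


series = np.arange(25 * 4, dtype=float).reshape(25, 4)  # 25 spectra, 4 points each
windows = moving_windows(series, window_size=10, step_size=5)
means = moving_windows(series, window_size=10, step_size=5, aggregation="mean")
```

With 25 spectra, a window of 10, and a step of 5, windows start at spectra 0, 5, 10, and 15, so adjacent windows overlap by half.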
Trend Removal
- Node Type: time_series.trend_removal
- Description: Removes systematic trends and drift from time series data.
- Parameters:
  - method (select, default: linear): linear, polynomial, difference, moving_average.
  - poly_order (number, default: 2): For polynomial method.
  - window_size (number, default: 5): For moving average baseline.
Modeling Nodes
Decomposition & Analysis
PCA
- Node Type: model.pca
- Description: Principal Component Analysis.
- Parameters:
  - n_components (text, default: 5): Number of components (int), 'mle', or variance ratio (float between 0 and 1).
  - standardized (boolean, default: False): Apply standardization.
  - scaled (boolean, default: False): Apply scaling.
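Because n_components is entered as text, it has to be interpreted as an integer, the string 'mle', or a variance fraction. These three forms match what scikit-learn's PCA accepts, so the sketch below assumes a scikit-learn backend; the helper parse_n_components is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def parse_n_components(text):
    """Turn the text field into an int, 'mle', or a variance ratio (hypothetical)."""
    text = str(text).strip()
    if text == "mle":
        return "mle"
    value = float(text)
    if 0.0 < value < 1.0:
        return value  # keep enough components to explain this variance fraction
    return int(value)


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))

# Fixed number of components.
scores = PCA(n_components=parse_n_components("5")).fit_transform(X)

# Variance-ratio form; scikit-learn supports it with the full SVD solver.
pca_ratio = PCA(n_components=parse_n_components("0.95"), svd_solver="full").fit(X)
```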
MCR-ALS
- Node Type: model.mcr_als
- Description: Multivariate Curve Resolution - Alternating Least Squares. Resolves mixtures into pure components.
- Parameters:
  - n_components (number, default: 3): Number of pure components.
  - max_iter (number, default: 50): Maximum iterations.
  - tol (number, default: 0.1): Convergence tolerance.
  - non_negative_C (boolean, default: True): Enforce non-negative concentrations.
  - non_negative_St (boolean, default: True): Enforce non-negative spectra.
EFA
- Node Type: model.efa
- Description: Evolving Factor Analysis. Determines chemical rank of evolving systems.
- Parameters:
  - n_components (number, default: 10): Number of eigenvalues to compute.
Peak Finding
- Node Type: analysis.peak_finding
- Description: Identifies peaks using scipy.signal.
- Parameters:
  - height (number): Minimum peak height.
  - threshold (number): Minimum vertical distance to neighbors.
  - distance (number, default: 10): Minimum horizontal distance (points).
  - prominence (number): Peak prominence.
  - width (number): Expected peak width.
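The parameter names above match the arguments of scipy.signal.find_peaks, so a direct call is a reasonable approximation of what the node does. That the node wraps this exact function is an assumption; the synthetic spectrum is illustrative.

```python
import numpy as np
from scipy.signal import find_peaks

x = np.linspace(0.0, 10.0, 1001)
y = (
    np.exp(-((x - 3.0) ** 2) / (2 * 0.2 ** 2))
    + 0.5 * np.exp(-((x - 7.0) ** 2) / (2 * 0.3 ** 2))
)

# distance is measured in points, not in x-axis units.
peaks, properties = find_peaks(y, height=0.1, distance=10, prominence=0.05)
peak_positions = x[peaks]
```

The returned properties dictionary contains an entry for each criterion that was supplied, for example peak_heights and prominences here.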
Regression
PLS
- Node Type: model.pls
- Description: Partial Least Squares Regression.
- Inputs: X (Spectra), y (Concentrations).
- Parameters:
  - n_components (number, default: 3): Number of latent variables.
  - scale (boolean, default: True): Auto-scale data.
PCR
- Node Type: model.pcr
- Description: Principal Component Regression (PCA + Linear Regression).
- Inputs: X (Spectra), y (Targets).
- Parameters:
  - n_components (number, default: 3): Number of PCA components.
  - scale (boolean, default: True): Auto-scale data.
SVR
- Node Type: model.svr
- Description: Support Vector Regression.
- Inputs: X (Spectra), y (Targets).
- Parameters:
  - kernel (select, default: rbf): rbf, linear, poly, sigmoid.
  - C (number, default: 1.0): Regularization parameter.
  - epsilon (number, default: 0.1): Epsilon-tube width.
  - gamma (select, default: scale): Kernel coefficient.
  - degree (number, default: 3): Degree for poly kernel.
  - coef0 (number, default: 0.0): Independent term.
  - scale (boolean, default: True): Scale data.
Linear Regression
- Node Type: model.linear_regression
- Description: Simple linear regression.
- Parameters:
  - fit_intercept (boolean, default: True): Calculate intercept.
Classification
PLS-DA
- Node Type: classification.plsda
- Description: Partial Least Squares Discriminant Analysis.
- Inputs: X (Spectra), y (Classes; optional, auto-extracted from X).
- Parameters:
  - n_components (number, default: 2): Number of components.
  - scale (boolean, default: True): Scale data.
  - cv_folds (number, default: 5): Cross-validation folds.
- Outputs: Model dict with train/CV predictions, confusion matrices, and classification report.
- Notes: If X includes class labels in its y-axis, you can omit y to auto-extract them. For train/test workflows, split data with data.train_test_split, train with X_train (+ y_train if provided), then use classification.plsda_predict on X_test and feed y_test + y_pred into diagnostics.cross_validation for a confusion matrix.
Apply PLS-DA Model
- Node Type: classification.plsda_predict
- Description: Apply a trained PLS-DA model to new samples.
- Inputs: X_new (Spectra), model (PLS-DA model dict).
- Outputs: y_pred (Predicted classes), y_prob (Class probabilities).
KNN Classifier
- Node Type: classification.knn
- Description: K-Nearest Neighbors classification.
- Inputs: X (Features), y (Classes; optional).
- Parameters:
  - n_neighbors (number, default: 5): Number of neighbors (k).
  - weights (select, default: uniform): uniform, distance.
  - metric (select, default: euclidean): Distance metric.
  - cv_folds (number, default: 5): Cross-validation folds.
- Outputs: Model dict with train/CV predictions, confusion matrices, and classification report.
- Notes: If X includes class labels in its y-axis, you can omit y to auto-extract them. For train/test workflows, split data with data.train_test_split, train with X_train (+ y_train if provided), then use classification.knn_predict on X_test and feed y_test + y_pred into diagnostics.cross_validation for a confusion matrix.
Apply KNN Model
- Node Type: classification.knn_predict
- Description: Apply a trained KNN model to new samples.
- Inputs: X_new (Features), model (KNN model dict).
- Outputs: y_pred (Predicted classes), y_prob (Class probabilities).
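The train/test workflow described in the KNN notes (split, train, predict, then build a confusion matrix) can be mirrored in plain scikit-learn. This is an analogy under the assumption that the nodes behave like their scikit-learn counterparts; it does not call the Workflow Builder itself.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Counterpart of data.train_test_split with a stratified split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Counterpart of classification.knn with its default parameters.
model = KNeighborsClassifier(n_neighbors=5, weights="uniform", metric="euclidean")
model.fit(X_train, y_train)

# Counterpart of classification.knn_predict.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)

# Counterpart of feeding y_test and y_pred into diagnostics.cross_validation.
matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
```

Keeping the test samples out of training is what makes the resulting confusion matrix an honest estimate of performance on new data.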
SIMCA
- Node Type: classification.simca
- Description: Soft Independent Modeling of Class Analogy.
- Inputs: X (Features), y (Classes; optional).
- Parameters:
  - n_components (number, default: 3): Number of PCs per class.
  - confidence_level (number, default: 0.95): For T² and Q limits.
- Outputs: Model dict with predictions, confusion matrix, and classification report.
- Notes: If X includes class labels in its y-axis, you can omit y to auto-extract them.
Clustering
HCA
- Node Type: model.hca
- Description: Hierarchical Cluster Analysis (Agglomerative).
- Parameters:
  - n_clusters (number, default: 3): Number of clusters.
  - linkage (select, default: ward): ward, average, complete, single.
  - metric (select, default: euclidean): Distance metric.
- Outputs: Cluster labels and 2D embedding for visualization.
- Notes: To compare clusters to known class labels, feed ground-truth labels and the HCA labels output into diagnostics.cross_validation to generate a confusion matrix.
KMeans
- Node Type: model.kmeans
- Description: K-Means clustering.
- Parameters:
  - n_clusters (number, default: 3): Number of clusters.
  - n_init (number, default: 10): Number of initializations.
  - max_iter (number, default: 300): Max iterations.
  - random_state (number, default: 42): Random seed.
DBSCAN
- Node Type: model.dbscan
- Description: Density-Based Spatial Clustering of Applications with Noise.
- Parameters:
  - eps (number, default: 0.5): Neighborhood radius.
  - min_samples (number, default: 5): Minimum number of samples in a neighborhood for a point to count as a core point.
  - metric (select, default: euclidean): Distance metric.
Diagnostics
Outlier Detection
- Node Type: diagnostics.outliers
- Description: Hotelling T² and Q statistics for PCA models.
- Inputs: PCA Model.
- Parameters:
  - confidence_level (number, default: 0.95): Limit threshold.
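For readers unfamiliar with the two statistics, the sketch below computes them from a PCA done with NumPy's SVD: T² measures distance within the model plane, and Q is the squared residual outside it. The function t2_and_q is illustrative, and the confidence limits controlled by confidence_level are not computed here.

```python
import numpy as np


def t2_and_q(X, n_components=2):
    """Hotelling T² and Q (squared prediction error) per sample (sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Xc @ loadings
    # Variance captured by each retained component.
    eigenvalues = s[:n_components] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(scores ** 2 / eigenvalues, axis=1)
    residual = Xc - scores @ loadings.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
t2, q = t2_and_q(X, n_components=2)
```

A sample with high T² is extreme but still described by the model, whereas a high Q indicates variation that the retained components do not capture.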
Cross-Validation
- Node Type: diagnostics.cross_validation
- Description: Computes regression or classification metrics from paired y_true and y_pred.
- Inputs: y_true, y_pred.
- Parameters:
  - cv_folds (number, default: 5): Number of folds.
- Outputs: For classification, accuracy, confusion matrix, and a classification report. For regression, RMSE, MAE, R2, Q2, and residuals.
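The regression outputs listed above follow standard definitions, sketched here with NumPy. The function name and dictionary keys are assumptions; Q2 is omitted because it is conventionally the same formula as R2 evaluated on cross-validated predictions, which this snippet does not produce.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, R2, and residuals from paired true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": float(1.0 - ss_res / ss_tot),
        "residuals": residuals,
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```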
Data & Output Nodes
Data Sources
Data Source
- Node Type: data.source
- Description: Generic data loader for single files or synthetic data.
- Parameters:
  - source (select): spectrochempy, sklearn, experiment, file, library, synthetic.
  - example_dataset (select): irdata, ramandata, nmrdata.
  - sklearn_dataset (select): iris, wine, breast_cancer, digits.
  - example_file (text): Specific file within example dataset (e.g., CO@Mo_Al2O3.SPG).
  - experiment_id (number): ID (for experiment source).
  - file_path (text): Absolute path (for file source).
  - transpose_on_load (boolean): Swap rows/cols.
- Notes: Sklearn datasets include class labels on the y-axis, which supervised nodes can auto-extract. The node also exposes an optional target output port (labels) alongside the default dataset output.
Train/Test Split
- Node Type: data.train_test_split
- Description: Split a dataset into training and test subsets.
- Inputs: X (Dataset), y (Targets; optional, used for stratified splits).
- Parameters:
  - test_size (number, default: 0.2): Fraction of samples for testing.
  - split_method (select, default: random): random, stratified, sequential.
  - random_seed (number, default: 42): Seed for reproducible splits.
  - shuffle (boolean, default: True): Shuffle before splitting.
- Outputs: X_train, X_test, y_train (if provided), y_test (if provided).
- Notes: The split preserves y-axis labels in X_train and X_test. For stratified splits and y_train/y_test outputs, pass a target array into y.
Load Group
- Node Type: data.load_group
- Description: Load multiple files from a folder and concatenate them.
- Parameters:
  - folder_path (text): Path to folder.
  - pattern (text, default: *.spa): File pattern (glob).
  - recursive (boolean, default: False): Scan subdirectories.
  - sort_by (select, default: filename): filename, numeric_suffix, modified_time.
  - validate_axes (boolean, default: True): Ensure identical x-axes.
  - group_title (text): Custom title for the grouped dataset.
NIST Library
- Node Type: data.nist_library
- Description: Loads a spectrum from the internal NIST database.
- Parameters:
  - library_id (number): Database ID of the entry.
Synthetic Curve
- Node Type: data.synthetic_curve
- Description: Generates a concentration profile (for testing/simulation).
- Parameters:
  - curve_type (select): sigmoid, gaussian, linear, exponential, step.
  - n_points (number): Length of the curve.
  - max_concentration (number): Peak value.
  - center (number): Center position (0-1).
  - width (number): Curve width.
Synthesis (Blending)
Blend
- Node Type: synthesis.blend
- Description: Create synthetic mixtures from multiple spectra inputs.
- Parameters:
  - n_timepoints (number, default: 100): Time points in mixture.
  - model_type (select, default: linear): linear (Beer-Lambert), saturation.
  - pathlength (number, default: 0.01): Pathlength in meters.
  - noise_level (number, default: 0.01): Noise fraction.
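For the linear model, Beer-Lambert mixing amounts to a matrix product of concentration profiles and pure spectra, scaled by the pathlength. The sketch below is one way to write that; the functions sigmoid_profile and blend, the example absorptivities, and the interpretation of noise_level as a fraction of the maximum absorbance are all assumptions.

```python
import numpy as np


def sigmoid_profile(n_points, max_concentration=1.0, center=0.5, width=0.1):
    """Sigmoid concentration profile on a normalized 0-1 time axis (hypothetical)."""
    t = np.linspace(0.0, 1.0, n_points)
    return max_concentration / (1.0 + np.exp(-(t - center) / width))


def blend(concentrations, pure_spectra, pathlength=0.01, noise_level=0.0, seed=0):
    """Linear Beer-Lambert mixture: A = pathlength * C @ S, plus optional noise."""
    absorbance = pathlength * (concentrations @ pure_spectra)
    if noise_level > 0.0:
        rng = np.random.default_rng(seed)
        # Assumption: noise_level is a fraction of the maximum absorbance.
        scale = noise_level * np.abs(absorbance).max()
        absorbance = absorbance + rng.normal(scale=scale, size=absorbance.shape)
    return absorbance


n_timepoints = 100
product = sigmoid_profile(n_timepoints)
reactant = 1.0 - product
C = np.column_stack([reactant, product])  # (time points, species)
S = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])  # (species, channels), made-up values
mixture = blend(C, S)
```

Each row of mixture is the spectrum at one time point, which is the layout that time-series and MCR-ALS style analyses typically expect.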
Species
- Node Type: synthesis.species
- Description: Mark a spectrum as a species component for blending.
- Parameters:
  - species_name (text): Name identifier.
  - molar_absorptivity (number, default: 1.0): Molar absorptivity coefficient.
Merge Spectra
- Node Type: synthesis.merge
- Description: Combine multiple spectra into a single stacked dataset.
- Parameters:
  - align_wavenumbers (boolean, default: True): Interpolate to common grid.
Visualization & Export
Plot
- Node Type: output.plot
- Description: Creates standard spectral or scatter plots.
- Parameters:
  - plot_type (select): spectra, scores, loadings, scatter.
  - x_axis (number): Index/label for X.
  - y_axis (number): Index/label for Y.
Contour Plot
- Node Type: output.contour
- Description: 2D heatmaps/contours for time-series or multi-sample data.
- Parameters:
  - plot_type (select): heatmap, contour, surface.
  - colorscale (select): Viridis, Hot, RdBu, etc.
  - reverse_x (boolean): Reverse X-axis (standard for IR).
  - transpose (boolean): Swap axes.
Data Table
- Node Type: output.data_table
- Description: Interactive table for numerical results.
- Parameters:
  - max_rows (number, default: 100)
  - transpose (boolean)
  - show_index (boolean)
Statistics
- Node Type: stats.summary
- Description: Comprehensive statistics (Mean, Std, Min/Max) for datasets.
- Parameters:
  - compute_outliers (boolean)
  - outlier_threshold (number)
  - max_samples (number)
Export
- Node Type: output.export
- Description: Save results to file.
- Parameters:
  - filename (text)
  - format (select): csv, json, jdx.