Selection and Validation Nodes

Selection and validation nodes help choose variables, split samples, and estimate model reliability.

Sample Partitioning

| Node | Use When | Inputs | Outputs | Key Configuration |
| --- | --- | --- | --- | --- |
| Sample Partition (selection.sample_partition) | You need train/test splits designed for chemometrics rather than only random shuffling. | X: Array2D; y: TargetMatrix? | X_train; X_test; y_train; y_test; train_indices; test_indices | method; test_size; metric; n_pcs; random_seed. Methods include chemometric designs such as Kennard-Stone-style selection where available. |
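
To illustrate what a Kennard-Stone-style split does, the sketch below picks training samples that spread evenly across the X space: it seeds the training set with the two most distant samples, then repeatedly adds the sample farthest from everything already chosen. The function name `kennard_stone_split` and its signature are hypothetical. It uses plain Euclidean distance on raw X and does not model the node's `metric` or `n_pcs` settings, so treat it as an illustration of the idea rather than the node's implementation.

```python
import math


def kennard_stone_split(X, test_size=0.25):
    """Return (train_indices, test_indices) using Kennard-Stone-style selection.

    Hypothetical helper: X is a list of equal-length numeric rows.
    """
    n = len(X)
    n_train = n - int(round(n * test_size))
    if not 2 <= n_train <= n:
        raise ValueError("test_size must leave at least two training samples")

    # Full pairwise Euclidean distance matrix (fine for small datasets).
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the training set with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_train:
        # Add the sample that is farthest from the current training set.
        nxt = max(sorted(remaining), key=min_d.__getitem__)
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])

    return selected, sorted(remaining)
```

Because the extremes of the data are selected first, the samples left for testing tend to fall inside the region covered by the training set, which is the usual reason for preferring this kind of design over random shuffling.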

Variable Selection

| Node | Use When | Inputs | Outputs | Key Configuration |
| --- | --- | --- | --- | --- |
| Variable Selection (selection.variable_select) | Select wavelengths by VIP, coefficients, spectral region, peaks, or an incoming mask. | X: Array2D; model: FittedModel?; mask: Array1D? | X_selected; mask; scores | method; region_start; region_end; peak_prominence; peak_half_window; threshold; invert. |
| Interval PLS (selection.ipls) | Search spectral intervals and keep the interval set with best cross-validated PLS performance. | X: SpectralDataset; y: TargetMatrix | X_selected; mask; scores | n_intervals; n_components; cv_folds; n_best. |
| CARS (selection.cars) | Perform competitive adaptive reweighted sampling for wavelength selection. | X: Array2D; y: TargetMatrix | X_selected; mask; scores | n_iterations; n_components; cv_folds. |
| SPA (selection.spa) | Select variables with low collinearity using successive projections. | X: Array2D; y: TargetMatrix? | X_selected; mask; scores | n_select. |
| UVE (selection.uve) | Remove uninformative variables using Monte Carlo stability benchmarking. | X: Array2D; y: TargetMatrix | X_selected; mask; scores | n_components; n_resamples; test_fraction. |
| Stability Selection (selection.stability) | Keep variables that survive repeated subsampling. | X: Array2D; y: TargetMatrix | X_selected; mask; scores | base_method; base_threshold; stability_threshold; n_bootstrap; n_components; subsample_fraction. |
| Compare Feature Selections (selection.compare) | Compare multiple masks and build a consensus selection. | X; mask_1; mask_2; optional mask_3; optional mask_4 | X_consensus; consensus_mask; report | consensus_threshold. |
| Audit Feature Selection (selection.audit) | Inspect and document selection decisions before reporting or model handoff. | X: Array2D | X_out; audit | include_scores. |
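
One plausible reading of a consensus selection is a vote across masks. The sketch below assumes, without confirmation from the node's documentation, that `consensus_threshold` is the minimum fraction of masks that must select a variable for it to be kept. The helper names `consensus_mask` and `apply_mask` are hypothetical.

```python
def consensus_mask(masks, consensus_threshold=0.5):
    """Return (mask, votes) for a list of equal-length boolean masks.

    Assumption: a variable is kept when the fraction of masks that
    select it is at least consensus_threshold.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n_vars = len(masks[0])
    if any(len(m) != n_vars for m in masks):
        raise ValueError("all masks must have the same length")

    # Fraction of masks that selected each variable.
    votes = [sum(bool(m[j]) for m in masks) / len(masks) for j in range(n_vars)]
    return [v >= consensus_threshold for v in votes], votes


def apply_mask(X, mask):
    """Keep only the columns of X (a list of rows) where mask is True."""
    return [[value for value, keep in zip(row, mask) if keep] for row in X]
```

The per-variable vote fractions are also a natural thing to include in a comparison report, since they show which variables the methods disagree on.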

Validation and Diagnostics

| Node | Use When | Inputs | Outputs | Key Configuration |
| --- | --- | --- | --- | --- |
| Evaluate Nested CV Selection (selection.nested_cv) | Estimate performance when variable selection happens inside each CV fold. | X: Array2D; y: TargetMatrix? | cv_metrics; y_pred; stability | selection_method; n_components; cv_folds; vip_threshold; coef_threshold. |
| Evaluate by Cross-Validation (diagnostics.cross_validation) | Compute CV metrics from true and predicted values. | y_true; y_pred | cv_metrics; predictions; plots; model | cv_folds; cv_method; task_type. |
| Evaluate Holdout Set (diagnostics.holdout_evaluation) | Compute final test-set metrics from held-out predictions. | y_true; y_pred; optional context and training predictions | metrics; visualization; predictions | task_type (regression or classification). |
| Detect Outliers (diagnostics.outliers) | Flag PCA-model outliers by Hotelling T² and Q/SPE residuals. | PCA/model output | flags; T2; Q; model | confidence_level. Connect directly to PCA model output when possible. |
| Statistics (stats.summary) | Summarize a dataset, model output, metrics dictionary, or plot payload. | default: Any | statistics: ValidationResult | compute_outliers; outlier_threshold; max_samples. |

Interpretation

Selection can improve interpretability, but it can also overfit. Prefer nested validation or a true holdout when variable selection influences the final model. A variable-selection method that sees the full dataset before cross-validation has already used the held-out folds to choose variables, so the CV result will usually be too optimistic.
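
The difference between selecting inside and outside the folds can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the nested CV node: it selects a single variable by absolute Pearson correlation and fits a univariate line, where a real workflow would use PLS and one of the selection methods above. All function names here are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5


def select_best_variable(X, y):
    """Index of the column with the largest absolute correlation with y."""
    return max(range(len(X[0])),
               key=lambda j: abs(pearson([row[j] for row in X], y)))


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    slope = 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def cv_rmse(X, y, n_folds=5, select_inside=True, seed=0):
    """Cross-validated RMSE with selection inside each fold or on all data."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    leaky_choice = select_best_variable(X, y)  # has seen every sample

    sq_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Honest variant: the held-out fold plays no part in the selection.
        j = select_best_variable(X_tr, y_tr) if select_inside else leaky_choice
        slope, intercept = fit_line([row[j] for row in X_tr], y_tr)
        sq_err += sum((y[i] - (slope * X[i][j] + intercept)) ** 2 for i in test)
    return (sq_err / len(y)) ** 0.5


if __name__ == "__main__":
    # Pure noise: no variable is truly related to y.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(20)]
    y = [rng.gauss(0, 1) for _ in range(20)]
    print("selection on all data:", cv_rmse(X, y, select_inside=False))
    print("selection inside folds:", cv_rmse(X, y, select_inside=True))
```

On pure-noise data like the demo, the variant that selects on all data will typically report the lower error even though nothing is predictable, which is the optimism described above.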