Selection and Validation Nodes

Selection and validation nodes split samples into training and test sets, choose informative variables, and estimate model reliability.

Sample Partitioning

Node	Use When	Inputs	Outputs	Key Configuration
Sample Partition (`selection.sample_partition`)	You need train/test splits designed for chemometrics rather than purely random shuffling.	`X: Array2D`; `y: TargetMatrix?`	`X_train`; `X_test`; `y_train`; `y_test`; `train_indices`; `test_indices`	`method`; `test_size`; `metric`; `n_pcs`; `random_seed`. Methods include chemometric designs such as Kennard-Stone-style selection where available.
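
To show what a Kennard-Stone-style design does, here is a minimal standalone sketch of the classic max-min rule in plain Python. It is not the node's implementation: the function names `kennard_stone` and `ks_split` are hypothetical, the distance is fixed to Euclidean (the node's `metric` and `n_pcs` options are not modeled), and `test_size` is assumed to be a fraction of the samples.

```python
from math import dist


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the max-min rule."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    d = [[dist(a, b) for b in X] for a in X]
    # Seed the selection with the two most distant samples.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda ij: d[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    # Track each candidate's distance to its nearest selected sample.
    min_d = {i: min(d[i][i0], d[i][j0]) for i in remaining}
    while len(selected) < n_select:
        # Add the candidate that is farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda i: min_d[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_d[i] = min(min_d[i], d[i][nxt])
    return selected


def ks_split(X, test_size=0.25):
    """Selected samples form the training set; the rest form the test set."""
    n_test = max(1, round(test_size * len(X)))
    train = kennard_stone(X, len(X) - n_test)
    chosen = set(train)
    test = [i for i in range(len(X)) if i not in chosen]
    return train, test


# Five one-variable samples; the design covers the extremes first.
X_demo = [[0.0], [1.0], [2.0], [9.0], [10.0]]
train_idx, test_idx = ks_split(X_demo, test_size=0.4)
```

Because the rule always picks the sample farthest from those already chosen, the training set spans the edges of the measured space, and the test samples fall inside it. That is the usual motivation for preferring such designs over random shuffling on small calibration sets.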

Variable Selection

Node	Use When	Inputs	Outputs	Key Configuration
Variable Selection (`selection.variable_select`)	Select wavelengths by VIP, coefficients, spectral region, peaks, or an incoming mask.	`X: Array2D`; `model: FittedModel?`; `mask: Array1D?`	`X_selected`; `mask`; `scores`	`method`; `region_start`; `region_end`; `peak_prominence`; `peak_half_window`; `threshold`; `invert`.
Interval PLS (`selection.ipls`)	Search spectral intervals and keep the interval set with best cross-validated PLS performance.	`X: SpectralDataset`; `y: TargetMatrix`	`X_selected`; `mask`; `scores`	`n_intervals`; `n_components`; `cv_folds`; `n_best`.
CARS (`selection.cars`)	Perform competitive adaptive reweighted sampling for wavelength selection.	`X: Array2D`; `y: TargetMatrix`	`X_selected`; `mask`; `scores`	`n_iterations`; `n_components`; `cv_folds`.
SPA (`selection.spa`)	Select variables with low collinearity using successive projections.	`X: Array2D`; `y: TargetMatrix?`	`X_selected`; `mask`; `scores`	`n_select`.
UVE (`selection.uve`)	Remove uninformative variables using Monte Carlo stability benchmarking.	`X: Array2D`; `y: TargetMatrix`	`X_selected`; `mask`; `scores`	`n_components`; `n_resamples`; `test_fraction`.
Stability Selection (`selection.stability`)	Keep variables that survive repeated subsampling.	`X: Array2D`; `y: TargetMatrix`	`X_selected`; `mask`; `scores`	`base_method`; `base_threshold`; `stability_threshold`; `n_bootstrap`; `n_components`; `subsample_fraction`.
Compare Feature Selections (`selection.compare`)	Compare multiple masks and build a consensus selection.	`X`; `mask_1`; `mask_2`; optional `mask_3`; optional `mask_4`	`X_consensus`; `consensus_mask`; `report`	`consensus_threshold`.
Audit Feature Selection (`selection.audit`)	Inspect and document selection decisions before reporting or model handoff.	`X: Array2D`	`X_out`; `audit`	`include_scores`.
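
One plausible reading of a consensus selection is a vote across masks: a variable is kept when the fraction of masks that selected it reaches a threshold. The sketch below illustrates that idea only. The function `consensus_mask` is hypothetical, and whether `selection.compare` interprets `consensus_threshold` as a vote fraction in exactly this way is an assumption.

```python
def consensus_mask(masks, threshold=0.5):
    """Keep variables selected by at least `threshold` of the masks.

    Assumption: threshold is a vote fraction in [0, 1]; this is an
    illustration, not the documented behavior of selection.compare.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n_vars = len(masks[0])
    if any(len(m) != n_vars for m in masks):
        raise ValueError("all masks must have the same length")
    # Fraction of masks that selected each variable.
    votes = [sum(bool(m[j]) for m in masks) / len(masks) for j in range(n_vars)]
    return [v >= threshold for v in votes], votes


# Three hypothetical masks over five wavelengths.
masks_demo = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
majority, vote_fraction = consensus_mask(masks_demo, threshold=0.5)
unanimous, _ = consensus_mask(masks_demo, threshold=1.0)
```

A low threshold behaves like a union of the masks and a threshold of 1.0 like an intersection, so the setting trades coverage against agreement between methods.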

Validation and Diagnostics

Node	Use When	Inputs	Outputs	Key Configuration
Evaluate Nested CV Selection (`selection.nested_cv`)	Estimate performance when variable selection happens inside each CV fold.	`X: Array2D`; `y: TargetMatrix?`	`cv_metrics`; `y_pred`; `stability`	`selection_method`; `n_components`; `cv_folds`; `vip_threshold`; `coef_threshold`.
Evaluate by Cross-Validation (`diagnostics.cross_validation`)	Compute CV metrics from true and predicted values.	`y_true`; `y_pred`	`cv_metrics`; `predictions`; `plots`; `model`	`cv_folds`; `cv_method`; `task_type`.
Evaluate Holdout Set (`diagnostics.holdout_evaluation`)	Compute final test-set metrics from held-out predictions.	`y_true`; `y_pred`; optional context and training predictions	`metrics`; `visualization`; `predictions`	`task_type` (`regression` or `classification`).
Detect Outliers (`diagnostics.outliers`)	Flag PCA-model outliers by Hotelling T² and Q/SPE residuals.	PCA/model output	`flags`; `T2`; `Q`; `model`	`confidence_level`. Connect directly to PCA model output when possible.
Statistics (`stats.summary`)	Summarize a dataset, model output, metrics dictionary, or plot payload.	`default: Any`	`statistics: ValidationResult`	`compute_outliers`; `outlier_threshold`; `max_samples`.
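
For readers unfamiliar with the two statistics named in the Detect Outliers row, the following NumPy sketch computes Hotelling T² and Q (SPE) per sample from a PCA fitted by SVD. It is a sketch under stated assumptions, not the node's code: the function `pca_t2_q` is hypothetical, and the flagging step uses an empirical 95th percentile as a simple stand-in for the distribution-based limits that a `confidence_level` setting would normally imply.

```python
import numpy as np


def pca_t2_q(X, n_components):
    """Return per-sample Hotelling T2 and Q (SPE) for a mean-centered PCA."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # SVD of the centered data gives loadings (rows of Vt) and singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings, shape (n_vars, k)
    T = Xc @ P                               # scores, shape (n_samples, k)
    # Variance of each retained score vector.
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    # T2: squared scores scaled by their variance (distance inside the model).
    t2 = np.sum(T ** 2 / score_var, axis=1)
    # Q: squared reconstruction residual (distance from the model plane).
    residual = Xc - T @ P.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))           # synthetic data for illustration
t2, q = pca_t2_q(X_demo, n_components=2)
# Assumption: empirical percentile limits instead of F or chi-square limits.
flags = (t2 > np.percentile(t2, 95)) | (q > np.percentile(q, 95))
```

T² flags samples that are extreme within the model's score space, while Q flags samples the model cannot reconstruct, so a sample can be unusual in one sense and unremarkable in the other.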

Interpretation

Selection can improve interpretability, but it can also overfit. Prefer nested validation or a true holdout whenever variable selection influences the final model. A variable-selection method that sees the full dataset before cross-validation will usually make the CV result too optimistic, because the held-out samples in every fold have already influenced which variables were kept.
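
The optimism from selecting variables before cross-validation can be reproduced on pure noise. The sketch below is an illustration in plain Python, not a model of any node: the selection rule (top-k variables by absolute correlation) and the one-variable linear fit are simple stand-ins for the PLS-based methods above, and all function names are hypothetical.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def select_top_k(X, y, rows, k):
    """Pick the k columns most correlated with y, using only `rows`."""
    ys = [y[i] for i in rows]
    corrs = [(pearson([X[i][j] for i in rows], ys), j) for j in range(len(X[0]))]
    corrs.sort(key=lambda t: abs(t[0]), reverse=True)
    return [(j, 1.0 if r >= 0 else -1.0) for r, j in corrs[:k]]


def composite(X, i, selected):
    """Sign-aligned sum of the selected variables for sample i."""
    return sum(sign * X[i][j] for j, sign in selected)


def cv_predictions(X, y, k, n_folds, nested):
    """K-fold predictions; selection is per fold only when nested=True."""
    n = len(y)
    rows = list(range(n))
    preds = [0.0] * n
    leaky_selection = select_top_k(X, y, rows, k)   # sees every sample
    for fold in range(n_folds):
        test = rows[fold::n_folds]
        train = [i for i in rows if i % n_folds != fold]
        sel = select_top_k(X, y, train, k) if nested else leaky_selection
        z = [composite(X, i, sel) for i in train]
        yt = [y[i] for i in train]
        mz, my = mean(z), mean(yt)
        # Least-squares slope of y on the composite score, training rows only.
        slope = sum((a - mz) * (b - my) for a, b in zip(z, yt)) / sum(
            (a - mz) ** 2 for a in z
        )
        for i in test:
            preds[i] = my + slope * (composite(X, i, sel) - mz)
    return preds


rng = random.Random(0)
n_samples, n_vars = 40, 300
X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
y = [rng.gauss(0, 1) for _ in range(n_samples)]     # unrelated to X by design

r_leaky = pearson(cv_predictions(X, y, k=10, n_folds=5, nested=False), y)
r_nested = pearson(cv_predictions(X, y, k=10, n_folds=5, nested=True), y)
print(f"leaky CV correlation:  {r_leaky:.2f}")
print(f"nested CV correlation: {r_nested:.2f}")
```

Since `y` is independent of `X`, any apparent predictive skill is an artifact. The leaky variant reports a clearly positive correlation between predictions and targets, while the nested variant, which repeats the selection inside each training fold, stays close to what chance would give.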