Data Nodes

Data nodes introduce datasets into a workflow.

Core Data Nodes

| Node | Use When | Inputs | Outputs | Key Configuration |
| --- | --- | --- | --- | --- |
| Data Source (data.source) | You need an example dataset, uploaded experiment file, library entry, or direct file path. | none | default: SpectralDataset; target: TargetMatrix when available | source; source-specific selectors such as example_dataset, sklearn_dataset, eigenvector_dataset, synthetic_dataset, experiment_id, file_id, stage, library_id, file_path. |
| File Load (data.file_load) | You want a specific uploaded experiment file from a project dataset. | none | spectral dataset | experiment_id; file_id; stage such as raw or processed. |
| My Dataset (data.my_dataset) | You want to load all files in a saved dataset collection. | none | default: SpectralDataset; target: TargetMatrix if the dataset has targets | dataset_id. |
| NIST Library (data.nist_library) | You need reference spectra from the bundled/reference NIST library. | none | SherpaDataset | library_id; compound_name. |
| Load Group (data.load_group) | You need to stack many files from a folder into one dataset. | none | grouped spectral dataset | folder_path; pattern; recursive; sort_by; validate_axes; group_title. Requires SpectroChemPy for spectroscopy-native grouped file loading. |
| Attach Target (data.attach_target) | You already have spectra and a target vector or matrix and want downstream supervised nodes to see them together. | X: SpectralDataset; y: TargetMatrix | default: SpectralDataset with attached targets | target_type such as continuous or categorical. |
| Train/Test Split (data.train_test_split) | You need a simple train/test split before modeling or holdout evaluation. | X: SpectralDataset; y: TargetMatrix? | X_train; X_test; y_train; y_test | test_size; split_method; random_seed; shuffle. |
| Filter Samples (data.filter_samples) | You need to subset rows by sample label, class, target, metadata, row number, or intensity rule. | X: SpectralDataset | filtered SpectralDataset | field; pattern; match_mode; case_sensitive; invert; sample-table and intensity threshold settings. |
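
As a rough illustration of what a seeded split with test_size, random_seed, and shuffle involves, here is a minimal plain-Python sketch. It is not the implementation behind data.train_test_split: the function name `split_rows`, the list-of-rows data layout, and the rounding rule for the test count are assumptions made for this example. The point it shows is that X and y are indexed by the same permutation, so spectra and targets stay paired.

```python
import random


def split_rows(X, y=None, test_size=0.25, random_seed=0, shuffle=True):
    """Split rows of X (and y, if given) using one shared index order.

    Hypothetical helper for illustration; not the data.train_test_split node.
    """
    n = len(X)
    if n < 2:
        raise ValueError("need at least two rows to split")
    if y is not None and len(y) != n:
        raise ValueError(f"X has {n} rows but y has {len(y)}")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")

    indices = list(range(n))
    if shuffle:
        # A dedicated Random instance keeps the split reproducible per seed.
        random.Random(random_seed).shuffle(indices)

    # Keep at least one row on each side of the split.
    n_test = min(n - 1, max(1, round(n * test_size)))
    train_idx, test_idx = indices[: n - n_test], indices[n - n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    if y is None:
        return X_train, X_test, None, None
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Calling `split_rows(X, y, test_size=0.25, random_seed=42)` twice with the same arguments returns the same partition, which is the property a fixed random_seed is meant to provide.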

Synthesis and Mixture Helpers

These nodes are useful for examples, simulation, and method development. Their outputs are synthetic and should not be mistaken for measured calibration data.

| Node | Use When | Inputs | Outputs | Key Configuration |
| --- | --- | --- | --- | --- |
| Species (synthesis.species) | Mark a spectrum as a component/species before mixture generation. | default: SpectralDataset | spectral dataset with species metadata | species_name; molar_absorptivity. |
| Blend (synthesis.blend) | Create synthetic mixtures from component spectra and concentration curves. | default: SpectralDataset | synthetic spectral dataset | n_timepoints; model_type; pathlength; noise_level; species_config. |
| Merge Spectra (synthesis.merge) | Stack or combine spectra into one dataset. | default: SpectralDataset | merged spectral dataset | align_wavenumbers. |
| Synthetic Curve (data.synthetic_curve) | Generate concentration/time curves for synthetic examples. | none | synthetic curve dataset | curve_type; n_points; max_concentration; center; width. |
| Concentration Curve (custom.concentration_curve) | Generate simple concentration curves for custom blending workflows. | none | array | curve_type; n_points; max_concentration; center; width. |
| Catmull-Rom Curve (custom.catmull_rom_curve) | Generate smoother custom curves from control points. | none | array | n_points; max_concentration; control_points. |
| Noise Injection (custom.noise_injection) | Stress-test workflows with controlled synthetic noise. | default: SpectralDataset | spectral dataset | noise_level; noise_type; seed. |
| System Saturation (custom.system_saturation) | Simulate detector or system saturation in synthetic spectra. | default: SpectralDataset | spectral dataset | s_system; p_system. |
| Linear Calibration (custom.linear_calibration) | Simulate Beer-Lambert style response with saturation capping. | spectrum: SpectralDataset; concentrations: Any | spectral dataset | s_max; concentration_unit; reference_confirmed. |
| Saturation Model (custom.saturation_model) | Simulate nonlinear high-concentration response. | spectrum: SpectralDataset; concentrations: Any | spectral dataset | validate_params; concentration_unit; warn_extrapolation. |
| Hybrid Model Selector (custom.hybrid_selector) | Choose linear or saturation output per variable. | linear_result; saturation_result; concentrations | spectral dataset | auto_select; saturation_threshold. |
| Golden Grid Align (custom.golden_grid_align) | Align spectra to a common wavenumber grid for synthetic or imported sets. | default: SpectralDataset | aligned spectral dataset | method; merge_tolerance. |
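
To make the idea of a concentration curve feeding a Beer-Lambert style response with a saturation cap more concrete, the following plain-Python sketch may help. It is an illustration only and does not reproduce the internals of any node above. The function names `gaussian_curve` and `capped_linear_response` are hypothetical, and two interpretations are assumed rather than taken from the nodes: that center and width are given in point-index units, and that s_max acts as a hard upper limit on absorbance.

```python
import math


def gaussian_curve(n_points, max_concentration, center, width):
    """Gaussian concentration profile sampled at points 0 .. n_points - 1.

    Assumption: center and width are expressed in point-index units.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [
        max_concentration * math.exp(-0.5 * ((i - center) / width) ** 2)
        for i in range(n_points)
    ]


def capped_linear_response(molar_absorptivity, concentrations,
                           pathlength=1.0, s_max=None):
    """Beer-Lambert absorbance A = epsilon * l * c, optionally capped at s_max.

    molar_absorptivity holds one epsilon value per spectral variable.
    Returns one synthetic spectrum (a list) per concentration value.
    """
    spectra = []
    for c in concentrations:
        row = [eps * pathlength * c for eps in molar_absorptivity]
        if s_max is not None:
            # Assumption: saturation is modeled as a hard ceiling.
            row = [min(a, s_max) for a in row]
        spectra.append(row)
    return spectra


# Usage: a 5-point curve peaking at index 2, applied to a two-variable spectrum.
curve = gaussian_curve(n_points=5, max_concentration=2.0, center=2, width=1.0)
spectra = capped_linear_response([0.5, 2.0], curve, pathlength=1.0, s_max=3.0)
```

In this example the second variable reaches 2.0 * 1.0 * 2.0 = 4.0 at the curve maximum and is clipped to 3.0, while the first variable stays in the linear range.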

Deployment I/O Helpers

| Node | Use When | Inputs | Outputs | Key Configuration |
| --- | --- | --- | --- | --- |
| Deploy Input (deploy.input) | Building a headless prediction workflow that receives external data. | none | default: SpectralDataset | stream_name. |
| Deploy Output (deploy.output) | Formatting headless prediction results for a service response. | default: Any | scalar or structured response payload | output_format; key_value_separator; end_of_message_tag. |
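
The sketch below shows one way a key/value separator and an end-of-message tag could shape a response payload. It is a guess at the general idea, not the behavior of deploy.output: the function name `format_payload`, the format names "key_value" and "json", and the default separator and tag are all hypothetical.

```python
import json


def format_payload(results, output_format="key_value",
                   key_value_separator="=", end_of_message_tag="<EOM>"):
    """Serialize a dict of prediction results and append a terminator tag.

    Hypothetical helper; the format names here are invented for the example.
    """
    if output_format == "json":
        body = json.dumps(results, sort_keys=True)
    elif output_format == "key_value":
        # One "name<separator>value" pair per line, in insertion order.
        body = "\n".join(
            f"{name}{key_value_separator}{value}"
            for name, value in results.items()
        )
    else:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    return f"{body}\n{end_of_message_tag}"
```

A terminator tag of this kind lets a stream consumer detect where one response ends without knowing its length in advance.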

Data Checks Before Modeling

Confirm sample count, variable count, spectral axis units, sample IDs, and target alignment before connecting to supervised nodes. A model can train on row-misaligned spectra and targets, but the results are scientifically invalid.
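
Several of these checks can be automated. The sketch below is a minimal plain-Python example under stated assumptions: spectra are a list of rows, the spectral axis is a list of numbers, and targets arrive with their own list of sample IDs (`target_ids`). The function name `check_before_modeling` is hypothetical. Axis units cannot be verified from numbers alone, so the sketch only checks that the axis is strictly monotonic; units still need a manual look.

```python
def check_before_modeling(spectra, axis, sample_ids, targets, target_ids):
    """Return a list of human-readable problems; an empty list means no issue found.

    Hypothetical helper for illustration, not part of the node library.
    """
    problems = []
    n_samples = len(spectra)
    n_vars = len(axis)
    if n_samples == 0:
        return ["no samples"]

    # Variable count: every spectrum must match the axis length.
    for i, row in enumerate(spectra):
        if len(row) != n_vars:
            problems.append(f"row {i} has {len(row)} variables, axis has {n_vars}")

    # Axis sanity: strictly increasing or strictly decreasing.
    diffs = [b - a for a, b in zip(axis, axis[1:])]
    if not (all(d > 0 for d in diffs) or all(d < 0 for d in diffs)):
        problems.append("spectral axis is not strictly monotonic")

    # Sample IDs: one unique ID per row.
    if len(sample_ids) != n_samples:
        problems.append(f"{len(sample_ids)} sample IDs for {n_samples} spectra")
    if len(set(sample_ids)) != len(sample_ids):
        problems.append("duplicate sample IDs")

    # Target alignment: same count and same ID order as the spectra.
    if len(targets) != n_samples:
        problems.append(f"{len(targets)} targets for {n_samples} spectra")
    if list(sample_ids) != list(target_ids):
        problems.append("sample IDs and target IDs differ or are in a different order")
    return problems
```

Comparing ID order, not just ID sets, is what catches the row-misalignment case: the same samples in a different order would otherwise pass a count check and train without error.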

Import Boundary

Base file support and optional SpectroChemPy-backed file support are documented in Supported File Types.