Synthesis Builder
The Synthesis module creates synthetic spectral datasets by blending reference species (pure components) according to defined concentration profiles. It uses per-species calibration models (linear, saturation, or hybrid) to simulate realistic detector responses.
Preprocessing
Before blending, you can clean your source spectra. The builder uses spectrochempy for these operations.
- Alignment: Spectra must be on the same wavenumber grid. The system uses PCHIP (Piecewise Cubic Hermite Interpolating Polynomial) interpolation by default to preserve peak shapes without introducing ringing artifacts.
- Clipping:
  - Range: Limits the spectral range (e.g., 400-4000 cm⁻¹).
  - Floor: Sets a minimum absorbance value (e.g., 0.0) to remove non-physical negative noise.
- Cosmic Ray Removal: Uses a Z-score-based algorithm (default threshold Z=3.0) to detect and remove single-point spikes.
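The clipping and spike-removal steps can be sketched in plain Python. This is a minimal illustration, not the builder's spectrochempy-based implementation: the function names are hypothetical, and the spike detector assumes that the Z-score is computed on each point's deviation from the mean of its two neighbours, which the text does not specify.

```python
import statistics

def clip_range(wavenumbers, absorbance, lo=400.0, hi=4000.0):
    """Keep only points whose wavenumber lies in [lo, hi] (cm^-1)."""
    kept = [(w, a) for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi]
    return [w for w, _ in kept], [a for _, a in kept]

def clip_floor(absorbance, floor=0.0):
    """Raise every value below `floor` up to `floor`."""
    return [max(a, floor) for a in absorbance]

def remove_spikes(absorbance, z_threshold=3.0):
    """Replace single-point spikes by the mean of their two neighbours.

    Assumption: a point is a spike when the Z-score of its residual
    (value minus neighbour mean) exceeds `z_threshold`.
    """
    n = len(absorbance)
    cleaned = list(absorbance)
    if n < 3:
        return cleaned
    # Residuals exist only for interior points 1 .. n-2.
    residuals = [
        absorbance[i] - 0.5 * (absorbance[i - 1] + absorbance[i + 1])
        for i in range(1, n - 1)
    ]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)
    if sigma == 0.0:
        return cleaned  # perfectly smooth input, nothing to remove
    for k, r in enumerate(residuals):
        if abs(r - mu) / sigma > z_threshold:
            i = k + 1  # map residual index back to spectrum index
            cleaned[i] = 0.5 * (absorbance[i - 1] + absorbance[i + 1])
    return cleaned
```

Running `clip_range` before `clip_floor` and `remove_spikes` keeps out-of-range points from influencing the spike statistics.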
Blending Algorithm
The synthesis engine implements a multi-species superposition model.
The total absorbance $A_{total}$ at wavenumber $\nu$ and time $t$ is the sum of individual component contributions:
$$ A_{total}(\nu, t) = \sum_{i=1}^{N} A_i(\nu, C_i(t)) $$
where $N$ is the number of species and $C_i(t)$ is the concentration of species $i$ at time $t$.
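As a rough sketch of this superposition, each species can be treated as a callable that maps a wavenumber and a concentration to an absorbance. The `blend` function and the callable interface below are illustrative assumptions, not the engine's actual API.

```python
def blend(wavenumbers, models, concentrations):
    """Return A_total[t][k] = sum_i A_i(nu_k, C_i(t)).

    models:          one callable per species, model(nu, conc) -> absorbance
    concentrations:  one list per species, giving C_i at each time point
    """
    n_times = len(concentrations[0])
    return [
        [
            sum(model(nu, conc[t]) for model, conc in zip(models, concentrations))
            for nu in wavenumbers
        ]
        for t in range(n_times)
    ]

# Two hypothetical species with purely linear responses.
species = [lambda nu, c: 0.001 * c, lambda nu, c: 0.002 * c]
profiles = [[0.0, 100.0], [50.0, 50.0]]  # C_1(t), C_2(t)
total = blend([1000.0, 2000.0], species, profiles)
```

The result has one row per time point and one column per wavenumber.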
Calibration Models
Each species can use a different calibration model $A_i(\nu, C)$, defined per wavenumber:
1. Linear Model (Beer's Law)
Standard for low concentrations. $$ A = \text{slope}(\nu) \cdot C + \text{intercept}(\nu) $$
- Clipping: To prevent non-physical extrapolation, linear models are capped at a "Safe Threshold" (default 1.8 AU) or the saturation level $s$ if available.
2. Saturation Model
Models non-linear detector response at high concentrations using a hyperbolic tangent function. $$ A = s(\nu) \cdot \left[ \tanh\left( \left( \frac{c(\nu) \cdot C}{s(\nu)} \right)^{p(\nu)} \right) \right]^{1/p(\nu)} $$
- $s(\nu)$: Saturation plateau (max absorbance).
- $c(\nu)$: Linear sensitivity (slope at $C=0$).
- $p(\nu)$: Shape parameter controlling the transition to saturation.
3. Hybrid Model
Allows different wavenumbers to use different models (e.g., peaks use saturation, baseline uses linear).
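The three models can be sketched as follows. The function names and the per-wavenumber dictionary layout are hypothetical. The linear cap assumes that the saturation level $s$, when known, takes the place of the safe threshold, which is one reading of the rule above.

```python
import math

SAFE_THRESHOLD_AU = 1.8  # default cap quoted in the text

def linear_absorbance(conc, slope, intercept=0.0, saturation=None):
    """Beer's-law line, capped to avoid non-physical extrapolation."""
    # Assumption: use s as the cap when available, else the safe threshold.
    cap = saturation if saturation is not None else SAFE_THRESHOLD_AU
    return min(slope * conc + intercept, cap)

def saturation_absorbance(conc, s, c, p):
    """A = s * tanh((c*C/s)**p) ** (1/p); expects conc >= 0."""
    x = c * conc / s
    return s * math.tanh(x ** p) ** (1.0 / p)

def hybrid_absorbance(conc, per_wavenumber):
    """Evaluate one species across wavenumbers, each with its own model.

    per_wavenumber: one dict per wavenumber; the 'kind' key selects
    the model (hypothetical layout, for illustration only).
    """
    out = []
    for params in per_wavenumber:
        if params["kind"] == "saturation":
            out.append(
                saturation_absorbance(conc, params["s"], params["c"], params["p"])
            )
        else:  # "linear"
            out.append(
                linear_absorbance(
                    conc,
                    params["slope"],
                    params.get("intercept", 0.0),
                    params.get("s"),
                )
            )
    return out
```

For small concentrations the saturation model reduces to $A \approx c(\nu) \cdot C$, and for large ones it approaches $s(\nu)$, which matches the parameter descriptions above.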
Concentration Modes
The system supports two concentration modes, critical for quantitative accuracy:
- Product Mode (ppm·m): Concentration-pathlength product. This is the standard mode.
  - If your input is in ppm, you must specify a pathlength_m.
  - If your input is in ppm·m, leave pathlength as None.
- Concentration Mode (ppm): Legacy mode. Assumes a fixed pathlength.
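The Product Mode rule amounts to a single multiplication. The helper below is a hypothetical sketch: only the `pathlength_m` name and the `None` convention come from the text above.

```python
def to_ppm_m(value, pathlength_m=None):
    """Return a concentration-pathlength product in ppm·m.

    value:         input in ppm (if pathlength_m is given) or already in ppm·m
    pathlength_m:  optical pathlength in metres, or None for ppm·m input
    """
    if pathlength_m is None:
        return value  # input is already a ppm·m product
    if pathlength_m <= 0:
        raise ValueError("pathlength_m must be positive")
    return value * pathlength_m
```

For example, 50 ppm over a 2 m path and a direct input of 100 ppm·m describe the same column.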
System Saturation
After blending, a global "System Saturation" can be applied to simulate the detector's dynamic range limit on the total absorbance:
$$ A_{measured} = s_{sys} \cdot \left[ \tanh\left( \left( \frac{A_{total}}{s_{sys}} \right)^{p_{sys}} \right) \right]^{1/p_{sys}} $$
This step acts on dimensionless absorbance values and affects the entire spectrum, not individual species.
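A minimal sketch of this step for one blended spectrum follows. The function name is hypothetical, and the input is assumed to be non-negative, since a negative base raised to a fractional power is not real-valued.

```python
import math

def apply_system_saturation(a_total, s_sys, p_sys):
    """Compress total absorbance towards the plateau s_sys.

    a_total: list of non-negative total absorbance values (one spectrum)
    """
    return [
        s_sys * math.tanh((a / s_sys) ** p_sys) ** (1.0 / p_sys)
        for a in a_total
    ]

measured = apply_system_saturation([0.0, 0.01, 100.0], s_sys=3.0, p_sys=2.0)
```

Values well below `s_sys` pass through almost unchanged, while very large values flatten out at `s_sys`.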
Generating Data
- Define Profiles: Create time-series concentrations ($C_i(t)$) for each species using the Curve Generator (Sigmoid, Gaussian, Linear, Step).
- Run Blend: The engine vectorizes the calculation across all wavenumbers and time points.
- Statistics: The result includes the minimum, maximum, mean, and standard deviation of the generated absorbance matrix.
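The four profile shapes and the summary statistics can be approximated as below. All function names and parameters are illustrative assumptions, not the Curve Generator's interface, and the standard deviation shown is the population form, which the text does not specify.

```python
import math
import statistics

def sigmoid_profile(times, midpoint, rate, peak):
    """Smooth rise from 0 to `peak`, reaching peak/2 at `midpoint`."""
    return [peak / (1.0 + math.exp(-rate * (t - midpoint))) for t in times]

def gaussian_profile(times, center, width, peak):
    """Bell-shaped pulse with maximum `peak` at `center`."""
    return [peak * math.exp(-0.5 * ((t - center) / width) ** 2) for t in times]

def linear_profile(times, start, end):
    """Straight ramp from `start` at the first time to `end` at the last."""
    t0, t1 = times[0], times[-1]
    span = (t1 - t0) or 1.0  # avoid division by zero for a single time point
    return [start + (end - start) * (t - t0) / span for t in times]

def step_profile(times, t_on, level):
    """Zero before `t_on`, constant `level` from `t_on` onwards."""
    return [level if t >= t_on else 0.0 for t in times]

def matrix_stats(matrix):
    """Min, max, mean, and population std over all matrix entries."""
    flat = [v for row in matrix for v in row]
    return {
        "min": min(flat),
        "max": max(flat),
        "mean": statistics.fmean(flat),
        "std": statistics.pstdev(flat),
    }
```

Each profile returns one concentration per time point, so a profile can serve directly as $C_i(t)$ for one species.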
Export
Synthetic data can be exported as:
* CSV: Simple matrix (rows=wavenumbers, cols=samples).
* JSON: Full metadata including the SpectrumRecord structure and model parameters.
* NetCDF: Standard scientific format.
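The CSV layout (rows are wavenumbers, columns are samples) can be illustrated with the standard library. The `to_csv` helper and the header names are hypothetical; the actual exporter may label columns differently.

```python
import csv
import io

def to_csv(wavenumbers, absorbance_by_sample):
    """Serialize spectra as CSV text, one row per wavenumber.

    absorbance_by_sample[j][k]: absorbance of sample j at wavenumber k
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    header = ["wavenumber"] + [
        f"sample_{j}" for j in range(len(absorbance_by_sample))
    ]
    writer.writerow(header)
    for k, nu in enumerate(wavenumbers):
        writer.writerow([nu] + [sample[k] for sample in absorbance_by_sample])
    return buf.getvalue()

text = to_csv([1000.0, 2000.0], [[0.1, 0.2], [0.3, 0.4]])
```

Because the blend is naturally indexed by sample first, the helper transposes the data while writing.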