Quality Checks
Assert data quality with built-in rules. Block pipeline execution on failures or collect warnings.
How it works
A quality_check node runs assertions against the incoming dataset. Each check evaluates a rule and reports pass/fail. Depending on the on_failure setting, a failed check either:
- block -- stops the pipeline with an error
- warn -- logs a warning and passes data through
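As a rough illustration of the two modes, the sketch below walks a list of check outcomes and either raises or collects warnings. The names `CheckResult`, `QualityCheckError`, and `apply_on_failure` are hypothetical and not part of the product. Stopping at the first blocking failure is also an assumption; the real node may evaluate the remaining checks before stopping.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical shape, for illustration only)."""
    column: str
    rule: str
    passed: bool
    on_failure: str  # "block" or "warn"


class QualityCheckError(Exception):
    """Raised when a failed check is configured with on_failure = "block"."""


def apply_on_failure(results):
    """Return warning messages; raise on the first blocking failure."""
    warnings = []
    for r in results:
        if r.passed:
            continue
        message = f"{r.rule} failed on column {r.column!r}"
        if r.on_failure == "block":
            raise QualityCheckError(message)  # pipeline stops here
        warnings.append(message)  # data still passes through
    return warnings
```

A failed check set to warn only adds to the returned list, so downstream nodes would still receive the data.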
Configuration
{
  "checks": [
    {"column": "email", "rule": "not_null", "on_failure": "block"},
    {"column": "id", "rule": "unique", "on_failure": "warn"},
    {"column": "amount", "rule": "range", "params": {"min": 0, "max": 10000}, "on_failure": "block"}
  ]
}
Available rules
not_null
Asserts no null or empty values in the column.
{"column": "email", "rule": "not_null", "on_failure": "block"}
unique
Asserts all values in the column are unique (no duplicates).
{"column": "order_id", "rule": "unique", "on_failure": "block"}
min
Asserts all numeric values are ≥ the minimum.
{"column": "age", "rule": "min", "params": {"min": 0}, "on_failure": "warn"}
max
Asserts all numeric values are ≤ the maximum.
{"column": "score", "rule": "max", "params": {"max": 100}, "on_failure": "warn"}
range
Asserts all numeric values fall within a range (inclusive).
{"column": "price", "rule": "range", "params": {"min": 0.01, "max": 99999.99}, "on_failure": "block"}
regex
Asserts all values match a regular expression pattern.
{"column": "email", "rule": "regex", "params": {"pattern": "^[^@]+@[^@]+\\.[^@]+$"}, "on_failure": "warn"}
row_count
Asserts the total number of rows falls within bounds.
{"column": "", "rule": "row_count", "params": {"min": 1, "max": 1000000}, "on_failure": "block"}
Tip: Use row_count with min: 1 to ensure your pipeline doesn't silently process empty data.
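The bounds check behind row_count can be pictured roughly as follows. This is a minimal sketch, not the product's implementation: the function name `row_count_ok` is hypothetical, and it assumes both bounds are inclusive and that either one may be omitted (the examples on this page use min both alone and together with max).

```python
def row_count_ok(rows, min_rows=None, max_rows=None):
    """Return True when len(rows) lies within the given bounds.

    Assumption: bounds are inclusive, and a bound of None means "no limit".
    """
    n = len(rows)
    if min_rows is not None and n < min_rows:
        return False
    if max_rows is not None and n > max_rows:
        return False
    return True
```

With `min_rows=1`, an empty dataset fails the check, which is the situation the tip guards against.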
type_check
Asserts all values parse as the expected type: int, float, date, or email.
{"column": "created_at", "rule": "type_check", "params": {"expected_type": "date"}, "on_failure": "warn"}
Supported date formats: RFC3339, 2006-01-02, 2006-01-02T15:04:05, 01/02/2006, 02-Jan-2006.
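The listed formats look like Go-style reference layouts, in which 01 is the month and 02 is the day, so 01/02/2006 reads as month/day/year. To get a feel for which strings would pass a date type_check, the sketch below tries approximate Python `strptime` equivalents in turn. The helper `parses_as_date` is hypothetical, and the mapping is an approximation: it ignores RFC3339 fractional seconds and is more lenient about zero padding than a strict layout match.

```python
from datetime import datetime

# Approximate strptime equivalents of the listed formats (an assumption,
# not the product's parser).
DATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # RFC3339, without fractional seconds
    "%Y-%m-%d",             # 2006-01-02
    "%Y-%m-%dT%H:%M:%S",    # 2006-01-02T15:04:05
    "%m/%d/%Y",             # 01/02/2006 (month/day/year)
    "%d-%b-%Y",             # 02-Jan-2006 (English month abbreviations)
]


def parses_as_date(value):
    """Return True if value matches at least one of the formats above."""
    if not isinstance(value, str):
        return False
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False
```

Under this reading, a day-first value such as 15/03/2024 would not parse, because 15 is not a valid month.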
freshness
Asserts a date column has values within N hours of the current time. Useful for detecting stale data.
{"column": "updated_at", "rule": "freshness", "params": {"max_hours": "24"}, "on_failure": "block"}
no_blank
Stricter than not_null -- also catches empty strings and whitespace-only values.
{"column": "name", "rule": "no_blank", "on_failure": "warn"}
Rule summary
| Rule | Column | Params | Description |
|---|---|---|---|
| not_null | required | -- | No nulls or empty values |
| unique | required | -- | No duplicate values |
| min | required | min | Values ≥ minimum |
| max | required | max | Values ≤ maximum |
| range | required | min, max | Values within range |
| regex | required | pattern | Values match regex |
| row_count | optional | min, max | Row count within bounds |
| type_check | required | expected_type | Values parse as type |
| freshness | required | max_hours | Date values within N hours |
| no_blank | required | -- | No nulls, empty strings, or whitespace |
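The summary table can double as a lint for check configurations before a pipeline runs. The sketch below is one possible reading of it, not a tool shipped with the product: `ACCEPTED_PARAMS` and `validate_check` are hypothetical names. It treats the Params column as the set of accepted keys rather than required ones, since row_count appears on this page with min alone, and it assumes on_failure is always given.

```python
# Accepted params per rule, transcribed from the summary table.
ACCEPTED_PARAMS = {
    "not_null": set(),
    "unique": set(),
    "min": {"min"},
    "max": {"max"},
    "range": {"min", "max"},
    "regex": {"pattern"},
    "row_count": {"min", "max"},
    "type_check": {"expected_type"},
    "freshness": {"max_hours"},
    "no_blank": set(),
}


def validate_check(check):
    """Return a list of problems found in one check entry (empty if none)."""
    rule = check.get("rule")
    if rule not in ACCEPTED_PARAMS:
        return [f"unknown rule: {rule!r}"]
    problems = []
    # Every rule except row_count needs a non-empty column.
    if rule != "row_count" and not check.get("column"):
        problems.append(f"{rule}: column is required")
    if check.get("on_failure") not in ("block", "warn"):
        problems.append(f"{rule}: on_failure must be 'block' or 'warn'")
    extra = set(check.get("params", {})) - ACCEPTED_PARAMS[rule]
    if extra:
        problems.append(f"{rule}: unexpected params {sorted(extra)}")
    return problems
```

Running `validate_check` over each entry of a "checks" array would surface typos such as a max param on a min rule before the pipeline executes.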
Example: Production quality gate
{
  "checks": [
    {"column": "", "rule": "row_count", "params": {"min": 1}, "on_failure": "block"},
    {"column": "customer_id", "rule": "not_null", "on_failure": "block"},
    {"column": "customer_id", "rule": "unique", "on_failure": "block"},
    {"column": "email", "rule": "regex", "params": {"pattern": "^.+@.+$"}, "on_failure": "warn"},
    {"column": "amount", "rule": "range", "params": {"min": 0, "max": 999999}, "on_failure": "block"},
    {"column": "updated_at", "rule": "freshness", "params": {"max_hours": "48"}, "on_failure": "warn"}
  ]
}
Viewing results
Quality check results appear in:
- Run logs -- each check is logged with pass/fail and measured value
- Node preview -- the data preview panel shows which checks passed
- WebSocket events -- real-time check results during execution