Brokoli
Pipelines

Quality Checks

Assert data quality with built-in rules. Block pipeline execution on failures or collect warnings.

How it works

A quality_check node runs assertions against the incoming dataset. Each check evaluates a rule and reports pass/fail. Depending on the on_failure setting, a failed check either:

  • block -- stops the pipeline with an error
  • warn -- logs a warning and passes data through
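The block/warn behavior can be sketched as follows. This is a minimal Go sketch, not Brokoli's implementation. CheckResult and applyPolicy are hypothetical names. The sketch assumes every check is evaluated before the outcome is decided; the page does not say whether execution stops at the first blocking failure.

```go
package main

import (
	"fmt"
	"strings"
)

// CheckResult is a hypothetical record of one evaluated check; the field
// names are illustrative, not Brokoli's actual types.
type CheckResult struct {
	Column    string
	Rule      string
	OnFailure string // "block" or "warn"
	Passed    bool
}

// applyPolicy collects failed "warn" checks as warnings and returns an error
// if any "block" check failed. All results are inspected before deciding
// (an assumption about ordering).
func applyPolicy(results []CheckResult) ([]string, error) {
	var warnings, blocking []string
	for _, r := range results {
		if r.Passed {
			continue
		}
		msg := fmt.Sprintf("%s check on column %q failed", r.Rule, r.Column)
		switch r.OnFailure {
		case "block":
			blocking = append(blocking, msg)
		case "warn":
			warnings = append(warnings, msg)
		default:
			return warnings, fmt.Errorf("unknown on_failure value %q", r.OnFailure)
		}
	}
	if len(blocking) > 0 {
		// Stop the pipeline: the caller should not pass data downstream.
		return warnings, fmt.Errorf("quality gate failed: %s", strings.Join(blocking, "; "))
	}
	// Warnings only: data passes through unchanged.
	return warnings, nil
}

func main() {
	results := []CheckResult{
		{Column: "email", Rule: "not_null", OnFailure: "block", Passed: true},
		{Column: "id", Rule: "unique", OnFailure: "warn", Passed: false},
	}
	warnings, err := applyPolicy(results)
	fmt.Println("warnings:", warnings)
	fmt.Println("error:", err)
}
```

Treating an unrecognized on_failure value as an error, rather than silently picking one mode, is a design choice of this sketch.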

Configuration

{
  "checks": [
    {"column": "email", "rule": "not_null", "on_failure": "block"},
    {"column": "id", "rule": "unique", "on_failure": "warn"},
    {"column": "amount", "rule": "range", "params": {"min": 0, "max": 10000}, "on_failure": "block"}
  ]
}
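A minimal sketch of decoding this configuration in Go with encoding/json. The Check and QualityConfig types and parseConfig are hypothetical. params is kept as a generic map because its shape varies by rule: numbers for min and max, a string for pattern, and a quoted string for max_hours in the examples on this page. Rejecting any on_failure value other than block or warn is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Check mirrors one entry in the "checks" array (hypothetical type).
// JSON numbers inside Params decode as float64.
type Check struct {
	Column    string         `json:"column"`
	Rule      string         `json:"rule"`
	Params    map[string]any `json:"params,omitempty"`
	OnFailure string         `json:"on_failure"`
}

// QualityConfig is the top-level quality_check node configuration.
type QualityConfig struct {
	Checks []Check `json:"checks"`
}

const example = `{
  "checks": [
    {"column": "email", "rule": "not_null", "on_failure": "block"},
    {"column": "id", "rule": "unique", "on_failure": "warn"},
    {"column": "amount", "rule": "range", "params": {"min": 0, "max": 10000}, "on_failure": "block"}
  ]
}`

// parseConfig decodes the JSON and validates on_failure for each check.
func parseConfig(data []byte) (QualityConfig, error) {
	var cfg QualityConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return QualityConfig{}, err
	}
	for i, c := range cfg.Checks {
		if c.OnFailure != "block" && c.OnFailure != "warn" {
			return QualityConfig{}, fmt.Errorf("check %d: on_failure must be block or warn, got %q", i, c.OnFailure)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(example))
	if err != nil {
		panic(err)
	}
	for _, c := range cfg.Checks {
		fmt.Println(c.Column, c.Rule, c.OnFailure, c.Params)
	}
}
```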

Available rules

not_null

Asserts the column contains no null or empty values.

{"column": "email", "rule": "not_null", "on_failure": "block"}
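A sketch of what not_null checks, in Go. It models a null as nil and assumes "empty" means the empty string "". Whitespace-only strings are treated as valid, since that is the case no_blank adds. notNull is a hypothetical helper. It also returns the failure count, similar to the measured value reported in run logs.

```go
package main

import "fmt"

// notNull reports whether every value is non-null and non-empty, plus the
// number of failing values. nil stands in for null; "" is assumed to be
// what "empty" means. Whitespace-only strings pass.
func notNull(values []any) (bool, int) {
	failures := 0
	for _, v := range values {
		if v == nil {
			failures++
			continue
		}
		if s, ok := v.(string); ok && s == "" {
			failures++
		}
	}
	return failures == 0, failures
}

func main() {
	emails := []any{"a@example.com", nil, ""}
	ok, bad := notNull(emails)
	fmt.Println(ok, bad)
}
```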

unique

Asserts all values in the column are unique (no duplicates).

{"column": "order_id", "rule": "unique", "on_failure": "block"}
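A minimal sketch of a uniqueness check using a Go map to count occurrences. duplicates is a hypothetical helper. The check passes when it returns an empty slice. The sketch assumes values are compared as exact strings, so "1" and "1.0" count as distinct.

```go
package main

import "fmt"

// duplicates returns each value that appears more than once, listed once
// in order of its second occurrence. An empty result means unique passes.
func duplicates(values []string) []string {
	seen := make(map[string]int, len(values))
	var dups []string
	for _, v := range values {
		seen[v]++
		if seen[v] == 2 { // record each duplicated value only once
			dups = append(dups, v)
		}
	}
	return dups
}

func main() {
	orderIDs := []string{"A1", "A2", "A1", "A3", "A1"}
	fmt.Println(duplicates(orderIDs))
}
```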

min

Asserts all numeric values are ≥ the minimum.

{"column": "age", "rule": "min", "params": {"min": 0}, "on_failure": "warn"}

max

Asserts all numeric values are ≤ the maximum.

{"column": "score", "rule": "max", "params": {"max": 100}, "on_failure": "warn"}

range

Asserts all numeric values fall within a range (inclusive).

{"column": "price", "rule": "range", "params": {"min": 0.01, "max": 99999.99}, "on_failure": "block"}
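min, max, and range apply the same comparison with one or both bounds set, so one sketch can cover all three. This Go version uses optional bounds, where nil means "no limit", and treats both ends as inclusive. bounds, outOfBounds, and ptr are hypothetical names. The sketch assumes values arrive as strings (for example from CSV). It also assumes non-numeric values count as failures, which the page does not specify.

```go
package main

import (
	"fmt"
	"strconv"
)

// bounds holds optional limits: min sets only Min, max sets only Max,
// range sets both.
type bounds struct {
	Min, Max *float64
}

// outOfBounds returns values outside [Min, Max] (both ends inclusive).
// Values that fail to parse as numbers are also returned (assumption).
func outOfBounds(values []string, b bounds) []string {
	var bad []string
	for _, s := range values {
		x, err := strconv.ParseFloat(s, 64)
		if err != nil || (b.Min != nil && x < *b.Min) || (b.Max != nil && x > *b.Max) {
			bad = append(bad, s)
		}
	}
	return bad
}

// ptr is a small helper for building optional bounds.
func ptr(f float64) *float64 { return &f }

func main() {
	// Mirrors {"rule": "range", "params": {"min": 0.01, "max": 99999.99}}.
	prices := []string{"0.01", "19.99", "99999.99", "0", "abc"}
	fmt.Println(outOfBounds(prices, bounds{Min: ptr(0.01), Max: ptr(99999.99)}))
}
```

Here "0.01" and "99999.99" pass because the bounds are inclusive. "0" and "abc" are reported.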

regex

Asserts all values match a regular expression pattern.

{"column": "email", "rule": "regex", "params": {"pattern": "^[^@]+@[^@]+\\.[^@]+$"}, "on_failure": "warn"}
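The pattern in the config is a JSON string, so "\\." in the file becomes the regex escape "\." after decoding. This Go sketch shows that step and then applies the pattern with the standard regexp package (RE2 syntax). Whether Brokoli uses the same regex engine is an assumption. patternFromParams and nonMatching are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// patternFromParams decodes a params object exactly as written in the
// config; JSON unescaping turns "\\." into "\.".
func patternFromParams(raw string) (string, error) {
	var params struct {
		Pattern string `json:"pattern"`
	}
	if err := json.Unmarshal([]byte(raw), &params); err != nil {
		return "", err
	}
	return params.Pattern, nil
}

// nonMatching returns the values that do not match pattern. In Go's regexp,
// MatchString finds a match anywhere in the value, so ^ and $ are what make
// the example pattern match whole values.
func nonMatching(pattern string, values []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, v := range values {
		if !re.MatchString(v) {
			bad = append(bad, v)
		}
	}
	return bad, nil
}

func main() {
	pattern, err := patternFromParams(`{"pattern": "^[^@]+@[^@]+\\.[^@]+$"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded pattern:", pattern)
	bad, err := nonMatching(pattern, []string{"ann@example.com", "bob@localhost", "no-at-sign"})
	if err != nil {
		panic(err)
	}
	fmt.Println("non-matching:", bad)
}
```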

row_count

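Asserts the total number of rows in the dataset falls within the min and max bounds. The rule applies to the whole dataset, so column is optional and can be left empty.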

{"column": "", "rule": "row_count", "params": {"min": 1, "max": 1000000}, "on_failure": "block"}

Tip: Use row_count with min: 1 to ensure your pipeline doesn't silently process empty data.
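A small Go sketch of the bounds test, treating a missing bound as unlimited. The production example on this page sets only min. rowCountOK and intPtr are hypothetical helpers.

```go
package main

import "fmt"

// rowCountOK reports whether lo <= n <= hi; a nil bound is not enforced.
func rowCountOK(n int, lo, hi *int) bool {
	if lo != nil && n < *lo {
		return false
	}
	if hi != nil && n > *hi {
		return false
	}
	return true
}

// intPtr is a small helper for building optional bounds.
func intPtr(i int) *int { return &i }

func main() {
	// {"rule": "row_count", "params": {"min": 1}}: an empty dataset fails.
	fmt.Println(rowCountOK(0, intPtr(1), nil))
	fmt.Println(rowCountOK(250, intPtr(1), intPtr(1000000)))
}
```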

type_check

Asserts all values parse as the expected type: int, float, date, or email.

{"column": "created_at", "rule": "type_check", "params": {"expected_type": "date"}, "on_failure": "warn"}

Supported date formats (written as Go reference-time layouts, so 01/02/2006 is month/day/year): RFC3339, 2006-01-02, 2006-01-02T15:04:05, 01/02/2006, 02-Jan-2006.

freshness

Asserts a date column has values within max_hours hours of the current time. Useful for detecting stale data.

{"column": "updated_at", "rule": "freshness", "params": {"max_hours": "24"}, "on_failure": "block"}

no_blank

Stricter than not_null -- in addition to nulls and empty strings, it also rejects whitespace-only values.

{"column": "name", "rule": "no_blank", "on_failure": "warn"}
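The extra case no_blank covers can be sketched in Go with strings.TrimSpace. A value is blank if it is null (nil here), empty, or empty after trimming. TrimSpace strips Unicode white space, and matching Brokoli's definition of "whitespace" is an assumption. isBlank and countBlank are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlank reports whether a value fails no_blank: nil, "", or
// whitespace-only. Non-string values are never considered blank here.
func isBlank(v any) bool {
	if v == nil {
		return true
	}
	s, ok := v.(string)
	return ok && strings.TrimSpace(s) == ""
}

// countBlank returns how many values would fail no_blank.
func countBlank(values []any) int {
	n := 0
	for _, v := range values {
		if isBlank(v) {
			n++
		}
	}
	return n
}

func main() {
	names := []any{"Ada", nil, "", "   ", "\t"}
	fmt.Println(countBlank(names))
}
```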

Rule summary

Rule        Column     Params          Description
not_null    required   --              No nulls or empty values
unique      required   --              No duplicate values
min         required   min             Values ≥ minimum
max         required   max             Values ≤ maximum
range       required   min, max        Values within range
regex       required   pattern         Values match regex
row_count   optional   min, max        Row count within bounds
type_check  required   expected_type   Values parse as type
freshness   required   max_hours       Date values within N hours
no_blank    required   --              No nulls, empty strings, or whitespace

Example: Production quality gate

{
  "checks": [
    {"column": "", "rule": "row_count", "params": {"min": 1}, "on_failure": "block"},
    {"column": "customer_id", "rule": "not_null", "on_failure": "block"},
    {"column": "customer_id", "rule": "unique", "on_failure": "block"},
    {"column": "email", "rule": "regex", "params": {"pattern": "^.+@.+$"}, "on_failure": "warn"},
    {"column": "amount", "rule": "range", "params": {"min": 0, "max": 999999}, "on_failure": "block"},
    {"column": "updated_at", "rule": "freshness", "params": {"max_hours": "48"}, "on_failure": "warn"}
  ]
}

Viewing results

Quality check results appear in:

  • Run logs -- each check is logged with its pass/fail status and measured value
  • Node preview -- the data preview panel shows which checks passed
  • WebSocket events -- real-time check results during execution