Core Concepts

The mental model behind Brokoli: pipelines, nodes, edges, runs, connections, and variables.

Pipeline

A pipeline is a directed acyclic graph (DAG) of nodes connected by edges. It describes a data flow from sources through transformations to sinks.

Each pipeline has:

  • Name and description
  • Nodes -- the processing steps
  • Edges -- the connections between nodes
  • Schedule -- optional cron expression
  • Parameters -- default key-value pairs passed to nodes at runtime
  • Tags -- labels for filtering and grouping
  • Hooks -- lifecycle webhooks (on_start, on_success, on_failure)

Node

A node is a single step in a pipeline. Each node has a type that determines what it does, and a config object with type-specific settings.

Node types at a glance

Type            Category    Description
source_file     Source      Read CSV, JSON, Parquet, or Excel files
source_api      Source      HTTP GET/POST to a REST API
source_db       Source      SQL query against a database
transform       Transform   Rename, filter, add columns, aggregate, sort
code            Transform   Run custom Python scripts
join            Transform   Join two datasets on a key
sql_generate    Transform   Generate SQL INSERT/UPSERT from data
quality_check   Logic       Assert data quality rules
condition       Logic       Branch execution based on expressions
sink_file       Sink        Write to CSV, JSON, Parquet files
sink_db         Sink        Insert/upsert rows into a database
sink_api        Sink        POST data to a REST API
migrate         Operation   Run database migrations

See Node Types for full details on each type.

Edge

An edge connects one node's output to another node's input. Edges define the data flow and execution order. A node only executes after all its upstream nodes complete.

{"from": "source1", "to": "transform1"}

A node can have multiple incoming edges (e.g., a join node receives two datasets) and multiple outgoing edges (fan-out).
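
As a rough illustration of how an edge list determines execution order, the sketch below groups edges by target node and checks whether all of a node's upstream nodes have completed. The {"from", "to"} shape follows the example above; the node IDs and the upstream_of and is_ready helpers are hypothetical and are not part of Brokoli's API.

```python
from collections import defaultdict

# Edge list in the {"from", "to"} shape shown above; node IDs are made up.
edges = [
    {"from": "orders", "to": "join1"},      # fan-in: two inputs to join1
    {"from": "customers", "to": "join1"},
    {"from": "join1", "to": "sink_csv"},    # fan-out: two outputs from join1
    {"from": "join1", "to": "sink_pg"},
]


def upstream_of(edges):
    """Map each node ID to the set of node IDs that feed into it."""
    upstream = defaultdict(set)
    for edge in edges:
        upstream[edge["to"]].add(edge["from"])
        upstream.setdefault(edge["from"], set())  # register source-only nodes
    return dict(upstream)


def is_ready(node, upstream, completed):
    """A node may run only once every upstream node has completed."""
    return upstream[node] <= completed


upstream = upstream_of(edges)
print(sorted(upstream["join1"]))                              # ['customers', 'orders']
print(is_ready("join1", upstream, {"orders"}))                # False
print(is_ready("join1", upstream, {"orders", "customers"}))   # True
```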

Run

A run is a single execution of a pipeline. Each run tracks:

Field         Description
id            Unique run identifier
status        pending > running > success / failed / cancelled
started_at    When execution began
finished_at   When execution completed
node_runs     Per-node status, duration, and row counts
error         Error message from the first failed node

Status lifecycle

stateDiagram-v2
    [*] --> pending
    pending --> running
    running --> success
    running --> failed
    running --> cancelled
    failed --> running : resume

Resume: Failed runs can be resumed from the first failed node. Nodes that already succeeded are skipped.
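
The resume rule can be sketched as a filter over per-node statuses. This is a minimal illustration, not the engine's implementation: the nodes_to_run_on_resume helper, the node IDs, and the assumption that node statuses use the same strings as run statuses are all hypothetical.

```python
def nodes_to_run_on_resume(node_status):
    """Return IDs of nodes that still need to run, in their original order.

    node_status maps node ID -> status string from the previous attempt.
    Nodes that already succeeded are skipped; everything else runs again.
    """
    return [node for node, status in node_status.items() if status != "success"]


# Hypothetical per-node statuses from a failed run.
previous_run = {
    "source1": "success",
    "transform1": "success",
    "quality1": "failed",
    "sink1": "pending",
}
print(nodes_to_run_on_resume(previous_run))  # ['quality1', 'sink1']
```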

Connections

A connection stores credentials for an external system (database, API, SFTP, S3). Passwords and secrets are encrypted at rest using AES-256-GCM.

Supported connection types:

Type       Used for
postgres   PostgreSQL databases
mysql      MySQL databases
sqlite     SQLite databases
http       REST APIs
sftp       SFTP/SSH file transfers
s3         Amazon S3 buckets
generic    Any TCP-reachable service

Reference a connection from a node's config by its ID, for example "conn_id": "my-postgres". The engine resolves the connection ID to a URI at runtime.

Variables

Variables are key-value pairs available to all pipelines. Two types:

  • string -- plaintext, visible in the UI
  • secret -- encrypted at rest, masked in responses

Reference a variable in node config with ${var.my_variable}.
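
To make the ${var.name} reference syntax concrete, here is a minimal substitution sketch. It only illustrates the idea: the render helper, the set of characters allowed in a variable name, and the choice to raise on an undefined variable are assumptions, not documented Brokoli behavior.

```python
import re

# Matches ${var.name}; the allowed name characters are an assumption.
VAR_PATTERN = re.compile(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}")


def render(value, variables):
    """Replace each ${var.name} in value with variables[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]
    return VAR_PATTERN.sub(substitute, value)


# Hypothetical variables and config value.
variables = {"api_base": "https://example.com/api", "region": "eu"}
print(render("${var.api_base}/orders?region=${var.region}", variables))
# https://example.com/api/orders?region=eu
```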

Architecture

Brokoli is a single Go binary with three internal components:

brokoli serve
├── REST API + WebSocket  (chi router, JWT auth)
├── Scheduler             (cron, timezone-aware)
└── Engine                (DAG runner, Kahn's algorithm)
    └── SQLite or PostgreSQL

Execution model

The engine uses Kahn's algorithm for topological sorting and executes nodes in waves:

  1. Find all nodes with no incoming edges (in-degree = 0)
  2. Execute them in parallel (up to 4 concurrent by default)
  3. When a wave completes, decrement the in-degree of each downstream node; nodes that reach in-degree 0 form the next wave
  4. Repeat until all nodes are processed

This means independent branches of your DAG run in parallel automatically.
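
The wave-based variant of Kahn's algorithm described above can be sketched in a few lines. This is an illustration of the technique, not Brokoli's engine code: the execution_waves function and the node IDs are hypothetical, and the sketch only computes the waves without running anything or applying the concurrency limit.

```python
from collections import defaultdict


def execution_waves(nodes, edges):
    """Group node IDs into waves using Kahn's algorithm.

    edges is a list of (upstream, downstream) pairs. Raises ValueError
    if the graph contains a cycle, since a cycle can never be scheduled.
    """
    in_degree = {node: 0 for node in nodes}
    downstream = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        in_degree[dst] += 1

    # Step 1: the first wave is every node with no incoming edges.
    wave = [node for node in nodes if in_degree[node] == 0]
    waves = []
    while wave:
        waves.append(wave)  # Step 2: these nodes could run in parallel.
        next_wave = []
        for node in wave:
            # Step 3: completing a node unblocks its downstream nodes.
            for child in downstream[node]:
                in_degree[child] -= 1
                if in_degree[child] == 0:
                    next_wave.append(child)
        wave = next_wave  # Step 4: repeat with the newly unblocked nodes.

    if sum(len(w) for w in waves) != len(nodes):
        raise ValueError("pipeline graph contains a cycle")
    return waves


nodes = ["orders", "customers", "join1", "check1", "sink1"]
edges = [("orders", "join1"), ("customers", "join1"),
         ("join1", "check1"), ("check1", "sink1")]
print(execution_waves(nodes, edges))
# [['orders', 'customers'], ['join1'], ['check1'], ['sink1']]
```

In this example the two source nodes share the first wave, so they are the part of the graph that can run concurrently.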

Concurrency

  • Max concurrent runs: 4 (configurable via BROKOLI_MAX_CONCURRENT_RUNS)
  • Max parallel nodes per run: 4
  • Node timeout: 30 minutes (configurable per node via timeout config key)
  • Retry: Exponential backoff with max_retries and retry_delay per node
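
The retry settings can be pictured with the sketch below. It assumes that retry_delay is a base delay in seconds and that the delay doubles after each failed attempt; the source only states that backoff is exponential, so the doubling factor, the units, and both helper functions are assumptions for illustration.

```python
import time


def backoff_delays(max_retries, retry_delay):
    """Return the wait in seconds before each retry attempt.

    Assumes the delay doubles after every failed attempt.
    """
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries)]


def run_with_retries(task, max_retries, retry_delay, sleep=time.sleep):
    """Call task(); on failure, wait and retry up to max_retries times.

    The last exception is re-raised once all retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(retry_delay * (2 ** attempt))


print(backoff_delays(max_retries=3, retry_delay=5))  # [5, 10, 20]
```

Passing sleep as a parameter keeps the sketch easy to exercise without actually waiting.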

Workspaces (Enterprise)

Workspaces provide tenant isolation within a single Brokoli instance. Each workspace has its own pipelines, connections, and variables.
