Lakeflow and Structured Streaming Pipelines

Incremental ingestion patterns

Design cloud-file ingestion around freshness, idempotency, and replay.

Incremental ingestion is mostly about trust. Teams need to know which files arrived, which records were accepted, and whether a retry will duplicate data.

Pattern map

  • Append-only landing data: use checkpointed streaming or Auto Loader-style ingestion.
  • Late-arriving updates: merge into a curated Delta table with deterministic keys.
  • Partner drops: quarantine bad files and publish validation results.
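The first and third patterns can be sketched together in plain Python. This is a minimal model, not Auto Loader. A checkpoint records which file paths have already been handled, so re-running discovery returns only new arrivals. Files that fail validation go to a quarantine map with a reason instead of being written, and each run returns a report that can be published. `IngestState`, `discover_new_files`, `validate_file`, and the "non-empty CSV whose header includes order_id" rule are all hypothetical stand-ins. Auto Loader keeps its own record of processed files in the stream's checkpoint location.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IngestState:
    """Hypothetical stand-in for a stream checkpoint plus a quarantine area."""
    processed: set[str] = field(default_factory=set)          # paths already handled
    quarantine: dict[str, str] = field(default_factory=dict)  # path -> rejection reason


def discover_new_files(listing: list[str], state: IngestState) -> list[str]:
    """Return paths not yet recorded in the checkpoint, in a stable order."""
    return sorted(p for p in listing if p not in state.processed)


def validate_file(contents: str) -> str | None:
    """Hypothetical rule: non-empty CSV whose header includes order_id.

    Returns a rejection reason, or None when the file is acceptable.
    """
    lines = contents.strip().splitlines()
    if not lines:
        return "empty file"
    if "order_id" not in lines[0].split(","):
        return "missing order_id column"
    return None


def ingest(files: dict[str, str], state: IngestState) -> dict[str, list[str]]:
    """Handle each new file once and return a validation report to publish."""
    report: dict[str, list[str]] = {"accepted": [], "quarantined": []}
    for path in discover_new_files(list(files), state):
        reason = validate_file(files[path])
        if reason is None:
            report["accepted"].append(path)     # would be appended to the landing table
        else:
            state.quarantine[path] = reason     # bad file is kept out of the table
            report["quarantined"].append(path)
        state.processed.add(path)               # record the file in the checkpoint
    return report


landing = {
    "orders/2024-01-01.csv": "order_id,amount\n1,10\n",
    "orders/2024-01-02.csv": "",
    "partner/drop.csv": "id,amount\n7,3\n",
}
state = IngestState()
print(ingest(landing, state))
# {'accepted': ['orders/2024-01-01.csv'], 'quarantined': ['orders/2024-01-02.csv', 'partner/drop.csv']}
print(ingest(landing, state))  # replaying the same listing finds nothing new
# {'accepted': [], 'quarantined': []}
```

The second call shows the replay property: a checkpointed run over an unchanged listing is a no-op.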

Idempotent write sketch

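-- Upsert one batch of order changes. Re-running the same batch leaves the table contents unchanged.
-- Assumes staging.orders_updates has at most one row per order_id: Delta MERGE fails when
-- several source rows match the same target row.
MERGE INTO main.silver.orders AS target
USING staging.orders_updates AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET *    -- replace the stored order with the incoming version
WHEN NOT MATCHED THEN INSERT *;   -- first time this order_id has been seen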
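The MERGE above is safe to retry, while a plain append is not. A small plain-Python model makes the difference concrete. The target table is held as a dict keyed by `order_id`. This sketch covers only the matched/not-matched semantics, under the assumption of one source row per key. It does not model how Delta executes a MERGE, and the row shapes are hypothetical.

```python
from __future__ import annotations


def append_batch(table: list[dict], batch: list[dict]) -> list[dict]:
    """Non-idempotent write: every retry adds the whole batch again."""
    return table + [dict(row) for row in batch]


def merge_batch(table: dict[int, dict], batch: list[dict]) -> dict[int, dict]:
    """Model of MERGE ... WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *,
    with the target table held as a dict keyed by order_id."""
    keys = [row["order_id"] for row in batch]
    if len(keys) != len(set(keys)):
        # The model requires one source row per key, as the MERGE sketch assumes.
        raise ValueError("source batch has duplicate order_id values")
    result = dict(table)
    for row in batch:
        result[row["order_id"]] = dict(row)  # update when matched, insert otherwise
    return result


table = {1: {"order_id": 1, "status": "new"}}
batch = [{"order_id": 1, "status": "shipped"}, {"order_id": 2, "status": "new"}]

once = merge_batch(table, batch)
twice = merge_batch(once, batch)       # simulated retry of the same batch
print(once == twice, len(twice))       # True 2

rows = append_batch([], batch)
print(len(append_batch(rows, batch)))  # 4: the retry duplicated both rows
```

The model also exposes a gap that neither it nor the SQL closes: a stale late-arriving row would still overwrite a newer one. A conditional `WHEN MATCHED AND <condition>` clause comparing a version or timestamp column is one way to guard against that. Which column to use depends on the source.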

Operational notes

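Retries are normal. A good ingestion design makes retries boring by separating discovery, validation, write, and publish into distinct steps, so a failed step can be re-run without redoing or duplicating the work of the others.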
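The four-step separation can be modeled in a few lines of plain Python. This is a sketch with hypothetical names, not a Databricks API. Discovery derives a deterministic `batch_id` from the files it found. The write step overwrites that batch's output instead of appending. Only the final publish step makes rows visible. With that split, a failure between write and publish can be retried, even more than once, without duplicating rows.

```python
from __future__ import annotations

import hashlib


class Pipeline:
    """Hypothetical four-step ingestion: discover, validate, write, publish."""

    def __init__(self) -> None:
        self.written: dict[str, list[dict]] = {}  # batch_id -> rows written so far
        self.published: set[str] = set()          # batch_ids visible to consumers

    def discover(self, paths: list[str]) -> tuple[str, list[str]]:
        ordered = sorted(paths)
        # Deterministic id: the same set of files always yields the same batch_id.
        batch_id = hashlib.sha256("\n".join(ordered).encode()).hexdigest()[:12]
        return batch_id, ordered

    def validate(self, rows: list[dict]) -> list[dict]:
        # Hypothetical rule: drop rows without an order_id.
        return [r for r in rows if r.get("order_id") is not None]

    def write(self, batch_id: str, rows: list[dict]) -> None:
        # Overwrite this batch's output instead of appending, so rewrites are harmless.
        self.written[batch_id] = list(rows)

    def publish(self, batch_id: str) -> None:
        self.published.add(batch_id)

    def visible_rows(self) -> list[dict]:
        # Consumers only ever see published batches.
        return [r for b in sorted(self.published) for r in self.written[b]]

    def run(self, files: dict[str, list[dict]], crash_before_publish: bool = False) -> str:
        batch_id, paths = self.discover(list(files))
        rows = self.validate([r for p in paths for r in files[p]])
        self.write(batch_id, rows)
        if crash_before_publish:
            raise RuntimeError("simulated failure between write and publish")
        self.publish(batch_id)
        return batch_id


files = {
    "orders/a.json": [{"order_id": 1}, {"order_id": None}],
    "orders/b.json": [{"order_id": 2}],
}
pipeline = Pipeline()
try:
    pipeline.run(files, crash_before_publish=True)  # first attempt fails late
except RuntimeError:
    pass
print(len(pipeline.visible_rows()))  # 0: nothing was published
pipeline.run(files)                  # retry
pipeline.run(files)                  # an accidental second retry
print(len(pipeline.visible_rows()))  # 2: no duplicated rows
```

The retry is safe because the write is keyed by `batch_id`. The same idea underlies the MERGE sketch, where `order_id` plays that role.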
