Lakeflow and Structured Streaming Pipelines
Incremental ingestion patterns
Design cloud-file ingestion around freshness, idempotency, and replay.
Incremental ingestion is mostly about trust. Teams need to know which files arrived, which records were accepted, and whether a retry will duplicate data.
Pattern map
- Append-only landing data: use checkpointed streaming or Auto Loader-style ingestion.
- Late-arriving updates: merge into a curated Delta table with deterministic keys.
- Partner drops: quarantine bad files and publish validation results.
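The append-only and partner-drop patterns can be sketched in plain Python without any platform. This is a minimal illustration and not how Auto Loader is implemented: the names `discover_new_files`, `validate_file`, and `ingest`, the in-memory `checkpoint` set, and the validation rule are all hypothetical, and a real checkpoint would live in durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    accepted: list = field(default_factory=list)
    quarantined: dict = field(default_factory=dict)  # file name -> reason


def discover_new_files(listing, checkpoint):
    """Return files not yet recorded in the checkpoint, in a stable order."""
    return sorted(name for name in listing if name not in checkpoint)


def validate_file(name, rows):
    """Hypothetical rule: a file needs rows, and every row needs an order_id."""
    if not rows:
        return "empty file"
    if any(not row.get("order_id") for row in rows):
        return "missing order_id"
    return None


def ingest(listing, checkpoint):
    """Accept or quarantine each new file, then record it in the checkpoint."""
    result = IngestResult()
    for name in discover_new_files(listing, checkpoint):
        reason = validate_file(name, listing[name])
        if reason is None:
            result.accepted.append(name)
        else:
            result.quarantined[name] = reason
        checkpoint.add(name)  # recorded either way, so a retry skips this file
    return result


# Illustrative landing area: one good file and one bad partner drop.
landing = {
    "orders_001.json": [{"order_id": "A1"}],
    "orders_002.json": [{"order_id": ""}],
}
checkpoint = set()
first = ingest(landing, checkpoint)
second = ingest(landing, checkpoint)  # simulated retry over the same listing
```

The second call finds nothing new because both files are already in the checkpoint. The quarantine dictionary doubles as the validation result that would be published back to the partner.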
Idempotent write sketch
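-- Upsert by business key: re-running with the same source rows leaves the target unchanged.
-- Assumes staging.orders_updates holds at most one row per order_id.
MERGE INTO main.silver.orders AS target
USING staging.orders_updates AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;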
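To see why a keyed upsert survives a retry while a blind append does not, the same idea can be modeled in a few lines of Python. This is only a sketch of the semantics under simplifying assumptions: the table is a dictionary keyed by `order_id`, and `merge_by_key` and `append_only` are hypothetical helpers, not part of any library.

```python
def merge_by_key(target, source_rows, key="order_id"):
    """Upsert source rows into a copy of target (a dict keyed by `key`)."""
    merged = dict(target)
    for row in source_rows:
        # Matched key: overwrite the row. Unmatched key: insert it.
        merged[row[key]] = dict(row)
    return merged


def append_only(target_rows, source_rows):
    """Blind append, shown for contrast: every retry adds the rows again."""
    return target_rows + source_rows


target = {"A1": {"order_id": "A1", "status": "new"}}
updates = [
    {"order_id": "A1", "status": "shipped"},
    {"order_id": "B2", "status": "new"},
]

once = merge_by_key(target, updates)
twice = merge_by_key(once, updates)  # simulated retry of the same batch

appended_twice = append_only(append_only([], updates), updates)
```

After the retry, `once` and `twice` are identical, while `appended_twice` holds each update two times. The deterministic key is what makes the second run a no-op.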
Operational notes
Retries are normal. A good ingestion design makes retries boring by separating discovery, validation, write, and publish steps, so a failed step can be rerun on its own without repeating the steps that already succeeded.
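One way to picture the step separation is a small ledger that records which steps have finished for each batch. The sketch below is illustrative only: `run_batch`, the `ledger` dictionary, the batch ID, and the simulated one-time write failure are assumptions made for the example, and a production ledger would be persisted rather than held in memory.

```python
STEPS = ("discover", "validate", "write", "publish")


def run_batch(batch_id, ledger, actions):
    """Run each step at most once per batch; finished steps are skipped on retry."""
    done = ledger.setdefault(batch_id, [])
    for step in STEPS:
        if step in done:
            continue
        actions[step](batch_id)  # may raise; the step is then not marked done
        done.append(step)
    return done


calls = []
failures = {"write": 1}  # hypothetical: the write step fails exactly once


def make_step(name):
    """Build a step that logs its call and fails while failures remain."""
    def step(batch_id):
        calls.append(name)
        if failures.get(name, 0) > 0:
            failures[name] -= 1
            raise RuntimeError(f"{name} failed for {batch_id}")
    return step


actions = {name: make_step(name) for name in STEPS}
ledger = {}

try:
    run_batch("batch-001", ledger, actions)
except RuntimeError:
    pass  # the first attempt stops at the write step

run_batch("batch-001", ledger, actions)  # the retry resumes at write
```

On the retry, discovery and validation are not repeated, and publish runs only after the write succeeds. The write step can still be interrupted partway, so it needs to be idempotent in its own right, which is what the keyed merge provides.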