Incremental validation
If your source data is updated continuously, Tracelake will sometimes report problems that exist only because recent changes have not yet been replicated to your data warehouse. One way to avoid this is to run validation during a “quiet time”, when there are fewer changes to the data.
The other option is to use incremental validation. With this approach, if problems are found in some of your tables, Tracelake automatically re-runs the validation (by default, 3 times) and raises only the problems that were present in all runs.
You can enable incremental validation in the plan settings.
Parameters:
incremental
: Whether to enable incremental validation.
incremental delay
: Delay between re-runs in minutes.
max incremental runs
: Maximum number of re-runs.
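The re-run behavior and its parameters can be illustrated with a minimal Python sketch. It is not Tracelake's implementation or API: `run_validation` is a hypothetical callable that returns the set of problem identifiers found in one run, and the sketch assumes that one initial run is followed by up to `max_incremental_runs` re-runs, each preceded by the configured delay.

```python
import time
from typing import Callable, Set


def incremental_validate(
    run_validation: Callable[[], Set[str]],  # hypothetical: returns problem IDs of one run
    max_incremental_runs: int = 3,
    incremental_delay_minutes: float = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Return only the problems that were present in every run."""
    problems = set(run_validation())  # initial run
    for _ in range(max_incremental_runs):
        if not problems:
            break  # nothing left to confirm, so no further re-runs are needed
        sleep(incremental_delay_minutes * 60)  # wait before the next re-run
        # Keep only problems that also appear in this re-run.
        problems &= set(run_validation())
    return problems


# Simulated runs: "late_row" disappears once replication catches up.
simulated = iter([{"late_row", "bad_sum"}, {"bad_sum"}, {"bad_sum"}, {"bad_sum"}])
persistent = incremental_validate(lambda: next(simulated), sleep=lambda seconds: None)
print(persistent)
```

In the simulated example, only the problem that survives every re-run is reported; the one caused by replication lag is dropped after the first re-run.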
Example:
We are replicating data from SAP to Snowflake. The replication is scheduled to synchronize changes every hour. It takes another 20 minutes before the data is processed in Snowflake and available in the silver (compacted) layer. In summary, the SAP data is available in Snowflake with a maximum delay of 80 minutes after it was updated.
A suitable validation schedule for this scenario would be:
schedule interval
: Daily
first run
: 11:00 PM UTC
incremental
: True
incremental delay
: 81 minutes
max incremental runs
: 3
Because the 81-minute delay is longer than the 80-minute maximum replication delay, each incremental run in this schedule should see fresh data and should be able to exclude any problems that are caused only by the delay in replication and processing.
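The arithmetic behind the 81-minute delay can be checked with a short Python sketch. The helper names are hypothetical, the one-minute safety margin is an assumption chosen to match the example, and the run times assume the delay is counted from the start of the previous run and ignore how long each validation takes.

```python
from datetime import datetime, timedelta, timezone


def min_incremental_delay(sync_interval_min: int, processing_min: int, margin_min: int = 1) -> int:
    """Worst-case replication latency plus a small safety margin, in minutes."""
    return sync_interval_min + processing_min + margin_min


def run_times(first_run: datetime, delay_min: int, max_incremental_runs: int) -> list:
    """Start time of the initial run followed by each possible re-run."""
    return [first_run + timedelta(minutes=delay_min * i) for i in range(max_incremental_runs + 1)]


delay = min_incremental_delay(sync_interval_min=60, processing_min=20)
# The date is arbitrary; only the 11:00 PM UTC start time comes from the example.
first_run = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
schedule = [t.strftime("%H:%M") for t in run_times(first_run, delay, max_incremental_runs=3)]
print(delay, schedule)
```

Under these assumptions, the last possible re-run starts a little over four hours after the first run, which is worth keeping in mind when choosing the time of the first run.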