Skip to main content
Datachecks supports seven validation types. Each type covers a different aspect of data quality and exposes one or more functions you reference in the on field when configuring a validation.
TypeWhat it checks
CompletenessMissing values, nulls, and empty strings
UniquenessDuplicates and distinct counts
ValidityFormats, patterns, value ranges, and identifier standards
ReliabilityRow counts, freshness, and timeliness
Numeric DistributionStatistical measures across numeric columns
Custom SQLAny metric expressible as a SQL query
Delta ValidationDifferences between two data sources or segments

Completeness

Completeness validations detect missing data — null values and empty strings — in your datasets.
FunctionDescription
count_null(column)Count of null values in the column.
percent_null(column)Percentage of null values in the column.
count_empty_string(column)Count of empty string ('') values in the column.
percent_empty_string(column)Percentage of empty string values in the column.
Example
validations for product_db.products:
  - null_count_in_first_name:
      on: count_null(first_name)
      threshold: "= 0"

Uniqueness

Uniqueness validations measure data distinctiveness and identify duplication within a column.
FunctionDescription
count_distinct(column)Count of unique values in the column.
count_duplicate(column)Count of duplicate values in the column.
Example
validations for product_db.products:
  - distinct_count_of_product_categories:
      on: count_distinct(product_category)
      threshold: "> 5"

Validity

Validity validations check that values conform to expected formats, standards, or patterns.

Value-based

FunctionDescription
count_valid_values(column)Count of values matching the values list.
percent_valid_values(column)Percentage of values matching the values list.
count_invalid_values(column)Count of values not in the values list.
percent_invalid_values(column)Percentage of values not in the values list.
Use the values parameter to specify the accepted list:
validations for iris_db.iris:
  - valid_species_percentage:
      on: percent_valid_values(species)
      values: ["setosa", "virginica"]
      threshold: "> 65"

String format

FunctionDescription
string_length_max(column)Maximum string length in the column.
string_length_min(column)Minimum string length in the column.
string_length_average(column)Average string length in the column.
count_all_space(column)Count of values that are only whitespace.
percent_all_space(column)Percentage of values that are only whitespace.
count_null_keyword(column)Count of null-like string keywords (e.g. "null", "N/A").
percent_null_keyword(column)Percentage of null-like string keywords.

Regex patterns

FunctionDescription
count_valid_regex(column)Count of values matching the pattern.
percent_valid_regex(column)Percentage of values matching the pattern.
count_invalid_regex(column)Count of values not matching the pattern.
percent_invalid_regex(column)Percentage of values not matching the pattern.
Use the pattern parameter to specify the regex:
validations for users_db.accounts:
  - valid_email_format:
      on: percent_valid_regex(email)
      pattern: "^[\\w.+-]+@[\\w-]+\\.[\\w.]+$"
      threshold: "> 99"

Contact information

FunctionDescription
count_usa_phone(column)Count of valid US phone numbers.
percent_usa_phone(column)Percentage of valid US phone numbers.
count_email(column)Count of valid email addresses.
percent_email(column)Percentage of valid email addresses.

Identifiers

FunctionDescription
count_uuid(column)Count of valid UUIDs.
percent_uuid(column)Percentage of valid UUIDs.
count_sedol(column)Count of valid SEDOL codes.
percent_sedol(column)Percentage of valid SEDOL codes.
count_cusip(column)Count of valid CUSIP codes.
percent_cusip(column)Percentage of valid CUSIP codes.
count_lei(column)Count of valid LEI codes.
percent_lei(column)Percentage of valid LEI codes.
count_figi(column)Count of valid FIGI codes.
percent_figi(column)Percentage of valid FIGI codes.
count_isin(column)Count of valid ISIN codes.
percent_isin(column)Percentage of valid ISIN codes.

Geolocation

FunctionDescription
count_latitude(column)Count of valid latitude values (−90 to 90).
percent_latitude(column)Percentage of valid latitude values.
count_longitude(column)Count of valid longitude values (−180 to 180).
percent_longitude(column)Percentage of valid longitude values.

Timestamps and dates

FunctionDescription
count_timestamp_string(column)Count of values in valid ISO timestamp format.
percent_timestamp_string(column)Percentage of values in valid ISO timestamp format.
count_not_in_future(column)Count of timestamps that are not in the future.
percent_not_in_future(column)Percentage of timestamps that are not in the future.
count_date_not_in_future(column)Count of dates that are not in the future.
percent_date_not_in_future(column)Percentage of dates that are not in the future.

Reliability

Reliability validations monitor data freshness, row counts, and update frequency to detect pipeline delays or data loss.
FunctionDescription
freshness(timestamp_column)Time in seconds since the most recent value in the column. Alert when stale.
count_rowsTotal row count in the table.
count_documentsTotal document count in a search index.
Examples
validations for product_db.products:
  - freshness_of_products:
      on: freshness(updated_at)
      threshold: "> 86400"

  - product_row_count:
      on: count_rows
      where: "country_code = 'IN'"
      threshold: "> 1000"

Numeric Distribution

Numeric Distribution validations track statistical measures on numeric columns to detect shifts, outliers, or unexpected variance.

Central tendency

FunctionDescription
avg(column)Mean value.
min(column)Minimum value.
max(column)Maximum value.
sum(column)Sum of all values.

Dispersion

FunctionDescription
variance(column)Variance from the mean.
stddev(column)Standard deviation.

Percentiles

FunctionDescription
percentile(column, 20)20th percentile value.
percentile(column, 40)40th percentile value.
percentile(column, 60)60th percentile value.
percentile(column, 80)80th percentile value.
percentile(column, 90)90th percentile value.

Zero and negative values

FunctionDescription
count_zero(column)Count of zero values.
percent_zero(column)Percentage of zero values.
count_negative(column)Count of negative values.
percent_negative(column)Percentage of negative values.
Example
validations for sales_db.orders:
  - average_order_value:
      on: avg(order_total)
      threshold: "< 500"

  - negative_amounts:
      on: count_negative(order_total)
      threshold: "= 0"

Custom SQL

Custom SQL validations let you define any metric as a SQL SELECT statement that returns a single numeric value. Use this for business-specific checks that don’t fit standard types. Configuration
ParameterDescription
onMust be set to custom_sql.
queryThe SQL query. Must return a single numeric value.
Example
validations for mysql_db.student:
  - high_age_bangalore_students:
      on: custom_sql
      query: |
        SELECT COUNT(*) FROM student
        WHERE city = 'bangalore' AND age >= 30
      threshold: "> 0"
Supported on all connected databases: Snowflake, BigQuery, Databricks, PostgreSQL, MySQL, Oracle, Azure SQL, and Sybase.

Delta Validation

Delta validations compare a metric between two datasets — useful for migration checks, cross-environment comparisons, and change detection. Configuration
ParameterDescription
ondelta followed by a supported function, e.g. delta count_rows.
refReference asset in the format datasource_name.dataset_name.
thresholdThe acceptable difference between the two values.
Supported functions
FunctionDescription
delta count_rowsCompare row counts between source and reference.
Example
validations for iris_pgsql_source.dcs_iris:
  - row_count_delta:
      on: delta count_rows
      ref: iris_pgsql_target.dcs_iris
      threshold: "< 1"