Create Comparison

To create a new comparison, whether within or across databases, start by navigating to Comparisons via the left-hand sidebar. Click on Create New Comparison to begin the setup.


1. Select Comparison Type:

After clicking on Create New Comparison, you'll have options to choose the Comparison Type either Shallow Comparison or Deep Comparison.

  • Shallow Comparison
    Stops at the first difference, providing a faster check. Suitable for basic validation where only the presence of discrepancies is important.
  • Deep Comparison
    Thoroughly scans both datasets to identify all differences and generate detailed reports. Ideal for comprehensive validations and data quality assessments.

2. After selecting the Comparison Type, give your comparison a name.


3. Choose Data Sources:

Select your source and target data source types.

ℹ️

If comparing across different databases, select both source and target manually. Otherwise, the target is pre-selected.


4. Next, indicate whether you want to compare tables or queries:

Table:

You can select this option to compare data directly from database tables.

  • Choose a dataset from the available options.

  • Filter: Add a filter clause to narrow down your dataset, helping you focus only on relevant data points for analysis.

    For example, if you want to filter products with a price above a certain value, you can use product.price > 100.

Query:

Select this option to compare datasets using custom SQL queries.

  • Write a custom SQL query to retrieve specific data based on your requirements.

  • For example, if you want to compare sales data for a specific year where the transaction amount is greater than 500, you can use:

    SELECT transaction_id, customer_id, amount, transaction_date
    FROM sales
    WHERE YEAR(transaction_date) = 2023 AND amount > 500;

5. Click Next to continue


6. Primary Keys

On the next screen, you'll see options to define the matching keys between your Source and Target datasets. These keys are used to align records for accurate comparison. Select the relevant fields using the dropdowns, and click Add Primary Key to include more keys if needed. This step ensures precise row-level matching.


7. Column Selection:

After Primary Keys, you’ll see all the columns available for comparison. You can select all columns or choose specific ones. However, you cannot select columns that are exclusive to one dataset.


8. Add Column Mapping

Column Mapping allows you to define how fields from the source dataset should be matched with fields in the target dataset. This ensures accurate data comparison between two datasets.

To add column mapping, choose the column from the source dataset and the corresponding column in the target dataset.

Click Add Column Mapping Pair to map additional columns


9. Configuring Semantic Similarity for Comparison:

ℹ️

Semantic Similarity is only available for Shallow Comparisons.

Semantic Similarity helps identify data variations by comparing text-based columns using different similarity functions and pre-processing techniques. To configure semantic similarity:

  • Toggle the switch to activate semantic similarity for this comparison
  • Choose a text analysis model that best suits your comparison needs
  • Apply Pre-Processing Functions like lowercasing and punctuation removal to standardize text. For more details, see Pre-Processing Functions.
  • Choose a Similarity Function, such as Levenshtein Distance, to measure text closeness. For more info, refer to Similarity Functions.
  • Set a Match Threshold: Define a similarity score (0 to 1) to determine matching accuracy. For instance, setting 0.8 means values with at least 80% similarity will be considered a match.

10. Finally, click Submit to successfully create your comparison.