Create Comparison

To create a new comparison, whether within or across databases, start by navigating to Comparisons via the left-hand sidebar. Click on Create New Comparison to begin the setup.


1. Select Comparison Type:

After clicking Create New Comparison, choose the Comparison Type: Same DB or Cross DB.

  • Same DB: compare a source and target within the same database.
  • Cross DB: compare a source and target in different databases. Navigate to DataSources in the sidebar to select from the data sources you've already created.

2. After selecting the Comparison Type, give your comparison a name.


3. Choose Data Sources:

Select your source and target data source types.

Note: For a Cross DB comparison, select both the source and the target manually. For a Same DB comparison, the target is pre-selected.


4. Next, indicate whether you want to compare tables or queries:

Table:

Select this option to compare data directly from database tables.

  • Choose a dataset from the available options and select the primary key to uniquely identify records.

  • Filter: Add a filter clause to restrict the comparison to the rows that matter for your analysis.

    For example, to keep only products priced above 100, use product.price > 100.
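As a rough illustration of what a filter clause does, the sketch below applies the same condition as a WHERE clause against a small in-memory SQLite table. The table layout and sample rows are hypothetical, and treating the filter as a plain WHERE condition is an assumption made for this sketch rather than a description of the tool's internals.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "cable", 12.5), (2, "monitor", 189.0), (3, "laptop", 950.0)],
)

# The filter clause from the example, applied as a WHERE condition (assumption).
filter_clause = "product.price > 100"
rows = conn.execute(
    f"SELECT product_id, name, price FROM product "
    f"WHERE {filter_clause} ORDER BY product_id"
).fetchall()

print(rows)  # [(2, 'monitor', 189.0), (3, 'laptop', 950.0)]
```

Only the rows that satisfy the clause remain, so the comparison would run over two products instead of three.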

Query:

Select this option to compare datasets using custom SQL queries.

  • Write a custom SQL query to retrieve specific data based on your requirements.

  • Define a primary key to uniquely identify records in the result set.

    For example, to compare sales data for 2023 where the transaction amount is greater than 500, you can use:

    SELECT transaction_id, customer_id, amount, transaction_date
    FROM sales
    WHERE YEAR(transaction_date) = 2023 AND amount > 500;
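Before settling on a primary key for a query, it can help to confirm that the chosen column really is unique in the result set. The sketch below runs an equivalent of the query above against a hypothetical in-memory SQLite table and checks two candidate keys. SQLite has no YEAR() function, so strftime() stands in for it here. The sample rows and the is_unique_key helper are illustrative assumptions and not part of the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (transaction_id INTEGER, customer_id INTEGER, "
    "amount REAL, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 10, 750.0, "2023-03-14"),
        (2, 11, 120.0, "2023-05-02"),  # excluded: amount is not above 500
        (3, 10, 900.0, "2022-11-30"),  # excluded: not in 2023
        (4, 10, 510.0, "2023-08-21"),
    ],
)

# SQLite equivalent of the example query; strftime() replaces YEAR().
query = """
    SELECT transaction_id, customer_id, amount, transaction_date
    FROM sales
    WHERE strftime('%Y', transaction_date) = '2023' AND amount > 500
    ORDER BY transaction_id
"""
rows = conn.execute(query).fetchall()


def is_unique_key(rows, key_index):
    """Return True if the column at key_index has no NULLs and no duplicates."""
    keys = [row[key_index] for row in rows]
    return None not in keys and len(keys) == len(set(keys))


print(is_unique_key(rows, 0))  # transaction_id -> True
print(is_unique_key(rows, 1))  # customer_id -> False (customer 10 appears twice)
```

In this sample data, transaction_id identifies each row uniquely, while customer_id does not and would be a poor choice of primary key.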

5. Add Column Mapping:

Column Mapping allows you to define how fields from the source dataset should be matched with fields in the target dataset. This ensures accurate data comparison between two datasets.

To add column mapping, choose the column from the source dataset and the corresponding column in the target dataset.

Click Add Column Mapping Pair to map additional columns.
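Conceptually, each mapping pair tells the comparison which target column to read for a given source column. The following minimal sketch models that idea with a dictionary. The column names, sample rows, and the compare_mapped helper are hypothetical and only illustrate the concept.

```python
# Hypothetical mapping pairs: source column -> target column.
column_mapping = {"cust_id": "customer_id", "amt": "amount"}

# One record from each side, matched by primary key beforehand.
source_row = {"cust_id": 10, "amt": 750.0}
target_row = {"customer_id": 10, "amount": 755.0}


def compare_mapped(source_row, target_row, column_mapping):
    """Return {source_column: (source_value, target_value)} for each mismatch."""
    diffs = {}
    for src_col, tgt_col in column_mapping.items():
        if source_row[src_col] != target_row[tgt_col]:
            diffs[src_col] = (source_row[src_col], target_row[tgt_col])
    return diffs


print(compare_mapped(source_row, target_row, column_mapping))
# {'amt': (750.0, 755.0)}
```

Without the mapping, cust_id and customer_id would look like unrelated columns even though they hold the same data.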


6. Click Next to continue.


7. Column Selection:

On the next screen, you’ll see all the columns available for comparison. You can select all of them or choose specific ones. Columns that exist in only one of the two datasets cannot be selected.
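One way to picture which columns end up selectable is as the set of columns present on both sides, after any column mapping is applied. The sketch below assumes that unmapped columns are paired by identical name. That rule, the column names, and the comparable_columns helper are assumptions for illustration only.

```python
def comparable_columns(source_columns, target_columns, column_mapping=None):
    """Return (source, target) column pairs that exist in both datasets."""
    column_mapping = column_mapping or {}
    target_set = set(target_columns)
    pairs = []
    for src_col in source_columns:
        # Use the mapped name if one exists, otherwise fall back to the same name.
        tgt_col = column_mapping.get(src_col, src_col)
        if tgt_col in target_set:
            pairs.append((src_col, tgt_col))
    return pairs


source_columns = ["transaction_id", "cust_id", "amount", "legacy_flag"]
target_columns = ["transaction_id", "customer_id", "amount", "loaded_at"]

pairs = comparable_columns(
    source_columns, target_columns, {"cust_id": "customer_id"}
)
print(pairs)
# [('transaction_id', 'transaction_id'), ('cust_id', 'customer_id'), ('amount', 'amount')]
```

Here legacy_flag exists only in the source and loaded_at only in the target, so neither appears among the comparable pairs.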


8. Configure Semantic Similarity:

Semantic Similarity helps identify data variations by comparing text-based columns using different similarity functions and pre-processing techniques. To configure semantic similarity:

  • Toggle the switch to activate semantic similarity for this comparison.
  • Choose a text analysis model that best suits your comparison needs.
  • Apply Pre-Processing Functions, such as lowercasing and punctuation removal, to standardize text. For more details, see Pre-Processing Functions.
  • Choose a Similarity Function, such as Levenshtein Distance, to measure how close two text values are. For more details, see Similarity Functions.
  • Set a Match Threshold: Define the minimum similarity score (0 to 1) at which two values count as a match. For instance, setting 0.8 means values with at least 80% similarity will be considered a match.
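To make the interplay of pre-processing, similarity function, and threshold concrete, here is a minimal sketch that lowercases text, strips punctuation, computes the Levenshtein distance, and converts it to a 0 to 1 score. The normalization used (1 minus distance divided by the longer length) is a common convention but is an assumption here. The product may score similarity differently, and all function names are illustrative.

```python
import string


def preprocess(text):
    """Lowercase, remove ASCII punctuation, and trim surrounding whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(source_value, target_value, threshold=0.8):
    """Apply pre-processing, then compare the score against the threshold."""
    a, b = preprocess(source_value), preprocess(target_value)
    return similarity(a, b) >= threshold


print(is_match("Acme, Inc.", "ACME Inc"))  # True: identical after pre-processing
print(is_match("Acme, Inc.", "Apex Ltd"))  # False: too many edits apart
```

Raising the threshold makes matching stricter, while lowering it tolerates more spelling and formatting variation at the cost of more false matches.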

9. Finally, click Submit to create your comparison.