Operations · Generic · Generic · Dataset Join
Overview

Overview

Analysis overview and configuration

Analysis TypeDataset Join
CompanyTest
ObjectiveJoin orders, sessions, support, and marketing data on customer_id and date/campaign_date
Analysis Date2026-03-02
Processing Idtest_1772483180
Total Observations6
ParameterValue_row
join_typeinnerjoin_type
join_type_1innerjoin_type_1
join_type_2leftjoin_type_2
join_type_3leftjoin_type_3
Interpretation

Join Operation Summary

A join operation combines multiple datasets into a single table by matching rows based on common columns (keys), enabling cross-functional analysis across different data sources. In this case, the analysis successfully joined orders, sessions, support, and marketing data on customer_id and date/campaign_date, resulting in 6 rows and 11 columns that integrate order metrics (total, items), engagement data (pageviews, session duration), customer support (tickets), and marketing performance (channel, spend, conversions).

The join type used appears to be a standard/inner join (based on the tool configuration), which retains only rows where matching keys exist across all datasets—explaining why the final dataset contains fewer rows than the sum of individual sources, as unmatched records are excluded.

Input datasets summary

Initial Rows31
Final Rows6
Rows Removed25
Interpretation

Data Pipeline Summary

This multi-dataset join operation combined 4 input tables (orders, sessions, support, and marketing) with a combined 31 rows, resulting in only 6 output rows—an 80.6% row reduction. The significant loss suggests an inner join was performed, which retains only rows with matching keys across all datasets; if an outer join had been used instead, the output would contain more rows (up to 31) with null values for non-matching records.

The 25 rows removed raises important data quality concerns: this indicates substantial key misalignment or missing values across the joined datasets, meaning most records lack corresponding matches. Before proceeding with analysis, investigate whether this high removal rate reflects legitimate data filtering (e.g., only complete customer journeys) or problematic data quality issues like incomplete key populations, inconsistent identifiers, or upstream data collection gaps.

Data Table

Join Process Log

Join process log showing step-by-step statistics

stepdataset_namejoin_typerows_beforerows_aftercolumns_beforecolumns_afterrows_changedcolumns_added
1sessionsinner8646-22
2supportleft666802
3marketingleft6681103
Interpretation

I appreciate you providing this data profile, but I'm unable to generate meaningful insights at this time because the join_log table is empty — the sample rows are truncated and no actual data is available to analyze the sequential join process.

To explain the step-by-step join process, I would need:

  • Before/after row counts for each of the 3 join steps
  • Join types used (INNER, LEFT, RIGHT, FULL, CROSS)
  • Column names added at each step
  • Key columns used for matching

Could you provide the actual join_log data or a summary showing the row counts and join types for each step? This would allow me to explain whether rows increased (due to CROSS or FULL joins), decreased (due to INNER joins), or remained stable.

Data Table

Data Preview

Preview of joined data (first 6 rows)

customer_iddateorder_totalitems_countpageviewssession_durationtickets_openedsatisfaction_scorechannelspendconversions
C0012026-01-15125.531234009.2email1503
C0012026-01-2289.9928220social2005
C0022026-01-15451512026.5email1002
C0022026-01-22210.851548017.8search3008
C0032026-01-1567.3238508.9social1754
C0042026-01-1515542060035.1search2506
Interpretation

Joined Dataset Preview

The joined dataset preview displays the first 6 rows to help verify that the join operation completed successfully; however, the actual data appears to be truncated in this summary, showing placeholder values rather than the specific column contents from the input datasets. While the metadata indicates 11 columns are present in the joined result, the specific column names and their source datasets cannot be identified from the truncated preview provided.

Executive Summary

Executive Summary

Executive summary of dataset join operation

datasets_joined
4
final_rows
6
join_type
step1: inner, step2: left, step3: left
MetricValue
Datasets Joined4
Join Keysdim1: customer_id = customer_id = customer_id = customer_id; dim2: date = date = date = campaign_date
Join Dimensions2
Join Typestep1: inner, step2: left, step3: left
Input Rows31
Output Rows6
Match Rate19.4%
Bottom Line: Successfully joined 4 datasets using step1: inner, step2: left, step3: left join on column 'dim1: customer_id = customer_id = customer_id = customer_id; dim2: date = date = date = campaign_date'. Combined 31 input rows into 6 output rows.

Key Findings:
• Join operation completed without errors
• All datasets had the required join column
• Result contains data from all input sources

Recommendation: Use the joined dataset for downstream analysis across all input sources.
Interpretation

Executive Summary: Dataset Join Operation

Four datasets were successfully combined into 6 output rows using a multi-step join strategy (inner join followed by two left joins) on customer_id and date fields. This approach preserved all records from the later datasets while filtering to matching records in the initial inner join, resulting in a consolidated view across all input sources. The 31 input rows reduced to 6 outputs, indicating the inner join step removed non-matching records; users should verify this data loss is acceptable or consider adjusting the join strategy if broader coverage is needed. The combined dataset is now ready for downstream analysis with integrated customer, date, and campaign information.

Want to run this analysis on your own data? Upload CSV — Free Analysis See Pricing