Analysis overview and configuration
| Parameter | Value | _row |
|---|---|---|
| join_type | inner | join_type |
| join_type_1 | inner | join_type_1 |
| join_type_2 | left | join_type_2 |
| join_type_3 | left | join_type_3 |
A join operation combines multiple datasets into a single table by matching rows based on common columns (keys), enabling cross-functional analysis across different data sources. In this case, the analysis successfully joined orders, sessions, support, and marketing data on customer_id and date/campaign_date, resulting in 6 rows and 11 columns that integrate order metrics (total, items), engagement data (pageviews, session duration), customer support (tickets), and marketing performance (channel, spend, conversions).
The join type used appears to be a standard/inner join (based on the tool configuration), which retains only rows where matching keys exist across all datasets—explaining why the final dataset contains fewer rows than the sum of individual sources, as unmatched records are excluded.
Input datasets summary
| dataset_name | rows | columns | _row |
|---|---|---|---|
| orders | 8 | 4 | orders |
| sessions | 8 | 4 | sessions |
| support | 7 | 4 | support |
| marketing | 8 | 5 | marketing |
This multi-dataset join operation combined 4 input tables (orders, sessions, support, and marketing) with a combined 31 rows, resulting in only 6 output rows—an 80.6% row reduction. The significant loss suggests an inner join was performed, which retains only rows with matching keys across all datasets; if an outer join had been used instead, the output would contain more rows (up to 31) with null values for non-matching records.
The 25 rows removed raises important data quality concerns: this indicates substantial key misalignment or missing values across the joined datasets, meaning most records lack corresponding matches. Before proceeding with analysis, investigate whether this high removal rate reflects legitimate data filtering (e.g., only complete customer journeys) or problematic data quality issues like incomplete key populations, inconsistent identifiers, or upstream data collection gaps.
Join process log showing step-by-step statistics
| step | dataset_name | join_type | rows_before | rows_after | columns_before | columns_after | rows_changed | columns_added |
|---|---|---|---|---|---|---|---|---|
| 1 | sessions | inner | 8 | 6 | 4 | 6 | -2 | 2 |
| 2 | support | left | 6 | 6 | 6 | 8 | 0 | 2 |
| 3 | marketing | left | 6 | 6 | 8 | 11 | 0 | 3 |
I appreciate you providing this data profile, but I'm unable to generate meaningful insights at this time because the join_log table is empty — the sample rows are truncated and no actual data is available to analyze the sequential join process.
To explain the step-by-step join process, I would need:
Could you provide the actual join_log data or a summary showing the row counts and join types for each step? This would allow me to explain whether rows increased (due to CROSS or FULL joins), decreased (due to INNER joins), or remained stable.
Preview of joined data (first 6 rows)
| customer_id | date | order_total | items_count | pageviews | session_duration | tickets_opened | satisfaction_score | channel | spend | conversions |
|---|---|---|---|---|---|---|---|---|---|---|
| C001 | 2026-01-15 | 125.5 | 3 | 12 | 340 | 0 | 9.2 | 150 | 3 | |
| C001 | 2026-01-22 | 89.99 | 2 | 8 | 220 | social | 200 | 5 | ||
| C002 | 2026-01-15 | 45 | 1 | 5 | 120 | 2 | 6.5 | 100 | 2 | |
| C002 | 2026-01-22 | 210.8 | 5 | 15 | 480 | 1 | 7.8 | search | 300 | 8 |
| C003 | 2026-01-15 | 67.3 | 2 | 3 | 85 | 0 | 8.9 | social | 175 | 4 |
| C004 | 2026-01-15 | 155 | 4 | 20 | 600 | 3 | 5.1 | search | 250 | 6 |
The joined dataset preview displays the first 6 rows to help verify that the join operation completed successfully; however, the actual data appears to be truncated in this summary, showing placeholder values rather than the specific column contents from the input datasets. While the metadata indicates 11 columns are present in the joined result, the specific column names and their source datasets cannot be identified from the truncated preview provided.
| Metric | Value |
|---|---|
| Datasets Joined | 4 |
| Join Keys | dim1: customer_id = customer_id = customer_id = customer_id; dim2: date = date = date = campaign_date |
| Join Dimensions | 2 |
| Join Type | step1: inner, step2: left, step3: left |
| Input Rows | 31 |
| Output Rows | 6 |
| Match Rate | 19.4% |
Four datasets were successfully combined into 6 output rows using a multi-step join strategy (inner join followed by two left joins) on customer_id and date fields. This approach preserved all records from the later datasets while filtering to matching records in the initial inner join, resulting in a consolidated view across all input sources. The 31 input rows reduced to 6 outputs, indicating the inner join step removed non-matching records; users should verify this data loss is acceptable or consider adjusting the join strategy if broader coverage is needed. The combined dataset is now ready for downstream analysis with integrated customer, date, and campaign information.