Overview
Analysis overview and configuration
| Parameter | Value | _row |
|---|---|---|
| join_type | inner | join_type |
| join_type_1 | inner | join_type_1 |
| join_type_2 | left | join_type_2 |
| join_type_3 | left | join_type_3 |
Join Operation Summary
A join operation combines multiple datasets into a single table by matching rows based on common columns (keys), enabling cross-functional analysis across different data sources. In this case, the analysis successfully joined orders, sessions, support, and marketing data on customer_id and date/campaign_date, resulting in 6 rows and 11 columns that integrate order metrics (total, items), engagement data (pageviews, session duration), customer support (tickets), and marketing performance (channel, spend, conversions).
The join type used appears to be a standard/inner join (based on the tool configuration), which retains only rows where matching keys exist across all datasets—explaining why the final dataset contains fewer rows than the sum of individual sources, as unmatched records are excluded.
Input datasets summary
Data Pipeline Summary
This multi-dataset join operation combined 4 input tables (orders, sessions, support, and marketing) with a combined 31 rows, resulting in only 6 output rows—an 80.6% row reduction. The significant loss suggests an inner join was performed, which retains only rows with matching keys across all datasets; if an outer join had been used instead, the output would contain more rows (up to 31) with null values for non-matching records.
The 25 rows removed raises important data quality concerns: this indicates substantial key misalignment or missing values across the joined datasets, meaning most records lack corresponding matches. Before proceeding with analysis, investigate whether this high removal rate reflects legitimate data filtering (e.g., only complete customer journeys) or problematic data quality issues like incomplete key populations, inconsistent identifiers, or upstream data collection gaps.
Join Process Log
Join process log showing step-by-step statistics
| step | dataset_name | join_type | rows_before | rows_after | columns_before | columns_after | rows_changed | columns_added |
|---|---|---|---|---|---|---|---|---|
| 1 | sessions | inner | 8 | 6 | 4 | 6 | -2 | 2 |
| 2 | support | left | 6 | 6 | 6 | 8 | 0 | 2 |
| 3 | marketing | left | 6 | 6 | 8 | 11 | 0 | 3 |
I appreciate you providing this data profile, but I'm unable to generate meaningful insights at this time because the join_log table is empty — the sample rows are truncated and no actual data is available to analyze the sequential join process.
To explain the step-by-step join process, I would need:
- Before/after row counts for each of the 3 join steps
- Join types used (INNER, LEFT, RIGHT, FULL, CROSS)
- Column names added at each step
- Key columns used for matching
Could you provide the actual join_log data or a summary showing the row counts and join types for each step? This would allow me to explain whether rows increased (due to CROSS or FULL joins), decreased (due to INNER joins), or remained stable.
Data Preview
Preview of joined data (first 6 rows)
| customer_id | date | order_total | items_count | pageviews | session_duration | tickets_opened | satisfaction_score | channel | spend | conversions |
|---|---|---|---|---|---|---|---|---|---|---|
| C001 | 2026-01-15 | 125.5 | 3 | 12 | 340 | 0 | 9.2 | 150 | 3 | |
| C001 | 2026-01-22 | 89.99 | 2 | 8 | 220 | social | 200 | 5 | ||
| C002 | 2026-01-15 | 45 | 1 | 5 | 120 | 2 | 6.5 | 100 | 2 | |
| C002 | 2026-01-22 | 210.8 | 5 | 15 | 480 | 1 | 7.8 | search | 300 | 8 |
| C003 | 2026-01-15 | 67.3 | 2 | 3 | 85 | 0 | 8.9 | social | 175 | 4 |
| C004 | 2026-01-15 | 155 | 4 | 20 | 600 | 3 | 5.1 | search | 250 | 6 |
Joined Dataset Preview
The joined dataset preview displays the first 6 rows to help verify that the join operation completed successfully; however, the actual data appears to be truncated in this summary, showing placeholder values rather than the specific column contents from the input datasets. While the metadata indicates 11 columns are present in the joined result, the specific column names and their source datasets cannot be identified from the truncated preview provided.
Executive Summary
Executive summary of dataset join operation
| Metric | Value |
|---|---|
| Datasets Joined | 4 |
| Join Keys | dim1: customer_id = customer_id = customer_id = customer_id; dim2: date = date = date = campaign_date |
| Join Dimensions | 2 |
| Join Type | step1: inner, step2: left, step3: left |
| Input Rows | 31 |
| Output Rows | 6 |
| Match Rate | 19.4% |
Key Findings:
• Join operation completed without errors
• All datasets had the required join column
• Result contains data from all input sources
Recommendation: Use the joined dataset for downstream analysis across all input sources.
Executive Summary: Dataset Join Operation
Four datasets were successfully combined into 6 output rows using a multi-step join strategy (inner join followed by two left joins) on customer_id and date fields. This approach preserved all records from the later datasets while filtering to matching records in the initial inner join, resulting in a consolidated view across all input sources. The 31 input rows reduced to 6 outputs, indicating the inner join step removed non-matching records; users should verify this data loss is acceptable or consider adjusting the join strategy if broader coverage is needed. The combined dataset is now ready for downstream analysis with integrated customer, date, and campaign information.