Analysis overview and configuration

Configuration

Analysis TypeDataset Join

CompanyTest

ObjectiveJoin orders, sessions, support, and marketing data on customer_id and date/campaign_date

Analysis Date2026-03-02

Processing Idtest_1772483180

Total Observations6

Module Parameters

Parameter	Value	_row
join_type	inner	join_type
join_type_1	inner	join_type_1
join_type_2	left	join_type_2
join_type_3	left	join_type_3

Dataset Join analysis for Test

Interpretation

Join Operation Summary

A join operation combines multiple datasets into a single table by matching rows based on common columns (keys), enabling cross-functional analysis across different data sources. In this case, the analysis successfully joined orders, sessions, support, and marketing data on customer_id and date/campaign_date, resulting in 6 rows and 11 columns that integrate order metrics (total, items), engagement data (pageviews, session duration), customer support (tickets), and marketing performance (channel, spend, conversions).

The join type used appears to be a standard/inner join (based on the tool configuration), which retains only rows where matching keys exist across all datasets—explaining why the final dataset contains fewer rows than the sum of individual sources, as unmatched records are excluded.

Input datasets summary

Data Quality

Initial Rows31

Final Rows6

Rows Removed25

Input Summary

dataset_name	rows	columns	_row
orders	8	4	orders
sessions	8	4	sessions
support	7	4	support
marketing	8	5	marketing

Data Pipeline: Multi-dataset join operation
Input Datasets: 4
Total Input Rows: 31
Output Rows: 6
Rows Removed: 25 (80.6%)

Interpretation

Data Pipeline Summary

This multi-dataset join operation combined 4 input tables (orders, sessions, support, and marketing) with a combined 31 rows, resulting in only 6 output rows—an 80.6% row reduction. The significant loss suggests an inner join was performed, which retains only rows with matching keys across all datasets; if an outer join had been used instead, the output would contain more rows (up to 31) with null values for non-matching records.

The 25 rows removed raises important data quality concerns: this indicates substantial key misalignment or missing values across the joined datasets, meaning most records lack corresponding matches. Before proceeding with analysis, investigate whether this high removal rate reflects legitimate data filtering (e.g., only complete customer journeys) or problematic data quality issues like incomplete key populations, inconsistent identifiers, or upstream data collection gaps.

Join process log showing step-by-step statistics

step	dataset_name	join_type	rows_before	rows_after	columns_before	columns_after	rows_changed	columns_added
1	sessions	inner	8	6	4	6	-2	2
2	support	left	6	6	6	8	0	2
3	marketing	left	6	6	8	11	0	3

Interpretation

I appreciate you providing this data profile, but I'm unable to generate meaningful insights at this time because the join_log table is empty — the sample rows are truncated and no actual data is available to analyze the sequential join process.

To explain the step-by-step join process, I would need:

Before/after row counts for each of the 3 join steps
Join types used (INNER, LEFT, RIGHT, FULL, CROSS)
Column names added at each step
Key columns used for matching

Could you provide the actual join_log data or a summary showing the row counts and join types for each step? This would allow me to explain whether rows increased (due to CROSS or FULL joins), decreased (due to INNER joins), or remained stable.

Preview of joined data (first 6 rows)

customer_id	date	order_total	items_count	pageviews	session_duration	tickets_opened	satisfaction_score	channel	spend	conversions
C001	2026-01-15	125.5	3	12	340	0	9.2	email	150	3
C001	2026-01-22	89.99	2	8	220			social	200	5
C002	2026-01-15	45	1	5	120	2	6.5	email	100	2
C002	2026-01-22	210.8	5	15	480	1	7.8	search	300	8
C003	2026-01-15	67.3	2	3	85	0	8.9	social	175	4
C004	2026-01-15	155	4	20	600	3	5.1	search	250	6

Interpretation

Joined Dataset Preview

The joined dataset preview displays the first 6 rows to help verify that the join operation completed successfully; however, the actual data appears to be truncated in this summary, showing placeholder values rather than the specific column contents from the input datasets. While the metadata indicates 11 columns are present in the joined result, the specific column names and their source datasets cannot be identified from the truncated preview provided.

Key Metrics

datasets_joined: 4
final_rows: 6
join_type: step1: inner, step2: left, step3: left

Key Findings

Metric	Value
Datasets Joined	4
Join Keys	dim1: customer_id = customer_id = customer_id = customer_id; dim2: date = date = date = campaign_date
Join Dimensions	2
Join Type	step1: inner, step2: left, step3: left
Input Rows	31
Output Rows	6
Match Rate	19.4%

Summary

Bottom Line: Successfully joined 4 datasets using step1: inner, step2: left, step3: left join on column 'dim1: customer_id = customer_id = customer_id = customer_id; dim2: date = date = date = campaign_date'. Combined 31 input rows into 6 output rows.

Key Findings:
• Join operation completed without errors
• All datasets had the required join column
• Result contains data from all input sources

Recommendation: Use the joined dataset for downstream analysis across all input sources.

Interpretation

Executive Summary: Dataset Join Operation

Four datasets were successfully combined into 6 output rows using a multi-step join strategy (inner join followed by two left joins) on customer_id and date fields. This approach preserved all records from the later datasets while filtering to matching records in the initial inner join, resulting in a consolidated view across all input sources. The 31 input rows reduced to 6 outputs, indicating the inner join step removed non-matching records; users should verify this data loss is acceptable or consider adjusting the join strategy if broader coverage is needed. The combined dataset is now ready for downstream analysis with integrated customer, date, and campaign information.

Overview

Configuration

Module Parameters

Interpretation

Join Operation Summary

Data Pipeline

Data Quality

Input Summary

Interpretation

Data Pipeline Summary

Join Process Log

Interpretation

Data Preview

Interpretation

Joined Dataset Preview

Executive Summary

Key Metrics

Key Findings

Summary

Interpretation

Executive Summary: Dataset Join Operation