Join Algorithm | Notion

Motivation

How do we process SELECT * FROM Student WHERE sid > 30 ?
- Learned in previous lectures, use indexes and data structures etc
How do we process SELECT * FROM Student S, Enroll E WHERE S.sid = E.sid ?

Nest-Loop Join (NLJ)

For each r in R:
	For each s in S:
		if r.A = s.A: output(r,s)

Performance are bad when the table is large

Sort-Merge Join (SMJ)

Sort the relations first, then join
Use a two pointer approach to compare each tuple in the table
- e.g. always move the smaller pointer when the 2 pointers' value are not the same

if not sorted, sort R and S by A
i,j = 1,1
while i < len(R) and j < len(S):
	if R[i].A == S[j].A:
			output (R[i],S[j])
			i += 1
			j += 1
	else if R[i].A > S[j].A: j += 1
	else if R[i].A < S[j].A: i += 1

Index Join (IJ)

Create an index for S.A if needed

For each r in R:
	let X = loopkup index on S.A with r.A value
	For each s in X:
			output (r,s)

Hash Join (HJ)

Hash function : h(v) → [1, k]
Q: Given $(r \in R)$ and $(s \in S)$, can r and s join if h(r.A) ≠ h(s.A)?
Main idea
- Partition tuples in R and S based on hash values on join attributes
- Perform "joins" only between partitions of the same hash value

Hashing stage (bucketizing): hash tuples into buckets
- Hash R tuples into G1,...Gk buckets
- Hash S tuples into G1,...Gk buckets
Join stage: join tuples in matching buckets
- For i = 1 to k do
  
  match tuples in Gi, Hi buckets