
A join brings together two sets of data, the left and the right. Spark compares the values of one or more keys in the left and right data and evaluates a join expression to decide whether a row from the left set should be combined with a row from the right set. The join expression determines which rows are matched, and the join type determines what the result contains, for example whether rows without a match are kept. If you are interested in Apache Spark, please contact us for more information.

Inner Join
Keeps rows with keys that exist in both datasets

Outer Join
Keeps rows with keys in either dataset

Left Outer Join
Keeps rows with keys in the left dataset

Right Outer Join
Keeps rows with keys in the right dataset
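
The four join types above can be illustrated with a small plain-Python sketch. This is not how Spark executes joins, and it ignores Spark's handling of null keys. The function name join, the sample rows, and the (key, value) row layout are all assumptions made for illustration only.

```python
# Plain-Python sketch of join semantics; this is NOT Spark's implementation.
# The sample rows and the names `left`, `right`, and `join` are hypothetical.

def join(left, right, how="inner"):
    """Join two lists of (key, value) pairs on their keys.

    Returns (key, left_value, right_value) tuples. A side with no
    matching row contributes None, similar to the nulls Spark emits.
    """
    if how not in ("inner", "outer", "left_outer", "right_outer"):
        raise ValueError(f"unsupported join type: {how}")
    rows = []
    matched_right = set()  # indexes of right rows that found a partner

    for l_key, l_val in left:
        found = False
        for i, (r_key, r_val) in enumerate(right):
            if l_key == r_key:  # the join expression: key equality
                rows.append((l_key, l_val, r_val))
                matched_right.add(i)
                found = True
        # Unmatched left rows survive only in left and full outer joins.
        if not found and how in ("left_outer", "outer"):
            rows.append((l_key, l_val, None))

    # Unmatched right rows survive only in right and full outer joins.
    if how in ("right_outer", "outer"):
        for i, (r_key, r_val) in enumerate(right):
            if i not in matched_right:
                rows.append((r_key, None, r_val))
    return rows


left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]

inner_rows = join(left, right, "inner")
outer_rows = join(left, right, "outer")
left_rows = join(left, right, "left_outer")
right_rows = join(left, right, "right_outer")
```

With this sample data only key 2 appears on both sides, so the inner join returns a single row, while each outer variant also keeps the unmatched rows from its side and fills the missing values with None.
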
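
PySpark Join Syntax

DataFrame.join(other, on=None, how=None)

Here other is the right-side DataFrame, on is the join expression (or the key column names), and how is the join type given as a string, such as 'inner', 'outer', 'left_outer', or 'right_outer'.
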
For example:
rc.join(ps, rc.District == ps.Format_district, 'left_outer').show()