There are various cases where CROSS joining the relations / tables is required. It returns all the rows in all the tables/relations provided in a query using CROSS join: each row in the first table is paired with all the rows in the second table/relation. This occurs when no association between the two relations/tables is defined.
In this article, we’ll look at how to implement the CROSS join in Apache Pig.
1) Sample Data.
Dataset1:
ColorID, Color
1, Blue
2, Red
3, Green
4, Yellow
5, White
6, Black
Dataset2:
ProductID, Product
101, T-Shirts
102, Shirts
103, Pants
104, Shorts
Assume that the specified products in Dataset2 are available in the colors given in Dataset1, thus we must list all potential matches.
2) Loading the data into a relation.
“Execute the below in a single line to avoid each line execution.
Colors = LOAD ‘Desktop/Docs/colors.csv’ USING PigStorage(‘,’) AS
“““( colorid:int,
“““color:chararray);
Products = LOAD ‘Desktop/Docs/products.csv’ USING PigStorage(‘,’) AS
“““( productid:int,
“““product:chararray);
Cross Joining:
CrossJoinedData = CROSS Colors BY Products;
Dump CrossJoinedData;
Result:
ColorID, Color, ProductID, Product
1, Blue, 101, T-Shirts
2, Red, 101, T-Shirts
3, Green, 101, T-Shirts
4, Yellow, 101, T-Shirts
5, White, 101, T-Shirts
6, Black, 101, T-Shirts
1, Blue, 102, Shirts
2, Red, 102, Shirts
3, Green, 102, Shirts
4, Yellow, 102, Shirts
5, White, 102, Shirts
6, Black, 102, Shirts
1, Blue, 103, Pants
2, Red, 103, Pants
3, Green, 103, Pants
4, Yellow, 103, Pants
5, White, 103, Pants
6, Black, 103, Pants
1, Blue, 104, Shorts
2, Red, 104, Shorts
3, Green, 104, Shorts
4, Yellow, 104, Shorts
5, White, 104, Shorts
6, Black, 104, Shorts
“`SQL equivalent command is :
SELECT * FROM colors, products;
Hope you find this article helpful.
Happy learning..!!!
One comment