Big Data Technology Showdown: Pig vs. Hive

B = filter A by date == '20100819' and age < 30;

-- both date and country are partition columns

C = filter A by date == '20100819' and country == 'US';

...

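However, if we assume a very large number of partitions and want to query all of them through HCatalog in a single request, Pig will run into problems similar to Hive's. In that case, it may be more convenient to express the input with globs and wildcards.

For example: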

Partition-1, Partition-2, Partition-3,....Partition-n exist within the location /user/inputLocation/

Using globs we can provide the input to Pig as:

/user/inputLocation/{Partition-1,Partition-2,Partition-3,...,Partition-n}
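
As a rough illustration of the glob approach, the sketch below loads several partition directories in one LOAD statement. It reads the files directly with PigStorage rather than through HCatalog. The three directory names, the comma delimiter, and the schema (name, age) are assumptions made for the example and do not come from the original data set.

```pig
-- Minimal sketch: read several partition directories with one glob.
-- Assumed for illustration: three partitions, comma-delimited text files,
-- and a hypothetical (name, age) schema.
A = LOAD '/user/inputLocation/{Partition-1,Partition-2,Partition-3}'
    USING PigStorage(',')
    AS (name:chararray, age:int);

-- Ordinary (non-partition) filtering still works on the combined input.
B = FILTER A BY age < 30;

DUMP B;
```

Note that the alternatives inside the braces are separated by commas with no spaces, since a space would be treated as part of the directory name. If every directory under the location should be read, a wildcard such as /user/inputLocation/Partition-* is likely to be shorter than listing each partition.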