spark 中hint使用总结

avatar
作者
猴君
阅读量:0

在spark sql 中用户可以使用Join hint来建议Spark使用哪一种Join。在Spark 3.0以前,只支持BROADCAST这种Join hint。从Spark 3.0开始增加了MERGE、SHUFFLE_HASH和SHUFFLE_REPLICATE_NL这三种Join Hint。优先级为BROADCAST > MERGE > SHUFFLE_HASH > SHUFFLE_REPLICATE_NL。如果Join的两侧都添加了BROADCAST或者SHUFFLE_HASH,则Spark会根据joinType和两侧的大小来选择build哪一侧。

-- Join Hints for broadcast join SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key; SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 left JOIN t2 ON t1.key = t2.key; SELECT /*+ MAPJOIN(t2) */ * FROM t1 right JOIN t2 ON t1.key = t2.key;  -- Join Hints for shuffle sort merge join SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key; SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key; SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;  -- Join Hints for shuffle hash join SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;  -- Join Hints for shuffle-and-replicate nested loop join SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;  -- When different join strategy hints are specified on both sides of a join, Spark -- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint -- over the SHUFFLE_REPLICATE_NL hint. -- Spark will issue Warning in the following example -- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge) -- is overridden by another hint and will not take effect. SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

spark hint 中使用关系:https://blog.51cto.com/u_15435003/5296344

广告一刻

为您即时展示最新活动产品广告消息,让您随时掌握产品活动新动态!