Hive的基本操作(查询)

avatar
作者
猴君
阅读量:0

1、基础查询

基本语法

select 字段列表|表达式|子查询 from(子查询|视图|临时表|普通表) where [not] 条件A and|or 条件B				--先:面向原始行进行筛选 group by 字段A[,字段B,...]					=> 分组【去重处理】 having 聚合条件(非原始字段条件)				--再:针对聚合后的字段进行二次筛选 order|sort|cluster by 字段A[,字段B,...]		--后:全局排序(非limit的最后一句)	走mapreduce limit N(前N条记录) | M(行号偏移量),N(记录数) 

1.where子句的条件格式

一:关系运算符
关系运算符:> , >= , < , <= , =【等值判断】 , <>【不等于】 
  • 延伸:between (>=)SMALL_VALUE and (<=)BIG_VALUE; 【面向于 数值或日期】
二:逻辑运算符
逻辑运算符:not【非】 , and【与】 , or【或】 
  • 延伸
--if函数: if(BOOLEN_EXPR,VALUE_IF_TRUE,VALUE_IF_FALSE_OR_NULL) 	案例:         select user_id,`if`(order_amount < 1000,'low','high') as consumption         from test1w         where user_gender = '女'         limit 100; 	结果展示: 		user_id	consumption 		652,high         376,high         537,high         280,high         23,high    --空值判断: 1.nvl(VALUE_A,VALUE_B)	=>	VALUE_A为空值(null),则返回VALUE_B。否则返回VALUE_A 2.isnull(VAL)		=>	如果 VAL 为 null,则返回 1 。否则返回 0  --case when函数: case EXPR when V1 then VAL1 when V2 then VAL2 ... else VALN end <=> switch ... case case when 条件1 then VAL1 when 条件2 then VAL2 ... else VALN end <=> if ... else if ... 	案例: 		select user_id, 			case when order_amount<1000 then '低消费人群'  				when order_amount<5000 then '中等消费人群'  				else '高消费人群' end as level         from test1w         where user_gender = '女'         limit 100; 	结果展示: 		user_id	level 		652,高消费人群         376,高消费人群         537,低消费人群         280,中等消费人群 		... 
三:通配符

模糊查询

基本语法: 	like '% | _'	【模糊匹配】 	 讲解: 	% => 任意个任意符号 	_ => 一个任意符号  案例: 	select "张无极" like '张%';		=> true 	select "张无极" like '张_';		=> false 

正则匹配

基本语法: 	rlike '正则表达式' 	如:'^//d+$'  案例: 	select "like" rlike '^[a-zA-Z]{2,4}$';	  =>true 

2.排序

1order by 表达式[field|func|case...when...]    		---【全局排序】:性能差 	优化:在order by B 之前,可以先对数据进行 distribute by A 与 sort by B 		=> 先部分排序,后全局排序 		 2、sort by FIELD_N 								  --在【每一个reducer端】排序 	解释: 		当reducer 的数量为1时,等同于 order by 		FIELD_N 必须是select字段列表中的一员 		一般和 distribute by 配合使用 	 3、cluster by 		--cluster by 字段A = distribute by 字段A + sort by 字段A 

3.分组

1group by 表达式(field|func|case...when) 	--为了聚合而分组,否则类似去重(代替distinct) 	目的:按照某些条件对数据进行分组并进行聚合操作,使用 group by 	多分组: 		1.group by A,B,C 		 		  grouping sets(B,(A,C),(B,C))--指定多个【分组】为:B,(A,C),(B,C) 		   		2.group by cube(A,B,C) 		--排列组合后的所有分组:A,B,C,(A,B),(A,C),(B,C),(A,B,C) 		 		3.group by rollup(A,B,C)	--最左原则的所有分组:A,(A,B),(A,B,C) 		 2、distribute by 表达式(field|func|case...when) 	目的:为了将数据分区,仅仅将数据分发到多个节点上并行处理,使用 distribute by 	解释: 		1.不改变原始行数 		2.类似于 hadoop job 中的 Partitioner。 【默认是采用hash算法】 		3.指定按哪个字段的hashcode分区,配合【预先设置reducer数量】 	注意: 		distribute by【决定进哪个reducer】与sort by【在reducer中排序】一般搭配使用的 		distribute by通常使用在SORT BY语句之前 

小型案例

with product_total as (      select order_item_product_id product_id,sum(order_item_subtotal) total     from cb_order_items     group by order_item_product_id ) select product_id,total from product_total distribute by product_id sort by total desc; 
多分组案例
1.grouping sets 案例:✔ 	create temporary table tmp_cb_order_ymbsc_sets as 	select year,month,dept_id,cate_id,prod_id 		grouping__id, 		sum(quantity) as quantity, 		round(sum(amount)) as amount 	from tmp_cb_order_ymbsc 	group by year,month,dept_id,cate_id,prod_id 	grouping sets(prod_id,(dept_id,cate_id),(year,month),(year,month,prod_id)) 	order by grouping__id; 	------------------------------------- 	寻找哪几组【去重】: 		select grouping__id 		from tmp_cb_order_ymbsc_sets 		group by grouping__id; 	------------------------------------- 	-- grouping__id: 	6 :	year,month,prod_id 	7 :	year,month 	25 : dept_id,cate_id 	 	30 : prod_id 	 	 2.cube 案例:【不常用】 	select 		year(order_date) as year, 		month(order_date) as month, 		day(order_date) as day, 		count(*) as count, 		grouping__id 	from cb_orders 	group by cube (year(order_date),month(order_date),day(order_date)) 	order by grouping__id; 	 	 	 3.rollup 案例:【不常用】 	select 		year(order_date) as year, 		month(order_date) as month, 		day(order_date) as day, 		count(*) as count, 		grouping__id 	from cb_orders 	group by rollup (year(order_date),month(order_date),day(order_date)) 	order by grouping__id; 

2、子查询

基本语法

select 			可以出现子查询(查某个字段值,与主查询存在逻辑主外键关系) from 			可以出现子查询(数据表的子集 select F1,...,FN from T where ... group by ...) where 			可以出现子查询(FIELD in|=|>= (select ONLY_ONE_FIELD_IN ...)) group by FIELD|substr(FIELD,0,4),... having 			可以出现子查询(FIELD in|=|>= (select ONLY_ONE_FIELD_IN ...)) order by FIELD|substr(FIELD,0,4),... 

常用语法【from子查询】

select 字段列表|表达式|子查询 from(     select 字段列表|表达式|子查询					 ---先进行内部的查询     from TABLE     where [not] 条件A and|or 条件B     ... )												---后进行外部的查询 where [not] 条件A and|or 条件B						--后=>先:面向原始行进行筛选 group by 字段A[,字段B,...]	 order by 字段A[,字段B,...]							--后=>再:针对聚合后的字段进行二次筛选 limit N(前N条记录) | M(行号偏移量),N(记录数)		--后=>后:全局排序(非limit的最后一句) 

3、CTE

基本语法

with  SUB_ALIA as(...), SUB_ALTER as(select...from SUB_ALIA...) select... 

小型案例

with     total_amount as(         select sum(order_amount) total         from hive_internal_par_regex_test1w         where year>=2016         group by user_gender, user_id         having total>=20000     ),     level_amount as(         select round(total/10000) as level         from total_amount     ) select level,count(*) as level_count from level_amount group by level;  结果展示: 	level level_count 	2,162     3,125     4,26     5,5 

4、联合查询

数据准备

Class表: +-------+---------+ |classId|className| +-------+---------+ |      1|  yb12211| |      2|  yb12309| |      3|  yb12401| +-------+---------+  Student表: +-----+-------+ | name|classId| +-----+-------+ |henry|      1| |ariel|      2| | jack|      1| | rose|      4| |jerry|      2| | mary|      1| +-----+-------+ 

三种主要形式

一:内连接【inner join】

两集合取交集

select A.内容,....,B.内容,...                              =>字段别名:提高筛选的性能 from TABLE_A as A												 inner join TABLE_B as B on A.主键=B.外键 (and A.fa = VALUE...)  多表√ 两表√        	=>表进行合并时进行【连接条件】 where A.fa = VALUE;                     两表√             =>合并后进行【条件筛选】 group by ... having ... order by ... limit ... 

小型案例

select * from Student S inner join Class C on S.classId = C.classId  结果展示:     +-----+-------+-------+---------+     | name|classId|classId|className|     +-----+-------+-------+---------+     |henry|      1|      1|  yb12211|     |ariel|      2|      2|  yb12309|     | jack|      1|      1|  yb12211|     |jerry|      2|      2|  yb12309|     | mary|      1|      1|  yb12211|     +-----+-------+-------+---------+ 
二:外连接
左外连接【left join】

两个集合取左全集,右交集

select A.内容,....,B.内容,...                              	     =>字段别名:提高筛选的性能 from TABLE_A as A                 									【A为主表】 left [outer] join TABLE_B as B		    							【B为从表】 on A.主键|外键=B.外键|主键 (and A.fa = VALUE...)    多表√ 两表√     =>表进行合并时进行【连接条件】 where A.fa = VALUE;                                 两表√        =>合并后进行【条件筛选】 group by ... having ... order by ... limit ... 

小型案例

select * from Student S left join Class C on S.classId = C.classId  结果展示:     +-----+-------+-------+---------+     | name|classId|classId|className|     +-----+-------+-------+---------+     |henry|      1|      1|  yb12211|     |ariel|      2|      2|  yb12309|     | jack|      1|      1|  yb12211|     | rose|      4|   null|     null|     |jerry|      2|      2|  yb12309|     | mary|      1|      1|  yb12211|     +-----+-------+-------+---------+ 
右外连接【right join】

两集合取右全集,左交集

select A.内容,....,B.内容,...                              		=>字段别名:提高筛选的性能 from TABLE_A as A                 										【A为主表】 right [outer] join TABLE_B as B		    								【B为从表】 on A.主键|外键=B.外键|主键 (and A.fa = VALUE;)    多表√ 两表√      =>表进行合并时进行【连接条件】 where A.fa = VALUE;                               两表√         =>合并后进行【条件筛选】 group by ... having ... order by ... limit ... 

小型案例

select * from Student S right join Class C on S.classId = C.classId  结果展示:     +-----+-------+-------+---------+     | name|classId|classId|className|     +-----+-------+-------+---------+     | mary|      1|      1|  yb12211|     | jack|      1|      1|  yb12211|     |henry|      1|      1|  yb12211|     |jerry|      2|      2|  yb12309|     |ariel|      2|      2|  yb12309|     | null|   null|      3|  yb12401|     +-----+-------+-------+---------+ 
全外连接【full join】

两集合取左右全集

select A.内容,....,B.内容,...                              		 =>字段别名:提高筛选的性能 from TABLE_A as A                 										【A为主表】 full [outer] join TABLE_B as B		    								【B为从表】 on A.主键|外键=B.外键|主键 (and A.fa = VALUE;)    多表√ 两表√       =>表进行合并时进行【连接条件】 where A.fa = VALUE;                               两表√          =>合并后进行【条件筛选】 group by ... having ... order by ... limit ... 

小型案例

select * from Student S full join Class C on S.classId = C.classId  结果展示:     +-----+-------+-------+---------+     | name|classId|classId|className|     +-----+-------+-------+---------+     |henry|      1|      1|  yb12211|     | jack|      1|      1|  yb12211|     | mary|      1|      1|  yb12211|     | null|   null|      3|  yb12401|     | rose|      4|   null|     null|     |ariel|      2|      2|  yb12309|     |jerry|      2|      2|  yb12309|     +-----+-------+-------+---------+ 
三:交叉连接【cross join】

两集合取笛卡尔积

select A.内容,....,B.内容,...                              		 =>字段别名:提高筛选的性能 from TABLE_A as A                 										【A为主表】 cross join TABLE_B as B		    										【B为从表】 on A.主键|外键=B.外键|主键 (and A.fa = VALUE;)    多表√ 两表√       =>表进行合并时进行【连接条件】 where A.fa = VALUE;                               两表√          =>合并后进行【条件筛选】 group by ... having ... order by ... limit ... 

小型案例

select * from Student S cross join Class C on S.classId = C.classId  结果展示:     +-----+-------+-------+---------+     | name|classId|classId|className|     +-----+-------+-------+---------+     |henry|      1|      1|  yb12211|     |henry|      1|      2|  yb12309|     |henry|      1|      3|  yb12401|     |ariel|      2|      1|  yb12211|     |ariel|      2|      2|  yb12309|     |ariel|      2|      3|  yb12401|     | jack|      1|      1|  yb12211|     | jack|      1|      2|  yb12309|     | jack|      1|      3|  yb12401|     | rose|      4|      1|  yb12211|     | rose|      4|      2|  yb12309|     | rose|      4|      3|  yb12401|     |jerry|      2|      1|  yb12211|     |jerry|      2|      2|  yb12309|     |jerry|      2|      3|  yb12401|     | mary|      1|      1|  yb12211|     | mary|      1|      2|  yb12309|     | mary|      1|      3|  yb12401|     +-----+-------+-------+---------+ 

5、联合查询

何为联合查询?

  • 纵向拼接表,高变大

  • 查询字段的【数量】与【类型】必须相同,字段名是以【第一张表为准】。

union与union all的区分

  • union:合并后删除重复项(去重)

  • union all:合并后保留重复项 ✔

小型案例

数据准备

语句

select age,job from bank_client_info_3 union all select age,job from bank_client_info_3; 

广告一刻

为您即时展示最新活动产品广告消息,让您随时掌握产品活动新动态!