Run a major compaction on the passed table, or pass a region row
to major compact an individual region. To compact a single
column family within a region, specify the region name
followed by the column family name.
Examples:
Compact all regions in a table:
hbase> major_compact 't1'
hbase> major_compact 'ns1:t1'
Compact an entire region:
hbase> major_compact 'r1'
Compact a single column family within a region:
hbase> major_compact 'r1', 'c1'
Compact a single column family within a table:
hbase> major_compact 't1', 'c1'
Compact a table with type "MOB":
hbase> major_compact 't1', nil, 'MOB'
Compact a column family using "MOB" type within a table:
hbase> major_compact 't1', 'c1', 'MOB'
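The same compactions can also be triggered from Java through the Admin API; a minimal sketch, assuming illustrative table and column family names:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class MajorCompactExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Major compact every region of table 't1' (illustrative name)
      admin.majorCompact(TableName.valueOf("t1"));
      // Major compact only column family 'c1' of table 't1'
      admin.majorCompact(TableName.valueOf("t1"), Bytes.toBytes("c1"));
    }
  }
}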
/**
* Extends the base <code>Mapper</code> class to add the required input key
* and value classes.
*
* @param <KEYOUT> The type of the key.
* @param <VALUEOUT> The type of the value.
* @see org.apache.hadoop.mapreduce.Mapper
*/
@InterfaceAudience.Public
public abstract class TableMapper<KEYOUT, VALUEOUT>
extends Mapper<ImmutableBytesWritable, Result, KEYOUT, VALUEOUT> {
}
Summary
The API for reading HBase data from MapReduce is already encapsulated; you only need to call the utility class.
Knowledge Point 13: MR Integration with HBase: Implementing an HBase Read
Goal
Implement reading data from HBase and writing it out to files.
Analysis
step1: Use TableInputFormat to read the HBase data.
step2: Use TextOutputFormat to write the data to files.
Implementation
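A minimal end-to-end sketch of such a job; the table name 't1', family 'basic', qualifier 'name', and the output path are illustrative assumptions:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadHbaseToFile {

  // Emits one "rowkey,value" line per row; types match TextOutputFormat.
  public static class ReadHbaseMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      String rowKey = Bytes.toString(key.get());
      byte[] cell = value.getValue(Bytes.toBytes("basic"), Bytes.toBytes("name"));
      String name = cell == null ? "" : Bytes.toString(cell);
      context.write(new Text(rowKey + "," + name), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "read-hbase");
    job.setJarByClass(ReadHbaseToFile.class);

    // step1: the utility class configures TableInputFormat for us
    TableMapReduceUtil.initTableMapperJob(
        "t1",               // input table (illustrative)
        new Scan(),         // scan all rows
        ReadHbaseMapper.class,
        Text.class,
        NullWritable.class,
        job);

    // step2: write plain text to HDFS (illustrative output path)
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(TextOutputFormat.class);
    TextOutputFormat.setOutputPath(job, new Path("/output/hbase-read"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}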
Summary
TableMapReduceUtil.initTableMapperJob configures TableInputFormat for the input side; the output side is a standard TextOutputFormat job.
Knowledge Point 14: MR Integration with HBase: Rules for Writing to HBase
Goal
Master the development rules for writing to HBase from MapReduce.
Analysis
The output is determined by the OutputFormat.
TableOutputFormat: responsible for writing the KV data from the previous step into an HBase table.
/**
* Convert Map/Reduce output and write it to an HBase table. The KEY is ignored
* while the output value <u>must</u> be either a {@link Put} or a
* {@link Delete} instance.
*/
@InterfaceAudience.Public
public class TableOutputFormat<KEY> extends OutputFormat<KEY, Mutation>
The output value type must therefore be a Mutation: a Put or a Delete.
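For example, a Put targets one row and stages the cells to write; a tiny sketch with illustrative row, family, qualifier, and value:

// org.apache.hadoop.hbase.client.Put, org.apache.hadoop.hbase.util.Bytes
Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("basic"), Bytes.toBytes("name"), Bytes.toBytes("alice"));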
Implementation
step1: Call the utility class to initialize the Reduce and Output stages.
TableMapReduceUtil also encapsulates the setup for writing data to HBase:
TableMapReduceUtil.initTableReducerJob
/**
* Use this before submitting a TableReduce job. It will
* appropriately set up the JobConf.
*
* @param table The output table.
* @param reducer The reducer class to use.
* @param job The current job to adjust.
* @throws IOException When determining the region count fails.
*/
public static void initTableReducerJob(
    String table,
    Class<? extends TableReducer> reducer,
    Job job
) throws IOException;
step2: Build a Reduce class that extends TableReducer.
/**
* Extends the basic <code>Reducer</code> class to add the required key and
* value input/output classes.
*
* @param <KEYIN> The type of the input key.
* @param <VALUEIN> The type of the input value.
* @param <KEYOUT> The type of the output key.
* @see org.apache.hadoop.mapreduce.Reducer
*/
@InterfaceAudience.Public
public abstract class TableReducer<KEYIN, VALUEIN, KEYOUT>
extends Reducer<KEYIN, VALUEIN, KEYOUT, Mutation> {
}
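Putting step1 and step2 together, a minimal end-to-end sketch; the table name 't2', family 'basic', qualifier 'name', input format, and input path are illustrative assumptions:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WriteFileToHbase {

  // Parses "rowkey,name" lines and passes them on keyed by row key.
  public static class ParseMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",", 2);
      if (parts.length == 2) {
        context.write(new Text(parts[0]), new Text(parts[1]));
      }
    }
  }

  // step2: extend TableReducer; the output value must be a Mutation (a Put here).
  public static class WriteReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      byte[] row = Bytes.toBytes(key.toString());
      for (Text v : values) {
        Put put = new Put(row);
        put.addColumn(Bytes.toBytes("basic"), Bytes.toBytes("name"), Bytes.toBytes(v.toString()));
        // The key is ignored by TableOutputFormat; the Put carries the row.
        context.write(new ImmutableBytesWritable(row), put);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "write-hbase");
    job.setJarByClass(WriteFileToHbase.class);

    job.setInputFormatClass(TextInputFormat.class);
    TextInputFormat.addInputPath(job, new Path("/input/data")); // illustrative path

    job.setMapperClass(ParseMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    // step1: the utility class sets TableOutputFormat and the output table
    TableMapReduceUtil.initTableReducerJob("t2", WriteReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}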
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>${hadoop.version}</version>
</dependency>