
小牛学堂 Hadoop Learning Resources: Hadoop Log Analysis Based on Hive

Created on 2017-03-17 by 小牛君



This post loads local Hadoop logs into a Hive data warehouse, then filters out the useful entries and stores them in a MySQL database.

  Environment: hive-0.12.0 + Hadoop 1.2.1

  1. Log format

 

2014-04-17 22:53:30,621 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_4856124673669777083 to 127.0.0.1:50010 
2014-04-17 22:53:30,621 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_3952951948255939678 to 127.0.0.1:50010 
2014-04-17 22:53:30,629 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_-7659295115271387204 to 127.0.0.1:50010 
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_1247712269990837159 to 127.0.0.1:50010 
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_3041853621376201540 to 127.0.0.1:50010 
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_5165085626508844330 to 127.0.0.1:50010 
2014-04-17 22:53:30,713 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_5058515618805863349 to 127.0.0.1:50010

 

  The log lines follow the format shown above. We split each line on spaces to organize it into multiple columns. The trailing free-text message is hard to parse cleanly, so for now we store it in three columns (information1 through information3).

  Table definition:

 

// Create the Hive table that stores the log entries
HiveUtil.createTable("create table if not exists loginfo11 "
        + "(rdate STRING, time ARRAY<STRING>, type STRING, relateclass STRING, "
        + "information1 STRING, information2 STRING, information3 STRING) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' "
        + "COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':'");
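
To see how these delimiters split a raw line, take the first sample entry above: rdate receives "2014-04-17"; the token "22:53:30,621" becomes the time array ["22:53:30", "621"], because the comma is the collection delimiter; type receives "INFO"; relateclass receives "org.apache.hadoop.hdfs.StateChange:"; and information1 through information3 take the next three tokens ("BLOCK*", "addToInvalidates:", and the block ID). Anything beyond the seventh field is silently dropped. A quick sanity check in the Hive CLI (a sketch, to run after the data has been loaded in section 2):

select rdate, time[0], time[1], type, relateclass,
       information1, information2, information3
from loginfo11 limit 5;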

 

  2. Code

  The getConnect.java class handles opening and closing the connections to Hive and MySQL:

 

package com.my.hivetest;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Lazily opens one shared connection each to Hive and MySQL.
public class getConnect {

    private static Connection conn = null;
    private static Connection conntomysql = null;

    private getConnect() {
    }

    public static Connection getHiveConn() throws SQLException {
        if (conn == null) {
            try {
                // JDBC driver for the original HiveServer (Hive 0.x)
                Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            // The port must match the one passed to "hive --service hiveserver -p"
            conn = DriverManager.getConnection(
                    "jdbc:hive://localhost:50031/default", "", "");
            System.out.println("Connected to Hive.");
        }
        return conn;
    }

    public static Connection getMysqlConn() throws SQLException {
        if (conntomysql == null) {
            try {
                Class.forName("com.mysql.jdbc.Driver");
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            conntomysql = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/hadoop?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=GBK",
                "root", "123456");
            System.out.println("Connected to MySQL.");
        }
        return conntomysql;
    }

    public static void closeHive() throws SQLException {
        if (conn != null)
            conn.close();
    }

    public static void closemysql() throws SQLException {
        if (conntomysql != null)
            conntomysql.close();
    }
}

 

  exeHiveQL.java is the driver class containing the main function:

 

package com.my.hivetest;

import java.sql.ResultSet;
import java.sql.SQLException;

public class exeHiveQL {
    public static void main(String[] args) throws SQLException {

        if (args.length < 2) {
            System.out.println("Usage: exeHiveQL <log level> <date>");
            System.exit(1);
        }

        String type = args[0];
        String date = args[1];

        // Create the table in Hive
        HiveUtil.createTable("create table if not exists loginfo11 "
                + "(rdate STRING, time ARRAY<STRING>, type STRING, relateclass STRING, "
                + "information1 STRING, information2 STRING, information3 STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' "
                + "COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':'");
        // Load the Hadoop logs
        HiveUtil.loadDate("load data local inpath '/root/hadoop-1.2.1/logs/*.log.*' "
                + "overwrite into table loginfo11");
        // Query the entries matching the given log level and date
        String str = "select rdate,time[0],type,relateclass,"
                + "information1,information2,information3 from loginfo11 "
                + "where type='" + type + "' and rdate='" + date + "'";
        System.out.println(str); // print the query for debugging
        ResultSet res1 = HiveUtil.queryHive(str);
        // Store the query results in MySQL
        HiveUtil.hiveTomysql(res1);
        // Close the Hive connection
        getConnect.closeHive();
        // Close the MySQL connection
        getConnect.closemysql();
    }
}
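
The exeHiveQL class above delegates to a HiveUtil helper that this post does not reproduce. The following is a minimal sketch of what such a class could look like; it assumes only the method names used above (createTable, loadDate, queryHive, hiveTomysql) and a MySQL table hadooplog in which the three information columns are merged into one (see the note at the end of this post):

package com.my.hivetest;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveUtil {
    // Run a DDL statement (create table ...) against Hive
    public static void createTable(String hiveql) throws SQLException {
        Statement stmt = getConnect.getHiveConn().createStatement();
        stmt.execute(hiveql);
    }
    // Run a "load data" statement against Hive
    public static void loadDate(String hiveql) throws SQLException {
        Statement stmt = getConnect.getHiveConn().createStatement();
        stmt.execute(hiveql);
    }
    // Run a query against Hive and hand back its result set
    public static ResultSet queryHive(String hiveql) throws SQLException {
        Statement stmt = getConnect.getHiveConn().createStatement();
        return stmt.executeQuery(hiveql);
    }
    // Copy every row of the Hive result into the MySQL table hadooplog,
    // concatenating the three information fields into a single column
    public static void hiveTomysql(ResultSet hiveRes) throws SQLException {
        Statement stmt = getConnect.getMysqlConn().createStatement();
        while (hiveRes.next()) {
            String information = hiveRes.getString(5) + " "
                    + hiveRes.getString(6) + " " + hiveRes.getString(7);
            stmt.executeUpdate("insert into hadooplog values('"
                    + hiveRes.getString(1) + "','" + hiveRes.getString(2) + "','"
                    + hiveRes.getString(3) + "','" + hiveRes.getString(4) + "','"
                    + information.trim() + "')");
        }
    }
}

(For brevity this sketch builds SQL by string concatenation, as the driver code above does; a PreparedStatement would be the safer choice.)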

 

 Before running the program, start the hiveserver service; the port number 50031 here must match the one in getConnect.java.

 

# bin/hive --service hiveserver -p 50031

 

  Then run the program in Eclipse, passing the query conditions as program arguments, for example:

  ERROR 2014-04-14

  Before running, you also need to put the following jars on the classpath:
  hive-jdbc-0.12.0.jar
  hive-metastore-0.12.0.jar
  mysql-connector-java-5.1.10.jar (the MySQL JDBC driver), plus all the jars under hive/lib (imported wholesale to save trouble).
  One more thing: create the database hadoop in MySQL ahead of time, and inside it the table hadooplog. The table's schema just needs to match what the code writes.
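
The exact layout of hadooplog is not shown here; a minimal sketch matching the HiveUtil sketch above (five VARCHAR columns, with the three information fields merged into one) would be:

create database if not exists hadoop;
use hadoop;
create table if not exists hadooplog (
    rdate varchar(32),
    time varchar(32),
    type varchar(16),
    relateclass varchar(255),
    information varchar(255)
);

Since the JDBC URL passes createDatabaseIfNotExist=true, the hadoop database can also be created automatically on first connect, but the hadooplog table must exist before hiveTomysql runs.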