Big Data Cluster Setup and Configuration with CDH 5.11.0

Environment

System Environment

  1. CentOS 7, 3 nodes
# /etc/hosts
192.168.237.100 hadoop001
192.168.237.110 hadoop002
192.168.237.120 hadoop003
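
A minimal sketch for applying these mappings, assuming root access on all three machines (run the matching hostnamectl line on each node):
# set the hostname, one line per node, executed on that node
hostnamectl set-hostname hadoop001
# append the mappings to /etc/hosts on every node
cat >> /etc/hosts <<'EOF'
192.168.237.100 hadoop001
192.168.237.110 hadoop002
192.168.237.120 hadoop003
EOF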
  2. Passwordless SSH login between the nodes (sketch below)
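A minimal sketch, assuming the same user account exists on all three nodes (repeat on every node that needs to log in to the others):
# generate a key pair with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# copy the public key to every node, including the local one
ssh-copy-id hadoop001
ssh-copy-id hadoop002
ssh-copy-id hadoop003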
  3. Disable the firewall
# stop the firewall
systemctl stop firewalld
# keep it from starting on boot
systemctl disable firewalld
  4. Disable SELinux
setenforce 0
sed -i "s/SELINUX=enforcing/SELINUX=disabled/" /etc/selinux/config
iptables --flush
reboot  # reboot for the change to take effect

Software Environment

  1. JDK 1.8
# check whether OpenJDK is installed
rpm -qa | grep jdk
# remove OpenJDK, otherwise the Cloudera parcel installation hangs
rpm -e java-1.8.0-openjdk-devel-1.8.0.181-3.b13.el7_5.x86_64 java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64 java-1.8.0-openjdk-headless-1.8.0.181-3.b13.el7_5.x86_64
# install the Oracle JDK (see the sketch below)

It is best to install the JDK under /usr/java/default; some versions expect it in that directory and otherwise cannot find JAVA_HOME.
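
A minimal sketch of a tarball install; the archive and directory names below are assumptions, so substitute the actual 1.8 build that was downloaded:
# extract the Oracle JDK under /usr/local (archive name is an assumption)
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local/
# link it to the location CDH components look for
mkdir -p /usr/java
ln -s /usr/local/jdk1.8.0_181 /usr/java/default
# make JAVA_HOME visible to login shells
echo 'export JAVA_HOME=/usr/java/default' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile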

  2. MySQL 5.7
# create database hive for Hive
mysql>create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
# create database amon for Activity Monitor
mysql>create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
# create database oozie for Oozie
mysql>create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
# create database hue for Hue
mysql>create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

mysql>grant all privileges on *.* to 'root'@'hadoop001' identified by 'root@pierce' with grant option;
mysql>flush privileges;
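
A quick check that the grant above works from the CM host, using the root account and password created above; it should list the hive/amon/oozie/hue databases without an access error:
mysql -h hadoop001 -uroot -p'root@pierce' -e "show databases;"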
  3. Install dependencies
# provides the pstree command
yum install psmisc

Installation Notes

  • Online
  • Offline (recommended): less intrusive to the system and easier to upgrade

软件下载与安装

# download CM 5.11.0
wget http://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.11.0_x86_64.tar.gz
# download CDH 5.11.0
wget http://archive.cloudera.com/cdh5/parcels/5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel
wget http://archive.cloudera.com/cdh5/parcels/5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1
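
The downloaded .sha1 holds only a checksum value; comparing it with the parcel's actual SHA-1 catches a corrupted download. A minimal sketch (the two outputs should match):
sha1sum CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel | awk '{print $1}'
cat CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1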

Install Cloudera Manager Server and Agent

  1. Extract CM
sudo tar -zxvf cloudera-manager-centos7-cm5.11.0_x86_64.tar.gz -C /opt/cm/
  2. Configure the CM Server database (master)
sudo cp mysql-connector-java-5.1.6-bin.jar /opt/cm/cm-5.11.0/share/cmf/lib/
# create database scm with user scm and password scm
/opt/cm/cm-5.11.0/share/cmf/schema/scm_prepare_database.sh mysql -hhadoop001 -uroot -proot --scm-host hadoop001 scm scm scm
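
The connection settings written by the script can be double-checked afterwards; the path below is an assumption based on the tarball layout used here:
# verify the generated database settings
cat /opt/cm/cm-5.11.0/etc/cloudera-scm-server/db.properties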
  3. Configure the CM Agent (master)
# vi /opt/cm/cm-5.11.0/etc/cloudera-scm-agent/config.ini

[General]
# Hostname of the CM server.
server_host=hadoop001  # change to the master node hostname
  4. Distribute the agent files from the master node to the other nodes (master)
scp -r /opt/cm/cm-5.11.0 hadoop002:/opt/cm/cm-5.11.0
scp -r /opt/cm/cm-5.11.0 hadoop003:/opt/cm/cm-5.11.0
  5. Create the cloudera-scm user (hadoop001, hadoop002, hadoop003): run the command below on every node, because CM manages its processes as the cloudera-scm user
useradd --system --home=/opt/cm/cm-5.11.0/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
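
Since the user is needed on every node, the same command can also be pushed out over SSH from hadoop001; a minimal sketch, assuming passwordless root SSH between the nodes:
for host in hadoop001 hadoop002 hadoop003; do
  ssh $host 'useradd --system --home=/opt/cm/cm-5.11.0/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm'
done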
  6. Prepare the parcels used to install CDH 5
    1. On the master node hadoop001, place the CDH parcel files into /opt/cm/cloudera/parcel-repo, rename the checksum file, and adjust the permissions:
    cp /home/hadoop/softwares/cm5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel /opt/cm/cloudera/parcel-repo/
    cp /home/hadoop/softwares/cm5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1 /opt/cm/cloudera/parcel-repo/
    mv /opt/cm/cloudera/parcel-repo/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1 /opt/cm/cloudera/parcel-repo/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha
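
    If only the parcel itself is at hand, the .sha file can also be generated locally (the value must match the published checksum):
    # alternative: derive the .sha directly from the parcel
    sha1sum /opt/cm/cloudera/parcel-repo/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel | awk '{print $1}' > /opt/cm/cloudera/parcel-repo/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha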
    
    2. Create a parcels directory on all cloudera-scm-agent nodes (this step can be skipped)
  7. Start the CM Server and Agent
    1. Start the services
    # master node only (hadoop001):
    /opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-server start
    # all nodes (hadoop001, hadoop002, hadoop003):
    /opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-agent start
    
    2. Enable start on boot
    chmod +x /etc/rc.d/rc.local
    # vi /etc/rc.d/rc.local
    
    # adding the start commands to this file makes them run at boot
    # master node only (hadoop001)
    /opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-server start
    # all nodes (hadoop001, hadoop002, hadoop003):
    /opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-agent start
    
  8. Log in: http://hadoop001:7180
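
A quick way to confirm the server is up before opening the web UI; the log path is an assumption based on the tarball layout:
# the web UI listens on port 7180 once startup finishes (this can take a few minutes)
ss -lnpt | grep 7180
tail -f /opt/cm/cm-5.11.0/log/cloudera-scm-server/cloudera-scm-server.log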

CDH Deployment

  1. Select hosts
  2. Select the local parcel
    • If it is not listed, change the local parcel repository path under "More Options", then restart the server and agents
  3. Host inspection: fix the warnings
sysctl -w vm.swappiness=10
echo "vm.swappiness=10" >>/etc/sysctl.conf

# add the following two lines to a startup script such as `/etc/rc.local` so they take effect again after a reboot
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
  4. Select services
  5. Service configuration
  6. Database setup
  7. Cluster setup
# fix Spark failing to find JAVA_HOME
mkdir -p /usr/java
ln -s /usr/local/<JDK directory>  /usr/java/default
# MySQL driver for Hive
cp mysql-connector-java-5.1.42-bin.jar /opt/cm/cloudera/parcels/CDH/lib/hive/lib/
# MySQL driver for Oozie
cp /home/hadoop/lib/mysql-connector-java-5.1.6-bin.jar /usr/share/java/mysql-connector-java.jar
  8. Start the services

Kafka Installation

wget http://archive.cloudera.com/kafka/parcels/2.1.1/KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel
wget http://archive.cloudera.com/kafka/parcels/2.1.1/KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha1
wget http://archive.cloudera.com/kafka/parcels/2.1.1/manifest.json
  • Copy the files to parcel-repo and rename the checksum file:
mv KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha1 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha
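
The parcel and the renamed checksum then go into the same parcel-repo directory used for the CDH parcel (path as in the earlier tarball layout):
cp KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha /opt/cm/cloudera/parcel-repo/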
  • Main page -> Hosts -> Parcel -> Check for New Parcels -> Distribute/Activate
  • Add the service

Configuration

  • bootstrap.servers: hadoop001:9092,hadoop002:9092,hadoop003:9092
  • source.bootstrap.servers: hadoop001:9092,hadoop002:9092,hadoop003:9092
  • whitelist: regex of the topics to mirror, e.g. .*
    • Only required when Kafka MirrorMaker is enabled; without it the role log reports Error: whitelist must be specified
  • broker_max_heap_size: 1G
  • mirror_maker_max_heap_size: 1G
  • Adjust the paths that default to locations under /var as needed
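
After the service is running, a quick smoke test from any broker host; the topic name is illustrative, the wrapper scripts come with the activated parcel, and the ZooKeeper quorum is assumed to be the CDH ZooKeeper on hadoop001:
# create a test topic
kafka-topics --create --zookeeper hadoop001:2181 --replication-factor 3 --partitions 3 --topic smoke_test
# produce a few messages (type them, then Ctrl+C)
kafka-console-producer --broker-list hadoop001:9092 --topic smoke_test
# read them back in another terminal
kafka-console-consumer --bootstrap-server hadoop001:9092 --topic smoke_test --from-beginning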

References