CDH 5.11.0 Big Data Cluster Setup and Configuration
Environment
System environment
- CentOS 7, 3 machines (host mappings below; an /etc/hosts sketch follows the list)
# hosts
192.168.237.100 hadoop001
192.168.237.110 hadoop002
192.168.237.120 hadoop003
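A minimal sketch of adding these mappings on each node, assuming the three entries above go into /etc/hosts and the commands are run as root:
# append the host name mappings to /etc/hosts on every node
cat >> /etc/hosts <<'EOF'
192.168.237.100 hadoop001
192.168.237.110 hadoop002
192.168.237.120 hadoop003
EOF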
- Passwordless SSH login (a sketch follows)
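A minimal sketch of the passwordless SSH setup, assuming it is run as the same user on hadoop001 and repeated on the other nodes if they also need to reach each other:
# generate a key pair if one does not exist yet
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# push the public key to every node, including the local one
for h in hadoop001 hadoop002 hadoop003; do ssh-copy-id "$h"; done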
- Disable the firewall
# stop the firewall
systemctl stop firewalld
# disable start on boot
systemctl disable firewalld
- Disable SELinux
setenforce 0
sed -i "s/SELINUX=enforcing/SELINUX=disabled/" /etc/selinux/config
iptables --flush
reboot # reboot for the changes to take effect
Software environment
- JDK 1.8
# check whether OpenJDK is installed
rpm -qa | grep jdk
# uninstall OpenJDK, otherwise the Cloudera parcel installation hangs
rpm -e java-1.8.0-openjdk-devel-1.8.0.181-3.b13.el7_5.x86_64 java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64 java-1.8.0-openjdk-headless-1.8.0.181-3.b13.el7_5.x86_64
# install Oracle JDK
It is best to install the JDK under /usr/java/default; some versions expect it there and otherwise cannot find JAVA_HOME (a sketch follows).
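A hedged sketch of the Oracle JDK installation; the tarball name and the extracted directory (jdk-8u181-linux-x64.tar.gz, jdk1.8.0_181) are examples and should match the actual download:
mkdir -p /usr/java
# extract the Oracle JDK tarball (file name is an example)
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/java/
# make /usr/java/default point at the extracted JDK
ln -s /usr/java/jdk1.8.0_181 /usr/java/default
# expose JAVA_HOME system-wide
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile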
- MySQL 5.7
# create database hive for Hive
mysql>create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
# create database amon for Activity Monitor
mysql>create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
# create database oozie for Oozie
mysql>create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
# create database hue for Hue
mysql>create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
mysql>grant all privileges on *.* to 'root'@'hadoop001' identified by 'root@pierce' with grant option;
mysql>flush privileges;
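Optionally verify the databases and the remote grant from hadoop001, assuming the root@pierce password from the grant above:
# list the databases through the grant created for 'root'@'hadoop001'
mysql -h hadoop001 -uroot -p'root@pierce' -e "show databases;"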
- Install dependencies
# provides pstree
yum install psmisc
Installation notes
- Online
- Offline (recommended): less intrusive to the system and easier to upgrade
Software download and installation
# download CM 5.11.0
wget http://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.11.0_x86_64.tar.gz
# download CDH 5.11.0
wget http://archive.cloudera.com/cdh5/parcels/5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel
wget http://archive.cloudera.com/cdh5/parcels/5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1
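Optionally compare the parcel's SHA-1 against the downloaded .sha1 file before using it (assuming the .sha1 file contains just the hash):
# the two values should match
sha1sum CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel | awk '{print $1}'
cat CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1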
Install Cloudera Manager Server and Agent
- Extract CM
sudo tar -zxvf cloudera-manager-centos7-cm5.11.0_x86_64.tar.gz -C /opt/cm/
- Configure the CM Server database (master)
sudo cp mysql-connector-java-5.1.6-bin.jar /opt/cm/cm-5.11.0/share/cmf/lib/
# create database scm with user scm and password scm
/opt/cm/cm-5.11.0/share/cmf/schema/scm_prepare_database.sh mysql -hhadoop001 -uroot -proot --scm-host hadoop001 scm scm scm
- Configure the CM Agent (master)
# vi /opt/cm/cm-5.11.0/etc/cloudera-scm-agent/config.ini
[General]
# Hostname of the CM server.
server_host=hadoop001 # change to the master node hostname
- Distribute the Agent files from the master node to the other worker nodes (master):
scp -r /opt/cm/cm-5.11.0 hadoop002:/opt/cm/cm-5.11.0
scp -r /opt/cm/cm-5.11.0 hadoop003:/opt/cm/cm-5.11.0
- Create the cloudera-scm user (hadoop001, hadoop002, hadoop003): run the command below on every node, because CM manages its processes as the cloudera-scm user (a sketch for running it on all nodes follows the command).
useradd --system --home=/opt/cm/cm-5.11.0/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
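A minimal sketch for running the same useradd on all three nodes from hadoop001, assuming passwordless root SSH as configured earlier:
for h in hadoop001 hadoop002 hadoop003; do
  ssh "$h" 'useradd --system --home=/opt/cm/cm-5.11.0/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm'
done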
- Prepare the Parcels used to install CDH 5
- On the master node hadoop001, place the CDH parcel files in the /opt/cm/cloudera/parcel-repo directory and adjust permissions (a hedged ownership sketch follows the commands):
cp /home/hadoop/softwares/cm5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel /opt/cm/cloudera/parcel-repo/
cp /home/hadoop/softwares/cm5.11.0/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1 /opt/cm/cloudera/parcel-repo/
mv /opt/cm/cloudera/parcel-repo/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha1 /opt/cm/cloudera/parcel-repo/CDH-5.11.0-1.cdh5.11.0.p0.34-el7.parcel.sha
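For the permission adjustment mentioned above, a hedged sketch that hands the parcel repository to the cloudera-scm user; the exact ownership requirement is an assumption based on the tarball install layout:
# assumption: the repository should be owned by the cloudera-scm service user
chown -R cloudera-scm:cloudera-scm /opt/cm/cloudera/parcel-repo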
- Create the parcels directory on all cloudera-agent nodes (this step can be skipped)
- Start the CM Server and Agents
- Start the services
# start on the master node (hadoop001):
/opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-server start
# start on all nodes (hadoop001, hadoop002, hadoop003):
/opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-agent start
- Enable start on boot
chmod +x /etc/rc.d/rc.local
# vi /etc/rc.d/rc.local
# add the start commands to this file so they run at boot
# on the master node (hadoop001):
/opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-server start
# on all nodes (hadoop001, hadoop002, hadoop003):
/opt/cm/cm-5.11.0/etc/init.d/cloudera-scm-agent start
- Log in (a quick readiness check follows the URL):
http://hadoop001:7180
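The server can take a minute or two to start; a small readiness check before logging in (the log path assumes the tarball layout; the default credentials on a fresh install are admin/admin):
# confirm the web console answers on port 7180
curl -sI http://hadoop001:7180 | head -n 1
# follow the server log while it starts up
tail -f /opt/cm/cm-5.11.0/log/cloudera-scm-server/cloudera-scm-server.log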
CDH Deployment
- Select the hosts
- Select the local Parcel
- If it is not shown, change the local Parcel path under More Options, then restart the server and agents
- Host inspection: fix the warnings
sysctl -w vm.swappiness=10
echo "vm.swappiness=10" >>/etc/sysctl.conf
# add the two lines below to a system startup script such as `/etc/rc.local` so they persist across reboots
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
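A quick check that the settings above took effect (the kernel shows the active transparent hugepage value in brackets):
cat /proc/sys/vm/swappiness
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag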
- Select services
- Service configuration
- Database setup
- Cluster setup
# fix Spark not finding JAVA_HOME
mkdir -p /usr/java
ln -s /usr/local/<your JDK directory> /usr/java/default
# MySQL driver for Hive
cp mysql-connector-java-5.1.42-bin.jar /opt/cm/cloudera/parcels/CDH/lib/hive/lib/
# MySQL driver for Oozie
cp /home/hadoop/lib/mysql-connector-java-5.1.6-bin.jar /usr/share/java/mysql-connector-java.jar
- Start the services
Kafka Installation
- Check the Kafka versions supported by the CDH cluster: https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#pcm_kafka
- Download the CSD package:
http://archive.cloudera.com/csds/kafka/
- Download the Kafka parcel package
wget http://archive.cloudera.com/kafka/parcels/2.1.1/KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel
wget http://archive.cloudera.com/kafka/parcels/2.1.1/KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha1
wget http://archive.cloudera.com/kafka/parcels/2.1.1/manifest.json
- Copy to parcel-repo (rename the checksum file first; a hedged copy sketch follows):
mv KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha1 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha
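A hedged sketch of the copy itself, assuming the same /opt/cm/cloudera/parcel-repo directory used for the CDH parcel and that the files were downloaded to the current directory; manifest.json is only needed if the directory is served over HTTP as a remote parcel repository. The downloaded CSD jar also needs to go into CM's Local Descriptor Repository Path (/opt/cloudera/csd by default, configurable under Administration > Settings), followed by a CM server restart.
cp KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel /opt/cm/cloudera/parcel-repo/
cp KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha /opt/cm/cloudera/parcel-repo/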
- Main page -> Hosts -> Parcel -> Check for New Parcels -> Distribute/Activate
- Add the service
Configuration (a smoke-test sketch follows the list)
- bootstrap.servers:
node00:9092,node01:9092,node02:9092
- source.bootstrap.servers:
node00:9092,node01:9092,node02:9092
- whitelist (the MirrorMaker topic whitelist, i.e. the topics to mirror, not a broker list):
<topics to mirror, e.g. .*>
- Required when Kafka MirrorMaker is enabled; otherwise the role log reports Error: whitelist must be specified
- broker_max_heap_size: 1G
- mirror_maker_max_heap_size: 1G
- Adjust the configuration paths that contain var (data and log directories) as needed
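After the Kafka service is up, a hedged smoke test from any broker host; the topic name, the ZooKeeper address (port 2181 on the same hosts) and the broker addresses from the configuration above are assumptions:
# create a small test topic (ZooKeeper address is an assumption)
kafka-topics --create --zookeeper node00:2181 --replication-factor 1 --partitions 1 --topic smoke_test
# produce one message
echo "hello" | kafka-console-producer --broker-list node00:9092 --topic smoke_test
# read it back (ZooKeeper-based console consumer, available in this Kafka version)
kafka-console-consumer --zookeeper node00:2181 --topic smoke_test --from-beginning --max-messages 1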