Setting Up Greenplum 6.0, in Detail

2023-03-07 00:00:00 commands files configuration initialization hosts

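Contents
I. Architecture Overview
1. Network structure
2. Deployment architecture

II. Basic Host Configuration for the Cluster
1. Preparing the machines
2. Configuring the machines
(1) Hostname resolution
(2) System parameters
(3) Linux file descriptors
(4) Mounting the disk
(5) Disabling the firewall
(6) System clock
(7) Reboot so that all settings take effect

III. Installing Greenplum and Configuring the gpadmin User
1. Add the machines to /etc/ansible/hosts on the Ansible host
2. Sample Ansible playbook
3. Download the matching Greenplum package
4. Run the ansible-playbook command

IV. Configuring Passwordless SSH
1. Log in to the master host and switch to the gpadmin user
2. Source the Greenplum path file
3. Generate an SSH key on the master node
4. Use ssh-copy-id to add the master's public key to the other hosts
5. Create the host list file on the master node
6. Run gpssh-exkeys to set up n-to-n passwordless SSH
7. Verify

V. Creating the Data Storage Areas
1. Create the storage directory on the master
2. Create the directories on the segment hosts with gpssh

VI. Initializing the Greenplum Cluster
1. Log in to the master host and switch to the gpadmin user
2. Create the initialization host file
3. Copy the initialization config file to the user's home directory
4. Edit the initialization configuration
5. Save and exit the file
6. Run the initialization command
(1) Go to gpadmin's home directory on the master and run the initialization command
(2) Confirm the installation
(3) Troubleshooting a failed initialization

VII. Configuring the Greenplum Environment Variables
1. Log in to the master host as gpadmin
2. Edit the .bashrc file
3. Add the Greenplum path script and MASTER_DATA_DIRECTORY to the file
4. Save and exit
5. Source the file so the changes take effect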
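I. Architecture Overview
1. Network structure
Greenplum (GP) spreads heavy data processing across multiple hosts. The master node is the entry point to the whole GP cluster: users connect to it and submit SQL statements. Segment nodes store and process the data, and the master coordinates the workload among the nodes, as shown in the figure below: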





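2. Deployment architecture
This article deploys a single master node (no standby) and segment nodes without mirrors. To deploy a highly available cluster (master with a standby, segments with redundancy), see the official documentation at https://gpdb.docs.pivotal.io/6-0/main/index.html, or leave a comment.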




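II. Basic Host Configuration for the Cluster
1. Preparing the machines
This example uses five Tencent Cloud machines with 16 cores and 32 GB of memory each, every one with a 100 GB data disk attached.

2. Configuring the machines
Machine configuration covers three main areas:

Shared memory: a GP cluster will not start unless shared memory is configured properly on the segment nodes. Most Linux distributions ship default shared memory settings that are too low for GP. You also need to disable the OOM killer on the hosts.
Network: GP needs a network tuned for high-volume traffic.
User limits: GP needs a higher limit on the number of file descriptors a single process can open; with the default limits, GP may fail when it runs out of file descriptors.
(1) Hostname resolution
Log in to each host as root, edit /etc/hosts, and append the IP-to-hostname mappings so that the five machines can reach one another by name, for example: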

# master
10.0.0.1 mdw
# segments
10.0.0.2 sdw1
10.0.0.3 sdw2
10.0.0.4 sdw3
10.0.0.5 sdw4

You can test this from any machine by pinging another host by name, for example by running ping sdw1 on the master.
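Before pinging, it can help to confirm that every expected name actually has an entry in the file. The following is a minimal sketch, not part of any Greenplum tooling: missing_hosts is a hypothetical helper that prints each name lacking an entry in a hosts-format file.

```shell
# Hypothetical helper: print every given hostname that has no entry
# in a hosts-format file.
# Usage: missing_hosts FILE NAME...
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Skip comment lines; on other lines, look for the name in any
        # column after the IP address, stopping at a trailing comment.
        if ! awk -v n="$name" '
                /^[[:space:]]*#/ { next }
                {
                    for (i = 2; i <= NF; i++) {
                        if ($i ~ /^#/) break
                        if ($i == n) found = 1
                    }
                }
                END { exit (found ? 0 : 1) }' "$file"; then
            echo "$name"
        fi
    done
}

# Example (prints nothing when every name has an entry):
# missing_hosts /etc/hosts mdw sdw1 sdw2 sdw3 sdw4
```

Running it on each of the five machines shows quickly whether one of them was skipped while editing /etc/hosts.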

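(2) System parameters
Add the following kernel parameters to /etc/sysctl.conf. Five entries need to be set according to your own system's values (see the notes that follow). The kernel.shmall and kernel.shmmax values shown are placeholders; compute your own as described in Notes <1><2>.

# kernel.shmall = _PHYS_PAGES / 2 # Note <1>
kernel.shmall = 4000000000
# kernel.shmmax = kernel.shmall * PAGE_SIZE # Note <2>
kernel.shmmax = 500000000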
kernel.shmmni = 4096
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
# Note <3>
net.ipv4.ip_local_port_range = 10000 65535
kernel.sem = 500 2048000 200 40960
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
# Note <5>
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296

Notes:

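Notes <1><2>
kernel.shmall (total number of shared memory pages)
kernel.shmmax (maximum size of a shared memory segment, in bytes)
As a rule, both should correspond to half of the physical memory. They can be calculated from the operating system values _PHYS_PAGES and PAGE_SIZE.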

kernel.shmall = ( _PHYS_PAGES / 2)
kernel.shmmax = ( _PHYS_PAGES / 2) * PAGE_SIZE

The following two commands print the values for these two parameters:

$ echo $(expr $(getconf _PHYS_PAGES) / 2)
$ echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))

If the computed kernel.shmmax is smaller than the system default, keep the system default.
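As a worked example of the formulas above, here is a minimal sketch that prints both sysctl lines from a page count and a page size. shm_settings is a hypothetical helper name, and the sample figures assume a host reporting exactly 8388608 pages of 4096 bytes (32 GiB); a real host usually reports slightly fewer pages.

```shell
# Hypothetical helper: print the two sysctl lines for a given page count
# and page size, following the half-of-physical-memory rule.
# Usage: shm_settings PHYS_PAGES PAGE_SIZE
shm_settings() {
    shmall=$(( $1 / 2 ))          # half of the physical pages
    shmmax=$(( shmall * $2 ))     # the same amount expressed in bytes
    printf 'kernel.shmall = %s\nkernel.shmmax = %s\n' "$shmall" "$shmmax"
}

# On a real host, feed it the live values:
# shm_settings "$(getconf _PHYS_PAGES)" "$(getconf PAGE_SIZE)"
```

For the sample figures, shm_settings 8388608 4096 prints kernel.shmall = 4194304 and kernel.shmmax = 17179869184.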

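Note <3>
Primary segments use ports starting at 6000
Mirror segments use ports starting at 7000
Neither range overlaps the local port range below, so the value shown can be used as-is
net.ipv4.ip_local_port_range = 10000 65535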


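Note <5>
For hosts with more than 64 GB of memory, the recommended values are:
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736 # 1.5GB
vm.dirty_bytes = 4294967296 # 4GB

For hosts with 64 GB of memory or less (such as the 32 GB machines in this example), remove vm.dirty_background_bytes and vm.dirty_bytes and use these values instead:
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10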

(3) Linux file descriptors
Add the following parameters to /etc/security/limits.conf:

* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
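The new limits only apply to sessions started after the change, so it is worth checking them after logging in again. A minimal sketch, assuming bash (for ulimit -u); check_limit is a hypothetical helper that compares a live value with the minimum from the listing above.

```shell
# Hypothetical helper: compare a live limit with the required minimum.
# Usage: check_limit LABEL CURRENT REQUIRED
check_limit() {
    # "unlimited" always satisfies the requirement.
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "$1 ok ($2)"
    else
        echo "$1 too low ($2 < $3)"
    fi
}

# After logging in again (bash):
# check_limit nofile "$(ulimit -n)" 524288
# check_limit nproc  "$(ulimit -u)" 131072
```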
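(4) Mounting the disk
The official recommendation is the XFS file system, although other file systems also work.

Example configuration:

To mount the /dev/data device on the /data directory and have Linux mount it automatically at boot, add the following line to /etc/fstab: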

/dev/data /data xfs nodev,noatime,nobarrier,inode64 0 0
(5) Disabling the firewall
Check the firewall status on each host and disable it:
# systemctl status firewalld
# systemctl stop firewalld.service
# systemctl disable firewalld.service

# /sbin/chkconfig iptables off
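(6) System clock
Synchronize the clocks of the segment hosts with the master.

Add the following line to /etc/ntp.conf on each segment host:

server mdw prefer
Here mdw is the master hostname configured earlier.

(7) Reboot so that all settings take effect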


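III. Installing Greenplum and Configuring the gpadmin User
This step installs the Greenplum software package, creates the gpadmin user, and sets the directory ownership.

The example below installs with ansible-playbook; you can also install with a package manager such as yum or apt:

1. Add the machines to /etc/ansible/hosts on the Ansible host: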
[greenplum]
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4
10.0.0.5
A connection user name and password can also be set, for example:

[greenplum]
10.0.0.1 ansible_ssh_user=root ansible_ssh_pass=xxx
10.0.0.2 ansible_ssh_user=root ansible_ssh_pass=xxx
10.0.0.3 ansible_ssh_user=root ansible_ssh_pass=xxx
10.0.0.4 ansible_ssh_user=root ansible_ssh_pass=xxx
10.0.0.5 ansible_ssh_user=root ansible_ssh_pass=xxx
2. Sample Ansible playbook
Ansible Playbook - Greenplum Database Installation for CentOS 7

---

- hosts: greenplum
  vars:
    - version: "6.0.0"
    - greenplum_admin_user: "gpadmin"
    - greenplum_admin_password: "$changeme"
    # - package_path: passed via the command line with: -e package_path=./greenplum-db-6.0.0-rhel7-x86_64.rpm
  remote_user: root
  become: yes
  become_method: sudo
  connection: ssh
  gather_facts: yes
  tasks:
    - name: create greenplum admin user
      user:
        name: "{{ greenplum_admin_user }}"
        password: "{{ greenplum_admin_password | password_hash('sha512', 'DvkPtCtNH+UdbePZfm9muQ9pU') }}"
    - name: copy package to host
      copy:
        src: "{{ package_path }}"
        dest: /tmp
    - name: install package
      yum:
        name: "/tmp/{{ package_path | basename }}"
        state: present
    - name: cleanup package file from host
      file:
        path: "/tmp/{{ package_path | basename }}"
        state: absent
    - name: find install directory
      find:
        paths: /usr/local
        patterns: 'greenplum*'
        file_type: directory
      register: installed_dir
    - name: change install directory ownership
      file:
        path: '{{ item.path }}'
        owner: "{{ greenplum_admin_user }}"
        group: "{{ greenplum_admin_user }}"
        recurse: yes
      with_items: "{{ installed_dir.files }}"
    - name: update pam_limits
      pam_limits:
        domain: "{{ greenplum_admin_user }}"
        limit_type: '-'
        limit_item: "{{ item.key }}"
        value: "{{ item.value }}"
      with_dict:
        nofile: 524288
        nproc: 131072
    - name: find installed greenplum version
      shell: . /usr/local/greenplum-db/greenplum_path.sh && /usr/local/greenplum-db/bin/postgres --gp-version
      register: postgres_gp_version
    - name: fail if the correct greenplum version is not installed
      fail:
        msg: "Expected greenplum version {{ version }}, but found '{{ postgres_gp_version.stdout }}'"
      when: "version is not defined or version not in postgres_gp_version.stdout"

3. Download the matching Greenplum package
https://network.pivotal.io/products/pivotal-gpdb/#/releases/449820/file_groups/2047

4. Run the ansible-playbook command
ansible-playbook ansible-playbook.yml -i hosts -e package_path=./greenplum-db-6.0.0-rhel7-x86_64.rpm
IV. Configuring Passwordless SSH
1. Log in to the master host and switch to the gpadmin user
2. Source the Greenplum path file
$ source /usr/local/greenplum-db-<version>/greenplum_path.sh
3. Generate an SSH key on the master node
$ ssh-keygen
Press Enter at each prompt to accept the defaults.

4. Use ssh-copy-id to add the master's public key to the other hosts
$ ssh-copy-id sdw1
$ ssh-copy-id sdw2
$ ssh-copy-id sdw3
$ ssh-copy-id sdw4
When prompted, enter the password of the gpadmin user on the corresponding host.

This completes 1-to-n passwordless SSH.
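With more than a handful of segment hosts, the same step can be driven from a host list instead of typing one command per host. A minimal sketch: copy_key_to_hosts and the COPY_CMD override are hypothetical names introduced here for illustration, and the list file is assumed to hold one hostname per line.

```shell
# Hypothetical helper: run ssh-copy-id once per host listed in a file
# (one hostname per line; blank lines and # comments are skipped).
# Set COPY_CMD=echo for a dry run that only prints the hostnames.
copy_key_to_hosts() {
    # Read the list on file descriptor 3 so the copy command keeps the
    # terminal as its standard input for the password prompt.
    while read -r host <&3; do
        case $host in
            ''|'#'*) continue ;;
        esac
        ${COPY_CMD:-ssh-copy-id} "$host"
    done 3< "$1"
}

# Example, with a file listing sdw1 to sdw4:
# copy_key_to_hosts segment_hosts.txt
```

Each host still asks for the gpadmin password once, exactly as with the individual commands.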

5. Create the host list file on the master node
In the gpadmin home directory, create a file named hostfile_exkeys containing the hostnames of all nodes, including the master:

mdw
sdw1
sdw2
sdw3
sdw4
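Tip: make sure hostname resolution is configured in /etc/hosts on every machine, otherwise the hosts cannot reach one another by name; see section II, 2, (1).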

6. Run gpssh-exkeys to set up n-to-n passwordless SSH
In the gpadmin home directory, run:

$ gpssh-exkeys -f hostfile_exkeys
7. Verify
Run the following command; if every host returns the same listing, the configuration succeeded:

$ gpssh -f hostfile_exkeys -e 'ls -l /usr/local/greenplum-db-<version>'
V. Creating the Data Storage Areas
1. Create the storage directory on the master
Log in as root and run:

# mkdir -p /data/master
# chown gpadmin:gpadmin /data/master
2. Create the directories on the segment hosts with gpssh
Create a file named hostfile_gpssh_segonly that contains only the segment hostnames:

sdw1
sdw2
sdw3
sdw4
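Since the segment-only list is simply the full host list without the master, it can also be derived rather than typed. A minimal sketch, assuming a copy of the hostfile_exkeys file from section IV is available in the current directory; segment_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: print every line of a host list except the one
# that exactly matches the master hostname.
# Usage: segment_hosts HOSTFILE MASTER_NAME
segment_hosts() {
    grep -v -x -F -e "$2" "$1"
}

# Example:
# segment_hosts hostfile_exkeys mdw > hostfile_gpssh_segonly
```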
Use gpssh to create the primary and mirror directories:

# source /usr/local/greenplum-db/greenplum_path.sh
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/mirror'
# gpssh -f hostfile_gpssh_segonly -e 'chown -R gpadmin /data/*'
VI. Initializing the Greenplum Cluster
1. Log in to the master host and switch to the gpadmin user
$ su - gpadmin
2. Create the initialization host file
Create a gpconfigs directory in the gpadmin home directory:

$ mkdir -p ~/gpconfigs
Create the hostfile_gpinitsystem file and add the segment hostnames, one per line:

$ vim ~/gpconfigs/hostfile_gpinitsystem

sdw1
sdw2
sdw3
sdw4
3. Copy the initialization config file to the user's home directory
$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config \
/home/gpadmin/gpconfigs/gpinitsystem_config
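4. Edit the initialization configuration
Open the file copied in the previous step and change the following settings to suit your needs: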

ARRAY_NAME="Greenplum Data Platform"
SEG_PREFIX=gpseg
PORT_BASE=6000
declare -a DATA_DIRECTORY=(/data/primary /data/primary /data/primary /data/primary)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
This example does not configure mirror segments; if needed, they can be added later with the gpaddmirrors utility.
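Each entry in DATA_DIRECTORY creates one primary segment per segment host, so the sample configuration (four entries, four hosts) yields 4 × 4 = 16 primary segments. A rough sketch for checking that figure before initializing; primaries_total is a hypothetical helper, not a Greenplum utility, and it assumes DATA_DIRECTORY is declared on a single line as in the sample.

```shell
# Hypothetical helper: primary segments per host times number of hosts.
# Usage: primaries_total CONFIG_FILE HOST_FILE
primaries_total() {
    # Count the directories listed between the parentheses.
    per_host=$(sed -n 's/^declare -a DATA_DIRECTORY=(\(.*\))[[:space:]]*$/\1/p' "$1" \
        | wc -w | tr -d ' ')
    # Count the non-blank lines of the host file.
    hosts=$(grep -c -v -e '^[[:space:]]*$' "$2")
    echo $(( per_host * hosts ))
}

# Example:
# primaries_total ~/gpconfigs/gpinitsystem_config ~/gpconfigs/hostfile_gpinitsystem
```

With 16 cores per machine, four primaries per host leaves headroom; listing fewer or more directories changes the count accordingly.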

5. Save and exit the file
6. Run the initialization command
(1) Go to gpadmin's home directory on the master and run the initialization command
$ cd ~
$ gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem
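(2) Confirm the installation
The utility validates the initialization configuration file, makes sure it can connect to every host, and checks that the configured data directories are accessible. If all checks pass, it prompts: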

=> Continue with Greenplum creation? Yy/Nn
Enter Y to start the initialization.

When initialization succeeds, the console prints:

=> Greenplum Database instance successfully created.
(3) Troubleshooting a failed initialization
If initialization fails, the error output looks similar to the following:

...
.......................
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:------------------------------------------------
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-Parallel process exit status
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:------------------------------------------------
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-Total processes marked as completed = 6
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-Total processes marked as killed = 0
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[WARN]:-Total processes marked as failed = 3 <<<<<
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:------------------------------------------------
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[FATAL]:-Errors generated from parallel processes
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-Dumped contents of status file to the log file
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-Building composite backout file
20170124:16:53:30:gpinitsystem:htcom:postgres-[FATAL]:-Failures detected, see log file /home/postgresql/gpAdminLogs/gpinitsystem_20170124.log for more detail Script Exiting!
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[WARN]:-Script has left Greenplum Database in an incomplete state
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[WARN]:-Run command /bin/bash /home/postgresql/gpAdminLogs/backout_gpinitsystem_postgres_20170124_165242 to remove these changes
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-Start Function BACKOUT_COMMAND
20170124:16:53:30:031785 gpinitsystem:htcom:postgres-[INFO]:-End Function BACKOUT_COMMAND

Use grep to search the log for the error messages and locate the problem:

$ grep FATAL /home/postgresql/gpAdminLogs/gpinitsystem_20170124.log
Run the backout script named in the output to undo the changes left by the failed initialization:

$ /bin/bash /home/postgresql/gpAdminLogs/backout_gpinitsystem_postgres_20170124_165242
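The log search from this step can be widened to warnings and errors as well. A minimal sketch: init_problems is a hypothetical helper, and the example path assumes the logs sit in ~/gpAdminLogs, so adjust it to the path reported in your own output.

```shell
# Hypothetical helper: print the FATAL, ERROR and WARN lines of a
# gpinitsystem log, each prefixed with its line number.
# Usage: init_problems LOGFILE
init_problems() {
    grep -n -E '\[(FATAL|ERROR|WARN)\]' "$1"
}

# Example, picking the most recent log (path is an assumption):
# init_problems "$(ls -t ~/gpAdminLogs/gpinitsystem_*.log | head -n 1)"
```

The line numbers make it easier to open the log at the first failure and read the surrounding context.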
VII. Configuring the Greenplum Environment Variables
1. Log in to the master host as gpadmin
$ su - gpadmin
2. Edit the .bashrc file
$ vi ~/.bashrc
3. Add the Greenplum path script and MASTER_DATA_DIRECTORY to the file
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/master/gpseg-1
4. Save and exit
5. Source the file so the changes take effect
$ source ~/.bashrc


Source: https://blog.csdn.net/u013767472/article/details/101195614
