yum -y install pacemaker
# yum -y install pacemaker;ssh root@node1 'yum -y install pacemaker'
Dependency Installed:
clusterlib.x86_64 0:1.4.7-2.el6
corosynclib.x86_64 0:1.4.7-2.el6 libibverbs.x86_64 0:1.1.8-4.el6
# rpm -ql corosync
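/etc/corosync/corosync.conf.example    # configuration file template
/etc/rc.d/init.d/corosync    # init script
/usr/sbin/corosync-keygen    # generates the authentication key used for communication between nodes; reads random data from /dev/random by default
/var/log/cluster    # log directory
◆Install crmsh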
# yum -y install pssh-2.3.1-2.el6.x86_64.rpm crmsh-1.2.6-4.el6.x86_64.rpm
crmsh.x86_64 0:1.2.6-4.el6 pssh.x86_64 0:2.3.1-2.el6
Dependency Installed:
python-dateutil.noarch 0:1.4.1-6.el6 redhat-rpm-config.noarch 0:9.0.3-44.el6.centos
Complete!
◆Configure corosync
cd /etc/corosync/
cp corosync.conf.example corosync.conf
vim corosync.conf, and add the following block:
service {    # load pacemaker as a corosync plugin
    ver: 0
    name: pacemaker
    # use_mgmtd: yes
}
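Because corosync.conf is edited by hand, a dropped closing brace is easy to miss. A quick sanity check before starting corosync may help. The sketch below uses a hypothetical helper, check_braces, which is not shipped with corosync. It assumes `#` starts a comment and that braces never appear inside values, and it only counts `{` and `}`, so it does not validate the configuration itself.

```shell
#!/bin/sh
# check_braces [FILE]: hypothetical helper, not part of corosync.
# Counts { and } (ignoring text after #) and reports whether they balance.
check_braces() {
    awk '
        { sub(/#.*/, "") }                      # drop comments
        {
            depth += gsub(/[{]/, "{")           # gsub returns the number of matches
            depth -= gsub(/[}]/, "}")
            if (depth < 0) { print "unexpected } at line " NR; bad = 1; exit }
        }
        END {
            if (bad) exit 1
            if (depth != 0) { print "unclosed blocks: " depth; exit 1 }
            print "OK"
        }
    ' "${1:-/etc/corosync/corosync.conf}"
}
```

Usage: `check_braces /etc/corosync/corosync.conf` prints OK and returns 0 when the braces balance.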
# cd /etc/corosync/
# cp corosync.conf.example corosync.conf
# vim corosync.conf
# Please read the corosync.conf.5 manual page
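compatibility: whitetank

totem {
    version: 2
    secauth: on    # enable message authentication; if on, generate a key file with corosync-keygen
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr:     # network address the interface binds to
        mcastaddr:     # multicast address used to send heartbeat messages
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log    # log file path
    to_syslog: no
    debug: off
    timestamp: on    # whether to timestamp log entries; turning it off can improve performance when logging heavily
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

service {
    ver: 0
    name: pacemaker
    # use_mgmtd: yes
}
◆Generate the authentication key used for communication between nodes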
# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
# ll authkey
-r-------- 1 root root 128 Apr 27 23:31 authkey
◆Copy the configuration file and key to the peer node
scp -p authkey corosync.conf root@node1:/etc/corosync/
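Both nodes must use the same authkey, so it may be worth confirming the copy after scp. A minimal sketch, assuming GNU coreutils `stat -c` and the root SSH access to node1 used above. check_authkey is a hypothetical helper: it compares two files byte for byte and checks that the copy kept the owner-read-only mode (400) shown by `ll authkey`.

```shell
#!/bin/sh
# check_authkey ORIGINAL COPY  (hypothetical helper)
# Prints OK when COPY is identical to ORIGINAL and has mode 400.
check_authkey() {
    orig=$1
    copy=$2
    if ! cmp -s "$orig" "$copy"; then
        echo "MISMATCH"
        return 1
    fi
    mode=$(stat -c %a "$copy")    # GNU stat: octal permission bits
    if [ "$mode" != "400" ]; then
        echo "BADMODE $mode"
        return 1
    fi
    echo "OK"
}
```

For example, fetch the peer's key back (scp -p preserves the mode) and compare:
# scp -p root@node1:/etc/corosync/authkey /tmp/authkey.node1
# check_authkey /etc/corosync/authkey /tmp/authkey.node1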
# scp -p authkey corosync.conf root@node1:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 2723 2.7KB/s 00:00
◆Start corosync
service corosync start
grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
grep TOTEM /var/log/cluster/corosync.log
grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
grep pcmk_startup /var/log/cluster/corosync.log
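The grep checks above can be bundled into a single pass/fail test. A sketch, assuming the log looks like the corosync 1.4.7 plugin-stack output shown below. check_corosync_log is a hypothetical helper, and the strings it searches for are copied from that output, not from any documented interface.

```shell
#!/bin/sh
# check_corosync_log [LOGFILE]  (hypothetical helper)
# Looks for the startup messages that corosync 1.4 + the pacemaker plugin
# wrote in this setup; prints the first missing one, or OK.
check_corosync_log() {
    log=${1:-/var/log/cluster/corosync.log}
    for msg in \
        "started and ready to provide service" \
        "Successfully read main configuration file" \
        "The network interface is now up" \
        "pcmk_startup: CRM: Initialized"
    do
        if ! grep -qF "$msg" "$log"; then
            echo "missing: $msg"
            return 1
        fi
    done
    echo "OK"
}
```

Run it on each node after `service corosync start`. A missing message points at the step (engine start, config parsing, network, pacemaker plugin) that failed.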
# service corosync start;ssh root@node1 'service corosync start'
Starting Corosync Cluster Engine (corosync):
Starting Corosync Cluster Engine (corosync):
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Apr 28 02:03:08 corosync Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Apr 28 02:03:08 corosync Successfully read main configuration file '/etc/corosync/corosync.conf'.
# grep TOTEM /var/log/cluster/corosync.log
Apr 28 02:03:08 corosync Initializing transport (UDP/IP Multicast).
Apr 28 02:03:08 corosync Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 28 02:03:08 corosync The network interface is now up.
Apr 28 02:03:08 corosync A processor joined or left the membership and a new membership was formed.
Apr 28 02:03:11 corosync A processor joined or left the membership and a new membership was formed.
Apr 28 02:04:10 corosync A processor joined or left the membership and a new membership was formed.
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources    # the errors below can be ignored here
Apr 28 02:03:08 corosync ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Apr 28 02:03:08 corosync ERROR: process_ais_conf:Please see Chapter 8 of 'Clusters from Scratch' ( for details on using Pacemaker with CMAN
Apr 28 02:03:13 corosync ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=7953, core=true)
# grep pcmk_startup /var/log/cluster/corosync.log
Apr 28 02:03:08 corosync info: pcmk_startup: CRM: Initialized
Apr 28 02:03:08 corosync Logging: Initialized pcmk_startup
Apr 28 02:03:08 corosync info: pcmk_startup: Maximum core file size is: 18446744073709551615
Apr 28 02:03:08 corosync info: pcmk_startup: Service: 9
Apr 28 02:03:08 corosync info: pcmk_startup: Local hostname: node2
◆The crmsh configuration interface
crmsh is started with the crm command and can be used in two ways:
Command-line mode, for example:
# crm ra list ocf
Interactive mode, for example:
# crm
crm(live)# ra
crm(live)ra# list ocf
or
# crm
crm(live)# ra list ocf
① status: show the cluster status
② resource: manage resources
start, stop, restart
③ configure: configure the cluster
Resource types: primitive, group, clone, ms/master (master/slave resources)
For detailed usage, run help, e.g. crm(live)configure# help primitive
primitive webstore ocf:Filesystem params device= directory=/var/www/html fstype=nfs op monitor interval=20s timeout=30s
group webservice webip webserver
Constraints: location, collocation, order
colocation webserver_with_webip inf: webserver webip
order webip_before_webserver mandatory: webip webserver    # mandatory can also be written as inf
location webip_on_node2 webip rule inf: #uname eq node2
or: location webip_on_node2 webip inf: node2
monitor    # pacemaker can monitor resources
monitor <rsc>[:<role>] <interval>[:<timeout>]
e.g.: monitor webip 30s:20s
④ ra: resource agent information
classes: there are four resource agent classes: lsb, ocf, service, stonith
list <class> [<provider>]: list resource agents
list ocf    # list the ocf resource agents
list ocf linbit    # list the ocf resource agents provided by linbit
meta/info [<class>:[<provider>:]]<type>    # show a resource agent's metadata, mainly its available parameters
e.g.: info ocf:linbit:drbd
or: info ocf:drbd
or: info drbd
providers <type> [<class>]: show the providers of a given resource agent
e.g.: providers apache
crm(live)# help    # list the available subcommands or get help
This is crm shell, a Pacemaker command line interface.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
history CRM cluster history
site Geo-cluster support
ra resource agents information center
status show cluster status
help,? show help (help topics for list of topics)
end,cd,up go back one level
quit,bye,exit exit the program
crm(live)# status    # show the cluster status
Last updated: Fri Apr 29 00:19:36 2016
Last change: Thu Apr 28 22:41:38 2016
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1 node2 ]
crm(live)# configure
crm(live)configure# help
Commands for resources are:    # resource types that can be configured
- `primitive`
- `monitor`
- `group`
- `clone`
- `ms`/`master` (master-slave)
In order to streamline large configurations, it is possible to
define a template which can later be referenced in primitives:
- `rsc_template`
In that case the primitive inherits all attributes defined in the template.
There are three types of constraints:    # constraints that can be defined
- `location`
- `colocation`
- `order`
crm(live)configure# help primitive    # show usage help
primitive <rsc> {[<class>:[<provider>:]]<type>|@<template>}
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
id_spec :: $id=<id> | $id-ref=<id>
op_type :: start | stop | monitor
primitive apcfence stonith:apcsmart \
params ttydev=/dev/ttyS0 hostlist="node1 node2" \
op start timeout=60s \
op monitor interval=30m timeout=60s
crm(live)configure# cd    # cd or end goes back one level
crm(live)# ra
crm(live)ra# help
This level contains commands which show various information about
the installed resource agents. It is available both at the top
level and at the `configure` level.
Available commands:
classes list classes and providers
list list RA for a class (and provider)
meta show meta data for a RA
providers show providers for a RA and a class
help show help (help topics for list of topics)
end go back one level
quit exit the program
crm(live)ra# classes
ocf / heartbeat linbit pacemaker
crm(live)ra# help list
List available resource agents for the given class. If the class
is `ocf`, supply a provider to get agents which are available
only from that provider.
list <class> [<provider>]
list ocf pacemaker
crm(live)ra# list ocf
CTDB ClusterMon Delay Dummy Filesystem
crm(live)ra# list ocf linbit
crm(live)ra# help meta
Show the meta-data of a resource agent type. This is where users
can find information on how to use a resource agent. It is also
possible to get information from some programs: `pengine`,
`crmd`, `cib`, and `stonithd`. Just specify the program name
instead of an RA.
info [<class>:[<provider>:]]<type>
info <type> <class> [<provider>] (obsolete)
info apache
info ocf:pacemaker:Dummy
info stonith:ipmilan
info pengine
crm(live)ra# info ocf:linbit:drbd
Operations' defaults (advisory minimum):
start timeout=240
promote timeout=90
demote timeout=90
notify timeout=90
stop timeout=100
monitor_Slave timeout=20 interval=20
monitor_Master timeout=20 interval=10
crm(live)ra# cd
crm(live)# resource
crm(live)resource# help
At this level resources may be managed.
All (or almost all) commands are implemented with the CRM tools
such as `crm_resource(8)`.
Available commands:
status show status of resources
start start a resource
stop stop a resource
restart restart a resource
promote promote a master-slave resource
demote demote a master-slave resource
crm(live)resource# help cleanup
Cleanup resource status. Typically done after the resource has
temporarily failed. If a node is omitted, cleanup on all nodes.
If there are many nodes, the command may take a while.
cleanup <rsc> [<node>]
...............
⊙ While configuring the cluster with crmsh, I once ran into the following error:
ERROR: CIB not supported: validator 'pacemaker-2.0', release '3.0.9'
ERROR: You may try the upgrade command
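Roughly, this means that the CIB (Cluster Information Base) is validated against the pacemaker-2.0 schema. This crm shell is too old to support that schema, so updating crmsh is recommended.
Running cibadmin --query | grep validate shows the setting in question: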
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0"
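If updating crmsh is not an option, a workaround is to switch the CIB to an older schema that this crmsh understands:
cibadmin --modify --xml-text '<cib validate-with="pacemaker-1.2"/>'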
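To see which schema the CIB is validated against before and after this change, you can pull the validate-with attribute out of the `cibadmin --query` output. A small sketch: cib_schema is a hypothetical helper, a plain sed match rather than an XML parser. It assumes the attribute appears on the `<cib ...>` line, as in the output above.

```shell
#!/bin/sh
# cib_schema: print the validate-with value from CIB XML read on stdin.
# (hypothetical helper; regex match on the first occurrence only)
cib_schema() {
    sed -n 's/.*validate-with="\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage: `cibadmin --query | cib_schema` printed pacemaker-2.0 in the situation above. After the cibadmin --modify command it should print pacemaker-1.2.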
crm_verify -L -V
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
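The no-quorum-policy=ignore setting follows from quorum arithmetic: a partition has quorum only when it holds more than half of the expected votes. With expected-quorum-votes="2" (shown by crm configure show below), that means 2 votes. After one node fails, the survivor has only 1 vote and loses quorum. Under the default policy (stop), it would then stop its resources rather than take them over. A small sketch of the arithmetic follows; quorum_needed and has_quorum are hypothetical helper names. Disabling STONITH is only a test-setup shortcut: crm_verify itself warns that clusters with shared data need STONITH to ensure data integrity.

```shell
#!/bin/sh
# quorum_needed EXPECTED: votes needed for quorum, i.e. strictly more than half.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum PRESENT EXPECTED: succeeds if PRESENT votes form a quorum.
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}
```

Usage: `has_quorum 1 2 || echo "1 of 2 votes: no quorum"` prints the message, which is exactly the two-node failover case.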
# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
# crm configure show
node node1
node node2
property $id="cib-bootstrap-options" \
dc-version="1.1.11-97629de" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
# crm_verify -L -V
#
◆Configure cluster resources
service mysqld stop;chkconfig mysqld off
service drbd stop;chkconfig drbd off
primitive myip ocf:heartbeat:IPaddr params ip= op monitor interval=30s timeout=20s
primitive mydrbd ocf:linbit:drbd params drbd_resource=mysql op monitor role=Master interval=10s timeout=20s op monitor role=Slave interval=20s timeout=30s op start timeout=240s op stop timeout=100s
ms ms_mydrbd mydrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=True
primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mydata fstype=ext4 op monitor interval=20s timeout=60s op start timeout=60s op stop timeout=60s
primitive myserver lsb:mysqld op monitor interval=20s timeout=20s
group myservice myip mystore myserver
collocation mystore_with_ms_mydrbd_master inf: mystore ms_mydrbd:Master
order mystore_after_ms_mydrbd_master mandatory: ms_mydrbd:promote mystore
order myserver_after_mystore mandatory: mystore myserver
order myserver_after_myip inf: myip myserver
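Resource stickiness: with a value of INFINITY, a resource always stays where it is unless it is forced off because the node is no longer fit to run it (node shutdown, node standby, migration-threshold reached, or a configuration change). Here a finite default of 100 is used: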
crm configure rsc_defaults resource-stickiness=100
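To see how this default interacts with a location preference, consider a hypothetical constraint. The id myservice_prefers_node1 and the score 50 are illustrative and not part of this setup. In Pacemaker, stickiness is additive within a group, so the three-member group myservice running with resource-stickiness=100 contributes 3 x 100 = 300 to its current node. That outweighs a preference of 50, so after a failover to node2 the group would stay on node2 when node1 comes back. This simplified comparison ignores the other constraints and the DRBD master scores.

```shell
# hypothetical constraint, for illustration only
crm configure location myservice_prefers_node1 myservice 50: node1
crm configure rsc_defaults resource-stickiness=100
# node2 (currently running the group): 3 x 100 = 300
# node1 (returning):                   50
# 300 > 50, so the group stays on node2
```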
# service mysqld stop
Stopping mysqld:
# umount /mydata
# drbdadm secondary mysql
# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-11-29 12:28:00
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
ns:124 nr:0 dw:2282332 dr:4213545 al:7 bm:396 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
# service drbd stop;ssh root@node1 'service drbd stop'
Stopping all DRBD resources: .
Stopping all DRBD resources: .
# chkconfig mysqld off;ssh root@node1 'chkconfig mysqld off'
# chkconfig drbd off;ssh root@node1 'chkconfig drbd off'
Configure the resources:
crm(live)configure# primitive myip ocf:heartbeat:IPaddr params ip= op monitor interval=30s timeout=20s
crm(live)configure# primitive mydrbd ocf:linbit:drbd params drbd_resource=mysql op monitor role=Master interval=10s timeout=20s op monitor role=Slave interval=20s timeout=30s op start timeout=240s op stop timeout=100s
crm(live)configure# ms ms_mydrbd mydrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=True
crm(live)configure# primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mydata fstype=ext4 op monitor interval=20s timeout=60s op start timeout=60s op stop timeout=60s
crm(live)configure# primitive myserver lsb:mysqld op monitor interval=20s timeout=20s
Define the constraints:
crm(live)configure# group myservice myip mystore myserver
crm(live)configure# collocation mystore_with_ms_mydrbd_master inf: mystore ms_mydrbd:Master
crm(live)configure# order mystore_after_ms_mydrbd_master mandatory: ms_mydrbd:promote mystore
crm(live)configure# order myserver_after_mystore mandatory: mystore myserver
crm(live)configure# order myserver_after_myip inf: myip myserver
crm(live)configure# verify    # check the syntax
crm(live)configure# commit    # commit the configuration
crm(live)configure# show    # show the configuration
node node1
node node2
primitive mydrbd ocf:linbit:drbd \
params drbd_resource="mysql" \
op monitor role="Master" interval="10s" timeout="20s" \
op monitor role="Slave" interval="20s" timeout="30s" \
op start timeout="240s" interval="0" \
op stop timeout="100s" interval="0"
primitive myip ocf:heartbeat:IPaddr \
params ip="" \
op monitor interval="20s" timeout="30s"
primitive myserver lsb:mysqld \
op monitor interval="20s" timeout="20s"
primitive mystore ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mydata" fstype="ext4" \
op monitor interval="20s" timeout="60s" \
op start timeout="60s" interval="0" \
op stop timeout="60s" interval="0"
group myservice myip mystore myserver
ms ms_mydrbd mydrbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="True"
colocation mystore_with_ms_mydrbd_master inf: mystore ms_mydrbd:Master
order myserver_after_myip inf: myip myserver
order myserver_after_mystore inf: mystore myserver
order mystore_after_ms_mydrbd_master inf: ms_mydrbd:promote mystore
property $id="cib-bootstrap-options" \
dc-version="1.1.11-97629de" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
crm(live)configure# cd
crm(live)# status    # show the cluster status
Last updated: Fri Apr 29 13:43:06 2016
Last change: Fri Apr 29 13:42:23 2016
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
5 Resources configured
Online: [ node1 node2 ]    # node1 and node2 are both online
Master/Slave Set: ms_mydrbd
Masters: [ node1 ]    # node1 is the master for mydrbd
Slaves: [ node2 ]
Resource Group: myservice    # every resource in the group started normally
myip(ocf::heartbeat:IPaddr):Started node1
mystore(ocf::heartbeat:Filesystem):Started node1
myserver(lsb:mysqld):Started node1
Verify:
# ip addr show    # use ip addr to see the newly configured IP
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:40:35:9d brd ff:ff:ff:ff:ff:ff
inet brd scope global eth0
inet brd scope global secondary eth0
inet6 fe80::20c:29ff:fe40:359d/64 scope link
valid_lft forever preferred_lft forever
# drbd-overview
0:mysql/0  Connected Primary/Secondary UpToDate/UpToDate C r----- /mydata ext4 2.0G 89M 1.8G 5%
# ls /mydata
# service mysqld status
mysqld (pid 65079) is running...
# mysql
mysql> create database testdb;    # create a new database
Query OK, 1 row affected (0.08 sec)
mysql> exit
Bye
Simulate a failure
# service mysqld stop    # stop mysqld manually
Stopping mysqld:
# crm status
Online: [ node1 node2 ]
Master/Slave Set: ms_mydrbd
Masters: [ node1 ]
Slaves: [ node2 ]
Resource Group: myservice
myip(ocf::heartbeat:IPaddr):Started node1
mystore(ocf::heartbeat:Filesystem):Started node1
myserver(lsb:mysqld):Started node1
Failed actions:
myserver_monitor_20000 on node1 'not running' (7): call=70, status=complete, last-rc-change='Fri Apr 29 23:00:55 2016', queued=0ms, exec=0ms
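# service mysqld status    # the service has been restarted automatically
mysqld (pid 4783) is running...
Simulate a resource migration
crm(live)# node standby    # put the current node (node1) into standby to force the resources to move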
crm(live)# status
Node node1: standby
Online: [ node2 ]
Master/Slave Set: ms_mydrbd
Slaves: [ node1 node2 ]
Resource Group: myservice
myip(ocf::heartbeat:IPaddr):Started node2
mystore(ocf::heartbeat:Filesystem):FAILED node2
Failed actions:    # errors are reported
mystore_start_0 on node2 'unknown error' (1): call=236, status=complete, last-rc-change='Fri Apr 29 15:45:17 2016', queued=0ms, exec=69ms
mystore_start_0 on node2 'unknown error' (1): call=236, status=complete, last-rc-change='Fri Apr 29 15:45:17 2016', queued=0ms, exec=69ms
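The number in parentheses in each Failed actions line is the OCF return code of the operation. The numeric suffix of the operation name is its interval in milliseconds, so myserver_monitor_20000 is the 20s monitor. To help read these lines, here is a small lookup sketch based on the standard OCF return codes 0 to 9. The helper names ocf_rc_name and failed_action_rc are hypothetical.

```shell
#!/bin/sh
# ocf_rc_name CODE: map an OCF resource agent return code to its name.
ocf_rc_name() {
    case $1 in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

# failed_action_rc LINE: pull the "(N)" code out of a Failed actions line.
failed_action_rc() {
    rc=$(printf '%s\n' "$1" | sed -n "s/.*'[^']*' (\([0-9]*\)):.*/\1/p")
    ocf_rc_name "$rc"
}
```

Here the mystore start failures map to OCF_ERR_GENERIC, a generic error that usually needs the agent's own log messages to diagnose. The earlier mysqld monitor failure maps to OCF_NOT_RUNNING, which Pacemaker handled by restarting the resource.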
crm(live)# resource cleanup mystore    # clean up the status of resource mystore
Cleaning up mystore on node1
Cleaning up mystore on node2
Waiting for 2 replies from the CRMd.. OK
crm(live)# status    # back to normal: the resources have moved to node2
Node node1: standby
Online: [ node2 ]
Master/Slave Set: ms_mydrbd
Masters: [ node2 ]
Stopped: [ node1 ]
Resource Group: myservice
myip(ocf::heartbeat:IPaddr):Started node2
mystore(ocf::heartbeat:Filesystem):Started node2
myserver(lsb:mysqld):Started node2
crm(live)# node online    # bring node1 back online
Verify (on node2):
# ip addr show
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:bd:68:23 brd ff:ff:ff:ff:ff:ff
inet brd scope global eth0
inet brd scope global secondary eth0
inet6 fe80::20c:29ff:febd:6823/64 scope link
valid_lft forever preferred_lft forever
# mysql
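mysql> show databases;    # on node2, the database created earlier on node1 is visible
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hellodb            |
| mysql              |
| test               |
| testdb             |
+--------------------+
5 rows in set (0.16 sec)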
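When repeating the failover test, you can read which node is running a given resource from crm status without entering the interactive shell. A sketch, assuming the status line format shown above (name, agent in parentheses, then Started and the node). rsc_node is a hypothetical helper. It prints nothing if the resource is not in the Started state, for example when it is FAILED or Stopped.

```shell
#!/bin/sh
# rsc_node NAME: read crm status output on stdin and print the node on
# which resource NAME is Started (hypothetical helper).
rsc_node() {
    awk -v r="$1" '
    {
        sub(/^[ \t]+/, "")    # drop indentation
        # the line must start with the exact resource name
        if (index($0, r "(") != 1 && index($0, r " ") != 1 && index($0, r "\t") != 1) next
        for (i = 1; i < NF; i++)
            if ($i ~ /Started$/) { print $(i + 1); exit }
    }'
}
```

Usage: after the migration above, `crm status | rsc_node myserver` should print node2.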