-
Notifications
You must be signed in to change notification settings - Fork 189
/
Copy pathEX436_study_notes.txt
401 lines (289 loc) · 11.3 KB
/
EX436_study_notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
######################################################################
## Tomas Nevar <[email protected]>
## Study notes for EX436 High Availability Clustering exam (RHEL7)
######################################################################
## Exam objectives:
* Configure a high-availability cluster, using either physical or virtual systems, that:
- Utilises shared storage.
- Provides service fail-over between the cluster nodes.
- Provides preferred node for a given service.
- Selectively fails over services based on specific constraints.
- Preemptively removes non-functioning cluster members to prevent corruption of shared storage.
* Manage logical volumes in a clustered environment:
- Create volume groups that are available to all members of a highly available cluster.
- Create logical volumes that can be simultaneously mounted by all members of a high-availability cluster.
* Configure a GFS file system to meet specified size, layout, and performance objectives.
* Configure iSCSI initiators.
* Use multipathed devices.
* Configure cluster logging.
* Configure cluster monitoring.
## While not an exam requirement, but I found clusterssh very useful
## when managing multiple SSH sessions on different servers.
## HA requires installation of Pacemaker software, fencing agents,
## configuration of the firewall and authentication of nodes:
# yum install -y pcs fence-agents-all
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# systemctl enable pcsd
# systemctl start pcsd
# echo password | passwd --stdin hacluster
# pcs cluster auth -u hacluster -p password node1 node2 node3
## Cluster creation (all nodes start and join the cluster automatically):
# pcs cluster setup --enable --start --name mycluster node1 node2 node3
## Cluster status:
# pcs status --full
# pcs status cluster
# pcs status corosync
# pcs status groups
# pcs status nodes
# pcs status resources
## Cluster management (add --all for all nodes):
# pcs cluster start
# pcs cluster stop
# pcs cluster enable
# pcs cluster disable
# pcs cluster standby
# pcs cluster unstandby
# pcs cluster destroy
## Add and remove cluster nodes:
# pcs cluster auth -u hacluster -p password node4
# pcs cluster node add node4 --enable --start
# pcs cluster node remove node4
## Corosync configuration and logging.
## Check the man page for information!
# man corosync.conf
> to_stderr
> to_syslog
> to_logfile
> logfile
> logfile_priority
> debug
# vim /etc/corosync/corosync.conf
# pcs cluster sync
# pcs cluster corosync node1
## Pacemaker logging:
# vim /etc/sysconfig/pacemaker
> PCMK_logfile
> PCMK_logpriority
## Quorum configuration.
## Check the man page for information!
# man votequorum
> two_node
> wait_for_all
> last_man_standing
> auto_tie_breaker
> auto_tie_breaker_node
# pcs property --default|grep quorum
> no-quorum-policy: stop
# vim /etc/corosync/corosync.conf
# pcs cluster sync
# pcs status corosync
# corosync-quorumtool -s
## Fencing / STONITH.
## Check the man page for information!
# man -k fence_
# pcs property --default|grep stonith
> stonith-action: reboot
> stonith-enabled: true
> stonith-timeout: 60s
# pcs stonith list
# pcs stonith describe fence_xvm
## Generic fence agent properties:
> priority
> pcmk_host_map
> pcmk_host_list
> pcmk_host_check
## IMPORTANT: pcmk_host_map parameter maps host names to fence device ports!
## --- hostname:port ---
## E.g.: node1.hl.local:kvm-name-node1
# pcs stonith show --full
# pcs stonith fence node1
## Configuration of fence_xvm.
## Copy /etc/cluster/fence_xvm.key from the hypervisor!
# man fence_xvm
# firewall-cmd --permanent --add-port=1229/tcp
# firewall-cmd --reload
# fence_xvm -o list
# fence_xvm -o off -H node1
## Fencing for a single node:
# pcs stonith create fence_node1 fence_xvm \
key_file="/etc/cluster/fence_xvm.key" \
action="reboot" \
port="node1" \
pcmk_host_list="node1.hl.local"
## Fencing for three cluster nodes:
# pcs stonith create fence_all fence_xvm \
key_file="/etc/cluster/fence_xvm.key" \
action="reboot" \
pcmk_host_map="node1.hl.local:node1;node2.hl.local:node2;node3.hl.local:node3" \
pcmk_host_list="node1,node2,node3" \
pcmk_host_check="static-list"
## Configuration of fence_scsi.
# pcs stonith create fence_myscsi fence_scsi \
devices=/dev/disk/by-id/dm-name-mpathc \
pcmk_monitor_action=metadata \
pcmk_host_list="node1.hl.local node2.hl.local node3.hl.local" \
pcmk_reboot_action="off" \
meta provides="unfencing"
## Cluster resources and SELinux. There are over 200 different
## cluster resources available for configuration on RHEL 7.1.
## Some will have SELinux configuration that needs tweaking.
## The most powerful way of getting SELinux information is by using
## man pages. On RHEL 7 SELinux man pages are not installed by default.
## We need to install them and update the manual page index caches:
# yum install -y policycoreutils-devel
# sepolicy manpage -a -p /usr/share/man/man8
# mandb
# man -k _selinux
## SELinux configuration for Apache:
# man httpd_selinux
> setsebool -P httpd_use_cifs 1
> setsebool -P httpd_use_nfs 1
> setsebool -P httpd_can_network_connect_db 1
> setsebool -P httpd_can_sendmail 1
## Allow Apache to listen on TCP port 81:
# man semanage-port
> semanage port -a -t http_port_t -p tcp 81
## EXTREMELLY USEFUL! Install Apache manual
## and use its module documentation!
# yum install -y httpd-manual elinks
# elinks -dump /usr/share/httpd/manual/mod/mod_status.html
## Cluster email notifications. The following resources
## can trigger email alerts on resource change:
ocf:heartbeat:MailTo - notifies recipients by email
ocf:pacemaker:ClusterMon - runs crm_mon in the background
# man ocf_heartbeat_MailTo
# pcs resource describe MailTo
# less /usr/lib/ocf/resource.d/heartbeat/MailTo
# pcs resource create clustermail MailTo \
email="root@localhost" \
subject="Cluster notification"
# man ocf_pacemaker_ClusterMon
# pcs resource describe ClusterMon
# less /usr/lib/ocf/resource.d/pacemaker/ClusterMon
# pcs resource add clustermail ClusterMon \
extra_options="-E /usr/local/bin/cluster_email.sh" \
--clone
## Cloned because we want a copy of the resource on each node.
## Cluster troubleshooting:
# pcs status
# pcs resource failcount show my_resource
# pcs resource debug-start my_resource --full
# corosync-quorumtool -s
# crm_simulate -sL
# journalctl -lf -u pacemaker -u corosync
# grep denied /var/log/audit/audit.log
# sealert -a /var/log/audit/audit.log
# grep ERROR /var/log/pacemaker.log
# grep ERROR /var/log/corosync.log
# firewall-cmd --list-all
## iSCSI initiator installation.
## Don't star the iscsi service, configure the initiator name first!
# yum install -y iscsi-initiator-utils
# systemctl enable iscsi
# vim /etc/iscsi/initiatorname.iscsi
> InitiatorName=iqn.1994-05.com.redhat:node1
## iSCSI target records are stored in "/var/lib/iscsi/nodes/".
## Records for nodes are created by discovery (even before you log in).
## Delete the content of the folder if you need to start over!
## iSCSI configuration, startup and timeouts:
# vim /etc/iscsi/iscsid.conf
> node.startup = automatic
> node.session.timeo.replacement_timeout = 120
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
## Use iSCSI examples from the man page!
# man iscsiadm
# systemctl start iscsi
# iscsiadm -m discovery -t sendtargets -p <target_ip>
## Log into the target by using both paths!
# iscsiadm -m node -T iqn.2003-01.local.hl.nfs:target -p <path1_ip> -l
# iscsiadm -m node -T iqn.2003-01.local.hl.nfs:target -p <path2_ip> -l
## Show verbose information for troubleshooting:
# iscsiadm -m discovery -P1
# iscsiadm -m node -P1
# iscsiadm -m session -P3
## List iSCSI and block devices:
# lsblk
# lsscsi
## Device mapper multipath:
# yum install -y device-mapper-multipath
# man mpathconf
# mpathconf --enable --with_multipathd y --user_friendly_names y
# multipath -ll
## Multipath configuration:
# grep -ve "^#" -ve "^$" /etc/multipath.conf
defaults {
user_friendly_names yes
find_multipaths yes
path_grouping_policy failover
no_path_retry fail
}
multipaths {
multipath {
wwid 36001405688c3235427e49e48d0f04745
alias clusterWeblog
path_grouping_policy failover
no_path_retry fail
}
}
blacklist {
devnode "^vd[a-z]"
}
## IMPORTANT: the defaults {} section in the multipath.conf produced by
## the mpathconf command does not contain the actual built-in defaults!
## IMPORTANT: the devices {} section overwrites defaults {}, and
## the multipaths {} section overwrites devices {}. Remember this:
## multipaths > devices > defaults
## Clustered LVM. The clvmd daemon and the DLM lock manager must be
## installed before configuring clustered LVM.
# yum install -y dlm lvm2-cluster
# lvmconf --enable_cluster
# systemctl stop lvm2-lvmetad
## The lvmconf command will configure the following:
> locking_type = 3
> use_lvmetad = 0
## Cluster dlm and clvmd resources and their ordering:
# pcs resource create mydlm controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
# pcs resource create myclvmd clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
# pcs constraint order start mydlm-clone then myclvmd-clone
# pcs constraint colocation add myclvmd-clone with mydlm-clone
## Clustered LVM volume with a GFS2 file system:
# yum install -y gfs2-utils
# pcs property set no-quorum-policy=freeze
# pvcreate /dev/mapper/mpatha
# vgcreate -Ay -cy clustervg /dev/mapper/mpatha
# lvcreate --size 1G --name clusterlv clustervg
## Use examples from the man page!
# man gfs2
# mkfs.gfs2 -p lock_dlm -t cluster_name:fs_name -j 3 /dev/clustervg/clusterlv
## !!!IMPORTANT!!!
## When creating a file system resource, use "noatime" to improve performance.
## Also, disable gfs2.fsck run on a GFS2 file system at boot time!
# pcs resource describe Filesystem
# pcs resource create clusterfs Filesystem device="/dev/clustervg/clusterlv" \
directory="/cluster_share" fstype="gfs2" options="noatime" run_fsck="no" \
op monitor interval=10s on-fail=fence clone interleave=true
# pcs constraint order start clvmd-clone then clusterfs-clone
# pcs constraint colocation add clusterfs-clone with clvmd-clone
## Manage GFS2 filesystems:
# man -k gfs2_
gfs2_convert (8) - Convert a GFS1 filesystem to GFS2
gfs2_edit (8) - Display, print or edit GFS2 or GFS internal structures.
gfs2_grow (8) - Expand a GFS2 filesystem
gfs2_jadd (8) - Add journals to a GFS2 filesystem
## Pacemaker cluster clone options:
> notify - When stopping or starting a copy of the clone,
tell all the other copies beforehand.
> ordered - Should the copies be started in series (instead of in parallel)?
> interleave - If this clone depends on another clone via an ordering constraint,
is it allowed to start after the local instance of the other clone
starts, rather than wait for all instances of the other clone to start?
## Manage cluster constraints:
> location - controls the nodes on which resources may run.
> order - controls the order in which resources are started/stopped.
> colocation - controls whether two resources may run on the same node.
# pcs constraint location <resource id> prefers <node[=score]>
# pcs constraint location <resource id> avoids <node[=score]>
# pcs constraint order start <resource id> then start <resource id>
# pcs constraint colocation add <source resource id> with <target resource id>