
MaxScale HA setup using Keepalived and MaxCtrl

Posted on March 6, 2018 by Esa Korhonen

MariaDB MaxScale is a database proxy that performs load balancing and query routing from client applications to backend database servers. In a basic configuration, MaxScale is a single point of failure. In this blog post we show how to set up a more resilient MaxScale HA cluster using Keepalived and MaxCtrl.

Keepalived is routing software for load balancing and high availability. It has several applications, but for this tutorial the goal is to set up a simple IP failover between two machines running MaxScale. If the main server fails, the backup machine takes over, receiving any new connections. The Keepalived settings used in this tutorial follow the example given in simple keepalived failover setup on Ubuntu 14.04.

The configuration examples in this blog are for a setup where two MaxScales are monitoring one
database cluster. Two hosts and one client machine are used, all in the same LAN. Hosts run
MaxScale and Keepalived. The backend servers may be running on one of the hosts, e.g. in
docker containers, or on separate machines for a more realistic setup. Clients connect to the
virtual IP (VIP), which is claimed by the current master host.
Once configured and running, the different Keepalived nodes continuously broadcast their status
to the network and listen for each other. If a node does not receive a status message from another
node with a higher priority than itself, it will claim the VIP, effectively becoming the master.
Thus, a node can be put online or removed by starting and stopping the Keepalived service.
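
The advertisements can be observed on the wire, if desired. A quick sketch, assuming tcpdump is installed and using the enp0s8 interface from the configuration later in this post:

[root@crash1 ~]# tcpdump -i enp0s8 -n vrrp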

If the current master node is removed (e.g. by stopping the service or pulling the network cable)
the remaining nodes will quickly elect a new master and future traffic to the VIP will be directed
to that node. Any connections to the old master node will naturally break. If the old master
comes back online, it will again claim the VIP, breaking any connections to the backup machine.

MaxScale has no knowledge of any of this happening. Both MaxScale instances run normally, monitoring the backend servers and listening for client connections. Since clients connect through the VIP, only the machine currently claiming the VIP receives incoming connections. The connections between MaxScale and the backends use the real IPs and are unaffected by the VIP.
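
For example, with the VIP used later in this post (192.168.56.153), a client would connect as follows. The listener port 4006 is an assumption here; use the port your MaxScale listener actually defines:

mysql -h 192.168.56.153 -P 4006 -u myuser -p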

Install Keepalived
[root@crash1 ~]# yum install keepalived

[root@crash2 ~]# yum install keepalived
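
The hosts used here are RPM-based; on a Debian or Ubuntu system the equivalent would be:

apt-get install keepalived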

Configuration
[root@crash1 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:febe:17d3 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:be:17:d3 txqueuelen 1000 (Ethernet)
        RX packets 20274 bytes 24114156 (22.9 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 4623 bytes 290626 (283.8 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.56.151 netmask 255.255.255.0 broadcast 192.168.56.255
        inet6 fe80::a00:27ff:fec0:93a prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:c0:09:3a txqueuelen 1000 (Ethernet)
        RX packets 1434 bytes 172815 (168.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1176 bytes 222911 (217.6 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1 (Local Loopback)
        RX packets 100 bytes 8144 (7.9 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 100 bytes 8144 (7.9 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@crash2 ~]# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe91:82e5 prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:91:82:e5 txqueuelen 1000 (Ethernet)
        RX packets 18479 bytes 24015815 (22.9 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 2992 bytes 192273 (187.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.56.152 netmask 255.255.255.0 broadcast 192.168.56.255
        inet6 fe80::a00:27ff:fe8e:634a prefixlen 64 scopeid 0x20<link>
        ether 08:00:27:8e:63:4a txqueuelen 1000 (Ethernet)
        RX packets 257 bytes 23917 (23.3 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 278 bytes 73941 (72.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1 (Local Loopback)
        RX packets 101 bytes 8232 (8.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 101 bytes 8232 (8.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

MaxScale does not require any specific configuration to work with Keepalived in this simple setup; it just needs to be running on both hosts. The MaxScale configurations should be roughly similar on both hosts if you plan on synchronizing any changes between the MaxScale instances. Specifically, both instances should have the same services and listeners so that they appear identical to client applications. Setting the service-level setting "version_string" to a different value on each MaxScale node is recommended, as it is sent to connecting clients and indicates which node the client connected to.

[Read-Write Service]
type=service
router=readwritesplit
version_string=PrimaryMaxScale
...

Keepalived requires specific setups on both machines. On the primary host, the
/etc/keepalived/keepalived.conf file should be as follows.

vrrp_instance VI_1 {
    state MASTER
    interface enp0s8
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass anhdhpasswd
    }
    virtual_ipaddress {
        192.168.56.153
    }
}

The state must be MASTER on both hosts. virtual_router_id and auth_pass must be identical on all hosts. The interface setting defines the network interface used; this depends on the system, but often the correct value is enp0s8, enp0s12f3 or similar. priority defines the voting strength between different Keepalived instances when negotiating which should be the master; the instances should have different priority values. In this example, the backup host(s) could have priority 149, 148 and so on. advert_int is the interval between a host "advertising" its existence to the other Keepalived hosts. One second is a reasonable value. virtual_ipaddress (VIP) is the IP address the different Keepalived hosts try to claim and must be identical between the hosts. For IP negotiation to work, the VIP must be in the local network address space and unclaimed by any other machine in the LAN.
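
Before starting Keepalived, you can verify the candidate VIP is unclaimed by probing it from one of the hosts. A sketch using arping from the iputils package (no replies means the address is free):

[root@crash1 ~]# arping -c 3 -I enp0s8 192.168.56.153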

An example /etc/keepalived/keepalived.conf file for a backup host is listed below.

vrrp_instance VI_1 {
    state MASTER
    interface enp0s8
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass anhdhpasswd
    }
    virtual_ipaddress {
        192.168.56.153
    }
}

Start Keepalived
[root@crash1 keepalived]# service keepalived start
Redirecting to /bin/systemctl start keepalived.service
[root@crash1 keepalived]# chkconfig keepalived on
Note: Forwarding request to 'systemctl enable keepalived.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.

[root@crash2 keepalived]# service keepalived start
Redirecting to /bin/systemctl start keepalived.service
[root@crash2 keepalived]# chkconfig keepalived on
Note: Forwarding request to 'systemctl enable keepalived.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.

Check Virtual IPs
[root@crash1 keepalived]# ip addr show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:c0:09:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.151/24 brd 192.168.56.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet 192.168.56.153/32 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fec0:93a/64 scope link
       valid_lft forever preferred_lft forever

[root@crash2 keepalived]# ip addr show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:8e:63:4a brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.152/24 brd 192.168.56.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe8e:634a/64 scope link
       valid_lft forever preferred_lft forever

Verify IP Failover
[root@crash1 ~]# ifconfig enp0s8 down

[root@crash2 ~]# ip addr show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:8e:63:4a brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.152/24 brd 192.168.56.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet 192.168.56.153/32 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe8e:634a/64 scope link
       valid_lft forever preferred_lft forever

[root@crash2 ~]# tail -1000f /var/log/messages
May 18 11:20:01 crash2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="918" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 18 11:20:01 crash2 systemd: Started Session 8 of user root.
May 18 11:20:01 crash2 systemd: Starting Session 8 of user root.
May 18 11:27:00 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) Transition to MASTER STATE
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) Entering MASTER STATE
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) setting protocol iptable drop rule
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) setting protocol VIPs.
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on enp0s8 for 192.168.56.153
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:01 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:06 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:06 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on enp0s8 for 192.168.56.153
May 18 11:27:06 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:06 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:06 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:27:06 crash2 Keepalived_vrrp[2471]: Sending gratuitous ARP on enp0s8 for 192.168.56.153
May 18 11:28:09 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 100
May 18 11:28:09 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) Entering BACKUP STATE
May 18 11:28:09 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) removing protocol VIPs.
May 18 11:28:09 crash2 Keepalived_vrrp[2471]: VRRP_Instance(VI_1) removing protocol iptable drop rule
May 18 11:30:01 crash2 systemd: Started Session 9 of user root.
May 18 11:30:01 crash2 systemd: Starting Session 9 of user root.

Once the Keepalived service is running, recent log entries can be printed with the command
service keepalived status.

Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Received higher prio advert
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Entering BACKUP STATE
Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) removing protocol VIPs.
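
On systemd-based hosts such as these, the same messages can also be followed from the journal:

[root@crash1 keepalived]# journalctl -u keepalived -f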

MariaDB MaxScale Health Check
So far, none of this tutorial has been MaxScale-specific and the health of the MaxScale process has been ignored. To ensure that MaxScale is running on the current master host, a check script should be set. Keepalived runs the script regularly; if the script returns an error value, the Keepalived node will assume that it has failed, stop broadcasting its state and relinquish the VIP. This allows another node to take the master status and claim the VIP.

To define a check script, modify the configuration as follows. The example is for the primary
node. See Keepalived Check and Notify Scripts for more information.

[root@crash1 ~]# vi /etc/keepalived/keepalived.conf

[root@crash2 ~]# vi /etc/keepalived/keepalived.conf

vrrp_script chk_myscript {
    script "/home/scripts/is_maxscale_running.sh"
    interval 2 # check every 2 seconds
    fall 2     # require 2 failures for KO
    rise 2     # require 2 successes for OK
}

vrrp_instance VI_1 {
    state MASTER
    interface enp0s8
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass anhdhpasswd
    }
    virtual_ipaddress {
        192.168.56.153
    }
    track_script {
        chk_myscript
    }
}

An example script, /home/scripts/is_maxscale_running.sh, is listed below. The script uses MaxAdmin to contact the locally running MaxScale and request a server list, then checks that the list has at least some expected elements. The timeout command ensures the MaxAdmin call exits in reasonable time. The script detects if MaxScale has crashed, is stuck or is so overburdened that it no longer responds to connections. Simply checking that the MaxScale process is running would be a cruder yet likely adequate option.

[root@crash1 keepalived]# vi /home/scripts/is_maxscale_running.sh

[root@crash2 keepalived]# vi /home/scripts/is_maxscale_running.sh

#!/bin/bash
fileName="/home/scripts/maxadmin_output.txt"
rm -f "$fileName"
# Ask the local MaxScale for a server list, giving up after 2 seconds.
timeout 2s maxadmin list servers > "$fileName"
to_result=$?
if [ $to_result -ge 1 ]
then
    echo "Timed out or error, timeout returned $to_result"
    exit 3
else
    echo "MaxAdmin success, rval is $to_result"
    echo "Checking maxadmin output sanity"
    grep1=$(grep server1 "$fileName")
    grep2=$(grep server2 "$fileName")
    if [ "$grep1" ] && [ "$grep2" ]
    then
        echo "All is fine"
        exit 0
    else
        echo "Something is wrong"
        exit 3
    fi
fi
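
If the cruder process-only check mentioned above is enough, a minimal sketch could look like the following. It assumes MaxScale is managed by systemd under the unit name maxscale:

#!/bin/bash
# Minimal alternative check: exit 0 if the maxscale unit is active,
# non-zero otherwise. Keepalived treats a non-zero exit as a failure.
systemctl is-active --quiet maxscale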

When the check script fails, Keepalived moves the node to the FAULT state and releases the VIP, as seen in the log:

Aug 11 10:51:56 maxscale2 Keepalived_vrrp[20257]: VRRP_Script(chk_myscript) failed
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Entering FAULT STATE
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) removing protocol VIPs.
Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Now in FAULT state

MaxScale Active/Passive Setting
MariaDB MaxScale 2.2.2 introduced master/slave replication cluster management features (failover, switchover and rejoin). When running a setup with multiple MaxScales, only one MaxScale instance should be allowed to modify the master/slave replication cluster at any given time. This instance should be the one with the MASTER Keepalived status. MaxScale does not know its Keepalived state, but MaxCtrl (a replacement for MaxAdmin) can set a MaxScale instance to passive mode. A passive MaxScale behaves similarly to an active one, with the exception that it won't perform failover, switchover or rejoin. Even manual versions of these commands will end in error. The passive/active mode differences may be expanded in the future.
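
The mode can also be switched manually with MaxCtrl, which is what the notify script shown below does:

[root@crash1 ~]# maxctrl alter maxscale passive true
[root@crash1 ~]# maxctrl alter maxscale passive false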

To have Keepalived modify the MaxScale operating mode, a notify script is needed. This script is run whenever Keepalived changes its state. The script file is defined in the Keepalived configuration file as notify.

[root@crash1 keepalived]# vi /etc/keepalived/keepalived.conf

[root@crash2 keepalived]# vi /etc/keepalived/keepalived.conf

...
    virtual_ipaddress {
        192.168.56.153
    }
    track_script {
        chk_myscript
    }
    notify /home/scripts/notify_script.sh
...

Keepalived calls the script with three parameters. In our case, only the third parameter, STATE,
is relevant. An example script is below.

[root@crash1 home]# vi /home/scripts/notify_script.sh

[root@crash2 home]# vi /home/scripts/notify_script.sh

#!/bin/bash
TYPE=$1
NAME=$2
STATE=$3
OUTFILE=/home/scripts/state.txt

case $STATE in
    "MASTER")
        echo "Setting this MaxScale node to active mode" > $OUTFILE
        maxctrl alter maxscale passive false
        exit 0
        ;;
    "BACKUP")
        echo "Setting this MaxScale node to passive mode" > $OUTFILE
        maxctrl alter maxscale passive true
        exit 0
        ;;
    "FAULT")
        echo "MaxScale failed the status check." > $OUTFILE
        maxctrl alter maxscale passive true
        exit 0
        ;;
    *)
        echo "Unknown state" > $OUTFILE
        exit 1
        ;;
esac

The script logs the current state to a text file and sets the operating mode of MaxScale. The
FAULT case also attempts to set MaxScale to passive mode, although the MaxCtrl command
will likely fail.

If all MaxScale/Keepalived instances have a similar notify script, only one MaxScale should ever
be in active mode. The mode of a MaxScale instance can be checked with the command maxctrl
show maxscale, shown below. This MaxScale is “active”. A later blog post will show MaxCtrl
use in more detail.

[vagrant@maxscale1 ~]$ watch maxctrl show maxscale
┌──────────────┬──────────────────────────────────────┐
│ Version      │ 2.2.2                                │
├──────────────┼──────────────────────────────────────┤
.
.
.
├──────────────┼──────────────────────────────────────┤
│ Parameters   │ {                                    │
│              │     "libdir": "/usr/lib64/maxscale", │
│              │     "datadir": "/var/lib/maxscale",  │
.
.
.
│              │     "passive": false,                │
│              │     "query_classifier": ""           │
│              │ }                                    │
└──────────────┴──────────────────────────────────────┘