Master Slave
Master Slave is a method of operating multiple NMIS servers to spread the collection load, and then summarise all the network statistics into one global view.
For instance, you could run one or more NMIS servers in a datacentre, and spread the polled devices by group across them to ease the collection load on each server. One master server can then be used to collect stats from each slave server and display a summary network health status and events view.
Another approach is to geographically split NMIS servers around the world to collect and display device stats on a regional basis and then display a summary network view in one location for management overview.
Note that the master slave implementation does not copy complete RRD files between servers, and should not be seen as a redundant server architecture. The only stats that are copied up are those required to support a global health and events display, i.e. the files and statistics required to populate the NMIS front page and the 'all device' dashboard view ('large dash'). The amount of information transferred is not large, and supports a globally linked architecture.
The master displays the top dashboard view and the device summary view, as normally seen at the 'large dash' view. All other views, reports, and menubar items such as syslog are HTTP-redirected to the slave server.
Any additional stats needed when drilling down into device views or graphs are obtained by HTTP request from the slave server.
A top level menubar has been added to ease browsing across a multiple-server architecture.
Each master and slave server should first be installed as a standalone server, polling its own groups of devices and completely functional in all respects. Group names need not be unique - in fact, in my global implementation I use the same group names across all servers, so a concatenated group view with summary health statistics is available on the master.
The master may also poll its own set of configured devices; I usually configure the master to poll the slave servers themselves, so any issues with the slave servers are reported by the master. At least one device must be collected locally on the master, to avoid errors with empty node files.
Multiple nested configurations are possible, and two or more masters could be set up if required.
conf/slave.csv

# Name is hostname of slave NMIS server
# Host is IP address of slave NMIS server
# Var is rsync module_name/path to slave var dir
# Data is rsync module_name/path to slave database dir
# NodeFile is rsync module_name/path to slave node file
# all these should match similar entries in the respective slave's nmis.conf
#
# the Var, Data and NodeFile values consist of a module name and the sub directory of nmis_root.
# **** The rsync module name does not start with a / ****
# The Name value must be the same as the value of item nmis_host in nmis.conf on the slave.
# Consistency of node names, module names, and install directories will keep this simple !!
#
Name    Host            Var                   Data                       NodeFile
nmis1   192.168.10.1    module_name/nmis/var  module_name/nmis/database  module_name/nmis/conf/nodes.csv
nmis2   192.168.20.1    module_name/nmis/var  module_name/nmis/database  module_name/nmis/conf/nodes.csv
nmis3   192.168.30.1    module_name/nmis/var  module_name/nmis/database  module_name/nmis/conf/nodes.csv

conf/master.csv

# Name is master hostname - must be unique
# Host is IP or FQDN of master.
# THE FIRST NON-COMMENT LINE IS THE HEADER LINE AND IS REQUIRED
Name            Host
master_host01   192.168.1.101

/conf/nmis.conf

master=true|false
slave=true|false
Slave_Table=<nmis_conf>/slave.csv
Slave_Title=Slave Poll Table
Slave_Key=Name
Master_Table=<nmis_conf>/master.csv
Master_Title=Master Table
Master_Key=Name
The master slave .csv configuration files should be copied as they are to both the master and all slave servers. Building your NMIS servers with a consistent directory structure is advised, but not essential, as all paths are recorded here. These paths could have been derived from nmis.conf, but I know some users run multiple conf files, so hard coding them here seems appropriate.
Additional system files required are part of the distribution.
/bin/master.pl
/lib/masterslave.pm
/bin/master.pl is launched as a daemon on the master by nmis.pl, or manually, and acts as a TCP server to accept information updates from the slaves. When a 'hash' of data is received from a slave, master.pl will update the local slave reachability rrd and slave event file.
The master nmis.pl will check on every 5 minute run whether the master.pl daemon is running, and launch it if not.
lib/masterslave.pm is used by the slave nmis.pl to post a 'hash' of stats to the master at the conclusion of each 5 minute run.
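As an illustration of that post, here is a minimal sketch of a slave pushing its summary hash to the master with Net::EasyTCP. The port number, master address and the layout of the hash are assumptions for the example, not the values masterslave.pm actually uses.

use strict;
use warnings;
use Net::EasyTCP;

# summary built at the end of the slave's 5 minute collect run
my %update = (
    slave  => 'nmis1',
    reach  => {
        router1 => { reachable => 100, health => 95, response => 12 },
        router2 => { reachable => 0,   health => 0,  response => 0 },
    },
    events => [ { node => 'router2', event => 'Node Down' } ],
);

my $client = new Net::EasyTCP(
    mode => 'client',
    host => '192.168.1.101',    # the master server
    port => 2345,               # assumed port - check the real value in the NMIS config
) or die "ERROR CREATING CLIENT: $@\n";

# EasyTCP serialises the perl data structure before sending it
$client->send( \%update ) or die "ERROR SENDING DATA: $@\n";
$client->close();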
Also, the contents of each slave /var are pulled sequentially by the master to the /var directory on the master, once every 24 hours as part of the crontab 'type=update' run. The /var directory on the master will then contain all static network information.
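For example, the nightly pull might be driven by a cron entry like the one below on the master; the schedule and install path are illustrative and should match whatever your existing NMIS crontab already uses for the daily update run.

# daily update run on the master, which also pulls each slave's /var
0 1 * * * /usr/local/nmis/bin/nmis.pl type=update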
You will need to install a new perl module for master slave communication, Net::EasyTCP from CPAN. This module simplifies moving bulk data from the slave to the master using a perl script. Make sure you have consistent versions on all servers. You may wish to enable compression features, but I have found that unnecessary as minimal data is transferred. Look in masterslave.pm to change EasyTCP options.
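For example, Net::EasyTCP can be installed on each server with the standard CPAN shell (any other install method you prefer will do):

perl -MCPAN -e 'install Net::EasyTCP'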
Rsync needs to be enabled to move the static /var device system and interface files between the slaves and the master. To avoid authentication prompts, rsync needs to be set up in transparent mode. Refer to 'man rsync' for details.
There are two possible approaches:
1. Rsync server mode

This has been implemented in masterslave.pm for copying to or from the local machine, to or from a remote rsync server. This mode is invoked when the destination path contains a :: separator or an rsync:// URL.

Example /etc/rsyncd.conf on the slave:

[nmis]
path=/usr/local/nmis
use chroot=no
read only=yes
uid=0
gid=0
#max connections = 20
#hosts allow=192.168.0.0/255.255.0.0
hosts deny=*

Execute 'rsync --daemon' on the slave to run rsync as a daemon server. The rsync server commands used are in lib/masterslave.pm.

2. SSH trusted authentication mode (optional)

Make sure you have symmetric usernames. First, generate a public/private DSA key pair on the CLIENT.

[CLIENT]% ssh-keygen -t dsa -f ~/.ssh/id_dsa

When you are asked for a passphrase, leave it empty. Now send the public key to the SERVER.

[CLIENT]% cd .ssh
[CLIENT]% scp id_dsa.pub [USER]@[SERVER]:~/.ssh

Next, log in to the SERVER and add the public key to the list of authorized keys.

[CLIENT]% ssh [USER]@[SERVER]
[CLIENT]% cd .ssh
[CLIENT]% cat id_dsa.pub >> authorized_keys2
[CLIENT]% chmod 640 authorized_keys2
[CLIENT]% rm -f id_dsa.pub

Exit and test - an ssh connection to the server should now be allowed with no password.

Then, to rsync files, use the syntax below. Note there is no '::' as with rsync server authentication; ssh is used as the transport. Make sure you are logged in locally with the same username you use on the remote server.

rsync -e ssh slave_hostname:/usr/local/nmis/var/* /usr/local/nmis/var
A memory mapped file cache (Cache::Mmap) is used to store reachability information from each device collected by the slave, within the multithreaded environment. Install it as usual from CPAN. Note that the number of buckets should be set to more than the number of nodes polled at that slave. If you are missing reachability records at the master, check this parameter in nmis.pl.
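A minimal sketch of how such a cache can be used, assuming Cache::Mmap; the file location, bucket settings and keys are illustrative, not the values nmis.pl actually uses.

use strict;
use warnings;
use Cache::Mmap;

my $cache = Cache::Mmap->new(
    '/usr/local/nmis/var/reach-cache.mmap',    # assumed cache file location
    {
        buckets    => 256,     # keep this higher than the number of nodes polled on this slave
        bucketsize => 4096,    # bytes available per bucket
    }
);

# each collect thread writes its node's reachability summary into the cache
$cache->write( 'router1', { reachable => 100, health => 95, response => 12 } );

# at the end of the collect run the entries are read back to build the hash sent to the master
my $summary = $cache->read('router1');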
To enable master/slave communications you must set up the preceding files and perl modules, then edit /conf/nmis.conf and set master=true on the master server and slave=true on each slave server that is expected to communicate with the master.
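For example, assuming the complementary flag is simply left false on each side, the relevant nmis.conf entries would be:

on the master:
master=true
slave=false

on each slave:
master=false
slave=true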
The master needs static node information from each slave, such as node files, system.dat and node-interface.dat files - basically all the static information that is held in the nmis/var directory. Rsync is used to copy this from each slave on a master type=update run.
Note that when the master reads its local nodes.csv file into the hash, it also reads any slave_nodes.csv files into the same hash, and each slave node record is tagged with a 'slave=true' field name. This field is used at the master to differentiate, on a per-node basis, what is local and what is remote.
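A minimal sketch of the idea, not the actual NMIS code; the file names, the tiny CSV loader and the hash layout are assumptions for illustration only.

use strict;
use warnings;

# tiny illustrative loader for a tab delimited nodes.csv - the real NMIS loader is more capable
sub load_nodes_csv {
    my ($file) = @_;
    my %table;
    open my $fh, '<', $file or return \%table;
    my @header;
    while (<$fh>) {
        chomp;
        next if /^\s*#/ or /^\s*$/;
        my @cols = split /\t/;
        if ( !@header ) { @header = @cols; next; }
        my %row;
        @row{@header} = @cols;
        $table{ $row{Name} } = \%row;
    }
    close $fh;
    return \%table;
}

my %NT = %{ load_nodes_csv('/usr/local/nmis/conf/nodes.csv') };    # the master's own nodes

# merge each slave's node file, tagging every record as remotely collected
for my $slave (qw(nmis1 nmis2 nmis3)) {
    my $remote = load_nodes_csv("/usr/local/nmis/conf/${slave}_nodes.csv");
    for my $node ( keys %$remote ) {
        $NT{$node} = $remote->{$node};
        $NT{$node}{slave} = 'true';    # lets the master tell local nodes from remote ones
    }
}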
You can use './nmis.pl type=master' to debug the rsync process. This uses slave.csv to get the list of slaves to pull information from.
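For example (debug=true is the usual nmis.pl debug switch; verify the options your version actually supports):

./nmis.pl type=master debug=true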
Information pulled from slave to master, by the master:
- the slave nodes.csv
- everything from the slave nmis/var directory, excluding event.dat

The master also posts its own nodes.csv file to each slave. [The copy of the master nodes.csv on the slave is not currently used - it is there for use if required.]
The master also requires dynamic information to display a summarised health view. On each slave collect run, reachability data is saved in a hash, along with any 'Node Down' events that are needed to colorize the master dashboard. This hash is written to a memory cached file in the slave Cache::Mmap store, and sent to the master at the completion of the slave collect run, using Net::EasyTCP. A daemon running on the master, 'master.pl', listens for the information and writes a node reachability rrd on the master for each slave node that is present in the hash.
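A minimal sketch only of what such a listener might look like, not the actual master.pl: the port, callback, RRD file path and data source layout are all assumptions for the example.

use strict;
use warnings;
use Net::EasyTCP;
use RRDs;

my $server = new Net::EasyTCP(
    mode => 'server',
    port => 2345,    # assumed port - check the real value in the NMIS config
) or die "ERROR CREATING SERVER: $@\n";

$server->setcallback( data => \&got_data ) or die "ERROR SETTING CALLBACK: $@\n";
$server->start() or die "ERROR STARTING SERVER: $@\n";

sub got_data {
    my $client = shift;
    my $update = $client->data();    # the hash the slave sent, deserialised by EasyTCP

    for my $node ( keys %{ $update->{reach} } ) {
        # update a per-node reachability rrd (file name and DS names are illustrative)
        RRDs::update(
            "/usr/local/nmis/database/health/$node-reach.rrd",
            '--template', 'reachability:health',
            'N:' . $update->{reach}{$node}{reachable} . ':' . $update->{reach}{$node}{health}
        );
        warn 'RRD update failed: ' . RRDs::error() if RRDs::error();
    }
    # the 'Node Down' events in the hash would likewise be written to a slavename_event.dat file
}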
The metrics process then sees all the network nodes and groups, finds a local reachability rrd for each, and generates a summary metric for all groups and all nodes, so a universal dashboard view can be displayed with all the information required to create it available locally on the master.
Any 'Node Down' events on the slave are also sent, and a 'slavename_event.dat' file is written at the master. This event file is updated every 5 minutes from the slave, so a consistent copy of each slave's 'Node Down' events is stored at the master. These slavename_event.dat files are read into the master event system for colorizing the master display. Note that any event escalation and acknowledgements still need to be performed at the slave level. The master event display will read and display each slave's events, and event acknowledgements can be sent from the master event display to the slave.
Opportunities exist to send more event information from the slave to the master if required.
To test the collection and writing processes, just run the usual collect debug options at the slave, and the additional debug output will show what is happening. Running the master daemon with debug enabled will also log what was received, and from which slave, to the master nmis.log.
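For example, at the slave (debug=true is the standard nmis.pl debug switch; the equivalent option for bin/master.pl may differ, so check the script itself):

./nmis.pl type=collect debug=true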
That should be it.