Network Management

Network management refers to the activities, methods, procedures, and tools that pertain to the operation, administration, maintenance, and provisioning of networked systems.

* Operation deals with keeping the network (and the services that the network provides) up and running smoothly. It includes monitoring the network to spot problems as soon as possible, ideally before users are affected.

* Administration deals with keeping track of resources in the network and how they are assigned. It includes all the "housekeeping" that is necessary to keep the network under control.

* Maintenance is concerned with performing repairs and upgrades - for example, when equipment must be replaced, when a router needs a patch for an operating system image, when a new switch is added to a network. Maintenance also involves corrective and preventive measures to make the managed network run "better", such as adjusting device configuration parameters.

* Provisioning is concerned with configuring resources in the network to support a given service. For example, this might include setting up the network so that a new customer can receive voice service.


A common way of characterizing network management functions is FCAPS - Fault, Configuration, Accounting, Performance and Security.

Functions that are performed as part of network management accordingly include controlling, planning, allocating, deploying, coordinating, and monitoring the resources of a network, network planning, frequency allocation, predetermined traffic routing to support load balancing, cryptographic key distribution authorization, configuration management, fault management, security management, performance management, bandwidth management, Route analytics and accounting management.

Network Monitoring


The term network monitoring describes the use of a system that constantly monitors a computer network for slow or failing components and that notifies the network administrator in case of outages via email, pager or other alarms. It is a subset of the functions involved in network management.

While an intrusion detection system monitors a network for threats from the outside, a network monitoring system monitors the network for problems caused by overloaded and/or crashed servers, network connections or other devices.

For example, to determine the status of a webserver, monitoring software may periodically send an HTTP request to fetch a page; for email servers, a test message might be sent through SMTP and retrieved by IMAP or POP3.

Commonly measured metrics are response time and availability (or uptime), although both consistency and reliability metrics are starting to gain popularity. The widespread addition of wan optimization devices is having an adverse effect on most network monitoring tools -- especially when it comes to measuring accurate end to end response time because they limit round trip visibility.

Tools

Nagios:

is a popular open source computer system and network monitoring software application. It watches hosts and services, alerting users when things go wrong and again when they get better.

Nagios, originally created under the name NetSaint, was written and is currently maintained by Ethan Galstad, along with a group of developers actively maintaining both official and unofficial plugins. N.A.G.I.O.S. is a recursive acronym: "Nagios Ain't Gonna Insist On Sainthood", "Sainthood" being a reference to the original name of the software, which was changed in response to a legal challenge by owners of a similar trademark.

Nagios was originally designed to run under Linux, but also runs well on other Unix variants. It is free software, licensed under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

Website: www.nagios.org

Ganglia:

is a scalable distributed system monitor tool for high-performance computing systems such as clusters and grids. It allows the user to remotely view live or historical statistics (such as CPU load averages or network utilization) for all machines that are being monitored.

It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.

The ganglia system comprises two unique daemons, a PHP-based web front-end, and a few other small utility programs.

Website: www.ganglia.info

Links

Network management - Wikipedia
Network monitoring - Wikipedia
Comparison of network monitoring systems - Wikipedia
Nagios - Wikipedia
Ganglia - Wikipedia
Nagios Website
Ganglia Website