Real-time monitoring of broadcast infrastructures
Concepts and implementation
An innovative feature was added to the LoriotPro monitoring software in June 2012. It greatly improves the performance of SNMP collection and the refresh speed of the display, even when large volumes of data must be collected. As the number of collections grows and collection processing times increase, processing and display speed can be maintained. Because the work is split into many tasks processed in parallel (multithreading), the system scales by adding collection processes and, if needed, more CPUs.
With this new internal architecture, LoriotPro can run hundreds of SNMP collections with polling periods on the order of a second, and the visuals (Active View) can be refreshed at the same rate. Highly dynamic visuals can therefore alert you almost immediately to a malfunction in your network or broadcast playout.
The technical concept
As a reminder, data is collected from the supervised equipment mainly through SNMP. This protocol retrieves status and performance indicators from the SNMP agents present on the equipment and systems. Agent response times are unpredictable, so LoriotPro cannot know in advance how long an SNMP query will take to be answered. Each collection process has a maximum waiting time (the timeout) beyond which it considers that the agent is not responding. If a single process performs all collections sequentially, performance may be insufficient: a hundred collections can take from several seconds to several minutes.
To summarize the context: the collections, which we call "tasks", have highly variable execution times, ranging from a few milliseconds to several seconds. We also want to repeat each collection periodically at short intervals (the polling period), on the order of a second for some performance indicators.
Implementation principle: all tasks (collections) to be performed are grouped within a single program. For each task, the program specifies the type of collection to perform (for example, an SNMP GET on a MIB object). SNMP objects are the most common source, but values can also be extracted from log files, SQL database queries, trap counters, and so on. The inputs may also be global variables already in memory, which makes correlation possible.
A variable number of processes can be dedicated to these tasks. In principle, the more tasks there are and the more frequently they repeat, the more processes are needed. The processes are instantiated as needed (LoriotPro Audit Process plugin) to provide quasi-parallel processing.
Here is a simplified example with two processes responsible for three collections. The collections have different polling intervals, and their durations are assumed to vary. Both processes (audit processes) take on collections as they become available. When a process finishes a job, it scans the list of collections and takes the first one whose polling period has expired and that is not already assigned to another process.
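The dispatch rule of this example can be modelled with a small Python simulation. It is only a sketch under stated assumptions: the collection names, periods, durations and the 100 ms simulation step are invented for illustration, time is simulated rather than measured, and Python stands in for the Lua script that LoriotPro actually runs.

```python
def simulate(collections, n_workers, horizon_ms, tick_ms=100):
    """Simulate audit processes sharing a list of periodic collections.

    collections: dict mapping name -> (period_ms, duration_ms); hypothetical values.
    Returns a list of (start_ms, worker, name) in dispatch order.
    """
    next_due = {name: 0 for name in collections}  # when each collection is next due
    current = [None] * n_workers                  # collection held by each worker
    busy_until = [0] * n_workers                  # time at which each worker is free
    log = []
    for now in range(0, horizon_ms, tick_ms):
        # Workers whose job has finished give it back first.
        for w in range(n_workers):
            if current[w] is not None and busy_until[w] <= now:
                current[w] = None
        # Each idle worker takes the first collection that is due and unassigned.
        for w in range(n_workers):
            if current[w] is not None:
                continue
            for name, (period, duration) in collections.items():
                if next_due[name] <= now and name not in current:
                    current[w] = name
                    busy_until[w] = now + duration
                    next_due[name] = now + period
                    log.append((now, w, name))
                    break
    return log


if __name__ == "__main__":
    # Three collections, two processes: name -> (polling period, duration) in ms.
    jobs = {"A": (1000, 300), "B": (2000, 700), "C": (5000, 1500)}
    for start, worker, name in simulate(jobs, n_workers=2, horizon_ms=3000):
        print(f"t={start:4d} ms  process {worker} starts collection {name}")
```

In this model a collection that is still being processed is never handed to a second process, which mirrors the "not already assigned" rule of the example.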
In this example, the ratio between polling period and processing duration is not to scale. In practice, the ratio of duration to period is usually about 1 to 100: when an SNMP agent is working properly, a collection takes a few tens of milliseconds and polling intervals are between 1 and 15 seconds. Collections may be delayed if their duration or their number increases.
Ideally, the sum over all collections of the ratio between execution time and polling period should be less than the number of processes available for processing.
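This sizing rule can be checked numerically. The Python sketch below is illustrative only: the function names and the sample figures (100 collections of 50 ms each, polled every 2 seconds) are assumptions, not LoriotPro settings.

```python
import math


def collection_load(collections):
    """Sum of execution_time / polling_period over all collections.

    collections: list of (execution_time_s, polling_period_s) pairs.
    """
    return sum(duration / period for duration, period in collections)


def processes_needed(collections):
    """Smallest number of processes strictly greater than the load."""
    return math.floor(collection_load(collections)) + 1


def is_sustainable(collections, n_processes):
    """True when the sum of the ratios is below the number of processes."""
    return collection_load(collections) < n_processes


if __name__ == "__main__":
    # Hypothetical workload: 100 collections of 50 ms, each polled every 2 s.
    workload = [(0.05, 2.0)] * 100
    print("load:", round(collection_load(workload), 3))
    print("processes needed:", processes_needed(workload))
```

Using worst-case durations (for example the SNMP timeout) instead of typical ones gives a more conservative estimate, since a non-responding agent occupies a process for the whole timeout.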
The collected values are then stored in a block of global variables held directly in memory. These variables are accessible everywhere within LoriotPro, in particular from an Active View.
The components of the architecture
The components of the architecture are presented in the diagram below:
On one side is the equipment on which collections are made; on the other is a visual that should reflect the operating status of this equipment as quickly as possible.
The collection processes (Audit Process) constantly look for collection tasks to perform in a kind of "job list". The job list enumerates the collections to make and describes, programmatically, how to perform each one. The collection processes store the values collected from the equipment, as well as derived values (see correlation), in the corresponding global variables.
Each Active View has its own collection process, which reads the values of global variables according to a collection interval set on each graphical object. Depending on the value returned, the graphical objects are modified (background color, text, position, clipart, etc.) to notify the administrator of a state change.
Correlation of variables
We use the term "correlation" for a variable computed from several other variables. The solution supports several levels of correlation. The first option is to collect several indicators sequentially (a set of SNMP variables, for example), process their values with mathematical operations, and store the result in another global variable.
The second option is to take existing global variables, correlate them, and store the result in another global variable (example below).
The combination of these two options is also possible.
A correlation is a task similar to a collection and is registered in the job list. Unlike a collection, it runs extremely fast because it manipulates variables in memory and involves neither network access time nor processing by remote devices.
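As a minimal illustration of a correlation between variables already in memory, the Python sketch below reads several entries from a dictionary, combines them and stores the result under another name. The dictionary, the variable names and the sample values are hypothetical; in LoriotPro the equivalent work is done by the Lua script on real global variables.

```python
def correlate(store, sources, target, combine):
    """Read in-memory variables, combine their values, store the result.

    store:   dict playing the role of the block of global variables.
    sources: names of the variables to read.
    target:  name of the variable that receives the result.
    combine: function applied to the list of source values.
    """
    values = [store[name] for name in sources]
    store[target] = combine(values)
    return store[target]


if __name__ == "__main__":
    # Hypothetical counters already collected and held in memory.
    store = {"TrafficIn1": 1200.0, "TrafficIn2": 800.0}
    correlate(store, ["TrafficIn1", "TrafficIn2"], "TrafficInTotal", sum)
    print(store)
```

No network access is involved, which is why such a task completes far faster than a collection.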
Structure of the collection processes
The collection program is multi-instance capable: it can be launched simultaneously by separate processes (multitasking). To avoid contention during updates and to guarantee that a global variable is manipulated by only one collection process at a time, a lock (semaphore) is assigned to each variable. The simplified logic diagram given hereafter follows the logic of the code. First, the global variables to be manipulated are checked one by one; those that do not exist are initialized with default values.
Then, for each variable, the audit process checks whether a lock is present. If so, it moves on to the next variable in the list; otherwise, it sets a lock on the variable, which prevents any other process from changing it. The program imports the variable (a structure) so that it can be handled in the code. It then checks the polling interval for this variable; if the interval has expired, it starts the collection of the value from the device (in the SNMP case). Finally, the variable is updated and the lock is released.
An audit process scanning the job list takes on every job whose variable is not locked.
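The pass performed by one audit process can be sketched as follows. This is a Python model, not the LoriotPro implementation: the Job class, its field names and the use of threading.Lock as the per-variable lock are assumptions chosen to mirror the steps described above (skip if locked, lock, check the interval, collect, update, unlock).

```python
import threading
import time


class Job:
    """One entry of the job list, tied to one global variable (hypothetical model)."""

    def __init__(self, name, interval, collect):
        self.name = name
        self.interval = interval      # polling period in seconds
        self.collect = collect        # callable that returns the new value
        self.last_time = 0.0          # time of the last collection
        self.value = None
        self.lock = threading.Lock()  # one lock per variable


def run_pass(jobs, now=None):
    """One scan of the job list; returns the names of the variables updated."""
    updated = []
    for job in jobs:
        # Variable locked by another process: go to the next one.
        if not job.lock.acquire(blocking=False):
            continue
        try:
            t = time.time() if now is None else now
            # Collect only when the polling interval has expired.
            if t - job.last_time >= job.interval:
                job.value = job.collect()
                job.last_time = t
                updated.append(job.name)
        finally:
            job.lock.release()        # always release the lock
    return updated
```

Several threads can call run_pass on the same list: the non-blocking acquire makes each of them skip variables that another thread is currently handling instead of waiting for them.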
Global variables are structures containing several fields.
The details of this structure are given below.
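As a reading aid, the fields that this article refers to (Param1, Param2, the numeric value, the text buffer, the last collection time, the polling interval and the lock) can be modelled as a Python data class. The field types, default values and the is_due helper are assumptions made for illustration; the authoritative layout is the one defined by LoriotPro.

```python
from dataclasses import dataclass


@dataclass
class GlobalVariable:
    """Hypothetical model of a global variable, limited to fields named in this article."""

    name: str
    val: float = 0.0      # numeric value returned by the last collection
    buffer: str = ""      # text value returned by the last collection
    param1: str = ""      # free parameter, e.g. the SNMP object name
    param2: str = ""      # free parameter, e.g. the IP address of the host
    time: float = 0.0     # timestamp of the last collection, in seconds
    interval: float = 5   # polling period, in seconds
    locked: bool = False  # True while a collection process holds the variable

    def is_due(self, now):
        """True when the polling period has elapsed since the last collection."""
        return now - self.time >= self.interval
```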
Example of implementation
In this implementation example, an Active View displays the status of three devices and a fourth element that correlates the status of the first three.
Three colors are used for status: normal (green), major (orange), critical (red).
As usual, the correlation element inherits the most critical color among the three devices.
The first step is to create the program that performs the collection and correlation (the audit process script).
To create a new collection program, open the "Tools" menu and select "Script Editor".
Save the Lua script under a new name in the default directory / bin / config / script /
The first part of the program checks that each variable exists and initializes it if necessary.
Code example for initializing the global variable GlobVar1.
The Lua function lp.isValue() checks whether the variable (GlobVar1) exists in memory.
If it does not exist, the code creates it and initializes its value. A Global Variable is a structure as defined previously. In this example, Param1 and Param2 hold the SNMP object name and the IP address of the host on which the collection is performed.
The functions lp.SetValueParam1() and lp.SetValueParam2() modify these fields of the GlobVar1 structure.
Then we set the polling period of the variable to 5 seconds with the lp.SetValueInterval function.
We repeat this code structure for the other two SNMP variables to collect, GlobVar2 and GlobVar3.
Note that GlobVar4, which is the result of a correlation, receives the names of the other variables as its Param2 value.
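The "create only if missing" logic of this initialization step can be modelled in Python as shown below. This is not the Lua code of the script: the dictionary store, the ensure_variable function, the SNMP object names and the IP addresses (taken from the documentation range 192.0.2.0/24) are all hypothetical.

```python
def ensure_variable(store, name, param1, param2, interval=5):
    """Create the variable with default values only if it does not exist yet.

    Returns True when the variable was created, False when it already existed.
    """
    if name in store:
        return False
    store[name] = {
        "val": 0.0, "buffer": "", "time": 0,
        "param1": param1, "param2": param2, "interval": interval,
    }
    return True


if __name__ == "__main__":
    store = {}
    # Three collected variables: Param1 = SNMP object, Param2 = host address.
    ensure_variable(store, "GlobVar1", "ifOperStatus.1", "192.0.2.1")
    ensure_variable(store, "GlobVar2", "ifOperStatus.1", "192.0.2.2")
    ensure_variable(store, "GlobVar3", "ifOperStatus.1", "192.0.2.3")
    # The correlated variable keeps the names of its sources in Param2.
    ensure_variable(store, "GlobVar4", "", "GlobVar1,GlobVar2,GlobVar3")
    print(sorted(store))
```

Because existing variables are left untouched, the script can run this block on every execution without resetting values collected earlier.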
Once the variables have been initialized, you can add the code for the collections. Each global variable is handled by its own piece of code, structured as shown in the flowchart in the section "Structure of the collection processes".
Here is the code structure for variable GlobVar1.
Let's break down the structure of this piece of Lua code.
It begins with a first test on the Global Variable.
lp.TriggerValueLock (...) - This function tells whether a Global Variable is locked, and can lock or unlock it. If the variable is already locked, this portion of code is skipped without further action.
lp.StatusOfValue(...) - This function copies the contents of the variable into a Lua table so that the code can manipulate it later. The first argument is the name of the global variable, the second the name of the variable in the Lua code. The names may differ, but for readability we kept the same one (GlobVar1).
Next, the following line checks whether the collection interval for this variable has elapsed. If so, we launch a new collection.
if (os.time()-GlobVar1.time>=GlobVar1.interval) then
Next comes the actual data collection. This code depends on what you want to collect. In our example, we collect a variable (an SNMP MIB object) from a device. The name of the SNMP object to collect was stored in Param1 of the global variable GlobVar1.
The Lp.Get function retrieves a single MIB object from a host with an SNMP get.
If we replace the variables with their values, we obtain a simple SNMP query:
The query returns a value of type double in "val" and a value of type string in "buffer".
To write the rest of the code, we need to know which values the MIB object can return.
This information comes from the object description shown by LoriotPro. To display it, select the object in the MIB tree, then choose the menu option "show object properties".
These values will also be needed to configure the graphic object in the ActiveView. Here the possible values are: up, down, testing, unknown, dormant, notPresent, lowerLayerDown.
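For reference, the standard IF-MIB definition of ifOperStatus (RFC 2863) numbers these states from 1 to 7. The Python sketch below maps the numeric value to its label; the function name and the "error" label used for any other value are assumptions made for this example.

```python
# ifOperStatus enumeration as defined in the IF-MIB (RFC 2863).
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}


def status_label(val):
    """Return the label for a numeric ifOperStatus value.

    val may be a float, since the collected numeric value is a double.
    Any value outside the enumeration is reported as "error" (assumed label).
    """
    return IF_OPER_STATUS.get(int(val), "error")


if __name__ == "__main__":
    for raw in (1.0, 2.0, 6.0, 0.0):
        print(raw, "->", status_label(raw))
```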
The buffer can simply be stored in the Global Variable with the command:
Similarly, the status value is recorded with:
This value can be used if necessary in ActiveView to display another status level.
The same code can be used for the other variables to collect, GlobVar2 and GlobVar3.
The global variable GlobVar4 must be treated differently because it is the result of a correlation of the three others.
By design, we define it as follows:
GlobVar4 may have three different numerical values:
1: all three interfaces are in the "up" state
2: at least one of the collected objects does not respond to SNMP requests (value "error")
3: all other cases, mainly the value "down"
Code for the assignment and collection of GlobVar4:
In this way, GlobVar4 always reflects the worst condition among the three other variables.
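The three-way rule for GlobVar4 can be expressed compactly. The Python function below is a sketch of that rule only, not the Lua code of the script; it assumes that each source variable exposes its status as one of the text values listed earlier, with "error" standing for an agent that does not respond.

```python
def correlate_status(statuses):
    """Return the GlobVar4 code for a list of status strings.

    2: at least one source did not answer the SNMP request ("error")
    1: every source is "up"
    3: any other case, typically at least one "down"
    """
    if any(status == "error" for status in statuses):
        return 2
    if all(status == "up" for status in statuses):
        return 1
    return 3


if __name__ == "__main__":
    print(correlate_status(["up", "up", "up"]))
    print(correlate_status(["up", "error", "down"]))
    print(correlate_status(["up", "down", "up"]))
```

Testing for "error" first means that a missing answer takes precedence over a "down" state when both occur; swap the order of the tests if the opposite priority is wanted.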
The proper functioning of the program can be tested directly from the menu of the Script Editor by pressing F5.
If the code contains a fault, errors are reported in the output window and as a message in the SYSLOG tab.
If there are no errors, you can check the status of your global variables from the Insert -> Insert Global Variable menu of the LoriotPro Lua script editor.
We note that the three queries return an "up" value and that the resulting correlation value is 1.
The configuration of the ActiveView can begin at this stage.
An Active View has properties that define how often the graphic objects are updated (background color, for example) and how often the values associated with these objects are collected. For performance reasons, the graphic update is handled by a separate process from the one that executes the collection expression defined on each object.
Two parameters therefore control the update and collection periods:
Refresh MAP Every : time interval between two refreshes of all graphic objects, during which the coloring rules are scanned and the color or position of each object is changed.
Run Thread Every : scan interval at which each object's collection expression is executed to obtain the value associated with that object.
The minimum values depend on the LoriotPro edition:
LoriotPro Broadcast Edition : 40 ms
LoriotPro FREE, LITE, STANDARD, EXTENDED : 5 seconds
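The minimum values above can be captured in a small lookup, for example to validate a planned configuration before entering it. This Python sketch is purely illustrative: the function name and the edition keys are invented, and how LoriotPro itself reacts to a value below the minimum is not described here.

```python
# Minimum interval per edition, in milliseconds, as listed above.
MIN_INTERVAL_MS = {
    "BROADCAST": 40,
    "FREE": 5000,
    "LITE": 5000,
    "STANDARD": 5000,
    "EXTENDED": 5000,
}


def effective_interval_ms(edition, requested_ms):
    """Return the requested interval, raised to the edition minimum if needed."""
    return max(requested_ms, MIN_INTERVAL_MS[edition.upper()])


if __name__ == "__main__":
    print(effective_interval_ms("Broadcast", 100))
    print(effective_interval_ms("Standard", 1000))
```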
To change these values, go to the Active View menu, then select Edit Options ActiveView.
In principle, choose collection intervals slightly shorter than the shortest audit process polling interval (the interval field of the Global Variables).
Refer to our documentation for the basics of inserting a new Active View in the directory and inserting graphic objects in the display.
The graphic object corresponding to our global variable GlobVar1 is configured as follows.
Select the object in the view and open "Properties" with a right click.
Then select the Dynamic Appearance tab to configure the object.
Check the box Activate Dynamic Aspect.
Select a collection period for this object, one second in our example.
Select Get Global Variable in the drop-down menu.
In the list, select GlobVar1.
Then insert the coloring rules for the object based on the value returned by the global variable, in this case the same values that the SNMP query returns for the MIB object ifOperStatus.
Immediately afterwards, the object should turn green if the value of the Global Variable is "up".
Repeat the same configuration for variables GlobVar2 and GlobVar3.
For variable GlobVar4, only the numeric return value (1, 2 or 3) is taken into account.
Note that because we read the numeric value of the variable, the expression syntax is slightly different.
We now have a complete visual:
By disconnecting the ports or forcing variables directly in the script, you can check that the dashboard works properly. In the following visual, we see that the variable GlobVar3 imposes its status (red) on the variable GlobVar4.
Adding collection process
To complete our development work, we must now add the collection processes (Audit Process). At least one process is required to gather data and update the global variables. Beyond that, the number of collection processes to launch in parallel must be evaluated according to the number of collections, the execution time of each collection and the collection frequency.
A collection process simply executes the Lua script that we created earlier, at regular intervals. Its execution interval should be shorter than the shortest polling interval of the global variables.
To add a process (Process Audit), select the LoriotPro object in the directory and open the context menu (right click). Select "Insert Task" then "Advanced TCP / Audit Polling."
An alternative is to select the LoriotPro object in the directory and then open its Properties options.
The configuration window is displayed.
Select "Process Audit" as "Polling Type".
Select the execution frequency (1 second).
Make sure the checkbox "Enable Process" is checked.
Under "Process Audit", select "Audit ref" and the value 901, or use the Wizard to select it.
Under "Parameter", enter the file name of the script program created in the previous steps.
Script programs are located in the LoriotPro directory / bin / config / script /
Click "OK" to close the window; the process then starts running.
Several tools are available to monitor the collection processes (audit processes).
A dedicated window controls their execution. It is accessible from the main menu under "Configure", then "Audit Process". This window also allows you to stop all active audit processes.
Within the ActiveView, you can display the state of all dynamic visual objects.
From the ActiveView Menu, select "Help" and then "Statistics".
The Active View of the LoriotPro management solution, used to build maps, synoptics and dashboards, reacts quickly to any status change in your network, your playout or your broadcast facilities. With the Global Variables concept, hundreds of performance indicators, including SNMP ones, can be monitored in very short cycle times of a few hundred milliseconds.