Setting Up Parsers


Parsing Data From Your Datasources

ZSA currently assumes that no data can be handled as-is, and will try to parse each data entry before saving it in the database. 

Types of parsers

Three parsers are available:

  • LEGACY_PARSER
    • Manages data already parsed by the Mainframe Zetaly Agent
    • Its main task is to insert the data as-is into the database
  • RAW_PARSER
    • Tries to find a ZUP parser matching the binary SMF record type. If an SMF record is identified as "SMF-16" and a ZUP parser exists for "SMF-16", that parser is used to parse the data and insert it into the database
  • CSV_PARSER
    • Parser used to handle CSV data from files


Starting the Parser

The parser has its own dedicated Parser List, which resembles the connector one. 

All buttons function as they do for the connectors.
Once you start the parser with the play button, it will start parsing incoming messages. See this page to learn how to make the parsed data available to the other Zetaly apps.

Configuring the Parser

As with connectors, the number of instances for the parser can be changed using the edit function; clicking it opens the dialog box. 


The dialog box contains the following fields:

  • Number of instances - defines the minimum number of instances.
  • Number of records - defines the number of records to be parsed at once.
  • Poll interval - defines the time (in seconds) to wait before attempting to parse records again when none are waiting to be parsed.
  • Database buffer size - defines the size of the data bulk inserted at once.
  • Autoscale instances - automatically creates a new instance when the Usage value reaches 90%. (read more here)
  • Keep raw record without parser (RAW parser only) - if no parser is found for a given record, the record is stored in binary form in the database (raw_smf_2 table).
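The autoscale rule described above can be sketched as follows. This is an illustrative sketch only; the function name and signature are assumptions, not ZSA's actual implementation.

```python
def autoscale(current_instances: int, usage_pct: float,
              threshold: float = 90.0) -> int:
    """Illustrative autoscale rule: add one instance when the
    Usage value reaches the threshold (90% by default)."""
    if usage_pct >= threshold:
        return current_instances + 1
    return current_instances
```

For example, an instance count of 2 at 95% usage would scale to 3, while the same count at 50% usage would stay unchanged.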

ZSA Parsing Statistics

Expand the row to see the statistics of the ZSA parser. Use them to monitor the parser operation and to determine whether more instances are needed.

Parsers Charts

Multiple graphs help you track the performance of the parsers:

  • Current/Dequeue per second: tracks queue usage. If Current is high, the parsers cannot process all records in memory and are using the buffer on disk. Dequeue is the rate at which records leave the queue
  • Parsed error/ignored/success per second: parsing rate per status
  • Insert errors/inserted: database insert/error rate
  • Usage: measures the percentage of time the instances spend processing records
  • Average insert/parsing/process time: average time for each workload
  • Instances: instance count over time (can vary if the autoscale option is enabled)

How to find the right configuration

The current architecture introduces two buffers:

  • one at the mainframe level: DXQ
  • one at the Linux level: ZQM

The purpose of connector and parser tuning is to free up these two buffers fast enough.


Connector and parser

The goal of the collect processes is to collect all data from the mainframe, parse it and insert it into the database. There are therefore three possible points of contention:

  • A given LPAR, or the sum of all LPARs, generates too much data, creating a "network" contention between the DXQ process and the ZQM process
    • Increase the network bandwidth
    • Reduce the collected data (ZSA option)
    • Increase the DXQ buffer size
    • Increase the ZQM buffer size
    • Increase the resources allocated to the Linux ZSA processes
  • The parser doesn't process fast enough
    • Reduce the collected data (ZSA option)
    • Increase the ZQM buffer size
    • Increase the resources allocated to the Linux ZSA processes
  • Database insertions are not fast enough
    • Reduce the collected data (ZSA option)
    • Tune the database infrastructure to allow more write operations in parallel

Parser process

Open the parser statistics and observe the graph named "Queues". The "Number of stored messages" curve should be stable and close to 0. If it is not, you have a potential problem. Performance can be improved via two properties:

  • Number of records
  • Number of instances

Increasing the number of records will reduce the number of calls made to ZQM, thus reducing the fixed network overhead and improving response time. The default value is 5000, but it can be increased drastically (to 100,000, for example). The aim is to ensure that processing a defined message packet takes no more than a few seconds. 

If you are in a saturation situation, check that the value you have set is not too high. Open the "Queues" graph, take the value of the "Number of processed messages" curve, divide it by 5, then divide the result by the value of "Number of records". This gives the number of seconds required to process a message packet. This number must remain below 5 (note that it is only meaningful in the event of saturation).
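The check above can be written as a small calculation. This is a sketch that follows the formula exactly as stated in the text; the function names are illustrative, not part of ZSA.

```python
def packet_processing_seconds(processed_messages: float,
                              number_of_records: int) -> float:
    """Rule of thumb from the text: divide the 'Number of processed
    messages' curve value by 5, then divide the result by the
    'Number of records' setting."""
    return (processed_messages / 5) / number_of_records

def number_of_records_too_high(processed_messages: float,
                               number_of_records: int) -> bool:
    # The result must stay below 5 (only meaningful under saturation).
    return packet_processing_seconds(processed_messages, number_of_records) >= 5
```

For instance, with 100,000 processed messages and the default "Number of records" of 5000, the result is (100000 / 5) / 5000 = 4, which stays below the threshold of 5.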

If you've already increased the "Number of records" property but you're still in a saturation situation, you can increase the number of parser instances. This will allow more CPU to be allocated to message processing. Beware, however, that increasing the number of instances increases the competition for ZQM access, which can slow it down. The aim is therefore to increase the number of instances to benefit from parallel processing, while avoiding overloading ZQM access.

Open the "Queues" graph, look at the value of the "Number of processed messages" curve. At the same time, increase the number of parser instances. You should see an increase in this number after a few minutes. As long as you remain saturated, repeat the operation. If the number no longer increases, or even decreases, then return to the previous value. If this doesn't resolve the ZQM saturation, please contact our support team with your configuration, analysis and environment specifications.

Parser insert into database

Open the parser statistics and observe the first graph, named "Inserted records". Look at the value of "Waiting for bulk size or flush time". This value should be stable and close to zero. If it increases and does not seem to decrease, you need to modify your configuration via two properties:

  • Bulk insert quantity
  • Number of instances

Increasing the "Bulk insert quantity" value will reduce the number of calls made to the database for high-volume SMF records. This value can be increased to several hundred thousand if necessary. Be careful, however, as this will increase the RAM consumption of both ZSA and your database.

If you've already increased the "Bulk insert quantity" property but are still in a saturation situation, you can increase the number of parser instances. This will allow more CPU to be allocated to message inserts. Beware, however, that increasing the number of instances increases the competition for ZQM and database access, which can slow them down. The aim is therefore to increase the number of instances to benefit from parallel processing, while avoiding overloading ZQM and database access. 

Open the "Inserted records" graph, look at the value of the "Inserted in last 30 secs" curve. At the same time, increase the number of parser instances. You should see an increase in this number after a few minutes. As long as you remain saturated, repeat the operation. If the number no longer increases, or even decreases, then return to the previous value. If this doesn't resolve the database saturation, please contact our support team with your configuration, analysis and environment specifications.