ZSA (z/OS)
The z/OS components of each LPAR must be monitored. When an anomaly occurs or a command is issued, a message with a message code is written to the job log. New messages must therefore be monitored so that the appropriate action can be taken. This monitoring can be automated according to production needs.
Main tasks to monitor
It is essential to check that the following tasks are active:
- DXQUEUE
- DXSMF
- DXPL
- DXPLTCP
For some tasks, it is also essential to watch for error messages that indicate a malfunction, such as an initialization problem or record loss.
Errors/warnings to monitor
DXSMF
DXPL104E DXQUEUE NOT ACTIVE
Explanation: During the initialization of the ZSA SMF Interceptor, the DXQUEUE address space was not active. The DXQUEUE started task must be started before DXSMF.
System action: The DXSMF is terminated.
Operator response: Start DXQUEUE, then start the DXSMF started task again.
Source: ZSA SMF Interceptor (DXSMF)
DXPL130W/DXPL131W/DXPL132W/DXPL133W/DXPL134W/DXPL135W REC. DISCARDED (FULL)
Explanation: The DXQUEUE address space refused to buffer the record because it ran out of 64-bit memory. A similar message is issued when the discarded count reaches 10, 100, 1000 and 10000.
System action: The SMF record information is discarded, and the worker task waits for the next record to process.
Operator response: Check whether the Dataloader program is running on the ZETALY Server to consume the records, and consider increasing the memory limit of the DXQUEUE buffer through the MEMLIMIT JCL parameter in the started task PROC. If the Dataloader is running, there was probably a peak of SMF records that DXQUEUE could not store because it did not have enough 64-bit memory. In that case, increase the MEMLIMIT JCL parameter and recycle all ZSA address spaces.
Source: DXSMF worker task (DXSPROC)
DXPL150W/DXPL151W/DXPL152W/DXPL153W/DXPL154W/DXPL155W REC. DISCARDED
Explanation: The SMF record was discarded because the SMF exit could not find a free task to process it: all tasks were busy processing other SMF records. A similar message is issued when the discarded count reaches 10, 100, 1000 and 10000.
exit: the SMF exit that issued the message, one of the following:
- U83: the IEFU83 exit issued the message.
- U84: the IEFU84 exit issued the message.
- U85: the IEFU85 exit issued the message.
- LOG: the SYSLOG exit issued the message.
System action: The SMF record information is discarded and the SMF exit returns the control to SMF.
Operator response: Consider increasing the number of worker tasks of the ZSA SMF Interceptor in the PARM of the DXSMF PROC.
Source: DXSMF exits (DXSU83, DXSU84 or DXSLOG)
DXPL110W ABEND XXXX DETECTED
Explanation: The ZETALY Streaming Agent started task detected abend XXXX, and the recovery routine took control. The recovery routine releases all allocated memory areas before ending the program.
XXXX Abend code.
Source: ZETALY Streaming Agent Started task DXSMF.
We strongly recommend keeping the SYSMDUMP DD statement in the DXSMF JCL, as provided in SAMPJCL.
DXPL
DXPL008E DXQUEUE NOT ACTIVE - DXPL INITIALIZATION SUSPENDED
Explanation: The DXQUEUE address space, which is required to activate the TCP server, was not found.
System action: The DXPL Subsystem interface is terminated.
Operator response: Check whether the DXQUEUE address space is active, and start it if it is not.
Source: DXPL Subsystem interface (DXPLSSI)
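The messages above can be watched with a simple filter over the job logs. The following Python sketch assumes the job log lines of DXSMF and DXPL have already been collected as text (how you collect them depends on your automation product); the function names and alert texts are illustrative and not part of ZSA.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Message IDs described above and a short reason used as alert text.
SINGLE = {
    "DXPL104E": "DXQUEUE not active during DXSMF initialization",
    "DXPL110W": "abend detected in DXSMF",
    "DXPL008E": "DXQUEUE not active, DXPL initialization suspended",
}
DISCARD_FULL = {f"DXPL13{i}W" for i in range(6)}  # DXPL130W to DXPL135W
DISCARD_BUSY = {f"DXPL15{i}W" for i in range(6)}  # DXPL150W to DXPL155W

# Any DXPL message ID: "DXPL", three digits, then a severity letter.
MSG_ID = re.compile(r"\bDXPL\d{3}[A-Z]\b")


def classify(msg_id: str) -> Optional[str]:
    """Return the alert reason for a watched message ID, or None if it is not watched."""
    if msg_id in SINGLE:
        return SINGLE[msg_id]
    if msg_id in DISCARD_FULL:
        return "SMF records discarded: DXQUEUE out of 64-bit memory"
    if msg_id in DISCARD_BUSY:
        return "SMF records discarded: no free DXSMF worker task"
    return None


def scan(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (message ID, reason, line) for each watched message found in the lines."""
    alerts = []
    for line in lines:
        for msg_id in MSG_ID.findall(line):
            reason = classify(msg_id)
            if reason is not None:
                alerts.append((msg_id, reason, line.rstrip()))
    return alerts
```

Informational messages such as DXPL274I are matched by the pattern but ignored by `classify`, so only the errors and warnings listed in this section raise alerts.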
How to avoid record loss?
The following monitoring actions help anticipate error conditions:
Verify whether data in the DXQUEUE 64-bit buffer is increasing
The data sent to the Dataloader is retrieved from the DXQUEUE 64-bit data buffer. If the number of SLOTS in use keeps increasing, the buffer will probably run out of space; SMF data will then no longer be sent, and the information will not be saved in the databases.
To anticipate this condition, ZSA provides a MODIFY command that monitors the 64-bit data buffer and reports SLOT availability. The command is F DXQUEUE,SHOWBUFF, and the results are shown in message DXPL274I, as below:
DXPL274I MEMLIMIT MEMORY DEFINED xxxx MB
MEMLIMIT MEMORY IN USE xxxx MB
MEMLIMIT SLOTS ALLOCATED nnnnn
MEMLIMIT SLOTS IN USE nnnnn
MEMLIMIT SLOTS AVAILABLE nnnnn
Field | Explanation |
---|---|
MEMLIMIT MEMORY DEFINED | The MEMLIMIT defined for DXQUEUE, obtained from the MEMLIMIT JCL parameter, the SMFPRMxx definition or the z/OS default. |
MEMLIMIT MEMORY IN USE | The current 64-bit memory utilization. |
MEMLIMIT SLOTS ALLOCATED | The SLOTS acquired and formatted by DXQUEUE. |
MEMLIMIT SLOTS IN USE | The SLOTS currently in use by DXQUEUE. Each SLOT holds one CSV record generated from an SMF record. |
MEMLIMIT SLOTS AVAILABLE | The remaining SLOTS available. When no SLOTS remain, DXQUEUE dynamically acquires more by allocating another 1 MB of 64-bit memory. |
If the number of SLOTS in use keeps increasing, check DXPLTCP and the Dataloader.
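A minimal sketch of this check in Python, assuming the DXPL274I output has been captured as text in the format shown above and that F DXQUEUE,SHOWBUFF is issued at regular intervals by your automation. The field keys and the three-sample window used to decide that SLOTS in use are increasing are arbitrary choices for this example.

```python
import re
from typing import Dict, List

# Labels of message DXPL274I (F DXQUEUE,SHOWBUFF) mapped to short keys.
FIELDS = {
    "MEMLIMIT MEMORY DEFINED": "defined_mb",
    "MEMLIMIT MEMORY IN USE": "in_use_mb",
    "MEMLIMIT SLOTS ALLOCATED": "slots_allocated",
    "MEMLIMIT SLOTS IN USE": "slots_in_use",
    "MEMLIMIT SLOTS AVAILABLE": "slots_available",
}


def parse_dxpl274i(text: str) -> Dict[str, int]:
    """Extract the numeric value that follows each DXPL274I label."""
    values = {}
    for label, key in FIELDS.items():
        match = re.search(re.escape(label) + r"\s+(\d+)", text)
        if match:
            values[key] = int(match.group(1))
    return values


def slots_in_use_rising(history: List[int], window: int = 3) -> bool:
    """True when the last `window` SLOTS IN USE readings increase strictly.

    The window size is an arbitrary starting point; tune it to your polling interval.
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))
```

Append `parse_dxpl274i(output)["slots_in_use"]` to a history list after each SHOWBUFF, and raise an alert when `slots_in_use_rising(history)` returns True.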
Verify whether SMF records are being intercepted and sent to DXSMF
DXSMF also has a MODIFY command that measures the utilization of ECSA buffers. As with DXQUEUE, the command is F DXSMF,SHOWBUFF, and the result is presented in message DXPL175I:
DXPL175I TOTAL ECSA BUFFERS=TTTTTT USED BUFFERS=UUUUUU
Explanation: Indicates the total number of allocated and in-use ECSA buffers managed by DXSMF. Issued in response to the SHOWBUFF command.
TTTTTT Total ECSA buffers allocated.
UUUUUU Total ECSA buffers currently in use. Because DXSMF processes records very quickly, a value of 0 is normal.
The USED BUFFERS value should stay as close to zero as possible, which indicates that SMF records are being consumed properly. Each ECSA buffer uses 32 KB, and the number of tasks cannot exceed the number of buffers. These values cannot be modified dynamically.
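A similar sketch for DXPL175I, assuming the message text is captured as shown above. `max_used` is a hypothetical alert threshold to tune to your workload; since 0 buffers in use is normal, the default flags any buffer in use.

```python
import re
from typing import Dict

# DXPL175I as documented above; padding between tokens is tolerated.
DXPL175I = re.compile(
    r"DXPL175I\s+TOTAL ECSA BUFFERS=\s*(\d+)\s+USED BUFFERS=\s*(\d+)"
)
BUFFER_KB = 32  # each ECSA buffer uses 32K


def check_ecsa(message: str, max_used: int = 0) -> Dict[str, object]:
    """Parse DXPL175I and flag when USED BUFFERS exceeds `max_used`."""
    match = DXPL175I.search(message)
    if match is None:
        raise ValueError("not a DXPL175I message")
    total, used = int(match.group(1)), int(match.group(2))
    return {
        "total": total,
        "used": used,
        "ecsa_kb": total * BUFFER_KB,  # ECSA taken by the allocated buffers
        "alert": used > max_used,
    }
```

Because a single non-zero reading may be transient, alerting only when several consecutive readings exceed the threshold is usually less noisy.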
ZSA Backend
The ZSA backend exposes all its objects through REST APIs, so any monitoring tool can query these APIs to retrieve the status of the various objects and set up alerts.
All existing APIs are documented in the Swagger UI at: https://ZETALY_HOST/zsa/api/v1/swagger/index.html
Log in to ZHB
All APIs require a valid token in order to be queried. It is therefore necessary for the user to request this token from ZHB.
URL: https://ZETALY_HOST/zhb/api/v1/users/public/login
Verb: POST
Body:
{"username":"USERNAME","password":"PASSWORD"}
Response example:
{"token": "MY_TOKEN"}
Example of retrieving this token with a shell script, curl and Python:
token=$(curl -k -H "Accept: application/json" -H "Content-type: application/json" -X POST https://ZETALY_HOST/zhb/api/v1/users/public/login -d '{"username":"USERNAME","password":"PASSWORD"}' | python3 -c "import sys, json; print(json.load(sys.stdin)['token'])");
For security reasons, user tokens are revoked regularly. If a request returns HTTP 401, log in again to obtain a new token.
Calling ZSA
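The same login can be done in Python with the standard library only. This sketch mirrors the curl example and adds the re-login on HTTP 401; `Session` is a hypothetical helper and ZETALY_HOST is the same placeholder. Unlike `curl -k`, urllib verifies the TLS certificate by default; pass an `ssl.SSLContext` if your server needs a specific CA.

```python
import json
import ssl
import urllib.error
import urllib.request
from typing import Callable, Optional, TypeVar

BASE = "https://ZETALY_HOST"  # placeholder host, as in the examples above
T = TypeVar("T")


def login(username: str, password: str,
          context: Optional[ssl.SSLContext] = None) -> str:
    """POST the credentials to the ZHB login API and return the token."""
    request = urllib.request.Request(
        f"{BASE}/zhb/api/v1/users/public/login",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)["token"]


class Session:
    """Caches the token and logs in again once when a call gets HTTP 401."""

    def __init__(self, get_token: Callable[[], str]):
        self._get_token = get_token
        self.token: Optional[str] = None

    def call(self, fn: Callable[[str], T]) -> T:
        if self.token is None:
            self.token = self._get_token()
        try:
            return fn(self.token)
        except urllib.error.HTTPError as err:
            if err.code != 401:
                raise
            self.token = self._get_token()  # token revoked: request a new one
            return fn(self.token)
```

Usage: `session = Session(lambda: login("USERNAME", "PASSWORD"))`, then `session.call(lambda token: ...)` for each API request.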
The token must be supplied in all requests via the "token" header. Example using shell and curl:
curl -ki -H "token: $token" -X POST https://ZETALY_HOST/zsa/api/v1/servers/MESSAGING_NAME/messaging/start
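In Python, the same call can be built with `urllib.request`. A sketch, assuming the token was obtained as described above; the helper names are illustrative.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Optional

BASE = "https://ZETALY_HOST"  # placeholder host


def zsa_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a ZSA API request carrying the ZHB token in the "token" header."""
    return urllib.request.Request(f"{BASE}{path}", headers={"token": token}, method=method)


def start_messaging(agent: str, token: str) -> urllib.request.Request:
    """Same call as the curl example: POST .../servers/{agent}/messaging/start."""
    agent_path = urllib.parse.quote(agent, safe="")  # agent names go into the URL path
    return zsa_request(f"/zsa/api/v1/servers/{agent_path}/messaging/start", token, "POST")


def send(request: urllib.request.Request,
         context: Optional[ssl.SSLContext] = None) -> int:
    """Send the request and return the HTTP status (error statuses raise HTTPError)."""
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

For example, `send(start_messaging("MESSAGING_NAME", token))` is equivalent to the curl command above.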
Monitor ZSA/ZSAC/ZQM
The first things to monitor are the three main modules: ZSA, ZSA Connection (ZSAC) and ZQM.
Their status can be checked by calling each module's information API.
URL: https://ZETALY_HOST/zsa/api/v1/info
Verb: GET
Response example:
2023.11.29.1611
URL: https://ZETALY_HOST/zsa-connection/api/v1/info
Verb: GET
Response example:
2023.11.29.1611
URL: https://ZETALY_HOST/zqm/api/v1/info
Verb: GET
Response example:
2023.11.29.1611
If all three APIs return HTTP 200, the modules are operational.
Monitor connections to the mainframe
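These three checks can be grouped into one probe. A Python sketch, assuming the info APIs accept the same "token" header as the other APIs; reporting an unreachable server as status 0 is a convention chosen for this example.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

BASE = "https://ZETALY_HOST"  # placeholder host
INFO_PATHS = {
    "ZSA": "/zsa/api/v1/info",
    "ZSAC": "/zsa-connection/api/v1/info",
    "ZQM": "/zqm/api/v1/info",
}


def http_status(url: str, token: str, timeout: float = 10.0) -> int:
    """GET the URL and return its HTTP status, or 0 if the server is unreachable."""
    request = urllib.request.Request(url, headers={"token": token})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # DNS failure, connection refused or timeout
        return 0


def check_modules(token: str,
                  status_of: Callable[[str, str], int] = http_status) -> Dict[str, bool]:
    """Map each module to True when its info API answers HTTP 200."""
    return {name: status_of(BASE + path, token) == 200 for name, path in INFO_PATHS.items()}
```

The `status_of` parameter only exists so the HTTP call can be replaced, for example in tests.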
Connection
A connection to a mainframe LPAR retrieves records (thus emptying the DXQUEUE component) and then adds them to the ZQM (Zetaly Queue Manager) module. Connection instances do not parse records or insert them into the database.
Get all agents
Within ZSA, a mainframe connection is represented by an Agent. All agents can be retrieved using the API /zsa/api/v1/servers
URL: https://ZETALY_HOST/zsa/api/v1/servers
Verb: GET
Response example:
[
{
"name": "LPAR1",
"description": "Connexion zpdt",
"hostName": "10.44.145.45",
"portNumber": 9999,
"lparName": "",
"systemName": "",
"sid": "",
"type": 5,
"keepUnknownRecord": false,
"instanceQuantity": 1
}
]
Note that parsers are also represented by agents, so the returned list must be filtered on the "type" property (keep only types 2 and 5).
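A sketch of this filter in Python. The agent list is the decoded response of /zsa/api/v1/servers; the parser types 3 and 6 come from the "Monitor parsers" section. The helper names are illustrative.

```python
import json
import urllib.request
from typing import Dict, Iterable, List

BASE = "https://ZETALY_HOST"  # placeholder host
CONNECTION_TYPES = {2, 5}  # mainframe connections
PARSER_TYPES = {3, 6}      # parsers, see "Monitor parsers"


def fetch_servers(token: str) -> List[Dict]:
    """GET /zsa/api/v1/servers and return the decoded agent list."""
    request = urllib.request.Request(f"{BASE}/zsa/api/v1/servers",
                                     headers={"token": token, "Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def agent_names(servers: List[Dict], types: Iterable[int]) -> List[str]:
    """Names of the agents whose "type" is in `types`."""
    wanted = set(types)
    return [agent["name"] for agent in servers if agent.get("type") in wanted]
```

For example, `agent_names(fetch_servers(token), CONNECTION_TYPES)` returns the connections to check in the next steps.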
Get messaging state
To find out the status of an agent, access its Messaging object by calling the API /zsa/api/v1/servers/{AGENT_NAME}/messaging
URL: https://ZETALY_HOST/zsa/api/v1/servers/{AGENT_NAME}/messaging
Verb: GET
Response example:
{
"states": [
{
"name": "SLEEPING",
"detail": ""
}
],
"canStart": false,
"canStop": true
}
The states attribute contains a list of statuses, one per agent instance.
Status list:
Status | Explanation |
---|---|
STOPPED | The instance has not been launched and is on hold. |
STARTING | The instance is in the process of being launched and does not yet process records. |
STARTED | The instance has been launched and is currently processing records. |
SLEEPING | The instance is started, but no record to process is available. |
STOPPING | The instance is being stopped. |
ERROR | The instance stopped due to a critical error. |
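For alerting, the states list can be reduced to the instances that need attention. A Python sketch; which states trigger an alert is an assumption to adapt, since STOPPED is only abnormal for an agent that is supposed to run.

```python
from typing import Dict, List

# States that warrant an alert. STOPPED is included on the assumption that
# the agent is expected to run; drop it for agents that are stopped on purpose.
ALERT_STATES = {"ERROR", "STOPPED"}


def instance_alerts(messaging: Dict) -> List[str]:
    """One alert line per agent instance whose state is in ALERT_STATES."""
    alerts = []
    for index, state in enumerate(messaging.get("states", [])):
        if state.get("name") in ALERT_STATES:
            detail = state.get("detail") or "no detail"
            alerts.append(f"instance {index}: {state['name']} ({detail})")
    return alerts
```

The input is the decoded response of /zsa/api/v1/servers/{AGENT_NAME}/messaging, as in the example above.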
Statistics
Each connection exposes statistics that can be used to check that it is working properly. To retrieve them, use the API /zsa/api/v1/servers/{AGENT_NAME}/messaging/statistics
URL: https://ZETALY_HOST/zsa/api/v1/servers/{AGENT_NAME}/messaging/statistics
Verb: GET
Response example:
[
{
"threads": [],
"queues": [
{
"name": "Parsed records",
"max": 0,
"current": 0,
"enqueue": 0,
"dequeue": 0
}
],
"inserts": []
}
]
The queues object contains the current state of the ZQM record queue, as well as the number of records handled by this agent in the last 5 seconds.
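A sketch of a queue check in Python, working on the decoded statistics response. The 80% fill threshold is an arbitrary assumption, and no ratio is computed when max is 0, as in the connection example above.

```python
from typing import Dict, List, Optional


def queue_fill(queue: Dict) -> Optional[float]:
    """current/max for one queue, or None when max is 0 or missing."""
    maximum = queue.get("max") or 0
    if maximum <= 0:
        return None
    return (queue.get("current") or 0) / maximum


def queue_alerts(statistics: List[Dict], threshold: float = 0.8) -> List[str]:
    """Names of the queues whose fill ratio is at or above `threshold`."""
    names = []
    for entry in statistics:
        for queue in entry.get("queues", []):
            fill = queue_fill(queue)
            if fill is not None and fill >= threshold:
                names.append(queue["name"])
    return names
```

The same function applies to the parser statistics described below, whose Records queue has a non-zero max.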
Monitor parsers
Parsers
Parser agents are responsible for retrieving records from ZQM, parsing them and then inserting them into the database.
Get all agents
Parsers are tracked the same way as connections, except that only types 3 and 6 should be retained.
Get messaging state
The messaging state of parser agents works the same way as for connections.
Statistics
The statistics API for parsers is the same, but its response contains much more information.
Response example:
[
{
"threads": [
{
"name": "Bulk Workers",
"threadStates": [
{
"state": "WaitSleepJoin",
"count": 11
}
]
},
{
"name": "Parser",
"threadStates": [
{
"state": "Running",
"count": 1
}
]
}
],
"queues": [
{
"name": "Records",
"max": 1000,
"current": 0,
"enqueue": 0,
"dequeue": 0
},
{
"name": "Insert",
"max": 2147483647,
"current": 0,
"enqueue": null,
"dequeue": null
}
],
"inserts": [
{
"name": "Waiting for bulk size or flush time",
"numberOfRecords": 0
},
{
"name": "Waiting for insertion",
"numberOfRecords": 0
},
{
"name": "Parsed in last 5 secs",
"numberOfRecords": 0
},
{
"name": "Parsed error in last 5 secs",
"numberOfRecords": 0
},
{
"name": "Parsed ignored in last 5 secs",
"numberOfRecords": 0
},
{
"name": "Inserted in last 5 secs",
"numberOfRecords": 0
}
]
}
]
Threads
Bulk Workers: These are the threads used to insert parsed records into the database.
Parser: These are threads that perform record parsing.
Queues
Records: The number of records currently in ZQM, as well as the number of records dequeued in the last 5 seconds.
Insert: The number of records that have been parsed and placed in the insertion queue.
Inserts
Counters for parsed, ignored and erroneous records, records waiting to be inserted, and database insertions.
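The inserts counters lend themselves to simple alert rules. This Python sketch flags parse errors, and records waiting for insertion while nothing was inserted in the last 5 seconds; both rules are assumptions, not documented ZSA thresholds.

```python
from typing import Dict, List


def insert_counters(statistics: List[Dict]) -> Dict[str, int]:
    """Sum numberOfRecords per counter name across the statistics entries."""
    counters: Dict[str, int] = {}
    for entry in statistics:
        for item in entry.get("inserts", []):
            counters[item["name"]] = counters.get(item["name"], 0) + (item.get("numberOfRecords") or 0)
    return counters


def parser_alerts(statistics: List[Dict]) -> List[str]:
    """Apply two illustrative alert rules to the inserts counters."""
    counters = insert_counters(statistics)
    alerts = []
    if counters.get("Parsed error in last 5 secs", 0) > 0:
        alerts.append("parse errors in the last 5 seconds")
    if counters.get("Waiting for insertion", 0) > 0 and counters.get("Inserted in last 5 secs", 0) == 0:
        alerts.append("records waiting for insertion, none inserted in the last 5 seconds")
    return alerts
```

Since the counters cover only the last 5 seconds, evaluating the rules over several consecutive polls avoids alerting on a single quiet interval.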
Monitor load views
Load views perform data aggregation. These processes must therefore be monitored, and action taken if they fail.
Get all load views
URL: https://ZETALY_HOST/zsa/api/v1/loadviews?type=Compress
Verb: GET
Response example:
[
{
"name": "CicsHist",
"properties": {
"remove_duplicates": "True"
},
"type": "Compress",
"views": [
{
"target": "CicsHist",
"properties": {
"Target": "CicsHist",
"Interval": "Hourly"
}
}
]
}
]
Get load view state
URL: https://ZETALY_HOST/zsa/api/v1/loadviews/{LOADVIEW_NAME}/state
Verb: GET
Response example:
{
"state": {
"name": "STOPPED",
"detail": ""
},
"canStart": true,
"canStop": false
}
Status list:
Status | Explanation |
---|---|
STOPPED | Load views are not running, and the last run succeeded. |
STARTING | Load views are being launched (waiting for connections to stop) |
STARTED | Load views are in progress |
STOPPING | Load views are being stopped |
ERROR | The load views stopped due to a critical error. |
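Combining the two load view APIs, a Python sketch that lists the Compress load views and reads the state of each one. `get_json` and `loadview_states` are hypothetical helpers, and the `get` parameter only exists so the HTTP call can be replaced in tests.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict

BASE = "https://ZETALY_HOST"  # placeholder host


def get_json(url: str, token: str) -> Any:
    """GET a ZSA API URL with the "token" header and decode the JSON body."""
    request = urllib.request.Request(url, headers={"token": token, "Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def loadview_states(token: str,
                    get: Callable[[str, str], Any] = get_json) -> Dict[str, str]:
    """Map each Compress load view name to its current state name."""
    loadviews = get(f"{BASE}/zsa/api/v1/loadviews?type=Compress", token)
    states = {}
    for loadview in loadviews:
        name = urllib.parse.quote(loadview["name"], safe="")
        state = get(f"{BASE}/zsa/api/v1/loadviews/{name}/state", token)
        states[loadview["name"]] = state["state"]["name"]
    return states
```

Load views in error are then `[name for name, state in loadview_states(token).items() if state == "ERROR"]`.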