Why did remote journaling end?
Probably the number one question remote journal users have is “Why did remote journaling end?” This question is not as hard to answer as one would think.
Every time remote journaling ends, the operating system sends a message to the journal’s message queue. The Work with Journal Attributes (WRKJRNA) command shows the message queue associated with the journal, and the Display Messages (DSPMSG) command displays the messages on that queue. Sometimes user or third-party applications monitor the journal message queue for messages that need action and remove them from the queue. Because of this, starting in release 6.1 of the IBM i operating system, the remote journal messages are also sent to the system history log. The system history log is now the first place to look for messages. The Display Log (DSPLOG) command can be used to see messages in the system history log.
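As a minimal sketch of the lookup described above, the following commands show the attributes of a journal and then display the messages on a message queue. MYLIB, MYJRN, and MYMSGQ are placeholder names, not values from this article; substitute your own journal and the message queue that WRKJRNA reports for it.

```cl
/* Placeholder names: replace MYLIB/MYJRN with your journal. */
WRKJRNA JRN(MYLIB/MYJRN)
/* Replace MYLIB/MYMSGQ with the message queue shown by WRKJRNA. */
DSPMSG MSGQ(MYLIB/MYMSGQ)
```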
On the source system, a message is sent to the journal message queue when remote journaling ends. The reason code in this message details why remote journaling ended. If the reason is related to a communications problem (reason code 2, 3, or 4 in CPF70D5), then a diagnostic CPExxxx message is also sent to the journal message queue detailing the exact communications problem. If the reason code in the CPF70D5 message is 20 (target side error), then the problem occurred on the target system, and the messages on that system are the key to determining what happened.
On the target system a message is also sent to the journal message queue when remote journaling ends. The reason code in the message details why remote journaling ended. Again, if the error was related to a communications problem, then a diagnostic CPExxxx message will also be sent. If the message is CPF70DB, CPF70D7, or CPF70DC, there should be another message related to the error. Take note of the job that sent this message (often QDBSRV02 or QDBSRV03). In this situation, the real key to why remote journaling ended might be in the job log of the job that sent this additional message.
To see the messages related to remote journaling on the source system, use the following Display Log (DSPLOG) command:
DSPLOG MSGID(CPF70D3 CPF70D5 CPC6984 CPC6983 CPF70C5 CPI7012 CPI7016)
To see the messages related to remote journaling on the target system, use the following DSPLOG command:
DSPLOG MSGID(CPF70D4 CPF70D5 CPF70DB CPF70DC CPF70D7 CPC6983 CPF70C4 CPF70C5 CPI7012 CPI7016)
The above DSPLOG commands reveal the time frame of the error and return messages detailing the history of the remote journal environment.
If the reason code in the remote journal ended message indicates a communications error, then display messages from the history log in the time frame of the error to see any diagnostic CPExxxx messages.
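One way to narrow the history log to the time frame of the error is the PERIOD parameter of DSPLOG. The sketch below is only an illustration: the one-hour window and the date are made up, and the date is written on the assumption that the job's date format is *MDY. A message identifier that ends in zeros, such as CPE0000, is treated by DSPLOG as generic, so it should match any CPE message in the window.

```cl
/* Hypothetical window: 08:00 to 09:00 on 15 January 2024.      */
/* Assumes job date format *MDY; adjust to your job's format.   */
DSPLOG LOG(QHST) PERIOD((080000 011524) (090000 011524)) MSGID(CPE0000)
```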
Why did remote journaling fail to activate?
Another question remote journal users have is “Why did remote journaling fail to activate?” If remote journaling fails to activate, there should be messages in the job log of the job that was performing the activation to indicate what caused the failure. Usually the key to what went wrong is in these messages. Occasionally, the reason why remote journaling failed to start can be found in the target job. In the job log of the job performing the activate request, there will be a message (CPI9155) identifying the job on the target system that was involved in the activate request. Sometimes the messages in that target job log can help determine what is wrong. There is also a message in the target job (CPI9152) that identifies the related job on the source system. This can be used to verify that you are looking at the correct job log on the target system. Another useful piece of information is the names of the journal receivers that exist on the target system. Usually the names of these journal receivers, along with a careful reading of the error message, are all that is needed to determine what is wrong.
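The checks above might look like the following sketch. The qualified job name and the library name are placeholders, not values from this article: take the real job name from the CPI9155 or CPI9152 message, and use the library that holds the remote journal's receivers on the target system. DSPJOBLOG can only show a job log that is still available on the system.

```cl
/* Placeholder job: use the number/user/name reported in CPI9155 or CPI9152. */
DSPJOBLOG JOB(123456/MYUSER/MYJOB)
/* On the target system: list the journal receivers in a placeholder library. */
WRKJRNRCV JRNRCV(MYLIB/*ALL)
```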
Is my request to activate remote journal stuck?
It can take a while to activate remote journaling when a large amount of existing journal data needs to be sent. The activate request sends a status message when beginning to send entries for a given journal receiver, but those status messages are infrequent when large journal receivers must be sent. Sometimes it appears that progress is not being made and the activate request is ended prematurely. How do you know if it is progressing?
To see whether remote journaling is making progress, display the receiver attributes of the journal receiver attached to the remote journal. If the number of journal entries for the journal receiver is increasing, then remote journaling is progressing. There may be some time between updates to this number, so sample it for a minute or two before concluding that progress is not being made. From the source system it may very well look like things are stuck when activating, but rest assured that remote journaling is sending entries as efficiently as possible during this catch-up time.
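A simple way to sample the entry count is the Display Journal Receiver Attributes (DSPJRNRCVA) command, run on the target system against the receiver currently attached to the remote journal. MYLIB and MYRCV0001 are placeholder names used only for illustration.

```cl
/* Placeholder receiver: replace with the receiver attached to the remote journal. */
DSPJRNRCVA JRNRCV(MYLIB/MYRCV0001)
```

Repeat the command a few times over a minute or two and compare the number of entries shown each time.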
Why is remote journal falling behind sending entries?
Another frequent question concerning remote journal is “Why is remote journal falling behind sending entries?” The most likely reason is that the communication line is undersized. Another common reason is that the target system is undersized and cannot keep up. A third is a problem in the communication network that results in retransmissions of data. To see if retransmissions could be a factor, use the Work with TCP/IP Network Status (NETSTAT) command to check the number of retransmissions. Ideally you would see no retransmissions.
On the Work with TCP/IP Network Status menu, select the appropriate option for your remote journal connections: option 3 for IPv4 connection status, or option 6 for IPv6 connection status. On the IPv4 connection status panel, look for local ports with the value “rmtjour >” or 3777. These are the remote journal connections. Option 5 displays the connection-specific information, and the retransmission information is available on the resulting panels. On the IPv6 connection status panel, look for local ports with the value 3777 as well. If there are a significant number of retransmissions, or if the count is changing, it is likely that a communication network problem is the source of your remote journal performance problems.
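As a shortcut, the OPTION parameter of NETSTAT can open a connection status panel directly instead of going through the menu. This is a small sketch of that alternative, not a required step; use whichever of the two commands matches the IP version of your remote journal connections.

```cl
/* IPv4 connection status (same panel as menu option 3). */
NETSTAT OPTION(*CNN)
/* IPv6 connection status (same panel as menu option 6). */
NETSTAT OPTION(*CNN6)
```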