Abstract
An often underutilized and misunderstood attribute of IBM i5/OS® is the journal standby mode. This mode is available through the optional feature, Option 42, of the i5/OS operating system. Shops that use a high availability product often enable this mode beneath their replicated files on the target system.
While journal standby mode is a useful feature in the proper context, it can have subtle side effects. This Technote helps you think through those considerations and avoid, or at least mitigate, most of the potential gotchas. We also explain how important it can be to combine use of the INCACCPTH(*ELIGIBLE) system-managed access-path protection (SMAPP) setting with any use of journal standby mode.
Written by Larry Youngren, Software Engineer
IBM Systems & Technology Group, Development
Development iSeries Journaling
Contents
Option 42
The IBM i5/OS® High Availability Journal Performance Option (Option 42) includes two modes:
- Journal caching
- Journal standby
Both features can be useful. In addition, both are aimed at helping to lessen the performance load that is associated with journal protection.
Caching mode is the better known of the pair and is actively used in thousands of shops. Caching mode combines consecutive journal entries (entries that are otherwise written to disk individually) into one continuous byte string, which is then written to disk in a single operation. Bundling in this fashion helps lessen the impact of the journal on performance. As a consequence, the caching mode is well understood and quite popular.
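The bundling idea can be illustrated with a minimal Python sketch. It is a toy model, not the actual i5/OS implementation: the function names, the sample entries, and the in-memory "disk" are all assumptions made for illustration. It only shows that the same bytes reach the disk with fewer write operations.

```python
import io

def write_individually(entries, disk):
    """Without caching: issue one disk write per journal entry."""
    writes = 0
    for entry in entries:
        disk.write(entry)
        writes += 1
    return writes

def write_bundled(entries, disk):
    """With caching: join consecutive entries into one continuous
    byte string and issue a single disk write."""
    if not entries:
        return 0
    disk.write(b"".join(entries))
    return 1

# Hypothetical journal entries and in-memory stand-ins for the disk.
entries = [b"PUT row 1;", b"UPD row 1;", b"DEL row 1;"]
plain, cached = io.BytesIO(), io.BytesIO()

individual_writes = write_individually(entries, plain)
bundled_writes = write_bundled(entries, cached)
same_bytes = plain.getvalue() == cached.getvalue()
```

In this sketch the content written is identical in both cases; only the number of write operations differs.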
Standby mode is the lesser-known option. Standby mode sorts through the journal entries and discards most of them. It is generally enabled for the local journal on the target server when an object-replication-driven, high-availability environment is in place. Standby mode discards most of the journal entries because, until role swap ensues, the local journal on the target side is usually not yet designated as a critical recovery mechanism. Therefore, standby mode places most of the journal load on the target side in abeyance until you enable journal protection, which reduces the disk and CPU loads on the high availability replication software on the target side.
Both techniques are powerful approaches when used appropriately.
Selecting the correct option
It is becoming increasingly popular to set up a pair of IBM System i™ servers. One serves as the source or production server, and the other serves as the target or backup, ready to take over and help provide high availability. In such an environment, critical files on the source side are often afforded journal protection.
If you use a logical replication approach to keep a matching set of replica files in lock step on the target, it is often tempting to enable full-blown local journaling on the target side. Doing so helps ensure that the target is ready to take over immediately if the source side should be taken down. However, the extra performance load associated with such journaling on the target often seems wasteful until the role swap ensues.
To help mitigate such target-side overhead, journal standby mode can be enabled for the local journal on the target. As a consequence, the target side experiences little journal overhead until the mode of the journal on the target side is switched from standby mode to active. That transition generally occurs only at role swap time. Meanwhile, journal caching mode (the other half of Option 42) is often enabled on the source.
To enable standby mode, use the JRNSTATE keyword of the CHGJRN command, as shown in the following example:
CHGJRN ... JRNSTATE(*STANDBY)
At role swap time, to exit standby mode and move the local journal on the target side to normal mode, you can simply reverse the process:
CHGJRN ... JRNSTATE(*ACTIVE)
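For shops that script their role swap procedure, the two transitions can be captured in a small helper that builds the command text. The following Python sketch is only an illustration: the library name HALIB and journal name TGTJRN are hypothetical, and the JRN(library/journal) parameter is filled in as an assumption about how the elided part of the command would typically look. The helper deliberately accepts only the two states discussed here.

```python
# The two journal states discussed in this Technote.
SUPPORTED_STATES = ("*STANDBY", "*ACTIVE")

def chgjrn_command(library, journal, state):
    """Build the CHGJRN command text for a journal state change.

    library and journal are placeholders supplied by the caller;
    this helper only formats text and does not run anything.
    """
    if state not in SUPPORTED_STATES:
        raise ValueError(f"unsupported journal state: {state}")
    return f"CHGJRN JRN({library}/{journal}) JRNSTATE({state})"

# Hypothetical names for the local journal on the target system.
enter_standby = chgjrn_command("HALIB", "TGTJRN", "*STANDBY")
at_role_swap = chgjrn_command("HALIB", "TGTJRN", "*ACTIVE")
```

Keeping both transitions in one place makes it harder to forget the switch back to active mode at role swap time.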
To standby or not to standby: That is the question
Although standby mode is powerful and appropriate in the right circumstances, it has some aspects that you must consider.
Standby mode limits what flows into the journal and therefore prohibits certain target-side operations (those that expect or demand journal entries to appear). These operations include commitment control, because rollbacks need a before image in the journal. They also include any operations that use commitment control implicitly, such as referential integrity enforcement. If either is being employed actively on the target side, perhaps by an HA replay product, standby mode is inappropriate and caching mode is the better choice for the target.
Standby mode similarly does not allow the deposit of hidden journal entries, such as those contributed by SMAPP support. The consequence is that SMAPP is likely to size up and try to protect large database indexes on the target side with hidden journaling while the HA replay actions are actively adding or removing keys from the associated indexes, which puts the indexes “at risk.” When SMAPP senses the standby mode and realizes that it cannot protect the indexes, it searches for other indexes it can protect (such as those that are associated with physical files that are not attached to a standby mode journal) to compensate for those that are “ineligible.” This can have negative performance consequences on the target side, especially if there are a lot of small indexes. The symptom is busy background behavior by SMAPP-related SLIC tasks such as those shown in the following figure.
In this example, standby mode is preventing the SMAPP-related SLIC tasks from protecting the larger indexes. Therefore, they are busy searching for smaller indexes to compensate for the ineligible indexes.
Although it is tempting to use caching instead of standby mode on the target, there are other alternatives.
Suppressing SMAPP
If you choose to use standby mode, how can you get SMAPP to refrain from trying to compensate for ineligible indexes?
One option is to increase your overall SMAPP recovery time objective on the target. Consider that if the target side goes down, you can still keep working on the source side. If you are willing to accept the extra time it takes to do another initial program load (IPL) or recover the target side, you can use the Change Recovery for Access Paths (CHGRCYAP) command as shown in the following example:
CHGRCYAP SYSRCYTIME(150)
Note: The higher you set your target system recovery time, the fewer ineligible indexes that SMAPP tries to compensate for, which lessens the SMAPP-related overhead you will experience because of the journal standby mode.
Another option is to tell SMAPP to ignore the indexes that are defined over underlying physical files that are attached to a standby mode journal. To do this, use the INCACCPTH(*ELIGIBLE) setting for the CHGRCYAP command on the target as shown in the following example:
CHGRCYAP INCACCPTH(*ELIGIBLE)
Note: This command instructs SMAPP to focus only on the truly eligible access paths (that is, those that are unaffected by the presence of the standby journal).
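The effect of both options can be pictured with a deliberately simplified Python model. This is not SMAPP's real algorithm, which the Technote does not describe; the greedy selection, the index names, and the rebuild estimates (in arbitrary time units) are all assumptions. The sketch only mirrors the behavior described above: under *ALL, exposure from ineligible indexes pushes SMAPP to protect whatever eligible indexes it can find, while a higher recovery target or *ELIGIBLE removes that pressure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Index:
    name: str
    rebuild_cost: int  # estimated rebuild time, arbitrary units
    eligible: bool     # False if the file is attached to a standby journal

def indexes_to_protect(indexes, target, include="*ALL"):
    """Toy model: protect eligible indexes, largest first, until the
    estimated exposure fits the recovery target or none remain."""
    if include == "*ALL":
        considered = list(indexes)
    elif include == "*ELIGIBLE":
        considered = [i for i in indexes if i.eligible]
    else:
        raise ValueError(f"unsupported setting: {include}")
    exposure = sum(i.rebuild_cost for i in considered)
    candidates = sorted((i for i in considered if i.eligible),
                        key=lambda i: i.rebuild_cost, reverse=True)
    protected = []
    for idx in candidates:
        if exposure <= target:
            break
        protected.append(idx.name)
        exposure -= idx.rebuild_cost
    return protected

# Two large indexes under a standby journal, five small eligible ones.
indexes = [Index("BIG1", 300, False), Index("BIG2", 200, False)]
indexes += [Index(f"SMALL{n}", 10, True) for n in range(1, 6)]

compensating = indexes_to_protect(indexes, target=150)
higher_target = indexes_to_protect(indexes, target=600)
eligible_only = indexes_to_protect(indexes, target=150, include="*ELIGIBLE")
```

In this model the first call ends up protecting all five small indexes without ever reaching the target, which corresponds to the busy compensating behavior; the other two calls protect none.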
If you take one of these custom actions, standby mode is effective at run time and replay time. The trade-off appears if the target side goes down abruptly without saving the contents of main memory. In that case, the ineligible access paths are flagged as invalid, are displayed on the access path recovery screen, and can be rebuilt from scratch by background QDBSRV0x jobs.
In summary, journal standby mode is a good option to use, but be wise in configuring and using it. Think carefully about employing INCACCPTH(*ELIGIBLE).
The other half of Option 42: Caching mode
Caching mode is a performance booster and takes only a small amount of additional main memory (generally about 128K). Its only downside is that, should you encounter a lull (a time when the arrival rate of journal entries suddenly slows to a crawl), the cached journal entries (and corresponding database changes) linger in main memory longer than usual. However, the risk is rather modest because in V5R3 and V5R4, there is a SLIC background sweeper task that watches the clock and limits the amount of time that such journal entries can linger in the cache. So, even if the 128K buffer takes a long time to fill to capacity, the sweeper assures that the cache is flushed based on time alone.
This task similarly assures that, in a remote journal environment, the cached journal entries on the source are available in a timely fashion. Also, because the cache is being employed on the source side, the resulting packets are likely to be fewer, wider, and more efficient with less surrounding chaff.
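The interplay between the size-based flush and the time-based sweeper can be sketched in Python. This is a toy model only: the class and method names are hypothetical, and the 5-second linger limit is an arbitrary assumption, because the Technote does not state the actual limit used by the sweeper. The clock is injected so that the lull can be simulated without waiting.

```python
class JournalCache:
    """Toy cache: flushes when full, or when a sweeper finds that the
    oldest cached entry has lingered too long."""

    def __init__(self, capacity_bytes, max_linger_seconds, clock):
        self.capacity_bytes = capacity_bytes
        self.max_linger_seconds = max_linger_seconds
        self.clock = clock
        self.buffer = []
        self.buffered_bytes = 0
        self.oldest_arrival = None
        self.flushes = []  # each item models one bundled disk write

    def deposit(self, entry):
        if not self.buffer:
            self.oldest_arrival = self.clock()
        self.buffer.append(entry)
        self.buffered_bytes += len(entry)
        if self.buffered_bytes >= self.capacity_bytes:
            self.flush()

    def sweep(self):
        """Called periodically by a background task."""
        if self.buffer and (self.clock() - self.oldest_arrival
                            >= self.max_linger_seconds):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushes.append(b"".join(self.buffer))
            self.buffer = []
            self.buffered_bytes = 0
            self.oldest_arrival = None

# Simulated clock; the linger limit of 5 seconds is an assumption.
now = [0.0]
cache = JournalCache(128 * 1024, 5.0, clock=lambda: now[0])
cache.deposit(b"entry-1")       # a lull follows: the buffer stays nearly empty
cache.sweep()
flushes_before = len(cache.flushes)
now[0] = 6.0                    # time passes with no new entries
cache.sweep()
flushes_after = len(cache.flushes)
```

Here the single cached entry is written out by the sweep at the later time even though the buffer is nowhere near full.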
Caching is also a good option on either the source or target.
Summary
Option 42 provides both journal caching mode and journal standby mode. Used wisely, the two modes, one on the source system and the other on the target system, can help improve your journal performance.