When is 20 milliseconds too long?
A while back we had a production problem. A single DASD volume seemed to be the issue. I observed a 21-minute period in which a heavily hit device had only one PAV. RMF showed 97-98% disconnect time. IOSQ time was over 1000 and climbed to 1800. After waiting 21 minutes, we manually added a PAV to this device, and IOSQ time decreased to 7.8. A few minutes later we added another, and IOSQ decreased to .099. It is now .000.
To be sure, there were other issues. As it turns out, the reason the device got so heavily hit was an unauthorized batch job running against the production database during the day. However, our biggest concern was that we had apparently misunderstood what WLMPAV did. Several of the devices on this CU had PAVs. We discovered that the ratio of PAVs to devices on this CU, 64 to 192 (one alias for every three devices), could be too low, and that is something we can address. That might be one reason why the device wasn't getting a PAV.
I opened a PMR and got the following from IBM:
WLM does not add an alias to help a device with IOS queueing under the following conditions:
- The IOS queueing is due to a pending reserve.
- The IOS queueing is associated with high pending time on the local system usually indicating channel constraint, or high disconnect time indicating constraint at the device level.
Further probing on my part garnered:
"When we are checking whether we should add an alias, if we have 20 msec disconnect we take that to be a threshold to indicate if we were to add another alias we might reduce IOSQ time but would end up just increasing disconnect time, so no overall improvement, so we don't make the change."
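To make that rule concrete, here is a minimal Python sketch of the checks as IBM described them to me. It is not WLM's actual code. The names (DeviceSample, would_add_alias, high_pending) are hypothetical, the only number taken from IBM's reply is the 20 msec disconnect figure, and I am assuming the comparison is "at or above" the threshold. IBM gave no number for "high pending time", so the sketch takes it as a simple flag.

```python
from dataclasses import dataclass

# The only figure IBM quoted; treating it as ">=" is my assumption.
DISCONNECT_THRESHOLD_MS = 20.0


@dataclass
class DeviceSample:
    """Hypothetical snapshot of one device's measurements."""
    iosq: float                    # IOSQ time as reported by RMF
    disconnect_ms: float           # disconnect time in milliseconds
    pending_reserve: bool = False  # queueing caused by a pending reserve
    high_pending: bool = False     # high pending time (no threshold given by IBM)


def would_add_alias(sample: DeviceSample) -> bool:
    """Return True if, under the described rules, an alias might be added."""
    if sample.iosq <= 0:
        return False  # no IOS queueing, nothing to relieve
    if sample.pending_reserve:
        return False  # an alias does not help a reserve
    if sample.high_pending:
        return False  # suggests channel constraint
    if sample.disconnect_ms >= DISCONNECT_THRESHOLD_MS:
        return False  # suggests constraint at the device level
    return True


# Made-up values: heavy queueing, but disconnect time over the threshold.
busy = DeviceSample(iosq=1800, disconnect_ms=25.0)
# Made-up values: same queueing, disconnect time under the threshold.
queued = DeviceSample(iosq=1800, disconnect_ms=5.0)

print(would_add_alias(busy))    # False
print(would_add_alias(queued))  # True
```

The point of the sketch is the order of the checks: heavy IOSQ alone is not enough, and a device that looks starved for aliases can still be passed over if its disconnect time is high.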
IBM encouraged me several times to implement HyperPAV, and though I would love to, the feature isn't on this DASD and isn't likely to be. (I'm picking my battles.)
20 milliseconds doesn't seem like a very long time. Until it is.
Your link today is the one recommended by IBM on HyperPAV:
Also, my thanks go to Sam Knutson, my sounding board during this problem.
Till next time,