Thursday, July 2, 2009

Load Balanced AR System Servers and Alerts

I thoroughly enjoy a good mystery! I was called in to help with a situation that arose today where an automated process that creates tickets in an AR System application server platform started to experience delays of several minutes. The platform consists of three servers (knox, nova, and sandor) that make up a single AR System Server Group (yamato). It was decided to reconfigure the ticket creation utility so it would interact directly with one member of the server group, knox. Everything seemed to work great and tickets were being created in a very timely manner.

With the holiday weekend coming up, it was hoped that this slight configuration tweak would allow the automated process to keep creating tickets until everyone returned from the holiday break. As is typically the case in these types of situations, Murphy's Law reared its head.

This automated process relies on the AR System Alerts framework to learn about changes made to the tickets it creates. Reports started coming in from the end-user community indicating that information displayed by the automated process was out of sync with the data visible in the ticketing application. After a little sleuthing of our own, we discovered that alerts generated by the actions of users connected to knox (the same application server to which the automated process was connected) were being delivered. Alerts that should have been generated by users' actions on nova or sandor were never received by the automated process.

Turning on Alert logging and reviewing the resulting log files brought to light the following error message, reported on both nova and sandor.

Error (99) in binding send socket for address <10.10.2.10:1310>
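One possible reading of this message, offered as an assumption rather than something the log confirms: on Linux, operating system error 99 is EADDRNOTAVAIL ("Cannot assign requested address"), which is what a process gets when it tries to bind a socket to an IP address that does not belong to the local machine. The short Python sketch below reproduces that general condition. The helper name try_bind is hypothetical, 192.0.2.1 is a documentation-only address standing in for "some other server's IP", and none of this is AR System code.

```python
import errno
import socket


def try_bind(address, port=0):
    """Try to bind a TCP socket to a local address.

    Returns None on success, or the OS error number on failure.
    Port 0 asks the OS to pick any free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    # The loopback address belongs to this host, so the bind should succeed.
    print("127.0.0.1 ->", try_bind("127.0.0.1"))

    # 192.0.2.1 (reserved for documentation) is not configured on this host,
    # so the bind is expected to fail with EADDRNOTAVAIL.
    result = try_bind("192.0.2.1")
    print("192.0.2.1 ->", result, "(EADDRNOTAVAIL)" if result == errno.EADDRNOTAVAIL else "")
```

If that reading is right, nova and sandor were trying to send alerts from an address they did not own.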

In an effort to rule out the possibility of there being an issue with the automated process' mechanism of receiving alerts, I started up an Alert client and signed into yamato. Using the BMC AR System User Tool also connected to yamato, I triggered an alert and it was delivered almost instantaneously to my Alert client.

I reconfigured my Alert and User Tool clients to connect directly to knox. Again, I triggered an alert for my account and it was delivered almost instantaneously. I modified the account settings for my User Tool so it would connect to nova. The Alert client remained connected to knox. When I initiated an alert via the User Tool connected to nova, no alerts were delivered.

As I was able to reproduce the problem using only BMC Remedy client software, I stopped looking at the automated process as the source of the alert delivery problems. After a quick scan of the "BMC AR System Server 7.1.00: Configuring" guide, I discovered the crux of the problem.

When knox, nova, and sandor were originally set up, the ar.conf file on each application server had a single Map-IP-Address entry that mapped the IP address of the load-balanced farm (yamato) to that application server. In addition to this entry, two more Map-IP-Address entries should have been made on each server: one for each of the other application servers in the AR System Server Group.
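To make the required set of entries concrete, here is a minimal Python sketch that lists the mappings each server would need: one for the load-balanced address plus one for every other group member. Everything specific in it is an assumption for illustration: the 192.0.2.x addresses are placeholders, the helper name map_ip_entries is hypothetical, and the "Map-IP-Address: <source IP> <local IP>" line layout should be checked against the Configuring guide before anything is pasted into ar.conf.

```python
def map_ip_entries(own_ip, load_balancer_ip, member_ips):
    """Build the Map-IP-Address lines for one server in the group.

    Each line maps a foreign address (the load balancer, or another
    group member) to this server's own address. The line syntax is an
    assumption; verify it against the vendor documentation.
    """
    sources = [load_balancer_ip] + [ip for ip in member_ips if ip != own_ip]
    return [f"Map-IP-Address: {src} {own_ip}" for src in sources]


if __name__ == "__main__":
    # Placeholder addresses; the real ones are not given in this post.
    yamato = "192.0.2.10"  # load-balanced farm address
    group = {
        "knox": "192.0.2.11",
        "nova": "192.0.2.12",
        "sandor": "192.0.2.13",
    }

    for name, ip in group.items():
        print(f"# ar.conf on {name}")
        for line in map_ip_entries(ip, yamato, list(group.values())):
            print(line)
        print()
```

With three servers behind one load balancer, each ar.conf ends up with three entries, which matches the one-plus-two count described above.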

After making the requisite changes to the ar.conf file and restarting the BMC AR System application server processes running on knox, nova, and sandor, the automated process started receiving all of its alerts. All was right with the world, and my client's on-call support personnel started to get comfortable with the notion that they might get to enjoy the holidays without interruption from their workplace.

For more detailed examples, have a look at the "Error (99) in binding send socket for address" article on my company's web site.