I was recently introduced to a problem involving a Blackberry Enterprise Server (BES) and Microsoft Exchange Server 2003 in which the Exchange Server was acting up and the source of the problems was partly obscured. The symptoms first emerged on the Blackberry Server, where worker threads were failing health checks. After about 5 or 6 failed health checks, the Blackberry Service would restart and the issue would repeat. At first the Blackberry device users were not receiving emails in a timely manner. At a certain point, they were not receiving emails at all.
I dove in head first to see what was going on. From the fact that worker threads were failing health checks, I came to the conclusion that the issue was on the Exchange Server side. Each thread issues Remote Procedure Calls (RPCs) to the Exchange Server and waits for them to complete. The RPCs were timing out, and as a result the health checks were failing.
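To make the failure pattern concrete, here is a minimal Python sketch of the behavior described above: a health check fails whenever an RPC takes longer than a timeout, and the service restarts after a run of failures. This is a toy model, not the Blackberry Server's actual logic. The function name, the 1000 ms timeout, and the assumption that the failures must be consecutive are all hypothetical; only the "about 5 or 6 failed checks" figure comes from what I observed.

```python
def should_restart(rpc_latencies_ms, timeout_ms=1000.0, max_failed_checks=5):
    """Return True once max_failed_checks health checks in a row have failed.

    Each entry in rpc_latencies_ms stands for one health check: the time
    (in milliseconds) a worker thread waited for its RPC to come back.
    The timeout and the consecutive-failure rule are assumptions made
    for illustration only.
    """
    consecutive_failures = 0
    for latency in rpc_latencies_ms:
        if latency > timeout_ms:
            # The RPC did not return in time, so this health check fails.
            consecutive_failures += 1
            if consecutive_failures >= max_failed_checks:
                return True
        else:
            # A healthy check resets the failure streak.
            consecutive_failures = 0
    return False


# A slow Exchange Server: every RPC exceeds the timeout.
print(should_restart([1500.0] * 5))
# A healthy Exchange Server: every RPC returns quickly.
print(should_restart([5.0] * 10))
```

In this model a server that is only occasionally slow never triggers a restart, while one that is consistently slow triggers restarts over and over, which matches the repeating pattern I was seeing.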
On the Exchange Server, the RPC traffic was very high, with spikes maxing the server out. It was very evident that the Exchange Server was inundated with RPC requests. The irony of the matter is that the Blackberry Server was mainly responsible for those RPC requests. It is a common assumption that Blackberry Server traffic causes upward of five times the RPC traffic of a normal Outlook MAPI client. You can see how this can turn into a vicious cycle.
My next step was to find out why the RPCs were not terminating in a timely manner. I looked at the RPC latency and that was high as well. I eventually narrowed the problem down to disk contention at the storage array on which the Exchange Server mail store databases were housed. The array attached to the Exchange Server was due for replacement soon anyway, so I decided to move all of the logs and databases to a super-fast SAN that had recently been purchased for these types of servers. Suffice it to say that after the move to the SAN, all of the issues were resolved. Because of the blistering speed at which Exchange could read from and write to the SAN, the RPCs were terminating in a timely fashion. I saw latency drop from triple-digit values to single-digit values. The Blackberry Server was no longer having worker threads fail health checks. Although the RPC traffic remained the same, the lower RPC latency improved overall performance by an order of magnitude.
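One way to see why lower latency helps even when the RPC traffic stays the same is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. The Python sketch below applies it with made-up numbers. The rate of 200 requests per second and the 250 ms and 8 ms latencies are illustrative assumptions only (chosen to be triple-digit and single-digit), not measurements from this server.

```python
def outstanding_rpcs(requests_per_second, avg_latency_ms):
    """Little's law: average requests in flight = arrival rate * time in system."""
    return requests_per_second * avg_latency_ms / 1000.0


# Hypothetical figures: same request rate, triple-digit vs. single-digit latency.
rate = 200  # RPC requests per second (assumed)
before = outstanding_rpcs(rate, 250)  # latency before the move (assumed)
after = outstanding_rpcs(rate, 8)     # latency after the move (assumed)

print(f"in flight before: {before:.1f}, after: {after:.1f}")
```

With the request rate held constant, the number of RPCs waiting on the server at any moment shrinks in proportion to the latency, so far fewer worker threads are stuck waiting long enough to fail a health check.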
I would like to note two things for Exchange or Blackberry administrators to be aware of here. The first is the very obvious point that a high-performance SAN greatly increases performance and reduces problems overall. The second is that when I talked with Blackberry Enterprise support, the tech repeatedly tried to convince me that the Blackberry Enterprise Server did not have that great an effect on an Exchange Server. He flatly denied the fivefold increase rule, and he even said that it was a myth. He did hit the nail on the head when he suggested that the performance issue was on the Exchange Server side. I want to note, however, that when I turned off the Blackberry Enterprise Server to rule it out as a cause of the issue, the RPC requests dropped through the floor. The moral of the story is that although multiple factors contributed to the overall problem, you should never take the word of vendor support at face value. Always test every possible cause of the issue in a methodical fashion. You will probably find the root cause faster and end up much happier as a result.