Recently, we encountered a very strange issue: our client reported that their Dynamics GP application on a Citrix server was running extremely slowly. The client has two Citrix servers running Windows Server 2012 R2, one on each host of a two-node VMware cluster.
One VM is a clone of the other, and the two hosts are identical: same age, brand, patch level, and so on. Yet the client reported that the slowness occurred on only one of the Citrix servers, not the other. The issue also seemed to affect only the GP applications.
Conventional wisdom told us, “Aha! This must be an application issue.” But how could that be, when it ran fine on one VM and not on the other?
We went through every troubleshooting step we could think of to pin down the issue, all to no avail.
One of our colleagues suggested we install Wireshark, a packet sniffing tool, to see what was going on at the network layer. This is not a task we typically perform, since a packet sniffer could slow the server down even further. We did it anyway, out of desperation.
To our surprise, once Wireshark was running, GP performance returned to normal instead of degrading further. It stayed normal even after we stopped the packet capture, and only reverted when we rebooted the VM. The first time this happened, we thought it was a fluke and didn't pay much attention.
When it happened a second time, we realized that Wireshark was in fact changing something in the OS's TCP/IP stack when it started its packet capture, and that change was what fixed the GP slowness.
Further investigation indicated that the features Windows Server 2012 R2 includes to offload certain TCP/IP tasks from the VM to the host's TCP/IP stack were in fact slowing network performance rather than improving it as intended.
The fix was to disable all of these offload features in Windows Server 2012 R2.
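The post doesn't list exactly which settings were turned off. As a hedged sketch, assuming the culprits were the global TCP Chimney Offload, Receive Side Scaling (RSS), and IP task offload settings that `netsh` exposes on Windows Server 2012 R2, a small Python helper could print (or, with `--apply`, run) the corresponding commands. The helper name `disable_offload` is hypothetical, and adapter-level offloads such as Large Send Offload live in the NIC driver's advanced properties, which this sketch does not touch.

```python
import subprocess
import sys

# Assumption: the original post does not name the exact features that were
# disabled. These are common global TCP/IP offload settings that netsh
# exposes on Windows Server 2012 R2.
OFFLOAD_COMMANDS = [
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    ["netsh", "int", "tcp", "set", "global", "rss=disabled"],
    ["netsh", "int", "ip", "set", "global", "taskoffload=disabled"],
]


def disable_offload(dry_run=True):
    """Hypothetical helper: return the commands as strings, and execute them
    only when dry_run is False (requires an elevated prompt on Windows)."""
    commands = []
    for cmd in OFFLOAD_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError if netsh reports a failure.
            subprocess.run(cmd, check=True)
        commands.append(" ".join(cmd))
    return commands


if __name__ == "__main__":
    apply = "--apply" in sys.argv[1:]
    prefix = "ran: " if apply else "would run: "
    for line in disable_offload(dry_run=not apply):
        print(prefix + line)
```

Running it first without arguments lets you review the commands before applying them from an elevated prompt with `--apply`; `netsh int tcp show global` shows the current global settings before and after the change.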
The moral of the story:
- The latest is not necessarily the greatest.
- Always perform a load test or a real-world test before putting new servers into production. This client ran into the issue because they were forced to use the new version of their VMs before full testing could be completed.
- Last but not least, troubleshooting is an art that requires both attention and intuition.