Notes from the field: Citrix NetScaler, partitions, performance, and pain

On a recent joint project with my partner in crime Anton van Pelt, there was a long-outstanding support issue that needed our dedicated attention. Long story short: we have a customer in a nice new greenfield environment that got migrated from F5 to NetScaler, including the introduction of Citrix Gateway, and the backend was moved to admin partitions because the requirements called for it. At most we would see 50 to 70 partitions in that setup.

Throughout the project we encountered three issues:

Citrix NetScaler Console (formerly ADM Service) hit the default 512 MB limit of the free tier; this is the solution used for the monitoring/alerting part of the setup
NetScaler appliances went ballistic and stopped responding quite frequently
Admin partitions took up about 4 GB in total in /var

From a troubleshooting perspective we did what we do: dissect, log, open a support case or three, and dive in for a solution.

#1 NetScaler Console hitting the 512 MB limit of the free tier

This one got resolved quite fast. Underneath it all, every admin partition is a configuration set and takes some storage; fair enough. A total of 70 admin partitions would take some storage, but we did not expect it to be that much. The underlying cause ties into #2, which I'll come to next: even when you use an external Console agent (ADM agent), if you previously had the built-in agent of the NetScaler configured, that one can still be active without you knowing it! Once we got that resolved, storage usage dropped enough to stay under the default 512 MB.

#2 NetScaler appliances going ballistic

The appliances became unusable from the GUI and/or extremely slow. This was because MASTOOLS for the built-in agent and the external agent were running and probing at the same time, which doesn't work well. This looks like a bug that needs fixing: if you start with the built-in agent and migrate to an external agent for the larger feature set of NetScaler Console (ADM Service), the predecessor should get disabled correctly. If you are in the same situation, do the following:

SSH/PuTTY to the primary node and, if applicable, the secondary node, and make sure the agent.conf file isn't present anymore!

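# Remove the built-in agent configuration
cd /var/mastools/conf
rm agent.conf
# Stop the agent, check for leftover mastools processes, then restart
cd /var/mastools/scripts
mastoolsd stop
ps -awx | grep mas
mastoolsd restart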

And from the GUI, disable the ADM connect part and make sure the checkbox for using the built-in variant isn't ticked. After completing these steps, reboot the nodes and verify that the CPU spiking is gone and performance is stable.
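To double-check the result on each node, a small shell helper along these lines could be used. This is only a sketch: the function name agent_conf_absent is made up for illustration, and the default directory is simply the /var/mastools/conf path from the steps above.

```shell
#!/bin/sh
# Sketch: report whether the built-in agent config file is gone.
# agent_conf_absent is a hypothetical helper name, not a NetScaler command.
# The default directory is the one used in the manual steps.
agent_conf_absent() {
    conf_dir="${1:-/var/mastools/conf}"
    if [ -e "$conf_dir/agent.conf" ]; then
        echo "agent.conf still present in $conf_dir"
        return 1
    fi
    echo "agent.conf not found in $conf_dir"
    return 0
}
```

Run agent_conf_absent without arguments on the primary and the secondary node; a non-zero exit code means the file is still there and the cleanup needs to be repeated on that node.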

#3 Admin partitions blowing up to about 4 GB in /var

This one isn't solved yet; we still have a support case open on what the correct way of handling this would be going forward. Is it adding a second disk to make /var bigger? Or just extending the current appliance disk? I'll share the answer once I have one. For now, in case you encounter the same scenario, look at the reporting feature, which is enabled by default for the base partition and all new partitions. It drives the reporting in the GUI and is something we would need in the long run. The workaround is to disable this feature so that you can clean up the so-called collector/pdb files (nscollectRRD) that hog all the space. Afterwards you can upgrade the appliances to new firmware and enable the feature again, but keep in mind that disk space usage will blow up again.
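If you want to see where the space in /var actually goes before and after the cleanup, a generic helper like the one below can do the job. It is a sketch with a made-up function name (largest_entries) and it makes no assumption about where the collector files live; it just ranks whatever sits directly under the directory you point it at.

```shell
#!/bin/sh
# Sketch: list the largest entries directly under a directory, biggest first.
# largest_entries is a hypothetical helper name; sizes are in KB (du -k).
# Defaults (/var, 10 entries) are illustrative choices.
largest_entries() {
    target="${1:-/var}"
    count="${2:-10}"
    du -sk "$target"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Calling largest_entries /var 5 prints the five biggest entries under /var; pointing it at a subdirectory from that output lets you drill down one level at a time.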

Two out of three isn't that bad. When we get case #3 resolved as well, I'll post an update to the blog.

Hope it helps!