Notes from the lab: NetScaler VPX nsnet_connect prevents logon

When I started to rebuild my lab I came across the strangest thing while configuring my NetScalers again. First, a little background on my setup:

– VMware ESXi 6.5u1 hypervisors
– NetScaler VPX 1000 Platinum appliances
– Distributed vSwitches with VLAN trunks enabled
– Dedicated NSVLAN for management (tagged)
– Data transport VLAN (tagged)

 

While configuring and setting up the primary and secondary nodes I left the default appliance import settings intact, that is 2 vCPUs and 2 GB of RAM, changed the E1000 NICs to VMXNET3 and upgraded the VM compatibility level to the latest version. Nothing wrong there, so I configured both appliances with their respective NSIPs and created the HA pair, and all was well.

 

Then it was time to add the second NIC, which I’m going to use for data transport with all the VLAN-tagged interfaces and IPs. I shut down both appliances and configured the NICs accordingly (or so it seemed at the time; it was late 😊).

 

The first node came back flawlessly, but the second node wasn’t reachable anymore. So I opened the hypervisor console and saw error messages regarding the NIC, and that the instance had crashed. When I tried to log in with the nsroot account I got “nsnet_connect prevents logon”. Well, OK, that one was familiar to me from switching between E1000 and VMXNET3 devices: I had seen it when upgrading a customer’s setup, and there the cause was the VM compatibility level, because you need the latest build to be able to use VMXNET3 and the default appliance level isn’t enough. But I had both appliances up to date. I thought what the !%!@% and logged in with the nsrecover account to get to the shell and dig deeper. Thank god that worked, and I was able to run ns_hw_err.bash, which checks for hardware errors. And yes, I instantly got a message that the NIC was not present and not reachable. I looked at the NIC configuration and, in a nice Homer Simpson moment, found that the NIC in question was still an E1000. Right. So I turned the appliance off, removed the NIC, re-added it with the same MAC address and presto, all was well again.

 

Moral of the story: double-check your network settings when using VMXNET3!
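To make that double check less dependent on tired eyes, here is a minimal sketch in Python. It assumes you already have the adapter type of every virtual NIC per VM in some form (from an export or the hypervisor's API); the inventory dictionary, the VM names and the function name are all made up for illustration and are not part of any VMware or Citrix tooling.

```python
from typing import Dict, List

# Assumption: every NIC on the appliances is supposed to be VMXNET3.
EXPECTED_TYPE = "VMXNET3"


def find_mismatched_nics(inventory: Dict[str, List[str]],
                         expected: str = EXPECTED_TYPE) -> Dict[str, List[int]]:
    """Return, per VM, the indexes of NICs whose adapter type is not `expected`.

    `inventory` is a hypothetical mapping of VM name -> adapter type of each
    virtual NIC, in order. VMs without mismatches are left out of the result.
    """
    mismatches: Dict[str, List[int]] = {}
    for vm_name, adapter_types in inventory.items():
        bad = [index for index, adapter in enumerate(adapter_types)
               if adapter.strip().upper() != expected.upper()]
        if bad:
            mismatches[vm_name] = bad
    return mismatches


if __name__ == "__main__":
    # Made-up lab inventory mirroring the situation described above.
    lab = {
        "vpx-node1": ["VMXNET3", "VMXNET3"],
        "vpx-node2": ["VMXNET3", "E1000"],  # the forgotten second NIC
    }
    for vm, nic_indexes in find_mismatched_nics(lab).items():
        print(f"{vm}: NIC index(es) {nic_indexes} are not {EXPECTED_TYPE}")
```

Running something like this before powering the appliances back on would have flagged the second node straight away.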

Notes from the field: XenMobile Certificate Based Authentication lessons learned

Throughout my XenMobile deployments with Certificate Based Authentication (CBA) I came across some items which I thought were worth mentioning.

1. Up until Secure Mail 10.6.20 / Secure Hub 10.6.20, CBA requested new certificates on SSL exceptions. In effect, an exception was triggered on every SSL connection error that occurred, and each one resulted in a new certificate request to the PKI. This was resolved in version 10.6.20 by no longer using Java codes and instead reading the NetScaler Gateway error code that is presented to the client.

2. The PKI / Credential Provider settings (template, validity, CRL and renewal as configured on the PKI server) won’t work for CBA, because the CBA certificate is not a payload certificate but only a SIGN method. WiFi certificates that get pushed do honor the validity, renewal and CRL options.

3. As a result of the above you might end up with a really large PKI environment, which is unnecessary, and you may therefore need to migrate to a new PKI server. This can be done side by side by creating a new PKI / Credential Provider, configuring it accordingly and migrating in a controlled fashion.

4. You might see issued certificates which are no longer valid or have been revoked, while those devices still get access to the MAM store. This is resolved when you set CRL or OCSP checking to mandatory; see the following article for more information regarding CRLs: https://docs.citrix.com/en-us/netscaler/12/ssl/manage-cert-revocate-lists.html

I hope these lessons learned help. If you have any comments or questions, please feel free to drop them here.

Notes from the field: Be Proactive! Apple ATS is coming

For those who are not aware, Apple has an upcoming change regarding App Transport Security (ATS):
https://developer.apple.com/news/?id=12212016b
It was originally due to take effect in January 2017, but was pushed back to allow for migration, and the new date is still a mystery.

It will have an impact! Be proactive and check your XenMobile / NetScaler environments:

– NetScaler 11.1 will be the preferred build for TLS 1.2 and the ECDHE cipher suites
– XenMobile 10.4 RP4 and XenMobile 10.5 have the TLS 1.2 and ECDHE cipher suites (plus ATS hotfix)

Once ATS is enforced, Apple will require at least one cipher suite from a specific list to be enabled. The ATS cipher suites supported by Apple are:
· TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
· TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
· TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
· TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
· TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
· TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
· TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
· TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
· TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
· TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
· TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

If SSL-Offloading is used in combination with XenMobile, remember that 11.1 is the preferred build.
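The "at least one cipher suite from the list" rule is easy to check mechanically. The following is a minimal sketch in Python, assuming you have the cipher suites enabled on a virtual server available as a list of names in the same TLS_... notation used above. The function names are made up for this example, and translating an appliance's own cipher naming into this notation is deliberately left out.

```python
from typing import Iterable, List

# The eleven suites from the list above.
ATS_CIPHER_SUITES = frozenset({
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384",
    "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA",
    "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
})


def ats_suites_enabled(enabled_suites: Iterable[str]) -> List[str]:
    """Return the enabled suites that appear in the ATS list, sorted by name."""
    normalized = {name.strip().upper() for name in enabled_suites}
    return sorted(normalized & ATS_CIPHER_SUITES)


def meets_ats_cipher_requirement(enabled_suites: Iterable[str]) -> bool:
    """True if at least one enabled suite is on the ATS list."""
    return bool(ats_suites_enabled(enabled_suites))


if __name__ == "__main__":
    # Hypothetical set of suites enabled on a virtual server.
    enabled = [
        "TLS_RSA_WITH_AES_256_CBC_SHA",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    ]
    print("ATS suites enabled:", ats_suites_enabled(enabled))
    print("Meets ATS cipher requirement:", meets_ats_cipher_requirement(enabled))
```

This only covers the cipher suite part; ATS also expects TLS 1.2, so treat a positive result as a first check rather than proof that the environment is ready.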

https://docs.citrix.com/en-us/netscaler/11-1/ssl/supported-ciphers-list-release-11.html
https://docs.citrix.com/en-us/netscaler/11-1/upgrade-downgrade-netscaler-appliance/upgrade-to-release-11-1.html
https://support.citrix.com/article/CTX126793

Notes from the field: NetScaler SDX LACP Flapping issue

I came across a peculiar issue with a new NetScaler SDX 14020 setup in combination with a Cisco Nexus C9372-PX-E and C9336PQ infrastructure: a fresh build of the SDX/VPX with multiple HA instances spinning and a working environment. LA sets were configured for HA probes and everything was nicely separated through VLAN access. Long story short, at first it looked like a bug in the combination of NetScaler and Cisco (https://support.citrix.com/article/CTX215720), so I created a support case with the usual follow-ups. Afterwards it turned out that the untagged management VLAN was overlapping with the data channels. The root cause was on the Cisco ACI side of things: the EPG (Endpoint Group) and Bridge Domain were overlapping. The solution was to create a new, dedicated EPG/Bridge Domain for the data channels of the NetScaler.

So lessons learned:

  • Double check the setup of the ACI even if you get the “yes it’s correct” statement from your customer

Notes from the field: XenMobile CBA, didn’t I revoke that cert?

To start off, I’m assuming that the following is in place and fully configured, and that you are familiar with these concepts:

– XenMobile 10.x cluster (XMS)
– Active Directory (AD)
– Active Directory Certificate Services (ADCS)
– Active Directory Certificate Template(s)
– NetScaler Gateway (NSGW)
– Certificate Based Authentication (CBA)

All of these are combined in a XenMobile deployment that is configured to use CBA as an enrollment requirement.

I came across a limitation/by-design issue in conjunction with the web enrollment of ADCS that XMS cannot solve. Enrollment and first-time requests work just fine, but when you revoke or selectively wipe a device/user and that user enrolls again, you get a cached certificate from XMS (you say what…). Revocation in XMS itself works just fine, but not at this point: according to support, the API used in ADCS is not capable of doing a revocation, and XMS basically uses the web enrollment for this and relies on it.

If you want to check it, just enroll a user with the above setup and see for yourself: the user gets revoked, you revoke the user certificate in ADCS and enroll again, and you will see the cached certificate being issued from XMS (and no newly issued certificate from ADCS).

But there is a workaround for this: query the XMS database for the certificate and delete the user certificate in question.
The following query returns the certificates which are present on XMS:
SELECT * FROM dbo.keystore WHERE name LIKE '%ag%'

To delete the certificate, execute this query with your ID (in my case 22):
DELETE FROM dbo.keystore WHERE id = 22

After this the cached gateway certificate is deleted, and a new enrollment will get a new certificate.
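If you would rather rehearse the lookup-then-delete flow before touching the real XMS database, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in, not the actual XMS database. Only the id and name columns referenced by the two queries above are modelled; the sample rows and function names are invented for the example.

```python
import sqlite3
from typing import List, Tuple


def find_cached_certs(conn: sqlite3.Connection,
                      pattern: str = "%ag%") -> List[Tuple[int, str]]:
    """List (id, name) of keystore rows whose name matches the LIKE pattern."""
    cursor = conn.execute(
        "SELECT id, name FROM keystore WHERE name LIKE ? ORDER BY id",
        (pattern,),
    )
    return cursor.fetchall()


def delete_cached_cert(conn: sqlite3.Connection, cert_id: int) -> int:
    """Delete one keystore row by id and return the number of rows removed."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute("DELETE FROM keystore WHERE id = ?", (cert_id,))
    return cursor.rowcount


if __name__ == "__main__":
    # Stand-in table with made-up rows; the real table is dbo.keystore on XMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keystore (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO keystore (id, name) VALUES (?, ?)",
        [(21, "server-cert"), (22, "ag-user1")],
    )
    print("Matching rows:", find_cached_certs(conn))
    print("Rows deleted:", delete_cached_cert(conn, 22))
    print("Matching rows afterwards:", find_cached_certs(conn))
```

The point of the pattern is to look first and delete by exact ID afterwards, so a LIKE match that is broader than expected never ends up in the delete statement.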

UPDATE:

When you combine the above with a CRL or OCSP integration on the NetScaler, the device automatically requests a renewal, meaning no manual action is needed anymore. This seems to be built-in client-side behaviour (Secure Hub); see the following article for more information: https://docs.citrix.com/en-us/netscaler/11-1/ssl/manage-cert-revocate-lists.html

Notes from the field: NetScaler Insight Centre not showing data

I’ve come across an issue with the NetScaler Insight Centre where data is not shown all the time: at random it just fails to report and shows nothing. After a talk with support, it seems that memory corruption occurs when Insight’s memory usage is above 75%.

The fix should be included in the 11.0.67.x release of the product.

Notes from the field: NetScaler Insight Centre

I came across an issue with NetScaler Insight on the latest build of NetScaler 11 (and the same build of Insight): logging for the GUI flowcharts did not reach the appliance. We did see traffic flowing to and from the Insight Centre, but no updates on the GUI screen. After some digging around and reporting this to Citrix, it turns out to be a bug related to the Integrated Caching feature, which needs to be disabled, otherwise it won’t work at all! OK, that’s nice. A permanent fix is yet to be developed.