Notes from the lab: vRealize Log Insight Cluster Upgrade 1-2-3

I must say I’m very impressed by how simple and stable VMware has made the upgrade process for vRealize Log Insight.

First, a little background on my deployment:
– Three node vRealize Log Insight 4.6.0 cluster
– Integrated Load Balancer (ILB) configured
– vSphere 6.7 as hypervisor platform

I had the deployment running for a while and saw that 4.6.1 was available. I downloaded the upgrade .pak file from My VMware, logged in to my Log Insight cluster address and started the upgrade. I was prompted to redirect to the master node to follow the upgrade progress, and that was it, nothing else to do! Either it works and every node gets rebooted automatically, or it fails and all nodes are rolled back.

For reference I’ve taken some screenshots of the process:

Hope this helps.

Notes from the field: Ghost NIC on VMware

Quite recently a customer complained that two virtual machines had ghost NICs attached. Well, it doesn’t always have to be hard in our line of work 😊. After a quick look it was clear that those VMs had snapshots in place that still referenced old, deleted NICs.

After removing the snapshots, the ghost NICs were gone.

See the following reference screenshot of the ghost NIC and the distributed port group NIC:

 

Hope this helps.

Notes from the field: NetScaler VPX & Intel Xeon Gold

Quite recently I came across an issue when deploying a VPX instance on VMware ESXi 6.5, which turned out to be a bug in the combination of the VPX image and the underlying physical hardware.
For reference the following hardware was backing the hypervisor:
Supermicro SYS-2029U-E1CR25M
Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
VMware ESXi, 6.5.0, 7967591 with vSAN
NetScaler VPX 12.0 57.24nc

When you deploy the VPX appliance it gets the default VM version 7, which needs to be upgraded to VM version 11/13 to support VMXNET3 NIC interfaces. Easily said and done: I configured the setup, booted the appliance and got stumped by the following error:

Right.. that’s a beauty. After some troubleshooting and migrating the VM to another host/cluster it started booting. Migrated it back and it crashed again. I re-imported the appliance, let it stay at hardware version 7, then upgraded it to 11 and it still worked. Went to 13 again and it crashed. OK, so this is not a user error :).

I logged a case with Citrix support and after some time got the reply that this is a known issue with Intel Xeon Gold processors. The fix should be in a NetScaler 12.1 build, which is expected to be released in Q3 this year.

For now the workaround is to keep the VM at hardware version 11. As a reference, case# 77116209 can be used to log the same issue if it applies to your setup.

UPDATE:

I’ve gotten an update from support that this will be resolved in the upcoming 12.1.49.x release for VPX and NMAS, scheduled for the end of August this year.

Notes from the field: NetScaler maxloginattempts

I came across a very peculiar issue at a customer regarding the following values:

  1. Max Login Attempts
  2. Failed Login Timeout

As soon as a value had been entered, you could not reset it to the default value of 0. Neither the GUI nor the CLI would accept 0 as a value.

After a support case and some days of waiting, the solution turned out to be simple, but I didn’t have it in mind: a plain unset command on the vpn vserver for maxloginattempts resolves it.
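For reference, a minimal sketch of what that unset looks like. The Python helper below only builds the CLI line to paste into the NetScaler shell. The vserver name is made up, and the parameter spelling -maxLoginAttempts is my assumption based on the setting name, so check it against the command reference for your build.

```python
def unset_vpn_vserver(vserver, params=("maxLoginAttempts",)):
    """Build a NetScaler CLI line that unsets the given vpn vserver parameters.

    Unsetting returns a parameter to its default instead of trying to set 0.
    """
    flags = " ".join(f"-{name}" for name in params)
    # Quote the name so a vserver name containing spaces stays one argument.
    return f'unset vpn vserver "{vserver}" {flags}'


if __name__ == "__main__":
    # "vpn_gateway_example" is a hypothetical vserver name.
    print(unset_vpn_vserver("vpn_gateway_example"))
```

The point is that unset is a separate verb from set: instead of writing the value 0, it hands the parameter back to its default.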

Hope it helps you out as well.

UPDATE: https://support.citrix.com/article/CTX234177 got published for this

Notes from the field: uberAgent to the rescue!!

We all know it: the once-in-a-while “it’s slow logging on..” that then gets dropped at the escalation desk for a resolution. So I got the call to troubleshoot this issue. Since I knew from previous experience that uberAgent is the troubleshooting tool you want for this, I requested a consulting license at https://uberagent.com/ (thanks to Helge Klein), installed Splunk and uberAgent, and got myself a monitoring baseline to work with. A little background on the setup:

  • vSphere 6.0
  • XenDesktop 7.15 / MCS – Windows 8.1 & Windows Server 2012 R2
  • RES WorkspaceManager 10.1.300.1

The problem was that at times users would see a profile initialization of 90 seconds(!), and at times the user shell would hang.

After a period of two weeks I had my baseline with uberAgent and found that the issue occurred at random, very early at the start of the day or just after break time. No funny business whatsoever in the environment and no lack of resources, e.g. IOPS or CPU/memory exhaustion. Drilling down into some user trending with uberAgent, I came to a somewhat recurring user base that experienced the issue. OK! That helps, and after that I could reproduce it with the user accounts in question, which displayed the following screen:

I dropped this in the resrockstars.slack.com group and got a quick reply from Dennis van Dam regarding traceviewer, which led me to the following:

This in turn pointed me to the following support article:

Problem resolved and a happy customer! Hope this helps you out as well.

Reference article:

HOWTO: Create a trace file

Notes from the lab: NetScaler VPX nsnet_connect prevents logon

When I started to rebuild my lab I came across the strangest thing when configuring my NetScalers again. First a little background regarding my setup:

– VMware ESXi 6.5u1 hypervisors
– NetScaler VPX 1000 Platinum appliances
– Distributed vSwitches with VLAN trunks enabled
– Dedicated NSVLAN for management (tagged)
– Data transport VLAN tagged

 

Whilst configuring and setting up the first and second nodes I left the default appliance import settings intact, that is 2 vCPUs and 2 GB of RAM, changed the E1000 NICs to VMXNET3 and upgraded the VM compatibility to the latest level. Nothing wrong here, so I started configuring both appliances with their respective NSIPs. Created the HA pair and all was well.

 

Then it was time to add the second NIC, which I’m going to use for my data transport with all VLAN-tagged interfaces and IPs. I shut down both appliances and configured the NICs accordingly (or so it seemed at the time, it was late 😊).

 

The first node came back flawlessly, but the second node wasn’t reachable anymore. So I opened the hypervisor console and saw error messages regarding the NIC, and that the instance had crashed. When I tried to log in with the nsroot account I got “nsnet_connect prevents logon”. Well, OK, that one was familiar to me with the switch between E1000 and VMXNET3 devices in mind. (I had this when upgrading a customer’s setup, and there the cause was the VM compatibility level: you need the latest build to be able to use VMXNET3, the default appliance level isn’t enough.) But I had both appliances up to date…

I thought what the !%!@% and logged in with the nsrecover username to get to the shell and dig in deeper. Thank god that worked, and I was able to run the command ns_hw_err.bash, which checks for hardware errors. And yes, I instantly got a message that the NIC was not present or reachable. I looked at the configuration of the NICs and had a nice Homer Simpson moment: the NIC in question was still an E1000. Right… so I powered the appliance off, removed the NIC, re-added it as VMXNET3 with the same MAC, and presto, all is well again.

 

Moral of the story: double-check your network settings when using VMXNET3!!!!

Notes from the field: XenDesktop RemotePC and Multi Licensing

Recently I got involved at a customer location that was going to use Remote PC catalogs in combination with their XenDesktop / XenApp 7.15 environment. This was no problem whatsoever to configure, but on closer testing I encountered a bug: when you create, for example, a delivery group called “Windows 10 Remote PC” and add more than one desktop, the second, third and so on get the local computer name as their published name, e.g. WSDELL34951, which doesn’t comply with a standard name. The following can be observed for the delivery group name:

Normally you would see an empty value at “PublishedName”. To correct this, take note of the “Uid” number and put in the following command:

In this case my id was 4, and voila, this corrects the name in StoreFront, like in the following screenshot:

For the Multi Licensing part, this needs to be done at the same level in PowerShell, see the following article: Multi-type licensing

In the previous screenshot you will see “LicenseModel” & “ProductCode”. These need to match the edition and license model of the XenApp or XenDesktop license in use. Licensing is then managed per delivery group and no longer applies to the entire site. Every new delivery group gets the site default unless, like in the screenshot above, you set “LicenseModel” & “ProductCode” on it.

 

Hope this helps!

Notes from the lab: Exchange Server 2016 CU6 broken by default??

I came across the most peculiar issue I’ve seen so far with Exchange 2016.
Installed a greenfield setup and the ECP/OWA page was broken by default with the following entry in event viewer:
——————————————————————————————————————————————————–
Event code: 3005
Event message: An unhandled exception has occurred.
Event time: 9-9-2017 22:26:57
Event time (UTC): 9-9-2017 20:26:57
Event ID: 53b3f1166cb147408cb97bc79483c3f5
Event sequence: 2
Event occurrence: 1
Event detail code: 0

Application information:
Application domain: /LM/W3SVC/2/ROOT/owa-4-131494624100042355
Trust level: Full
Application Virtual Path: /owa
Application Path: C:\Program Files\Microsoft\Exchange Server\V15\ClientAccess\owa\
Machine name: EX01

Process information:
Process ID: 7756
Process name: w3wp.exe
Account name: NT AUTHORITY\SYSTEM

Exception information:
Exception type: TargetInvocationException
Exception message: Exception has been thrown by the target of an invocation.
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at Owin.Loader.DefaultLoader.<>c__DisplayClass12.<MakeDelegate>b__b(IAppBuilder builder)
at Owin.Loader.DefaultLoader.<>c__DisplayClass1.<LoadImplementation>b__0(IAppBuilder builder)
at Microsoft.Owin.Host.SystemWeb.OwinAppContext.Initialize(Action`1 startup)
at Microsoft.Owin.Host.SystemWeb.OwinBuilder.Build(Action`1 startup)
at Microsoft.Owin.Host.SystemWeb.OwinHttpModule.InitializeBlueprint()
at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)
at Microsoft.Owin.Host.SystemWeb.OwinHttpModule.Init(HttpApplication context)
at System.Web.HttpApplication.RegisterEventSubscriptionsWithIIS(IntPtr appContext, HttpContext context, MethodInfo[] handlers)
at System.Web.HttpApplication.InitSpecial(HttpApplicationState state, MethodInfo[] handlers, IntPtr appContext, HttpContext context)
at System.Web.HttpApplicationFactory.GetSpecialApplicationInstance(IntPtr appContext, HttpContext context)
at System.Web.Hosting.PipelineRuntime.InitializeApplication(IntPtr appContext)

Encryption certificate is absent
at Microsoft.Exchange.Security.Authentication.Utility.GetCertificates()
at Microsoft.Exchange.Clients.Owa2.Server.Core.notifications.SignalR.SignalRStartup.Configuration(IAppBuilder app)

Request information:
Request URL: https://localhost:444/owa/exhealth.check
Request path: /owa/exhealth.check
User host address: 127.0.0.1
User:
Is authenticated: False
Authentication Type:
Thread account name: NT AUTHORITY\SYSTEM

Thread information:
Thread ID: 25
Thread account name: NT AUTHORITY\SYSTEM
Is impersonating: False
Stack trace: at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at Owin.Loader.DefaultLoader.<>c__DisplayClass12.<MakeDelegate>b__b(IAppBuilder builder)
at Owin.Loader.DefaultLoader.<>c__DisplayClass1.<LoadImplementation>b__0(IAppBuilder builder)
at Microsoft.Owin.Host.SystemWeb.OwinAppContext.Initialize(Action`1 startup)
at Microsoft.Owin.Host.SystemWeb.OwinBuilder.Build(Action`1 startup)
at Microsoft.Owin.Host.SystemWeb.OwinHttpModule.InitializeBlueprint()
at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)
at Microsoft.Owin.Host.SystemWeb.OwinHttpModule.Init(HttpApplication context)
at System.Web.HttpApplication.RegisterEventSubscriptionsWithIIS(IntPtr appContext, HttpContext context, MethodInfo[] handlers)
at System.Web.HttpApplication.InitSpecial(HttpApplicationState state, MethodInfo[] handlers, IntPtr appContext, HttpContext context)
at System.Web.HttpApplicationFactory.GetSpecialApplicationInstance(IntPtr appContext, HttpContext context)
at System.Web.Hosting.PipelineRuntime.InitializeApplication(IntPtr appContext)

Custom event details:
———————————————————————————————————————————————————
After some digging I came across these blog posts: https://justaucguy.wordpress.com/2014/12/01/exchange-2013-cu6-owa-something-went-wrong/ and https://blogs.technet.microsoft.com/rmilne/2017/06/27/exchange-2016-cu6-released/
The first one mentions replacing the SharedWebConfig, which didn’t match my error, but I tried it anyway without any change. The other one pointed me towards certificates… okay, I checked them via the Exchange Management Shell and found no issues there either.

Finally I found the bugger in IIS. It appears that a wrong certificate got bound at installation (yes, two clean servers and even some re-runs in other lab setups gave me the same result). The solution was to unbind the certificate it had, bind the Microsoft Exchange Server Auth Certificate instead and run an iisreset.

The problem was instantly solved in my case. (The second blog post above mentions that in an upgrade scenario the Microsoft Exchange Server Auth Certificate could get deleted, so beware!!)
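In hindsight the clue was sitting in the event itself: the only line in the exception block that is neither a “Field: value” pair nor a stack frame is “Encryption certificate is absent”. As a rough sketch, assuming the event text has been copied out of Event Viewer as plain text in the layout shown above, a few lines of Python can pull that line out. The function name is made up for illustration.

```python
def root_cause_lines(event_text):
    """Return lines of the 'Exception information' block that are neither
    'Field: value' pairs nor stack frames (lines starting with 'at ')."""
    found = []
    in_block = False
    for raw in event_text.splitlines():
        line = raw.strip()
        if line == "Exception information:":
            in_block = True
            continue
        if in_block and line.endswith("information:"):
            break  # next section header, e.g. "Request information:"
        if not in_block or not line:
            continue
        if line.startswith("at ") or ": " in line:
            continue  # stack frame or "Exception type: ..." style field
        found.append(line)
    return found
```

Fed the event above, this should leave just the certificate message, which is a lot quicker to spot than scrolling through two copies of the same stack trace.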

See the following reference regarding the binding in IIS:

Hope this helps!

 

Notes from the lab: Windows Server 2016 black screen when launching any application

I came across an issue in my lab environment where the screen goes black while launching a session on Windows Server 2016. This is with XenApp/XenDesktop 7.15 LTSR.
Setting the registry entry DisableLogonUISuppression (DWORD value 0), as described in the following articles, did not resolve the issue:
https://support.microsoft.com/en-us/help/4034661/windows-10-update-kb4034661 and https://support.citrix.com/article/CTX225819

Ultimately, after some trial and error, deleting all subkeys under the registry keys below resolved it:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Connectivity
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Configuration
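If you need to repeat this on more than one machine, the cleanup can be scripted. Below is a minimal Python sketch, not what I ran in the lab: it removes every subkey under a key but keeps the key itself. The function name is made up, and the registry module is passed in as an argument (an assumption of this sketch) so the logic can be dry-run on a non-Windows box. Export the keys first and run it elevated.

```python
GRAPHICS_KEYS = (
    r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Connectivity",
    r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Configuration",
)


def delete_subkeys(reg, root, path):
    """Delete every subkey below `path`, keeping the key itself.

    `reg` is the standard-library winreg module (or a stand-in with the same
    OpenKey/EnumKey/DeleteKey calls). Returns the deleted paths, deepest first.
    """
    deleted = []
    with reg.OpenKey(root, path, 0, reg.KEY_ALL_ACCESS) as key:
        while True:
            try:
                # Always ask for index 0: the list shrinks as we delete.
                child = reg.EnumKey(key, 0)
            except OSError:
                break  # no subkeys left
            child_path = path + "\\" + child
            # DeleteKey refuses keys that still have children, so empty it first.
            deleted += delete_subkeys(reg, root, child_path)
            reg.DeleteKey(key, child)
            deleted.append(child_path)
    return deleted


# On the affected server:
#     import winreg
#     for path in GRAPHICS_KEYS:
#         delete_subkeys(winreg, winreg.HKEY_LOCAL_MACHINE, path)
```

The recursion is there because a registry key that still has children cannot be deleted in one call, and the subkeys under these two keys are nested.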