Extremely slow app installs: SCCM OSD task sequence

Issue

As part of an overall SCCM health assessment, a client engaged WME to troubleshoot image deployments that were taking over four hours. The IT folks said the majority of that time elapsed during the Install Applications step of the OSD task sequence, which was quickly confirmed to be the case. They informed me that the problem had started around the time they implemented the Cloud Management Gateway (CMG) and switched to HTTPS for all SCCM client communication. That timing turned out to be a red herring that consumed some troubleshooting time: secure client-server communication was configured correctly and functioning normally, as was the CMG. As a reference point, the client upgraded from CB 1710 to CB 1803 during the engagement.

Now, we know that installing applications via task sequence can be problematic. The blogosphere has plenty of opinions about not using this method, with some suggesting that one should use packages/programs for task sequences and applications for everything else. That approach is not ideal because it requires double work for any application one wishes to deploy during imaging. Another approach is to deploy apps to collections and allow them to install after a machine is imaged. This is a sound approach, but in this case the client wished to deploy certain apps during OSD, so a root cause analysis was necessary.

Live Microsoft Message Analyzer traces during image deployment did not reveal anything noteworthy, other than that there did not seem to be much communication between the laptop being imaged and the distribution point while the Install Application steps were running, with 17 apps in the task sequence. (We did import the server’s certificate to decode frames, a process worthy of its own blog post, which thankfully others have already written.) As is often the case, log files started to reveal some clues, but even then it took a bit of deciphering and a secret decoder ring.

Clue #1

The CAS.log file showed a long list of content sources, only two of which referred to a distribution point, while the rest were peers. This is not a bad thing in itself: it illustrates that peer caching is working as advertised. This screenshot of the peer cache dashboard in SCCM shows that a large percentage of content is being delivered by peers. The top application is obfuscated on purpose; the others are all Microsoft content.

The following snippet from the CAS.log file shows that there are 43 available sources for content. The first several are shown, and again, only the last two (not shown) reference a distribution point. All others are peers:

Clue #2

The ContentTransferManager.log file illustrated what was occurring under the covers: the client was attempting to connect, in sequence, to each source in the DP list provided by the management point (shown in the CAS.log file), but was timing out on each until finally landing on the one good source: the distribution point. This was taking upwards of 15 minutes per application, not counting actual installation time: 17 apps × 15 minutes = 255 minutes, or 4.25 hours.

Note the start time in the first entry and the completion time in the last entry below: some 15 minutes elapsed between the first connection attempt and the successful completion of content transfer from the DP.
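The arithmetic behind that delay can be sketched in a few lines of Python. This is a rough model, not a reproduction of the client's actual behavior: it assumes the client walks the source list strictly in order and that every unreachable peer costs the same fixed timeout. The per-peer timeout is a hypothetical value back-calculated from the roughly 15 minutes observed per application spread over the 41 peers; it is not a documented SCCM setting, and the source names are placeholders.

```python
# Rough model of sequential content-source fallback (illustrative only).
# Assumption: each unreachable peer costs the same fixed timeout, and the
# client only reaches the distribution point after exhausting the peers.

PEER_COUNT = 41            # peers listed ahead of the DPs in CAS.log
APP_COUNT = 17             # applications in the task sequence
OBSERVED_DELAY_MIN = 15    # approximate delay seen per application

# Hypothetical per-peer timeout implied by the observations (not an SCCM setting).
implied_timeout_sec = OBSERVED_DELAY_MIN * 60 / PEER_COUNT


def delay_before_dp(sources, timeout_sec):
    """Return seconds spent timing out before the first reachable source."""
    waited = 0.0
    for _name, reachable in sources:
        if reachable:
            return waited
        waited += timeout_sec
    return waited


# Source names are placeholders; only the ordering (peers first, DP last) matters.
sources = [(f"peer-{i}", False) for i in range(PEER_COUNT)] + [("dp-1", True)]

per_app_min = delay_before_dp(sources, implied_timeout_sec) / 60
total_hours = per_app_min * APP_COUNT / 60

print(f"implied timeout per peer: {implied_timeout_sec:.0f} s")
print(f"delay per application:    {per_app_min:.0f} min")
print(f"delay across {APP_COUNT} apps:     {total_hours:.2f} h")
```

Under these assumptions the wait scales linearly with the number of dead peers ahead of the distribution point, which is why a long stale peer list hurts so much more than a short one.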

Root cause

In short, peer caching was the culprit, or rather the peers listed in the SuperPeers table that were not available but that the client nevertheless insisted on trying for content. Some quick DNS queries to identify the subnets the peers were located on revealed something interesting: a number of the peers were VPN clients connecting from elsewhere via the Internet! (Excluding VPN clients from peer caching was included in the recommendations to the customer.) This is not good, but what was also interesting was that none of the 41 peers, even the ones on the same local network, appeared to be communicating. Indeed, we were seeing connection errors in other logs.
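Sorting resolved peer addresses by subnet is easy to script. The following is a minimal sketch using Python's standard `ipaddress` module. The subnets and peer addresses are made-up placeholders, not values from this engagement, and the sketch assumes you already know which ranges your VPN hands out.

```python
import ipaddress

# Hypothetical ranges: substitute the subnets used in your environment.
VPN_SUBNETS = [ipaddress.ip_network("10.99.0.0/16")]
LOCAL_SUBNETS = [ipaddress.ip_network("10.1.20.0/24")]


def classify_peer(address):
    """Label a peer address as 'vpn', 'local', or 'other' by subnet membership."""
    ip = ipaddress.ip_address(address)
    if any(ip in net for net in VPN_SUBNETS):
        return "vpn"
    if any(ip in net for net in LOCAL_SUBNETS):
        return "local"
    return "other"


# Placeholder addresses standing in for the results of the DNS lookups.
peer_addresses = ["10.1.20.15", "10.99.4.7", "10.99.12.200", "10.3.8.41"]

by_label = {}
for addr in peer_addresses:
    by_label.setdefault(classify_peer(addr), []).append(addr)

for label, addrs in sorted(by_label.items()):
    print(f"{label}: {', '.join(addrs)}")
```

Peers that land in the `vpn` bucket are the candidates to exclude from peer caching.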

Solution

Given that we had to provide a solution in a limited amount of time, the client opted to disable peer caching for the time being to see if things improved. Alas, things did not improve! Why?

Answer: disabling peer caching in the SCCM client settings does not switch it off immediately. Clients must notify the management point to remove them as a content source. Until then, the SuperPeers table in the database is still populated and clients will continue to attempt to use peers for content. A quick SQL query revealed 1,143 entries in the SuperPeers table in the CM database.

As is often the case, I am exceedingly grateful to my esteemed colleagues in the blogosphere, the majority of whom I have never met. In this case, the blog post "The strange case of Peer Cache not getting disabled" details how to create a device collection with a script associated with it that tells clients to notify the management point to remove them as a content location. That would be the recommended and least risky approach. The same post also shows a quick and dirty approach to purging the entries from the two tables in question: SuperPeerContentMap and SuperPeers.

WARNING: insert whatever “use at your own risk” verbiage works for you here. Oh, and before you do *anything* directly in the database EVER *always* perform a backup (yes a real time backup right before you start mucking around in there).

I can neither confirm nor deny which approach was taken in this case. I can, however, confirm that the result was success! The slowness issue disappeared and the OSD task sequence buzzed through application installations as expected.

Now we cannot just leave things hanging with peer caching disabled, so check back for part 2 of this series on peer caching for recommendations on optimizing peer caching and when not to use it.
