With a single click, a user can destroy Google Analytics data: Moving from an AMP page to the main site or the main site to a payment processor can turn one visit into multiple sessions, mucking up source data along the way.
Critically, those clicks often happen at high-value transition points—from anonymous visitor to logged-in user or from the pre- to post-purchase moment.
Session stitching repairs technical fault lines, preserving clean analytics data and rescuing attribution information. This post covers four common use cases:
- User ID tracking
- AMP tracking
- Subdomain tracking
- Cross-domain tracking
What is session stitching?
In Google Analytics, session stitching connects user activities that occur within a single session but—because of technical tracking limitations—incorrectly generate multiple sessions.
Any effort to stitch sessions hinges on two elements, which Simo Ahava details:
- A tracker object that tracks to the same Google Analytics Property ID (UA-XXXXXX-Y)
- A _ga cookie that has the same Client ID
Modern browsers do not allow sites from one domain to share cookies with another domain. Overcoming that limitation is central to patching together sessions that otherwise rip apart.
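To make those two elements concrete, here is a minimal sketch. It assumes the common Universal Analytics cookie layout, in which the `_ga` value looks like `GA1.2.1033501218.1368477899` and the Client ID is the last two dot-separated fields. The function names are hypothetical, not part of any Google library.

```typescript
// Sketch: pull the Client ID out of a `_ga` cookie value.
// Assumes the layout "GA1.<domain depth>.<random>.<timestamp>".
function clientIdFromGaCookie(cookieValue: string): string | null {
  const parts = cookieValue.trim().split(".");
  if (parts.length < 4) {
    return null; // not in the expected layout
  }
  // The Client ID is the final two fields: "<random>.<timestamp>"
  return parts.slice(-2).join(".");
}

interface TrackedHit {
  propertyId: string; // e.g. "UA-XXXXXX-Y"
  clientId: string;
}

// Two hits are candidates for the same session only if both elements match.
function canStitch(a: TrackedHit, b: TrackedHit): boolean {
  return a.propertyId === b.propertyId && a.clientId === b.clientId;
}
```

If either value differs between two pages, Google Analytics has no way to tell that the same person generated both hits.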
Digital marketers use the term “session stitching” more often than Google does; Google has most recently preferred two other terms:
- Session unification. “Session unification allows hits collected before the User ID is assigned to be associated with an ID.”
- Site linking. “Cross domain tracking makes it possible for Analytics to see this as a single session by a single user. This is sometimes called site linking.”
Notably, the Google Analytics support article on session unification retains the phrase “session stitching” in image alt text and titles (for the curious); a stray reference to session stitching also appears in the support article for the Google AMP Client ID API.
In this article, I use session stitching as an overarching term that includes all efforts to correctly group user activities that occur within a single session. As Yehoshua Coren notes, the underlying technical reality is more important than the terminology: “It’s really more than that. It’s a clientId integrity issue.”
The challenge—no matter what you call it—impacts data quality and attribution for almost every site.
Who should care most about session stitching?
Session stitching is useful for every site but essential for a few:
- Sites with logins. Sites with logins rely on “session unification” to gather data on events that lead to a user login.
- AMP-heavy sites. Proper tracking preserves attribution data when users migrate from an AMP page to a locally hosted non-AMP page.
- Large, multi-domain organizations. Multiple domains require cross-domain tracking to preserve attribution information as users migrate across domains (or subdomains).
- Sites with third-party payment processors. Without session stitching, sites that rely on third-party payment processors may lose attribution data for ecommerce conversions.
- Sites that use social logins. Like third-party payment processors, social logins can incorrectly reclassify post-login users as referrals from the social network.
- Sites with iframe forms. Iframes embed a cross-domain tracking challenge within a page on your site.
1. User ID tracking
For sites with a log-in feature, User ID tracking connects multiple visits over time—allowing a business to see, for example, which SaaS features resonate with customers.
Session unification joins the pre-login activity with the post-login User ID—creating a single session from the two. That way, you can see which behaviors precede a log-in, something especially valuable if that login represents a point of conversion.
Thus, instead of capturing only the post-login part of the session, session unification joins the pre-login hits with the post-login hits.
Importantly, session unification connects hits only if “those hits happen within the same session in which a specific ID value is assigned for the first time.” In other words, it includes data from the session that precedes the login—not previous sessions.
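The rule in that quote can be illustrated with a toy model. This is only a sketch of the stated behavior, not Google's processing logic: the `SketchHit` shape and the function name are hypothetical, and hits are assumed to arrive in chronological order.

```typescript
interface SketchHit {
  sessionId: string;
  page: string;
  userId?: string; // present only on hits sent after login
}

// Toy model: backfill a User ID onto pre-login hits, but only in the
// session where that ID value is assigned for the first time.
function unifySessionHits(hits: SketchHit[]): SketchHit[] {
  const firstSessionForId: Record<string, string> = {};
  const idToBackfill: Record<string, string> = {};

  for (const hit of hits) {
    const id = hit.userId;
    if (id !== undefined && firstSessionForId[id] === undefined) {
      firstSessionForId[id] = hit.sessionId;
      if (idToBackfill[hit.sessionId] === undefined) {
        idToBackfill[hit.sessionId] = id;
      }
    }
  }

  return hits.map((hit) => {
    const id = idToBackfill[hit.sessionId];
    if (hit.userId === undefined && id !== undefined) {
      return { sessionId: hit.sessionId, page: hit.page, userId: id };
    }
    return hit; // earlier and later sessions are left untouched
  });
}
```

In this model, a pre-login pageview in the first login session picks up the User ID, while anonymous hits from a previous session (or from a later session in which the same user logs in again) stay anonymous.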
Google Analytics applies session unification during daily analytics processing—“at 5am each day, based on the western most timezone selected in any reporting view that is associated with the property.”
That time lag can lead to “higher direct sessions and direct revenue during intra-day dates because [. . .] campaign referral information is sent during the first hit of a session where the user has not yet logged in.”
By default, session unification toggles On when you set up User ID tracking. Why would you ever want to turn it off? I asked Silver Ringvee, one of our Speero agency analysts. While he didn’t see an obvious use case, he speculated on a potential one:
There could be some occasions, though, where you really want to focus on the actual logged in users (and not users that logged in at some point of the journey). So, if you don’t care about what happened before they got the ID, you might want to turn it off.
You can turn off session unification in Admin > Property > Tracking Info > User-ID.
While User ID tracking is most relevant for sites with expected logins (e.g. SaaS, ecommerce), there are other ways to incentivize login. Sites like Quora and Glassdoor gate high-value content behind log-in walls, for example.
For those login types, session unification delivers important data on the most engaging content—the answers or articles that catalyze logins and signups.
2. AMP tracking
Google’s AMP rollout created tracking issues: AMP clicks from search take users to the “AMP on cache” version, which is hosted on Google’s CDN.
As Perficient’s Eric Enge told me, “A lot of people still don’t get this correct. The subtleties of tracking cross domains (from the Google cache to your actual domain) are lost on most publishers.”
Ultimately, users can access AMP pages in one of three ways; each impacts where the Client ID is stored:
- Google Search. AMP page is accessed via a Google Search result and displayed inside an “AMP viewer.” The Client ID is stored on google.com.
- Proxy/Cache. AMP page is accessed from a proxy/cache. The Client ID is stored on cdn.ampproject.org.
- Direct AMP. AMP page is accessed directly on the publisher domain. The Client ID is stored on the publisher domain.
In the first two cases, a click to another page on the publisher’s site from the AMP page generates a referral and a new session—rather than counting the click as the second interaction in a single session.
Left unmanaged, the resulting analytics data suffers from several issues:
- Inflated session counts
- High bounce rate on AMP pages
- Low pages per session/session duration for AMP pages
As with other session stitching issues, the solution is to pass the same Client ID between pages on different domains, which Google makes possible via the AMP Client ID API.
How to use the Google AMP API to pass the same Client ID
Setting up AMP tracking has two steps: Analytics code changes and Referral Exclusions.
1. Analytics code changes. Proper AMP tracking starts with small additions to the Google Analytics code on AMP and non-AMP pages. Google provides details on how to make changes for analytics.js, gtag.js, and Google Tag Manager.
Because some browsers refuse third-party cookies, Google announced the AMP Linker in September 2018, which decorates URLs with Client ID information, bypassing the cookie-based limitation. AMP Linker does not require additional setup if you’ve already enabled the Google AMP Client ID API.
2. Referral Exclusions. Additionally, you need to add ampproject.org as a Referral Exclusion. If you serve AMP content from multiple subdomains, Google recommends adding a Referral Exclusion for each.
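For the analytics.js flavor, the code-side changes can be sketched roughly as follows. The `useAmpClientId` field and the `amp-google-client-id-api` meta tag come from Google's documentation for the AMP Client ID API; the helper name and the property ID placeholder are my own, so treat this as an illustration rather than a drop-in snippet.

```typescript
// Arguments for the analytics.js "create" command on NON-AMP pages.
type GaCreateCommand = [string, string, string, { useAmpClientId: boolean }];

// Opt the non-AMP tracker in to the AMP Client ID API.
function nonAmpCreateCommand(propertyId: string): GaCreateCommand {
  return ["create", propertyId, "auto", { useAmpClientId: true }];
}

// On AMP pages, the opt-in is a meta tag placed in the <head>.
const AMP_CLIENT_ID_META =
  '<meta name="amp-google-client-id-api" content="googleanalytics">';

// Domain to add to the Referral Exclusion list in the Analytics admin UI.
const AMP_REFERRAL_EXCLUSION = "ampproject.org";
```

On a real page, the command would be passed to the tracker as `ga('create', 'UA-XXXXXX-Y', 'auto', {useAmpClientId: true})`. The gtag.js and Google Tag Manager setups differ, so check Google's instructions for those.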
As Enge details, the current solution isn’t perfect: “You can’t see the difference between your own hosted AMP pages vs. Google CDN hosted pages.”
That limitation affects sites with “canonical AMP pages”—AMP pages hosted on the publisher domain that are the standard (canonical) version of the mobile page. A solution, the same article offers, is to create a Hit-level custom dimension.
After initial installation, the changes will affect near-term Google Analytics data:
- Total users and sessions will decline. Stitching AMP and non-AMP sessions will combine wrongly separated users and sessions.
- Related metrics will become more accurate. Bounce rate on AMP pages, for example, will drop.
- New users will rise. The Google AMP API makes a one-time reset of the Client ID for AMP visitors. As Google notes: “Depending on the frequency with which users visit your site(s), this could cause a noticeable, temporary fluctuation in your New Users metric and related reporting.”
3. Subdomain tracking
Subdomain tracking has gotten considerably easier and relies on a setting for the Cookie Domain. Previously a manual step, setting the Cookie Domain (cookieDomain) to “auto” is now the default option in Google Analytics scripts and the Google Analytics Settings variable in Google Tag Manager.
Simo Ahava explains that setting the Cookie Domain to “auto” applies a recursive algorithm that
tries to write the cookie, starting from the most generic domain-level (the top-level domain), and stopping once it succeeds. What should be left is the root domain, and thus the cookie will be available to all subdomains.
Because the algorithm sets the cookie at the highest possible level (the root domain), a user who lands on a subdomain and later migrates to the core domain won’t generate a new Client ID—or initiate a new session.
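The algorithm Ahava describes can be sketched in a few lines. This is a simplified model, not the analytics.js source: `canWrite` is a hypothetical stand-in for the browser, which refuses cookies on public suffixes such as `com` or `co.uk`.

```typescript
// Try candidate domains from most generic to most specific and return
// the first one on which a cookie write succeeds.
function findCookieDomain(
  hostname: string,
  canWrite: (domain: string) => boolean
): string | null {
  const labels = hostname.split(".");
  // For "blog.example.co.uk": "uk", "co.uk", "example.co.uk", ...
  for (let i = labels.length - 1; i >= 0; i--) {
    const candidate = labels.slice(i).join(".");
    if (canWrite(candidate)) {
      return candidate;
    }
  }
  return null; // no domain level accepted the cookie
}
```

With a stub that rejects `uk` and `co.uk`, a visitor on `blog.example.co.uk` ends up with a cookie on `example.co.uk`, which every subdomain can read.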
A second step is to add your root domain to the Referral Exclusion list so that visits between subdomains and the core domain don’t initiate new sessions. (The first step ensures only that Google sees the visitor as the same user.) Google automatically adds the root domain to the Referral Exclusion list when you create your Google Analytics property, but the setup is worth double-checking.
In theory, these updates automate subdomain tracking—the Cookie Domain and Referral Exclusion lists are set, by default, to the correct values.
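To see why a single root-domain entry covers traffic between subdomains, consider this rough model of a Referral Exclusion check. It assumes an entry matches the domain itself and any of its subdomains; the function is hypothetical and only approximates what Google Analytics does during processing.

```typescript
// Returns true if the referring hostname falls under any excluded domain.
function isExcludedReferrer(referrerHost: string, exclusions: string[]): boolean {
  const host = referrerHost.toLowerCase();
  for (const entry of exclusions) {
    const domain = entry.toLowerCase();
    const suffix = "." + domain;
    const isSubdomain =
      host.length > suffix.length && host.slice(-suffix.length) === suffix;
    if (host === domain || isSubdomain) {
      return true; // treated as internal traffic, so no new session
    }
  }
  return false;
}
```

Under that assumption, a click from `shop.example.com` to `www.example.com` is ignored as a referral when `example.com` is on the list, so the original source and session survive.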
4. Cross-domain tracking
Cross-domain tracking is the most complex session-stitching process because many solutions are bespoke: Proper implementation depends on the setup of your site, payment processor, log-in tool, or (Lord help you) iframe.
If multiple sites share the same tracking code but no other technical changes are made:
- Analytics will duplicate sessions between domains (since the Client ID won’t transfer from one to the other).
- The original attribution information will be lost, converted into a referral from the other domain, which, since it shares the same tracking code, will appear as a self-referral.
As with AMP tracking, successful cross-domain tracking requires passing the Client ID from one site to another without passing the cookie itself. There are several core use cases, each with unique solutions.
Intra-company cross-domain tracking
Large organizations often manage several domains but want to track visitors as they move from one to another. Assuming the sites share the same Google Analytics code, tracking users across multiple domains has three additional steps.
The first two steps alter the tracking code to allow domains to pass and receive client IDs via links:
- Auto Link Domains. Add all domains as a comma-separated list within the Google Analytics Settings variable in Google Tag Manager or amend your Google Analytics code to include those domains.
- allowLinker. To ensure domains can receive Client IDs passed via links, add a field in the Google Analytics Settings variable in Google Tag Manager named “allowLinker,” and set the value to “true.” (If the user flow is one directional, you need to allow the linker only on destination—not source—domains.)
The linker appends a timestamp and other metadata to validate the Client ID, which reduces the likelihood that a shared link with the Client ID affects Analytics data.
The final step is to add all domains to your Referral Exclusion list. Otherwise, you’ll generate mountains of self-referrals—Google Analytics will correctly recognize one user between domains but will still generate a new session.
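If you edit the analytics.js snippet directly rather than using Google Tag Manager, the two code-side steps look roughly like this. The commands (`allowLinker`, the `linker` plugin, and `linker:autoLink`) are part of analytics.js; the wrapper function, its parameters, and the example domain are assumptions for illustration.

```typescript
// Minimal signature for the global ga() command queue.
type GaFunction = (...args: unknown[]) => void;

// Queue the commands that let a tracker send and receive Client IDs via links.
function configureCrossDomainLinker(
  ga: GaFunction,
  propertyId: string,
  linkedDomains: string[]
): void {
  // allowLinker lets this domain accept a Client ID passed in the URL.
  ga("create", propertyId, "auto", { allowLinker: true });
  // Load the linker plugin, then decorate links that point to the listed domains.
  ga("require", "linker");
  ga("linker:autoLink", linkedDomains);
}
```

On a live page you would call it as `configureCrossDomainLinker(ga, 'UA-XXXXXX-Y', ['example-shop.com'])`. The Referral Exclusion list still has to be updated separately in the Analytics admin.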
To analyze data gathered from cross-domain tracking efficiently, prepend the hostname to the URL path. Otherwise, paths shared by multiple domains will be grouped together. For example, the /about-us/ pages of two different domains would both appear only as /about-us/ in page-level reports.
You can prepend the hostname by setting up a custom filter with the following values:
(If you’re trying to rescue historical data that wasn’t properly filtered, you can use a secondary dimension with the hostname to differentiate URLs in a view.)
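Conceptually, the filter concatenates the hostname and the request path. A small sketch shows the effect; the function name and the domains are hypothetical, and the real work happens in the Analytics filter settings, not in code.

```typescript
// Build the page value a hostname-prepending filter would report.
function pageWithHostname(hostname: string, requestUri: string): string {
  const path = requestUri.charAt(0) === "/" ? requestUri : "/" + requestUri;
  return hostname.toLowerCase() + path;
}

// Two domains that share the same path are now distinguishable in reports.
const mainSitePage = pageWithHostname("www.example.com", "/about-us/");
const shopSitePage = pageWithHostname("www.example-shop.com", "/about-us/");
```

Without the hostname, both values would collapse into a single /about-us/ row.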
Third-party payment processing
For third-party payment processing, a correct setup is vital: Without it, you’ll lose attribution data for all transactions, which will show up as referrals from the payment processor. However, you have limited control over the payment processor’s page.
One solution is to set up a Referral Exclusion for the domain of your payment processor. However, that manual effort could become a whack-a-mole task if:
- You work with many payment processors.
- Payment processors frequently change domains.
Excluding a processor’s domain also risks excluding “real” referral traffic (e.g., you get referral visits from a link on PayPal’s blog).
Ahava details a creative, comprehensive solution: creating a Referral Exclusion for all traffic to your receipt or “thank you” page. The Referral Exclusion preserves the original source data and prevents Google Analytics from generating a new session when users return to your site from the payment processor’s domain.
Implementing Ahava’s solution has two steps:
- Creation of a variable that supplies the referrer value to use on the thank-you page.
- Modification of the tags that fire on the thank-you page. For any tag that fires on your thank-you page, set the “referrer” field to the recently created variable.
A blanket ban on referrals to a given page may seem risky, but thank you pages are accessible (or should be accessible) only within the checkout funnel—no one starts their user journey on the thank you page—so there’s no risk of losing valuable source data.
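The logic behind such a variable can be sketched as a small function. This is not Ahava's code: the function name, the path list, and the choice to return `null` as a "report no referrer" signal are all assumptions, and how that value maps onto the tag's referrer field depends on your setup.

```typescript
// Decide which referrer a hit should report.
// On protected pages (e.g. the receipt page), suppress the real referrer so
// the payment processor's domain never overwrites the original source.
function referrerForHit(
  pagePath: string,
  actualReferrer: string,
  protectedPathPrefixes: string[]
): string | null {
  for (const prefix of protectedPathPrefixes) {
    if (pagePath.indexOf(prefix) === 0) {
      return null; // hypothetical signal: send no referrer for this hit
    }
  }
  return actualReferrer; // everywhere else, report the referrer as usual
}
```

In Google Tag Manager, the equivalent would live in a variable that reads the page path and `document.referrer`, with the thank-you path hard-coded or pulled from a lookup.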
Social logins
Social logins can’t rely on a blanket domain Referral Exclusion. While a Google login may come from accounts.google.com (a subdomain you could safely exclude), others, like Facebook, come via facebook.com, and almost every site has non-login referral traffic from Facebook.
A common solution is to open the authorization in a new tab or window, which maintains continuity in the session on your site. However, ad blockers may interfere with this process, or you may prefer—for the sake of user experience—not to open a new window.
Another solution—much like Ahava’s strategy for thank you pages—is to override or ignore referrer information for the post-login page hosted on your site. Setting the referrer value to your own domain or “null” ensures that the source registers as Direct, thereby preserving the single session. The strategy works only if the post-login page has a unique URL.
Iframes
Iframes are a challenge for session stitching, in part, because iframe content typically loads before Analytics tags fire. That means traditional tracking solutions, like appending the Client ID to the URL, require adjustment, as Google’s Developer Guide details:
To solve this problem you can configure the page inside the iframe to delay creating its tracker until after it receives the client ID data from the parent page. And on the parent page you configure it to send the client ID to the iframe page using postMessage.
Painful as cross-domain tracking on iframes can be—Ahava refers to them as “untrackable little shit-monsters that exist in the void between websites”—they’re (too) often used in web forms by vendors who focus more on moving form data into a CRM than making those interactions trackable in Google Analytics.
Bounteous explains the process for using postMessage in cross-domain iframe tracking:
we can have our child iframe emit a message, which we can ‘listen’ for and use to notify GTM that an important interaction has occurred. This is great for tracking things like simple form submissions within iframes [. . .] We’ll need to take the following steps:
1.) Post a message from our child iframe
2.) Listen for the message in our parent frame
3.) When we catch the message, push an event into the GTM Data Layer
There is an important caveat: You must be able to add code to the iframe. If not, the process will not work.
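The three steps above can be sketched as follows. The message shape, the `iframeFormSubmit` event name, and the example origins are hypothetical; only `postMessage`, the `message` event, and the GTM Data Layer array are real browser or GTM features. The handler is written as a plain function so the logic is easy to test outside a browser.

```typescript
// The subset of a browser MessageEvent that the handler needs.
interface FrameMessage {
  origin: string;
  data: unknown;
}

// Step 1 (inside the iframe, shown here as a comment):
//   window.parent.postMessage(
//     { type: "iframeFormSubmit", formId: "contact" },
//     "https://www.example.com"
//   );

// Steps 2 and 3 (parent page): validate the message, then push a GTM event.
function handleIframeMessage(
  message: FrameMessage,
  trustedOrigin: string,
  dataLayer: Array<Record<string, unknown>>
): boolean {
  if (message.origin !== trustedOrigin) {
    return false; // ignore messages from frames you do not control
  }
  const raw = message.data;
  if (typeof raw !== "object" || raw === null) {
    return false;
  }
  const payload = raw as Record<string, unknown>;
  if (payload["type"] !== "iframeFormSubmit") {
    return false;
  }
  dataLayer.push({ event: "iframeFormSubmit", formId: String(payload["formId"]) });
  return true;
}

// Wiring on the parent page (comment only, since it needs a browser):
//   window.addEventListener("message", (e) =>
//     handleIframeMessage(e, "https://forms.example-vendor.com", window.dataLayer));
```

Checking the origin before trusting the payload matters here, because any frame or window can post messages to your page.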
Ahava has authored two solutions for cross-domain iframe tracking, the latest of which uses customTask:
The customTask API is a feature of the Universal Analytics library (used by Google Tag Manager’s tags, too). It lets you get and set values from and to the (Measurement Protocol) hit as it’s being generated.
For cross-domain iframe tracking, “customTask leverages a setInterval() script which polls the page periodically until the target iframe is found or a timeout is reached.”
When Google Analytics registers a hit to the parent page, Ahava’s solution prompts customTask to look for an iframe that matches a preset CSS selector, then to decorate the iframe URL with the Client ID of the initial hit.
Even that solution, however, is fragile, especially if the iframe includes redirects. “Untrackable shit-monsters” indeed.
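The decoration step itself is simple once the iframe has been found. Here is a simplified sketch: it appends the Client ID as a plain query parameter, whereas a production setup would more likely pass a linker-style parameter. The function name and the `clientId` parameter name are hypothetical, and the page inside the iframe would need code to read the value.

```typescript
// Append a Client ID to an iframe URL, preserving any query string and hash.
function decorateIframeSrc(
  src: string,
  clientId: string,
  paramName: string = "clientId"
): string {
  const hashIndex = src.indexOf("#");
  const base = hashIndex === -1 ? src : src.slice(0, hashIndex);
  const hash = hashIndex === -1 ? "" : src.slice(hashIndex);
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return (
    base +
    separator +
    encodeURIComponent(paramName) +
    "=" +
    encodeURIComponent(clientId) +
    hash
  );
}
```

Inside a customTask callback, the Client ID would come from `model.get('clientId')`, and the decorated URL would be written back to the iframe's `src` attribute once the polling finds the element.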
Session stitching aligns Google Analytics data with what we know to be true: Users, in one sitting, navigate between domains, complete purchases, or fill out forms that briefly transition them to and from another domain.
The critical nature of those interactions (pre- and post-login, anonymous visitor versus known lead, and potential customer versus past purchaser) makes session stitching well worth the effort.
Weaving together user interactions enhances attribution data and reduces blind spots at critical junctures in the user journey.