Build an attribution model with BigQuery ML

Use case

Build attribution models tailored to fit your business and domain area.

1. Use tools like the SegmentStream JavaScript SDK to collect and store all possible website interactions and micro-conversions in Google BigQuery.

For example, micro-conversions that could be collected on an ecommerce product page for a fashion website include image impressions, image clicks, checking size, selecting size, adding to cart, and adding to wishlist.

You can also export hit-level data into Google BigQuery with Google Analytics 360, which has a built-in BigQuery export. The Google Analytics API will not work here, because it provides aggregated data and ML requires hit-level data. Data from GA4, which also exports to BigQuery, will work.

2. Process the raw data into attributes, where every session has a set of features and a label.

Examples of features include:

  • Recency features. How long ago an event or micro-conversion happened in a certain timeframe;
  • Frequency features. How often an event or micro-conversion happened in a certain timeframe;
  • Monetary features. The monetary value of events or micro-conversions that happened in a certain timeframe;
  • Contextual features. Information about user device, region, screen resolution, etc.;
  • Feature permutations. Combinations of the features above, used to capture non-linear relationships.
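
As a rough illustration of the recency, frequency, and monetary features above, here is a minimal Python sketch. The function name `rfm_features`, the event tuple layout, and the 30-day window are assumptions made for this example; in practice the same logic would usually be written in SQL against your own BigQuery event tables.

```python
from datetime import datetime, timedelta


def rfm_features(events, session_start, window_days=30):
    """Compute recency, frequency, and monetary features per event type.

    events: list of (timestamp, event_name, value) tuples for one user
    (a hypothetical layout). Only events inside the window that ends at
    session_start are counted.
    """
    window_start = session_start - timedelta(days=window_days)
    features = {}
    for ts, name, value in events:
        if not (window_start <= ts < session_start):
            continue  # outside the timeframe
        age_days = (session_start - ts).total_seconds() / 86400
        recency_key = f"{name}_recency_days"
        # Recency: days since the most recent occurrence in the window.
        features[recency_key] = min(features.get(recency_key, age_days), age_days)
        # Frequency: number of occurrences in the window.
        features[f"{name}_count"] = features.get(f"{name}_count", 0) + 1
        # Monetary: total value of the occurrences in the window.
        features[f"{name}_value"] = features.get(f"{name}_value", 0.0) + value
    return features


session_start = datetime(2022, 3, 14)
events = [
    (datetime(2022, 3, 10), "add_to_cart", 40.0),
    (datetime(2022, 3, 13), "add_to_cart", 25.0),
    (datetime(2022, 1, 1), "add_to_cart", 99.0),  # older than 30 days, ignored
]
features = rfm_features(events, session_start)
print(features)
```

Each session then becomes one row whose columns are these per-event features, plus contextual columns such as device or region.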

3. Use SQL code to train a model with all possible non-linear permutations, learning and validation set splits, etc. within the Google BigQuery data warehouse.

For example, to create a model that predicts the probability of a purchase in the next 7 days based on a set of behavioral features, run an SQL query like this one:

  CREATE OR REPLACE MODEL
          `projectId.segmentstream.mlModel`
   OPTIONS
          ( model_type = 'logistic_reg')
   AS SELECT
           features.*,
           labels.buyDuring7Days AS label
   FROM
           `projectId.segmentstream.mlLearningSet`
   WHERE date BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'

Replace projectId with your own Google Cloud project ID, segmentstream with your own dataset name, mlModel with your own model name, mlLearningSet with the name of the table with your features and labels, and labels.buyDuring7Days with your own label.
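
The playbook does not show how the buyDuring7Days label is built. One plausible reading, sketched in Python below, is a 0/1 flag that is set when the user makes a purchase within 7 days after the session ends. The function name `buy_during_7_days` and the choice to measure from the session end (rather than the session start) are assumptions for this example.

```python
from datetime import datetime, timedelta


def buy_during_7_days(session_end, purchase_times, horizon_days=7):
    """Return 1 if any purchase falls within horizon_days after session_end.

    Assumption: the window is measured from the end of the session and
    excludes purchases made during or before the session.
    """
    horizon = session_end + timedelta(days=horizon_days)
    return int(any(session_end < t <= horizon for t in purchase_times))


session_end = datetime(2022, 3, 1, 12, 0)
label_bought = buy_during_7_days(session_end, [datetime(2022, 3, 5)])
label_too_late = buy_during_7_days(session_end, [datetime(2022, 3, 20)])
print(label_bought, label_too_late)
```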

4. Use a simple query to evaluate and visualize all the characteristics of your model.

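For example, the following query returns evaluation metrics such as precision, recall, and ROC AUC at a classification threshold of 0.5. The precision-recall curve, precision and recall vs. threshold, and the ROC curve can then be viewed for the model in the BigQuery console: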

  SELECT * FROM
          ML.EVALUATE(MODEL `projectId.segmentstream.mlModel`,
          (
                SELECT
                      features.*,
                      labels.buyDuring7Days AS label
                FROM
                      `projectId.segmentstream.mlTrainingSet`
                WHERE
                      date > 'YYYY-MM-DD'
          ),
          STRUCT(0.5 AS threshold)
          )
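
To make the role of the threshold concrete, here is a small Python sketch of how precision and recall follow from predicted probabilities at a given cut-off. It is an illustration of the metric definitions, not BigQuery's implementation; the function name `precision_recall` and the sample data are made up, and treating a probability equal to the threshold as a positive prediction is an assumption.

```python
def precision_recall(labels, probabilities, threshold=0.5):
    """Compute precision and recall for binary labels at one threshold."""
    # Assumption: probability >= threshold counts as a predicted purchase.
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for y, hit in zip(labels, predicted) if y == 1 and hit)
    fp = sum(1 for y, hit in zip(labels, predicted) if y == 0 and hit)
    fn = sum(1 for y, hit in zip(labels, predicted) if y == 1 and not hit)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [1, 0, 1, 1, 0]
probabilities = [0.9, 0.6, 0.4, 0.7, 0.2]

p_half, r_half = precision_recall(labels, probabilities, threshold=0.5)
p_low, r_low = precision_recall(labels, probabilities, threshold=0.3)
print(p_half, r_half)
print(p_low, r_low)
```

In this toy data, lowering the threshold from 0.5 to 0.3 catches every buyer (higher recall) while still letting one non-buyer through, which is the trade-off the precision and recall vs. threshold chart shows.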

5. Apply your model to behavior-based attribution.

For instance, if you have a model that predicts the probability that a user buys within the next 7 days, you can use it to allocate value to different traffic sources: predict the user’s probability to buy at the beginning and at the end of each session, then calculate the difference between the two. You could create the following table for each user session in your Google BigQuery database, where:

  • session_start_p is the predicted probability to buy at the beginning of the session.
  • session_end_p is the predicted probability to buy at the end of the session.
  • attributed_delta is the value allocated to the session, that is, the change in probability between the beginning and end of the session.
[Image: table with purchasing behavior data.]

This would then produce values for all tracked channels/campaigns based on how they impact a user’s probability to buy in the next 7 days during the session:

[Image: full attribution model illustrated.]
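
The per-session delta and its roll-up by channel can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `channel` field, the sample sessions, and the function name `attribute_by_channel` are hypothetical, and negative deltas are kept as they are because the playbook does not say how to treat them.

```python
from collections import defaultdict


def attribute_by_channel(sessions):
    """Sum attributed_delta (session_end_p - session_start_p) per channel."""
    totals = defaultdict(float)
    for session in sessions:
        attributed_delta = session["session_end_p"] - session["session_start_p"]
        totals[session["channel"]] += attributed_delta
    return dict(totals)


# Hypothetical sessions of one user, in chronological order.
sessions = [
    {"channel": "google / cpc", "session_start_p": 0.10, "session_end_p": 0.25},
    {"channel": "email", "session_start_p": 0.25, "session_end_p": 0.50},
    {"channel": "google / cpc", "session_start_p": 0.50, "session_end_p": 0.75},
]
totals = attribute_by_channel(sessions)
print(totals)
```

Here the paid search sessions raise the predicted probability by about 0.40 in total and the email session by 0.25, so paid search receives the larger share of the attributed value.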


Mar 14, 2022
