Site icon CXL

Google BigQuery: A Tutorial for Marketers

When it comes to Google BigQuery, there are plenty of articles and online courses out there. Most are “tech to tech” explanations—which are great. But they can be intimidating for those beginning their marketing-to-tech journey.

So where exactly do you start? Or, if you’re already using BigQuery, how can you go further and do some really cool stuff with it?

Get started—or move faster—with this marketer-focused tutorial.  

What is Google BigQuery?

Google BigQuery is a warehouse for analytics data. It’s a place where you can:

The first terabyte of query data and the first 10 gigabytes of storage per month are free.

BigQuery isn’t the only game in town. Alternatives include Amazon Redshift, Snowflake, Microsoft Azure SQL Data Warehouse, Apache Hive, etc.

Oft-cited advantages of BigQuery include: 

Still, why would you go beyond your usual digital analytics tool and try a cloud solution like BigQuery?

What are the advantages of a tool like BigQuery?

When you work with Google Analytics or other digital analytics tools, you usually have control only over data collection and analysis. With a tool like BigQuery, you have more control over every stage of the analytics infrastructure

It’s not the only difference. While Google Analytics makes it possible to add CRM, back-office, or call-tracking data (via the API or Measurement Protocol), it’s still a suboptimal solution to consolidate your data.

You have little control over the Google Analytics system—if your data is sampled or altered because Analytics wants to, well, that’s your problem.

A (not so) hypothetical use case

Imagine you need a monthly report with data from Google Analytics, your CRM, call tracking software, and some other sources. Every month, you go to each of these tools and search for useful data to fill your report.

But, sometimes, you can’t really access the CRM because you don’t have permissions (i.e. enterprise politics), or you’re at an agency and your client doesn’t want you to touch their CRM. So, you wait for someone to send you the necessary data to integrate into your report, which—as it’s often happened to me—takes time.

Then, you integrate all this data manually, which also takes time. By the 10th of the month, you have everything you need, but it’s kind of late to present these figures and make a decision about the actions to take that month.

Then you think, “We can’t do this anymore—we have to automate!” You propose some tools to your client who says “too expensive,” “too complicated,” etc., to every option. So, you look for the cheapest and simplest solution.

For me, BigQuery was that solution. I thought (and, ultimately, was right) that the amount of client data would never go beyond the free threshold, and that we could connect it to a free and simple Data Studio dashboard.

And that was it—a cheap and simple solution for the monthly reporting struggle. 

Another use case

Imagine you want to know how much revenue your campaigns generated…

…and you sell houses.

…and you have a shitty custom CRM that can never connect to your Ads or Analytics platforms.

You know the number of leads, but you can’t connect them to house purchases. The solution is to give every lead and every purchase a userID (like an encrypted email), to pull CRM and Google Analytics data into your BigQuery data warehouse, and then—with a simple SQL query—join the two tables.

You can then say that userID X, who came on January 11 from Google Ads, brought us $500,000 in revenue. Therefore, it’s okay that we spent $500 to get that lead.

Inspired? Here’s how to get started.

Getting started with BigQuery

BigQuery is part of the Google Cloud Platform. To start working with it, you have to create (or log in to) a Gmail account and then go to Google Cloud Console to create a Cloud Project. The creation of these elements is straightforward.

Note: When you enter a Cloud account, it asks you to provide a credit card to get $300 in credits to test the platform. Don’t be afraid—$300 is more than enough for vetting or educational purposes, and they won’t charge you without notifying you that your credits have run out.

If you don’t want to enter your credit card and only want to play with BigQuery and public data (there are plenty of public datasets within BigQuery), you can use a BigQuery sandbox. 

Just enter a BigQuery service after creating a Cloud Project and accepting all the terms, etc. You’ll see a “Sandbox” label in the top-left corner.

If you’re using only BigQuery in your Cloud Project, the schema below is a good explanation of your project structure:

Accesses are managed via Google Cloud IAM. You can find it in the menu (top-left corner) of your Cloud Project.

You also have the option to create an Organization in your Google Cloud account. Organizations are available to GSuite users (paid Gmail, basically) or Cloud Identity owners.

The Organization can have its own billing account and projects, and it can have access to other projects without access to their billing account:

In our agency, we have an Organization as a GSuite user. In some cases, we create projects for our clients and link them to our billing account. In other cases (when the client already has a project on the Cloud Platform), we just link their project to our organization to work without access to our client’s billing account.

Once the project is created and you’re in BigQuery, you’ll need to know some SQL to start playing with your BigQuery data. SQL is not rocket science; you can learn the basic concepts quickly and find plenty of SQL query examples to tailor to your needs. (Here’s a great tutorial for using SQL in BigQuery.)

Now, let’s look at some important steps for using BigQuery. I divide these into three stages:

  1. Before. Get data into BigQuery.
  2. During. Work with BigQuery data.
  3. After. Build dashboards with BigQuery data.

1. Get data into BigQuery 

Build your data schema

Before starting your BigQuery journey, I recommend that you build a data schema. Plan out the datasets, tables, and table fields you’ll need. To do this, ask yourself these questions:

The taxonomy of BigQuery flows as follows:

Projects > datasets > tables and views.

For me, one dataset = one data source.  So, to answer the questions above, you would need three datasets (CRM, Google Analytics, back office).

If you have several brands, you can say that one table is one brand of your company. If you want to store previous years separately (because you rarely use previous years’ data) you can have one table per year. That’s for you to decide. You’re charged less for long-term data storage (i.e. data you didn’t change in the last 90 days).

Once you’ve answered all the above questions, you can start building your schema. I did it with a database schema tool. (There are plenty of them on the Internet—and always one that’s absolutely free.).

After building a schema—which, honestly, you can sketch out on paper—start creating your datasets. Click on your project name (e.g., “angular-radar-255111” on the image below). Project names are based on a random project ID assigned by Google Cloud; you can change it.

Next, click on “Create Dataset.”

Usually, you only need to name your dataset and choose a location for your data. Choose an EU location if your client is in the EU (GDPR!). You’ll notice a table expiration of 60 days if you use a BigQuery Sandbox, the free version mentioned earlier.

Use Google services

Google provides some built-in services to import your data into BigQuery. The first one is BigQuery Data Transfer, which can get data from Google Ads, Cloud Storage, Amazon S3, Google Play, and YouTube. It’s free for Amazon S3 and Cloud Storage. 

BigQuery also connects to Google Drive (Google Sheets and CSV, Avro, or JSON files), but the data is stored in Drive—not in BigQuery. You can, however, query it from Drive directly. 

That has an interesting use-case: Imagine that data must be added manually to Google Sheets on a daily basis. For example, a recruitment agency fills in a sheet at the end of the day with the number of candidates received and candidates placed.

You can get to that data using a Google Sheets link: 

Google Analytics 360, Firebase (Blaze plan), and Google Analytics App + Web provide free integration with BigQuery. For other tools and a standard Google Analytics version, you’ll have to use non-Google connectors.

Use non-Google services

Some CRMs provide a native integration with different cloud data warehouses, including BigQuery. In most cases, our clients have custom CRMs, so we had to ask their developers to build a custom connector to Cloud Storage or BigQuery.

For a standard Google Analytics account, there are a bunch of paid connectors available, starting around $100 per 100K monthly visitors. Pulling your Google Analytics data into BigQuery has benefits:

BigQuery is a popular service—it’s not hard to find connectors for just about any ad or analytics platform. 

Put your developers to work

Previously, we talked about a solution to create your own connector. There are two options here—to BigQuery directly or, first, to Cloud Storage. Both have API documentation to help your developers.

In one of our use cases, we asked the developers to send two CSV files (one from our CRM and a second with back-office data) every midnight with the previous day’s data to Cloud Storage.

In terms of development, it was the cheapest solution—the dev team had to export only two CSVs, once per day. Then, we used a Cloud Function to pull the updated files from Cloud Storage into our BigQuery tables.

Google Cloud Functions are lightweight solutions to automate simple operations. They consist of a piece of JavaScript/Python/Go code and a trigger (rule). I found a code in a Medium blog post and tailored it to my needs.

Here’s a code that you can use in your project:

Some BigQuery professionals won’t like this solution. It has pitfalls:

I chose it because it was the simplest and the cheapest for my client and it works pretty well—for now.

Batch vs. stream processing

There are two ways to send your data to Cloud: batch or streaming. Batch processing sends data once per period (e.g., data from the previous day at 1:00 a.m.). It’s a good option unless you want real-time data.

Mobile applications are a great example—you may want to know in real time if there are issues with your application. Streaming your data is a bit more complicated than batching it. 

You may need a Cloud Dataflow and/or additional services to create a streaming pipeline. It would take a separate article to address that subject. We’ll stick to batch processing for now.

2. Work with data in BigQuery

The BigQuery interface

Let’s take a look at the BigQuery interface. Some courses/articles show the old version of BigQuery:

The new interface is similar to the old one:

Here are the component parts: 

The “Create connection” option is for developers to send data directly to BigQuery.

Introduction to datasets, tables, views, and queries

It’s easiest to understand the structure of a BigQuery project with an analogy from Google Analytics:

Within a project, you can create/delete/copy datasets and tables:

When you click on a table, you have options to query, copy, delete, or export:

You can export your table to Cloud Storage, explore it in Data Studio, or scan it with the Google Data Loss Prevention service (all via the “Export” button).

If you find yourself running a particular query often, it’s simpler to create a view. A view is a table based on your query that gets created whenever you work with it. You can use it in Data Studio, which we’ll talk about later. 

A query is your SQL code—how you communicate with your BigQuery data. Create a query to get the data you need from the tables you have: SELECT (fields) FROM (your table), LIMIT (quantity of lines).

Of course, this is the simplest example of a query. It can be a long piece of code, but the object of this article isn’t to teach you SQL.

Some best practices for working with data in BigQuery

To use BigQuery more efficiently, here are some tips:

3. Build dashboards with BigQuery data

Google Data Studio

To create a Data Studio dashboard using your BigQuery data, open your existing dashboard or create a new one. Click on “Create New Data Source”:

Choose “BigQuery” from all possible sources. After that, you’ll refine your selection by project and dataset.

From there, you can connect to a table or a view. My choice was a view, as it’s basically a pre-created query with only the data I need. You can also connect directly to a table and do all the magic in Google Data Studio directly.

When you connect to a view or a table, you’ll see the fields available in your data source:

When you click on “Add to Report,” you create a connection between your data source (BigQuery view or table) and Data Studio. So go ahead, you’re ready to create a dataviz with your BigQuery data.

Google Sheets

If you’re a GSuite user, you can use a native BigQuery connector to connect with your Google Sheet:

When you click on “Connect to BigQuery,” you’ll have to choose a BigQuery project and then create a query as you would in the BigQuery interface:

One problem: You can’t schedule a query—at least not yet. (The BigQuery connector is new.) You’ll have to refresh the query regularly to fill your Google Sheets table with the newest data.

Once your data is pulled into Google Sheets, you can start creating Google Sheets dashboards. For non-GSuite users, there are some Google Sheets Add-ons (free and paid) that can pull in BigQuery data. At our agency, we use OWOX BI BigQuery Reports, which also lets you schedule your queries.

Non-Google tools

Of course, you’re not limited to Google Data Studio or Google Sheets. Here are some common data tools that integrate easily with BigQuery:

The list is limited to my own knowledge—I’m sure there are tons of other options.

Conclusion

BigQuery is a great option to start consolidating your data. You have plenty of possibilities to test, learn, and embrace this service. To improve your knowledge of Google Cloud, Google BigQuery, and SQL, check out these courses:

There’s a great BigQuery community out there, too, so don’t be afraid to search for answers or ask questions.

As you progress, you can go further with BigQuery, using its integrated machine-learning models, which include pre-built templates. Segment your audiences based on the potential to purchase, predict customer lifetime value, etc.

If you learn the basics, you’re most of the way there. 

Related Posts

Exit mobile version