October 18, 2024

Google BigQuery Compute Cost Optimization: Mastering Editions Pricing

Co-founder & CEO, Masthead Data

This article is part of the BigQuery Cost Optimization Series.
The previous article covers GBQ storage cost.

Ever wondered how Google BigQuery’s pricing works when it comes to compute costs? 🤔 It can be a bit puzzling to navigate between on-demand pricing and the newer Editions model. Well, you’re not alone!

Let’s start by understanding Google BigQuery’s current options for compute billing.
GBQ compute today offers two pricing models: Editions and On-demand.

Courtesy of Masthead Data

The On-demand pricing model is measured in TiB processed; users pay for the volume of data processed by each query.

The Editions pricing model is measured in slots, which are virtual CPUs used to process queries in Google BigQuery.

Editions is a fairly new pricing model. Capacity is expressed in slots, but job execution is billed in slot-hours. The best way to explain this is with an example.

Suppose a GBQ job needs X slots to execute, and the execution can last 1 hour or 2 hours. If the job uses 1,000 slots and finishes in 1 hour, it consumes 1,000 slot-hours; if it takes 2 hours, it consumes 2,000 slot-hours.

The formula behind the calculation looks like this:

slot-hours = number of slots × execution time in hours
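As a minimal sketch of that calculation, assuming the cost is simply slots multiplied by hours multiplied by a price per slot-hour: the function name is hypothetical, and the $0.04 rate is an assumption inferred from the Editions Standard example later in this article (actual rates vary by region and edition).

```python
def editions_job_cost(slots: float, hours: float, price_per_slot_hour: float) -> float:
    """Cost of a job = slot-hours consumed x price per slot-hour."""
    slot_hours = slots * hours
    return slot_hours * price_per_slot_hour


# Illustrative rate only (assumption): $0.04 per slot-hour.
RATE = 0.04

one_hour = editions_job_cost(slots=1000, hours=1, price_per_slot_hour=RATE)
two_hours = editions_job_cost(slots=1000, hours=2, price_per_slot_hour=RATE)

print(f"1,000 slots for 1 hour:  ${one_hour:.2f}")
print(f"1,000 slots for 2 hours: ${two_hours:.2f}")
```

The same job costs twice as much when it holds the same number of slots for twice as long, which is the time dimension that On-demand pricing does not have.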

Overview of Google BigQuery Editions

Google has recently done a great job in its documentation of helping users choose between and compare all the possible options for Google BigQuery compute.

Below, we highlight the peculiarities of choosing between Editions plans and compare them to the On-demand plan.

Courtesy of Masthead: Editions plans.

5 Things Good to Know about Editions

All Edition plans are measured in slots/min

Editions Standard is only available as pay-as-you-go without a commitment, unlike Editions Enterprise and Editions Enterprise Plus. The Editions Standard plan is capped at 1,600 slots per reservation, while Editions Enterprise and Editions Enterprise Plus have much higher limits. It is worth noting that On-demand (per TiB) billing is also limited, to a capacity of 2,000 slots per project.

The most significant disadvantage of Editions Standard is that it is unavailable for BQML, unlike any other Editions plan or On-demand. It also cannot run continuous queries or create or refresh materialized views; it can only query materialized views that already exist. Nor does it offer security controls such as column-level access control, row-level security, or dynamic data masking.

All Editions plans have autoscale capacity; yes, the Standard plan as well. Autoscale is optional. Its purpose is to automatically adjust the allocated capacity to meet your workload demands. In practice, we have seen that BigQuery is designed to go through all the jobs in the queue as fast as possible, which also means that it runs into autoscale roughly 8 out of 10 times. That is not a bad thing, and it is not necessarily expensive. Before enabling autoscale, decide whether you need your workloads to finish as soon as possible or whether you can allow jobs extra time to finish without potentially consuming extra autoscale slots.

Editions Enterprise and Editions Enterprise Plus offer more sophisticated workload management (a larger number of reservations, idle capacity sharing, target concurrency) and enhanced security controls. These editions are also available with a 1- or 3-year commitment at discounts of 20% and 40%, respectively. However, a crucial point to watch is that you cannot change the region once you commit to one of these plans in a specific region. Another region is treated as a separate SKU by Google Cloud, meaning that to operate workloads there you would need to use Editions Standard, opt for On-demand pricing, or purchase Editions Enterprise or Editions Enterprise Plus in the new region.
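To make the commitment discounts concrete, here is a small sketch. The $0.06 base rate is an assumption taken from the Editions Enterprise example later in this article, and the function name is hypothetical; check the current price list for your region before relying on any of these numbers.

```python
def committed_price(base_price_per_slot_hour: float, discount: float) -> float:
    """Effective price per slot-hour after a commitment discount (0.20 = 20%)."""
    return base_price_per_slot_hour * (1.0 - discount)


# Assumption: Editions Enterprise pay-as-you-go rate of $0.06 per slot-hour.
BASE_RATE = 0.06

# Discounts as stated in the article: 20% for 1 year, 40% for 3 years.
COMMITMENT_DISCOUNTS = {"1-year": 0.20, "3-year": 0.40}

for term, discount in COMMITMENT_DISCOUNTS.items():
    rate = committed_price(BASE_RATE, discount)
    print(f"{term} commitment: ${rate:.3f} per slot-hour")
```

Keep in mind that a commitment is paid for whether or not the slots are used, so the discount only pays off for capacity that is busy most of the time.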

Another important consideration is determining how much to purchase upfront. Estimating what a workload that previously ran under On-demand pricing would cost under Editions can be challenging, as the two models are conceptually different. Additionally, clients don’t know when their workload will trigger autoscaling, how much it will consume in autoscale, which jobs use autoscale slots, or when idle slots will be used before progressing into autoscale.

Let’s turn to the practical side. Companies are still trying to figure out which billing model is best for their business. The most frequent questions we hear from clients at Masthead are whether it is worth switching from On-demand to Editions, whether they should commit, and whether autoscale is good for them. I’ll address all of these questions in this guide.

Should we switch to Editions, or is it better to use On-demand?

Based on feedback from clients and expert users of GBQ, we observed that Editions are generally a better and more efficient choice for highly regarded projects with a saturated and consistent compute load, for example, projects running scheduled jobs (ETLs), where the compute is quite predictable. However, the compute volume should justify the shift from On-demand to Editions. We’ve encountered situations where an organization has dozens of Google Cloud projects billed On-demand, and only one project is billed under the Editions model, specifically for Looker jobs. Looker operates across the entire organization, while the other Google Cloud projects are dedicated to specific use cases or data products, primarily handling batch processes. For those other projects, paying for constant compute availability at slot rates wouldn’t be cost-effective, unlike for the project that handles all Looker jobs.

“If your slot usage consistently stays below 1,000 slots, then there is a good chance that the Editions plans are not for you, unless the workload requires BQML. The reason is that slots are quite expensive. For instance, if you had a set of queries that run consistently, process 1 TiB of data, and use 1,000 slots over the course of an hour, you would spend $40 USD on that using Editions Standard, $60 USD on the most common Edition (Enterprise), or $6.25 USD for the same set of queries under On-demand.

“On the flip side, for an extreme example, if those queries used 1,000 slots over the hour and instead processed 40 TiB of data, then your On-demand cost would be $250 USD versus $40 or $60 USD on Editions Standard or Editions Enterprise.” Sayle Matthews
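The arithmetic in the quote above can be sketched as follows. The rates ($6.25 per TiB, $0.04 and $0.06 per slot-hour) are assumptions derived from the quoted figures rather than from a price list, and all names are illustrative.

```python
# Rates implied by the example above (assumption; they vary by region).
ON_DEMAND_PER_TIB = 6.25
SLOT_HOUR_PRICE = {"Standard": 0.04, "Enterprise": 0.06}


def on_demand_cost(tib_processed: float) -> float:
    """On-demand cost depends only on the volume of data processed."""
    return tib_processed * ON_DEMAND_PER_TIB


def editions_cost(slot_hours: float, edition: str) -> float:
    """Editions cost depends only on slot-hours consumed."""
    return slot_hours * SLOT_HOUR_PRICE[edition]


def break_even_tib(slot_hours: float, edition: str) -> float:
    """TiB processed at which On-demand costs the same as Editions."""
    return editions_cost(slot_hours, edition) / ON_DEMAND_PER_TIB


SLOT_HOURS = 1000  # 1,000 slots used over one hour

for tib in (1, 40):
    print(f"{tib} TiB processed with {SLOT_HOURS} slot-hours:")
    print(f"  On-demand:  ${on_demand_cost(tib):.2f}")
    for edition in SLOT_HOUR_PRICE:
        print(f"  {edition}: ${editions_cost(SLOT_HOURS, edition):.2f}")

for edition in SLOT_HOUR_PRICE:
    tib = break_even_tib(SLOT_HOURS, edition)
    print(f"Break-even for {edition}: {tib:.1f} TiB per {SLOT_HOURS} slot-hours")
```

Under these assumed rates, the break-even point sits between the two examples: the more data a workload scans per slot-hour, the more attractive Editions becomes.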

On-demand pricing works better when the process needs to be parallelized to avoid an “out of time” error. It also works better when the BigQuery query is ad hoc or batch.

How do we estimate the slot consumption compared to GiB in On-Demand?

Some clients replicate the most lauded process in a single project, assign the Editions pricing model to it, and check whether it makes sense to switch and restructure their billing around it. This fascinates me, as it requires a lot of work and does not guarantee good results, or even partial learnings.

Why is it difficult? To understand the workload in slots, you can’t simply convert on-demand consumption to total slots used. When you start using slots, you also need to consider the time dimension. Essentially, in Editions, you secure physical servers to supply your workload with the X amount of CPUs, which are converted into the Y amount of slots available every second of the day. However, your jobs run with varying workloads throughout the day and possibly throughout the week, meaning that at any given second, the Y amount of slots might be either too big or too small. This can push your workload to use idle slots and/or autoscale slots. The challenge is that you don’t know which job will use idle slots, which will trigger autoscale, and, more importantly, you can’t accurately predict how many jobs you’ll have in the future.
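A toy model makes the problem visible. The sketch below assumes an hourly demand profile (the numbers are made up) and a fixed baseline, and splits usage into baseline capacity left unused and demand that spills into autoscale. It is deliberately simplified: real scheduling happens at a much finer granularity than one hour, and idle slots shared from other reservations are not modeled.

```python
def split_slot_usage(hourly_demand, baseline_slots):
    """Split demand into unused baseline and autoscale overflow, in slot-hours.

    hourly_demand: slots needed in each hour (hypothetical numbers).
    baseline_slots: slots reserved for every hour.
    """
    unused_baseline = 0
    autoscale = 0
    for demand in hourly_demand:
        if demand > baseline_slots:
            # Demand above the baseline must come from autoscale.
            autoscale += demand - baseline_slots
        else:
            # Reserved capacity that nobody used during this hour.
            unused_baseline += baseline_slots - demand
    return {
        "baseline_slot_hours": baseline_slots * len(hourly_demand),
        "unused_baseline_slot_hours": unused_baseline,
        "autoscale_slot_hours": autoscale,
    }


# Hypothetical six-hour profile with a spike in the third hour.
demand_profile = [200, 200, 1500, 900, 300, 100]

for baseline in (0, 500, 1500):
    usage = split_slot_usage(demand_profile, baseline)
    print(f"baseline={baseline}: {usage}")
```

Whichever baseline you pick for this profile, you either pay for capacity that sits unused or lean on autoscale during the spike, and the right trade-off changes as soon as the demand profile does.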

Courtesy of Masthead

If you want to assess whether it will really work for you, don’t hesitate to reach out to us at Masthead. In just a few hours, we can use retrospective data to estimate how much it would cost you to run your on-demand project in Editions. We will also identify which pipelines would be better allocated to a separate Google Cloud project billed in slots, making it more cost-efficient for you.

At Masthead, we have a feature that shows how a client’s current on-demand workload translates into slot-hours at the pipeline level (not just the job level). This feature helps identify the pipelines with the most consistent workloads so that they can be allocated to a separate project, which is beneficial when switching to Editions. Everything is done fully automatically, providing you with insights to help you make the best data-informed decision.

Courtesy of Masthead Data: product screenshot

Column 1 shows the volume of data, in GiB, processed by every pipeline executed within the chosen time frame (Aug 3 – Sep 3, 2024). Next to the GiB, you can also see the slot-hours that this particular pipeline would consume if the project were billed in Editions.

In column 2, you can see the price paid for processing this pipeline (in bold) within the chosen time period, alongside how much it would cost the client to process the same pipeline in Editions Standard within the same region.
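For readers who want to try a rough version of this comparison themselves, here is a sketch. It is not Masthead’s implementation. It assumes you have exported job records with bytes billed and slot-milliseconds (the field names mirror the `total_bytes_billed` and `total_slot_ms` columns of BigQuery’s `INFORMATION_SCHEMA.JOBS` view), that each job already carries a hypothetical `pipeline` label, and that the rates match the examples earlier in this article.

```python
from collections import defaultdict

MS_PER_HOUR = 3_600_000
BYTES_PER_TIB = 1024 ** 4


def compare_pipelines(jobs, on_demand_per_tib=6.25, slot_hour_price=0.04):
    """Aggregate jobs per pipeline and price them under both models.

    The default rates are assumptions taken from the article's examples.
    """
    totals = defaultdict(lambda: {"bytes": 0, "slot_ms": 0})
    for job in jobs:
        entry = totals[job["pipeline"]]
        entry["bytes"] += job["total_bytes_billed"]
        entry["slot_ms"] += job["total_slot_ms"]

    report = {}
    for pipeline, entry in totals.items():
        slot_hours = entry["slot_ms"] / MS_PER_HOUR
        on_demand_usd = entry["bytes"] / BYTES_PER_TIB * on_demand_per_tib
        editions_usd = slot_hours * slot_hour_price
        report[pipeline] = {
            "slot_hours": slot_hours,
            "on_demand_usd": on_demand_usd,
            "editions_usd": editions_usd,
            "cheaper": "Editions" if editions_usd < on_demand_usd else "On-demand",
        }
    return report


# Hypothetical job records for two made-up pipelines.
sample_jobs = [
    {"pipeline": "daily_sales_etl", "total_bytes_billed": 20 * BYTES_PER_TIB,
     "total_slot_ms": 100 * MS_PER_HOUR},
    {"pipeline": "adhoc_dashboard", "total_bytes_billed": BYTES_PER_TIB // 2,
     "total_slot_ms": 500 * MS_PER_HOUR},
]

for name, row in compare_pipelines(sample_jobs).items():
    print(name, row)
```

This only prices the slot-hours the jobs actually consumed. It ignores baseline capacity that is paid for while idle, so treat the Editions figure as a lower bound rather than a forecast.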

We want to highlight that whether it makes sense to move a pipeline to Editions billing depends heavily on the pipeline itself. The aim of this exercise is to help our clients determine which pipelines can be allocated to a separate Google Cloud project and switched to Editions Standard (or any other Editions) to make it more efficient and ultimately lower their compute costs.

So, is it worth switching to Editions? It depends! If you have a consistent, high-compute workload, Editions might be your best bet. But if your workloads are smaller or more sporadic, on-demand could still be the way to go.

Do you want to learn more about BigQuery Cost Optimization? Check out our GBQ Cost Optimization Guide.
