BigQuery Editions offer a flexible approach to data warehousing and analytics, catering to organizations of all sizes and with diverse workloads. The choice between pay-as-you-go and commitment-based pricing models can significantly impact your overall costs and operational efficiency.
This article delves into the key factors to consider when deciding whether to commit to a BigQuery Edition or opt for the pay-as-you-go model. We’ll explore the benefits and drawbacks of each approach, highlighting the specific features and capabilities available in each Edition.
Each version of Editions offers a pay-as-you-go option. Enterprise Editions and Enterprise Plus Editions also offer a 1- or 3-year commitment with a discount. The price for each plan also depends on the chosen region.
Courtesy of Masthead
Let’s assume you have already decided that you need to switch to Editions, meaning you have identified a steadily running process where you can save money and allocate the budget better.
We suggest a decision flow like this:
Courtesy of Masthead
If you depend on BQML, Standard Edition is not an option for you, as BQML is unavailable in it.
One of the crucial points is to estimate the scale of workloads correctly.
However, if your organization needs Assured Workloads, which is unavailable in the Enterprise and Standard Editions, and you do not want to upgrade to the Enterprise Plus Edition, it is available with On-demand pricing.
Generally, features available in the Enterprise and Enterprise Plus Editions are also available with On-demand pricing, with the exception of Continuous Queries. We suggest paying careful attention to this.
If your organization needs fine-grained security controls but does not have a significant workload to justify the Editions pricing model, these controls are still available with On-demand pricing.
This is interesting: customer-managed encryption keys (CMEK) are available only in the Enterprise Plus Edition, while Google-owned and Google-managed encryption keys remain available in the Standard and Enterprise Editions. The same applies to egress control, which is available only in the Enterprise Plus Edition and On-demand, not in the Standard and Enterprise Editions.
Disaster recovery is also worth additional attention. Enterprise Plus is the only plan with fully managed disaster recovery provided by Google Cloud.
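The feature checks above can be sketched as a small lookup. This is only an illustration built from the availability claims in this article: the feature keys and plan labels are made-up names rather than official identifiers, BQML on On-demand is inferred from the general statement about On-demand, and the matrix should be verified against the current Google Cloud documentation before relying on it.

```python
# Plans on which each feature is available, as described in this article.
# Keys and plan labels are illustrative, not official API identifiers.
ALL_PLANS = {"standard", "enterprise", "enterprise_plus", "on_demand"}

FEATURE_PLANS = {
    "bqml": {"enterprise", "enterprise_plus", "on_demand"},
    "continuous_queries": {"enterprise", "enterprise_plus"},
    "assured_workloads": {"enterprise_plus", "on_demand"},
    "egress_control": {"enterprise_plus", "on_demand"},
    "managed_disaster_recovery": {"enterprise_plus"},
}


def eligible_plans(required_features):
    """Return the plans that offer every feature in required_features."""
    plans = set(ALL_PLANS)
    for feature in required_features:
        # Keep only the plans that also support this feature.
        plans &= FEATURE_PLANS[feature]
    return plans


print(sorted(eligible_plans({"bqml", "assured_workloads"})))
# prints ['enterprise_plus', 'on_demand']
```

Intersecting sets keeps the logic easy to extend: adding a feature is one new dictionary entry.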
Generally, Enterprise Editions and Enterprise Plus Editions are for bigger organizations with mature and advanced data practices. Let’s look at how Editions Enterprise and Enterprise Plus are structured.
The other reason to choose the Enterprise and Enterprise Plus Editions is the ability to commit and purchase slots at a discount. The commitment is purchased centrally by an organization and can be distributed across a maximum of 200 reservations, which the organization creates based on its structure, whereas the Standard Edition can be distributed across a maximum of only 10 reservations.
Reservations are the smallest entity where slots could be assigned for use by multiple projects.
When an organization purchases commitments, the slots can be distributed based on project needs.
Because of the scale and complexity of the projects that the Enterprise and Enterprise Plus Editions were designed for, they have advanced workload management, which is not available in the Standard Edition (except autoscale).
Autoscale slots are the difference between baseline slots and the total number of slots allocated in the reservation. You must consider how many slots you would need to have assigned to autoscale based on capacity to accommodate your workload demands, which cannot always be entirely predictable. Remember that the slots you would like to allocate from autoscale come from the commitment pool purchased by the organization and assigned to a reservation.
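As a quick illustration of that definition, here is a minimal sketch of the arithmetic. The function name and the sample numbers are hypothetical and are not part of any BigQuery API.

```python
def autoscale_slots(max_reservation_slots: int, baseline_slots: int) -> int:
    """Autoscale slots: total slots allocated in the reservation minus its baseline."""
    if baseline_slots < 0 or max_reservation_slots < baseline_slots:
        raise ValueError("baseline must be between 0 and the reservation's total size")
    return max_reservation_slots - baseline_slots


# A reservation sized at 800 slots in total with a 500-slot baseline
# leaves 300 slots of autoscale headroom.
print(autoscale_slots(800, 500))  # prints 300
```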
Before enabling autoscale, you need to know that there is no way to estimate the future workload or the actual workload precisely through the INFORMATION_SCHEMA.
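A rough average is still possible. The sketch below assumes you have already exported job rows that carry the `total_slot_ms` value reported in the INFORMATION_SCHEMA jobs views; the sample rows and the function name are hypothetical. It yields an average over a time window, which hides peaks and says nothing about which slots were baseline, idle, or autoscale.

```python
def average_slots(jobs, window_ms: int) -> float:
    """Average number of slots in use over a window, from per-job slot-milliseconds."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    total_slot_ms = sum(job["total_slot_ms"] for job in jobs)
    return total_slot_ms / window_ms


# Hypothetical job rows for a one-hour window (3,600,000 ms).
sample_jobs = [
    {"job_id": "job_a", "total_slot_ms": 7_200_000},
    {"job_id": "job_b", "total_slot_ms": 1_800_000},
]
print(average_slots(sample_jobs, window_ms=3_600_000))  # prints 2.5
```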
The hurdle is that baseline slots represent the number of slots allocated to the workload at every second of the day. When jobs run, it is relatively easy to know their cost in slots. But if more than one pipeline is assigned to one project, and the pipelines run simultaneously and use more than the baseline slots, you cannot know which particular job used autoscale slots. A further hurdle is that with autoscale, the client is billed not for the slots actually used but for the slots scaled, and scaling is rounded up to multiples of 50 slots for projects assigned autoscale after July 10, 2024. Projects that set up autoscale before that date may be rounded up to 100 slots. Another hurdle is that the cool-off period lasts at least 60 seconds, meaning that a job that used autoscale slots for as little as 1 second is billed for 50 slots (or 100 slots) for a full minute.
All this makes it very challenging to calculate the best configuration for a workload, as it also depends on how BigQuery queues and prioritizes jobs. Here is the description of the process, which makes it more transparent but does not necessarily help calculate or predict actual slot usage.
More than that, BigQuery tries to use any slots that are unused at a given moment to sustain a workload that requires more slots than are available in the project’s baseline reservation. The mechanism for that is idle slots, but it also makes it hard to estimate the price of each job and the slots it used, since no one knows which job was using baseline plus idle slots and what portion of the job, if any, was using autoscale slots.
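To make the rounding and the one-minute minimum concrete, here is a simplified sketch. It models a single scale-up event using only the increments of 50 or 100 slots and the 60-second minimum described above; the function and parameter names are hypothetical, and real billing may differ in details.

```python
import math


def billed_autoscale_slot_seconds(
    extra_slots_needed: int,
    seconds_used: int,
    increment: int = 50,
    min_seconds: int = 60,
) -> int:
    """Estimate billed slot-seconds for one autoscale event (simplified model)."""
    if extra_slots_needed <= 0 or seconds_used <= 0:
        return 0
    # Scaled slots are rounded up to the next multiple of the increment.
    scaled_slots = math.ceil(extra_slots_needed / increment) * increment
    # The scaled capacity is billed for at least the minimum period.
    billed_seconds = max(seconds_used, min_seconds)
    return scaled_slots * billed_seconds


# One extra slot for one second is billed as 50 slots for 60 seconds.
print(billed_autoscale_slot_seconds(1, 1))  # prints 3000
# 120 extra slots for 90 seconds with the older 100-slot increment.
print(billed_autoscale_slot_seconds(120, 90, increment=100))  # prints 18000
```

The gap between slots needed and slots billed is largest for short bursts, which is why spiky workloads are the hardest to price.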
Baseline slots are the allocated number of slots for which the reservation will always be charged. This is an optional configuration, but you need to configure baseline slots if you want/need autoscale slots.
Idle slots are slots allocated to reservations and included in the baseline that are not currently in use, or committed slots that have not yet been assigned to any reservation. You can think of idle slots as a shared pool that reservations in the same organization can draw on when a workload requires more slots than the ones assigned to the reservation's baseline.
Idle slots are shared slots whose purpose is to utilize as many committed slots as possible and avoid pushing jobs into autoscale. BigQuery uses a fair scheduling algorithm to share idle slots within one reservation.
Courtesy of Masthead
In the case above, idle slots will include all baseline slots not in use at the moment in the Google Cloud project within the same reservation, plus any unassigned slots available within the administration project. It is also possible to secure a specified slot capacity at the project level or to disable slot sharing altogether, for example, for production projects.
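The idle pool described here can be approximated with simple arithmetic. The sketch below assumes a snapshot of each reservation's baseline and current usage at a single moment; the `Reservation` class, its fields, and the numbers are hypothetical, and it ignores reservations that have opted out of slot sharing.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    baseline_slots: int
    slots_in_use: int


def idle_slot_pool(reservations, committed_slots: int) -> int:
    """Idle slots: unused baseline slots plus committed slots not assigned anywhere."""
    assigned = sum(r.baseline_slots for r in reservations)
    unassigned = max(committed_slots - assigned, 0)
    unused_baseline = sum(
        max(r.baseline_slots - r.slots_in_use, 0) for r in reservations
    )
    return unassigned + unused_baseline


reservations = [
    Reservation("etl", baseline_slots=400, slots_in_use=250),
    Reservation("bi", baseline_slots=300, slots_in_use=300),
]
# 1000 committed - 700 assigned = 300 unassigned, plus 150 unused in "etl".
print(idle_slot_pool(reservations, committed_slots=1000))  # prints 450
```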