February 25, 2025

BigQuery Storage Costs: Confident Recommendations Perspective

For businesses aiming to streamline data usage, managing cloud storage costs is crucial, and understanding billing models and optimization opportunities is key. In this post, we share lessons learned from implementing BigQuery storage cost insights in Masthead. By deeply analyzing BigQuery’s storage pricing and cloud usage metrics, we refined our storage cost recommendations, resulting in more accurate and reliable billing model suggestions that empower data owners to achieve optimal cost efficiency.

Masthead Storage Cost Insights and Recommendations

Identifying Key Knowledge Areas

Our journey began by pinpointing critical concepts and gaps in BigQuery’s storage pricing documentation. While Google Cloud’s documentation is robust, some aspects required deeper investigation. We categorized our findings into two main areas: core concepts and definitions, and the usage-based billing process in Google Cloud.

Core Concepts and Definitions

Understanding these fundamental concepts is essential for managing BigQuery storage costs:

  • Logical vs. Physical Storage Models: BigQuery offers two billing models:
    • Logical (default): Charges based on uncompressed data volume.
    • Physical: Charges based on compressed data volume, which can offer significant savings for highly compressible data.

  • Active vs. Long-Term Storage: Both models further categorize storage:
    • Active storage: Tables or partitions modified within the last 90 days.
    • Long-term storage: Tables or partitions unmodified for 90 consecutive days, billed at a discounted rate.
  • Additional Physical Model Metrics: The physical model includes these extra storage usage metrics:
    • Time Travel: Data deleted or updated within the time travel window.
    • Fail-safe Storage: Data retained for emergency recovery for seven days after the time travel window ends.
Lifecycle of BigQuery storage types
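
The definitions above can be sketched as a side-by-side cost comparison. The table metrics and prices below are purely illustrative (not actual Google list prices – check the BigQuery pricing page for your region); time travel and fail-safe bytes are counted only under the physical model, as described above:

```python
# Sketch: comparing logical vs. physical billing for a hypothetical table.
# All metrics and prices are illustrative, not real billing data.

GIB = 1024 ** 3

# Hypothetical per-table byte counts (the kind of metrics exposed by
# INFORMATION_SCHEMA.TABLE_STORAGE).
active_logical = 500 * GIB
long_term_logical = 300 * GIB
active_physical = 100 * GIB      # compressed size
long_term_physical = 60 * GIB
time_travel_physical = 10 * GIB  # billed only under the physical model
fail_safe_physical = 5 * GIB     # billed only under the physical model

# Illustrative monthly prices per GiB (physical rates are typically
# higher per GiB, but apply to far fewer bytes).
PRICE = {
    "logical_active": 0.02, "logical_long_term": 0.01,
    "physical_active": 0.04, "physical_long_term": 0.02,
}

logical_cost = (
    (active_logical / GIB) * PRICE["logical_active"]
    + (long_term_logical / GIB) * PRICE["logical_long_term"]
)

# Time travel and fail-safe bytes are charged at the active physical rate.
physical_cost = (
    ((active_physical + time_travel_physical + fail_safe_physical) / GIB)
    * PRICE["physical_active"]
    + (long_term_physical / GIB) * PRICE["physical_long_term"]
)

print(f"logical:  ${logical_cost:,.2f}/month")
print(f"physical: ${physical_cost:,.2f}/month")
```

With these (hypothetical) numbers, a well-compressed table is billed far less under the physical model even at the higher per-GiB rate – exactly the trade-off the billing model recommendation has to weigh.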

These definitions are key to understanding how BigQuery charges for storage. But there’s also the matter of how the usage-based billing process works.

The Usage-Based Billing Process

Optimizing storage costs requires understanding Google Cloud’s billing calculation:

Google Cloud tracks changes in storage usage after each data operation. Usage is rounded up to the nearest Mebibyte (MiB) and prorated per second. The billable amount is then the data size (in Gibibytes) multiplied by the storage duration (in months).
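
This arithmetic can be sketched as follows. The rounding and proration mirror the description above, but the helper, month length, and price are simplifying assumptions for illustration, not Google’s exact internals:

```python
# Sketch of the usage-based billing arithmetic: round up to MiB,
# prorate per second, express usage in GiB-months.

MIB = 1024 ** 2
GIB = 1024 ** 3
SECONDS_PER_MONTH = 30 * 24 * 3600  # simplifying assumption


def prorated_gib_months(size_bytes: int, seconds_stored: int) -> float:
    """Round usage up to a whole MiB, then prorate per second."""
    rounded_bytes = -(-size_bytes // MIB) * MIB  # ceiling to MiB
    return (rounded_bytes / GIB) * (seconds_stored / SECONDS_PER_MONTH)


# Example: 100 GiB of active storage kept for exactly one day.
usage = prorated_gib_months(100 * GIB, 24 * 3600)
price_per_gib_month = 0.02  # illustrative active-storage price

print(f"{usage:.4f} GiB-months -> ${usage * price_per_gib_month:.4f}")
```

One day of storage is billed as roughly 1/30 of a month, which is why short-lived intermediate tables can be far cheaper than their size alone suggests.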

Key factors influencing storage cost:

  • Data Volume: The amount of data stored in BigQuery tables.
  • Storage Duration: How long data is retained, tracked at the partition level.
  • Compression Efficiency: Data’s compressibility, impacting physical model costs.
  • Data Modifications: The recency, frequency, and type of data changes (updates, deletions), which affect the physical model’s extra storage costs.

Masthead implements frequent, granular storage usage analysis to provide confident and timely recommendations.

Tracking Data Volumes and Usage Trends

By following the Google Cloud billing logic and tracking historical storage metrics per table, Masthead monitors how different types of storage usage evolve over time. This allows us to identify patterns and trends in storage consumption that influence cost, enabling more informed decisions about future storage strategies. 

We track data volumes at regular intervals to analyze change trends and monthly costs. Let’s look at the difference in storage cost estimates between a data asset lifecycle analysis and a single snapshot.

Data volume lifecycle charges for data stored in BigQuery for one day

Don’t rely on single snapshots for BigQuery storage cost estimations – you could be significantly off. We’ve seen discrepancies of up to 50% compared to actual Google Cloud bills, particularly after large data operations. A snapshot (such as a query against INFORMATION_SCHEMA.TABLE_STORAGE) offers only a static view, while true costs are driven by the data lifecycle and changes over time, especially during irregular operations like aggregations and periodic pipeline adjustment releases.

Masthead overcomes this limitation by using frequent data collection to provide dynamic and accurate cost recommendations. Analyzing data lifecycle costs, rather than just snapshots, underscores the value of this approach.
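
The snapshot-vs-lifecycle gap can be illustrated with a toy example. The observations below are hypothetical samples of a table’s billable bytes around a large data operation that briefly inflates its footprint; the time-weighted average is a simplified stand-in for a full lifecycle analysis:

```python
# Sketch: why a single snapshot misleads. We compare a one-off estimate
# with a time-weighted average over several observations of the same
# table's billable bytes (hypothetical numbers).

GIB = 1024 ** 3
PRICE_PER_GIB_MONTH = 0.02  # illustrative

# (hours_since_start, billable_bytes): a short-lived spike at hour 1.
observations = [
    (0, 200 * GIB),
    (1, 600 * GIB),
    (2, 200 * GIB),
    (24, 200 * GIB),
]


def monthly_cost(gib: float) -> float:
    return gib * PRICE_PER_GIB_MONTH


# A single snapshot taken right after the large operation:
snapshot_estimate = monthly_cost(observations[1][1] / GIB)

# Time-weighted average over the whole day, holding each observed
# size until the next sample:
total_hours = observations[-1][0] - observations[0][0]
weighted_gib = sum(
    (t2 - t1) * (b1 / GIB)
    for (t1, b1), (t2, _) in zip(observations, observations[1:])
) / total_hours
lifecycle_estimate = monthly_cost(weighted_gib)

print(f"snapshot:  ${snapshot_estimate:.2f}/month")
print(f"lifecycle: ${lifecycle_estimate:.2f}/month")
```

In this toy case the snapshot, taken at the worst moment, overstates the monthly cost nearly threefold compared to the time-weighted view – the same effect, in miniature, as the discrepancies described above.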

Take Strategic and Timely Actions

In conclusion, strategic and timely estimation is no longer optional – it’s essential for optimizing BigQuery storage costs. Understanding nuances like billing models, region-specific pricing (locations such as asia-southeast1 can offer savings, while others such as us-south1 cost more), and the cost implications of different data asset types gives you crucial control.

Leveraging these BigQuery billing insights and employing refined estimation techniques, especially with tools like Masthead, empowers businesses to achieve storage cost savings of up to 50%.

Take control of your storage expenses and ensure consistently high performance – with strategic estimation, and Masthead, it’s within reach.