Masthead Data x Arpeely

 The main sign of a high ROI product is when it becomes a core part of our monitoring routine. In the case of Masthead, the most critical pipelines of our data platform are monitored by their solution, allowing us to monitor the health of our data platform and ensure that we can proactively address any issue that may arise. In addition, Masthead has a cost management module that allows us to effectively control the ever-growing cost of our data platform and ensure its efficiency.

Roee Sheffer, VP of R&D at Arpeely
65% pipeline error reduction
Industry: Technology
Company size: 100 employees
Founded: 2016
Data stack: Google BigQuery, Dataform, Looker, Python

Arpeely’s Data Platform Evolution: Achieving Full Data and Pipeline Observability with Masthead

Challenge

Arpeely handles large volumes of data, processing over 2 million ad impressions per second and generating 100 billion ML-based predictions daily. To manage this workload, Arpeely has built an intricate GCP-based infrastructure comprising more than 20,000 tables and processing around 80TB of data daily. The system combines in-house solutions written in Python with Google Cloud services that support data management and analytics, such as Dataform, Workflow, and Looker.

Arpeely’s data engineers relied on Dataplex Data Lineage to track dependencies in their data architecture. However, the Data Catalog did not cover some of their more specific needs, such as column-level data lineage. Consequently, Arpeely’s data engineers sought a data observability solution that could simplify their processes by offering the necessary data lineage along with deeper insights into the components of their data platform: not only data health, but also pipeline health and pipeline execution metrics.

Onboarding with Masthead highlighted several challenges, chief among them the extreme complexity of Arpeely’s data system, which involves hundreds of pipelines running costly and complex queries. This complexity made continuous data observability, real-time issue resolution, pipeline and architecture health audits, and cost tracking exceptionally demanding.

Solution

Masthead provided Arpeely with automated data lineage, allowing the customer to track their data flows down to Looker dashboards within 30 minutes of deployment. As a result, Arpeely gained a more complete view of its complex data architecture, which comprises multiple Google Cloud projects, more than 20,000 tables, over 2,000 pipelines and models, and hundreds of reports and Looks in Looker.

Within just 24 hours of implementing Masthead, Arpeely could accurately estimate the number of active pipelines, which gave the team a comprehensive overview of their data architecture. Masthead not only fulfilled Arpeely’s initial request for column-level data lineage but also exceeded expectations by automatically offering clear insights into the structure and cost of their data products and the performance of their pipelines.

Another key benefit Arpeely gained from using Masthead is data platform reliability. Masthead detects data errors such as syntax errors, unrecognized names in queries, and query inconsistencies, and alerts Arpeely’s data engineers in real time. Masthead’s data observability allows Arpeely to address these issues promptly and understand their impact on the tables and BI reports that rely on the affected data. With this functionality, Arpeely’s teams resolve issues proactively, before they can affect business data. These insights helped Arpeely’s data team achieve reliability and control over their core data assets. Quick response to data errors and anomalies directly affects the value Arpeely provides to its customers.
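To make the idea of impact analysis concrete, here is a minimal Python sketch of how a failed table's downstream dependents could be found from a lineage graph. It is an illustration only and does not describe how Masthead works internally; the `LINEAGE` mapping, the asset names, and the `downstream_assets` function are all hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
# All names are illustrative; they do not come from Arpeely's platform.
LINEAGE = {
    "raw.impressions": ["staging.impressions_clean"],
    "staging.impressions_clean": ["marts.daily_spend", "marts.bid_features"],
    "marts.daily_spend": ["looker.spend_dashboard"],
    "marts.bid_features": [],
    "looker.spend_dashboard": [],
}


def downstream_assets(lineage, failed_asset):
    """Return every asset reachable from failed_asset, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([failed_asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Tables and BI reports that depend on the failed staging table.
affected = downstream_assets(LINEAGE, "staging.impressions_clean")
print(affected)
```

In this toy graph, an error in the staging table reaches two mart tables and one dashboard, which is the kind of list an engineer would want attached to an alert.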

Results

What began as a desire for comprehensive data lineage has given Arpeely much broader visibility over its complex data architecture. Arpeely’s data engineers and analysts use Masthead to track the connections between their tables and individual columns, which makes root cause analysis much smoother. Masthead made a big difference by allowing Arpeely’s data teams to promptly identify issues in pipelines or tables, see their dependencies, and troubleshoot these issues much faster.
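As a rough illustration of why column-level lineage helps root cause analysis, the sketch below walks upstream from a suspicious column to the source columns it ultimately derives from. This is an assumption-laden example, not Masthead's implementation; `COLUMN_SOURCES`, the table and column names, and `root_columns` are hypothetical.

```python
# Hypothetical column-level lineage: each (table, column) maps to the
# source columns it is computed from. Names are illustrative only.
COLUMN_SOURCES = {
    ("marts.daily_spend", "spend_usd"): [
        ("staging.impressions_clean", "price"),
        ("staging.fx_rates", "usd_rate"),
    ],
    ("staging.impressions_clean", "price"): [("raw.impressions", "price_micros")],
    ("staging.fx_rates", "usd_rate"): [],
    ("raw.impressions", "price_micros"): [],
}


def root_columns(column_sources, column, seen=None):
    """Return the set of upstream columns that have no inputs of their own."""
    seen = set() if seen is None else seen
    if column in seen:
        # Already visited on another path; avoids repeated work and cycles.
        return set()
    seen.add(column)
    parents = column_sources.get(column, [])
    if not parents:
        return {column}
    roots = set()
    for parent in parents:
        roots |= root_columns(column_sources, parent, seen)
    return roots


suspect = ("marts.daily_spend", "spend_usd")
print(sorted(root_columns(COLUMN_SOURCES, suspect)))
```

With table-level lineage alone, every column of every upstream table would be a candidate; tracing at column level narrows the search to the two source columns that actually feed the suspect value.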

Checking Masthead’s real-time incident reports has become a main part of Arpeely’s data engineers’ reliability and alerting stack, allowing them to respond swiftly to issues as they arise.

Rather than serving only as a data lineage tool, as initially expected, Masthead has become a prominent part of Arpeely’s data practices and data health environment. As a tool fully compatible with GCP, Masthead gives Arpeely comprehensive visibility over multiple facets of its data platform, including pipelines, data warehouses, BI solutions, and even specific rows and columns within tables.