In the era of the Modern Data Stack, many organizations find themselves trapped between two extremes: manual, error-prone data handling or prohibitively expensive ETL (Extract, Transform, Load) platforms that demand high fixed monthly costs. The challenge for engineering teams is to build a robust, scalable system that provides business intelligence without paying for idle servers.

The following architecture demonstrates how to use Google Cloud Platform (GCP) to build a fully automated, serverless data pipeline. By combining event-driven ingestion with scheduled transformation, the system incurs costs mostly during active processing, which drives the fixed monthly cost close to zero for low-to-medium volume workloads.

Figure: Data flow diagram from sources through bronze, silver, and gold layers to Data Studio visualization (Moov).

1. Ingesting Data Without Always-On Infrastructure

The entry point of the pipeline is designed around the two most common sources of business data: external platforms such as social networks, CRMs, and analytics tools, and manually uploaded files for ad-hoc datasets.

Landing Data in the Lake

Both ingestion paths end in Cloud Storage (GCS). Airbyte writes ready-to-use AVRO files directly to the lake, while manually uploaded files are cleaned and normalized by Cloud Run before being stored. At this point, GCS becomes the landing layer and source of truth for the rest of the pipeline.
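As an illustration of the kind of cleanup the Cloud Run step might perform on a manual upload, here is a minimal sketch that assumes the file is a CSV. The function names and the specific rules (snake_case headers, trimmed cells, blank rows dropped) are hypothetical and not taken from the original service.

```python
import csv
import io
import re


def normalize_header(name: str) -> str:
    """Turn a free-form column title into a snake_case identifier."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return cleaned.strip("_")


def normalize_csv(raw_text: str) -> list[dict[str, str]]:
    """Parse CSV text, normalize headers, trim cells, and drop blank rows.

    Assumption: the first row is the header row. The real preprocessing
    rules depend on the datasets being uploaded.
    """
    rows = list(csv.reader(io.StringIO(raw_text)))
    if not rows:
        return []
    headers = [normalize_header(h) for h in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows that contain no data at all
        records.append(dict(zip(headers, cells)))
    return records
```

In a real service, the normalized records would then be serialized and written to the landing bucket. That step is omitted here because the storage format for manual uploads is not specified.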

2. Orchestrating the Pipeline with Events

Once data lands in Cloud Storage, the ingestion workflow becomes event-driven. Eventarc listens for new files and triggers the workflow that loads them into BigQuery.
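The sketch below shows one way the event-to-load step could look. It takes the `bucket` and `name` fields that a Cloud Storage object event carries and builds the body of a BigQuery load job in the shape used by the BigQuery REST API. The path convention (first folder equals table name), the AVRO-only filter, and the function name are assumptions made for illustration. In this architecture the equivalent logic would live in the Workflows definition or a Cloud Run handler.

```python
from typing import Optional


def build_load_request(event_data: dict, project: str, dataset: str) -> Optional[dict]:
    """Build a BigQuery load-job body from a Cloud Storage object event.

    Assumptions (hypothetical conventions, not from the original pipeline):
    - objects are laid out as <table>/<anything>/<file>.avro
    - only AVRO files are loaded; everything else is ignored
    """
    bucket = event_data["bucket"]
    name = event_data["name"]
    if not name.endswith(".avro"):
        return None  # not a file this sketch knows how to load

    table = name.split("/", 1)[0]  # first path segment names the table
    return {
        "configuration": {
            "load": {
                "sourceUris": [f"gs://{bucket}/{name}"],
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "sourceFormat": "AVRO",
                # Append so that each new file adds rows to the raw table.
                "writeDisposition": "WRITE_APPEND",
            }
        }
    }
```

Because the request is derived only from the object path, the same function can be reused to replay files that are already in the bucket.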

3. From Raw Data to Business-Ready Models

Once the data is ingested, it moves through a three-layer model within BigQuery to ensure data quality, traceability, and performance.

The Bronze-Silver-Gold Data Warehouse Model

Versioned SQL Transformation with Dataform

Instead of using external transformation engines, we use Dataform. This allows us to treat our data transformations like software code using SQLX, with all transformation logic stored in a Git repository and versioned over time. Dataform manages the dependencies and execution of SQL scripts that move data from Bronze to Silver and finally to Gold. In this setup, Dataform is triggered by a cron job one hour after the Airbyte sync, giving the ingestion process enough time to complete before transformations begin.
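To make the dependency management concrete, the following sketch orders a handful of hypothetical tables so that every table runs after the tables it reads from. Dataform derives this graph itself from the references between SQLX files. The code only illustrates the underlying idea, a topological sort, and the table names are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical tables: each key maps to the set of tables it reads from.
DEPENDENCIES = {
    "bronze_orders": set(),
    "bronze_customers": set(),
    "silver_orders": {"bronze_orders"},
    "silver_customers": {"bronze_customers"},
    "gold_revenue_by_customer": {"silver_orders", "silver_customers"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every table follows its upstream tables.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for table in execution_order(DEPENDENCIES):
        print(table)
```

Several valid orders exist for this graph. The only guarantee is that Bronze tables come before the Silver tables that read them, and Silver before Gold.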

4. Operational Simplicity by Design

Beyond cost reduction, this architecture is designed to reduce the operational burden of running a data platform. There is no Airflow cluster to size, patch, or monitor, and no idle compute waiting for the next ingestion window. Each responsibility is isolated in a managed service: Cloud Run handles custom preprocessing and ingestion logic, Eventarc reacts to new files, Workflows coordinates the execution steps, BigQuery stores and processes analytical data, and Dataform manages SQL dependencies.

This separation also improves resilience and maintainability. Cloud Storage acts as a replayable source of truth, so failed loads or changed business rules can be reprocessed without calling the original APIs again. Transformations are versioned in Git, making changes reviewable and auditable. The Bronze-Silver-Gold structure creates a clear boundary between raw data, reusable business logic, and dashboard-ready marts, which keeps BI tools simple and protects business users from upstream complexity.

5. Why the Cost Stays Close to Zero

The primary driver behind this architecture is cost-efficiency. By choosing these specific components, we maximize the use of GCP's free tiers and pay-as-you-go pricing.

The key principle is that every major component either scales to zero or charges only when work is performed. Cloud Run runs only during preprocessing and ingestion requests. Eventarc charges for events rather than idle listeners. Workflows charges by executed steps rather than by server uptime. Dataform runs on a schedule instead of requiring a permanent transformation worker. BigQuery separates storage from compute and charges query processing based on the amount of data scanned. For low-to-medium volume workloads, this means the platform can remain fully automated without carrying the fixed monthly cost of always-on infrastructure.
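A back-of-the-envelope calculation shows why scan-based billing stays cheap at low volume. The sketch below estimates a monthly on-demand query bill from the bytes scanned. The price per TiB and the free monthly allowance are passed in as parameters, and the values in the usage example are illustrative assumptions that should be checked against the current BigQuery pricing page.

```python
TIB = 2**40  # bytes in one tebibyte


def on_demand_query_cost(
    bytes_scanned_per_month: float,
    price_per_tib: float,
    free_tib_per_month: float,
) -> float:
    """Estimate the monthly query cost under scan-based billing.

    Only the volume above the free allowance is billed. Storage and
    other services are not included in this estimate.
    """
    scanned_tib = bytes_scanned_per_month / TIB
    billable_tib = max(0.0, scanned_tib - free_tib_per_month)
    return billable_tib * price_per_tib


if __name__ == "__main__":
    # Illustrative numbers only: 30 daily runs scanning 5 GiB each.
    monthly_bytes = 30 * 5 * 2**30
    print(on_demand_query_cost(monthly_bytes, price_per_tib=6.25, free_tib_per_month=1.0))
```

With these assumed figures the workload scans about 0.15 TiB per month, which stays inside the assumed free allowance, so the estimated query cost is zero.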