Do you actually need a marketing data warehouse?
The honest answer is probably not yet. Here's how to tell if you're ready and what to build if you are.
I talk more clients out of building a marketing data warehouse than into it. That’s not great for my revenue, but it’s honest.
Data warehouses are trendy. Every marketing conference has at least two talks about “the modern data stack.” Vendors like Fivetran and dbt have made it easier than ever to spin one up. And every marketing team that’s felt the pain of inconsistent numbers across platforms starts dreaming about “one source of truth.”
The dream is real. A properly built data warehouse is genuinely transformative. But “properly built” means months of work, ongoing maintenance, and a team that knows how to use it. Most marketing teams aren’t there yet. And that’s fine.
Signs you actually need a data warehouse
Let me be specific about what makes a data warehouse worth the investment.
You’re pulling data from five or more platforms manually. If someone on your team spends every Monday morning downloading CSVs from Google Ads, Meta, LinkedIn, GA4, your CRM, and your email platform, then pasting them into a spreadsheet to get a combined view, that’s a workflow a data warehouse solves well. Not because it’s fancy, but because manual data aggregation at that scale introduces errors and eats time every single week.
You need custom attribution modeling. GA4’s attribution is decent for simple funnels. But if you need to build custom models that weigh touchpoints differently based on your business logic, or if you need to combine online and offline touchpoints, you need raw event-level data in a queryable database. That’s what a data warehouse gives you.
You want to do predictive analytics or ML. Customer lifetime value prediction, churn modeling, propensity scoring. These need historical data at scale, and they need it in a format that data scientists can work with. GA4’s interface isn’t that format. BigQuery is.
Your numbers don’t match across tools and nobody can explain why. Google Ads says 500 conversions. GA4 says 420. Your CRM says 380. Everyone is using different attribution windows, different conversion definitions, and different counting methodologies. A data warehouse lets you define one set of rules and apply them to all data sources.
You’re spending over €50K per month on ads. At that spend level, even a 5% improvement in allocation efficiency from better data is worth more than the cost of the warehouse. Below that, the math gets harder to justify.
Signs you don’t need one (yet)
You’re under €5M annual revenue. There are exceptions, but generally, companies at this size don’t have the data volume or complexity to justify a warehouse. Your GA4 data, combined with platform-native reporting, is probably sufficient.
Your marketing team is under five people. Who’s going to maintain the warehouse? Who’s going to write SQL queries? Who’s going to build and update the dbt models when requirements change? If the answer is “we’ll figure it out,” you’re going to end up with an expensive system that nobody maintains and everyone stops trusting after three months.
You don’t have a data engineer. Not a marketer who learned SQL. Not a freelancer you’ll hire for setup and then never talk to again. A person (or contractor with ongoing hours) whose job includes maintaining data pipelines. Without this, your warehouse becomes a graveyard of stale data.
Your funnel is simple. If your customer journey is “ad click, landing page, form submit, sales call,” you don’t need BigQuery to understand it. GA4 handles this natively. A data warehouse pays off when the journey involves multiple touchpoints across multiple platforms over weeks or months.
You haven’t maxed out what GA4 can do. I’d estimate that 80% of marketing teams use about 20% of GA4’s capabilities. Before investing in a warehouse, make sure you’re using Explorations, custom dimensions, audiences, BigQuery exports (the free ones), and proper event tracking. If you haven’t done these yet, start there. For SaaS companies, getting GA4 right often covers most of what a warehouse would provide.
The build path (if you do need one)
Alright, so you’ve read the warning signs and you still think you need a data warehouse. Here’s how I build them, step by step.
Step 1: GA4 BigQuery Export
This is the foundation. Link your GA4 property to BigQuery and enable the daily export (or streaming export if you need fresher data, though it costs more).
The GA4 BigQuery export gives you raw event-level data. Every pageview, every click, every purchase, with all parameters. It’s nested and a bit annoying to query at first, but it’s incredibly powerful. And for most sites, the BigQuery cost is under $20/month.
If you do nothing else, do this. Even if you never build a full warehouse, having raw GA4 data in BigQuery is insurance. You can always go back and analyze historical data in ways that GA4’s interface doesn’t support.
Step 2: Add advertising platform data
This is where it gets interesting and where the costs start. You need to get spend, impressions, clicks, and conversions data from Google Ads, Meta Ads, LinkedIn, TikTok, and whatever else you’re running.
Options, from cheapest to most expensive:
Not sure if you need a data warehouse?I'll assess your data maturity and give you an honest recommendation.
Book a Free Audit →Google Ads Data Transfer. Free. Gets your Google Ads data into BigQuery automatically. This is a no-brainer if you’re running Google Ads.
Platform API scripts. Free (if you can write Python). Each ad platform has an API. Write scripts that pull data daily and load it into BigQuery. I’ve done this with Cloud Functions or Cloud Run. It works, but you’re maintaining custom code forever.
Fivetran, Stitch, or Airbyte. $100-500/month depending on connectors and volume. These pull data from any platform and load it into your warehouse. Fivetran is the most reliable. Airbyte is open source if you want to self-host. Stitch is owned by Talend and is somewhere in between.
Supermetrics. Marketed more toward analysts than engineers. Good for getting ad platform data into BigQuery or Google Sheets. Pricing is per-connector.
For most setups I build, I use Google Ads Data Transfer (free) plus Fivetran for one or two non-Google sources. Total cost for the data pipeline: $200-400/month.
Step 3: Transform with dbt
Raw data from multiple sources is messy. Different date formats, different currency handling, different naming conventions. You need a transformation layer.
dbt (data build tool) is the standard here, and for good reason. It lets you write SQL transformations that run on a schedule, turning raw data into clean, joined, business-ready tables. It handles dependencies between transformations, tests data quality, and documents everything.
This is the step that actually requires SQL knowledge. If nobody on your team writes SQL comfortably, you either need to hire someone or reconsider whether you need a warehouse.
What I typically build in dbt:
- A unified sessions table combining GA4 data with campaign metadata from ad platforms
- A marketing spend table normalized across all platforms (same column names, same currency, same date format)
- A conversions table that matches GA4 conversions with CRM outcomes
- A channel performance summary table that’s ready for Looker Studio
The dbt project usually takes me two to three weeks for a new client. It’s the bulk of the work.
Step 4: Connect Looker Studio (or your BI tool)
Once your warehouse has clean, transformed tables, connecting Looker Studio is straightforward. Point it at your summary tables in BigQuery. Build dashboards on top of reliable, consistent data.
The difference between a Looker Studio dashboard connected to raw GA4 data and one connected to a properly modeled warehouse is night and day. Metrics match across reports. Attribution is consistent. Historical comparisons are reliable.
Cost reality check
Let me be honest about what this actually costs per month in steady state.
BigQuery storage and compute: $20-100, depending on data volume and query frequency. For most marketing setups, this is cheap.
Data pipeline (Fivetran or similar): $200-500, depending on how many connectors you need.
dbt Cloud (optional): $100 for the team plan, or free if you run dbt Core yourself.
Maintenance time: 4-8 hours per month. Pipelines break. Schemas change. New data sources need to be added. Someone needs to be monitoring this.
Total: $500-1,500/month plus personnel time. For a company spending $100K/month on ads, this is a rounding error. For a company spending $5K/month, it’s a significant percentage of the marketing budget.
This doesn’t include the initial build cost, which is where I typically come in. Budget 40-80 hours of consulting time for a first implementation, depending on complexity.
The lightweight alternative
For teams that aren’t ready for the full warehouse but want more than GA4’s interface provides, there’s a middle path that I recommend often.
GA4 BigQuery export (set it up, it’s free for daily exports).
Scheduled queries in BigQuery. Write SQL queries that run daily and output summary tables. No dbt, no pipeline tool, just BigQuery’s built-in scheduler. It’s not as maintainable as dbt, but for two or three queries, it’s fine.
Google Sheets as a bridge. Use the BigQuery connector for Sheets (or the Sheets data connector in Looker Studio) to pull in ad platform data that you update weekly. Yes, it’s manual. But if you’re updating one spreadsheet with three tabs once a week, that’s 15 minutes of work.
Looker Studio on top. Connect to both your BigQuery summary tables and your Google Sheets data sources. Build blended charts where needed.
This setup costs almost nothing. It handles 70% of what a full warehouse does. And it’s dead simple to maintain. I’ve had clients run this for a year before deciding they needed the full version, and during that year, they made better decisions than they were making with platform-native reporting alone.
The honest recommendation
If you’re reading this and thinking “should I build a data warehouse?” you probably shouldn’t. Not yet.
Here’s my decision framework. Answer these questions:
- Are you spending more than €50K/month on advertising?
- Do you have data from five or more platforms that need to be combined?
- Do you have someone on your team who can write SQL?
- Are you currently making decisions based on data you don’t fully trust?
- Have you already maxed out what GA4 and platform-native reporting can do?
If you answered yes to four or five of these, build the warehouse. If you answered yes to two or three, start with the lightweight alternative and revisit in six months. If you answered yes to one or zero, focus on getting your GA4 implementation right first.
The best data infrastructure is the one your team actually uses. A BigQuery warehouse that nobody queries is worse than a Google Sheet that someone updates every Monday. Start with the simplest thing that solves your actual problem, and build up from there when the pain justifies the investment.
Artem Reiter
Web Analytics Consultant