If your company is anything like some I've worked in, data is segregated, siloed and unstructured. Finance and Sales can’t agree on last month’s customer acquisitions or, worse yet, net revenue. Moreover, you’ve hit Excel’s row limit trying to join data from disparate sources, and your laptop’s processor is burning a hole in your desk or, in my case, sweatpants. If any of this sounds familiar, it’s time to elevate your company's data stack with a data warehouse.
In the last few years alone, there's been tremendous growth in data infrastructure technologies and the best practices that support them. We’re now observing a rapid shift away from traditional, often clunky on-premises data warehouses, expensive and brittle ETL tooling and a heavy dependency on highly skilled Data Engineers, toward cloud-based SaaS data warehousing applications, flexible EL pipelines and self-serve analytics tools for the non-technical user.
In this blog post, I’m going to talk about the telltale signs that your business has outgrown its current (and in some cases non-existent) data capabilities, and how adopting a modern data stack would, among other things, increase the efficiency and speed with which you arrive at insights and address seemingly complex business problems.
The critical question you need to ask yourself before embarking on your data stack journey is: how do I know I’m ready? Simply put, when your productivity is held back by your inability to efficiently extract meaningful business insight, it's likely time to move forward with a data warehouse.
Here are my top four indicators that you’ve outgrown your current analytical capabilities:
In today’s data-centric world, one thing is certain: to stay competitive, you must embrace a culture in which business decisions are supported by your data. As we just learned, when the volume of data and the complexity of your use cases increase, managing your data becomes steadily more difficult, and the processes that once worked may no longer be useful. Combating the previously mentioned symptoms requires robust infrastructure that can make data available efficiently and a framework that will guide the management of your data as your business scales. Enter the Modern Data Stack.
Increasingly, I’m observing companies with relatively small data teams and operating budgets adopt cloud-native data warehousing and business intelligence infrastructure that embraces the flexibility and scale that cloud tech offers. In addition, with the rise of free, self-serve, enterprise-grade data vendors and options for open-source tooling, traditional barriers to entry are being dismantled, and companies of all sizes can build infrastructure that significantly shortens time-to-insight. Provisioning this stack is simple and future-proof, and it requires minimal technical oversight from Data Engineers. Let's now take a look at the common ingredients of a modern data stack.
The core components of a modern data stack are typically a cloud-based data warehouse, data pipelines and connectors, a business intelligence platform and a workflow orchestrator that manages the propagation and transformation of data through the stack.
Cloud-based Data Warehouse: to ensure a single source of truth, a data warehouse serves as the central location to collect, store and integrate data from all your disparate data sources. Unlike traditional on-premises data warehouses, SaaS cloud data warehouses leverage the storage and computing resources allocated by your cloud provider to ensure your data infrastructure is available, scalable and secure. Popular service providers include Snowflake, AWS Redshift and Google’s BigQuery.
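To make the "single source of truth" idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a cloud warehouse. The table names (crm_customers, billing_invoices), columns and sample rows are all hypothetical, invented purely for illustration; a real warehouse would hold data loaded from your actual source systems.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory database that plays the role of the warehouse.

    Table names and sample rows are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE crm_customers (
            customer_id INTEGER PRIMARY KEY,
            signup_month TEXT
        );
        CREATE TABLE billing_invoices (
            customer_id INTEGER,
            gross REAL,
            refunds REAL
        );
        INSERT INTO crm_customers VALUES
            (1, '2021-03'), (2, '2021-03'), (3, '2021-04');
        INSERT INTO billing_invoices VALUES
            (1, 100.0, 0.0), (1, 50.0, 10.0), (2, 80.0, 80.0), (3, 200.0, 0.0);
        """
    )
    return conn


def monthly_summary(conn):
    """Join Sales (CRM) and Finance (billing) data in one agreed-upon query."""
    query = """
        SELECT c.signup_month,
               COUNT(DISTINCT c.customer_id) AS new_customers,
               SUM(i.gross - i.refunds)      AS net_revenue
        FROM crm_customers AS c
        JOIN billing_invoices AS i ON i.customer_id = c.customer_id
        GROUP BY c.signup_month
        ORDER BY c.signup_month
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for month, customers, revenue in monthly_summary(build_warehouse()):
        print(month, customers, revenue)
```

Because both teams read from the same tables and the same query, customer acquisitions and net revenue are computed one way, rather than once per spreadsheet.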
Data Pipelines/Ingestion: a connector service that loads diverse data streams into your data warehouse with minimal engineering effort. Which data pipeline management tool you select depends on the variety and unique nature of the source systems, data structures and data types you plan on collecting in your data warehouse. Common data source types may include
Connections to third-party data sources are typically established once, and as your data sources grow or change, you can reuse the existing connector infrastructure to funnel data into your data warehouse. Popular service providers include Stitch and Fivetran.
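As a rough sketch of the "load" half of an EL pipeline, the snippet below lands raw source records in a warehouse table as JSON and leaves any reshaping for later. It again uses sqlite3 as a stand-in for the warehouse; the raw_events table, the load_raw function and the sample records are hypothetical, and this is not how Stitch or Fivetran are implemented.

```python
import json
import sqlite3


def load_raw(conn, source_name, records):
    """Land raw records in the warehouse without transforming them.

    Each record must carry an "id" key. Re-running the load with the same
    id overwrites the earlier copy, so repeated syncs stay idempotent.
    Returns the number of rows now stored for this source.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events ("
        "source TEXT, record_id TEXT, payload TEXT, "
        "PRIMARY KEY (source, record_id))"
    )
    rows = [
        (source_name, str(record["id"]), json.dumps(record, sort_keys=True))
        for record in records
    ]
    conn.executemany("INSERT OR REPLACE INTO raw_events VALUES (?, ?, ?)", rows)
    conn.commit()
    count = conn.execute(
        "SELECT COUNT(*) FROM raw_events WHERE source = ?", (source_name,)
    ).fetchone()[0]
    return count


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    # First sync: two hypothetical CRM records.
    load_raw(warehouse, "crm", [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}])
    # Second sync: record 1 changed and record 3 is new.
    total = load_raw(warehouse, "crm", [{"id": 1, "plan": "pro"}, {"id": 3, "plan": "free"}])
    print(total)
```

Keeping the payload untouched at load time is what makes the pipeline "EL" rather than "ETL": the business logic is applied later, inside the warehouse.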
BI and Analytics Platform: a powerful visualization and data science platform that takes advantage of the consolidated data warehouse to offer agile exploratory analysis and richer business insights. Once data is available in your data warehouse, you need a BI and Analytics platform to make sense of it. The primary goal of this layer is to make data actionable. With modern BI tools, we can look beyond the bar chart to automate marketing propensity models, monitor and detect security and intrusion threats and, in general, automate what were once very rudimentary and manual reporting tasks. Popular service providers include Narrator, Tableau and Looker.
Data Orchestration and Transformation: the final element in a typical modern data stack, data orchestration and transformation utilities add a modeling layer for your business logic. Though both your data warehousing and BI applications may offer this functionality, it is recommended that you decouple your transformations and attributions from the other layers in your stack so that they are accessible to other tools and applications that may want to use them. In addition, these tools can coordinate and schedule the sequence and dependencies of all your data pipelines. Popular service providers include dbt, Airflow and Narrator.
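The core idea behind coordinating "sequence and dependencies" can be sketched with Python's standard-library graphlib module (Python 3.9+). The task names and the run_pipeline helper below are hypothetical, and this is only an illustration of dependency ordering, not a description of how dbt or Airflow work internally.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after all of its prerequisites have finished.

    tasks:        maps a task name to a zero-argument callable.
    dependencies: maps a task name to the set of task names it depends on.
    Returns the task names in the order they were executed.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()  # a real orchestrator would also handle retries and scheduling
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical pipeline: two loads feed a revenue model, which feeds a dashboard.
    dependencies = {
        "load_crm": set(),
        "load_billing": set(),
        "model_revenue": {"load_crm", "load_billing"},
        "refresh_dashboard": {"model_revenue"},
    }
    tasks = {name: (lambda n=name: print("running", n)) for name in dependencies}
    run_pipeline(tasks, dependencies)
```

Declaring dependencies as data, rather than hard-coding a run order, is what lets the transformation layer stay decoupled from the warehouse and BI tools on either side of it.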
The strength of this proposed architecture lies in the availability and scalability of modern cloud-based applications, which can handle large and diverse datasets with limited technical oversight from engineering resources.
Additional benefits of this approach, particularly for smaller shops, include the low cost of provisioning and maintaining this stack, the speed and ease of getting started, and the wide variety of data applications and features to select from.
Only a few years ago this proposed data architecture would not have been possible. With the advancement of cloud-based solutions, even early-stage businesses can deploy flexible infrastructure allowing them to spend less time engineering their data management applications and more time analyzing their data.
For more inspiration, Jason Harris of Fivetran lists his Five Reasons to Consider a Modern Data Stack and Tristan Handy, founder of dbt, offers his thoughtful predictions on The Future of the Modern Data Stack.
A great place to get started is Mode's guide to building a modern data stack in 30 minutes.
Need more guidance? Get in touch to find out how Narrator can help.