Enterprises are making significant investments to build robust data foundations to get ready to power their AI initiatives. Modern cloud-based data platforms like Snowflake, Google BigQuery, and Microsoft Fabric have provided the technical means to consolidate datasets scattered across various platforms into unified repositories — often known as Lakehouses or Data Warehouses.
At the same time, these enterprises are also the largest buyers of SaaS solutions. SaaS platforms power critical enterprise functions like CRM, HR, finance, and marketing. However, the data created or consumed by these SaaS solutions often sits in silos, putting it at odds with the unified architecture and governance objectives described above.
Each SaaS deployment is an island of data, with critical enterprise data trapped inside the product. This data is difficult to discover, challenging to govern, and, most importantly, hard to extract and merge with the enterprise's other datasets to power analytics and AI.
SaaS vendors have been slow to provide mechanisms for easily extracting data and moving it to analytics data stores. They have offered APIs and flat-file interfaces, but these are inadequate and inefficient for enterprises trying to build a true enterprise-wide data platform capable of driving analytics and AI.
Often the SaaS vendor requires significant amounts of sensitive data from clients in order to provide a specific service. For example, many customer data platform (CDP) vendors require customer 360 profiles to be transferred out to their databases. Transferring millions of records of sensitive customer data is a deal-breaker for most enterprises.
Over the last few years, enterprises have become more assertive, demanding that their SaaS vendors provide features that allow data to be transferred back to them easily. Incumbent vendors face a growing threat of being replaced by alternatives that enable seamless data sharing. New RFPs increasingly make data sharing a critical feature for winning the business.
What is Zero-Copy Data Sharing, and Why Is It Important?
Zero-copy data sharing is a mechanism for sharing data between two databases without physically moving or copying the data. This eliminates the error-prone and costly step of building a pipeline to move data from source to target. In many cases, that pipeline today is as arcane as downloading the data from the source, moving it via secure FTP to the consumer's environment, and loading it into the target database.
Here are the key distinguishing features of a zero-copy data architecture; a brief sketch of what this looks like in practice follows the list:
No physical data movement: Data remains in its original location, eliminating the need for a separate physical copy.
Elimination of ETL processes: Consumers access data directly as tables, columns, and relationships, rather than handling CSV files.
Zero latency: New data is instantly visible to consumers as soon as it becomes available at the source.
Enhanced data quality: The elimination of process steps and code to extract, format, and transfer data improves quality.
Cost efficiency: Reduces expenses related to storage, coding, and data pipeline operations.
Control over sensitive data: Allows SaaS software to access sensitive data without it leaving the client’s control.
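To make this concrete, here is a minimal sketch of zero-copy sharing using Snowflake's Secure Data Sharing feature, driven from Python with the snowflake-connector-python client. The account, database, and table names are hypothetical; the point is that the consumer "mounts" the provider's data as a read-only database, with no files and no pipeline.

```python
# Minimal sketch of zero-copy sharing with Snowflake Secure Data Sharing.
# All account, database, schema, and table names are hypothetical placeholders.
import snowflake.connector

# --- Provider side (the SaaS vendor's Snowflake account) ---
provider = snowflake.connector.connect(
    account="saas_vendor_account", user="provider_user", password="...",  # placeholders
    role="ACCOUNTADMIN", warehouse="PROVIDER_WH",
)
cur = provider.cursor()
cur.execute("CREATE SHARE IF NOT EXISTS customer_profiles_share")
cur.execute("GRANT USAGE ON DATABASE saas_db TO SHARE customer_profiles_share")
cur.execute("GRANT USAGE ON SCHEMA saas_db.crm TO SHARE customer_profiles_share")
cur.execute("GRANT SELECT ON TABLE saas_db.crm.customer_360 TO SHARE customer_profiles_share")
# Entitle the client's account; nothing is copied -- only metadata grants change.
cur.execute("ALTER SHARE customer_profiles_share ADD ACCOUNTS = enterprise_account")

# --- Consumer side (the enterprise's Snowflake account) ---
consumer = snowflake.connector.connect(
    account="enterprise_account", user="consumer_user", password="...",  # placeholders
    role="ACCOUNTADMIN", warehouse="ANALYTICS_WH",
)
cur = consumer.cursor()
# Mount the share as a read-only database: no ETL, no CSV files, no pipeline.
cur.execute(
    "CREATE DATABASE IF NOT EXISTS vendor_data "
    "FROM SHARE saas_vendor_account.customer_profiles_share"
)
# Rows written by the SaaS product are visible here as soon as they land at the source.
for row in cur.execute("SELECT customer_id, segment FROM vendor_data.crm.customer_360 LIMIT 10"):
    print(row)
```

The consumer's vendor_data database is only metadata pointing at the provider's storage, which is why new records show up without any refresh job.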
Snowflake pioneered this architecture, using it to power their data marketplace and data cleanrooms. Other cloud databases have followed and built their own capabilities in this space. Many SaaS companies like Salesforce, Simon Data, and (recently) ServiceNow have zero-copy data sharing partnerships with Snowflake. But data sharing across data cloud vendors is still a challenge. And for a SaaS company, it’s expensive to publish data products to support all major cloud data platforms’ proprietary formats.
But in the past year, most cloud database vendors have announced support for the open-source Apache Iceberg table format. By providing a common table format, Iceberg enables seamless data sharing between different cloud platforms and services. This ensures that data can be easily integrated and accessed across various environments and provides an opportunity for SaaS vendors to publish standard schemas in a database vendor-agnostic format that can be shared with the client.
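For the vendor-agnostic route, a table published in the Iceberg format can be read by any Iceberg-aware engine. Below is a minimal sketch of a consumer reading a vendor-published table through a shared Iceberg REST catalog with the pyiceberg library; the catalog URI, credential, and table and column names are assumptions for illustration.

```python
# Minimal sketch: read a SaaS vendor's published Iceberg table via a REST catalog.
# Catalog URI, token, namespace, table, and column names are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "vendor_catalog",
    **{
        "type": "rest",
        "uri": "https://catalog.saas-vendor.example/iceberg",
        "token": "...",  # credential issued by the vendor
    },
)

# The vendor publishes a canonical schema; the client reads it in place.
table = catalog.load_table("published.customer_360")

# Prune columns and filter rows at scan time, then materialize to pandas.
df = table.scan(
    row_filter="lifetime_value >= 1000",
    selected_fields=("customer_id", "segment", "lifetime_value"),
).to_pandas()
print(df.head())
```

Because both the table format and the catalog protocol are open, the same data can, in principle, be registered and queried from Snowflake, BigQuery, or Fabric without creating another copy.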
What Does All This Mean for a SaaS Platform Vendor?
In the zero-copy architecture, data from the SaaS product appears in the enterprise warehouse, structured as a canonical schema with complete metadata. In other words, the SaaS vendor delivers a data product to its clients, ready for consumption with no extra investment in coding or infrastructure from the client's technology teams.
Here are some imperatives for SaaS companies:
Zero-copy data sharing is a must-have: Enterprises are now actively asking SaaS vendors to integrate seamlessly with their analytics platform. Zero-copy data sharing is becoming a critical feature in RFPs.
Revisit your product roadmap: Smart SaaS companies have already adopted this paradigm — or are actively working on it. To defend your market share or to win new customers, put this on your roadmap as a priority.
Strategic decisions are necessary: You have decisions to make. Do you adopt Iceberg and stay data cloud vendor-agnostic, or do you directly support sharing mechanisms provided by a specific data cloud vendor? If one or two data cloud platforms have dominant market share in your industry, then the latter might be a better place to start.
Complexity is inevitable: This will be messy, as it is all still very new. The control plane for managing data sharing, especially across multiple cloud service providers (CSPs) and data cloud vendors, is still not mature. But this is the future of data integration.
Much like APIs became table stakes for operational system integrations, zero-copy data sharing will soon define successful integration with enterprise data.
Zero-copy data sharing is no longer a nice-to-have — it is an essential feature for SaaS vendors looking to retain existing customers and win new ones.
SaaS vendors must act decisively, investing in data-sharing capabilities that align with the needs of their customers. By adopting cutting-edge technologies like Iceberg or partnering with leading data cloud platforms, vendors can position themselves as critical enablers of enterprise-wide data strategies in the AI era.
Mihir is a venture partner with F-Prime Capital and Eight Roads Ventures, and an Advisor-in-Residence at Ernst & Young. He recently retired from Fidelity Investments, where he was the CIO responsible for "All Things Data" for the firm. He currently advises VCs, startups, and large corporations on their data and analytics strategy.