Imagine you're the new CIO of an international group.
One of the first projects you want to undertake is to harmonize the data ecosystem.
You understand that the Data Mesh approach must provide strong autonomy to teams, but you don’t forget that one of the most important aspects of Data Mesh is Federated Governance.
This article discusses the implementation of Federated Governance over Snowflake.
In Snowflake, you have several containers at your disposal to “organize” your data.
Here is their hierarchy: Organization > Account > Database > Schema > objects (tables, views, etc.).
That’s all for the technical part. Let’s now talk about what makes data architectures exciting.
To ensure proper data usage and management within Snowflake, you may want to monitor the following:
All governance elements are now under a new umbrella called Snowflake Horizon, which continues to expand.
As I mentioned to grab your attention, we have three options for managing accounts:
1️⃣ Use a single account and separate departments into different databases.
2️⃣ Use multiple accounts and deploy your governance from your CI/CD.
3️⃣ Use multiple accounts, including a Zero Data Account that carries the governance.
It's possible, and indeed it has been done for years, to use a single Snowflake account and isolate departments into separate databases.
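To make the single-account layout concrete, here is a minimal sketch that generates the DDL for one database and one admin role per department. The naming scheme (`<DEPT>_DB`, `<DEPT>_ADMIN`) and the function name are my own assumptions for illustration, not a Snowflake convention.

```python
def department_isolation_sql(departments):
    """Build DDL that gives each department its own database and admin role.

    The <DEPT>_DB / <DEPT>_ADMIN naming scheme is hypothetical.
    """
    statements = []
    for dept in departments:
        # Names are interpolated into SQL, so reject anything that is not
        # a plain identifier.
        if not dept.isidentifier():
            raise ValueError(f"unsafe department name: {dept!r}")
        db = f"{dept.upper()}_DB"
        role = f"{dept.upper()}_ADMIN"
        statements.append(f"CREATE DATABASE IF NOT EXISTS {db};")
        statements.append(f"CREATE ROLE IF NOT EXISTS {role};")
        statements.append(
            f"GRANT USAGE, CREATE SCHEMA ON DATABASE {db} TO ROLE {role};"
        )
    return statements
```

Calling `department_isolation_sql(["finance", "marketing"])` returns six statements that can be run in a worksheet or from a deployment script. Isolation here relies entirely on roles and grants, since all departments share the same account.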
Example in a diagram:
Advantages:
Disadvantages:
One option is a CIO account plus one account per department, provided each department is capable of operating its own Snowflake account. If it isn't, it may be preferable to keep objects at the CIO account level and provide a complete service to subsidiaries.
To deploy governance elements (tags, masking policies, etc.), we won't manually execute scripts. No, not here.
We will use CI/CD (e.g., GitHub Actions + schemachange) to automatically execute scripts across the different Snowflake accounts.
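Here is a rough sketch of what the CI/CD job does: apply the same governance script to every account and report per-account results. The object names (`GOVERNANCE.TAGS.PII_LEVEL`, `GOVERNANCE.POLICIES.MASK_EMAIL`, the `PII_READER` role) are hypothetical, and `run_script` is a placeholder for whatever actually executes SQL in your pipeline (schemachange, a Snowflake connector, etc.). The file name follows schemachange's versioned-script convention (`V<version>__<description>.sql`).

```python
# Hypothetical governance objects; adapt names and values to your own model.
GOVERNANCE_SQL = """\
CREATE TAG IF NOT EXISTS GOVERNANCE.TAGS.PII_LEVEL
  ALLOWED_VALUES 'none', 'low', 'high';

CREATE MASKING POLICY IF NOT EXISTS GOVERNANCE.POLICIES.MASK_EMAIL
  AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END;
"""


def versioned_filename(version, description):
    """Name a script the way schemachange expects versioned changes."""
    return f"V{version}__{description}.sql"


def deploy_everywhere(accounts, run_script):
    """Apply GOVERNANCE_SQL to each account and collect a status per account.

    run_script(account, sql) is supplied by the caller (assumption: it raises
    on failure). One failing account does not stop the others.
    """
    results = {}
    for account in accounts:
        try:
            run_script(account, GOVERNANCE_SQL)
            results[account] = "ok"
        except Exception as exc:
            results[account] = f"failed: {exc}"
    return results
```

Collecting a status per account rather than stopping at the first error makes it easier to see which subsidiaries have drifted from the shared governance baseline.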
We can also use direct git integration in Snowflake and code a procedure that manages deployments. I'll talk about this in a future article.
Note: You will not be able to perform joins from one account to another directly in your queries.
We will publish the shared object (table, view, etc.) on a Private Listing accessible to one or more accounts.
I see this constraint as an opportunity to manage the publication of data products in a more controlled way. When you know you are a data producer, you must provide your consumers with a quality experience (documentation, quality data, no changes to the interface contract, etc.).
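As an illustration of the mechanics, here is a sketch using a plain secure share, which is the object a listing is built on. It is not the Private Listing setup itself (that part is not shown), and all names are hypothetical.

```python
def provider_share_sql(share, database, schema, table, consumer_accounts):
    """Statements the producer account runs to expose one table via a share.

    Assumption: a direct share is used instead of a Private Listing.
    """
    return [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};",
        f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumer_accounts)};",
    ]


def consumer_mount_sql(local_database, provider_account, share):
    """Statement the consumer account runs to mount the share as a database."""
    return (
        f"CREATE DATABASE {local_database} "
        f"FROM SHARE {provider_account}.{share};"
    )
```

Once the consumer has mounted the share as a local read-only database, it can join the shared table with its own tables, which is how the cross-account join limitation is worked around in practice.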
Advantages:
Disadvantages:
The approach of a Zero Data Account is now possible with Snowflake thanks to the introduction of Replication Groups. But the principle is simple and widespread in DevOps approaches (e.g., AWS Control Tower).
Instead of using CI/CD, we will deploy governance elements directly from Snowflake through replication groups that are replicated to other accounts in the same organization.
We can centralize all previously mentioned governance information at this Zero Data Account level, which, as the name suggests, does not intend to host data.
The Zero Data Account must be on the Business Critical edition to use governance object replication.
I haven’t found in the documentation whether the target accounts also need to be Business Critical, but it seems they do not.
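To give an idea of what the replication setup looks like, here is a sketch that generates the statements for both sides. It assumes the governance objects (tags, masking policies) live in a dedicated database that the group replicates. The group, database, and account names are hypothetical.

```python
def primary_group_sql(group, database, target_accounts, schedule="10 MINUTE"):
    """Statement run in the Zero Data Account to create the primary group.

    Assumption: governance objects are stored in `database`, so replicating
    that database carries them to the target accounts.
    """
    accounts = ", ".join(target_accounts)
    return (
        f"CREATE REPLICATION GROUP IF NOT EXISTS {group}\n"
        f"  OBJECT_TYPES = DATABASES\n"
        f"  ALLOWED_DATABASES = {database}\n"
        f"  ALLOWED_ACCOUNTS = {accounts}\n"
        f"  REPLICATION_SCHEDULE = '{schedule}';"
    )


def secondary_group_sql(group, source_account):
    """Statement run in each target account to create the secondary group.

    source_account is the organization-qualified name of the Zero Data Account.
    """
    return (
        f"CREATE REPLICATION GROUP IF NOT EXISTS {group}\n"
        f"  AS REPLICA OF {source_account}.{group};"
    )
```

For example, `primary_group_sql("GOV_RG", "GOVERNANCE", ["MYORG.FINANCE", "MYORG.MARKETING"])` is run once in the Zero Data Account, and `secondary_group_sql("GOV_RG", "MYORG.ZERO_DATA")` once in each department account. The schedule then keeps the replicas refreshed without any pipeline.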
To verify that accounts have properly used the governance elements, we could ask them to share tables like SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES with us.
But we can also talk to each other during federated governance meetings. I prefer that.
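If you do go the verification route, the check itself is simple. The sketch below assumes the rows of the tag references view have reached the central team in some form (as a list of dictionaries here) and that you know which tables should carry which tags. The function name and the required-tag list are hypothetical.

```python
# Query each account could run (or expose) to report its tagged tables.
TAG_REFERENCES_QUERY = """\
SELECT object_database, object_schema, object_name, tag_name
FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES
WHERE domain = 'TABLE' AND object_deleted IS NULL;
"""


def missing_tags(expected_tables, tag_rows, required_tags):
    """Return {table: [missing tag names]} for tables lacking required tags.

    expected_tables: iterable of (database, schema, table) tuples.
    tag_rows: rows from the query above, as dicts keyed by upper-case column.
    Untagged tables never appear in the view, hence the explicit expected list.
    """
    applied = {}
    for row in tag_rows:
        key = (row["OBJECT_DATABASE"], row["OBJECT_SCHEMA"], row["OBJECT_NAME"])
        applied.setdefault(key, set()).add(row["TAG_NAME"])

    gaps = {}
    for table in expected_tables:
        absent = set(required_tags) - applied.get(table, set())
        if absent:
            gaps[table] = sorted(absent)
    return gaps
```

The output is a good agenda item for the next federated governance meeting: it tells you which tables to discuss, not whom to blame.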
Advantages:
Disadvantages:
Using Snowflake is simple.
Administering Snowflake within a large company while ensuring governance consistency across various accounts requires a bit more thought.
Without that, where would the fun be?!