When you share a photo album with a specific group of friends or grant a colleague “editor” access to a document, you take for granted that only the right people can see or modify it. This simple act of sharing is powered by a complex system of permissions working seamlessly behind the scenes.

Now, imagine this challenge at the scale of Google. Services like Google Drive, Photos, and YouTube must manage permissions for billions of objects shared among billions of users. A unified authorization system is essential, as it establishes a consistent user experience across applications, simplifies interoperability, and allows common infrastructure to be built on top. How can such a system ensure that every single access request, at a rate of millions per second, is answered correctly, flexibly, and almost instantaneously? This is the immense authorization challenge that Google solved with Zanzibar: a single, global system for managing access control.

This article will demystify the core concepts behind Zanzibar. We will explore its elegant data model, its flexible configuration language, and the clever consistency guarantees that make it a cornerstone of trust for Google’s services. To solve this immense challenge, Zanzibar’s design begins with a single, foundational concept: the relation tuple.
1. The Core Building Block: Relation Tuples
At the heart of Zanzibar, every permission and relationship is stored as a fundamental unit of data called a relation tuple. This is a simple, three-part statement that clearly defines a single fact about who can do what with a given object. The genius of this approach lies in its uniform structure, which is the key that unlocks the system’s ability to treat permissions and group memberships as the same underlying concept.

The structure of a relation tuple is straightforward and can be understood as a simple sentence:

object # relation @ user
- Object: The “thing” you want to protect (e.g., a specific document, doc:readme).
- Relation: The “permission” or relationship a user has with the object (e.g., owner, viewer, member).
- User: The “who” that has the permission (e.g., a specific user, user:10).

To make this concrete, here are a few examples of how these tuples translate into plain English:

| Example Tuple | Simple English Explanation |
| --- | --- |
| doc:readme#owner@10 | User 10 is an owner of the document “readme”. |
| group:eng#member@11 | User 11 is a member of the engineering group. |
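The object#relation@user shape is regular enough to parse mechanically. As a minimal sketch, the example tuples can be split into their three parts as follows. The `RelationTuple` class and `parse_tuple` helper are hypothetical names chosen for illustration and are not part of any Zanzibar API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    object: str    # the thing being protected, e.g. "doc:readme"
    relation: str  # the relationship, e.g. "owner"
    user: str      # everything after "@", e.g. "10"


def parse_tuple(text: str) -> RelationTuple:
    """Split 'object#relation@user' into its three parts."""
    left, at, user = text.partition("@")
    obj, hash_, relation = left.partition("#")
    # Every separator and every part must be present.
    if not (at and hash_ and obj and relation and user):
        raise ValueError(f"malformed relation tuple: {text!r}")
    return RelationTuple(obj, relation, user)


print(parse_tuple("doc:readme#owner@10"))
print(parse_tuple("group:eng#member@11"))
```

Because the parser keeps everything after the `@` as the user part, the same helper also accepts tuples whose user part is itself written as an object-relation pair.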
A key feature that makes this model so powerful is that the user part of a tuple doesn’t have to be an individual person. It can also be a userset: another group of users defined by an object-relation pair.

Consider this example:
doc:readme#viewer@group:eng#member
This single tuple doesn’t just grant access to one person. It states that all members of the engineering group (group:eng#member) are now viewers of the “readme” document. The system can resolve who is in the eng group by looking up other relation tuples.

This design choice has a profound implication. As the Zanzibar paper notes, “Defining our data model around tuples, instead of per-object ACLs, allows us to unify the concepts of ACLs and groups.” By representing all permissions as simple, declarative tuples and allowing these tuples to refer to other sets of users, Zanzibar’s data model unifies access control lists and group memberships into a single, elegant concept.
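To see how a userset is resolved by following other tuples, here is a small in-memory sketch. It assumes a toy set of tuples and a hypothetical `check` function. A real deployment would query a distributed store rather than a Python set, so treat this as an illustration of the idea only.

```python
from typing import Optional, Set, Tuple

# (object, relation, user). A user part containing "#" denotes a userset.
TUPLES: Set[Tuple[str, str, str]] = {
    ("doc:readme", "owner", "10"),
    ("group:eng", "member", "11"),
    ("doc:readme", "viewer", "group:eng#member"),
}


def check(obj: str, relation: str, user: str,
          _seen: Optional[Set[Tuple[str, str]]] = None) -> bool:
    """Return True if `user` is in the userset obj#relation."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # guard against cyclic group membership
        return False
    seen.add((obj, relation))
    for t_obj, t_rel, t_user in TUPLES:
        if (t_obj, t_rel) != (obj, relation):
            continue
        if t_user == user:       # direct grant to this user
            return True
        if "#" in t_user:        # grant to a userset: resolve it recursively
            child_obj, child_rel = t_user.split("#", 1)
            if check(child_obj, child_rel, user, seen):
                return True
    return False


print(check("doc:readme", "viewer", "11"))
```

User 11 holds no viewer tuple directly. The check succeeds only because the viewer tuple points at group:eng#member and a second tuple places user 11 in that group.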
2. Defining the Rules: Namespaces and Userset Rewrites
While relation tuples are excellent for storing who has what permission, services like Google Drive need a way to define their own custom rules, such as “anyone who can edit a document can also view it.” This is where namespaces come in.

Each service using Zanzibar, like Google Drive or YouTube, defines its permission logic within a namespace. A namespace configuration is a set of rules that defines the relations for a service’s objects (e.g., owner, editor, viewer) and, most importantly, how those relations interact.

The most powerful feature of a namespace is the userset rewrite. This seemingly simple feature is the key to Zanzibar’s flexibility, as it moves complex permission logic from individual objects into a central, manageable configuration. These rules allow services to build complex policies from simple building blocks. The two primary types of rules are:
- Inheritance Between Relations: This rule allows one permission to automatically include another, handled by a computed_userset rule. For instance, a service can define a rule stating that anyone who is an editor of a document is also automatically a viewer. The viewer relation’s definition simply refers to the editor userset on the same object. This means the service doesn’t need to create a separate viewer tuple for every editor of every document; the relationship is defined once and inherited everywhere.
- Inheritance From Other Objects: This rule allows an object to inherit permissions from another object it is related to, using a mechanism called tuple_to_userset. The classic example is a document in a folder. A service can create a rule that says anyone who is a viewer of a folder is also automatically a viewer of any document inside it. The tuple_to_userset primitive allows the system to look up the parent folder of the document and inherit its viewers, elegantly handling permissions for entire hierarchies.

Userset rewrites provide the flexibility for services to create rich, layered policies (e.g., editors are viewers, documents inherit permissions from folders) without the immense overhead of storing a massive number of individual permission tuples for every object.
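The two rewrite rules can be sketched as a tiny evaluator. Everything here is a simplification under stated assumptions: the `REWRITES` dictionary is a hypothetical stand-in for a namespace configuration, the `"this"` rule (meaning "tuples stored directly for the relation") and the owner-implies-editor rule are added for illustration, and the `parent` relation name is assumed. The configuration is also assumed to be acyclic.

```python
from typing import Set, Tuple

# Stored facts as (object, relation, user).
TUPLES: Set[Tuple[str, str, str]] = {
    ("doc:readme", "owner", "10"),
    ("doc:readme", "editor", "11"),
    ("doc:readme", "parent", "folder:plans"),
    ("folder:plans", "viewer", "12"),
}

# Hypothetical namespace configs. Each relation is a union of rules:
#   ("this",)                            tuples stored for the relation itself
#   ("computed_userset", rel)            users of `rel` on the same object
#   ("tuple_to_userset", via_rel, rel)   users of `rel` on each object that
#                                        a `via_rel` tuple points to
REWRITES = {
    "doc": {
        "owner": [("this",)],
        "editor": [("this",), ("computed_userset", "owner")],
        "viewer": [("this",), ("computed_userset", "editor"),
                   ("tuple_to_userset", "parent", "viewer")],
    },
    "folder": {"viewer": [("this",)]},
}


def check(obj: str, relation: str, user: str) -> bool:
    namespace = obj.split(":", 1)[0]
    rules = REWRITES.get(namespace, {}).get(relation, [("this",)])
    for rule in rules:
        if rule[0] == "this":
            if (obj, relation, user) in TUPLES:
                return True
        elif rule[0] == "computed_userset":
            if check(obj, rule[1], user):
                return True
        elif rule[0] == "tuple_to_userset":
            _, via_rel, target_rel = rule
            for t_obj, t_rel, t_user in TUPLES:
                # Follow e.g. doc:readme#parent@folder:plans to the folder.
                if (t_obj, t_rel) == (obj, via_rel) and check(t_user, target_rel, user):
                    return True
    return False


for uid in ("10", "11", "12", "13"):
    print(uid, check("doc:readme", "viewer", uid))
```

Only four tuples are stored, yet users 10, 11, and 12 all resolve as viewers of the document: the first two through relation inheritance, the third through the parent folder.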
3. The Cornerstone of Trust: Solving the “New Enemy” Problem
Having a flexible system for storing and defining rules is only half the battle. For an authorization system to be trustworthy, it must guarantee that these rules are applied correctly and consistently. This is not just a technical requirement; it’s fundamental to preserving user privacy and respecting their intentions. Zanzibar was explicitly designed to solve a critical consistency challenge known as the “new enemy” problem.

This problem occurs when an access control system fails to respect the causal order between a user being removed from a permission list and new content being added. Here are two classic examples:
- Neglecting ACL update order: Alice removes her ex-colleague, Bob, from a shared folder’s access list. Immediately after, she adds a sensitive new document to that folder. If the system processes the document addition before it processes Bob’s removal, Bob might incorrectly gain access to the new document for a short time.
- Misapplying old ACLs to new content: Alice removes Bob from a document’s access list. She then adds a new, confidential paragraph to that same document. If a permission check for the new content is evaluated using a stale, outdated permission list from before Bob was removed, he might incorrectly see the new paragraph.

Zanzibar’s solution to this is an elegant mechanism called a zookie. From the application’s point of view, a zookie is not a timestamp to be interpreted but an “opaque consistency token”. Think of it as a “snapshot ticket” that represents a specific point in time in the system’s history.

The protocol works in three simple steps:
- When an application is about to save a content change, it first requests a zookie from Zanzibar. Zanzibar returns a token encoding a timestamp that is guaranteed to be later than any existing permission change, establishing a clear causal link.
- The application then saves this zookie alongside the new version of the content in its own storage.
- Later, when a user tries to access that content, the application sends the stored zookie along with its permission check request to Zanzibar.

The zookie guarantees that Zanzibar will check permissions against a snapshot of the rules that is at least as new as the content itself. This elegantly solves the “new enemy” problem by ensuring that access decisions always respect the causal order of events, providing a rock-solid foundation of trust.
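The three steps can be simulated with a toy, single-process store that keeps every ACL version under a logical timestamp. This is only a sketch under loud assumptions: `ToyAclStore`, its method names, the `replica_ts` parameter (modelling a replica that would otherwise serve a stale snapshot), and the readable `zk-<n>` token format are all invented for illustration. Real zookies are opaque to clients.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

Fact = Tuple[str, str, str]  # (object, relation, user)


class ToyAclStore:
    def __init__(self) -> None:
        self._clock = 0  # logical clock standing in for a global timestamp
        self._history: List[Tuple[int, FrozenSet[Fact]]] = [(0, frozenset())]

    def write_acl(self, facts: Iterable[Fact]) -> int:
        """Record a new full ACL version and return its timestamp."""
        self._clock += 1
        self._history.append((self._clock, frozenset(facts)))
        return self._clock

    def issue_zookie(self) -> str:
        """Step 1: return a token later than every prior ACL write."""
        self._clock += 1
        return f"zk-{self._clock}"

    def check(self, fact: Fact, zookie: Optional[str] = None,
              replica_ts: int = 0) -> bool:
        """Step 3: evaluate at a snapshot no older than the zookie."""
        floor = int(zookie.split("-", 1)[1]) if zookie else 0
        snapshot_ts = max(replica_ts, floor)
        snapshot: FrozenSet[Fact] = frozenset()
        for ts, facts in self._history:  # history is in ascending ts order
            if ts <= snapshot_ts:
                snapshot = facts
        return fact in snapshot


store = ToyAclStore()
alice = ("doc:plan", "viewer", "alice")
bob = ("doc:plan", "viewer", "bob")
store.write_acl([alice, bob])   # Bob still has access
store.write_acl([alice])        # Alice removes Bob
zookie = store.issue_zookie()   # step 1; step 2 would store it with the content

print(store.check(bob, replica_ts=1))                 # stale snapshot, no zookie
print(store.check(bob, zookie=zookie, replica_ts=1))  # zookie forces freshness
```

Without the zookie, the lagging replica evaluates the check against the version that still lists Bob. With it, the snapshot is forced forward past Bob's removal, which is the guarantee described above.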
4. Conclusion: Flexibility, Correctness, and Scale
Zanzibar’s design demonstrates how a few well-designed concepts can be combined to solve an incredibly complex problem. By breaking down authorization into its fundamental components, the system achieves correctness, flexibility, and performance at a scale that is difficult to comprehend.

Let’s briefly recap the three pillars we’ve explored:
- Relation Tuples: A simple and uniform object#relation@user data model provides the foundation for storing every permission fact and unifying ACLs with groups.
- Userset Rewrites: Flexible, server-side rules allow for powerful permission inheritance between relations and across objects, simplifying management for client services.
- Zookies: An opaque token mechanism provides a powerful consistency guarantee, ensuring that authorization decisions respect the causal order of events and solving the “new enemy” problem.

The impact of this design is proven by its performance in production. Zanzibar stores over two trillion relation tuples and handles more than 10 million client queries per second (QPS). It accomplishes this while maintaining a 95th-percentile latency of less than 10 milliseconds and achieving greater than 99.999% availability.

Zanzibar’s principled design demonstrates how a few powerful, composable concepts can solve a monumental challenge, forming the invisible and unwavering foundation of trust for the Google products billions of us rely on every day.