Understanding Zanzibar: How Google Manages Permissions for Billions of Users

When you share a photo album with a specific group of friends or grant a colleague “editor” access to a document, you take for granted that only the right people can see or modify it. This simple act of sharing is powered by a complex system of permissions working behind the scenes.

Now, imagine this challenge at the scale of Google. Services like Google Drive, Photos, and YouTube must manage permissions for billions of objects shared among billions of users. A unified authorization system is essential: it establishes a consistent user experience across applications, simplifies interoperability, and allows common infrastructure to be built on top. How can such a system ensure that every single access request, millions of them per second, is answered correctly, flexibly, and almost instantaneously? This is the immense authorization challenge that Google solved with Zanzibar: a single, global system for managing access control.

This article will demystify the core concepts behind Zanzibar. We will explore its elegant data model, its flexible configuration language, and the clever consistency guarantees that make it a cornerstone of trust for Google’s services. Zanzibar’s design begins with a single, foundational concept: the relation tuple.

1. The Core Building Block: Relation Tuples

At the heart of Zanzibar, every permission and relationship is stored as a fundamental unit of data called a relation tuple. This is a simple, three-part statement that defines a single fact about who can do what with a given object. The genius of this approach lies in its uniform structure, which lets the system treat permissions and group memberships as the same underlying concept.

The structure of a relation tuple is straightforward and can be read as a simple sentence:

object#relation@user

  • Object: The “thing” you want to protect (e.g., a specific document, doc:readme).
  • Relation: The “permission” or relationship a user has with the object (e.g., owner, viewer, member).
  • User: The “who” that has the permission (e.g., a specific user ID, 10).

To make this concrete, here are a few examples of how these tuples translate into plain English:

| Example Tuple | Simple English Explanation |
| --- | --- |
| doc:readme#owner@10 | User 10 is an owner of the document “readme”. |
| group:eng#member@11 | User 11 is a member of the engineering group. |
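
Because every fact has the same object#relation@user shape, it is easy to handle in code. The following is a minimal sketch, not Zanzibar's actual storage format: the RelationTuple class and the parse_tuple helper are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    """Hypothetical in-memory form of one 'object#relation@user' fact."""
    obj: str       # "namespace:object_id", e.g. "doc:readme"
    relation: str  # e.g. "owner"
    user: str      # the "who" part, e.g. "10"


def parse_tuple(text: str) -> RelationTuple:
    """Split 'object#relation@user' into its three parts."""
    # Split at the first '@': everything after it is the user part.
    left, sep, user = text.partition("@")
    if not sep or not user:
        raise ValueError(f"missing '@user' in {text!r}")
    # Split the left side at the first '#': object, then relation.
    obj, sep, relation = left.partition("#")
    if not sep or not obj or not relation:
        raise ValueError(f"missing 'object#relation' in {text!r}")
    return RelationTuple(obj, relation, user)


owner_fact = parse_tuple("doc:readme#owner@10")
member_fact = parse_tuple("group:eng#member@11")
```

Splitting on the first "@" before the first "#" keeps the user part intact even when it contains further "#" characters.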

A key feature that makes this model so powerful is that the user part of a tuple doesn’t have to be an individual person. It can also be a userset: another group of users defined by an object-relation pair.

Consider this example:

doc:readme#viewer@group:eng#member

This single tuple doesn’t just grant access to one person. It states that all members of the engineering group (group:eng#member) are now viewers of the “readme” document. The system can resolve who is in the eng group by looking up other relation tuples.

This design choice has a profound implication. As the Zanzibar paper notes, “Defining our data model around tuples, instead of per-object ACLs, allows us to unify the concepts of ACLs and groups.” By representing all permissions as simple, declarative tuples and allowing these tuples to refer to other sets of users, Zanzibar’s data model unifies access control lists and group memberships into a single, elegant concept.
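
To see how a check can follow a userset back to individual users, here is a small sketch over an in-memory set of tuples. It is an illustration only: the TUPLES set and the check function are hypothetical, and the real system evaluates checks against distributed storage rather than a Python set.

```python
# Each fact is (object, relation, user); the user is either a plain ID
# or a userset written as "object#relation".
TUPLES = {
    ("doc:readme", "owner", "10"),
    ("group:eng", "member", "11"),
    ("doc:readme", "viewer", "group:eng#member"),
}


def check(obj, relation, user_id, seen=None):
    """Return True if user_id belongs to the userset obj#relation."""
    seen = set() if seen is None else seen
    if (obj, relation) in seen:  # guard against cyclic group definitions
        return False
    seen.add((obj, relation))
    for t_obj, t_rel, t_user in TUPLES:
        if (t_obj, t_rel) != (obj, relation):
            continue
        if t_user == user_id:  # direct grant to this user
            return True
        if "#" in t_user:  # indirect grant through another userset
            nested_obj, nested_rel = t_user.split("#", 1)
            if check(nested_obj, nested_rel, user_id, seen):
                return True
    return False


eng_member_can_view = check("doc:readme", "viewer", "11")
owner_can_view = check("doc:readme", "viewer", "10")
```

User 11 is a viewer only because of the group tuple. User 10 is an owner but not a viewer in this sketch, since nothing here says that owners are also viewers.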

2. Defining the Rules: Namespaces and Userset Rewrites

While relation tuples are excellent for storing who has what permission, services like Google Drive need a way to define their own custom rules, such as “anyone who can edit a document can also view it.” This is where namespaces come in.

Each service using Zanzibar, like Google Drive or YouTube, defines its permission logic within a namespace. A namespace configuration is a set of rules that defines the relations for a service’s objects (e.g., owner, editor, viewer) and, most importantly, how those relations interact.

The most powerful feature of a namespace is the userset rewrite. This seemingly simple feature is the key to Zanzibar’s flexibility, as it moves complex permission logic from individual objects into a central, manageable configuration. These rules allow services to build complex policies from simple building blocks. The two primary types of rules are:

  • Inheritance Between Relations: This rule allows one permission to automatically include another, handled by a computed_userset rule. For instance, a service can define a rule stating that anyone who is an editor of a document is also automatically a viewer. The viewer relation’s definition simply refers to the editor userset on the same object. This means the service doesn’t need to create a separate viewer tuple for every editor of every document; the relationship is defined once and inherited everywhere.
  • Inheritance From Other Objects: This rule allows an object to inherit permissions from another object it is related to, using a mechanism called tuple_to_userset. The classic example is a document in a folder. A service can create a rule that says anyone who is a viewer of a folder is also automatically a viewer of any document inside it. The tuple_to_userset primitive allows the system to look up the parent folder of the document and inherit its viewers, elegantly handling permissions for entire hierarchies.

Userset rewrites provide the flexibility for services to create rich, layered policies (e.g., editors are viewers, documents inherit permissions from folders) without the immense overhead of storing a massive number of individual permission tuples for every object.
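
The two rule types can be sketched as a small evaluator. Everything below is an assumption made for illustration: the CONFIG dictionary is a hypothetical stand-in for a namespace configuration (not Zanzibar's real configuration syntax), the "parent" relation and folder:specs object are invented, and parent tuples store the folder object directly for simplicity. Only the rule names computed_userset and tuple_to_userset come from the text above.

```python
TUPLES = {
    ("doc:readme", "editor", "10"),
    ("doc:readme", "parent", "folder:specs"),  # hypothetical parent link
    ("folder:specs", "viewer", "12"),
}

# Hypothetical config: namespace -> relation -> rules, combined as a union.
CONFIG = {
    "doc": {
        "editor": [("this",)],
        "parent": [("this",)],
        "viewer": [
            ("this",),                                 # directly stored viewers
            ("computed_userset", "editor"),            # editors are viewers
            ("tuple_to_userset", "parent", "viewer"),  # folder viewers are viewers
        ],
    },
    "folder": {"viewer": [("this",)]},
}


def check(obj, relation, user_id):
    """Return True if any rule for obj#relation admits user_id."""
    namespace = obj.split(":", 1)[0]
    for rule in CONFIG[namespace][relation]:
        kind = rule[0]
        if kind == "this":
            if (obj, relation, user_id) in TUPLES:
                return True
        elif kind == "computed_userset":
            # Same object, different relation.
            if check(obj, rule[1], user_id):
                return True
        elif kind == "tuple_to_userset":
            # Follow tuples of one relation to other objects, then check there.
            _, link_relation, target_relation = rule
            for t_obj, t_rel, t_user in TUPLES:
                if (t_obj, t_rel) == (obj, link_relation):
                    if check(t_user, target_relation, user_id):
                        return True
    return False


editor_can_view = check("doc:readme", "viewer", "10")
folder_viewer_can_view = check("doc:readme", "viewer", "12")
```

Note that no viewer tuple exists for doc:readme at all: both positive answers come from the rules, which is exactly the storage saving described above.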

3. The Cornerstone of Trust: Solving the “New Enemy” Problem

Having a flexible system for storing and defining rules is only half the battle. For an authorization system to be trustworthy, it must guarantee that these rules are applied correctly and consistently. This is not just a technical requirement; it’s fundamental to preserving user privacy and respecting their intentions. Zanzibar was explicitly designed to solve a critical consistency challenge known as the “new enemy” problem.

This problem occurs when an access control system fails to respect the causal order between a user being removed from a permission list and new content being added. Here are two classic examples:

  1. Neglecting ACL update order: Alice removes her ex-colleague, Bob, from a shared folder’s access list. Immediately after, she adds a sensitive new document to that folder. If the system processes the document addition before it processes Bob’s removal, Bob might incorrectly gain access to the new document for a short time.
  2. Misapplying old ACLs to new content: Alice removes Bob from a document’s access list. She then adds a new, confidential paragraph to that same document. If a permission check for the new content is evaluated using a stale, outdated permission list from before Bob was removed, he might incorrectly see the new paragraph.

Zanzibar’s solution to this is an elegant mechanism called a zookie. To the application, a zookie is an “opaque consistency token”; internally, it encodes a timestamp. Think of it as a “snapshot ticket” that represents a specific point in time in the system’s history.

The protocol works in three simple steps:

  1. When an application is about to save a content change, it first requests a zookie from Zanzibar. Zanzibar returns a token encoding a timestamp that is guaranteed to be later than any permission change that causally precedes the content change, establishing a clear causal link.
  2. The application then saves this zookie alongside the new version of the content in its own storage.
  3. Later, when a user tries to access that content, the application sends the stored zookie along with its permission check request to Zanzibar.

The zookie guarantees that Zanzibar will check permissions against a snapshot of the rules that is at least as new as the content itself. This elegantly solves the “new enemy” problem by ensuring that access decisions always respect the causal order of events, providing a rock-solid foundation of trust.
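
The three steps can be walked through with a toy model. This is a sketch under heavy assumptions: ToyAclStore and its methods are hypothetical, the zookie is a plain integer version rather than an opaque token, and a "stale replica" is simulated by passing an old version number into the check.

```python
class ToyAclStore:
    """In-memory stand-in for the ACL service; versions are a counter."""

    def __init__(self):
        self.version = 0
        self.history = {0: frozenset()}  # version -> snapshot of ACL tuples

    def write(self, tuples):
        """Replace the full ACL set, creating a new version."""
        self.version += 1
        self.history[self.version] = frozenset(tuples)

    def content_change_check(self):
        """Return a zookie at least as new as every ACL write so far."""
        return self.version

    def check(self, acl_tuple, zookie, replica_version):
        """Evaluate at a snapshot that is never older than the zookie."""
        snapshot = max(zookie, replica_version)
        return acl_tuple in self.history[snapshot]


store = ToyAclStore()
bob_views_doc = ("doc:plan", "viewer", "bob")

store.write({bob_views_doc})   # version 1: Bob can view
stale_replica = store.version  # a replica that has only seen version 1
store.write(set())             # version 2: Alice removes Bob

# Step 1: before saving new content, the application asks for a zookie.
zookie = store.content_change_check()
# Step 2: the application stores the zookie next to the content.
saved_content = {"body": "confidential paragraph", "zookie": zookie}
# Step 3: the stored zookie accompanies the permission check.
with_zookie = store.check(bob_views_doc, saved_content["zookie"], stale_replica)
without_zookie = store.check(bob_views_doc, 0, stale_replica)
```

With the zookie, the check is forced onto version 2 and Bob is denied. Without it, the stale replica would answer from version 1 and let him in, which is the “new enemy” failure in miniature.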

4. Conclusion: Flexibility, Correctness, and Scale

Zanzibar’s design demonstrates how a few well-designed concepts can be combined to solve an incredibly complex problem. By breaking down authorization into its fundamental components, the system achieves correctness, flexibility, and performance at a scale that is difficult to comprehend.

Let’s briefly recap the three pillars we’ve explored:

  • Relation Tuples: A simple and uniform object#relation@user data model provides the foundation for storing every permission fact and unifying ACLs with groups.
  • Userset Rewrites: Flexible, server-side rules allow for powerful permission inheritance between relations and across objects, simplifying management for client services.
  • Zookies: An opaque token mechanism provides a powerful consistency guarantee, ensuring that authorization decisions respect the causal order of events and solving the “new enemy” problem.

The impact of this design is proven by its performance in production. According to the Zanzibar paper, the system stores more than two trillion relation tuples and handles more than 10 million client queries per second (QPS). It accomplishes this while maintaining a 95th-percentile latency of less than 10 milliseconds and achieving greater than 99.999% availability.

Zanzibar’s principled design demonstrates how a few powerful, composable concepts can solve a monumental challenge, forming the invisible and unwavering foundation of trust for the Google products billions of us rely on every day.
