Google stores virtually all of its code in a single version-controlled repository containing over 2 billion lines of code across 9 million source files. The repository is not a chaotic dump. It is maintainable because Google built and enforces a specific set of tooling, processes, and engineering culture around it that does not exist in most organizations.
Analysis Briefing
- Topic: Monorepo at Google scale: tooling, code review, and dependency management
- Analyst: Mike D (@MrComputerScience)
- Context: What started as a quick question to Claude Sonnet 4.6 became this
- Source: Pithy Cyborg | AI News Made Simple
- Key Question: What does Google actually have to build to make a single repository for all code not collapse under its own weight?
Piper and CitC: The Version Control Infrastructure
Git does not scale to a 2-billion-line repository. Cloning it would take days. Status checks would take minutes. Google built its own version control system called Piper, which runs as a distributed service on Google’s infrastructure.
Developers do not clone the entire repository. They use Clients in the Cloud (CitC), a virtual file system that gives each developer a workspace that appears to contain the full repository but only materializes files on access. A developer editing a file in //google3/search/ranking/features/ sees that file and its neighbors without having all 9 million other files present locally. CitC stores workspace changes as a sparse overlay on top of the repository.
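CitC itself is proprietary and only described at a high level publicly, but the sparse-overlay idea is easy to sketch. Everything below (the `Workspace` class, the repo callable) is illustrative naming, not Google's API:

```python
class Workspace:
    """Illustrative sketch of a CitC-style sparse overlay (not Google's API).

    Reads fall through to a repository snapshot fetched on demand; writes
    land in a small local overlay, so the workspace never needs to hold
    the full multi-million-file tree.
    """

    def __init__(self, repo_snapshot):
        self.repo = repo_snapshot      # callable: path -> file contents
        self.overlay = {}              # only the files the developer touched

    def read(self, path):
        if path in self.overlay:       # local edit wins
            return self.overlay[path]
        return self.repo(path)         # materialized on first access

    def write(self, path, contents):
        self.overlay[path] = contents  # the repository itself is untouched


# Hypothetical repository backend: serves file contents on demand.
def fake_repo(path):
    return b"base contents of " + path.encode()

ws = Workspace(fake_repo)
ws.write("//google3/search/ranking/features/f.py", b"edited")
assert ws.read("//google3/search/ranking/features/f.py") == b"edited"
assert ws.read("//google3/other/file.py").startswith(b"base contents")
```

The key property is that the overlay's size tracks the developer's edits, not the repository's size.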
This approach removes the fundamental impediment to monorepo adoption at large organizations: you cannot check out 2 billion lines of code. With CitC, you never need to. The engineering investment required to build this infrastructure is substantial. Organizations adopting monorepos without equivalent tooling (Bazel but not CitC, for example) will hit scaling walls that Google’s approach avoids.
Bazel: Hermetic, Reproducible, Incremental Builds
Google open-sourced Bazel in 2015. Internally they call it Blaze. It is a build system designed for the specific constraints of a monorepo at Google scale.
Bazel builds are hermetic: every build action runs in a sandbox with only its declared inputs. A build that succeeds at commit X will produce identical output at commit X on any machine, one year later. This eliminates the “works on my machine” problem entirely.
Bazel builds are incremental: only targets affected by a change are rebuilt. In a 9-million-file repository, rebuilding everything on every change is not feasible. Bazel’s dependency graph tracks which targets depend on which other targets. A change to a utility library rebuilds only the targets that transitively depend on it.
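The core of that incrementality is a reverse dependency walk: invert the declared `deps` edges, then flood outward from the changed target. A minimal sketch with hypothetical target names (this is the idea, not Bazel's internal implementation):

```python
from collections import defaultdict, deque

def affected_targets(deps, changed):
    """Return the set of targets that must be rebuilt after `changed` changes.

    `deps` maps target -> list of targets it depends on (as declared in
    BUILD `deps` attributes); we invert the edges and walk the reverse
    transitive closure.
    """
    rdeps = defaultdict(set)
    for target, ds in deps.items():
        for d in ds:
            rdeps[d].add(target)
    dirty, queue = {changed}, deque([changed])
    while queue:
        for t in rdeps[queue.popleft()]:
            if t not in dirty:
                dirty.add(t)
                queue.append(t)
    return dirty

deps = {
    "//util:strings": [],
    "//search:ranking": ["//util:strings"],
    "//ads:serving": ["//util:strings"],
    "//mail:frontend": [],
}
# Changing the utility library dirties only its transitive dependents;
# //mail:frontend is untouched and its cached outputs stay valid.
assert affected_targets(deps, "//util:strings") == {
    "//util:strings", "//search:ranking", "//ads:serving"}
```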
Remote caching means that if any machine in Google’s build infrastructure has already built a target with the same inputs, the output is served from cache rather than recomputed. A developer making a change to a low-level library and running tests for dependent code does not wait for all those tests to compile from scratch.
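Hermeticity is what makes this cache sound: if the key covers every input and the command itself, a hit is always safe to reuse on any machine. A minimal sketch of that keying scheme (all names here are illustrative, not Bazel's internal API):

```python
import hashlib

def action_key(command, input_blobs):
    """Content-addressed cache key: hash of the command line plus the
    digest of every declared input. Hermeticity makes this key sound;
    an undeclared input would let identical keys map to different outputs.
    Illustrative sketch, not Bazel's actual action-key format."""
    h = hashlib.sha256(command.encode())
    for blob in sorted(input_blobs):
        h.update(hashlib.sha256(blob).digest())
    return h.hexdigest()

cache = {}   # stand-in for the shared remote cache
calls = 0    # counts how often we actually compile

def build(command, input_blobs, run_action):
    key = action_key(command, input_blobs)
    if key not in cache:           # did any machine already build this?
        cache[key] = run_action()  # no: compute once and publish
    return cache[key]

def compile_once():
    global calls
    calls += 1
    return b"object-code"

out1 = build("gcc -c lib.c", [b"int x;"], compile_once)
out2 = build("gcc -c lib.c", [b"int x;"], compile_once)
assert out1 == out2 == b"object-code"
assert calls == 1  # second build was a pure cache hit
```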
# Example Bazel BUILD file
py_library(
    name = "feature_store",
    srcs = ["feature_store.py"],
    deps = [
        "//google3/learning/core:data_utils",
        "@pypi//redis",
    ],
)

py_test(
    name = "feature_store_test",
    srcs = ["feature_store_test.py"],
    deps = [":feature_store"],
)
Explicit dependency declarations in BUILD files mean the dependency graph is always fully known. No undeclared dependency can be picked up implicitly and silently change out from under a target. Diamond dependency problems (where A depends on B and C, which in turn depend on different versions of D) are structurally impossible because the monorepo enforces a single version of every library.
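The one-version point can be made concrete with a small check. The `name@version` scheme below is an illustrative convention (not Bazel label syntax) used to show why the diamond cannot form when only one version of each library exists:

```python
from collections import defaultdict

def version_conflicts(deps):
    """Group versioned targets by base name and report any library that
    appears under more than one version. `name@version` is a hypothetical
    naming convention for this sketch, not Bazel syntax."""
    by_name = defaultdict(set)
    for target, ds in deps.items():
        for t in [target, *ds]:
            name, _, version = t.partition("@")
            by_name[name].add(version)
    return {n: vs for n, vs in by_name.items() if len(vs) > 1}

# A depends on B and C, which pin different versions of D -- the classic
# diamond that a multi-repo setup permits:
multi_repo = {"A": ["B", "C"], "B": ["D@1.2"], "C": ["D@2.0"]}
assert version_conflicts(multi_repo) == {"D": {"1.2", "2.0"}}

# In the monorepo there is exactly one D, so the conflict cannot arise:
monorepo = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
assert version_conflicts(monorepo) == {}
```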
Code Ownership, Readability Review, and the Culture of Maintenance
Technical tooling is necessary but not sufficient. Google’s monorepo works because of cultural and process systems that accompany it.
OWNERS files designate the people who must approve changes to specific directories. A change to //google3/search/core/ requires approval from the owners of that directory. This creates accountability without requiring every change to go through a central bottleneck. Owners are responsible for the quality and health of code in their directory.
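Conceptually, finding who can approve a change is a walk up the directory tree, collecting approvers from each OWNERS file on the way. A simplified sketch (real OWNERS files also support per-file rules and inheritance controls; the names and data layout here are hypothetical):

```python
from pathlib import PurePosixPath

def owners_for(path, owners_files):
    """Collect approvers from every OWNERS file between the changed file
    and the repository root. `owners_files` maps a directory to its
    OWNERS entries; this is a simplified sketch of the semantics."""
    approvers = set()
    d = PurePosixPath(path).parent
    while True:
        approvers.update(owners_files.get(str(d), []))
        if d == d.parent:   # reached the root
            break
        d = d.parent
    return approvers

# Hypothetical ownership layout (single-slash paths for simplicity):
owners_files = {
    "/google3/search": {"alice"},
    "/google3/search/core": {"bob"},
}
assert owners_for("/google3/search/core/index.cc", owners_files) == {"alice", "bob"}
```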
Readability review is a cultural practice largely unique to Google. Every change in a given language must be either authored or approved by an engineer who holds a “readability” certification in that language, earned through a mentored review process with experienced reviewers. This enforces idiomatic, consistent code style across the entire repository for every language Google uses internally.
Large-scale changes (LSCs) are automated refactors that touch thousands or millions of files. When an API is deprecated or a build target is renamed, Google’s tooling generates the changes, and a single approval covers the entire refactor rather than requiring thousands of individual reviews. This is how the entire repository can be migrated from one API version to another without years of dual-maintenance.
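One way to picture how an LSC stays manageable is sharding: the generated diff is split into independent per-directory changes that can be tested and submitted separately. This is a hypothetical sketch of that splitting step only; the real internal tooling also handles test scheduling, approval routing, and retries:

```python
from collections import defaultdict

def shard_lsc(changed_files, shard_size=50):
    """Split a machine-generated large-scale change into per-directory
    shards of at most `shard_size` files each. Illustrative sketch of the
    sharding step of an LSC pipeline, not Google's actual tooling."""
    by_dir = defaultdict(list)
    for f in changed_files:
        by_dir[f.rsplit("/", 1)[0]].append(f)
    shards = []
    for files in by_dir.values():
        for i in range(0, len(files), shard_size):
            shards.append(files[i:i + shard_size])
    return shards

# Three files in each of two directories, two files per shard:
changed = [f"/g3/{pkg}/file{i}.py" for pkg in ("a", "b") for i in range(3)]
shards = shard_lsc(changed, shard_size=2)
assert len(shards) == 4
assert all(len(s) <= 2 for s in shards)
```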
What This Means For You
- Do not adopt a monorepo without investing in build tooling first, because a monorepo without fast, incremental, hermetic builds becomes slower and more painful to work with every month as it grows.
- Use Bazel or Nx for large TypeScript/JavaScript monorepos rather than relying on npm workspaces alone, because workspace tools handle package management but do not provide the incremental build and remote caching that make large monorepos practical.
- Implement OWNERS files or their equivalent immediately when adopting a monorepo, because without explicit ownership every directory becomes a shared ownership directory where no one feels responsible for quality.
- Plan for large-scale automated refactors as a first-class engineering practice rather than a one-time migration event, because a monorepo’s primary benefit (atomic cross-codebase changes) is only realized when you build the tooling to make those changes safely.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg | AI News Made Simple → AI news without the hype.
