Making Quartz site rebuilds 25x faster

I. Introduction

This site was first built with the static site generator Quartz. I would write posts in markdown and Quartz would transform them into html pages. Like many static site generators, Quartz can be set up to watch for changes to markdown files and rebuild the html on the fly (commonly known as “watch mode”), so I could see what my post looked like in the browser as I was writing markdown in an editor.

While writing this post, the process that watched for changes and rebuilt the site began to crash quite frequently, nearly every time I saved a change. Starting the build process again took more than 5 seconds to show the new page on my browser, which made it difficult to focus on writing. Even if it didn’t crash, the rebuild process took longer and longer to complete as I edited the same file. For example, this sentence took 7 seconds to appear on the page after I saved the file.

I wanted to fix this and decided to look at the crashes first. Some investigation showed that Quartz always crashed because it was unable to delete some file in the output directory. Sadly, I couldn’t figure out why from looking at process monitor logs, so this was a dead end.

I did notice that the process nearly always crashed when trying to delete or write some unrelated file, like the site’s icon. That was strange, because the icon file shouldn’t need to change when I’m editing a blog post. After a look at the code, it turned out that Quartz’s build process would delete the entire output directory and write all the files again (including the icon), even if they wouldn’t have changed. Luckily both the crashing problem and the slow rebuild times could be fixed by implementing partial rebuilds — that is, rewriting only the files that are affected by an edit.

II. How Quartz builds sites

Before getting into partial rebuilds, we need a little background on Quartz’s architecture.

Quartz architecture

First, markdown files are parsed into a markdown Abstract Syntax Tree (AST), stored as a ProcessedFile data structure in memory.

Second, transformer plugins are applied. At this point, text -> text and markdown -> markdown transformations take place. For example, one transformation the ObsidianFlavoredMarkdown plugin implements is removing comments.

After these transformations, the build process converts the markdown AST into an html AST. It then applies the remaining transformer plugins for html -> html transformations, like adding syntax highlighting to code blocks.

Third, filter plugins exclude files from the build. We’ll ignore filter plugins from now on.

At this point, the build method has a list of ProcessedFiles corresponding to html content.

Fourth and last, emitter plugins write the html content to files on disk, copy assets like images & css, and create related files like an rss feed.
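The four stages above can be sketched as a small pipeline. This is a simplified illustration, not Quartz’s actual code: it works on plain strings rather than ASTs, and every name in it (ProcessedFile’s fields, removeComments, skipDrafts, contentPage, build) is a hypothetical stand-in. The comment-stripping transformer assumes Obsidian’s %%comment%% syntax.

```typescript
// Hypothetical, simplified types. Quartz's real ProcessedFile holds an AST.
type ProcessedFile = { slug: string; text: string; draft: boolean }
type Transformer = (file: ProcessedFile) => ProcessedFile
type Filter = (file: ProcessedFile) => boolean // true keeps the file
type Emitter = (files: ProcessedFile[]) => Map<string, string> // output path -> content

// Transformer: strip Obsidian-style %%comments%% from the text.
const removeComments: Transformer = (f) => ({
  ...f,
  text: f.text.replace(/%%[\s\S]*?%%/g, ""),
})

// Filter: exclude drafts from the build.
const skipDrafts: Filter = (f) => !f.draft

// Emitter: one html page per file (returned in memory instead of written to disk).
const contentPage: Emitter = (files) => {
  const out = new Map<string, string>()
  for (const f of files) out.set(`${f.slug}.html`, `<p>${f.text}</p>`)
  return out
}

// Transformers run first, then filters, then every emitter sees every remaining file.
function build(
  sources: ProcessedFile[],
  transformers: Transformer[],
  filters: Filter[],
  emitters: Emitter[],
): Map<string, string> {
  const processed = sources
    .map((file) => transformers.reduce((acc, t) => t(acc), file))
    .filter((file) => filters.every((keep) => keep(file)))
  const output = new Map<string, string>()
  for (const emit of emitters) {
    emit(processed).forEach((content, path) => output.set(path, content))
  }
  return output
}
```

The important property for the rest of this post is the last loop: each emitter receives the full list of files, whether or not it needs them.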

In watch mode, when a file is changed, only that file is reprocessed by the transformer and filter plugins into a ProcessedFile. The build method then deletes everything in the output directory, and resupplies all the ProcessedFiles to all the emitters. This means Quartz was writing files that didn’t need to be written because they were unchanged, like the site’s icon.

There’s a clear opportunity to speed up the build process: only call the emitters that need to run when a file changes.

III. Implementing partial rebuilds

One way of thinking about Quartz’s build process is a 2-step “compilation” pipeline. The first step in the pipeline compiles markdown into an html AST (Abstract Syntax Tree) in memory. In the second step, many compilers (i.e. the emitters) take the ASTs, or other source files, and produce output files.

Quartz "compilation" pipeline

In full rebuilds, step 1 is fast because only the changed file is parsed into html. But for step 2, Quartz runs all the compilers/emitters, even if they don’t use the changed file. Profiling the build functions (by generously sprinkling performance.now() calls in the code) during a full rebuild shows that the emitters indeed take 99% (!) of the build time.
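For illustration, this kind of ad hoc profiling might look like the sketch below. The timed and share helpers are hypothetical names of my own, not part of Quartz, and they only handle synchronous steps; timing an async emitter would need an awaited variant.

```typescript
// Hypothetical helper, not part of Quartz: accumulates wall-clock time per label.
const timings = new Map<string, number>()

// Runs `fn`, adds its duration in milliseconds to `label`, and returns its result.
function timed<T>(label: string, fn: () => T): T {
  const start = performance.now()
  try {
    return fn()
  } finally {
    timings.set(label, (timings.get(label) ?? 0) + (performance.now() - start))
  }
}

// Fraction (0..1) of all recorded time that was spent under `label`.
function share(label: string): number {
  let total = 0
  timings.forEach((ms) => (total += ms))
  return total === 0 ? 0 : (timings.get(label) ?? 0) / total
}
```

Wrapping the parse step and each emitter call in timed(...) and then comparing share("emit") against share("parse") is enough to see where a rebuild spends its time.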

We can speed up the build process by only writing files that will change. To figure out which destination or output files will change in response to a change in a source file, we need to create a dependency graph.

Let’s take a simple example with three emitters. The ContentPage emitter takes a ProcessedFile and writes it to disk. The ContentIndex emitter takes all ProcessedFiles to build an rss feed. The Assets emitter copies images from the source folder to the destination folder. We can create separate dependency graphs for each emitter that look like this:

Dependency graphs

The nodes in the graph represent files, and the edges are dependencies between files. The leaf nodes on the right are the output files written to disk.

When a.md changes, we can look at the leaf nodes that are downstream of a.md in each emitter to figure out which output files will change. In this case, a.html from ContentPage and rss.xml from ContentIndex will change.

Dependency graphs with leaves highlighted

Notice that the ContentIndex emitter requires all the ProcessedFiles to make rss.xml. This makes sense: the RSS feed should contain all posts. Therefore, to find which files the emitter needs to make the output file, we compute the leaf node ancestors of the changed file, i.e. after getting the leaf nodes downstream of a.md, we walk back up the graph to see which files are needed to make each of those leaf nodes.
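The two lookups described so far (downstream leaf nodes, then their ancestors) can be sketched with a small directed graph. The class and method names below are illustrative assumptions rather than Quartz’s actual API, and the content/ and public/ path prefixes are made up to keep source and output files apart.

```typescript
// A minimal directed graph keyed by file path (illustrative, not Quartz's API).
class DepGraph {
  private outgoing = new Map<string, Set<string>>() // node -> nodes built from it
  private incoming = new Map<string, Set<string>>() // node -> nodes it is built from

  addEdge(from: string, to: string): void {
    this.touch(from)
    this.touch(to)
    this.outgoing.get(from)!.add(to)
    this.incoming.get(to)!.add(from)
  }

  hasNode(node: string): boolean {
    return this.outgoing.has(node)
  }

  // Output files affected by `node`: reachable nodes with no outgoing edges.
  leafNodes(node: string): Set<string> {
    const leaves = new Set<string>()
    if (!this.hasNode(node)) return leaves
    this.walk(node, this.outgoing).forEach((n) => {
      if (this.outgoing.get(n)!.size === 0) leaves.add(n)
    })
    return leaves
  }

  // Every file needed to produce the leaves downstream of `node`.
  leafNodeAncestors(node: string): Set<string> {
    const ancestors = new Set<string>()
    this.leafNodes(node).forEach((leaf) => {
      this.walk(leaf, this.incoming).forEach((n) => {
        if (n !== leaf) ancestors.add(n)
      })
    })
    return ancestors
  }

  private touch(node: string): void {
    if (!this.outgoing.has(node)) this.outgoing.set(node, new Set())
    if (!this.incoming.has(node)) this.incoming.set(node, new Set())
  }

  // Depth-first traversal along `edges`, starting at (and including) `start`.
  private walk(start: string, edges: Map<string, Set<string>>): Set<string> {
    const seen = new Set<string>()
    const stack = [start]
    while (stack.length > 0) {
      const n = stack.pop()!
      if (seen.has(n)) continue
      seen.add(n)
      edges.get(n)?.forEach((next) => stack.push(next))
    }
    return seen
  }
}

// The ContentIndex graph from the example: both posts feed into the rss feed.
const rssGraph = new DepGraph()
rssGraph.addEdge("content/a.md", "public/rss.xml")
rssGraph.addEdge("content/b.md", "public/rss.xml")
```

Here rssGraph.leafNodes("content/a.md") contains only public/rss.xml, while rssGraph.leafNodeAncestors("content/a.md") contains both content/a.md and content/b.md, which is exactly the set of files the emitter needs to regenerate the feed.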

Dependency graphs with ancestors highlighted

With this, we can implement an algorithm to decide which emitters to run when a file changes:

  1. For each emitter, check to see if the file name is in its dependency graph. If it isn’t, then the emitter doesn’t depend on the file and we can skip calling the emitter.
  2. If the file is in the dependency graph, get the leaf node ancestor file names and get the corresponding ProcessedFiles.
  3. Run the emitter with the relevant ProcessedFiles.

Note the difference with full rebuilds — we don’t supply b.md to the ContentPage emitter, because b.html would not have changed; and we also avoid running the Assets emitter entirely.
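The three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not Quartz’s implementation: each emitter carries a flattened dependency graph (source file mapped directly to the output files it feeds into), emit returns the paths it would write instead of touching the disk, and all type and function names are hypothetical.

```typescript
// Hypothetical, simplified types. Quartz's real ones differ.
type ProcessedFile = { path: string; html: string }
type Emitter = {
  name: string
  // Flattened dependency graph: source file -> output files it feeds into.
  deps: Map<string, string[]>
  // Returns the paths it would write (a stand-in for writing to disk).
  emit: (files: ProcessedFile[]) => string[]
}

// Sources needed to produce every output that `changed` feeds into.
function leafAncestors(deps: Map<string, string[]>, changed: string): string[] {
  const outputs = new Set(deps.get(changed) ?? [])
  const sources: string[] = []
  deps.forEach((outs, source) => {
    if (outs.some((o) => outputs.has(o))) sources.push(source)
  })
  return sources
}

function partialRebuild(
  changed: string,
  emitters: Emitter[],
  cache: Map<string, ProcessedFile>,
): string[] {
  const written: string[] = []
  for (const emitter of emitters) {
    // Step 1: skip emitters whose graph doesn't mention the changed file.
    if (!emitter.deps.has(changed)) continue
    // Step 2: look up the ProcessedFiles behind the affected outputs.
    const files = leafAncestors(emitter.deps, changed)
      .map((path) => cache.get(path))
      .filter((f): f is ProcessedFile => f !== undefined)
    // Step 3: run the emitter with only those files.
    written.push(...emitter.emit(files))
  }
  return written
}

// The three emitters from the example above.
const contentPage: Emitter = {
  name: "ContentPage",
  deps: new Map<string, string[]>([["a.md", ["a.html"]], ["b.md", ["b.html"]]]),
  emit: (files) => files.map((f) => f.path.replace(/\.md$/, ".html")),
}
const contentIndex: Emitter = {
  name: "ContentIndex",
  deps: new Map<string, string[]>([["a.md", ["rss.xml"]], ["b.md", ["rss.xml"]]]),
  emit: () => ["rss.xml"],
}
const assets: Emitter = {
  name: "Assets",
  deps: new Map<string, string[]>([["static/icon.png", ["icon.png"]]]),
  emit: () => ["icon.png"],
}

const cache = new Map<string, ProcessedFile>([
  ["a.md", { path: "a.md", html: "<p>a</p>" }],
  ["b.md", { path: "b.md", html: "<p>b</p>" }],
])

// Editing a.md touches only a.html and rss.xml.
const rewritten = partialRebuild("a.md", [contentPage, contentIndex, assets], cache)
// rewritten is ["a.html", "rss.xml"]
```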

Transclusions

Quartz was designed to be compatible with Obsidian markdown and so supports a feature called transclusions. A markdown file can include/transclude parts of another markdown file with the ![[file]] syntax. When a file changes, other files that transclude it should also be rebuilt. We can support this by having the dependency graph also track transclusions. (In practice, Quartz does this by parsing the file’s AST and looking for links to other markdown content.)
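To make the idea concrete, here is a rough sketch that finds transclusions and turns them into graph edges. It is deliberately not how Quartz does it: it scans raw markdown with a regular expression instead of walking the AST, it assumes a target without a file extension refers to a markdown file of the same name, and the function names are hypothetical.

```typescript
// Assumption: a regex scan over raw markdown, as a stand-in for walking the AST.
// Matches ![[target]], ![[target#heading]] and ![[target|alias]].
const TRANSCLUSION = /!\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]/g

// Returns the distinct files that `markdown` transcludes or embeds.
function transclusionTargets(markdown: string): string[] {
  const targets = new Set<string>()
  TRANSCLUSION.lastIndex = 0 // the regex is global, so reset its cursor before scanning
  let match: RegExpExecArray | null
  while ((match = TRANSCLUSION.exec(markdown)) !== null) {
    const name = match[1].trim()
    // Assumption: no extension means a markdown note; anything else is kept as-is.
    targets.add(/\.[^./]+$/.test(name) ? name : `${name}.md`)
  }
  return Array.from(targets)
}

// Edges point from the transcluded file to the file that includes it,
// because a change to the former must rebuild the latter.
function transclusionEdges(file: string, markdown: string): [string, string][] {
  return transclusionTargets(markdown).map((target): [string, string] => [target, file])
}
```

For a file a.md whose content contains ![[b]], transclusionEdges("a.md", content) yields a single edge from b.md to a.md, which is the extra dependency the graph needs to record.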

Here’s an example where a.md includes all of b.md, i.e. a depends on b.

Dependency graph with transclusion

Since a includes content from b, when b.md changes, we must also rebuild a.html. The strategy of getting the leaf node ancestors of the changed file works here as well: we will return a.md & b.md, meaning we rebuild both files to get fresh versions of a.html and b.html.

As a bonus, if we expand this search to other relative links like images, we can also ensure the page is refreshed if embedded non-markdown content is changed. Quartz didn’t support this before in full rebuild mode because it only looked for changes to markdown files.

Lastly, we also need to update the dependency graph when a file adds or removes transclusions. I’ll skip over the details since it’s not that interesting — handling this requires merging two dependency graphs in a specific way.

Benchmarking

To measure the improvement, I set up a sample repository with 100 empty markdown files. I wrote a quick benchmark script that ran Quartz in watch mode and edited one of the files 50 times, waiting for each rebuild to complete before editing the file again. Excluding the initial build, the average time for a full rebuild was 2677ms and the average time for a partial rebuild was 104ms. This is roughly a 25x speedup, or a 96% reduction in rebuild time.
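For reference, the two headline figures follow directly from the measured averages. The helper names here are my own:

```typescript
// How many times faster `afterMs` is than `beforeMs`.
function speedup(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs
}

// Fraction of the original time that was eliminated.
function reduction(beforeMs: number, afterMs: number): number {
  return 1 - afterMs / beforeMs
}

const fullRebuildMs = 2677
const partialRebuildMs = 104

const factor = speedup(fullRebuildMs, partialRebuildMs) // about 25.7
const saved = reduction(fullRebuildMs, partialRebuildMs) // about 0.961
```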

I ran the benchmark on an ageing laptop running Windows 10. I didn’t test for transclusions, embedded content, or long markdown files — though I’d expect the speedup to be even greater in those cases since full rebuilds would be writing even more content to disk.

IV. Endnote

If you use Quartz, you can try partial rebuilds out for local development with npx quartz build --serve --fastRebuild.
