blog

SwiftUI's diffing algorithm

Rens Breur, August 18, 2022

If one thing sets SwiftUI apart from other declarative UI frameworks like React, it is its use of a static type system. The typing of views makes the diffing of views fast and unambiguous. How exactly does that work? And what more is involved in the diffing that SwiftUI has to perform on views?

Recently I have started SwiftTUI, which is a SwiftUI clone for text-based terminal applications. For implementing this, I needed to find out how the diffing in SwiftUI works, and here I present my findings.

Diffing and declarative UI

Declarative UI frameworks and diffing go hand in hand. With declarative UI programming, instead of directly modifying the UI elements displayed on screen, you modify some state. The UI framework will have to compute how this affects the views.

It cannot just replace the views in their entirety after state changed. Many views have child views that also have state. We need to know if a child view was removed or inserted, to see if we need to let go of state for this view, or create new state.

We also need to know which views are added or removed to update the container they are in. Stacks will want to layout their children again if their content changed, and a list might only act on an insertion or removal if the view in question is in the currently visible area. If it knows what changed, it can also animate those changes.

Diffing views

The problem with diffing views is that it can not only be inefficient, the changes can even be ambiguous.

Take a list of two labels:

→ List
  → Label("One")
  → Label("Two")

And say that it changed to the following list:

→ List
  → Label("Two")
  → Label("Three")

What has changed here? Maybe the label displaying "One" was removed from the beginning of the list, and the label displaying "Three" was added at the end. Or maybe the first label had its text changed from "One" into "Two", and the second label from "Two" into "Three". Both could be true.

A UI framework can decide to just favor one type of changes over the other. Or it can require UI elements to have identifiers so that it can keep them apart. React gives you the choice.

If you have a list of UI elements with identifiers, you need a shortest-edit-script algorithm like the Myers algorithm, that can find the smallest set of inserts and removals to get from one sequence to another.

For most changes, SwiftUI does not need such an algorithm. It also does not need identifiers for views to prevent ambiguity. This relies solely on the static typing of views.

One or many views

This is the official description of a View in SwiftUI:

A type that represents part of your app’s user interface and provides modifiers that you use to configure views.

There is something a bit misleading with this description. The description makes sense if we think of views like Text, Color or of shapes. But views like ForEach combine multiple other views. Those views are still SwiftUI views, but they represent multiple parts of an app's user interface.

A particularly confusing view in this sense is EmptyView. EmptyView is a view that does not represent any user interface element. If you put only an EmptyView in a List, the list will not have any cells.

Other interesting views are those created with view builders. View builders can have if-statements, so views built with them can represent a dynamic number of user interface elements, like ForEach.

Composed and primitive views

In code, View is defined by a protocol, for which the only requirement is that is has a body which is also a view.

protocol View {
  associatedtype Body: View
  var body: Body { get }
}

This cannot go on forever, there must be some views which are not defined in terms of other views. I will call those views primitive views. Normal views that are built out of other views are composed views. Primitive views have a body type Never and would crash if you were to call their body. They also have logic that is hidden from SwiftUI's public interface.

As clients of the SwiftUI framework, we can only create composed views. Composed views can have state, and can be seen as a function of this state to another view. Composed views use primitive views in their body. Primitive views are the building blocks of any type of view.

Diffing commonly starts with a composed view, when its body needs to be re-evaluated in response to some state change. SwiftUI then has an old and a new body, made of some combination of primitive and composed views. It needs to diff them, and respond to the changes. The type of the old and new body is always the same.

Diffing commonly also ends with composed views, which is a SwiftUI performance optimization. If the body of an invalidated view contains another composed view of which the properties haven't changed, SwiftUI does not need to evaluate the body of this view further, and is done.

SwiftUI thus mainly compares the remaining, primitive views to find out what changed, making them the most important in the diffing algorithm. Composed views trigger diffing and provide new bodies that need to be diffed. They can also prevent further diffing. But the primitive views need to be compared, to find out what the changes were.

Categories of primitive views

The fact that a view represents a number of user interface elements, does not necessarily mean that all those elements are there when the view is on screen. Lazy views, such as List, only fetch new user interface elements from underlying views if they are or will be inside the scrolling area. We can call the user interface elements that a view can generate displayables. A SwiftUI view can have zero or more dispayables.

When it comes to diffing, we can put the primitive views that SwiftUI provides into four categories:

You can find the types of the structural views that view builders create in the documentation. (As a recap: Multiple statements are combined into a TupleView. if-statements without else create an optional views, which are views themselves. if-statements with else become _ConditionalViews.)

Container views take the displayables of the view they wrap and put them on screen. I will call a displayable that is rendered a graphic. HStack and VStack always make graphics for all the displayables in a view and lay them out. Other containers, such as List are lazy and do not immediately turn all displayables into graphics. Container views present themselves as views with a single displayable again to views higher up in the hierarchy.

Modifiers applied to views create ModifiedContent views. Modifiers apply an effect to all the displayables of another view individually. A .border modifier used on a TupleView will put a border graphic on top of all the displayables of the TupleView. This means modifiers can have displayables that are put on screen multiple times, so a single displayable is turned into multiple graphics.

The runtime view graph

Every view hierarchy contains primitive views, and every primitive view will be the combination of types of views of the categories listed above. Take this view, the body of which only has primitive views.

struct MyView: View {
  @State var showSecret = false

  var body: some View {
    VStack {
      Group {
        Button("Show secret") { showSecret.toggle() }
        if showSecret {
          Text("Secret")
        } else {
          Color.red
        }
      }
      .padding()
    }
  }
}

SwiftUI will need to keep track of the view hierarchy of an app at runtime, in a structure that I will call the view graph. The view graph stores references to the View that they were built for, and it can store state for the views.

For an app that uses MyView, the part of the view graph associated with MyView will look like this:

I have colored the nodes in this view graph by the category of their view.

View graphs for primitive views look very similar in structure to initialized View structures themselves. But they are not exactly the same. Composed views have a child view graph node for their body. Views underneath lazy container views might not have been evaluated completely.

Very interesting in this sense are the overlay and background modifiers. A view that is used as an overlay or background, has seperate state for every graphic it is used upon. The overlay and background modifier view graph nodes will thus have as many extra child nodes as they have graphics.

View graph dynamics

View graphs are not static, and a major part in the diffing that SwiftUI does will be to find out how to update the view graph. In the example of MyView, if showSecret is changed, the Text underneath the _ConditionalView node in the view graph is replaced by a Color node.

The structure of views graphs can only change in a limited number of ways:

  1. A _ConditionalView changes from showing its TrueContent to its FalseContent or the other way around. This is commonly the result of a boolean condition in a composed view changing such that the body of the view uses a different branch of an if statement.
  2. An Optional<View> changes from showing nothing to showing something, or the other way around. This is commonly the result of a boolean condition in a composed view changing such that the body of the view uses the content inside an if-statement or not.
  3. The data of a ForEach view changes, so that it needs to update its children. The type of the children cannot change.
  4. The type of the content of an AnyView changes. The old content is removed from screen, and the new content is added.

In all four cases, entire view subgraphs are added or removed from underneath the view graph nodes. The view graph node relationships cannot be changed in any other way. You can never add or remove a view modifier. You can also never move a view to a different parent.

The layout tree

Not all the views in a view graph play a role in layout, and the ones that do can play different roles. We can visualize the displayables of MyView in the following layout tree:

Because the view graph is the representation of all the views at runtime, SwiftUI will need to be able to build a layout tree from the view graph. During diffing, it will need to compute how the displayables in the layout tree change.

Notice how the view graph and layout tree are related, but different. Structural views do not have any displayable of their own, and are absent in the layout tree. Modifiers are parents of a structural view in the view graph, but parents of every displayable of the unary views in the layout tree.

There are multiple ways in which the graphics can take shape. SwiftUI can create a UIView or a CALayer to display them. But although some views, such as HStack and VStack participate in layout, they don't get their own UIView or CALayer.

The diffing algorithm

SwiftUI needs to perform diffing to find out how to change the view graph and how to update the layout tree. Primitive views play the most important role, and there are only a small number of structural views underneath which a view graph can change. Let us take a closer look at how SwiftUI can use static types to do this.

We know that a SwiftUI view can generate a list of displayables, so we can create our own View protocol where we make this explicit. For the diffing, we will also give it an update function, so that the view can help us decide which view graph and layout tree changes to make.

protocol View {
  associatedtype Body: View
  var body: Body { get }

  var displayables: [Displayable] { get }
  func update(using view: Self, offset: Int)
}

The computed variable displayables in the View protocol gives us all the displayables in the view as an array. Similarly, SwiftUI will have a way to get all displayable items out of a View, but this is not public. Container views, that are present in the layout tree, will get the displayables out of the view they wrap. The containers themselves will have a displayables of length 1.

The update function in our view protocol is used for diffing. It compares the current view to a new view of the same type. It should be able to tell us which view changed so that we can update the view graph, and how the displayables changed so that we can update the container views.

We use the offset parameter for the index that the first displayable has in the displayables list of the current container. We need this, because the view on which update is called, might not be the first view inside a container.

For composed views, we will provide a default implementation of displayables and update that forwards the calls to the body. SwiftUI avoids calling the body when it isn't necessary, but this is only an optimization and doesn't affect the diffing.

extension View {
  var displayables: [Displayable] { body.displayables }
  func update(using view: Self, offset: Int) {
    body.update(using: view.body, offset: offset)
  }
}

To make primitive views work, with a body of type Never, we also need to make Never itself conform to View.

extension Never: View {
  var body: some View {
    fatalError()
  }
}

We know that a Displayable is part of the layout tree, and that it can generate a graphic. But the internals of Displayable aren't relevant either for the SwiftUI diffing algorithm right now.

protocol Displayable {}

Using our View protocol, we will now re-define a number of primitive views from SwiftUI. We would then like to diff two views of the same type, and see if we can find out which views are added or removed from their view graph, and which displayables were added to or removed from their container in the layout tree.

We would also like to see how the types of the views are helping us when comparing them, and why using them is better than comparing the displayables of type [Displayable] directly.

Alongside making some primitive views, we will also make a view builder so that we can use the same nice syntax SwiftUI provides.

@resultBuilder
struct ViewBuilder {
    // This is the only required function
    static func buildBlock<Content: View>(_ content: Content) -> Content {
      content
    }
}

Text

Let us first create just one single unary view, a Text. This view will produce a single displayable, I will say that this is a view with a size of 1.

struct Text: View {
  let text: String
  init(_ text: String) {
    self.text = text
  }
  var body: Never { fatalError() }
}
extension Text {
  var displayables: [Displayable] {
    [TextDisplayable(text: text)]
  }
}
struct TextDisplayable: Displayable {
  let text: String
}

Finding out what changed in the update function is also simple. The string value used inside the text either changed, or it didn't.

extension Text {
  func update(using view: Text, offset: Int) {
    if self.text != view.text {
      print("Changed displayable at \(offset)")
    }
  }
}

TupleView

The easiest View with a size bigger then 1 is probably the TupleView. Like SwiftUI, we will also create one if you put two views into a ViewBuilder. For now, our TupleView will only support tuples of length 2.

struct TupleView<C0: View, C1: View>: View {
  let content: (C0, C1)
  var body: Never { fatalError() }
}
extension ViewBuilder {
  static func buildBlock<C0: View, C1: View>(_ c0: C0, _ c1: C1) -> TupleView<C0, C1> {
    TupleView(content: (c0, c1))
  }
}

A tuple view with a tuple of length 2 takes takes two other views, and combines their displayables.

extension TupleView {
  var displayables: [Displayable] {
    content.0.displayables + content.1.displayables
  }
}

Tuple views are fixed, there is nothing inside this view that can change. Only its children can change.

extension TupleView {
  func update(using view: TupleView<C0, C1>, offset: Int) {
    content.0.update(using: view.content.0, offset: offset)
    content.1.update(using: view.content.1, offset: offset + view.content.0.displayables.count)
  }
}

Do you know what the size of a tuple view of length 2 is? Trick question, because it is not necessarily 2. Rather, it is the size of the views in the tuple combined. TupleView<Text, Text> has size 2, but TupleView<TupleView<Text, Text>, Text> has size 3.

_ConditionalView

For a more interesting view, let us create _ConditionalView, and the view builder functions that allow us to make them by writing if-statements.

struct _ConditionalView<TrueContent: View, FalseContent: View>: View {
  enum ConditionalContent {
    case first(TrueContent)
    case second(FalseContent)
  }

  let content: ConditionalContent

  var body: Never { fatalError() }
}
extension ViewBuilder {
  static func buildEither<TrueContent: View, FalseContent: View>(first: TrueContent) -> _ConditionalView<TrueContent, FalseContent> {
    _ConditionalView(content: .first(first))
  }

  static func buildEither<TrueContent: View, FalseContent: View>(second: FalseContent) -> _ConditionalView<TrueContent, FalseContent> {
    _ConditionalView(content: .second(second))
  }
}
extension _ConditionalView {
  var displayables: [Displayable] {
    switch content {
    case .first(let content):
      return content.displayables
    case .second(let content):
      return content.displayables
    }
  }
}

The conditional view is more interesting, because it is the first view we look at, where content can be added to or removed from screen. If the content didn't change from TrueContent to FalseContent or the other way around, only the child of _ConditionalView can have changed, and we need to update it similarly to how we did it for TupleView. But if the content did change, we remove the old items from screen, and add the new items.

extension _ConditionalView {
  func update(using view: _ConditionalView<TrueContent, FalseContent>, offset: Int) {
    switch (content, view.content) {
    case (.first(let oldValue), .first(let newValue)):
      // Content didn't change, but the child may have changed
      oldValue.update(using: newValue, offset: offset)

    case (.second(let oldValue), .second(let newValue)):
      // Content didn't change, but the child may have changed
      oldValue.update(using: newValue, offset: offset)

    case (.first(let oldValue), .second(let newValue)):
      // Content was replaced!
      for i in 0 ..< oldValue.displayables.count {
        print("Removed displayable at \(offset + i)")
      }
      for i in 0 ..< newValue.displayables.count {
        print("Inserted displayable at \(offset + i)")
      }

    case (.second(let oldValue), .first(let newValue)):
      // Content was replaced!
      for i in 0 ..< oldValue.displayables.count {
        print("Removed displayable at \(offset + i)")
      }
      for i in 0 ..< newValue.displayables.count {
        print("Inserted displayable at \(offset + i)")
      }
    }
  }
}

We can write the structural views Group, Optional where Wrapped: View, and EmptyView in similar ways. We will look into ForEach later, but let's take what we already have out for a spin!

struct TestList: View {
  var value: Bool

  @ViewBuilder
  var body: some View {
    Text("One")
    if (value) {
      Text("Two")
    } else {
      Text("Three")
    }
  }
}

let view1 = TestList(value: true)
print(view1.body.displayables)
[TextDisplayable(text: "One"), TextDisplayable(text: "Two")]
let view2 = TestList(value: false)
view1.body.update(using: view2.body, offset: 0)
Removed displayable at 1
Inserted displayable at 1

The code we wrote found out that a view was replaced, and could tell how the displayables should change in the container we are currently inside.

Let us change the experiment slightly.

struct OtherTestList: View {
  var value: Bool

  @ViewBuilder
  var body: some View {
    Text("One")
    Text(value ? "Two" : "Three")
  }
}

let view1 = OtherTestList(value: true)
print(view1.body.displayables)
[TextDisplayable(text: "One"), TextDisplayable(text: "Two")]
let view2 = OtherTestList(value: false)
view1.body.update(using: view2.body, offset: 0)
Changed displayable at 1

Remember the example in the introduction where it was ambiguous what the changes in a list of labels was? This is the same example! But the algorithm we created knows exactly whether a label's text has changed, or whether it was replaced with another label. It was able to do that, by using static typing, and by making the view types themselves help determine what has changed.

We now know how to get a list of displayables from a view. We also know what changed if we are given a view of the same type. A container view, like a List, would read the displayables of the view it wraps, and register itself for change notifications of those displayables. You can see how it could use a UITableView with that.

Applying the changes

In the View protocol in the example above, views compare other instances of the same type to themselves. Like that, we could find out which views are added, removed or changed. We could also find what changed in the current container view, and thus what the changes to the layout tree are.

But the update function did not have any references to the view graph nodes, nor to the current container, so apart from printing what needs to change, we can't actually apply the changes. To make the View protocol internal functions closer to ones that could actually work, we would have to add those references.

We can add a reference to the current view graph node as a parameter to the update function. That way, when we see that a view is inserted or removed, we can also create or remove a view graph node.

The reference to the current container can be added to the view graph node. When we see that a view was inserted or removed, we can then inform the current container of the change with the offsets of the associated displayables.

View graph nodes of unary views would also have to hold on to the displayable. That way, it can change it later. If the text inside a Text changes, we need to update the displayable as well. If a modifier changes, we need to get all the displayables under its view graph node and update them.

ForEach

We didn't discuss ForEach yet, because unfortunately, ForEach is different. We were able to use the types of views to see how the bodies of views change, but there is no way for us to tell how the data inside a ForEach changed. That means, we cannot prevent the ambiguity in the same way we did it for other structural views.

But it also makes sense: this is why ForEach requires the data it uses to be identifiable. And it means we need a shortest-edit-script algorithm. It seems that SwiftUI is not using it, but Swift even has a such a diffing algorithm built in.

Laziness and the size of views

Laziness might not be directly associated with SwiftUI's diffing algorithm, but we have explored the view graph and containers already, and it would be a shame not to discuss a very interesting property of lazy views. Also, it will give us a hint on how to make the offset parameter more efficient.

One of the lazy container views in SwiftUI is List. That is a nice view, because it internally uses a UITableView, which we know how to work with. You can even check that, by putting breakpoints on the insertRows and removeRows methods on UITableView.

Although a list is lazy, because it uses a table view, it does still need to know how many rows it will have to display, and what the offset is of elements that are added or removed. That doesn't seem to work well together with laziness. Can we know the size of a composed view without evaluating it?

Turns out, in many cases we can, just by knowing the type of the body of a composed view. For example, the size of a TupleView<(Text, Text, Button)> is always 3, no matter the particular instance. So a composed view where the body is a TupleView<(Text, Text, Button)> also always has size 3.

But you can't always know the size of a view just by its type. A _ConditionalView<EmptyView, Text> can have sizes 0 or 1. If a composed view has a body of this type, and it is used in a lazy stack or a list, we still have to evaluate it to determine its size.

We can add a static computed size variable to our View protocol for this. A composed view will forward the call to the Body type. Primitive types can either return a size if that size is static, or nil if the size is dynamic. If it is nil, we cannot determine the size just by looking at the type.

You can confirm that this is how it works in SwiftUI. Let's say that you have a view like this:

struct ContentView: some View {
  var body: some View {
    List {
      ForEach(0 ..< 1000, id: \.self) { i in
        ListItems(i)
      }
    }
  }
}

If ListItems has a statically sized body type, such as Text or TupleView<(Text, Text, Button)>, the body of your view is only called for views that are about to be displayed on screen. But if you put a composed view with a dynamically sized body in it, for example because you wrote an if-statement in the body, all the bodies are called immediately.

Going deeper

We have explored how the static view types in SwiftUI can be used to perform efficient and unambiguous diffing. If you want to know even more about how this part of SwiftUI could be implemented, check the SwiftTUI source code.

Using what we explored, we can explain some of the behavior of SwiftUI. We see why it is not possible to move a view to another place, by moving the view to a different parent. We also see why we cannot add or remove modifiers to views.

SwiftUI seems magical because it is closed-source and because its internals are not described. But by investigating concrete examples of views, we can find out many of its principles. And by using features of the Swift language, we can re-create many parts of it. I hope that like me, it lets you appreciate the framework even more, understand it better, and make even better use of it.