
I've been hacking away at some work-work in Go today, and I've come to a startling, albeit not particularly surprising, realization about the safety and mutability of slices.

Consider the following trivial example:

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[3:7]
// a: []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
// b: []int{3, 4, 5, 6}

We start with a slice a, then we construct b as a subslice over a. No allocations occur (ignoring the slice header), and none are needed.
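To make the later examples easier to follow, here's how the lengths and capacities work out (b shares a's backing array, so its capacity runs all the way to the end of a):

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[3:7]
// len(a): 10, cap(a): 10
// len(b): 4, cap(b): 7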

But what if we start mutating b?

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[3:7]
b[2] = 42
// a: []int{0, 1, 2, 3, 4, 42, 6, 7, 8, 9}
// b: []int{3, 4, 42, 6}

Okay, fine, I was asking for it. This is well-known and reasonable, since mutating an element of an existing slice definitely should not reallocate anything. Mutating an element of b thus has the side effect of mutating a as well.
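You can confirm the sharing directly (using the a and b from the snippet above, and assuming the usual fmt import):

fmt.Println(&b[2] == &a[5]) // true: b[2] and a[5] are the same element in memory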

But what if we try to append instead?

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[3:7]
c := append(b, 42)
// a: []int{0, 1, 2, 3, 4, 5, 6, 42, 8, 9}
// b: []int{3, 4, 5, 6}
// c: []int{3, 4, 5, 6, 42}

sigh... Ouch, this one hurts a bit more. So if we append to a slice which is itself a subslice of another slice, the newly-appended value overwrites an element of the original slice.
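You can see exactly where the 42 ended up (again using the a, b, and c from above):

fmt.Println(&c[0] == &a[3]) // true: c reuses a's backing array
fmt.Println(&c[4] == &a[7]) // true: the appended 42 is literally a[7]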

I don't like this, but I don't really see another reasonable option, either. Since append() can result in an allocation, it would make some sense that, for consistency/safety, rather than mutating an array which is shared by multiple slices, append() just allocated a new one. However, I don't see a way to do this without the runtime or the compiler being able to check whether the original slice will ever be used again (which some languages do, but Go does not). Checking this at runtime would probably be impossible in Go, and definitely cursed.

What if we try to append a slice of items instead of just one item?

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[3:7]
c := append(b, []int{11, 12, 13}...)
// a: []int{0, 1, 2, 3, 4, 5, 6, 11, 12, 13}
// b: []int{3, 4, 5, 6}
// c: []int{3, 4, 5, 6, 11, 12, 13}

More of the same, it seems. At least it's consistent....

One last try: what happens if we append a slightly longer slice?

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[3:7]
c := append(b, []int{0, 1, 1, 2}...)
// a: []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
// b: []int{3, 4, 5, 6}
// c: []int{3, 4, 5, 6, 0, 1, 1, 2}

muffled screaming

This is upsetting to me. Here, we've appended a slice which would extend beyond the capacity of the original slice, and as a result, rather than overwriting elements of the original slice, append() allocates a new backing array and leaves the original untouched.
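Working through the numbers (same a and b as before) makes the difference clear:

// b has length 4 and capacity 7, but the result needs room for 8 elements,
// so append() copies everything into a new, larger backing array and leaves a alone.
fmt.Println(len(c), cap(b))  // 8 7
fmt.Println(&c[0] == &a[3])  // false: c no longer shares a's backing array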

This means that to know whether an append() will cause some other slice to be mutated, you need to know the capacity of the slice to which you're appending (as is always the case), and whether the array which backs that slice is referenced by any other slices.

The cap() function does exist to tell you the capacity of a slice, but in the Go code I've seen, it's rarely if ever used. And as far as I know, the Go language doesn't expose any way to identify whether an operation (append() or directly mutating some slice) will mutate memory referenced from some other variable, and if so, whether that variable will be used again in the future.

The bigger picture

In short, the possibility of mutating one slice by appending to another feels like a pretty big footgun to me. If the only safe way to use append() is to explicitly copy() or check cap(), then it seems like the compiler should enforce this in some way.
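For what it's worth, the workarounds I'm aware of are an explicit copy(), or the full slice expression, which limits a subslice's capacity so that append() has no choice but to reallocate. A rough sketch:

b := a[3:7]

// Option 1: copy into a freshly allocated slice, so appending can never touch a.
safe := make([]int, len(b))
copy(safe, b)
safe = append(safe, 42) // a is untouched

// Option 2: a full slice expression (a[low:high:max]) caps the capacity at the length,
// forcing any append to allocate a new backing array.
capped := a[3:7:7] // len 4, cap 4
d := append(capped, 42)
// d: []int{3, 4, 5, 6, 42}, backed by a new array; a is untouched here too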

None of this is really a revelation; it's more an acknowledgement of reality, or an invitation to check myself when I start to feel a bit too comfortable with Go's “ease-of-use”.

After writing Go for the past two years, it's become increasingly clear to me that Go is excellent for simple code that's easy to write and read. And big projects which insist on that simplicity, potentially at the expense of optimal performance, can be exceptionally pleasant to maintain and to onboard new contributors to.

But performance optimizations can cause Go code to grow in complexity more quickly than in other languages, in my experience. Part of this is because Go feels like a small language, with a simple syntax and basically all data structures built from structs, maps, and slices. If you need something better-optimized for your use case, you're probably going to need to build it yourself. Which can be fun, but can also end up messy.

But another piece is that safe performance optimizations often depend on enforcing, and then relying on, constraints and assumptions about data. For example, assuming a slice is pre-sorted, or ensuring data can/cannot be mutated or referenced elsewhere. Compared to other modern languages, Go provides fewer facilities for encoding these constraints and assumptions beyond comments, which the programmer is then responsible for reading and enforcing.

Nightmare fuel

What really scares me, though, is that the compiler might make optimizations which happen to break assumptions the programmer thought they had put in place.

For example, Go doesn't expose a direct way to grow the capacity of a slice (akin to realloc in C), and this functionality was only recently added to the Go standard library. The canonical way to grow a slice s has been the following (as is used in slices.Grow):

s := []int{1, 2, 3}
growBy := 20
s = append(s[:cap(s)], make([]int, growBy)...)[:len(s)]

This looks like it would make two allocations: one for the inner make(), then one for the append(), since we're deliberately appending beyond the capacity of the slice. However, as noted in the source code, only one allocation occurs. The Go compiler is clever.
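And as of Go 1.21, you can just call the standard-library helper instead of writing the trick by hand:

s := []int{1, 2, 3}
s = slices.Grow(s, 20) // from the standard "slices" package
// len(s): 3
// cap(s): at least 23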

What I'm afraid of is that the compiler might get clever in a situation like this:

var a []int
b := []int{1, 2, 3}
c := append(a, b...)

If the compiler decided that, since a didn't point to any backing array yet, it could just point c at the existing b, but the programmer was relying on changes to c never mutating b, this would be a huge problem.

Thankfully, the Go compiler doesn't do this (and probably never will). The zero value of a slice has a capacity of 0, so any append() to it must allocate, and will never reuse the memory of the slice whose elements are being appended.
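A quick sanity check, which behaves the way you'd hope with today's toolchain:

var a []int
b := []int{1, 2, 3}
c := append(a, b...)
c[0] = 99
// b: []int{1, 2, 3} (unchanged; c got its own backing array)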

Sometimes Go makes me second guess things like that, though.

Go can be a nice blend of the “simplicity” of Python and the... “simplicity” of C. More often than not, I think it's an excellent tool for the job. But from time to time, it can make me feel like https://duckduckgo.com/?q=i'm+in+danger+meme

This is the first entry in a new blog I'm hosting with writefreely. I'm currently running it in an lxd container on my homelab, though I'm interested in packaging it as a snap. Doing so would make it easy to install on Linux without the need to manually unpack the release tarball, with automatic updates, and with security sandboxing without the complexity/overhead of running in a container or VM.

I plan to write up my experience in creating the snap here on blog.calder.dev, and I'll switch over to using the snap instead of a manual install.

After that, I have some fun things in the works...

Plans (in no particular order):

  • Write up a complete guide for setting up Nextcloud on a home server, from blank hardware to full configuration, reverse proxy, DNS/network settings, and SSL certs
  • Migrate my home server to use ZFS for bulk data storage, and figure out whether ZFS storage pools for snap data or running everything through lxd is more convenient (and write it up, of course)
  • Set up a high-availability MicroCloud cluster on the trio of Dell Wyse 5070 thin clients I just bought for cheap, and migrate this blog and some other services over to it (and write up the process here)
  • Finish the MVP for a Rust project I've been working on for my homelab (and maybe yours, too) and write about it here

So stay tuned, I hope to have more for you soon!