Skip to content

sync: RWMutex scales poorly with CPU count #17973

Open
@bcmills

Description

@bcmills

On a machine with many cores, the performance of sync.RWMutex.R{Lock,Unlock} degrades dramatically as GOMAXPROCS increases.

This test program:

package benchmarks_test

import (
	"fmt"
	"sync"
	"testing"
)

func BenchmarkRWMutex(b *testing.B) {
	for ng := 1; ng <= 256; ng <<= 2 {
		b.Run(fmt.Sprint(ng), func(b *testing.B) {
			var mu sync.RWMutex
			mu.Lock()

			var wg sync.WaitGroup
			wg.Add(ng)

			n := b.N
			quota := n / ng

			for g := ng; g > 0; g-- {
				if g == 1 {
					quota = n
				}

				go func(quota int) {
					for i := 0; i < quota; i++ {
						mu.RLock()
						mu.RUnlock()
					}
					wg.Done()
				}(quota)

				n -= quota
			}

			if n != 0 {
				b.Fatalf("Incorrect quota assignments: %v remaining", n)
			}

			b.StartTimer()
			mu.Unlock()
			wg.Wait()
			b.StopTimer()
		})
	}
}

degrades by a factor of 8x as it saturates threads and cores, presumably due to cache contention on &rw.readerCount:

# ./benchmarks.test -test.bench . -test.cpu 1,4,16,64
testing: warning: no tests to run
BenchmarkRWMutex/1      20000000                72.6 ns/op
BenchmarkRWMutex/1-4    20000000                72.4 ns/op
BenchmarkRWMutex/1-16   20000000                72.8 ns/op
BenchmarkRWMutex/1-64   20000000                72.5 ns/op
BenchmarkRWMutex/4      20000000                72.6 ns/op
BenchmarkRWMutex/4-4    20000000               105 ns/op
BenchmarkRWMutex/4-16   10000000               130 ns/op
BenchmarkRWMutex/4-64   20000000               160 ns/op
BenchmarkRWMutex/16     20000000                72.4 ns/op
BenchmarkRWMutex/16-4   10000000               125 ns/op
BenchmarkRWMutex/16-16  10000000               263 ns/op
BenchmarkRWMutex/16-64   5000000               287 ns/op
BenchmarkRWMutex/64     20000000                72.6 ns/op
BenchmarkRWMutex/64-4   10000000               137 ns/op
BenchmarkRWMutex/64-16   5000000               306 ns/op
BenchmarkRWMutex/64-64   3000000               517 ns/op
BenchmarkRWMutex/256                    20000000                72.4 ns/op
BenchmarkRWMutex/256-4                  20000000               137 ns/op
BenchmarkRWMutex/256-16                  5000000               280 ns/op
BenchmarkRWMutex/256-64                  3000000               602 ns/op
PASS

A "control" test, calling a no-op function instead of RWMutex methods, displays no such degradation: the problem does not appear to be due to runtime scheduling overhead.

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsFixThe path to resolution is known, but the work has not been done.Performancecompiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions