False sharing is an insidious problem for multithreaded programs
running on multicore processors, where it can silently degrade
performance and scalability. Previous tools for detecting false
sharing are severely limited: they cannot distinguish false sharing
from true sharing, have high false-positive rates, and give programmers
little help in locating and resolving false sharing.
This paper presents two tools that attack the problem of false
sharing: Sheriff-Detect and Sheriff-Protect. Both tools leverage a
framework we introduce here called Sheriff. Sheriff runs each
thread as a separate process and exposes an API that allows
programs to perform per-thread memory isolation and tracking on a
per-page basis. We believe Sheriff is of independent interest.
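
One way to picture per-thread isolation and per-page tracking is the
following minimal sketch. It rests on assumptions the text does not
spell out: each thread writes to a private copy of a page, and at a
synchronization point only the bytes it changed are copied back to the
shared page. The `PrivatePage` class, its `commit` method, and the
16-byte "page" are hypothetical names and sizes chosen for
illustration; this is not Sheriff's actual API.

```python
class PrivatePage:
    """A thread's private working copy of a shared page (illustrative only)."""

    def __init__(self, shared):
        self.shared = shared              # the shared page (a bytearray)
        self.twin = bytes(shared)         # snapshot taken when isolation starts
        self.working = bytearray(shared)  # private copy the thread writes to

    def write(self, offset, value):
        # Writes land in the private copy, not in the shared page.
        self.working[offset] = value

    def commit(self):
        """Copy only the bytes this thread changed back to the shared page."""
        changed = [i for i, (old, new) in enumerate(zip(self.twin, self.working))
                   if old != new]
        for i in changed:
            self.shared[i] = self.working[i]
        # Start the next interval from the current shared state.
        self.twin = bytes(self.shared)
        self.working = bytearray(self.shared)
        return changed


shared_page = bytearray(16)      # assumed tiny page size, for readability
t1 = PrivatePage(shared_page)
t2 = PrivatePage(shared_page)

t1.write(0, 7)   # "thread 1" updates byte 0
t2.write(8, 9)   # "thread 2" updates byte 8 of the same page

t1_changed = t1.commit()
t2_changed = t2.commit()
```

In this sketch the list returned by `commit` doubles as a record of
which bytes each thread modified, which is the kind of per-page
tracking information a detector could build on.
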
Sheriff-Detect finds instances of false sharing by comparing updates within
the same cache lines by different threads, and uses sampling to rank
them by performance impact. Sheriff-Detect is precise (no false
positives), runs with low overhead (on average, 20%), and is
accurate, pinpointing the exact objects involved in false sharing. We
present a case study demonstrating Sheriff-Detect's effectiveness at
locating false sharing in a variety of benchmarks.
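
The distinction between false and true sharing can be illustrated with
a small sketch. It is not Sheriff-Detect's implementation: the 64-byte
cache line size, the `(thread_id, address)` write log, and the
`classify_lines` helper are assumptions made for illustration. As a
simplification, a cache line is labeled falsely shared when at least
two threads write to it but no single address is written by more than
one thread.

```python
from collections import defaultdict

CACHE_LINE_SIZE = 64  # bytes; assumed, a common size on x86-64


def classify_lines(writes):
    """Classify each cache line touched by `writes`.

    `writes` is an iterable of (thread_id, address) pairs, a hypothetical
    log format. Returns {line_index: "private" | "true" | "false"}.
    """
    # line index -> address -> set of threads that wrote that address
    writers = defaultdict(lambda: defaultdict(set))
    for thread_id, address in writes:
        writers[address // CACHE_LINE_SIZE][address].add(thread_id)

    result = {}
    for line, by_address in writers.items():
        threads = set().union(*by_address.values())
        if len(threads) < 2:
            result[line] = "private"   # only one thread writes this line
        elif any(len(t) > 1 for t in by_address.values()):
            result[line] = "true"      # some address has multiple writers
        else:
            result[line] = "false"     # multiple writers, disjoint addresses
    return result


# Threads 1 and 2 update adjacent 8-byte counters in line 0 (false sharing)
# and both update the same word at address 128 in line 2 (true sharing).
log = [(1, 0), (2, 8), (1, 0), (2, 8), (1, 128), (2, 128), (1, 200)]
report = classify_lines(log)
```

A fuller version would also count writes per line, so that lines could
be ranked by how often they are updated.
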
Rewriting a program to fix false sharing can be infeasible when source
is unavailable, or undesirable when padding objects would
unacceptably increase memory consumption or further worsen runtime
performance. Sheriff-Protect mitigates false sharing by adaptively
isolating shared updates from different threads into separate physical
addresses, effectively eliminating most of the performance impact of
false sharing. We show that Sheriff-Protect can improve
performance for programs with catastrophic false sharing by up to
9X, without programmer intervention.
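
The memory cost of padding mentioned above can be made concrete with a
short calculation. The sketch assumes 64-byte cache lines and an array
of eight 8-byte counters, one per thread; the `PaddedCounter` layout is
a hypothetical example, not taken from any benchmark in the paper.

```python
import ctypes

CACHE_LINE_SIZE = 64  # bytes; assumed
NUM_THREADS = 8       # assumed, one counter per thread


class PaddedCounter(ctypes.Structure):
    """An 8-byte counter padded out to fill a whole cache line."""
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("pad", ctypes.c_char * (CACHE_LINE_SIZE - ctypes.sizeof(ctypes.c_uint64))),
    ]


# Unpadded: all eight counters fit in a single cache line.
unpadded_bytes = ctypes.sizeof(ctypes.c_uint64 * NUM_THREADS)

# Padded: each counter occupies its own cache line.
padded_bytes = ctypes.sizeof(PaddedCounter * NUM_THREADS)

growth = padded_bytes // unpadded_bytes
```

Under these assumptions padding removes the shared line at the price of
an eightfold increase in the array's footprint.
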