SMP Scalability Goal

If you can fill the unforgiving second
with sixty minutes worth of distance run,
“Highly scalable” your code will be reckoned,
And—which is more—you'll have parallel fun!

With apologies to Rudyard Kipling.

SMP Scalability Papers

  1. Linux-Kernel Memory Ordering: Help Arrives At Last!, with Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern, Linux Kernel Summit Track. (Additional litmus tests here.) November 2016.
  2. Linux-Kernel Memory Ordering: Help Arrives At Last!, with Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern, LinuxCon EU. October 2016.
  3. High-Performance and Scalable Updates: The Issaquah Challenge, guest lecture to the Distributed Operating Systems class at TU Dresden (video), June 2016.
  4. Practical Experience With Formal Verification Tools at Beaver BarCamp, Corvallis, Oregon USA, April 2016.
  5. Practical Experience With Formal Verification Tools, Verified Trustworthy Software Systems Specialist Meeting, April 2016.
  6. Linux-Kernel Community Validation Practices, The Royal Society Verified Trustworthy Software Systems Meeting, “Verification in Industry” discussion, April 2016.
  7. Formal Verification and Linux-Kernel Concurrency, guest lecture to the CS569 class at Oregon State University, June 2015.
  8. Formal Verification and Linux-Kernel Concurrency, guest lecture to the CS362 class at Oregon State University, June 2015. (AKA “what would have to happen for me to add formal verification to Linux-kernel RCU's regression test suite?”)
  9. High-Performance and Scalable Updates: The Issaquah Challenge, guest lecture to the Distributed Operating Systems class at TU Dresden (video), June 2015.
  10. Formal Verification and Linux-Kernel Concurrency at Beaver BarCamp, Corvallis, Oregon USA, April 2015.
  11. Creating scalable APIs, in Linux Weekly News, August 2014.
  12. High-Performance and Scalable Updates: The Issaquah Challenge at linux.conf.au in Auckland, January 2015.
  13. Bare-Metal Multicore Performance in a General-Purpose Operating System (Adventures in Ubiquity) at linux.conf.au in Auckland, January 2015.
  14. Use Cases for Thread-Local Storage, ISO SC22 WG21 (C++ Language), November 2014. (revised N4376 2015-02-06).
  15. Linux-Kernel Memory Model, ISO SC22 WG21 (C++ Language), November 2014. Official version: N4216 (revised N4374 2015-02-06).
  16. Axiomatic validation of memory barriers and atomic instructions, in Linux Weekly News, August 2014.
  17. Out-of-Thin-Air Execution is Vacuous, ISO SC22 WG21 (C++ Language), May 2014. Official version: N4216 (revised N4323 2014-11-20, revised N4375 2015-02-06).
  18. Reordering and Verification at the Linux Kernel REORDER workshop in Vienna Summer of Logic, July 2014.
  19. But What About Updates? Guest lecture to Portland State University CSE510 (Concurrency), Prof. Jonathan Walpole, June 2014.
  20. N4037: Non-Transactional Implementation of Atomic Tree Move, ISO SC22 WG21 (C++ Language), May 2014.
  21. Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at Beaver BarCamp, Corvallis, OR, USA, April 2014.
  22. Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at Linux Collaboration Summit, Napa, CA, USA, March 2014.
  23. But What About Updates? at Linux Collaboration Summit, Napa, CA, USA, March 2014.
  24. Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at linux.conf.au in Perth, January 2014.
  25. Advances in Validation of Concurrent Software at linux.conf.au in Perth, January 2014.
  26. Scaling Talks at Linux Kernel Summit Scaling microconference, October 2013.
  27. But What About Updates? at Linux Plumbers Conference Scaling microconference, New Orleans, LA, USA. September 2013.
  28. Bare-Metal Multicore Performance in a General-Purpose Operating System (Now With Added Energy Efficiency!) at Linux Plumbers Conference, New Orleans, LA, USA. September 2013.
  29. Advances in Validation of Concurrent Software at Linux Plumbers Conference, New Orleans, LA, USA. September 2013.
  30. Beyond Expert-Only Parallel Programming? at LinuxCon North America, New Orleans, LA, USA. September 2013.
  31. Bare-Metal Multicore Performance in a General-Purpose Operating System at Linux Foundation Enterprise End User Summit, May 2013.
  32. Bare-Metal Multicore Performance in a General-Purpose Operating System at Multicore World, February 2013. (Updated for Oregon State University BarCamp, April 2013.)
  33. Validating Core Parallel Software? at linux.conf.au Open Programming Miniconference, January 2013.
  34. Beyond Expert-Only Parallel Programming? (presentation), at the Workshop on Relaxing Synchronization for Multicore and Manycore Software (RACES'12), October 2012.
  35. Scheduling and big.LITTLE Architecture, at Scheduling Microconference, Linux Plumbers Conference, August 2012.
  36. Signed overflow optimization hazards in the kernel, in Linux Weekly News, August 2012.
  37. Validating Core Parallel Software, at Linux Collaboration Summit, San Francisco, CA, USA, April 2012.
  38. Validating Memory Barriers and Atomic Instructions, in Linux Weekly News, December 2011.
  39. Validating Core Parallel Software, at TU Dresden, Germany, October 2011.
  40. Validating Core Parallel Software, at the 2011 China Linux Kernel Developer Conference, Nanjing, China, October 2011. (Invited)
  41. Is Parallel Programming Hard, And If So, What Can You Do About It?, at the 2011 Android System Developer Forum, Taipei, Taiwan, April 2011. (Invited)
  42. Verifying Parallel Software: Can Theory Meet Practice?, at Verification of Concurrent Data Structures (Verico), Austin, TX, USA, January 2011. (Invited)
  43. Concurrent code and expensive instructions, Linux Weekly News, January 2011.
  44. Is Parallel Programming Hard, And, If So, Why?, linux.conf.au January 2011.
  45. Verifying Parallel software: Can Theory Meet Practice?, linux.conf.au January 2011.
  46. Multi-Core Memory Models and Concurrency Theory: A View from the Linux Community, Dagstuhl workshop January 2011.
  47. N1525: Memory-Order Rationale with Blaine Garst (revised). ISO SC22 WG14 (C Language), November 2010.
  48. Omnibus Memory Model and Atomics Paper, ISO SC22 WG21 (C++ Language), with Mark Batty, Clark Nelson, Hans Boehm, Anthony Williams, Scott Owens, Susmit Sarkar, Peter Sewell, Tjark Weber, Michael Wong, Lawrence Crowl, and Benjamin Kosnik. August 2010. Updated November 2010.
  49. Scalable concurrent hash tables via relativistic programming, August 2010, with Josh Triplett and Jonathan Walpole.
  50. Why the grass may not be greener on the other side: a comparison of locking vs. transactional memory, August 2010, with Maged M. Michael, Josh Triplett, and Jonathan Walpole.
  51. Synchronization and Scalability in the Macho Multicore Era, Scuola Superiore Sant'Anna, Pisa, Italy, July 2010.
  52. Additional Atomics Errata. ISO SC22 WG14 (C Language), May 2010.
  53. Additional Atomics Errata, complete with typo in title. ISO SC22 WG14 (C Language), May 2010.
  54. Rationale for C-Language Dependency Ordering. ISO SC22 WG14 (C Language), May 2010.
  55. Updates to C++ Memory Model Based on Formalization. ISO SC22 WG14 (C Language), April 2010. Updated May 2010.
  56. Explicit Initializers for Atomics. ISO SC22 WG14 (C Language), April 2010. Updated May 2010.
  57. Dependency Ordering for C Memory Model. ISO SC22 WG14 (C Language), April 2010.
  58. Explicit Initializers for Atomics. ISO SC22 WG21 (C++ Language) March 2010.
  59. Updates to C++ Memory Model Based on Formalization. ISO SC22 WG21 (C++ Language) February 2010. Updated March 2010.
  60. Dependency Ordering for C Memory Model. ISO SC22 WG14 (C Language) November 2009.
  61. Updates to C++ Memory Model Based on Formalization. ISO SC22 WG14 (C Language) October 2009.
  62. Performance, Scalability, and Real-Time Response From the Linux Kernel short course for ACACES 2009.
  63. Is Parallel Programming Hard, and If So, Why?, presented at January 2009 linux.conf.au, along with corresponding Portland State University technical report.
  64. Example POWER Implementation for C/C++ Memory Model, revision of ISO WG21 N2745. ISO SC22 WG21 (C++ Language) September 2008. This mapping was proven to be pointwise locally optimal in 2012 by Batty, Memarian, Owens, Sarkar, and Sewell of University of Cambridge. In other words, to improve on this mapping, it is necessary to consider successive atomic operations: Taken one at a time, each is optimal.
  65. Concurrency and Race Conditions at Linux Plumbers Conference Student Day, September 2008.
  66. After 25 Years, C/C++ Understands Concurrency at linux.conf.au 2008 Mel8ourne. February 2008.
  67. Comparison of locking and transactional memory and presentation at PLOS 2007 with Maged Michael and Jon Walpole. October 2007. (revised presentation.) (Official version of paper.)
  68. C++0x memory model user FAQ with Hans Boehm, August 2007.
  69. C++ Data-Dependency Ordering: Atomics (Updated), C++ Data-Dependency Ordering: Memory Model (Updated), and C++ Data-Dependency Ordering: Function Annotation (Updated). August 2007. (Updated version of the May 2007 paper.)
  70. C++ Data-Dependency Ordering. May 2007.
  71. A simple and efficient memory model for weakly ordered architectures. Makes case for weakly ordered primitives in programming languages. Updated May 2007.
  72. Overview of Linux-Kernel Reference Counting. January 2007.
  73. Memory Ordering in Modern Microprocessors, appearing in two parts in the August and September 2005 Linux Journal (revised April 2009).
  74. Storage Improvements for 2.6 and 2.7 in August 2004 Linux Journal.
  75. Linux Kernel Scalability: Using the Right Tool for the Job. Presentation on scalability given at the 2004 Ottawa Linux Symposium and revised for the 2005 linux.conf.au.
  76. Issues with Selected Scalability Features of the 2.6 Kernel. OLS paper describing scalability, DoS, and realtime limitations of the Linux kernel at that time. With Dipankar Sarma.
  77. Fairlocks--a High-Performance Fair Locking Scheme. Bit-vector fair locking scheme for NUMA systems. Revision of paper that appeared in 2002 Parallel and Distributed Computing and Systems, with Swaminathan Sivasubramanian, Jack F. Vogel, and John Stultz. We implemented a number of variations on this theme. Of course, it is even better to design your software so that lock contention is low enough that fancy locking techniques don't help!
  78. Practical Performance Estimation On Shared-Memory Multiprocessors (bibtex). The silver lining of the memory-latency dark cloud--programs whose run time is dominated by memory latency are often amenable to simple performance-estimation methods. Some of these methods are applicable at design time. Revision of PDCS'99 paper.
  79. Differential Profiling (bibtex). Revised version of the MASCOTS'95 and the '99 SP&E papers.
  80. Experience With an Efficient Parallel Kernel Memory Allocator (bibtex). Revised version of the W'93 USENIX and 2001 SP&E papers.
  81. Selecting Locking Designs for Parallel Programs (bibtex). Revised version of the PLoPD-II paper.
  82. Selecting Locking Primitives for Parallel Programs (bibtex). Revised version of the October '96 CACM paper.
  83. Efficient Demultiplexing of Incoming TCP Packets (bibtex). Analytic comparison of a number of demultiplexing techniques. The winner is hashing.
  84. Stochastic Fairness Queueing (bibtex). High-speed approximate implementation of Fair Queueing.
  85. High-Speed Event-Counting and -Classification Using a Dictionary Hash Technique (bibtex). Revised version of the ICPP'89 paper.
  86. Bibtex for other papers.