Rust: How Thread-Local Storage Actually Works


Thread-Local Storage (TLS) gives each thread its own independent value. Rust supports TLS much like C does, but with a safer and more nuanced design.

In this post, we explore how Rust manages TLS under the hood: how values are initialized, when they’re dropped, and why thread_local! behaves differently from #[thread_local]. We’ll also walk through a concrete example using Tokio’s multi-threaded runtime, showing how thread reuse affects TLS lifetime and why values persist longer than you might expect.

A Simple Example

Let’s start with a small example to understand what Thread-Local Storage (TLS) is and how Rust’s thread_local! behaves.

use std::cell::Cell;
use std::hint::black_box;
use tokio::task::JoinHandle;

#[repr(transparent)]
struct TestStruct(Cell<usize>);

thread_local! {
    static TLS_DATA: TestStruct = TestStruct(Cell::new(0));
}

impl Drop for TestStruct {
    fn drop(&mut self) {
        println!("drop called! final value = {}", self.0.get());
    }
}

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    println!("--- First batch ---");

    let tasks: Vec<JoinHandle<()>> = (0..5)
        .map(|i| {
            tokio::spawn(async move {
                TLS_DATA.with(|v| {
                    let old = v.0.get();
                    v.0.set(old + 1);

                    println!(
                        "[task {}] TLS old={}, new={}",
                        i, old, v.0.get()
                    );

                    black_box(v.0.get());
                });
            })
        })
        .collect();

    for t in tasks {
        let _ = t.await;
    }

    println!("\n--- Second batch ---");

    let tasks: Vec<JoinHandle<()>> = (5..10)
        .map(|i| {
            tokio::spawn(async move {
                TLS_DATA.with(|v| {
                    let old = v.0.get();
                    v.0.set(old + 1);

                    println!(
                        "[task {}] TLS old={}, new={}",
                        i, old, v.0.get()
                    );

                    black_box(v.0.get());
                });
            })
        })
        .collect();

    for t in tasks {
        let _ = t.await;
    }

    println!("\n--- main done ---");
}

When we run this program, we get output similar to the following:

--- First batch ---
[task 0] TLS old=0, new=1
[task 1] TLS old=1, new=2
[task 3] TLS old=0, new=1
[task 2] TLS old=2, new=3
[task 4] TLS old=1, new=2

--- Second batch ---
[task 5] TLS old=2, new=3
[task 6] TLS old=3, new=4
[task 7] TLS old=4, new=5
[task 9] TLS old=5, new=6
[task 8] TLS old=3, new=4

--- main done ---
drop called! final value = 6
drop called! final value = 4

Understanding What’s Happening

This example uses a Tokio runtime configured with 2 worker threads:

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]

Because each worker thread maintains its own TLS instance, our TLS_DATA actually exists twice—one instance per worker thread.

A few important observations from the output:

  • The counter increases independently on each worker thread.
    Two separate sequences appear because thread A and thread B each maintain their own TestStruct.
  • The TLS values from the first batch are not dropped before the second batch.
    Tokio reuses the same worker threads, so those threads never exit between the two batches.
    Therefore, their TLS values persist.
  • Drop is called exactly twice when the program ends.
    Each TLS instance is dropped when its thread shuts down—only at the end of the entire runtime, not between task batches.

So the important takeaway is:

TLS is tied to threads, not tasks.
A new async task does not create a new TLS value. Only creating/destroying OS threads does.
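
To see the contrast with plain OS threads, here is a minimal sketch (the Counter type and the loop are just for illustration): every call to std::thread::spawn creates a new thread, so every thread gets its own value and its own drop at exit.

use std::cell::Cell;

struct Counter(Cell<usize>);

impl Drop for Counter {
    fn drop(&mut self) {
        println!("drop called! final value = {}", self.0.get());
    }
}

thread_local! {
    static COUNTER: Counter = Counter(Cell::new(0));
}

fn main() {
    for i in 0..3 {
        // Each iteration spawns a brand-new OS thread, so each one gets a
        // fresh COUNTER and runs its own drop when the thread exits.
        std::thread::spawn(move || {
            COUNTER.with(|c| c.0.set(c.0.get() + 1));
            println!("[thread {i}] value = {}", COUNTER.with(|c| c.0.get()));
        })
        .join()
        .unwrap();
    }
    // Expect three "value = 1" lines and three drop messages, one per thread.
}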


#[thread_local] vs. thread_local!

Rust provides two different mechanisms for thread-local storage (TLS).
The first is the #[thread_local] attribute, which corresponds closely to native OS-level TLS.
The second is the thread_local! macro, the standard library's higher-level and safer abstraction, which layers lazy initialization and destructor support on top of native TLS.

Here’s a detailed comparison:

  • Basic concept
    #[thread_local]: native TLS; attaches thread-local storage directly to a static variable.
    thread_local!: standard-library abstraction that creates a LocalKey<T> accessed via .with().
  • Declaration
    #[thread_local]: #[thread_local] static FOO: T = ...;
    thread_local!: thread_local! { static FOO: T = ...; }
  • Initialization
    #[thread_local]: requires compile-time, constant initialization only.
    thread_local!: supports lazy initialization using closures or expressions.
  • Initialization timing
    #[thread_local]: initialized when the thread is created (or when the loader allocates the TLS block).
    thread_local!: initialized the first time it is accessed on each thread.
  • Drop support
    #[thread_local]: destructor execution is limited or unsupported on many targets; not guaranteed at thread exit.
    thread_local!: destructor runs at thread exit (on supported platforms) via Rust's TLS destructor list.
  • Overhead
    #[thread_local]: minimal; essentially direct TLS pointer access.
    thread_local!: slightly higher; a .with() call plus a lazy-init branch.
  • Intended use
    #[thread_local]: low-level native TLS, FFI interoperability, or extreme micro-optimization.
    thread_local!: idiomatic Rust TLS with safe access, lazy init, and destructor support when needed.
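
To make the declaration difference concrete, here is a minimal side-by-side sketch. The #[thread_local] form only compiles on nightly with the thread_local feature enabled; the names RAW_COUNTER and COUNTER are placeholders for this illustration.

#![feature(thread_local)] // nightly-only gate for the attribute form

use std::cell::Cell;

// Native TLS: the static itself lives in the thread's TLS block.
// No Sync bound is required, because each thread sees its own copy.
#[thread_local]
static RAW_COUNTER: Cell<u32> = Cell::new(0);

// Standard-library TLS: the macro builds a LocalKey accessed through .with().
thread_local! {
    static COUNTER: Cell<u32> = Cell::new(0);
}

fn main() {
    RAW_COUNTER.set(RAW_COUNTER.get() + 1); // direct access to the TLS slot
    COUNTER.with(|c| c.set(c.get() + 1));   // access goes through LocalKey::with
    println!("{} {}", RAW_COUNTER.get(), COUNTER.with(|c| c.get()));
}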

Diving into thread_local!

Rust’s thread_local! macro is a front-end wrapper around an internal macro, thread_local_inner!. This helper macro expands into the actual implementation that builds a LocalKey<T> and manages its storage, initialization strategy, and destructor registration.

Let’s first look at the outer macro:

macro_rules! thread_local {
    // empty (base case for the recursion)
    () => {};

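    // const-initialized declarations, possibly followed by more declarations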
    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = const $init:block; $($rest:tt)*) => (
        $crate::thread::local_impl::thread_local_inner!($(#[$attr])* $vis $name, $t, const $init);
        $crate::thread_local!($($rest)*);
    );

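    // handle a single const-initialized declaration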
    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = const $init:block) => (
        $crate::thread::local_impl::thread_local_inner!($(#[$attr])* $vis $name, $t, const $init);
    );

    // process multiple declarations
    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = $init:expr; $($rest:tt)*) => (
        $crate::thread::local_impl::thread_local_inner!($(#[$attr])* $vis $name, $t, $init);
        $crate::thread_local!($($rest)*);
    );

    // handle a single declaration
    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = $init:expr) => (
        $crate::thread::local_impl::thread_local_inner!($(#[$attr])* $vis $name, $t, $init);
    );
}

The macro simply forwards each declaration to:

pub macro thread_local_inner {
    // NOTE: we cannot import `LocalKey`, `LazyStorage` or `EagerStorage` with a `use` because that
    // can shadow user provided type or type alias with a matching name. Please update the shadowing
    // test in `tests/thread.rs` if these types are renamed.

    // Used to generate the `LocalKey` value for const-initialized thread locals.
    (@key $t:ty, const $init:expr) => {{
        const __INIT: $t = $init;

        unsafe {
            $crate::thread::LocalKey::new(const {
                if $crate::mem::needs_drop::<$t>() {
                    |_| {
                        #[thread_local]
                        static VAL: $crate::thread::local_impl::EagerStorage<$t>
                            = $crate::thread::local_impl::EagerStorage::new(__INIT);
                        VAL.get()
                    }
                } else {
                    |_| {
                        #[thread_local]
                        static VAL: $t = __INIT;
                        &VAL
                    }
                }
            })
        }
    }},

    // used to generate the `LocalKey` value for `thread_local!`
    (@key $t:ty, $init:expr) => {{
        #[inline]
        fn __init() -> $t {
            $init
        }

        unsafe {
            $crate::thread::LocalKey::new(const {
                if $crate::mem::needs_drop::<$t>() {
                    |init| {
                        #[thread_local]
                        static VAL: $crate::thread::local_impl::LazyStorage<$t, ()>
                            = $crate::thread::local_impl::LazyStorage::new();
                        VAL.get_or_init(init, __init)
                    }
                } else {
                    |init| {
                        #[thread_local]
                        static VAL: $crate::thread::local_impl::LazyStorage<$t, !>
                            = $crate::thread::local_impl::LazyStorage::new();
                        VAL.get_or_init(init, __init)
                    }
                }
            })
        }
    }},
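    // entry point: emit the `LocalKey` constant, delegating to the @key rules above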
    ($(#[$attr:meta])* $vis:vis $name:ident, $t:ty, $($init:tt)*) => {
        $(#[$attr])* $vis const $name: $crate::thread::LocalKey<$t> =
            $crate::thread::local_impl::thread_local_inner!(@key $t, $($init)*);
    },
}

The real logic happens inside thread_local_inner!.


Understanding thread_local_inner!

The thread_local_inner! macro is responsible for generating a LocalKey<T> instance.
Its behavior depends on whether the thread-local variable is const-initialized or not:

  • Const-initialized -> uses EagerStorage (or, when the type has no destructor, just a plain #[thread_local] static)
  • Dynamically initialized ($init:expr) -> uses LazyStorage

This branching is implemented using the special @key matcher inside the macro:

(@key $t:ty, const $init:expr) => { ... }
(@key $t:ty, $init:expr) => { ... }

This design allows the macro system to implement different initialization strategies entirely inside macro expansions without needing runtime branching.
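
From the caller's side, you pick the path simply by how you write the initializer. A small sketch (the names HITS and BUF are just for illustration):

use std::cell::{Cell, RefCell};

thread_local! {
    // `const` block: takes the const-initialized path described below
    // (a plain #[thread_local] static, or EagerStorage if the type needs Drop).
    static HITS: Cell<u64> = const { Cell::new(0) };

    // Plain expression: takes the lazy path (LazyStorage); the initializer
    // runs on the first access, once per thread.
    static BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(1024));
}

fn main() {
    HITS.set(HITS.get() + 1);           // LocalKey<Cell<T>> convenience methods
    BUF.with_borrow_mut(|b| b.push(1)); // LocalKey<RefCell<T>> convenience method
    println!("hits = {}", HITS.get());
}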


Storage Implementations

Rust uses two different internal storage types depending on initialization style.

1. EagerStorage (for const initialization)

With a const initializer, the value is fully constructed at compile time and baked into the thread's TLS block, so it exists from the moment the thread starts; no lazy-initialization step is needed for the value itself.

#[derive(Clone, Copy)]
enum State {
    Initial,
    Alive,
    Destroyed,
}

#[allow(missing_debug_implementations)]
pub struct Storage<T> {
    state: Cell<State>,
    val: UnsafeCell<T>,
}

Key points:

  • The value is fully constructed at compile time and stored directly as val: UnsafeCell<T> in the thread's TLS block.
  • The destructor is registered on the first access, when initialize() flips the state from Initial to Alive.
  • After that first access, reading the value is essentially just a load from the TLS slot, which is about as fast as TLS access gets.

Eager Initialization Function

    #[cold]
    unsafe fn initialize(&self) -> *const T {
        unsafe {
            destructors::register(ptr::from_ref(self).cast_mut().cast(), destroy::<T>);   // register drop at thread exit
        }

        self.state.set(State::Alive);
        self.val.get()
    }

So EagerStorage = simple state + raw value + destructor registration.
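
You can observe both pieces from user code. In this minimal sketch (Flag is an illustrative type), the thread-local is const-initialized but still has a Drop impl, so it goes through EagerStorage and the first access registers its destructor:

use std::cell::Cell;

struct Flag(Cell<bool>);

impl Drop for Flag {
    fn drop(&mut self) {
        println!("eager TLS dropped, flag = {}", self.0.get());
    }
}

thread_local! {
    // const-initialized, but Flag needs Drop, so the macro picks EagerStorage.
    static FLAG: Flag = const { Flag(Cell::new(false)) };
}

fn main() {
    std::thread::spawn(|| {
        FLAG.with(|f| f.0.set(true)); // first access registers the destructor
    })
    .join()
    .unwrap();
    // "eager TLS dropped, flag = true" is printed as the spawned thread exits.
}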


2. LazyStorage (for non-const initialization)

Non-const initializers must run at first use, so Rust wraps the value in MaybeUninit<T> and tracks state more carefully:

pub unsafe trait DestroyedState: Sized + Copy {
    fn register_dtor<T>(s: &Storage<T, Self>);
}

unsafe impl DestroyedState for ! {
    fn register_dtor<T>(_: &Storage<T, !>) {}
}

unsafe impl DestroyedState for () {
    fn register_dtor<T>(s: &Storage<T, ()>) {
        unsafe {
            destructors::register(ptr::from_ref(s).cast_mut().cast(), destroy::<T>);
        }
    }
}

#[derive(Copy, Clone)]
enum State<D> {
    Uninitialized,
    Alive,
    Destroyed(D),
}

#[allow(missing_debug_implementations)]
pub struct Storage<T, D> {
    state: Cell<State<D>>,
    value: UnsafeCell<MaybeUninit<T>>,
}

But there’s something new: DestroyedState.

Why DestroyedState?

Rust has two scenarios:

  1. The TLS type needs Drop
    -> Destructor must be registered once.
  2. The TLS type does not need Drop
    -> Destructor registration is a no-op.

Therefore, Rust parameterizes the storage over two marker types:

  • () -> the value needs destruction, so register_dtor actually registers a destructor.
  • ! -> no destructor is needed, so register_dtor is a no-op; and because ! has no values, the State::Destroyed(_) variant can never even be constructed for these types.

This lets the compiler eliminate destructor registration entirely for trivially destructible types, with no runtime branching.
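
The same trick is easy to reproduce outside the standard library. Below is a standalone sketch of the pattern, using std::convert::Infallible as a stand-in for the unstable ! type; RegisterDtor, init_slot, and the labels are all made up for this illustration:

use std::convert::Infallible;

// The marker type decides at compile time whether "register a destructor"
// does anything at all.
trait RegisterDtor {
    fn register(label: &str);
}

// Infallible plays the role of `!`: it has no values, and registration
// is a no-op that monomorphization removes entirely.
impl RegisterDtor for Infallible {
    fn register(_label: &str) {}
}

// `()` plays the "this type needs Drop" role: registration does real work.
impl RegisterDtor for () {
    fn register(label: &str) {
        println!("registering destructor for {label}");
    }
}

fn init_slot<D: RegisterDtor>(label: &str) {
    // ... store the freshly initialized value somewhere ...
    D::register(label); // compiles to nothing when D = Infallible
}

fn main() {
    init_slot::<()>("a type with a destructor");            // prints
    init_slot::<Infallible>("a trivially droppable type");  // no-op
}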

Lazy Initialization Function

    #[cold]
    unsafe fn get_or_init_slow(
        &self,
        i: Option<&mut Option<T>>,
        f: impl FnOnce() -> T,
    ) -> *const T {
        match self.state.get() {
            State::Uninitialized => {}
            State::Alive => return self.value.get().cast(),
            State::Destroyed(_) => return ptr::null(),
        }

        let v = i.and_then(Option::take).unwrap_or_else(f);

        let mut old_value = unsafe { self.value.get().replace(MaybeUninit::new(v)) };
        match self.state.replace(State::Alive) {
            State::Uninitialized => D::register_dtor(self),

            State::Alive => unsafe { old_value.assume_init_drop() },

            State::Destroyed(_) => unreachable!(),
        }

        self.value.get().cast()
    }

This is where initialization, destructor registration, and the recursive-initialization edge case are all handled: if the user's initializer ends up accessing (and therefore initializing) the same thread-local, the outer call observes State::Alive when it finishes and simply drops the value it just replaced, instead of registering the destructor a second time.
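
From the user's point of view, the visible effect is that a non-const initializer runs at most once per thread, on first access. A small sketch:

use std::cell::Cell;

thread_local! {
    static LAZY: Cell<u32> = {
        // This block is the lazy initializer: it runs on the first access
        // made by each thread, and never again on that thread.
        println!("initializer running on {:?}", std::thread::current().id());
        Cell::new(0)
    };
}

fn main() {
    LAZY.with(|v| v.set(1));
    LAZY.with(|v| v.set(2)); // no second "initializer running" line here

    std::thread::spawn(|| {
        LAZY.with(|v| v.set(3)); // new thread, so the initializer runs again
    })
    .join()
    .unwrap();
}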


Destructor Registration

Both storage types call destructors::register(...) to schedule the value to be dropped when the thread exits.

But how does Rust ensure those destructors actually run?

The guard Module

Rust uses guard::enable() to register the destructor runner with the OS or platform runtime.

The platform decides how destructors run:

  • Linux, Android → ELF TLS destructor mechanisms
  • Windows → FLS (fiber-local storage) callbacks
  • macOS → key-based pthread destructors
  • WASM → no thread-exit hook, so destructors do not run
  • Embedded/UEFI → the Rust runtime runs destructors itself

On Linux-like targets, for example, the destructor module is wired up with re-exports such as: pub(super) use linux_like::register; pub(super) use list::run;

When a thread exits:

  1. Rust calls destructors::run()
  2. Each registered TLS value is popped off the destructor list
  3. Drop is executed in reverse-registration order (similar to C++ TLS dtors)

This explains why:

  • Drop runs once per thread
  • TLS values survive for the lifetime of the worker thread
  • In WASM, TLS destructors are effectively leaked (no thread teardown)

Use Cases — What You Actually Need in Real Rust

While Rust technically exposes two different kinds of TLS, most developers don’t need to choose between them.

Use thread_local! (almost always)

thread_local! is stable, safe, supports Drop, allows lazy initialization, and has minimal overhead.
It is the correct choice for:

  • thread-local caches
  • buffers and scratch memory
  • statistics counters
  • storing per-thread handles or context
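
For example, a per-thread scratch buffer avoids both repeated allocation and cross-thread locking; a minimal sketch (encode is a stand-in for real work):

use std::cell::RefCell;

thread_local! {
    // One reusable buffer per thread: no Mutex, no contention.
    static SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(4096));
}

fn encode(input: &[u8]) -> usize {
    SCRATCH.with_borrow_mut(|buf| {
        buf.clear();
        buf.extend_from_slice(input); // placeholder for real encoding work
        buf.len()
    })
}

fn main() {
    println!("{}", encode(b"hello"));
}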

#[thread_local] is rarely practical

#[thread_local]:

  • requires Nightly
  • does not guarantee Drop
  • requires const initialization
  • has limited use-cases outside of compiler internals

Only consider it if you are:

  • integrating with native TLS symbols from C
  • building a runtime or OS-level system
  • doing aggressive micro-optimizations on Nightly

For everyone else, the macro is the right tool.

TL;DR: If you’re not writing the Rust standard library, use thread_local!.

Looking Ahead

Rust’s TLS mechanisms show how much thought the language puts into balancing safety, performance, and portability.
Behind a simple macro like thread_local! lies a surprisingly sophisticated system: lazy initialization, state tracking, destructor registration, and platform-specific behavior.

Understanding these internals isn’t required for everyday Rust programming, but it helps explain why TLS behaves the way it does—especially in multi-threaded or async environments where threads are reused. With this foundation, it becomes easier to reason about thread-local state, avoid subtle bugs, and write more predictable concurrent Rust code.