Skip to content

replace gettid system call with pthread_self() to avoid constant cost… #42

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

frankeh
Copy link
Contributor

@frankeh frankeh commented Feb 1, 2016

…ly system calls

@MadCoder
Copy link
Contributor

MadCoder commented Feb 1, 2016

this is broken on 64bit platforms, as pthread_self() is a long and mach_port_t a uint32_t.
and it would be significantly simpler for further porting efforts if mach_port_t stays a uint32_t.

I think libdispatch should fix all the TSD stuff anyway and have a single __thread structure to replace the TSD calls, which are fast on linux.

and capturing the gettid() should be memoized in that structure. that is the right fix and will yield significant other wins anyway.

@MadCoder
Copy link
Contributor

MadCoder commented Feb 1, 2016

You probably want to introduce a configure.ac USE_THREAD_LOCAL_STORAGE that is YES for linux and platforms where the ABI isn't expensive (it is a lot on OS X, and that may be true of other platforms).

and then revamp tsd.h and friends to have a single pthread_setpsecific_key() that registers the address of that __thread struct (so that the destructors do run on pthread_exit()) with a single pthread specific key, and abstract the _get/_set specific into member derefs into that structure.

bonus points to type the member of the structure properly (but that may prove difficult, I haven't tried).

@frankeh
Copy link
Contributor Author

frankeh commented Feb 3, 2016

agree, missed the 64 -> 32 downgrade.
Agree pthread_get_specific is extremely thin .. traced at ~40 instruction
Will get on it. Thanks for feedback.

@frankeh
Copy link
Contributor Author

frankeh commented Feb 3, 2016

actually to be clear and so not to waste time.

struct tsd { pid_t tid; otherjunk; }
__thread struct tsd thread_local_storage;

Question, what's the point of keeping everything in a single datastructure?
Modern systems do this through linkers and offsets to the __thread local segment.

@MadCoder
Copy link
Contributor

MadCoder commented Feb 3, 2016

Question, what's the point of keeping everything in a single datastructure?

You don't need to on paper, but for portability purposes, you could do tricks such as:

struct dispatch_tsd {
    bool initialized;
    pid_t tid;
    dispatch_continuation_t dispatch_cache_key;
    dispatch_queue_t dispatch_queue_key;
    …
};
static __thread struct dispatch_tsd __dispatch_tsd;
static pthread_key_t __dispatch_tsd_key;
DISPATCH_ALWAYS_INLINE
struct dispatch_tsd *
_dispatch_get_tsd_base(void)
{
    if (slowpath(!__dispatch_tsd.initialized)) {
        __dispatch_tsd.tid = gettid();
        pthread_setspecific(__dispatch_tsd_key, &__dispatch_tsd);
        __dispatch_tsd.initialized = true;
    }
    return &_dispatch_tsd;
}

#define _dispatch_thread_getspecific(key) (_dispatch_get_tsd_base()->key)
#define _dispatch_thread_setspecific(key, value) (void)(_dispatch_get_tsd_base()->key = (value))

and then you need to simulate the destructors on all the fields that have one in dispatch (the __dispatch_tsd_key above is the only one you really need to register at libdispatch init time that way), so that at thread exit you introspect these values and call the destructors for the ones that have one (such as dispatch_queue_key which is used to find bugs in dispatch where the current queue TSD hasn't been reset properly which would be a grave bug).

do I make sense?

@MadCoder
Copy link
Contributor

MadCoder commented Feb 3, 2016

Also, performance wise, TSD structures addresses are only a segment load away, but it's common in dispatch to use several TSD values in a row in hot paths. If you pack them in the same structure, the compiler should be clever enough to load the __dispatch_tsd address only once, and then do offset loads from this, which will use less registers and spill less things on the stack.

@frankeh
Copy link
Contributor Author

frankeh commented Feb 4, 2016

Indeed makes sense. I have an initial version without the struct running and compiling, which does it differently (just to test) .. will look at the above and adjust.
In principle the same to use macros etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants