Skip to content

SR-4201: DispatchSourceSignal not working on Linux #231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 7, 2017

Conversation

dgrove-oss
Copy link
Contributor

The observed bug is that signals sent to the main thread
after it called dispatch_main were not triggering the
event handler of a DispatchSourceSignal that had been
previously created and activated to handle that signal.

If plausible signals are added to the sigsuspend mask,
then they are properly routed to the signalfd and the
DispatchSourceSignal event handler runs as expected.

This commit also changes event_epoll to use pthread_sigmask
to block the signal when creating a signalfd because
the Linux man page for sigprocmask states that sigprocmask's
behavior is undefined for multi-threaded programs and
pthread_sigmask should be used instead.

Copy link
Contributor

@MadCoder MadCoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to re-read doc about signals and linux, the fact that you need to block signals for signalfd to work properly is a major PITA that will cause platform dependance issues with Swift. but I'm not 100% sure we can avoid it anyway.

src/queue.c Outdated
(void)dispatch_assume_zero(r);

#if DISPATCH_EVENT_BACKEND_EPOLL
// If there are signals that the user may have reasonably
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that feels like the wrong solution, but I need to think about it more. I hate signals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better solution is to handle this lazily in the code that creates the unotes for signals: when a dispatch source of type SIGNAL is created for a given signal, making a sigprocmask(SIG_BLOCK, ...) and document that it does this in the headerdoc for DISPATCH_SOURCE_TYPE_SIGNAL on linux.

Here you're just fishing and there are other signals that are useful that you don't cover which is the part I don't like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which ... icidentally is exactly what the current code is doing so why is it not working again?

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Mar 10, 2017

I agree. This option feels like a complete kludge, but I'm out of ideas. I hacked together a pretty small C program that shows the same symptoms. I'll attach that to the JIRA entry in case it is useful for thinking about the problem or playing with options.

@weissi
Copy link
Contributor

weissi commented May 5, 2017

@dgrove-oss @MadCoder any updates for this?

src/queue.c Outdated
(void)dispatch_assume_zero(r);

#if DISPATCH_EVENT_BACKEND_EPOLL
// If there are signals that the user may have reasonably
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better solution is to handle this lazily in the code that creates the unotes for signals: when a dispatch source of type SIGNAL is created for a given signal, making a sigprocmask(SIG_BLOCK, ...) and document that it does this in the headerdoc for DISPATCH_SOURCE_TYPE_SIGNAL on linux.

Here you're just fishing and there are other signals that are useful that you don't cover which is the part I don't like.

src/queue.c Outdated
(void)dispatch_assume_zero(r);

#if DISPATCH_EVENT_BACKEND_EPOLL
// If there are signals that the user may have reasonably
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which ... icidentally is exactly what the current code is doing so why is it not working again?

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

I think what may be broken is that the sigprocmask() should be done before the signalfd_create() maybe?

but what you're doing makes no sense to me: if you only block the signal on the thread that happens to register this signal, it can still be delivered to other threads. I thought on linux sigprocmask() was applying the mask to the process?

If not then I don't understand how libraries can use signals safely (which is a joke in itself anyway) in a process that went multi-threaded.

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

oh but the manpage says:

sigprocmask() is used to fetch and/or change the signal mask of the
       calling thread.

doh I'm so so so confused.

@dgrove-oss
Copy link
Contributor Author

My memory is paged out. I can probably find the C programs I was playing with back in March are refresh it.

One issue is that Linux docs flat out says you can't correctly use sigprocmask in a mutli-threaded program. You have to use pthread_sigmask.

http://man7.org/linux/man-pages/man2/sigprocmask.2.html

@dgrove-oss
Copy link
Contributor Author

It might work to lazily add signals to the mask being used in dispatch_signal_thread and poke that thread to wake up and reset its pthread_sigmask.

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

hmm, actually I think I have a good fix, modulo testing it and making sure it builds and nitpicks like that ;)

From 101af4bb8c554c7073605fe100b527a30fbf8e7a Mon Sep 17 00:00:00 2001
From: Pierre Habouzit <phabouzit@apple.com>
Date: Fri, 5 May 2017 10:34:20 -0700
Subject: [PATCH] Fix the signal story on linux

- export _dispatch_sigmask() as the way to block signals in a process
  using dispatch
- call it when dispatch_main() is called
- each time dispatch registers a signal handler, register a sigaction that
  will catch signals for threads with wrong masks and will fix the thread
  configuration then raise the signal again so that it is delivered to the
  signalfd as expected.
---
 src/event/event_epoll.c | 22 +++++++++++++++++++++-
 src/init.c              | 23 +++++++++++++++++++++++
 src/internal.h          |  2 ++
 src/queue.c             | 40 ++--------------------------------------
 4 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/src/event/event_epoll.c b/src/event/event_epoll.c
index 3ebff148..c6c2c592 100644
--- a/src/event/event_epoll.c
+++ b/src/event/event_epoll.c
@@ -111,9 +111,26 @@ _dispatch_muxnote_dispose(dispatch_muxnote_t dmn)
 	free(dmn);
 }
 
+static void
+_dispatch_muxnote_signal_block_and_raise(int signo)
+{
+	// On linux, for signals to be delivered to the signalfd, signals
+	// must be blocked, else any thread that hasn't them blocked may
+	// receive them.  Fix that by lazily noticing, blocking said signal,
+	// and raising the signal again when it happens
+	_dispatch_sigmask();
+	raise(signo);
+}
+
 static dispatch_muxnote_t
 _dispatch_muxnote_create(dispatch_unote_t du, int16_t events)
 {
+	static sigset_t signals_with_unotes;
+	static struct sigaction sa = {
+		.sa_handler = _dispatch_muxnote_signal_block_and_raise,
+		.sa_flags = SA_RESTART,
+	};
+
 	dispatch_muxnote_t dmn;
 	struct stat sb;
 	int fd = du._du->du_ident;
@@ -123,13 +140,16 @@ _dispatch_muxnote_create(dispatch_unote_t du, int16_t events)
 
 	switch (filter) {
 	case EVFILT_SIGNAL:
+		if (!sigismember(&signals_with_unotes, du->du_ident)) {
+			sigaddset(&signals_with_unotes, du->du_ident);
+			sigaction(du->du_ident, &sa, NULL);
+		}
 		sigemptyset(&sigmask);
 		sigaddset(&sigmask, du._du->du_ident);
 		fd = signalfd(-1, &sigmask, SFD_NONBLOCK | SFD_CLOEXEC);
 		if (fd < 0) {
 			return NULL;
 		}
-		sigprocmask(SIG_BLOCK, &sigmask, NULL);
 		break;
 
 	case EVFILT_WRITE:
diff --git a/src/init.c b/src/init.c
index c1506422..5490bd91 100644
--- a/src/init.c
+++ b/src/init.c
@@ -74,6 +74,29 @@ dispatch_atfork_child(void)
 	_dispatch_unsafe_fork = 0;
 }
 
+int
+_dispatch_sigmask(void);
+{
+	sigset_t mask;
+	int r = 0;
+
+	/* Workaround: 6269619 Not all signals can be delivered on any thread */
+	r |= sigfillset(&mask);
+	r |= sigdelset(set, SIGILL);
+	r |= sigdelset(set, SIGTRAP);
+#if HAVE_DECL_SIGEMT
+	r |= sigdelset(set, SIGEMT);
+#endif
+	r |= sigdelset(set, SIGFPE);
+	r |= sigdelset(set, SIGBUS);
+	r |= sigdelset(set, SIGSEGV);
+	r |= sigdelset(set, SIGSYS);
+	r |= sigdelset(set, SIGPIPE);
+	r |= sigdelset(set, SIGPROF);
+	r |= pthread_sigmask(how, set, oset);
+	(void)dispatch_assume_zero(r);
+}
+
 #pragma mark -
 #pragma mark dispatch_globals
 
diff --git a/src/internal.h b/src/internal.h
index 500030a9..e6e8ccc5 100644
--- a/src/internal.h
+++ b/src/internal.h
@@ -1086,6 +1086,8 @@ extern int _dispatch_kevent_workloop_disabled;
 #endif // DISPATCH_KEVENT_WORKLOOP_FALLBACK
 #endif // __OPEN_SOURCE__
 
+int _dispatch_sigmask(void);
+
 /* #includes dependent on internal.h */
 #include "object_internal.h"
 #include "semaphore_internal.h"
diff --git a/src/queue.c b/src/queue.c
index 07ae855d..3c1eceb5 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -70,7 +70,6 @@ static void _dispatch_worker_thread2(int priority, int options, void *context);
 #endif
 #if DISPATCH_USE_PTHREAD_POOL
 static void *_dispatch_worker_thread(void *context);
-static int _dispatch_pthread_sigmask(int how, sigset_t *set, sigset_t *oset);
 #endif
 
 #if DISPATCH_COCOA_COMPAT
@@ -5884,13 +5883,8 @@ _dispatch_worker_thread(void *context)
 		pqc->dpq_thread_configure();
 	}
 
-	sigset_t mask;
-	int r;
 	// workaround tweaks the kernel workqueue does for us
-	r = sigfillset(&mask);
-	(void)dispatch_assume_zero(r);
-	r = _dispatch_pthread_sigmask(SIG_BLOCK, &mask, NULL);
-	(void)dispatch_assume_zero(r);
+	_dispatch_sigmask();
 	_dispatch_introspection_thread_add();
 
 	const int64_t timeout = 5ull * NSEC_PER_SEC;
@@ -5906,37 +5900,6 @@ _dispatch_worker_thread(void *context)
 	_dispatch_release(dq); // retained in _dispatch_global_queue_poke_slow
 	return NULL;
 }
-
-int
-_dispatch_pthread_sigmask(int how, sigset_t *set, sigset_t *oset)
-{
-	int r;
-
-	/* Workaround: 6269619 Not all signals can be delivered on any thread */
-
-	r = sigdelset(set, SIGILL);
-	(void)dispatch_assume_zero(r);
-	r = sigdelset(set, SIGTRAP);
-	(void)dispatch_assume_zero(r);
-#if HAVE_DECL_SIGEMT
-	r = sigdelset(set, SIGEMT);
-	(void)dispatch_assume_zero(r);
-#endif
-	r = sigdelset(set, SIGFPE);
-	(void)dispatch_assume_zero(r);
-	r = sigdelset(set, SIGBUS);
-	(void)dispatch_assume_zero(r);
-	r = sigdelset(set, SIGSEGV);
-	(void)dispatch_assume_zero(r);
-	r = sigdelset(set, SIGSYS);
-	(void)dispatch_assume_zero(r);
-	r = sigdelset(set, SIGPIPE);
-	(void)dispatch_assume_zero(r);
-	r = sigdelset(set, SIGPROF);
-	(void)dispatch_assume_zero(r);
-
-	return pthread_sigmask(how, set, oset);
-}
 #endif // DISPATCH_USE_PTHREAD_POOL
 #if DISPATCH_USE_KEVENT_WORKLOOP && !defined(__OPEN_SOURCE__)
 
@@ -6430,6 +6393,7 @@ dispatch_main(void)
 		pthread_key_t dispatch_main_key;
 		pthread_key_create(&dispatch_main_key, _dispatch_sig_thread);
 		pthread_setspecific(dispatch_main_key, &dispatch_main_key);
+		_dispatch_sigmask();
 #endif
 		pthread_exit(NULL);
 		DISPATCH_INTERNAL_CRASH(errno, "pthread_exit() returned");
-- 
2.12.2

github desn't let me attach the diff easily (I know I should create a pull request but for reasons I can't from where I am so bear with me).

the gist of the patch is to expose (in init.c, it has to move away from queue.c) a _dispatch_sigmask() that takes over signals from that thread (IOW if you use dispatch, you decide unequivocally that process-wide signals [as opposed to SIGSEGV, SIGBUS and friends] are taken over by dispatch).

Then each time we register a source for a signal, if it's new, to fix the configuration of threads that have been pthread_created and don't have the right mask, we have a handler that will apply that sigmask and raise the signal again. it's gross but should do the job.

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

@dgrove-oss can I let you have a look at that idea and massage this patch to victory?

oh and the _dispatch_sigmask() in dispatch_main() is obviously about trying to fix that guy since clearly he's the broken one most of the time ;)

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

(typically I just realized _dispatch_sigmask() doesn't even build you have to s/set/&mask/g)

@dgrove-oss
Copy link
Contributor Author

sure, I'll take a look later this afternoon.

@dgrove-oss
Copy link
Contributor Author

I couldn't get git apply to do anything reasonable, so I applied via patch and made minor adjustments. Left it on the branch associated with this PR and haven't squashed away the previous commits yet in case we need them.

It is tantalizingly close to working....if one is patient and waits a couple of seconds before sending the signal to the process, than all is fine. If one sends the signal too soon, it is missed.
(The test program prints out the kill command from the registration handler, so needing to wait for several seconds after seeing the command seems to indicate a problem somewhere).

@dgrove-oss
Copy link
Contributor Author

hmm, actually there are more kinks to work out. Although the signal gets delivered to the DispatchSource, we also get into "signal-based spin loop" because the signal handler re-raises the signal, which causes the signal handler to run again, which re-raises, ad infinitum.

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

How can it spin loop? the signal is supposed to be blocked in all threads, and I mask it too on if we ever hit the handler. I'm very very confused. I'm very confused.

@dgrove-oss
Copy link
Contributor Author

I was trying to see why it seemed like we needed to wait a few seconds before giving the signal. I stuck debug printfs into the code, including:

diff --git a/src/event/event_epoll.c b/src/event/event_epoll.c
index e4c5cd9..0c33358 100644

--- a/src/event/event_epoll.c
+++ b/src/event/event_epoll.c
@@ -120,12 +120,15 @@ _dispatch_muxnote_dispose(dispatch_muxnote_t dmn)
 static void
 _dispatch_muxnote_signal_block_and_raise(int signo)
 {
+       printf("Entered block_and_raise for %d\n", signo);
        // On linux, for signals to be delivered to the signalfd, signals
        // must be blocked, else any thread that hasn't them blocked may
        // receive them.  Fix that by lazily noticing, blocking said signal,
        // and raising the signal again when it happens
        _dispatch_sigmask();
+       printf("about to re-raise %d\n", signo);
        raise(signo);
+       printf("about exit block_and_raise for %d\n", signo);
 }

If run the test program, and send it a single kill -17, in addition to getting the message we want (from the DispatchSource event handler), we get endless output of:

Entered block_and_raise for 17
about to re-raise 17
about exit block_and_raise for 17
Entered block_and_raise for 17
about to re-raise 17
about exit block_and_raise for 17
Entered block_and_raise for 17
about to re-raise 17
about exit block_and_raise for 17

I haven't spent too much time digging into why yet, just noting that my earlier comment about it being almost working was not right.

@MadCoder
Copy link
Contributor

MadCoder commented May 5, 2017

Actually instead of raise() we want to pthread_kill() the manager thread I think, I'll try to amend your branch in that direction, it's probably not too hard by:

  • having a static pthread_t manager_thread = pthread_self() taken when we register a signal, exactly where I do the sigaction in event_epoll.c
  • replace the raise with a pthread_kill(manager_thread, signo)

that should have the desired effect.

You have to make sure the manager thread (the one blocking in epoll_wait) also calls _dispatch_sigmask()). If linux has a way to pthread_create() with an initial mask that'd be great too

@dgrove-oss
Copy link
Contributor Author

yep, that fixes it. Will push shortly.

@MadCoder
Copy link
Contributor

MadCoder commented May 6, 2017

LGTM please squash

The observed bug is that signals sent to the main thread
after it called dispatch_main were not triggering the
event handler of a DispatchSourceSignal that had been
previously created and activated to handle that signal.

This patch addresses the problem by:

- export _dispatch_sigmask() as the way to block signals in a process
  using dispatch
- call it when dispatch_main() is called
- each time dispatch registers a signal handler, register a sigaction that
  will catch signals for threads with wrong masks and will fix the thread
  configuration then raise the signal again by directing a pthread_kill
  to the manager thread so that it is delivered to the signalfd as expected.

Patch by Pierre Habouzit <phabouzit@apple.com> from PR-231
with minor compilation fixes by Dave Grove.
@dgrove-oss dgrove-oss force-pushed the SR-4201-signalsource branch from 2bf0921 to 62741ca Compare May 6, 2017 15:22
@dgrove-oss
Copy link
Contributor Author

squashed; ready to be tested.

@MadCoder
Copy link
Contributor

MadCoder commented May 7, 2017

@swift-ci please test

1 similar comment
@MadCoder
Copy link
Contributor

MadCoder commented May 7, 2017

@swift-ci please test

@MadCoder MadCoder merged commit f76b8f5 into swiftlang:master May 7, 2017
@dgrove-oss dgrove-oss deleted the SR-4201-signalsource branch May 8, 2017 13:39
@das
Copy link
Contributor

das commented May 19, 2017

this causes a build failure on Darwin, the _dispatch_sigmask() function is declared with int return value but nothing is returned.

das pushed a commit that referenced this pull request May 25, 2017
SR-4201: DispatchSourceSignal not working on Linux

Signed-off-by: Daniel A. Steffen <dsteffen@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants