-
Notifications
You must be signed in to change notification settings - Fork 63
Introduce ThreadPool and use it for quicksort #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a ThreadPool class to enable parallel quicksort using std::threads as an alternative to OpenMP. Key changes include:
- Updating tests to trigger the new std::threads-based parallel sort.
- Adding the ThreadPool implementation in src/xss-thread-pool.hpp.
- Integrating std::thread-based parallel quicksort logic and updating the CI workflow to enable the new threading option.
Reviewed Changes
Copilot reviewed 5 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/test-qsort.cpp | Updated conditional macro to trigger extended tests with std::threads. |
src/xss-thread-pool.hpp | Introduced a new ThreadPool implementation for task-based parallelism. |
src/xss-common-qsort.h | Integrated std::thread-based quicksort with conditional compilation. |
src/avx512-16bit-qsort.hpp | Updated parallel sort logic to use ThreadPool instead of OpenMP. |
.github/workflows/c-cpp.yml | Modified Meson setup command to include std::threads build flag. |
Files not reviewed (7)
- Makefile: Language not supported
- lib/meson.build: Language not supported
- meson.build: Language not supported
- meson_options.txt: Language not supported
- scripts/bench-compare.sh: Language not supported
- scripts/branch-compare.sh: Language not supported
- tests/meson.build: Language not supported
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty good overall. Besides the comments below, the only thing that might be nice is making a CI run for thread sanitizer; running that locally didn't show any issues, so it's probably fine, but it would be nice now that we are handling threading manually.
(1) Add meson flag to build with std threads to ASAN tests (2) Remove unnecessary comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed only the one commit
constexpr int thread_limit = 8; | ||
int thread_count = std::min(thread_limit, omp_get_max_threads()); | ||
int thread_count = std::min(thread_limit, | ||
(int)std::thread::hardware_concurrency()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is 8 threads in a system with a lot of cores still better? Maybe you need to partition the data better to avoid locality problems.
src/xss-thread-pool.hpp
Outdated
std::condition_variable condition; // Condition variable for task queue | ||
std::condition_variable done_condition; // Condition variable for waiting | ||
int active_tasks {0}; | ||
bool stop; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be std::atomic<bool>
src/xss-thread-pool.hpp
Outdated
// Add a new task to the queue and notify one of the worker threads | ||
std::unique_lock<std::mutex> lock(queue_mutex); | ||
tasks.emplace(std::forward<F>(func)); | ||
condition.notify_one(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition: the started threads may not have reached the first lock yet:
// Lock the queue mutex and wait for a task to be available
std::unique_lock<std::mutex> lock(queue_mutex);
if that's the case, then this notify_one
will notify no one. That means they will all lock that mutex, then proceed to wait (forever) on the condition
.
You either want a semaphore or you need to check above before waiting if there already is work to be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I understand this one fully. The ThreadPool
constructor is called before the call to qsort_threads
which is well before submit_task
which in turn calls the enqueue
method. Is it possible to get enqueue
before the ThreadPool
constructor is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the fact that it's two threads. You've described what's happening on the original thread, not on the thread you've just started for the pool. It's entirely possible (and even likely) that the original thread continues running after a couple hundred cycles spent in kernel mode, while the newly started thread takes several thousand cycles to start, in another core (which may be busy). That means the original thread may advance all the way to this function and lock the queue_mutex
before the started thread has had a chance to lock it and call wait()
. That means we notify no one here.
src/xss-thread-pool.hpp
Outdated
void enqueue(F &&func) | ||
{ | ||
// Add a new task to the queue and notify one of the worker threads | ||
std::unique_lock<std::mutex> lock(queue_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use C++17 for this code? If so, you can use CTAD:
std::unique_lock lock(queue_mutex);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We do use C++-17.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you should also use std::jthread
instead of plain std::thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm isn't that C++-20? https://en.cppreference.com/w/cpp/thread/jthread/jthread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, so it is. Never mind this point then. You can still use CTAD to avoid writing <std::mutex>
in C++17
src/xss-thread-pool.hpp
Outdated
std::unique_lock<std::mutex> lock(queue_mutex); | ||
stop = true; | ||
lock.unlock(); | ||
condition.notify_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same race condition problem
src/xss-thread-pool.hpp
Outdated
pool.task_start(); | ||
pool.enqueue([f = std::forward<F>(f), &pool]() { | ||
f(); | ||
pool.task_end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition: if the first task you submit finishes before you enqueue the second, the thread pool will have notified on done_condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, thanks for catching that. Would a potential solution be to keep the threads running until explicitly asked to shut down? That can be done by adding a member function terminate_pools()
which can be called once the qsort_threads()
completes executing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But what if you need to do more work? You should keep the threads running for a little while to avoid the cost of startup again.
This pull request adds Threadpool class and uses it for parallel quicksort using
std::threads
as an alternative to OpenMP. It updates the sorting logic to incorporatestd::threads
-based parallelism. The performance matches that of openMP.Most of the code in the PR was written by copilot and needs a careful review.