Skip to content

Introduce ThreadPool and use it for quicksort #201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

r-devulap
Copy link
Contributor

This pull request adds Threadpool class and uses it for parallel quicksort using std::threads as an alternative to OpenMP. It updates the sorting logic to incorporate std::threads-based parallelism. The performance matches that of openMP.

Most of the code in the PR was written by copilot and needs a careful review.

@r-devulap r-devulap requested a review from Copilot May 8, 2025 22:10
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a ThreadPool class to enable parallel quicksort using std::threads as an alternative to OpenMP. Key changes include:

  • Updating tests to trigger the new std::threads-based parallel sort.
  • Adding the ThreadPool implementation in src/xss-thread-pool.hpp.
  • Integrating std::thread-based parallel quicksort logic and updating the CI workflow to enable the new threading option.

Reviewed Changes

Copilot reviewed 5 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/test-qsort.cpp Updated conditional macro to trigger extended tests with std::threads.
src/xss-thread-pool.hpp Introduced a new ThreadPool implementation for task-based parallelism.
src/xss-common-qsort.h Integrated std::thread-based quicksort with conditional compilation.
src/avx512-16bit-qsort.hpp Updated parallel sort logic to use ThreadPool instead of OpenMP.
.github/workflows/c-cpp.yml Modified Meson setup command to include std::threads build flag.
Files not reviewed (7)
  • Makefile: Language not supported
  • lib/meson.build: Language not supported
  • meson.build: Language not supported
  • meson_options.txt: Language not supported
  • scripts/bench-compare.sh: Language not supported
  • scripts/branch-compare.sh: Language not supported
  • tests/meson.build: Language not supported

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

@sterrettm2 sterrettm2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good overall. Besides the comments below, the only thing that might be nice is making a CI run for thread sanitizer; running that locally didn't show any issues, so it's probably fine, but it would be nice now that we are handling threading manually.

(1) Add meson flag to build with std threads to ASAN tests
(2) Remove unnecessary comments
Copy link
Member

@thiagomacieira thiagomacieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed only the one commit

Comment on lines 561 to +563
constexpr int thread_limit = 8;
int thread_count = std::min(thread_limit, omp_get_max_threads());
int thread_count = std::min(thread_limit,
(int)std::thread::hardware_concurrency());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 8 threads in a system with a lot of cores still better? Maybe you need to partition the data better to avoid locality problems.

std::condition_variable condition; // Condition variable for task queue
std::condition_variable done_condition; // Condition variable for waiting
int active_tasks {0};
bool stop;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be std::atomic<bool>

// Add a new task to the queue and notify one of the worker threads
std::unique_lock<std::mutex> lock(queue_mutex);
tasks.emplace(std::forward<F>(func));
condition.notify_one();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition: the started threads may not have reached the first lock yet:

                    // Lock the queue mutex and wait for a task to be available
                    std::unique_lock<std::mutex> lock(queue_mutex);

if that's the case, then this notify_one will notify no one. That means they will all lock that mutex, then proceed to wait (forever) on the condition.

You either want a semaphore or you need to check above before waiting if there already is work to be done.

Copy link
Contributor Author

@r-devulap r-devulap May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand this one fully. The ThreadPool constructor is called before the call to qsort_threads which is well before submit_task which in turn calls the enqueue method. Is it possible to get enqueue before the ThreadPool constructor is called?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the fact that it's two threads. You've described what's happening on the original thread, not on the thread you've just started for the pool. It's entirely possible (and even likely) that the original thread continues running after a couple hundred cycles spent in kernel mode, while the newly started thread takes several thousand cycles to start, in another core (which may be busy). That means the original thread may advance all the way to this function and lock the queue_mutex before the started thread has had a chance to lock it and call wait(). That means we notify no one here.

void enqueue(F &&func)
{
// Add a new task to the queue and notify one of the worker threads
std::unique_lock<std::mutex> lock(queue_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use C++17 for this code? If so, you can use CTAD:

    std::unique_lock lock(queue_mutex);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We do use C++-17.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you should also use std::jthread instead of plain std::thread

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, so it is. Never mind this point then. You can still use CTAD to avoid writing <std::mutex> in C++17

std::unique_lock<std::mutex> lock(queue_mutex);
stop = true;
lock.unlock();
condition.notify_all();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same race condition problem

pool.task_start();
pool.enqueue([f = std::forward<F>(f), &pool]() {
f();
pool.task_end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition: if the first task you submit finishes before you enqueue the second, the thread pool will have notified on done_condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, thanks for catching that. Would a potential solution be to keep the threads running until explicitly asked to shut down? That can be done by adding a member function terminate_pools() which can be called once the qsort_threads() completes executing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But what if you need to do more work? You should keep the threads running for a little while to avoid the cost of startup again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants