Fix wrong allocation size of ringbuffer #3

Merged 1 commit into sysprog21:master on May 21, 2021

Conversation

RinHizakura (Contributor):

I'm not sure why the actual size is calculated with ALIGN_FLOOR.

The only constraint on count is that it must be a power of two, so values like 2 or 4 are valid. In these cases, however, ALIGN_FLOOR reserves no space at all for *ring[] in ringbuf_t when CACHE_LINE_SIZE == 64. This causes out-of-bounds memory accesses, which can be detected by Valgrind or ASAN. So I think ALIGN_CEIL is the correct choice in general. Am I missing some reason for using ALIGN_FLOOR?

@@ -80,7 +81,7 @@ ssize_t ringbuf_get_memsize(const unsigned count)
return -EINVAL;

ssize_t sz = sizeof(ringbuf_t) + count * sizeof(void *);
Contributor:

The right way is to make the size a multiple of the cache line size.

Contributor Author:

It would be fine to restrict the size to a multiple of the cache line size. But in that case, ALIGN_FLOOR would not change the original sz at all. Is there any reason to use ALIGN_FLOOR here?

Contributor:

@RinHizakura, you are right. ALIGN_CEIL should be used when allocating dynamic memory. I was thinking of the memory pool that manages the elements, where ALIGN_FLOOR was used.

@@ -56,6 +56,7 @@ typedef struct {
#define RING_SIZE_MASK (unsigned) (0x0fffffff) /**< Ring size mask */
#define ALIGN_FLOOR(val, align) \
(typeof(val))((val) & (~((typeof(val))((align) -1))))
#define ALIGN_CEIL(val, align) (ALIGN_FLOOR((val) -1, align) + (align))
Contributor:

Reimplement ALIGN_CEIL as follows:

(typeof(val))((val) + (-(typeof(val))(val) & ((align) - 1)))

Then, remove ALIGN_FLOOR.

#define ALIGN_FLOOR(val, align) \
(typeof(val))((val) & (~((typeof(val))((align) -1))))
#define ALIGN_CEIL(val, align) \
(typeof(val))((val) + (-(typeof(val))(val) & ((align) -1)))
Contributor Author:

Please check if this is what you want, thanks!

@jserv jserv merged commit 937dfef into sysprog21:master May 21, 2021
@RinHizakura RinHizakura deleted the pr_ringbuffer branch June 8, 2021 06:36
linD026 added a commit to linD026/concurrent-programs that referenced this pull request Apr 22, 2023
Currently, the reader_func is counting the number of grace periods based
on the value of dut, which is updated by mthpc_rcu_replace_pointer.
However, the value of dut actually represents the time we update the
value, not the number of grace periods.

Also, the original method might result in incorrect counting if someone
tried to update the gp_idx while others who saw the same dut value with
prev_count still depend on the old gp_idx to increase the counter.

To fix the problem, instead of relying on the dut value to increase the
gp_idx, we manually increase gp_idx on write side. Then, we can easily
determine the gp on read side.

For dut value, we simply check the old count value is not greater than
the newest one.

Additionally, since synchronize_rcu is quite slow, readers generally
will pass through the critical section during the first grace period.
To generate more realistic output, we add a delay on read side before
entering the critical section.

Before:

100 reader(s), 5 update run(s), 6 grace period(s)
[grace period #0]  100 reader(s)
[grace period #1]    0 reader(s)
[grace period #2]    0 reader(s)
[grace period #3]    0 reader(s)
[grace period #4]    0 reader(s)
[grace period #5]    0 reader(s)

After, we added a delay:

100 reader(s), 5 update run(s), 6 grace period(s)
[grace period #0]   76 reader(s)
[grace period #1]    0 reader(s)
[grace period #2]    1 reader(s)
[grace period #3]    0 reader(s)
[grace period #4]    3 reader(s)
[grace period #5]   20 reader(s)
linD026 added a commit to linD026/concurrent-programs that referenced this pull request Apr 22, 2023
Currently, the reader_func is counting the number of grace periods based
on the value of dut, which is updated by rcu_assign_pointer. However, the
value of dut actually represents the time we update the value, not the
number of grace periods.

Also, the original method might result in incorrect counting if someone
tried to update the gp_idx while others who saw the same dut value with
prev_count still depend on the old gp_idx to increase the counter.

To fix the problem, instead of relying on the dut value to increase the
gp_idx, we manually increase gp_idx on write side. Then, we can easily
determine the gp on read side.

For dut value, we simply check the old count value is not greater than
the newest one.

Additionally, since synchronize_rcu is quite slow, readers generally
will pass through the critical section during the first grace period.
To generate more realistic output, we add a delay on read side before
entering the critical section.

Before:

100 reader(s), 5 update run(s), 6 grace period(s)
[grace period #0]  100 reader(s)
[grace period #1]    0 reader(s)
[grace period #2]    0 reader(s)
[grace period #3]    0 reader(s)
[grace period #4]    0 reader(s)
[grace period #5]    0 reader(s)

After, we added a delay:

100 reader(s), 5 update run(s), 6 grace period(s)
[grace period #0]   76 reader(s)
[grace period #1]    0 reader(s)
[grace period #2]    1 reader(s)
[grace period #3]    0 reader(s)
[grace period #4]    3 reader(s)
[grace period #5]   20 reader(s)
linD026 added a commit to linD026/concurrent-programs that referenced this pull request Apr 22, 2023
(Same commit message and output as above.)