Skip to content

Fix the missing users in the subscriptions endpoint. #26797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

CaiCandong
Copy link
Member

@CaiCandong CaiCandong commented Aug 29, 2023

Fix #26750

Formal description

For issue cur_issue_id of repo cur_repo_id, the following 4 sets exist for its related subscribed users:

  1. Subscribing to the issue explicitly. (certain
SELECT `issue_watch`.user_id
FROM issue_watch
WHERE `issue_watch`.issue_id = ${cur_issue_id} 
AND `issue_watch`.is_watching = true
  1. Do not subscribe to the issue explicitly.(impossible)
SELECT `issue_watch`.user_id
FROM issue_watch
WHERE `issue_watch`.issue_id = ${cur_issue_id}
AND `issue_watch`.is_watching = false
  1. Participated in the issue(possible)
SELECT `comment`.poster_id
FROM comment
WHERE `comment`.issue_id = ${cur_issue_id}
AND `comment`.type IN (CommentTypeComment, CommentTypeCode, CommentTypeReview)
  1. Subscribed to the issue's repository, and did not choose to decline the subscription.(possible)
SELECT `watch`.user_id
FROM watch
WHERE `watch`.repo_id = ${cur_repo_id}
AND `watch`.mode not in (WatchModeNone,WatchModeDont)

So, the final subscription list is $set_1 \cup set_3 \cup set_4 -set_2 $

Raw SQL
SELECT
    `id`, `lower_name`, `name`, `full_name`, `email`, `keep_email_private`,
    `email_notifications_preference`, `passwd`, `passwd_hash_algo`,
    `must_change_password`, `login_type`, `login_source`, `login_name`, `type`,
    `location`, `website`, `rands`, `salt`, `language`, `description`,
    `created_unix`, `updated_unix`, `last_login_unix`, `last_repo_visibility`,
    `max_repo_creation`, `is_active`, `is_admin`, `is_restricted`,
    `allow_git_hook`, `allow_import_local`, `allow_create_organization`,
    `prohibit_login`, `avatar`, `avatar_email`, `use_custom_avatar`,
    `num_followers`, `num_following`, `num_stars`, `num_repos`, `num_teams`,
    `num_members`, `visibility`, `repo_admin_change_team_access`,
    `diff_view_style`, `theme`, `keep_activity_private`
FROM
    `user`
WHERE
    (
                id IN (
                SELECT `issue_watch`.user_id
                FROM issue_watch
                WHERE `issue_watch`.is_watching=true
                  AND `issue_watch`.issue_id=114
            )
            OR id IN (
            SELECT `watch`.user_id
            FROM watch
            WHERE `watch`.repo_id=45
              AND `watch`.mode NOT IN (2,0)
        )
            OR id IN (
            SELECT `comment`.poster_id
            FROM comment
            WHERE `comment`.issue_id=114
              AND `comment`.type IN ('comment', 'code', 'review')
        )
        )
  AND id NOT IN (
    SELECT `issue_watch`.user_id
    FROM issue_watch
    WHERE `issue_watch`.is_watching=false
      AND `issue_watch`.issue_id=114
)
  AND `user`.is_active=true
  AND `user`.prohibit_login=false
LIMIT 10;

Query Plan

image

-> Limit: 10 row(s)  (cost=1.04 rows=1) (actual time=0.153..0.606 rows=2 loops=1)
    -> Nested loop antijoin  (cost=1.04 rows=1) (actual time=0.130..0.583 rows=2 loops=1)
        -> Filter: ((`user`.prohibit_login = false) and (<in_optimizer>(`user`.id,`user`.id in (select #2)) or <in_optimizer>(`user`.id,<exists>(select #3)) or <in_optimizer>(`user`.id,`user`.id in (select #4))))  (cost=0.62 rows=1) (actual time=0.125..0.576 rows=2 loops=1)
            -> Index lookup on user using IDX_user_is_active (is_active=true)  (cost=0.62 rows=12) (actual time=0.077..0.099 rows=12 loops=1)
            -> Select #2 (subquery in condition; run only once)
                -> Filter: ((`user`.id = `<materialized_subquery>`.user_id))  (cost=0.30..0.30 rows=0.2) (actual time=0.004..0.004 rows=0 loops=13)
                    -> Limit: 1 row(s)  (cost=0.28..0.28 rows=0.2) (actual time=0.004..0.004 rows=0 loops=13)
                        -> Index lookup on <materialized_subquery> using <auto_distinct_key> (user_id=`user`.id)  (actual time=0.002..0.002 rows=0 loops=13)
                            -> Materialize with deduplication  (cost=0.28..0.28 rows=0.2) (actual time=0.015..0.015 rows=1 loops=1)
                                -> Filter: (issue_watch.is_watching = true)  (cost=0.27 rows=0.2) (actual time=0.007..0.008 rows=1 loops=1)
                                    -> Index lookup on issue_watch using UQE_issue_watch_watch (issue_id=114)  (cost=0.27 rows=1) (actual time=0.007..0.008 rows=1 loops=1)
            -> Select #3 (subquery in condition; dependent)
                -> Limit: 1 row(s)  (actual time=0.003..0.003 rows=0 loops=11)
                    -> Filter: (watch.`mode` not in (2,0))  (actual time=0.003..0.003 rows=0 loops=11)
                        -> Single-row index lookup on watch using UQE_watch_watch (user_id=<cache>(`user`.id), repo_id=45)  (actual time=0.003..0.003 rows=0 loops=11)
            -> Select #4 (subquery in condition; run only once)
                -> Filter: ((`user`.id = `<materialized_subquery>`.poster_id))  (cost=0.80..0.80 rows=1) (actual time=0.377..0.377 rows=0 loops=1)
                    -> Limit: 1 row(s)  (cost=0.70..0.70 rows=1) (actual time=0.377..0.377 rows=0 loops=1)
                        -> Index lookup on <materialized_subquery> using <auto_distinct_key> (poster_id=`user`.id)  (actual time=0.377..0.377 rows=0 loops=1)
                            -> Materialize with deduplication  (cost=0.70..0.70 rows=1) (actual time=0.377..0.377 rows=0 loops=1)
                                -> Filter: ((`comment`.`type` in ('comment','code','review')) and (`comment`.issue_id = 114))  (cost=0.60 rows=1) (actual time=0.375..0.375 rows=0 loops=1)
                                    -> Intersect rows sorted by row ID  (cost=0.60 rows=1) (actual time=0.175..0.175 rows=0 loops=1)
                                        -> Index range scan on comment using IDX_comment_type over (type = 0)  (cost=0.25 rows=5) (actual time=0.122..0.124 rows=5 loops=1)
                                        -> Index range scan on comment using IDX_comment_issue_id over (issue_id = 114)  (cost=0.25 rows=7) (actual time=0.045..0.045 rows=1 loops=1)
        -> Filter: (issue_watch.is_watching = false)  (cost=0.33 rows=1) (actual time=0.003..0.003 rows=0 loops=2)
            -> Single-row index lookup on issue_watch using UQE_issue_watch_watch (issue_id=114, user_id=`user`.id)  (cost=0.33 rows=1) (actual time=0.003..0.003 rows=0 loops=2)

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Aug 29, 2023
@CaiCandong CaiCandong marked this pull request as draft August 30, 2023 00:45
@CaiCandong CaiCandong marked this pull request as ready for review August 30, 2023 03:09
@CaiCandong CaiCandong changed the title Fix missing watcher bug in endpoint Fix the missing watcher of the participated bug in the endpoint. Aug 30, 2023
@CaiCandong CaiCandong changed the title Fix the missing watcher of the participated bug in the endpoint. Fix the missing watcher of the participated bug in the endpoint Aug 30, 2023
@CaiCandong CaiCandong changed the title Fix the missing watcher of the participated bug in the endpoint Fix the missing participated watcher in the subscriptions endpoint. Aug 30, 2023
CaiCandong and others added 3 commits August 30, 2023 16:10
@CaiCandong
Copy link
Member Author

@puni9869

SELECT id
	FROM `user`
	WHERE (
	    id IN (
	        SELECT `comment`.poster_id
	        FROM comment
	        WHERE `comment`.issue_id = ?
	        AND `comment`.type IN (?, ?, ?)
	        AND `comment`.poster_id NOT IN (
	            SELECT `issue_watch`.user_id
	            FROM issue_watch
	            WHERE `issue_watch`.issue_id = ?
	            AND `issue_watch`.is_watching = ?
	        )
	    )
	    OR id IN (
	        SELECT `issue_watch`.user_id
	        FROM issue_watch
	        WHERE `issue_watch`.issue_id = ?
	        AND `issue_watch`.is_watching = ?
	    )
	)
	AND `user`.is_active = ?
	AND `user`.prohibit_login = ?

@puni9869
Copy link
Member

puni9869 commented Sep 1, 2023

@puni9869

SELECT id
	FROM `user`
	WHERE (
	    id IN (
	        SELECT `comment`.poster_id
	        FROM comment
	        WHERE `comment`.issue_id = ?
	        AND `comment`.type IN (?, ?, ?)
	        AND `comment`.poster_id NOT IN (
	            SELECT `issue_watch`.user_id
	            FROM issue_watch
	            WHERE `issue_watch`.issue_id = ?
	            AND `issue_watch`.is_watching = ?
	        )
	    )
	    OR id IN (
	        SELECT `issue_watch`.user_id
	        FROM issue_watch
	        WHERE `issue_watch`.issue_id = ?
	        AND `issue_watch`.is_watching = ?
	    )
	)
	AND `user`.is_active = ?
	AND `user`.prohibit_login = ?

Could you optimized this query,
see the schema of the table and run the ANALYSE <query> you will see some hints on indexes and more.
https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan

The query plan.

Co-authored-by: delvh <dev.lh@web.de>
@CaiCandong
Copy link
Member Author

CaiCandong commented Sep 4, 2023

@puni9869

Query Plan

image

-> Filter: ((`user`.prohibit_login = 0) and (`user`.is_active = 1) and (<in_optimizer>(`user`.id,`user`.id in (select #2)) or <in_optimizer>(`user`.id,`user`.id in (select #4))))  (cost=1.45 rows=1) (actual time=0.088..0.097 rows=1 loops=1)
    -> Table scan on user  (cost=1.45 rows=12) (actual time=0.032..0.034 rows=12 loops=1)
    -> Select #2 (subquery in condition; run only once)
        -> Filter: ((`user`.id = `<materialized_subquery>`.poster_id))  (cost=1.07..1.07 rows=0.2) (actual time=0.014..0.014 rows=0 loops=1)
            -> Limit: 1 row(s)  (cost=1.05..1.05 rows=0.2) (actual time=0.013..0.013 rows=0 loops=1)
                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (poster_id=`user`.id)  (actual time=0.013..0.013 rows=0 loops=1)
                    -> Materialize with deduplication  (cost=1.05..1.05 rows=0.2) (actual time=0.012..0.012 rows=0 loops=1)
                        -> Filter: ((`comment`.`type` in (0,1,2)) and <in_optimizer>(`comment`.poster_id,<exists>(select #3) is false))  (cost=1.02 rows=0.2) (actual time=0.010..0.010 rows=0 loops=1)
                            -> Index lookup on comment using IDX_comment_issue_id (issue_id=102)  (cost=1.02 rows=1) (actual time=0.009..0.009 rows=0 loops=1)
                            -> Select #3 (subquery in condition; dependent)
                                -> Limit: 1 row(s)  (never executed)
                                    -> Filter: ((issue_watch.is_watching = 0) and (issue_watch.issue_id = 102) and <if>(outer_field_is_not_null, (<cache>(`comment`.poster_id) = issue_watch.user_id), true))  (never executed)
                                        -> Alternative plans for IN subquery: Index lookup unless user_id IS NULL  (never executed)
                                            -> Single-row index lookup on issue_watch using UQE_issue_watch_watch (user_id=<cache>(`comment`.poster_id), issue_id=102)  (never executed)
                                            -> Table scan on issue_watch  (never executed)
    -> Select #4 (subquery in condition; run only once)
        -> Filter: ((`user`.id = `<materialized_subquery>`.user_id))  (cost=1.05..1.05 rows=1) (actual time=0.003..0.003 rows=0 loops=13)
            -> Limit: 1 row(s)  (cost=0.95..0.95 rows=1) (actual time=0.003..0.003 rows=0 loops=13)
                -> Index lookup on <materialized_subquery> using <auto_distinct_key> (user_id=`user`.id)  (actual time=0.003..0.003 rows=0 loops=13)
                    -> Materialize with deduplication  (cost=0.95..0.95 rows=1) (actual time=0.032..0.032 rows=1 loops=1)
                        -> Filter: ((issue_watch.issue_id = 102) and (issue_watch.is_watching = 1))  (cost=0.85 rows=1) (actual time=0.024..0.027 rows=1 loops=1)
                            -> Table scan on issue_watch  (cost=0.85 rows=6) (actual time=0.023..0.026 rows=6 loops=1)

@CaiCandong
Copy link
Member Author

@puni9869 I adjusted the order of the index fields of issue_watch, the query selected by index except the user table.

image

@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 4, 2023
@CaiCandong CaiCandong marked this pull request as draft September 4, 2023 07:44
@CaiCandong CaiCandong marked this pull request as ready for review September 4, 2023 12:36
@CaiCandong CaiCandong changed the title Fix the missing participated watcher in the subscriptions endpoint. Fix the missing users in the subscriptions endpoint. Sep 5, 2023
@CaiCandong CaiCandong mentioned this pull request Sep 5, 2023
3 tasks
}
// Drop the old index :(user_id,issue_id)
// Then automatically created new index => (issue_id,user_id)
return x.DropIndexes(new(IssueWatch))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create the new indexes?

@CaiCandong CaiCandong closed this Oct 4, 2023
@CaiCandong CaiCandong deleted the bugfix/Fix-missing-watcher-in-endpoint branch October 4, 2023 12:36
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Jan 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue subscription API will not return all participants
6 participants