*Highlighted projects with Contributing Info on the front page of Scaladex*
-
Furthermore, I improved the search feature of Scaladex by adding [Github Topics](https://github.com/blog/2309-introducing-topics) to the projects stored in Scaladex so that users can search projects based on Topics. Topics are essentially categories that open-source projects belong to like android, databases, json, ...
+
Furthermore, I improved the search feature of Scaladex by adding [GitHub Topics](https://github.com/blog/2309-introducing-topics) to the projects stored in Scaladex so that users can search projects based on Topics. Topics are essentially categories that open-source projects belong to like android, databases, json, ...
*Topics for projects on the front page of Scaladex*
@@ -39,9 +39,9 @@ Here's some more info about how each piece of contributing info gets set:
- chatroom - auto-populated to a project's gitter room if it has one
- contributing guide - auto-populated to a project's CONTRIBUTING.md if it has one
-
As an example, the [Scaladex project](https://github.com/scalacenter/scaladex) (for the code behind the website) uses the label "low-hanging fruit" to mark beginner-friendly issues in Github so this label can be set by the maintainer in the edit project page and all the [issues with this label](https://github.com/scalacenter/scaladex/labels/low-hanging%20fruit) will be stored for this project. It also has a [gitter room](https://gitter.im/scalacenter/scaladex) for chatting and a [contributing guide](https://github.com/scalacenter/scaladex/blob/master/CONTRIBUTING.md) which will be auto-populated for the project when all the projects are indexed.
+
As an example, the [Scaladex project](https://github.com/scalacenter/scaladex) (for the code behind the website) uses the label "low-hanging fruit" to mark beginner-friendly issues in GitHub so this label can be set by the maintainer in the edit project page and all the [issues with this label](https://github.com/scalacenter/scaladex/labels/low-hanging%20fruit) will be stored for this project. It also has a [gitter room](https://gitter.im/scalacenter/scaladex) for chatting and a [contributing guide](https://github.com/scalacenter/scaladex/blob/master/CONTRIBUTING.md) which will be auto-populated for the project when all the projects are indexed.
-
Scaladex uses Github's GraphQL API to get a project's beginner-friendly issues, see the [Github Topics](#github-topics) section below for more info about Github's GraphQL API. To get a project's contributing guide, Scaladex uses Github's REST API to send a GET request to the Community Profile API which will return links to a project's contributing guide, code of conduct and license. Lastly, to get a project's chatroom, Scaladex generates a URL for a project's gitter room based on the project's repository name and the organization it belongs to (Ex. <https://gitter.im/scalacenter/scaladex>) and checks if that URL exists.
+
Scaladex uses GitHub's GraphQL API to get a project's beginner-friendly issues, see the [GitHub Topics](#github-topics) section below for more info about GitHub's GraphQL API. To get a project's contributing guide, Scaladex uses GitHub's REST API to send a GET request to the Community Profile API which will return links to a project's contributing guide, code of conduct and license. Lastly, to get a project's chatroom, Scaladex generates a URL for a project's gitter room based on the project's repository name and the organization it belongs to (Ex. <https://gitter.im/scalacenter/scaladex>) and checks if that URL exists.
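As a rough illustration of the two lookups described above, here is a minimal Python sketch (Scaladex itself is written in Scala, and this is not its code). The function names are hypothetical. The REST path `/repos/{owner}/{repo}/community/profile` is GitHub's community profile endpoint; the assumption that the guide's link sits under `files.contributing.html_url` in the response should be checked against the current API documentation.

```python
from typing import Optional
from urllib.parse import quote

GITHUB_API = "https://api.github.com"


def community_profile_url(owner: str, repo: str) -> str:
    """URL of the GET request that returns a repository's community profile."""
    return f"{GITHUB_API}/repos/{quote(owner)}/{quote(repo)}/community/profile"


def gitter_room_url(owner: str, repo: str) -> str:
    """Candidate Gitter room URL built from the organization and repository name."""
    return f"https://gitter.im/{quote(owner)}/{quote(repo)}"


def contributing_guide_url(profile: dict) -> Optional[str]:
    """Pull the contributing guide link out of a community profile response.

    Assumes the response nests it under files.contributing.html_url and that
    a missing guide is reported as null.
    """
    files = profile.get("files") or {}
    contributing = files.get("contributing") or {}
    return contributing.get("html_url")
```

A caller would request `community_profile_url(...)` and pass the decoded JSON to `contributing_guide_url`, then request `gitter_room_url(...)` and treat a successful response as proof that the room exists.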
You can also find Contributing Info on the front page of Scaladex, which now highlights a random subset of the projects that have Contributing Info. A new random selection is picked each time the page is loaded so that all projects with Contributing Info get the same amount of exposure. We hope this helps guide potential contributors to projects and issues that are of interest to them!
@@ -54,9 +54,9 @@ The Contributing Search page is similar to the normal search page in Scaladex wh
The code for Contributing Info was committed in 2 pull requests, 1 for the [back-end](https://github.com/scalacenter/scaladex/pull/448) and 1 for the [front-end](https://github.com/scalacenter/scaladex/pull/467).
### Challenge
-
One interesting challenge I ran into was filtering a project's issues based on a search term. For example, say a user is searching for all issues related to documentation so they enter "docs" as a search term in the Contributing Search page. A project called akka-http has some beginner-friendly issues, one of which is related to documentation with the title "#22874 - Add examples to Sink.actorRefWithAck and Source.queue docs". Since this is the only issue for akka-http that has "docs" in it's title, it should be the only issue that shows up for akka-http in the search results.
+
One interesting challenge I ran into was filtering a project's issues based on a search term. For example, say a user is searching for all issues related to documentation so they enter "docs" as a search term in the Contributing Search page. A project called akka-http has some beginner-friendly issues, one of which is related to documentation with the title "#22874 - Add examples to Sink.actorRefWithAck and Source.queue docs". Since this is the only issue for akka-http that has "docs" in its title, it should be the only issue that shows up for akka-http in the search results.
-
All the projects in Scaladex are stored in an [elasticsearch index](https://www.elastic.co/blog/what-is-an-elasticsearch-index) which is like a database in a relational database. Each project stored in elasticsearch has the following fields:
+
All the projects in Scaladex are stored in an [Elasticsearch index](https://www.elastic.co/blog/what-is-an-elasticsearch-index) which is like a database in a relational database. Each project stored in Elasticsearch has the following fields:
```
name: Text
description: Text
@@ -69,24 +69,24 @@ github: Object
title: Text
...
```
-
Each project has a `github` field of type `Object` containing Github info like a project's readme and it's number of commits. The `github` field has a `beginnerIssues` field which is a list of a project's beginner-friendly issues. The `beginnerIssues` field is of type Nested, which is a special version of the `Object` type used for lists of `Object`s. Each issue in `beginnerIssues` is of type `Object` and it has a `number` field and a `title` field.
+
Each project has a `github` field of type `Object` containing GitHub info like a project's readme and its number of commits. The `github` field has a `beginnerIssues` field which is a list of a project's beginner-friendly issues. The `beginnerIssues` field is of type Nested, which is a special version of the `Object` type used for lists of `Object`s. Each issue in `beginnerIssues` is of type `Object` and it has a `number` field and a `title` field.
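To make the field layout concrete, here is a hedged sketch of what an Elasticsearch mapping for these fields could look like, written as a Python dict that mirrors the JSON body. This is not the Scaladex mapping: the function name is hypothetical, only the fields named in the text are shown, and the `integer` type for `number` is an assumption.

```python
def project_mapping() -> dict:
    """Illustrative mapping for the project fields described in the text."""
    return {
        "properties": {
            "name": {"type": "text"},
            "description": {"type": "text"},
            "github": {
                # A plain object field: its sub-fields are listed under "properties".
                "properties": {
                    "beginnerIssues": {
                        # "nested" indexes each issue separately, so an issue's
                        # number and title stay associated with each other.
                        "type": "nested",
                        "properties": {
                            "number": {"type": "integer"},  # assumed type
                            "title": {"type": "text"},
                        },
                    }
                }
            },
        }
    }
```

The `nested` type is what later allows a query to be run against individual issues rather than against the project as a whole.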
-
When Scaladex generates a search query to match the input search term ("docs" from the example above) to an elasticsearch query, all you have to do to match the search term against a project's beginner-friendly issues is add a Nested Query against the `github.beginnerIssues` field and specify you want to match the search term against the issue's `title` field. So this is the Nested Query I added to [DataRepository.scala](https://github.com/scalacenter/scaladex/pull/467/commits/5bcecb58e91c52590e4460189d0415db4d4d2e1f#diff-c5de88d14364dfaadbdecdc462d6c7d1R254) which generates the elasticsearch query:
+
When Scaladex generates a search query to match the input search term ("docs" from the example above) to an Elasticsearch query, all you have to do to match the search term against a project's beginner-friendly issues is add a Nested Query against the `github.beginnerIssues` field and specify you want to match the search term against the issue's `title` field. So this is the Nested Query I added to [DataRepository.scala](https://github.com/scalacenter/scaladex/pull/467/commits/5bcecb58e91c52590e4460189d0415db4d4d2e1f#diff-c5de88d14364dfaadbdecdc462d6c7d1R254) which generates the Elasticsearch query:
This sort of worked. It would return the correct projects that have issues matching the search term, but instead of returning only the issues related to the search term, it would return all the issues. So in the example with the "docs" search term, all of akka-http's issues would be returned, not just the one related to documentation.
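For readers unfamiliar with Nested Queries, the query described above has roughly the following shape in Elasticsearch's JSON query DSL, shown here as a Python dict. This is a minimal sketch, not the Scala code from DataRepository.scala; the function name is hypothetical.

```python
def beginner_issues_query(term: str) -> dict:
    """Nested query matching `term` against the title of each beginner issue."""
    return {
        "nested": {
            # Path of the nested field inside the project document.
            "path": "github.beginnerIssues",
            # Query evaluated against each nested issue individually.
            "query": {"match": {"github.beginnerIssues.title": term}},
        }
    }
```

A query like this only decides which projects match. Each matching hit still carries the whole project document, including every entry in `github.beginnerIssues`, which is why all of a project's issues came back.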
-
After looking through the elasticsearch documentation for awhile, I came across Inner Hits which can be used with Nested Queries to select out the nested inner objects that matched the query. So inner hits would return only the beginner-friendly issues that matched the search term. So I updated the code that creates the Nested Query to also extract the inner hits that get returned:
+
After looking through the Elasticsearch documentation for a while, I came across Inner Hits which can be used with Nested Queries to select out the nested inner objects that matched the query. So inner hits would return only the beginner-friendly issues that matched the search term. So I updated the code that creates the Nested Query to also extract the inner hits that get returned:
And then I added the filtered beginner-friendly issues from inner hits to the project that gets created from the results of the elasticsearch query. I did this by updating the code in [package.scala](https://github.com/scalacenter/scaladex/pull/467/commits/5bcecb58e91c52590e4460189d0415db4d4d2e1f#diff-0aa128fca8ddf4b576663970f7fc4940R39) that reads in each result of the elasticsearch query (`hit`) and converts it to a Scala `Project` object which is used by the server elsewhere.
+
And then I added the filtered beginner-friendly issues from inner hits to the project that gets created from the results of the Elasticsearch query. I did this by updating the code in [package.scala](https://github.com/scalacenter/scaladex/pull/467/commits/5bcecb58e91c52590e4460189d0415db4d4d2e1f#diff-0aa128fca8ddf4b576663970f7fc4940R39) that reads in each result of the Elasticsearch query (`hit`) and converts it to a Scala `Project` object which is used by the server elsewhere.
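The two steps can be sketched as follows, again in Python rather than the Scala used by Scaladex. The function names and the sample hit are hypothetical (the second issue in the sample is invented for illustration). The sketch assumes the default inner hits name, which for a nested query is the nested path, and the usual `inner_hits -> <name> -> hits -> hits` layout of the search response.

```python
def beginner_issues_query_with_inner_hits(term: str) -> dict:
    """Nested query that also asks Elasticsearch to return the matching issues."""
    return {
        "nested": {
            "path": "github.beginnerIssues",
            "query": {"match": {"github.beginnerIssues.title": term}},
            # An empty object requests inner hits with default settings.
            "inner_hits": {},
        }
    }


def matching_issues(hit: dict) -> list:
    """Return only the nested issues that matched, taken from the hit's inner hits."""
    inner = hit.get("inner_hits", {}).get("github.beginnerIssues", {})
    return [h["_source"] for h in inner.get("hits", {}).get("hits", [])]


DOCS_ISSUE = {
    "number": 22874,
    "title": "Add examples to Sink.actorRefWithAck and Source.queue docs",
}

# Hypothetical search hit for akka-http with the search term "docs".
SAMPLE_HIT = {
    "_source": {
        "github": {
            "beginnerIssues": [
                DOCS_ISSUE,
                {"number": 1, "title": "Invented unrelated issue"},
            ]
        }
    },
    "inner_hits": {
        "github.beginnerIssues": {"hits": {"hits": [{"_source": DOCS_ISSUE}]}}
    },
}

filtered = matching_issues(SAMPLE_HIT)  # only the documentation issue
```

When converting a hit into a project, the list from `matching_issues` would replace the full `beginnerIssues` list taken from `_source`.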
To categorize projects in Scaladex, the old process was for project maintainers to manually set keywords for their project in Scaladex. Users could then search for projects based on keywords.
-
Github recently added ["topics"](https://github.com/blog/2309-introducing-topics) to projects stored in Github which are labels that can be set for a project corresponding to categories that a project belongs to. Topics are essentially the same as keywords in Scaladex but maintainers could set them for their project in Github instead of having to do so in Scaladex.
+
GitHub recently added ["topics"](https://github.com/blog/2309-introducing-topics) to projects stored in GitHub which are labels that can be set for a project corresponding to categories that a project belongs to. Topics are essentially the same as keywords in Scaladex but maintainers could set them for their project in GitHub instead of having to do so in Scaladex.
-
Topics are part of Github’s new [GraphQL API](https://developer.github.com/v4/) which is meant to eventually replace their old [REST API](https://developer.github.com/v3/). [GraphQL](https://graphql.org/) is a "A query language for your API". It is both a query language and a graph-structured schema which stores data with nodes as objects and edges as relationships between objects. It was developed by Facebook and is different from a traditional REST API by having all API requests go to one route and having a query defined in the request body to specify precisely what data you want.
+
Topics are part of GitHub’s new [GraphQL API](https://docs.github.com/en/graphql) which is meant to eventually replace their old [REST API](https://docs.github.com/en/rest). [GraphQL](https://graphql.org/) is "a query language for your API". It combines a query language with a graph-structured schema that models data with objects as nodes and relationships between objects as edges. It was developed by Facebook and differs from a traditional REST API in that all API requests go to one route and a query in the request body specifies precisely what data you want.
-
With Github's REST API, you have to make multiple requests to different routes to get project info about multiple projects. And when you make a request, all the data related to that request would be returned. For example, if you wanted to get the most recent 3 issues created for 5 different projects, you would make 5 requests to 5 different routes for each project. Each request would return all the project’s issues. With the GraphQL API, all requests are made to the same route and in the body of the request you input a GraphQL query which specifies exactly what information you want and for which projects. So for the example above of getting the most recent 3 issues created for 5 projects, you would make 1 request to 1 route containing a query to get only the 3 most recent issues for the 5 projects and only those 3 issues for each of the projects would be returned. This results in less requests to Github’s API and less data returned in each response.
+
With GitHub's REST API, you have to make multiple requests to different routes to get info about multiple projects, and each request returns all the data related to that request. For example, if you wanted to get the most recent 3 issues created for 5 different projects, you would make 5 requests to 5 different routes, one for each project, and each request would return all of that project’s issues. With the GraphQL API, all requests are made to the same route and in the body of the request you input a GraphQL query which specifies exactly what information you want and for which projects. So for the example above, you would make 1 request to 1 route containing a query to get only the 3 most recent issues for the 5 projects, and only those 3 issues would be returned for each project. This results in fewer requests to GitHub’s API and less data returned in each response.
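The "1 request for 5 projects" case can be sketched with GraphQL aliases, which let a single query contain several `repository` lookups. This is an illustrative Python sketch, not code from Scaladex: the function names and the choice of repositories are arbitrary, and the request would be sent as a POST to `https://api.github.com/graphql` with an access token.

```python
import json

# Arbitrary example repositories (owner, name).
REPOS = [
    ("scalacenter", "scaladex"),
    ("akka", "akka-http"),
    ("scala", "scala"),
    ("typelevel", "cats"),
    ("sbt", "sbt"),
]


def recent_issues_query(repos, count=3):
    """Build one GraphQL query asking for the newest `count` issues of every repo."""
    parts = []
    for i, (owner, name) in enumerate(repos):
        # json.dumps produces a double-quoted, escaped string literal.
        parts.append(
            f"  r{i}: repository(owner: {json.dumps(owner)}, name: {json.dumps(name)}) {{\n"
            f"    issues(first: {count}, orderBy: {{field: CREATED_AT, direction: DESC}}) {{\n"
            f"      nodes {{ number title }}\n"
            f"    }}\n"
            f"  }}"
        )
    return "query {\n" + "\n".join(parts) + "\n}"


def request_body(query: str) -> str:
    """JSON body for the single POST request."""
    return json.dumps({"query": query})


QUERY = recent_issues_query(REPOS)
```

Each alias (`r0`, `r1`, ...) becomes a key in the response's `data` object, so the results for the five projects arrive together and contain only the `number` and `title` of three issues each.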
-
So I replaced keywords with topics for projects in Scaladex and used Github’s new GraphQL API to fetch the topics. These topics are fetched for all projects when the server is indexed. A lot more projects have topics than keywords (which had to manually be set by maintainers in Scaladex), so this greatly improved the ability to search for projects based on categories in Scaladex since there are a lot more projects with categories.
+
So I replaced keywords with topics for projects in Scaladex and used GitHub’s new GraphQL API to fetch the topics. These topics are fetched for all projects when the server is indexed. A lot more projects have topics than keywords (which had to manually be set by maintainers in Scaladex), so this greatly improved the ability to search for projects based on categories in Scaladex since there are a lot more projects with categories.
-
Here's the code I added to [GithubDownload.scala](https://github.com/scalacenter/scaladex/commit/a771d7a70fdb7aaa0003abf48aaa87a622d89f03#diff-e03c541cf1bd7ec0322a9a6571160bebR339) which contains the GraphQL query that is put in the POST body of the request sent to Github's GraphQL API to fetch topics for a project. You can see the graph-structure of GraphQL in the query. The query first gets a `repository` node and then accesses it's topics through the `repositoryTopics` edge/connection. Then it selects the names of the topics belonging to that repository.
+
Here's the code I added to [GithubDownload.scala](https://github.com/scalacenter/scaladex/commit/a771d7a70fdb7aaa0003abf48aaa87a622d89f03#diff-e03c541cf1bd7ec0322a9a6571160bebR339) which contains the GraphQL query that is put in the POST body of the request sent to GitHub's GraphQL API to fetch topics for a project. You can see the graph-structure of GraphQL in the query. The query first gets a `repository` node and then accesses its topics through the `repositoryTopics` edge/connection. Then it selects the names of the topics belonging to that repository.
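A query with the structure described above might look like the following. This is a hedged Python sketch rather than the Scala code from GithubDownload.scala: the helper names and the sample response are hypothetical, and the page size of 50 topics is an assumption.

```python
import json

# repository node -> repositoryTopics connection -> topic name
TOPICS_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    repositoryTopics(first: 50) {
      nodes { topic { name } }
    }
  }
}
"""


def topics_request_body(owner: str, name: str) -> str:
    """JSON POST body carrying the query and its variables."""
    return json.dumps(
        {"query": TOPICS_QUERY, "variables": {"owner": owner, "name": name}}
    )


def topic_names(response: dict) -> list:
    """Extract topic names from a decoded GraphQL response."""
    repo = (response.get("data") or {}).get("repository") or {}
    nodes = (repo.get("repositoryTopics") or {}).get("nodes") or []
    return [node["topic"]["name"] for node in nodes]


# Hypothetical response, for illustration only.
SAMPLE_RESPONSE = {
    "data": {
        "repository": {
            "repositoryTopics": {
                "nodes": [{"topic": {"name": "scala"}}, {"topic": {"name": "search"}}]
            }
        }
    }
}

names = topic_names(SAMPLE_RESPONSE)
```

Passing the owner and name as variables keeps the query text constant across projects, so only the `variables` object changes from one request to the next.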