-
Notifications
You must be signed in to change notification settings - Fork 94
Add query documentation #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 3 commits
87e282b
005f4dc
91e8ab0
93ba3f0
c3a8503
3be533a
b5d74c4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,265 @@ | ||
# Zoekt Query Language Guide | ||
|
||
This guide explains the Zoekt query language, used for searching text within Git repositories. Zoekt queries allow combining multiple filters and expressions using logical operators, negations, and grouping. Here's how to craft queries effectively. | ||
|
||
--- | ||
|
||
## Syntax Overview | ||
|
||
A query is made up of expressions. An **expression** can be: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's probably worth mentioning that all expressions are evaluated as regular expressions (except for certain exceptions like if a expression is wrapped in |
||
- A negation (e.g., `-`), | ||
- A field (e.g., `repo:`). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: thoughts on calling this concept a "prefix"? |
||
- A grouping (e.g., parentheses `()`), | ||
|
||
Logical `OR` operations combine multiple expressions. The **`AND` operator is implicit**, meaning multiple expressions written together will be automatically treated as `AND`. | ||
|
||
--- | ||
|
||
## Query Components | ||
|
||
### 1. **Fields** | ||
|
||
Fields restrict your query to specific criteria. Here's a list of fields and their usage: | ||
|
||
| Field | Aliases | Values | Description | Examples | | ||
|--------------|---------|------------------------|------------------------------------------------------------|----------------------------------------| | ||
| `archived:` | `a:` | `yes` or `no` | Filters archived repositories. | `archived:yes` | | ||
| `case:` | `c:` | `yes`, `no`, or `auto` | Matches case-sensitive or insensitive text. | `case:yes content:"Foo"` | | ||
| `content:` | `c:` | Text (string or regex) | Searches content of files. | `content:"search term"` | | ||
| `file:` | `f:` | Text (string or regex) | Searches file names. | `file:"main.go"` | | ||
| `fork:` | `f:` | `yes` or `no` | Filters forked repositories. | `fork:no` | | ||
| `lang:` | `l:` | Text | Filters by programming language. | `lang:python` | | ||
| `public:` | | `yes` or `no` | Filters public repositories. | `public:yes` | | ||
| `regex:` | | Regex pattern | Matches content using a regular expression. | `regex:/foo.*bar/` | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is another zoekt prefix that is unfamiliar and is a bit of a oddball: if everything is evaluated as a regular expression anyways, not sure when this would be useful. For example: the results are the same |
||
| `repo:` | `r:` | Text (string or regex) | Filters repositories by name. | `repo:"github.com/user/project"` | | ||
| `sym:` | | Text | Searches for symbol names. | `sym:"MyFunction"` | | ||
| `branch:` | `b:` | Text | Searches within a specific branch. | `branch:main` | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we replace the Note that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ack |
||
| `type:` | `t:` | `filematch`, `filename`, `file`, or `repo` | Limits result types. | `type:filematch` | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've never actually seen this I tried
Queries with type All in all, it might make sense to omit this prefix from the docs for now until we are certain we "officially" want to support it. Curious to hear your thoughts though There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed |
||
|
||
--- | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should probably have a "Exact Matching" section that explains how the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ack |
||
### 2. **Negation** | ||
|
||
Negate an expression using the `-` symbol. | ||
|
||
#### Examples: | ||
- Exclude a repository: | ||
```plaintext | ||
-repo:"github.com/example/repo" | ||
``` | ||
- Exclude a language: | ||
```plaintext | ||
-lang:javascript | ||
``` | ||
|
||
--- | ||
|
||
### 3. **Grouping** | ||
|
||
Group queries using parentheses `()` to create complex logic. | ||
|
||
#### Examples: | ||
- Match either of two repositories: | ||
```plaintext | ||
(repo:repo1 or repo:repo2) | ||
``` | ||
- Find test in either python or javascript files: | ||
```plaintext | ||
content:test (lang:python or lang:javascript) | ||
``` | ||
|
||
--- | ||
|
||
### 4. **Logical Operators** | ||
|
||
Use `or` to combine multiple expressions. | ||
|
||
#### Examples: | ||
- Match files in either of two languages: | ||
```plaintext | ||
lang:go or lang:java | ||
``` | ||
|
||
`and` boolean operator is applied automatically when expressions are separated by a space. | ||
|
||
--- | ||
|
||
## Special Query Values | ||
|
||
- **Boolean Values**: | ||
Use `yes` or `no` for fields like `archived:` or `fork:`. | ||
|
||
- **Text Fields**: | ||
Text fields (`content:`, `repo:`, etc.) accept: | ||
- Strings: `"my text"` | ||
- Regular expressions: `/my.*regex/` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: everything is always evaluated as a regular expression, so wrapping things in "/" is not needed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ack |
||
|
||
- **Escape Characters**: | ||
To include special characters, use backslashes (`\`). | ||
|
||
#### Examples: | ||
- Match the string `foo"bar`: | ||
```plaintext | ||
content:"foo\"bar" | ||
``` | ||
- Match the regex `foo.*bar`: | ||
```plaintext | ||
content:/foo.*bar/ | ||
``` | ||
|
||
--- | ||
|
||
## Advanced Examples | ||
|
||
1. **Search for content in Python files in public repositories**: | ||
```plaintext | ||
lang:python public:yes content:"my_function" | ||
``` | ||
|
||
2. **Exclude archived repositories and match a regex**: | ||
```plaintext | ||
archived:no regex:/error.*handler/ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same nit on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You need / or regex if you want to do a regex with spaces in it. |
||
``` | ||
|
||
3. **Find files named `README.md` in forks**: | ||
```plaintext | ||
file:"README.md" fork:yes | ||
``` | ||
|
||
4. **Search for a specific branch**: | ||
```plaintext | ||
branch:main content:"TODO" | ||
``` | ||
|
||
5. **Combine multiple fields**: | ||
```plaintext | ||
(repo:"github.com/example" or repo:"github.com/test") and lang:go | ||
``` | ||
|
||
--- | ||
|
||
## Tips | ||
|
||
1. **Combine Filters**: You can combine as many fields as needed. For instance: | ||
```plaintext | ||
repo:"github.com/example" lang:go content:"init" | ||
``` | ||
|
||
2. **Use Regular Expressions**: Make complex content searches more powerful: | ||
```plaintext | ||
content:/func\s+\w+\s*\(/ | ||
KonradStanski marked this conversation as resolved.
Show resolved
Hide resolved
|
||
``` | ||
|
||
3. **Case Sensitivity**: Use `case:yes` for exact matches: | ||
```plaintext | ||
case:yes content:"ExactMatch" | ||
``` | ||
|
||
4. **Match Specific File Types**: | ||
```plaintext | ||
file:".*\.go" content:"package main" | ||
``` | ||
|
||
### EBNF Summary | ||
|
||
```ebnf | ||
query = expression , { "or" , expression } ; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: does this encode the fact that expressions can also be separated by spaces? Maybe we want something like this:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I'll omit this for now, then write a lezer grammar for zoekt query language and integrate it into the search bar. |
||
|
||
expression = negation | ||
| grouping | ||
| field ; | ||
|
||
negation = "-" , expression ; | ||
|
||
grouping = "(" , query , ")" ; | ||
|
||
field = ( ( "archived:" | "a:" ) , boolean ) | ||
| ( ( "case:" | "c:" ) , ("yes" | "no" | "auto") ) | ||
| ( ( "content:" | "c:" ) , text ) | ||
| ( ( "file:" | "f:" ) , text ) | ||
| ( ( "fork:" | "f:" ) , boolean ) | ||
| ( ( "lang:" | "l:" ) , text ) | ||
| ( ( "public:" ) , boolean ) | ||
| ( ( "regex:" ) , text ) | ||
| ( ( "repo:" | "r:" ) , text ) | ||
| ( ( "sym:" ) , text ) | ||
| ( ( "branch:" | "b:" ) , text ) | ||
| ( ( "type:" | "t:" ) , type ); | ||
|
||
boolean = "yes" | "no" ; | ||
text = string | regex ; | ||
string = '"' , { character | escape } , '"' ; | ||
regex = '/' , { character | escape } , '/' ; | ||
|
||
type = "filematch" | "filename" | "file" | "repo" ; | ||
``` | ||
|
||
--- | ||
|
||
### **Complex Query Examples** | ||
|
||
1. **Search for functions in Go files with TODO comments** | ||
```plaintext | ||
lang:go /func .* \/\/ TODO/ | ||
``` | ||
Matches Go files where functions are annotated with TODO comments. | ||
|
||
2. **Find Python test files containing the word "assert"** | ||
```plaintext | ||
lang:python file:".*test.*\\.py" content:"assert" | ||
``` | ||
Looks for test files in Python containing assertions. | ||
|
||
3. **Search for all README files mentioning "installation"** | ||
```plaintext | ||
file:"README.*" content:"installation" | ||
``` | ||
Matches README files across repositories containing the word "installation." | ||
|
||
4. **Find public repositories containing "openapi" in YAML files** | ||
```plaintext | ||
file:".*\\.yaml$" content:"openapi" | ||
``` | ||
Matches YAML files mentioning "openapi." | ||
|
||
5. **Search Java repositories for method signatures matching `public static`** | ||
```plaintext | ||
lang:java /public static .*\\(/ | ||
``` | ||
Finds Java methods declared as public static. | ||
|
||
6. **Find JavaScript files importing React** | ||
```plaintext | ||
lang:javascript content:"import React from 'react';" | ||
``` | ||
Matches JavaScript files importing React. | ||
|
||
7. **Find all Markdown files mentioning "license" or "agreement"** | ||
```plaintext | ||
file:".*\\.md" (content:"license" or content:"agreement") | ||
``` | ||
Targets Markdown files containing either "license" or "agreement." | ||
|
||
8. **Find log statements in Go files** | ||
```plaintext | ||
lang:go /"log\\.(Print|Printf|Fatal|Panic).*\\(.*\\)"/ | ||
``` | ||
Matches Go log statements. | ||
|
||
9. **Look for Python repositories containing Flask imports in their `app.py` file** | ||
```plaintext | ||
lang:python file:"app\\.py" content:"from flask import .*" | ||
``` | ||
Matches Flask applications. | ||
|
||
10. **Search for JSON files containing an array of objects** | ||
```plaintext | ||
file:".*\\.json" /\\[\\s*{.*/ | ||
``` | ||
Finds JSON files with object arrays. | ||
|
||
11. **Search for Kubernetes YAML files containing `kind: Deployment`** | ||
```plaintext | ||
file:".*\\.yaml" content:"kind: Deployment" | ||
``` | ||
Matches Kubernetes deployment files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We might want to title to be something like "Search Query Language Guide" and modify the intro a bit to something along the lines of:
Reasoning: there are some differences between the prefixes we support vs. what Zoekt supports (e.g.,
revision:
. See my other comment), so want to make sure it's clear to the reader that there isn't always a 1:1 between queries used in Sourcebot vs. queries used in regular Zoekt.