parse git-attribute files #360

Merged
merged 22 commits into from
Mar 25, 2022
21 changes: 20 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -133,6 +133,7 @@ members = [
"git-features",
"git-commitgraph",
"git-chunk",
"git-quote",
"git-object",
"git-diff",
"git-traverse",
1 change: 1 addition & 0 deletions README.md
@@ -106,6 +106,7 @@ Follow linked crate name for detailed status. Please note that all crates follow
* [git-bitmap](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-bitmap)
* [git-revision](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-revision)
* [git-attributes](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-attributes)
* [git-quote](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-quote)
* **idea**
* [git-pathspec](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-pathspec)
* [git-submodule](https://github.com/Byron/gitoxide/blob/main/crate-status.md#git-submodule)
20 changes: 19 additions & 1 deletion crate-status.md
@@ -53,6 +53,7 @@
* [x] pack only changed objects as derived from input
* [x] base object compression
* [ ] delta compression
* [ ] respect the `delta=false` attribute
* [x] create 'thin' pack, i.e. deltas that are based on objects the other side has.
* [x] parallel implementation that scales perfectly
* [x] entries to pack data iterator
@@ -209,9 +210,16 @@ Check out the [performance discussion][git-traverse-performance] as well.

### git-attributes

- * [ ] parse git-ignore files (aka git-attributes without the attributes or negation)
+ * [x] parse git-ignore files (aka git-attributes without the attributes or negation)
* [ ] parse git-attributes files
* [ ] create an attributes stack, ideally one that includes 'ignored' status from .gitignore files.
* [ ] support for built-in `binary` macro for `-text -diff -merge`

### git-quote

* **ansi-c**
    * [x] quote
    * [ ] unquote

### git-pathspec

@@ -226,6 +234,15 @@ Check out the [performance discussion][git-traverse-performance] as well.
- [ ] handle sparse directories
- [ ] handle sparse index
- [ ] linear scaling with multi-threading up to IO saturation
- supported attributes to affect working tree and index contents
- [ ] eol
- [ ] working-tree-encoding
- …more
- **filtering**
- [ ] `text`
- [ ] `ident`
- [ ] filter processes
- [ ] single-invocation clean/smudge filters
* manage multiple worktrees
* deal with exclude specifications, like .gitignore and other exclude files.

@@ -384,6 +401,7 @@ See its [README.md](https://github.com/Byron/gitoxide/blob/main/git-lock/README.

### git-bundle
* [ ] create a bundle from an archive
* [ ] respect `export-ignore` and `export-subst`
* [ ] extract a branch from a bundle into a repository
* [ ] API documentation
* [ ] Some examples
3 changes: 2 additions & 1 deletion etc/check-package-size.sh
@@ -19,9 +19,10 @@ echo "in root: gitoxide CLI"
(enter cargo-smart-release && indent cargo diet -n --package-size-limit 85KB)
(enter git-actor && indent cargo diet -n --package-size-limit 5KB)
(enter git-pathspec && indent cargo diet -n --package-size-limit 5KB)
- (enter git-attributes && indent cargo diet -n --package-size-limit 5KB)
+ (enter git-attributes && indent cargo diet -n --package-size-limit 10KB)
(enter git-index && indent cargo diet -n --package-size-limit 30KB)
(enter git-worktree && indent cargo diet -n --package-size-limit 20KB)
(enter git-quote && indent cargo diet -n --package-size-limit 5KB)
(enter git-revision && indent cargo diet -n --package-size-limit 10KB)
(enter git-bitmap && indent cargo diet -n --package-size-limit 5KB)
(enter git-tempfile && indent cargo diet -n --package-size-limit 25KB)
9 changes: 9 additions & 0 deletions git-attributes/Cargo.toml
@@ -10,11 +10,20 @@ edition = "2018"
[lib]
doctest = false

[features]
## Data structures implement `serde::Serialize` and `serde::Deserialize`.
serde1 = ["serde", "bstr/serde1"]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
git-quote = { version = "^0.1.0", path = "../git-quote" }

bstr = { version = "0.2.13", default-features = false, features = ["std"]}
bitflags = "1.3.2"
unicode-bom = "1.1.4"
quick-error = "2.0.0"
serde = { version = "1.0.114", optional = true, default-features = false, features = ["derive"]}

[dev-dependencies]
git-testtools = { path = "../tests/tools"}
20 changes: 20 additions & 0 deletions git-attributes/src/lib.rs
@@ -1,5 +1,25 @@
#![forbid(unsafe_code, rust_2018_idioms)]

use bstr::BStr;

#[derive(PartialEq, Eq, Debug, Hash, Ord, PartialOrd, Clone)]
#[cfg_attr(feature = "serde1", derive(serde::Serialize, serde::Deserialize))]
pub enum State<'a> {
    /// The attribute is listed, or has the special value 'true'
    Set,
    /// The attribute has the special value 'false', or was prefixed with a `-` sign.
    Unset,
    /// The attribute is set to the given value, which followed the `=` sign.
    /// Note that values can be empty.
    Value(&'a BStr),
    /// The attribute isn't mentioned with a given path or is explicitly set to `Unspecified` using the `!` sign.
    Unspecified,
}

pub mod ignore;

pub mod parse;

pub fn parse(buf: &[u8]) -> parse::Lines<'_> {
    parse::Lines::new(buf)
}
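
To make the mapping between attribute syntax and the four `State` variants concrete, here is a minimal std-only sketch. It follows the order of checks in `parse_attr` from `parse/attribute.rs` in this PR (split at the first `=`, then look at the prefix), but it is an illustration under assumptions, not the crate's API: `SketchState` and `classify` are hypothetical names, `&str` stands in for `&BStr`, and attribute-name validation is left out.

```rust
/// Std-only stand-in for `State<'a>`; `&str` replaces `&BStr` for brevity.
#[derive(Debug, PartialEq, Eq)]
enum SketchState<'a> {
    Set,
    Unset,
    Value(&'a str),
    Unspecified,
}

/// Hypothetical helper: classify one whitespace-free attribute token.
fn classify(token: &str) -> (&str, SketchState<'_>) {
    // Split off an optional value first, as `parse_attr` in this PR does.
    let (name, value) = match token.split_once('=') {
        Some((name, value)) => (name, Some(value)),
        None => (token, None),
    };
    if let Some(name) = name.strip_prefix('-') {
        // A leading `-` unsets the attribute; any value is dropped.
        (name, SketchState::Unset)
    } else if let Some(name) = name.strip_prefix('!') {
        // A leading `!` resets the attribute to unspecified.
        (name, SketchState::Unspecified)
    } else {
        // No prefix: either `name=value` or a bare `name`.
        (name, value.map_or(SketchState::Set, SketchState::Value))
    }
}
```

For example, `classify("eol=lf")` yields `("eol", SketchState::Value("lf"))` and `classify("-diff")` yields `("diff", SketchState::Unset)`. Because the split happens before the prefix check, a token such as `-eol=lf` is treated as unset and its value is ignored.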
168 changes: 168 additions & 0 deletions git-attributes/src/parse/attribute.rs
@@ -0,0 +1,168 @@
use bstr::{BStr, BString, ByteSlice};
use std::borrow::Cow;

#[derive(PartialEq, Eq, Debug, Hash, Ord, PartialOrd, Clone)]
#[cfg_attr(feature = "serde1", derive(serde::Serialize, serde::Deserialize))]
pub enum Kind {
    /// A pattern to match paths against
    Pattern(BString),
    /// The name of the macro to define, always a valid attribute name
    Macro(BString),
}

mod error {
    use bstr::BString;
    use quick_error::quick_error;

    quick_error! {
        #[derive(Debug)]
        pub enum Error {
            PatternNegation { line_number: usize, line: BString } {
                display("Line {} has a negative pattern, for literal characters use \\!: {}", line_number, line)
            }
            AttributeName { line_number: usize, attribute: BString } {
                display("Line {} has non-ascii characters or starts with '-': {}", line_number, attribute)
            }
            Unquote(err: git_quote::ansi_c::undo::Error) {
                display("Could not unquote attributes line")
                from()
                source(err)
            }
        }
    }
}
use crate::ignore;
pub use error::Error;

pub struct Lines<'a> {
    lines: bstr::Lines<'a>,
    line_no: usize,
}

pub struct Iter<'a> {
    attrs: bstr::Fields<'a>,
    line_no: usize,
}

impl<'a> Iter<'a> {
    pub fn new(attrs: &'a BStr, line_no: usize) -> Self {
        Iter {
            attrs: attrs.fields(),
            line_no,
        }
    }

    fn parse_attr(&self, attr: &'a [u8]) -> Result<(&'a BStr, crate::State<'a>), Error> {
        let mut tokens = attr.splitn(2, |b| *b == b'=');
        let attr = tokens.next().expect("attr itself").as_bstr();
        let possibly_value = tokens.next();
        let (attr, state) = if attr.first() == Some(&b'-') {
            (&attr[1..], crate::State::Unset)
        } else if attr.first() == Some(&b'!') {
            (&attr[1..], crate::State::Unspecified)
        } else {
            (
                attr,
                possibly_value
                    .map(|v| crate::State::Value(v.as_bstr()))
                    .unwrap_or(crate::State::Set),
            )
        };
        if !attr_valid(attr) {
            return Err(Error::AttributeName {
                line_number: self.line_no,
                attribute: attr.into(),
            });
        }
        Ok((attr, state))
    }
}

fn attr_valid(attr: &BStr) -> bool {
    if attr.first() == Some(&b'-') {
        return false;
    }

    attr.bytes().all(|b| {
        matches!(b,
        b'-' | b'.' | b'_' | b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9')
    })
}

impl<'a> Iterator for Iter<'a> {
    type Item = Result<(&'a BStr, crate::State<'a>), Error>;

    fn next(&mut self) -> Option<Self::Item> {
        let attr = self.attrs.next().filter(|a| !a.is_empty())?;
        self.parse_attr(attr).into()
    }
}

impl<'a> Lines<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        let bom = unicode_bom::Bom::from(buf);
        Lines {
            lines: buf[bom.len()..].lines(),
            line_no: 0,
        }
    }
}

impl<'a> Iterator for Lines<'a> {
    type Item = Result<(Kind, crate::ignore::pattern::Mode, Iter<'a>, usize), Error>;

    fn next(&mut self) -> Option<Self::Item> {
        for line in self.lines.by_ref() {
            self.line_no += 1;
            let line = skip_blanks(line.into());
            if line.first() == Some(&b'#') {
                continue;
            }
            match parse_line(line, self.line_no) {
                None => continue,
                Some(Ok((pattern, flags, attrs))) => {
                    return Some(if flags.contains(ignore::pattern::Mode::NEGATIVE) {
                        Err(Error::PatternNegation {
                            line: line.into(),
                            line_number: self.line_no,
                        })
                    } else {
                        Ok((pattern, flags, attrs, self.line_no))
                    })
                }
                Some(Err(err)) => return Some(Err(err)),
            }
        }
        None
    }
}

fn parse_line(
    line: &BStr,
    line_number: usize,
) -> Option<Result<(Kind, crate::ignore::pattern::Mode, Iter<'_>), Error>> {
    if line.is_empty() {
        return None;
    }

    let (line, attrs): (Cow<'_, _>, _) = if line.starts_with(b"\"") {
        let (unquoted, consumed) = match git_quote::ansi_c::undo(line) {
            Ok(res) => res,
            Err(err) => return Some(Err(err.into())),
        };
        (unquoted, &line[consumed..])
    } else {
        line.find_byteset(BLANKS)
            .map(|pos| (line[..pos].as_bstr().into(), line[pos..].as_bstr()))
            .unwrap_or((line.into(), [].as_bstr()))
    };

    let (pattern, flags) = super::ignore::parse_line(line.as_ref())?;
    Ok((Kind::Pattern(pattern), flags, Iter::new(attrs, line_number))).into()
}

const BLANKS: &[u8] = b" \t\r";

fn skip_blanks(line: &BStr) -> &BStr {
    line.find_not_byteset(BLANKS).map(|pos| &line[pos..]).unwrap_or(line)
}
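
The line-level flow of `Lines::next` and `parse_line` (skip a BOM, count lines from 1, skip leading blanks, drop comments and empty lines, split the pattern from the attributes at the first blank, reject negative patterns) can be sketched with the standard library alone. This is a simplified illustration, not the code of this PR: `strip_utf8_bom`, `split_line` and `rules` are hypothetical names, only a UTF-8 BOM is handled, quoted patterns and pattern flags are left out, a leading `!` stands in for the `NEGATIVE` flag, and attributes are split on the `BLANKS` bytes rather than with `bstr`'s `fields()`.

```rust
/// Blank bytes, matching the `BLANKS` constant in this PR.
const BLANKS: &[u8] = b" \t\r";

/// Hypothetical helper: drop a leading UTF-8 byte order mark, if present.
fn strip_utf8_bom(buf: &[u8]) -> &[u8] {
    if buf.starts_with(&[0xEF, 0xBB, 0xBF]) {
        &buf[3..]
    } else {
        buf
    }
}

/// Hypothetical helper: split one line into (pattern, attribute tokens).
/// Returns `None` for empty lines, blank-only lines and comments.
fn split_line(line: &[u8]) -> Option<(&[u8], Vec<&[u8]>)> {
    // Skip leading blanks; a line without any non-blank byte is ignored.
    let start = line.iter().position(|b| !BLANKS.contains(b))?;
    let line = &line[start..];
    if line[0] == b'#' {
        return None;
    }
    // The pattern ends at the first blank; the rest holds the attributes.
    let end = line.iter().position(|b| BLANKS.contains(b)).unwrap_or(line.len());
    let (pattern, rest) = line.split_at(end);
    let attrs = rest
        .split(|b| BLANKS.contains(b))
        .filter(|token| !token.is_empty())
        .collect();
    Some((pattern, attrs))
}

/// Hypothetical driver: yields `(line_number, pattern, attrs)` per rule, or
/// `Err(line_number)` for a pattern starting with `!`.
fn rules(buf: &[u8]) -> Vec<Result<(usize, &[u8], Vec<&[u8]>), usize>> {
    strip_utf8_bom(buf)
        .split(|b| *b == b'\n')
        .enumerate()
        .filter_map(|(idx, line)| {
            let line_no = idx + 1; // line numbers are 1-based, as in the PR
            let (pattern, attrs) = split_line(line)?;
            Some(if pattern.starts_with(b"!") {
                Err(line_no)
            } else {
                Ok((line_no, pattern, attrs))
            })
        })
        .collect()
}
```

Skipped lines still advance the line counter, so a rule on the third line of a file is reported with line number 3 even when the first two lines are a comment and an empty line.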