What's the unit of character in Point

The [Point](https://github.com/syntax-tree/unist/blob/master/readme.md#point) section says:
> The `line` field (1-indexed integer) represents a line in a source file. The `column` field (1-indexed integer) represents a column in a source file. The `offset` field (0-indexed integer) represents a character in a source file.

What is the unit of 'character' and 'column'? Is it a UTF-16 _code unit_ (as used in JavaScript) or a Unicode _code point_? See Wikipedia:
> [UTF-16] encoding is variable-length, as code points are encoded with one or two 16-bit code units

I tried parsing this Markdown snippet with remark:
```markdown
a𠮷b
```
Here, `𠮷` is a single Unicode code point that cannot be encoded in one UTF-16 code unit. Because JavaScript strings use UTF-16, `length` counts it as two:
```js
'a𠮷b'.length
//=> 4
```
But in other languages, such as Python, string length is counted in code points:
```python
len('a𠮷b')
#=> 3
```
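Both counts can also be reproduced inside JavaScript itself, which makes the difference easier to see. This is only a small sketch using standard string methods; the variable names are mine:

```js
const text = 'a𠮷b';

// UTF-16 code units: this is what String.prototype.length reports.
const codeUnits = text.length;

// Unicode code points: string iteration walks code points, so a
// surrogate pair is yielded as a single element.
const codePoints = [...text].length;

// U+20BB7 lies outside the Basic Multilingual Plane, so it is stored
// as a surrogate pair (two code units).
const hex = text.codePointAt(1).toString(16);

console.log(codeUnits, codePoints, hex);
//=> 4 3 20bb7
```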
remark parses the Markdown above into:
```json
{
  "type": "text",
  "value": "a𠮷b",
  "position": {
    "start": {
      "line": 1,
      "column": 1,
      "offset": 0
    },
    "end": {
      "line": 1,
      "column": 5,
      "offset": 4
    },
    "indent": []
  }
}
```
The `column` of `end` is 5 and the `offset` of `end` is 4, which means remark treats this text as four 'characters' long, measured in UTF-16 code units.
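For a consumer that needs code points instead, the offset above could be converted along these lines. This is a rough sketch, not something remark or unist provides: `toCodePointOffset` is a hypothetical helper, and it assumes the offset does not fall in the middle of a surrogate pair.

```js
// Hypothetical helper, not part of remark or unist.
// Converts an offset measured in UTF-16 code units into an offset
// measured in Unicode code points.
function toCodePointOffset(text, codeUnitOffset) {
  // Spreading a string iterates by code point, so each surrogate pair
  // in the prefix is counted once.
  return [...text.slice(0, codeUnitOffset)].length;
}

const value = 'a𠮷b';
const endOffset = 4; // `end.offset` from the tree above

console.log(toCodePointOffset(value, endOffset));
//=> 3
```

With this conversion the end of the text node would sit at offset 3, which matches what Python's `len` reports.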

So what is the unit of 'character'? The spec is confusing on this point.
