Skip to content

add ntv-pandas to the ecosystem #55421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Oct 6, 2023
Merged
Changes from 33 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
049b7b0
Create 0007-compact-and-reversible-JSON-interface.md
loco-philippe Jun 18, 2023
6dc92e4
Merge branch 'main' into ntv
loco-philippe Jun 18, 2023
cd6884d
Merge branch 'main' into ntv
loco-philippe Jun 18, 2023
09ed538
change PDEP number (7 -> 12)
loco-philippe Jun 19, 2023
13e6fd2
Merge branch 'main' into ntv
loco-philippe Jul 22, 2023
f4d1f5e
Add FAQ to the PDEPS 0012
loco-philippe Jul 22, 2023
fff7adb
Merge branch 'main' into ntv
loco-philippe Jul 22, 2023
92732e6
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Aug 3, 2023
8d0f2f4
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Aug 4, 2023
3f3aae0
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Aug 4, 2023
82b1992
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Aug 4, 2023
a051d9c
pre-commit codespell
loco-philippe Aug 4, 2023
a0f16dd
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Aug 4, 2023
63d92ec
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Aug 4, 2023
aca4a47
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Aug 4, 2023
d0b41a6
delete summary
loco-philippe Aug 5, 2023
3f7135a
delete mermaid flowchart
loco-philippe Aug 5, 2023
16d7201
with summary, without mermaid flowchart
loco-philippe Aug 5, 2023
08cf17b
rename Annexe -> Appendix
loco-philippe Aug 11, 2023
7da5c63
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Aug 11, 2023
ec31662
add tableschema specification
loco-philippe Sep 5, 2023
4dbb822
add orient="table"
loco-philippe Sep 6, 2023
38e92b2
Add Table Schema extension
loco-philippe Sep 6, 2023
1e3f793
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Sep 6, 2023
8dad555
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Sep 6, 2023
2239c66
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Sep 7, 2023
7e7d878
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Sep 7, 2023
fbc5fe5
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Sep 7, 2023
cceee0b
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Oct 1, 2023
65bee1d
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Oct 1, 2023
c567277
Update 0012-compact-and-reversible-JSON-interface.md
loco-philippe Oct 1, 2023
698835c
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Oct 6, 2023
1bbe9bf
add 'ntv-pandas' in the 'ecosystem' file
loco-philippe Oct 6, 2023
3d234de
Update ecosystem.md
loco-philippe Oct 6, 2023
cf1a82d
Update ecosystem.md
loco-philippe Oct 6, 2023
f555a75
Merge remote-tracking branch 'upstream/main' into ntv
loco-philippe Oct 6, 2023
fcbfa72
Update ecosystem.md
loco-philippe Oct 6, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions web/pandas/community/ecosystem.md
Original file line number Diff line number Diff line change
Expand Up @@ -345,6 +345,23 @@ which pandas excels.

## IO

### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)*: A semantic, compact and reversible JSON-pandas converter*

pandas provides JSON converter but three limitations are present:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contributiin. Can you phrase this not talking about pandas or its limitations, but about NTV-pandas directly? Something like "NTV-pandas provides a JSON converter with more data types than the ones supoorted by pandas directly. Its main features are: ...". You don't need to use my description of course, but I think it's more useful for users to go direct to what the project does than starting by pandas limitations. If users can have an intuition on whether the project can be for them in a more concise way, then they can quickly jump to the project site for the details.

What I'd add here is a very concise example, so users can faster understand not only what the project does, but what it takes to use it.

Happy to discuss further if you disagree on my points, but I think this approach will make readers life easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your quick reply and your advise !

You are right and I suggest changing the text as below,:

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.

*It supports the following data-types: *

The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).

NTV-pandas was developed originally in the json-NTV project

Is it better and sufficiently clear and concise ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks great. Just couple of styling comments, I wouldn't use bold for the line between * if that's the idea. And I would use "data types" consistently in the bullet points instead of dtype and data-type. I'd also remove the commas at the end of the bullet points (or be consistent).

The last line on where it was developed doesn't seem relevant to me for the average pandas user potentially interested in this, I'd leave this to the project page itself.

And as I said, if you can show a minimal example on how the library is used, I think that can help users understand better before clicking in the link if this project may be of interest.

Anyways, great job, I think it's useful this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> I took the comments into account and added the example. That's what happens (not too long ?):

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.

It supports the following data types:

The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).

data example

In [1]: from shapely.geometry import Point
        from datetime import date
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'dates::date':  [date(1964,1,1), date(1985,2,5)],
                'value32':      pd.Series([12, 10], dtype='int32'),
                'coord::point': [Point(1,2), Point(3,4)],
                'unique':       True }
        df = pd.DataFrame(data)

In [3]: df
Out[3]:     dates::date  value32  coord::point   unique
        1    1964-01-01       12   POINT (1 2)   True
        2    1985-02-05       10   POINT (3 4)   True

JSON representation

In [4]: pprint(npd.to_json(df), sort_dicts=False)
Out[4]: {':tab': {'index': [0, 1],
                  'dates::date': ['1964-01-01', '1985-02-05'],
                  'value32::int32': [12, 10],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0]],
                  'unique': True}}

In [5]: pprint(npd.to_json(df, table=True), width=100, sort_dicts=False)
Out[5]: {'schema': {'fields': [{'name': 'index', 'type': 'integer'},
                               {'name': 'dates', 'type': 'date'},
                               {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                               {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                               {'name': 'unique', 'type': 'boolean'}],
                    'primaryKey': ['index'],
                    'pandas_version': '1.4.0'},
         'data': [{'index': 0, 'dates': '1964-01-01', 'value32': 12, 'coord': [1.0, 2.0], 'unique': True},
                  {'index': 1, 'dates': '1985-02-05', 'value32': 10, 'coord': [3.0, 4.0], 'unique': True}]}

Reversibility

In [6]: print(npd.read_json(npd.to_json(df)).equals(df),
              npd.read_json(npd.to_json(df, table=True)).equals(df))
Out[6]: True True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first part of the example can be reduced (without [3]):

In [1]: from shapely.geometry import Point
        import pandas as pd
        import ntv_pandas as npd

In [2]: df = pd.DataFrame({'dates::date':  [datetime.date(1964,1,1), datetime.date(1985,2,5)],
                           'value32':      pd.Series([12, 10], dtype='int32'),
                           'coord::point': [Point(1,2), Point(3,4)],
                           'unique':       True }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the example is indeed way too big. What do you think about just something like:

import ntv_pandas as npd

df = npd.read_json('data.json')  # load (maybe add some arg that highlights the difference with pandas.read_json?
npd.to_json(df)  # save (not sure if the code you wrote about is correct, seems to be missing the path where to save)

In any case, can you update the PR with whatever you think it's best, it's easier to review in the PR than in a comment.

Btw, if you are the author of the library, do you know that you can register an accessor so on import ntv_pandas you get something like df.npd.to_json() available? You can check the docs on extending pandas for more info, it's trivial to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR is updated !

You are right with the accessor, i try this:

@pd.api.extensions.register_dataframe_accessor("npd")
class NtvAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def to_json(self, **kwargs):
        return ntv_pandas.to_json(self._obj, **kwargs)

and it works !!!

- the JSON-pandas converter take into account few data types,
- the JSON-pandas converter is not always reversible (conversion round trip),
- external data types (e.g. TableSchema types) are not included.

The NTV-pandas converter uses the [semantic NTV format](https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm)
to include a large set of data types in a JSON representation.

The converter integrates:
- all the pandas `dtype` and the data-type associated to a JSON representation,
- an always reversible conversion,
- a full compatibility with [Table Schema specification](http://dataprotocols.org/json-table-schema/#field-types-and-formats).

NTV-pandas was developped originally in the [json-NTV project](https://github.com/loco-philippe/NTV)

### [BCPandas](https://github.com/yehoshuadimarsky/bcpandas)

BCPandas provides high performance writes from pandas to Microsoft SQL Server,
Expand Down