From 9a170fcb9291a9b7a1b76866500ac012af8735fa Mon Sep 17 00:00:00 2001
From: MarcoGorelli <>
Date: Mon, 13 Feb 2023 10:03:11 +0000
Subject: [PATCH 1/2] Add polars to ecosystem

---
 protocol/purpose_and_scope.md | 5 +++++
 spec/purpose_and_scope.md     | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/protocol/purpose_and_scope.md b/protocol/purpose_and_scope.md
index 387ec6b0..83dab6e9 100644
--- a/protocol/purpose_and_scope.md
+++ b/protocol/purpose_and_scope.md
@@ -126,6 +126,11 @@ It uses SQLAlchemy and a custom SQL compiler to translate its pandas-like API to
 SQL statements, executed by the backends. It supports conventional DBMS, as well
 as big data systems such as Apache Impala or BigQuery.
 
+[Polars](https://www.pola.rs/) is a lightning fast DataFrame library/in-memory query
+engine. It has a different API to pandas, which allows it to be extremely expressive
+and powerful. It performed exceptionally well in the
+[H20 benchmark](https://h2oai.github.io/db-benchmark/).
+
 #### History of this dataframe protocol
 
 While there is no dataframe protocol like the one described in this document in
diff --git a/spec/purpose_and_scope.md b/spec/purpose_and_scope.md
index 3f1731d1..0843de13 100644
--- a/spec/purpose_and_scope.md
+++ b/spec/purpose_and_scope.md
@@ -62,6 +62,11 @@ It uses SQLAlchemy and a custom SQL compiler to translate its pandas-like API to
 SQL statements, executed by the backends. It supports conventional DBMS, as well
 as big data systems such as Apache Impala or BigQuery.
 
+[Polars](https://www.pola.rs/) is a lightning fast DataFrame library/in-memory query
+engine. It has a different API to pandas, which allows it to be extremely expressive
+and powerful. It performed exceptionally well in the
+[H20 benchmark](https://h2oai.github.io/db-benchmark/).
+
 Given the growing Python dataframe ecosystem, and its complexity, this document provides
 a standard Python dataframe API. Until recently, pandas has been a de-facto standard for
 Python dataframes.
 But currently there are a growing number of not only dataframe libraries,
@@ -179,6 +184,7 @@ The list of known Python dataframe libraries at the time of writing this documen
 - [Mars](https://docs.pymars.org/en/latest/)
 - [Modin](https://github.com/modin-project/modin)
 - [pandas](https://pandas.pydata.org/)
+- [polars](https://www.pola.rs/)
 - [PySpark](https://spark.apache.org/docs/latest/api/python/index.html)
 - [StaticFrame](https://static-frame.readthedocs.io/en/latest/)
 - [Turi Create](https://github.com/apple/turicreate)

From 8d5555c5dbfd313d57df15bb3998355122126ca0 Mon Sep 17 00:00:00 2001
From: MarcoGorelli <>
Date: Fri, 17 Feb 2023 16:18:13 +0000
Subject: [PATCH 2/2] attempt more neutral wording

---
 protocol/purpose_and_scope.md | 6 ++----
 spec/purpose_and_scope.md     | 6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/protocol/purpose_and_scope.md b/protocol/purpose_and_scope.md
index 83dab6e9..3715e258 100644
--- a/protocol/purpose_and_scope.md
+++ b/protocol/purpose_and_scope.md
@@ -126,10 +126,8 @@ It uses SQLAlchemy and a custom SQL compiler to translate its pandas-like API to
 SQL statements, executed by the backends. It supports conventional DBMS, as well
 as big data systems such as Apache Impala or BigQuery.
 
-[Polars](https://www.pola.rs/) is a lightning fast DataFrame library/in-memory query
-engine. It has a different API to pandas, which allows it to be extremely expressive
-and powerful. It performed exceptionally well in the
-[H20 benchmark](https://h2oai.github.io/db-benchmark/).
+[Polars](https://www.pola.rs/) is a DataFrame library written in Rust, with
+Python bindings available. Their API is intentionally different to the pandas one.
 
 #### History of this dataframe protocol
 
diff --git a/spec/purpose_and_scope.md b/spec/purpose_and_scope.md
index 0843de13..abb8dc95 100644
--- a/spec/purpose_and_scope.md
+++ b/spec/purpose_and_scope.md
@@ -62,10 +62,8 @@ It uses SQLAlchemy and a custom SQL compiler to translate its pandas-like API to
 SQL statements, executed by the backends. It supports conventional DBMS, as well
 as big data systems such as Apache Impala or BigQuery.
 
-[Polars](https://www.pola.rs/) is a lightning fast DataFrame library/in-memory query
-engine. It has a different API to pandas, which allows it to be extremely expressive
-and powerful. It performed exceptionally well in the
-[H20 benchmark](https://h2oai.github.io/db-benchmark/).
+[Polars](https://www.pola.rs/) is a DataFrame library written in Rust, with
+Python bindings available. Their API is intentionally different to the pandas one.
 
 Given the growing Python dataframe ecosystem, and its complexity, this document provides
 a standard Python dataframe API. Until recently, pandas has been a de-facto standard for