From bfc1c57da6003c25ccf106136f7f4aff1de8acd7 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Wed, 14 Mar 2018 10:21:05 -0400 Subject: [PATCH 01/16] DOC: ecosystem.rst: Vaex (closes #20335) --- doc/source/ecosystem.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 30cdb06b28487..bfea00ecc3cee 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -252,6 +252,11 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph based approach is also extensible by end users for custom formats that may be too specific for the core of odo. +`Vaex `_ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). + + .. _ecosystem.data_validation: Data validation From 08d3116cea93aa03e313035766646d3ccddb11a8 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Wed, 14 Mar 2018 10:26:09 -0400 Subject: [PATCH 02/16] DOC: ecosystem.rst: Pandas on Ray (closes #20334) --- doc/source/ecosystem.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index bfea00ecc3cee..70fde2b494690 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -252,6 +252,16 @@ PyTables, h5py, and pymongo to move data between non pandas formats. 
Its graph based approach is also extensible by end users for custom formats that may be too specific for the core of odo. +`Ray `_ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Pandas on Ray is an early stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while experiencing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas. + +.. code:: python + + # import pandas as pd + import ray.dataframe as pd + + `Vaex `_ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). 
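The Vaex entry above mentions computing statistics such as count and mean on a regular grid. As a rough illustration of that idea only (this is not Vaex's implementation or API), the following pure-Python sketch makes one streaming pass over the data and keeps a running count and sum per grid cell. The function name `binned_stats` and its parameters are hypothetical, chosen for this example.

```python
def binned_stats(xs, values, x_min, x_max, n_bins):
    """Return a list of (count, mean) per bin of a regular 1-D grid.

    Hypothetical helper for illustration; bins with no points get (0, None).
    """
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    width = (x_max - x_min) / n_bins
    for x, v in zip(xs, values):
        if not (x_min <= x < x_max):
            continue  # ignore points that fall outside the grid
        # Clamp to guard against floating-point rounding at the upper edge.
        i = min(int((x - x_min) / width), n_bins - 1)
        counts[i] += 1
        sums[i] += v
    return [
        (c, s / c if c else None)
        for c, s in zip(counts, sums)
    ]


stats = binned_stats([0.1, 0.2, 0.6, 1.5], [1.0, 3.0, 10.0, 99.0], 0.0, 1.0, 2)
```

Because each point only updates the counters of its own cell, memory use depends on the number of bins rather than the number of rows, which is why this style of aggregation suits out-of-core data.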
From f9b3e98ccceb25bec0e7e359cffb23125ca74d82 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Wed, 14 Mar 2018 10:27:55 -0400 Subject: [PATCH 03/16] DOC: ecosystem.rst: alphabetize Out-of-Core section --- doc/source/ecosystem.rst | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 70fde2b494690..4d72129770cbf 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -224,6 +224,13 @@ dimensional arrays, rather than the tabular data for which pandas excels. Out-of-core ------------- +`Blaze `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Blaze provides a standard API for doing computations with various +in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables, +PySpark. + `Dask `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -235,14 +242,6 @@ provides a familiar ``DataFrame`` interface for out-of-core, parallel and distri Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow. - -`Blaze `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Blaze provides a standard API for doing computations with various -in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables, -PySpark. - `Odo `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -254,6 +253,7 @@ too specific for the core of odo. `Ray `_ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Pandas on Ray is an early stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while experiencing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. 
Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas. .. code:: python @@ -264,6 +264,7 @@ Pandas on Ray is an early stage DataFrame library that wraps Pandas and transpar `Vaex `_ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). From cfb5262a674ce1577e20e6eedcaab43fe2646bed Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Wed, 14 Mar 2018 10:31:01 -0400 Subject: [PATCH 04/16] DOC: ecosystem.rst: update pandas-datareader data sources --- doc/source/ecosystem.rst | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 4d72129770cbf..ad809bc4e978b 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -156,14 +156,22 @@ See more in the `pandas-datareader docs `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From a07dd9412b6726b0a190d6bcda27bdc7bfb392c3 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Wed, 14 Mar 2018 10:36:50 -0400 Subject: [PATCH 05/16] DOC: ecosystem.rst: Vaex from_pandas, to_pandas_df --- doc/source/ecosystem.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index ad809bc4e978b..123fd2adf8d2b 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -275,6 +275,9 @@ Pandas on Ray is an 
early stage DataFrame library that wraps Pandas and transpar Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). + * vaex.from_pandas + * vaex.to_pandas_df + .. _ecosystem.data_validation: From 77edcd07ec79462b03e77501a94513719432edb2 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Fri, 16 Mar 2018 00:36:56 -0400 Subject: [PATCH 06/16] DOC: ecosystem.rst: remove Vincent per #20355 --- doc/source/ecosystem.rst | 8 -------- 1 file changed, 8 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 123fd2adf8d2b..3efd0cd21be3b 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -76,14 +76,6 @@ but a faithful implementation for Python users has long been missing. Although s (as of Jan-2014), the `yhat/ggplot `__ project has been progressing quickly in that direction. -`Vincent `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The `Vincent `__ project leverages `Vega `__ -(that in turn, leverages `d3 `__) to create -plots. Although functional, as of Summer 2016 the Vincent project has not been updated -in over two years and is `unlikely to receive further updates `__. 
- `IPython Vega `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 26c4c9e9e63c6ca37282dc6fe048bb8f22d3c6d5 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Fri, 16 Mar 2018 00:39:22 -0400 Subject: [PATCH 07/16] DOC: ecosystem.rst: ipyvega wording --- doc/source/ecosystem.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 3efd0cd21be3b..ac328935103b8 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -12,7 +12,7 @@ build powerful and more focused data tools. The creation of libraries that complement pandas' functionality also allows pandas development to remain focused around it's original requirements. -This is an in-exhaustive list of projects that build on pandas in order to provide +This is an inexhaustive list of projects that build on pandas in order to provide tools in the PyData space. We'd like to make it easier for users to find these project, if you know of other @@ -79,9 +79,8 @@ progressing quickly in that direction. `IPython Vega `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Like Vincent, the `IPython Vega `__ project leverages `Vega -`__ to create plots, but primarily -targets the IPython Notebook environment. +`IPython Vega `__ leverages `Vega +`__ to create plots within Jupyter Notebook. `Plotly `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 035dd405f1c1440c6227de9444d0ab5e10e083d4 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Fri, 16 Mar 2018 00:46:08 -0400 Subject: [PATCH 08/16] DOC: ecosystem.rst: header line lengths --- doc/source/ecosystem.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index ac328935103b8..cbc784e389c81 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -231,13 +231,13 @@ in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables, PySpark. 
`Dask `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dask is a flexible parallel computing library for analytics. Dask provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing. `Dask-ML `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow. @@ -251,7 +251,7 @@ based approach is also extensible by end users for custom formats that may be too specific for the core of odo. `Ray `_ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Pandas on Ray is an early stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while experiencing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas. @@ -262,7 +262,7 @@ Pandas on Ray is an early stage DataFrame library that wraps Pandas and transpar `Vaex `_ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. 
It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). From dfc129b288ae25ade6a738783f544f66fb7a3edc Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Fri, 16 Mar 2018 00:54:18 -0400 Subject: [PATCH 09/16] DOC: ecosystem.rst: link to libraries.io and pypi https://libraries.io/pypi/pandas/usage --- doc/source/ecosystem.rst | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index cbc784e389c81..4e3e9b4d397e7 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -13,9 +13,12 @@ The creation of libraries that complement pandas' functionality also allows pand development to remain focused around it's original requirements. This is an inexhaustive list of projects that build on pandas in order to provide -tools in the PyData space. +tools in the PyData space. For a list of projects that depend on pandas, +see the +`libraries.io usage page for pandas `_ +or `search pypi for pandas `_. -We'd like to make it easier for users to find these project, if you know of other +We'd like to make it easier for users to find these projects, if you know of other substantial projects that you feel should be on this list, please let us know. 
From 0feb2695a9197a2c44e146f01e3a21a64d20a926 Mon Sep 17 00:00:00 2001 From: westurner <@westurner> Date: Fri, 16 Mar 2018 01:27:42 -0400 Subject: [PATCH 10/16] DOC: ecosystem.rst: ggplot -> ggpy --- doc/source/ecosystem.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 4e3e9b4d397e7..90dee5ad1db54 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -68,15 +68,15 @@ also goes beyond matplotlib and pandas with the option to perform statistical estimation while plotting, aggregating across observations and visualizing the fit of statistical models to emphasize patterns in a dataset. -`yhat/ggplot `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`yhat/ggpy `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Hadley Wickham's `ggplot2 `__ is a foundational exploratory visualization package for the R language. Based on `"The Grammar of Graphics" `__ it provides a powerful, declarative and extremely general way to generate bespoke plots of any kind of data. It's really quite incredible. Various implementations to other languages are available, but a faithful implementation for Python users has long been missing. Although still young -(as of Jan-2014), the `yhat/ggplot `__ project has been +(as of Jan-2014), the `yhat/ggpy `__ project has been progressing quickly in that direction. `IPython Vega `__ From 8f6cf0cf01ec62ac878ab53e0c0a3ab25e9dc2e8 Mon Sep 17 00:00:00 2001 From: westurner <@westurner> Date: Fri, 16 Mar 2018 01:31:46 -0400 Subject: [PATCH 11/16] DOC: ecosystem.rst: PyCharm --- doc/source/ecosystem.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 90dee5ad1db54..5fab390a2f76e 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -121,6 +121,14 @@ which are utilized by IPython Notebook for displaying (abbreviated) HTML tables. 
(Note: HTML tables may or may not be compatible with non-HTML IPython output formats.) +`PyCharm `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +PyCharm is a full-featured Python IDE. +`PyCharm supports Jupyter Notebook +`__ +, recognizes ``*.ipynb files``, +and allows you to edit them. + `quantopian/qgrid `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 7f9c00b5bc0d3f182967818cdfbcaa7eca651647 Mon Sep 17 00:00:00 2001 From: westurner <@westurner> Date: Fri, 16 Mar 2018 01:55:33 -0400 Subject: [PATCH 12/16] DOC: ecosystem.rst: Spyder note re: # In[0] prompts --- doc/source/ecosystem.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 5fab390a2f76e..294852caa2313 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -143,6 +143,18 @@ editing, testing, debugging, and introspection features. Spyder can now introspect and display Pandas DataFrames and show both "column wise min/max and global min/max coloring." +Spyder can also execute ``.py`` files containing IPython prompts +cell by cell: + +.. code:: python + + # In[0]: + subprocess.call('jupyter convert --to python notebook.ipynb') + + # In[1]: + print('Ctrl-Return -- Run cell') + print('Shift-Return -- Run cell and advance') + .. 
_ecosystem.api: From ffdd5e9bebf1192edd5d6e3470f0a970f95b0f8c Mon Sep 17 00:00:00 2001 From: westurner <@westurner> Date: Fri, 16 Mar 2018 02:19:13 -0400 Subject: [PATCH 13/16] DOC: ecosystem.rst: IPython/Jupyter Notebook, _repr_latex_, options.available --- doc/source/ecosystem.rst | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 294852caa2313..ec91e05ccc665 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -106,20 +106,28 @@ IDE ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IPython is an interactive command shell and distributed computing -environment. -IPython Notebook is a web application for creating IPython notebooks. -An IPython notebook is a JSON document containing an ordered list +environment. IPython tab completion works with Pandas methods and also +attributes like DataFrame columns. + +`Jupyter Notebook / Jupyter Lab `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Jupyter Notebook is a web application for creating Jupyter notebooks. +A Jupyter notebook is a JSON document containing an ordered list of input/output cells which can contain code, text, mathematics, plots and rich media. -IPython notebooks can be converted to a number of open standard output formats +Jupyter notebooks can be converted to a number of open standard output formats (HTML, HTML presentation slides, LaTeX, PDF, ReStructuredText, Markdown, -Python) through 'Download As' in the web interface and ``ipython nbconvert`` +Python) through 'Download As' in the web interface and ``jupyter nbconvert`` in a shell. -Pandas DataFrames implement ``_repr_html_`` methods -which are utilized by IPython Notebook for displaying -(abbreviated) HTML tables. (Note: HTML tables may or may not be -compatible with non-HTML IPython output formats.)
+Pandas DataFrames implement ``_repr_html_`` and ``_repr_latex_`` methods +which are utilized by Jupyter Notebook for displaying +(abbreviated) HTML or LaTeX tables. LaTeX output is properly escaped. +(Note: HTML tables may or may not be +compatible with non-HTML Jupyter output formats.) + +See :ref:`Options and Settings ` and :ref:`` +for pandas ``display.`` settings. `PyCharm `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 42438c7b86a85f589717bd61251e41a97e9de724 Mon Sep 17 00:00:00 2001 From: Wes Turner <@westurner> Date: Sun, 15 Apr 2018 18:21:47 -0400 Subject: [PATCH 14/16] DOC: ecosystem.rst: fix header underlines --- doc/source/ecosystem.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index ec91e05ccc665..c6a6fa9598ace 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -262,13 +262,13 @@ in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables, PySpark. `Dask `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dask is a flexible parallel computing library for analytics. Dask provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing. `Dask-ML `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow. @@ -281,8 +281,8 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph based approach is also extensible by end users for custom formats that may be too specific for the core of odo.
-`Ray `_ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`Ray `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Pandas on Ray is an early stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while experiencing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas. @@ -292,8 +292,8 @@ Pandas on Ray is an early stage DataFrame library that wraps Pandas and transpar import ray.dataframe as pd -`Vaex `_ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +`Vaex `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). From 25f6ae6aa505584df8042b8d7c98e17ba3b49960 Mon Sep 17 00:00:00 2001 From: Wes Turner <@westurner> Date: Sun, 15 Apr 2018 18:25:16 -0400 Subject: [PATCH 15/16] DOC: ecosystem.rst: Altair. 
closes #20355 --- doc/source/ecosystem.rst | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index c6a6fa9598ace..2103c47fc167b 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -48,6 +48,17 @@ ML pipeline. Visualization ------------- +`Altair `__ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Altair is a declarative statistical visualization library for Python. +With Altair, you can spend more time understanding your data and its +meaning. Altair's API is simple, friendly and consistent and built on +top of the powerful Vega-Lite JSON specification. This elegant +simplicity produces beautiful and effective visualizations with a +minimal amount of code. Altair works with Pandas DataFrames. + + `Bokeh `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 94f0cdd830d2de4a49631be1bc8cebde577d66da Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Sun, 8 Jul 2018 10:14:16 -0500 Subject: [PATCH 16/16] remove pycharm --- doc/source/ecosystem.rst | 21 --------------------- 1 file changed, 21 deletions(-) diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index 66b80145579ab..82ca3821fc2ed 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -143,14 +143,6 @@ compatible with non-HTML Jupyter output formats.) See :ref:`Options and Settings ` and :ref:`` for pandas ``display.`` settings. -`PyCharm `__ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -PyCharm is a full-featured Python IDE. -`PyCharm supports Jupyter Notebook -`__ -, recognizes ``*.ipynb files``, -and allows you to edit them. - `quantopian/qgrid `__ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -165,19 +157,6 @@ editing, testing, debugging, and introspection features. Spyder can now introspect and display Pandas DataFrames and show both "column wise min/max and global min/max coloring." -Spyder can also execute ``.py`` files containing IPython prompts -cell by cell: - -.. 
code:: python - - # In[0]: - subprocess.call('jupyter convert --to python notebook.ipynb') - - # In[1]: - print('Ctrl-Return -- Run cell') - print('Shift-Return -- Run cell and advance') - - .. _ecosystem.api: API
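The Jupyter Notebook entry in this series describes a notebook as a JSON document containing an ordered list of cells. A minimal sketch of what that means in practice, using only the standard library and assuming the nbformat 4 layout (a top-level `cells` list whose items carry `cell_type` and `source`): the sample notebook below is made up for illustration, and `code_cells` is a hypothetical helper, not part of Jupyter or pandas.

```python
import json

# A hand-written, minimal notebook document (illustrative data only).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["import pandas as pd\n", "pd.DataFrame({'a': [1, 2]})"],
        },
    ],
})


def code_cells(text):
    """Return the source of each code cell, in document order."""
    notebook = json.loads(text)
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        src = cell["source"]
        # `source` may be stored as one string or as a list of lines.
        sources.append(src if isinstance(src, str) else "".join(src))
    return sources


cells = code_cells(notebook_text)
```

The cell sources are only read as text here; nothing is executed, so the sketch runs without pandas or Jupyter installed.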