DOC: ecosystem: Vaex, Pandas on Ray, alphabetization, pandas-datareader #20345
Merged
17 commits
bfc1c57 (westurner) DOC: ecosystem.rst: Vaex (closes #20335)
08d3116 (westurner) DOC: ecosystem.rst: Pandas on Ray (closes #20334)
f9b3e98 (westurner) DOC: ecosystem.rst: alphabetize Out-of-Core section
cfb5262 (westurner) DOC: ecosystem.rst: update pandas-datareader data sources
a07dd94 (westurner) DOC: ecosystem.rst: Vaex from_pandas, to_pandas_df
77edcd0 (westurner) DOC: ecosystem.rst: remove Vincent per #20355
26c4c9e (westurner) DOC: ecosystem.rst: ipyvega wording
035dd40 (westurner) DOC: ecosystem.rst: header line lengths
dfc129b (westurner) DOC: ecosystem.rst: link to libraries.io and pypi
0feb269 DOC: ecosystem.rst: ggplot -> ggpy
8f6cf0c DOC: ecosystem.rst: PyCharm
7f9c00b DOC: ecosystem.rst: Spyder note re: # In[0] prompts
ffdd5e9 DOC: ecosystem.rst: IPython/Jupyter Notebook, _repr_latex_, options.a…
42438c7 DOC: ecosystem.rst: fix header underlines
25f6ae6 DOC: ecosystem.rst: Altair. closes #20355
6fe2b2f (jorisvandenbossche) Merge remote-tracking branch 'upstream/master' into westurner-patch-7
94f0cdd (jorisvandenbossche) remove pycharm
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,10 +12,13 @@ build powerful and more focused data tools. | |
The creation of libraries that complement pandas' functionality also allows pandas | ||
development to remain focused around it's original requirements. | ||
|
||
This is an in-exhaustive list of projects that build on pandas in order to provide | ||
tools in the PyData space. | ||
This is an inexhaustive list of projects that build on pandas in order to provide | ||
tools in the PyData space. For a list of projects that depend on pandas, | ||
see the | ||
`libraries.io usage page for pandas <https://libraries.io/pypi/pandas/usage>`_ | ||
or `search pypi for pandas <https://pypi.org/search/?q=pandas>`_. | ||
|
||
We'd like to make it easier for users to find these project, if you know of other | ||
We'd like to make it easier for users to find these projects, if you know of other | ||
substantial projects that you feel should be on this list, please let us know. | ||
|
||
|
||
|
@@ -48,6 +51,17 @@ Featuretools is a Python library for automated feature engineering built on top | |
Visualization | ||
------------- | ||
|
||
`Altair <https://altair-viz.github.io/>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Altair is a declarative statistical visualization library for Python. | ||
With Altair, you can spend more time understanding your data and its | ||
meaning. Altair's API is simple, friendly and consistent and built on | ||
top of the powerful Vega-Lite JSON specification. This elegant | ||
simplicity produces beautiful and effective visualizations with a | ||
minimal amount of code. Altair works with Pandas DataFrames. | ||
|
||
|
||
`Bokeh <http://bokeh.pydata.org>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
|
@@ -68,31 +82,22 @@ also goes beyond matplotlib and pandas with the option to perform statistical | |
estimation while plotting, aggregating across observations and visualizing the | ||
fit of statistical models to emphasize patterns in a dataset. | ||
|
||
`yhat/ggplot <https://github.com/yhat/ggplot>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
`yhat/ggpy <https://github.com/yhat/ggpy>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Hadley Wickham's `ggplot2 <http://ggplot2.org/>`__ is a foundational exploratory visualization package for the R language. | ||
Based on `"The Grammar of Graphics" <http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html>`__ it | ||
provides a powerful, declarative and extremely general way to generate bespoke plots of any kind of data. | ||
It's really quite incredible. Various implementations to other languages are available, | ||
but a faithful implementation for Python users has long been missing. Although still young | ||
(as of Jan-2014), the `yhat/ggplot <https://github.com/yhat/ggplot>`__ project has been | ||
(as of Jan-2014), the `yhat/ggpy <https://github.com/yhat/ggpy>`__ project has been | ||
progressing quickly in that direction. | ||
|
||
`Vincent <https://github.com/wrobstory/vincent>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
The `Vincent <https://github.com/wrobstory/vincent>`__ project leverages `Vega <https://github.com/trifacta/vega>`__ | ||
(that in turn, leverages `d3 <http://d3js.org/>`__) to create | ||
plots. Although functional, as of Summer 2016 the Vincent project has not been updated | ||
in over two years and is `unlikely to receive further updates <https://github.com/wrobstory/vincent#2015-08-12-update>`__. | ||
|
||
`IPython Vega <https://github.com/vega/ipyvega>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Like Vincent, the `IPython Vega <https://github.com/vega/ipyvega>`__ project leverages `Vega | ||
<https://github.com/trifacta/vega>`__ to create plots, but primarily | ||
targets the IPython Notebook environment. | ||
`IPython Vega <https://github.com/vega/ipyvega>`__ leverages `Vega | ||
<https://github.com/trifacta/vega>`__ to create plots within Jupyter Notebook. | ||
|
||
`Plotly <https://plot.ly/python>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
@@ -115,20 +120,28 @@ IDE | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
IPython is an interactive command shell and distributed computing | ||
environment. | ||
IPython Notebook is a web application for creating IPython notebooks. | ||
An IPython notebook is a JSON document containing an ordered list | ||
environment. IPython tab completion works with Pandas methods and also | ||
attributes like DataFrame columns. | ||
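A minimal sketch of why column names show up in tab completion, assuming the completer consults ``dir()`` on the object (IPython may also use other mechanisms). The DataFrame and its column names here are made up for illustration. Column labels that are valid Python identifiers appear in ``dir(df)``; labels such as ``"unit cost"`` do not, and need ``df["unit cost"]`` instead.

```python
import pandas as pd

# Hypothetical frame: one identifier-like column name, one with a space.
df = pd.DataFrame({"price": [1.0, 2.5], "unit cost": [0.4, 0.9]})

# Names available for attribute completion; pandas adds column labels
# that are valid Python identifiers to the usual methods and attributes.
completions = dir(df)

has_price = "price" in completions          # identifier-like label
has_unit_cost = "unit cost" in completions  # label containing a space

# Attribute access works for the identifier-like column only.
price_values = df.price.tolist()
```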
|
||
`Jupyter Notebook / Jupyter Lab <https://jupyter.org>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
Jupyter Notebook is a web application for creating Jupyter notebooks. | ||
A Jupyter notebook is a JSON document containing an ordered list | ||
of input/output cells which can contain code, text, mathematics, plots | ||
and rich media. | ||
IPython notebooks can be converted to a number of open standard output formats | ||
Jupyter notebooks can be converted to a number of open standard output formats | ||
(HTML, HTML presentation slides, LaTeX, PDF, ReStructuredText, Markdown, | ||
Python) through 'Download As' in the web interface and ``ipython nbconvert`` | ||
Python) through 'Download As' in the web interface and ``jupyter nbconvert`` | ||
in a shell. | ||
|
||
Pandas DataFrames implement ``_repr_html_`` methods | ||
which are utilized by IPython Notebook for displaying | ||
(abbreviated) HTML tables. (Note: HTML tables may or may not be | ||
compatible with non-HTML IPython output formats.) | ||
Pandas DataFrames implement ``_repr_html_`` and ``_repr_latex_`` methods | ||
which are utilized by Jupyter Notebook for displaying | ||
(abbreviated) HTML or LaTeX tables. LaTeX output is properly escaped. | ||
(Note: HTML tables may or may not be | ||
compatible with non-HTML Jupyter output formats.) | ||
|
||
See :ref:`Options and Settings <options>` and :ref:`<options.available>` | ||
for pandas ``display.`` settings. | ||
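As a rough illustration of the two points above, the sketch below calls ``_repr_html_`` directly (the notebook normally does this for you) and shows a ``display.`` option changing the result. The frame and the chosen ``display.max_rows`` value are arbitrary examples, not recommendations.

```python
import pandas as pd

# Small example frame with ten rows.
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# The HTML table a notebook front end would render for this frame.
full_html = df._repr_html_()

# Temporarily lower display.max_rows so the same frame is abbreviated.
with pd.option_context("display.max_rows", 4):
    truncated_html = df._repr_html_()

# Count table rows in each rendering to compare them.
full_rows = full_html.count("<tr")
truncated_rows = truncated_html.count("<tr")
```

``pd.option_context`` restores the previous setting on exit, which keeps the change local to one cell; ``pd.set_option`` would apply it for the rest of the session.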
|
||
`quantopian/qgrid <https://github.com/quantopian/qgrid>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
@@ -144,11 +157,10 @@ editing, testing, debugging, and introspection features. | |
Spyder can now introspect and display Pandas DataFrames and show | ||
both "column wise min/max and global min/max coloring." | ||
|
||
|
||
.. _ecosystem.api: | ||
|
||
API | ||
----- | ||
--- | ||
|
||
`pandas-datareader <https://github.com/pydata/pandas-datareader>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
@@ -159,14 +171,22 @@ See more in the `pandas-datareader docs <https://pandas-datareader.readthedocs. | |
|
||
The following data feeds are available: | ||
|
||
* Yahoo! Finance | ||
* Google Finance | ||
* FRED | ||
* Fama/French | ||
* World Bank | ||
* OECD | ||
* Eurostat | ||
* EDGAR Index | ||
* Google Finance | ||
* Tiingo | ||
* Morningstar | ||
* IEX | ||
* Robinhood | ||
* Enigma | ||
* Quandl | ||
* FRED | ||
* Fama/French | ||
* World Bank | ||
* OECD | ||
* Eurostat | ||
* TSP Fund Data | ||
* Nasdaq Trader Symbol Definitions | ||
* Stooq Index Data | ||
* MOEX Data | ||
|
||
`quandl/Python <https://github.com/quandl/Python>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
@@ -227,25 +247,24 @@ dimensional arrays, rather than the tabular data for which pandas excels. | |
Out-of-core | ||
------------- | ||
|
||
`Blaze <http://blaze.pydata.org/>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Blaze provides a standard API for doing computations with various | ||
in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables, | ||
PySpark. | ||
|
||
`Dask <https://dask.readthedocs.io/en/latest/>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Dask is a flexible parallel computing library for analytics. Dask | ||
provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing. | ||
|
||
`Dask-ML <https://dask-ml.readthedocs.io/en/latest/>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
Comment: same
Comment: The line lengths appear to be equal in my editor, and the output from rst2html.py doesn't show an error.
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow. | ||
|
||
|
||
`Blaze <http://blaze.pydata.org/>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Blaze provides a standard API for doing computations with various | ||
in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables, | ||
PySpark. | ||
|
||
`Odo <http://odo.pydata.org>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
|
@@ -255,6 +274,26 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph | |
based approach is also extensible by end users for custom formats that may be | ||
too specific for the core of odo. | ||
|
||
`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Pandas on Ray is an early stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while experiencing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas. | ||
|
||
.. code:: python | ||
|
||
# import pandas as pd | ||
import ray.dataframe as pd | ||
|
||
|
||
`Vaex <https://docs.vaex.io/>`__ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). | ||
|
||
* vaex.from_pandas | ||
* vaex.to_pandas_df | ||
|
||
|
||
.. _ecosystem.data_validation: | ||
|
||
Data validation | ||
|
I know you are only putting them in alphabetical order, but I am not sure to what extent the blaze library itself is still actively developed (in contrast to things that grew out of it, like dask)
I think @llllllllll is still doing some work on it?
We could remove blaze, I think. Should add Ibis though: http://docs.ibis-project.org/
Ibis also 'only' does to/from in relation to pandas?
http://docs.ibis-project.org/impala.html#ingesting-data-from-pandas
Would we add ibis under 'Out of Core' or a new 'Similar API' section?
I, and a few others, are still using blaze, but it is not very active. We are still using blaze at @quantopian in production, and it is still maintained. Given its current state, I am fine with either leaving it or removing it.