DOC: ecosystem: Vaex, Pandas on Ray, alphabetization, pandas-datareader #20345
It might be worthwhile to also reference https://github.com/altair-viz/altair in the visualization section, since Vincent refers people to Altair: https://github.com/wrobstory/vincent#2016-06-18-update
Good call. #20355 "DOC: ecosystem.rst: Altair"
Pending a response to #20355, I'll go ahead and also alphabetize the Visualization Tools section and then this PR will be ready to merge.
doc/source/ecosystem.rst
Outdated
`Vaex <https://docs.vaex.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted).
Is Vaex built on top of pandas?
Vaex is not built on top of Pandas, which is why I asked whether it's appropriate for this list in #20335.
We could also list PyArrow, which, in relation to Pandas, likewise only converts to and from it:
https://arrow.apache.org/docs/python/pandas.html

Blaze provides a standard API for doing computations with various
in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
PySpark.
I know you are only putting them in alphabetical order, but I am not sure to what extent the blaze library itself is still actively developed (in contrast to things that grew out of it, like dask).
I think @llllllllll is still doing some work on it?
We could remove blaze, I think. We should add Ibis though: http://docs.ibis-project.org/
Ibis also 'only' does to/from in relation to pandas?
http://docs.ibis-project.org/impala.html#ingesting-data-from-pandas
Would we add ibis under 'Out of Core' or a new 'Similar API' section?
I, and a few others, are still using blaze, but it is not very active. We are still using blaze at @quantopian in production, and it is still maintained. Given its current state, I am fine with either leaving it or removing it.
doc/source/ecosystem.rst
Outdated
@@ -252,6 +259,26 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph
based approach is also extensible by end users for custom formats that may be
too specific for the core of odo.

`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
line is too long here
doc/source/ecosystem.rst
Outdated
|
||
|
||
`Vaex <https://docs.vaex.io/>`_ | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
needs to be the same length
Codecov Report
@@ Coverage Diff @@
## master #20345 +/- ##
=========================================
Coverage ? 91.84%
=========================================
Files ? 153
Lines ? 49279
Branches ? 0
=========================================
Hits ? 45259
Misses ? 4020
Partials ? 0
doc/source/ecosystem.rst
Outdated
`Dask <https://dask.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think the underline length here needs to be edited.

Dask is a flexible parallel computing library for analytics. Dask
provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing.

`Dask-ML <https://dask-ml.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
same
The line lengths appear to be equal in my editor, and the output from rst2html.py doesn't show an error.
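A note on why rst2html.py may stay silent: docutils only warns when a title underline is shorter than the title text, so an overlong underline passes without complaint. A stricter check for exact length matches could be sketched roughly as below. The helper name `check_underlines` and the sample text are hypothetical and not part of pandas or docutils, and the sketch only recognizes a handful of common adornment characters.

```python
import re

# A line made of one repeated adornment character (assumed subset of what
# docutils accepts), at least three characters long.
ADORNMENT = re.compile(r"^([=\-~^+*#`])\1{2,}$")


def check_underlines(text):
    """Return (line_number, title, title_len, underline_len) per mismatch.

    line_number is the 1-based line of the underline.
    """
    lines = text.splitlines()
    mismatches = []
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        # A heading is a non-empty, non-adornment line followed by an adornment line.
        if title and not ADORNMENT.match(title) and ADORNMENT.match(underline):
            if len(title) != len(underline):
                mismatches.append((i + 1, title, len(title), len(underline)))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample: a heading whose underline is longer than its title.
    sample = "Example title\n" + "~" * 20 + "\n\nBody text.\n"
    for lineno, title, t_len, u_len in check_underlines(sample):
        print(f"line {lineno}: title is {t_len} chars, underline is {u_len}: {title}")
```

Running something like this over doc/source/ecosystem.rst would flag both too-long and too-short underlines, which is stricter than what docutils itself enforces.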
can you update
@westurner can you update
What needs to be updated here?
Rebased, thanks @westurner !
closes #20334 - Pandas on Ray
closes #20335 - Vaex