DOC: ecosystem: Vaex, Pandas on Ray, alphabetization, pandas-datareader #20345

Merged: 17 commits on Jul 8, 2018

Conversation

westurner
Contributor

@westurner westurner commented Mar 14, 2018

closes #20334 - Pandas on Ray
closes #20335 - Vaex

@kylebarron
Contributor

It might be worthwhile to also reference https://github.com/altair-viz/altair in the visualization section, since Vincent refers people to Altair: https://github.com/wrobstory/vincent#2016-06-18-update

@westurner
Contributor Author

Good call. #20355 "DOC: ecosystem.rst: Altair"

@westurner
Contributor Author

Pending a response to #20355, I'll go ahead and also alphabetize the Visualization Tools section and then this PR will be ready to merge.
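A minimal sketch of how the alphabetization could be checked mechanically, assuming each entry in a section is a hyperlinked title followed by a `~` underline, as in the snippets quoted in this thread. The helper names `entry_names` and `is_alphabetized` are hypothetical and not part of pandas or its doc tooling.

```python
import re

# A section underline made of tildes, as used for entries in ecosystem.rst.
UNDERLINE = re.compile(r"^~{3,}$")
# A hyperlinked title such as `Dask <https://...>`__ (one or two trailing underscores).
LINK_TITLE = re.compile(r"^`(?P<name>[^<`]+?)\s*<[^>]+>`__?$")


def entry_names(text):
    """Collect the display names of all tilde-underlined titles, in document order."""
    lines = [line.strip() for line in text.splitlines()]
    names = []
    for title, underline in zip(lines, lines[1:]):
        # Skip blank lines and stray adornment lines that precede an underline.
        if not title or UNDERLINE.match(title):
            continue
        if UNDERLINE.match(underline):
            match = LINK_TITLE.match(title)
            names.append(match.group("name") if match else title)
    return names


def is_alphabetized(names):
    """True when the names are already in case-insensitive alphabetical order."""
    keys = [name.casefold() for name in names]
    return keys == sorted(keys)
```

Running `is_alphabetized(entry_names(section_text))` on the text of one section would flag a section that still needs reordering; it does not reorder anything itself.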

@sinhrks sinhrks added the Docs label Mar 15, 2018
`Vaex <https://docs.vaex.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a Python library for Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation, etc., on an N-dimensional grid at up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).
Contributor

Is Vaex built on top of pandas?

Contributor Author

Vaex is not built on top of Pandas, which is why I asked in #20335 whether it's appropriate for this list.

We could also list PyArrow, which likewise only converts to and from Pandas:
https://arrow.apache.org/docs/python/pandas.html


Blaze provides a standard API for doing computations with various
in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
PySpark.
Member

I know you are only putting them in alphabetical order, but I am not sure to what extent the blaze library itself is still actively developed (in contrast to things that grew out of it, like dask).

Contributor

I think @llllllllll is still doing some work on it?

Contributor

We could remove blaze, I think. We should add Ibis, though: http://docs.ibis-project.org/

Contributor Author

Ibis also 'only' does to/from in relation to pandas?
http://docs.ibis-project.org/impala.html#ingesting-data-from-pandas

Would we add ibis under 'Out of Core' or a new 'Similar API' section?

Contributor

I, and a few others, are still using blaze, but it is not very active. We are still using blaze at @quantopian in production, and it is still maintained. Given its current state, I am fine with either leaving it or removing it.

@@ -252,6 +259,26 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph
based approach is also extensible by end users for custom formats that may be
too specific for the core of odo.

`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contributor

The underline is too long here.



`Vaex <https://docs.vaex.io/>`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contributor

The underline needs to be the same length as the title.

@codecov

codecov bot commented Mar 16, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@20ca108).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #20345   +/-   ##
=========================================
  Coverage          ?   91.84%           
=========================================
  Files             ?      153           
  Lines             ?    49279           
  Branches          ?        0           
=========================================
  Hits              ?    45259           
  Misses            ?     4020           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.23% <ø> (?)
#single 41.9% <ø> (?)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 20ca108...94f0cdd.

`Dask <https://dask.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contributor

I think the underline length here needs to be edited.


Dask is a flexible parallel computing library for analytics. Dask
provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing.

`Dask-ML <https://dask-ml.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contributor

same

Contributor Author

The line lengths appear to be equal in my editor, and the output from rst2html.py doesn't show an error.
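One way to settle the underline question without relying on an editor's rendering is to compare lengths directly. The sketch below is illustrative only: `check_underlines` is a hypothetical helper, and it recognizes just a handful of adornment characters. As far as I know, docutils only warns when an underline is shorter than its title, so an over-long underline would pass `rst2html.py` silently even though reviewers here ask for an exact match.

```python
import re

# A line made of one repeated adornment character (small subset of what RST allows).
ADORNMENT = re.compile(r"^([=~^#*+-])\1{2,}$")


def check_underlines(text):
    """Return (title_line_number, title, title_length, underline_length)
    for every title whose underline length differs from the title length."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []
    for i in range(1, len(lines)):
        title, underline = lines[i - 1], lines[i]
        # A blank line or another adornment line is not a title.
        if not title or ADORNMENT.match(title):
            continue
        if ADORNMENT.match(underline) and len(underline) != len(title):
            # i is the 0-based index of the underline, i.e. the 1-based title line.
            problems.append((i, title, len(title), len(underline)))
    return problems


if __name__ == "__main__":
    snippet = "`Vaex <https://docs.vaex.io/>`_\n" + "~" * 34 + "\n"
    for line_no, title, title_len, underline_len in check_underlines(snippet):
        print(f"line {line_no}: title is {title_len} chars, underline is {underline_len}")
```

Comparing `len()` values sidesteps proportional fonts and soft wrapping, either of which can make two lines of different length look equal.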

@jreback jreback added this to the 0.23.0 milestone Mar 25, 2018
@jreback
Contributor

jreback commented Mar 30, 2018

can you update

@jreback
Contributor

jreback commented Apr 14, 2018

@westurner can you update

@jreback jreback removed this from the 0.23.0 milestone Apr 14, 2018
@westurner
Contributor Author

What needs to be updated here?

@jorisvandenbossche
Member

Rebased, thanks @westurner !

@jorisvandenbossche jorisvandenbossche merged commit f90aa44 into pandas-dev:master Jul 8, 2018
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.23.4, 0.24.0 Jul 8, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Successfully merging this pull request may close these issues.

DOC: ecosystem.rst: Vaex
DOC: ecosystem.rst: Pandas and Ray
7 participants