diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index 4e15f9069de67..82ca3821fc2ed 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -12,10 +12,13 @@ build powerful and more focused data tools.
 The creation of libraries that complement pandas' functionality also allows pandas
 development to remain focused around it's original requirements.

-This is an in-exhaustive list of projects that build on pandas in order to provide
-tools in the PyData space.
+This is a non-exhaustive list of projects that build on pandas in order to provide
+tools in the PyData space. For a list of projects that depend on pandas,
+see the
+`libraries.io usage page for pandas `_
+or `search pypi for pandas `_.

-We'd like to make it easier for users to find these project, if you know of other
+We'd like to make it easier for users to find these projects. If you know of other
 substantial projects that you feel should be on this list, please let us know.


@@ -48,6 +51,17 @@ Featuretools is a Python library for automated feature engineering built on top
 Visualization
 -------------

+`Altair `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Altair is a declarative statistical visualization library for Python.
+With Altair, you can spend more time understanding your data and its
+meaning. Altair's API is simple, friendly and consistent and built on
+top of the powerful Vega-Lite JSON specification. This elegant
+simplicity produces beautiful and effective visualizations with a
+minimal amount of code. Altair works with Pandas DataFrames.
+
+
 `Bokeh `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -68,31 +82,22 @@ also goes beyond matplotlib and pandas with the option to perform statistical
 estimation while plotting, aggregating across observations and visualizing the
 fit of statistical models to emphasize patterns in a dataset.

-`yhat/ggplot `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`yhat/ggpy `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Hadley Wickham's `ggplot2 `__ is a foundational exploratory visualization package for the R language.
 Based on `"The Grammar of Graphics" `__ it
 provides a powerful, declarative and extremely general way to generate bespoke plots of any kind of data.
 It's really quite incredible. Various implementations to other languages are available,
 but a faithful implementation for Python users has long been missing. Although still young
-(as of Jan-2014), the `yhat/ggplot `__ project has been
+(as of Jan-2014), the `yhat/ggpy `__ project has been
 progressing quickly in that direction.

-`Vincent `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The `Vincent `__ project leverages `Vega `__
-(that in turn, leverages `d3 `__) to create
-plots. Although functional, as of Summer 2016 the Vincent project has not been updated
-in over two years and is `unlikely to receive further updates `__.

 `IPython Vega `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Like Vincent, the `IPython Vega `__ project leverages `Vega
-`__ to create plots, but primarily
-targets the IPython Notebook environment.
+`IPython Vega `__ leverages `Vega
+`__ to create plots within Jupyter Notebook.

 `Plotly `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -115,20 +120,28 @@ IDE
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 IPython is an interactive command shell and distributed computing
-environment.
-IPython Notebook is a web application for creating IPython notebooks.
-An IPython notebook is a JSON document containing an ordered list
+environment. IPython tab completion works with Pandas methods and also
+attributes like DataFrame columns.
+
+`Jupyter Notebook / Jupyter Lab `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Jupyter Notebook is a web application for creating Jupyter notebooks.
+A Jupyter notebook is a JSON document containing an ordered list
 of input/output cells which can contain code, text, mathematics, plots and rich media.

-IPython notebooks can be converted to a number of open standard output formats
+Jupyter notebooks can be converted to a number of open standard output formats
 (HTML, HTML presentation slides, LaTeX, PDF, ReStructuredText, Markdown,
-Python) through 'Download As' in the web interface and ``ipython nbconvert``
+Python) through 'Download As' in the web interface and ``jupyter nbconvert``
 in a shell.

-Pandas DataFrames implement ``_repr_html_`` methods
-which are utilized by IPython Notebook for displaying
-(abbreviated) HTML tables. (Note: HTML tables may or may not be
-compatible with non-HTML IPython output formats.)
+Pandas DataFrames implement ``_repr_html_`` and ``_repr_latex_`` methods
+which are utilized by Jupyter Notebook for displaying
+(abbreviated) HTML or LaTeX tables. LaTeX output is properly escaped.
+(Note: HTML tables may or may not be
+compatible with non-HTML Jupyter output formats.)
+
+See :ref:`Options and Settings ` and :ref:``
+for pandas ``display.`` settings.

 `quantopian/qgrid `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -144,11 +157,10 @@ editing, testing, debugging, and introspection features.
 Spyder can now introspect and display Pandas DataFrames and show
 both "column wise min/max and global min/max coloring."

-
 .. _ecosystem.api:

 API
------
+---

 `pandas-datareader `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -159,14 +171,22 @@ See more in the `pandas-datareader docs `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -227,25 +247,24 @@ dimensional arrays, rather than the tabular data for which pandas excels.
 Out-of-core
 -------------

+`Blaze `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Blaze provides a standard API for doing computations with various
+in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
+PySpark.
+
 `Dask `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Dask is a flexible parallel computing library for analytics. Dask
 provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing.

 `Dask-ML `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow.

-
-`Blaze `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Blaze provides a standard API for doing computations with various
-in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
-PySpark.
-
 `Odo `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -255,6 +274,26 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph
 based approach is also extensible by end users for custom formats that may be
 too specific for the core of odo.

+`Ray `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pandas on Ray is an early-stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while potentially seeing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as shown below. Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas.
+
+.. code:: python
+
+    # import pandas as pd
+    import ray.dataframe as pd
+
+
+`Vaex `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Vaex is a Python library for out-of-core DataFrames (similar to pandas) that lets you visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count and standard deviation on an N-dimensional grid at up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero memory copy policy and lazy computations for best performance (no memory wasted).
+
+* ``vaex.from_pandas``
+* ``vaex.to_pandas_df``
+
+
 .. _ecosystem.data_validation:

 Data validation
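The Jupyter Notebook paragraph in this diff says DataFrames expose ``_repr_html_`` and that pandas ``display.`` options control what gets shown. A minimal sketch of that interaction, assuming a recent pandas release (1.x or 2.x): Jupyter normally calls the hook itself, but calling it directly shows how ``display.max_rows`` abbreviates the table and how ``display.notebook_repr_html`` turns the HTML repr off. The options are set only temporarily via ``pd.option_context``, and the frame contents are arbitrary.

```python
import pandas as pd

# 20 rows: below the default display.max_rows (60), so the full table is rendered.
df = pd.DataFrame({"a": range(20), "b": range(0, 40, 2)})

# Jupyter calls this hook to render a DataFrame as an HTML table.
html = df._repr_html_()

# Lowering display.max_rows makes pandas abbreviate the table:
# only some head/tail rows are kept and a "..." row marks the gap.
with pd.option_context("display.max_rows", 10):
    short_html = df._repr_html_()

# With the HTML repr disabled, the hook returns None and Jupyter
# falls back to the plain-text repr.
with pd.option_context("display.notebook_repr_html", False):
    disabled = df._repr_html_()

print("<table" in html, len(short_html) < len(html), disabled)
```

Because ``option_context`` restores the previous values on exit, the global display settings are unchanged after the sketch runs.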