From c1c2b8c927c97832be7d8a744898ba873a917da3 Mon Sep 17 00:00:00 2001
From: Devin Petersohn
Date: Mon, 19 Oct 2020 09:12:46 -0500
Subject: [PATCH] DOC: Replace pandas on Ray in ecosystem.rst with Modin

* Resolves #37247

Signed-off-by: Devin Petersohn
---
 doc/source/ecosystem.rst | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index 25ca77627ef39..4654fcf5a6165 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -376,6 +376,23 @@ Dask-ML enables parallel and distributed machine learning using Dask alongside e
 
 Koalas provides a familiar pandas DataFrame interface on top of Apache Spark. It enables users to leverage multi-cores on one machine or a cluster of machines to speed up or scale their DataFrame code.
 
+`Modin `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``modin.pandas`` DataFrame is a parallel and distributed drop-in replacement
+for pandas. This means that you can use Modin with existing pandas code or write
+new code with the existing pandas API. Modin can leverage your entire machine or
+cluster to speed up and scale your pandas workloads, including traditionally
+time-consuming tasks like ingesting data (``read_csv``, ``read_excel``,
+``read_parquet``, etc.).
+
+.. code:: python
+
+    # import pandas as pd
+    import modin.pandas as pd
+
+    df = pd.read_csv("big.csv")  # use all your cores!
+
 `Odo `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -400,16 +417,6 @@ If also displays progress bars.
     # df.apply(func)
     df.parallel_apply(func)
 
-`Ray `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-pandas on Ray is an early stage DataFrame library that wraps pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous pandas notebooks while experiencing a considerable speedup from pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use pandas on Ray just like you would pandas.
-
-.. code:: python
-
-    # import pandas as pd
-    import ray.dataframe as pd
-
 `Vaex `__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 