diff --git a/.github/CONTRIBUTING.md b/CONTRIBUTING.md
similarity index 91%
rename from .github/CONTRIBUTING.md
rename to CONTRIBUTING.md
index 352acee23df2d..ef0c972619f39 100644
--- a/.github/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -109,9 +109,9 @@ For a python 3 environment:
 
     conda create -n pandas_dev python=3 --file ci/requirements_dev.txt
 
-If you are on Windows, then you will also need to install the compiler linkages:
-
-    conda install -n pandas_dev libpython
+> **warning**
+>
+> If you are on Windows, see the "Creating a Windows development environment" section below for a fully compliant Windows environment.
 
 This will create the new environment, and not touch any of your existing environments, nor any existing python installation. It will install all of the basic dependencies of *pandas*, as well as the development and testing tools. If you would like to install other dependencies, you can install them as follows:
 
@@ -143,6 +143,28 @@ See the full conda docs [here](http://conda.pydata.org/docs).
 
 At this point you can easily do an *in-place* install, as detailed in the next section.
 
+### Creating a Windows development environment
+
+To build on Windows, you need to have compilers installed to build the extensions. You will need to install the appropriate Visual Studio compilers: VS 2008 for Python 2.7, VS 2010 for Python 3.4, and VS 2015 for Python 3.5.
+
+For Python 2.7, you can install the `mingw` compiler, which works equivalently to VS 2008:
+
+    conda install -n pandas_dev libpython
+
+or use the [Microsoft Visual C++ Compiler for Python 2.7](https://www.microsoft.com/en-us/download/details.aspx?id=44266). Note that you have to check the `x64` box to install the `x64` extension-building capability, as this is not installed by default.
+
+For Python 3.4, you can download and install the [Windows 7.1 SDK](https://www.microsoft.com/en-us/download/details.aspx?id=8279). Read the references below, as there may be various gotchas during the installation.
+
+For Python 3.5, you can download and install the [Visual Studio 2015 Community Edition](https://www.visualstudio.com/en-us/downloads/visual-studio-2015-downloads-vs.aspx).
+
+Here are some references and blogs:
+
+-
+-
+-
+-
+-
+
 ### Making changes
 
 Before making your code changes, it is often necessary to build the code that was just checked out. There are two primary methods of doing this.
@@ -258,17 +280,26 @@ Contributing to the code base
 
 ### Code standards
 
-*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) standard. There are several tools to ensure you abide by this standard.
+*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) standard. There are several tools to ensure you abide by this standard. Here are *some* of the more common `PEP8` issues:
+
+> - we restrict line length to 80 characters to promote readability
+> - passing arguments should have spaces after commas, e.g. `foo(arg1, arg2, kw1='bar')` (see the example below)
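+
+A minimal illustration (`foo` and its arguments are placeholder names, not pandas code):
+
+    # keep each line under 80 characters and put a space after each comma
+    result = foo(arg1, arg2,
+                 kw1='bar')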
+
+Travis-CI will run the [flake8](http://pypi.python.org/pypi/flake8) tool and report any stylistic errors in your code. Generating any warnings will cause the build to fail; thus, passing these checks is part of the requirements for submitting code to *pandas*.
+
+Before submitting your code, it is helpful to run this check yourself on the diff:
+
+    git diff master | flake8 --diff
 
-We've written a tool to check that your commits are PEP8 great, [pip install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch vs master with:
+Furthermore, we've written a tool to check that your commits are PEP8 compliant: [pip install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch vs master with:
 
-    pep8radius master --diff
+    pep8radius master --diff
 
 and make these changes with:
 
     pep8radius master --diff --in-place
 
-Alternatively, use the [flake8](http://pypi.python.org/pypi/flake8) tool for checking the style of your code. Additional standards are outlined on the [code style wiki page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions).
+Additional standards are outlined on the [code style wiki page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions).
 
 Please try to maintain backward compatibility. *pandas* has lots of users with lots of existing code, so don't break it if at all possible. If you think breakage is required, clearly state why as part of the pull request. Also, be careful when changing method signatures and add deprecation warnings where needed.
@@ -315,6 +346,14 @@ The tests suite is exhaustive and takes around 20 minutes to run. Often it is wo
 
     nosetests pandas/tests/[test-module].py:[TestClass]
     nosetests pandas/tests/[test-module].py:[TestClass].[test_method]
 
+Furthermore, with pandas imported you can run the test suite directly:
+
+``` sourceCode
+pd.test()
+```
+
+which runs the tests in a similar fashion.
+
 #### Running the performance test suite
 
 Performance matters and it is worth considering whether your code has introduced performance regressions. *pandas* is in the process of migrating to the [asv library](https://github.com/spacetelescope/asv) to enable easy monitoring of the performance of critical *pandas* operations. These benchmarks are all found in the `pandas/asv_bench` directory. asv supports both python2 and python3.
@@ -356,7 +395,7 @@ It can also be useful to run tests in your current environment. You can simply d
 
 This command is equivalent to:
 
-    asv run --quick --show-stderr --python=same
+    asv run --quick --show-stderr --python=same
 
 This will launch every test only once, display stderr from the benchmarks, and use your local `python` that comes from your `$PATH`.
diff --git a/doc/source/comparison_with_sql.rst b/doc/source/comparison_with_sql.rst
index 26e76e8c5a4f6..e76a70ba34671 100644
--- a/doc/source/comparison_with_sql.rst
+++ b/doc/source/comparison_with_sql.rst
@@ -372,10 +372,109 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with
 
     pd.concat([df1, df2]).drop_duplicates()
 
+Pandas equivalents for some SQL analytic and aggregate functions
+----------------------------------------------------------------
+
+Top N rows with offset
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: sql
+
+    -- MySQL
+    SELECT * FROM tips
+    ORDER BY tip DESC
+    LIMIT 10 OFFSET 5;
+
+In pandas:
+
+.. ipython:: python
+
+    tips.nlargest(10 + 5, columns='tip').tail(10)
+
+Top N rows per group
+~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: sql
+
+    -- Oracle's ROW_NUMBER() analytic function
+    SELECT * FROM (
+      SELECT
+        t.*,
+        ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn
+      FROM tips t
+    )
+    WHERE rn < 3
+    ORDER BY day, rn;
+
+Let's add a helper column ``rn`` (row number):
+
+.. ipython:: python
+
+    (tips.assign(rn=tips.sort_values(['total_bill'], ascending=False)
+                        .groupby(['day'])
+                        .cumcount() + 1)
+         .query('rn < 3')
+         .sort_values(['day', 'rn'])
+    )
+
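+If you only need the rows themselves (without the helper ``rn`` column), one possible shortcut is to sort first and then take the head of each group; this is an illustrative sketch rather than a literal translation of the SQL above:
+
+.. code-block:: python
+
+    # top 2 rows per day by total_bill, without an explicit row-number column
+    (tips.sort_values('total_bill', ascending=False)
+         .groupby('day')
+         .head(2))
+
+Ties are broken by the sort order, much as with ``ROW_NUMBER()``.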
+
+The same result, using the ``rank(method='first')`` method:
+
+.. ipython:: python
+
+    (tips.assign(rnk=tips.groupby(['day'])['total_bill']
+                         .rank(method='first', ascending=False))
+         .query('rnk < 3')
+         .sort_values(['day', 'rnk'])
+    )
+
+.. code-block:: sql
+
+    -- Oracle's RANK() analytic function
+    SELECT * FROM (
+      SELECT
+        t.*,
+        RANK() OVER(PARTITION BY sex ORDER BY tip) AS rnk
+      FROM tips t
+      WHERE tip < 2
+    )
+    WHERE rnk < 3
+    ORDER BY sex, rnk;
+
+Let's find tips with a rank < 3 per gender group, for tips < 2.
+Notice that when using ``rank(method='min')``, ``rnk_min`` remains
+the same for identical ``tip`` values (as Oracle's RANK() function does):
+
+.. ipython:: python
+
+    (tips[tips['tip'] < 2]
+         .assign(rnk_min=tips.groupby(['sex'])['tip']
+                             .rank(method='min'))
+         .query('rnk_min < 3')
+         .sort_values(['sex', 'rnk_min'])
+    )
+
 UPDATE
 ------
 
+.. code-block:: sql
+
+    UPDATE tips
+    SET tip = tip*2
+    WHERE tip < 2;
+
+In pandas, the same update can be done with ``.loc`` and boolean indexing:
+
+.. ipython:: python
+
+    tips.loc[tips['tip'] < 2, 'tip'] *= 2
 
 DELETE
 ------
 
+.. code-block:: sql
+
+    DELETE FROM tips
+    WHERE tip > 9;
+
+In pandas we select the rows that should remain instead of deleting them:
+
+.. ipython:: python
+
+    tips = tips.loc[tips['tip'] <= 9]
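+
+Another possible way to express the same DELETE (an illustrative alternative, not the only idiom) is to ``drop`` the offending rows by their index labels:
+
+.. code-block:: python
+
+    # drop the rows whose labels satisfy the condition; the rest are kept as-is
+    tips = tips.drop(tips[tips['tip'] > 9].index)
+
+Both forms preserve the index labels of the surviving rows.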