diff --git a/doc/source/io.rst b/doc/source/io.rst
index 066a9af472c24..58eb5563a0823 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -39,6 +39,7 @@ object.
     * :ref:`read_json`
     * :ref:`read_msgpack` (experimental)
     * :ref:`read_html`
+    * :ref:`read_ga`
     * :ref:`read_gbq` (experimental)
     * :ref:`read_stata`
     * :ref:`read_clipboard`
@@ -3496,6 +3497,65 @@ And then issue the following queries:
 
    pd.read_sql_query("SELECT * FROM data", con)
 
+.. _io.analytics:
+
+Google Analytics
+----------------
+
+The :mod:`~pandas.io.ga` module provides a wrapper for the
+`Google Analytics API `__
+to simplify retrieving traffic data.
+Result sets are parsed into a pandas DataFrame with a shape and data types
+derived from the source table.
+
+Configuring Access to Google Analytics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The first thing you need to do is to set up access to the Google Analytics API. Follow the steps below:
+
+#. In the `Google Developers Console `__
+    #. enable the Analytics API
+    #. create a new project
+    #. create a new Client ID for an "Installed Application" (in the "APIs & auth / Credentials" section of the newly created project)
+    #. download the Client ID (JSON file)
+#. On your machine
+    #. rename the downloaded file to ``client_secrets.json``
+    #. move it to the ``pandas/io`` module directory
+
+The first time you use the :func:`read_ga` function, a browser window will open and ask you to authenticate to the Google API. Do proceed.
+
+Using the Google Analytics API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following will fetch users and pageviews (metrics) data per day of the week, for the first semester of 2014, from a particular property.
+
+.. code-block:: python
+
+   import pandas.io.ga as ga
+   ga.read_ga(
+       account_id = "2360420",
+       profile_id = "19462946",
+       property_id = "UA-2360420-5",
+       metrics = ['users', 'pageviews'],
+       dimensions = ['dayOfWeek'],
+       start_date = "2014-01-01",
+       end_date = "2014-08-01",
+       index_col = 0,
+       filters = "pagePath=~aboutus;ga:country==France",
+   )
+
+The only mandatory arguments are ``metrics``, ``dimensions`` and ``start_date``. We strongly recommend that you always specify the ``account_id``, ``profile_id`` and ``property_id`` to avoid accessing the wrong data bucket in Google Analytics.
+
+The ``index_col`` argument indicates which dimension(s) to use as the index.
+
+The ``filters`` argument indicates the filtering to apply to the query. In the above example, the page URL has to contain ``aboutus`` AND the visitor's country has to be France.
+
+Detailed information is available in the following:
+
+* `pandas & google analytics, by yhat `__
+* `Google Analytics integration in pandas, by Chang She `__
+* `Google Analytics Dimensions and Metrics Reference `_
+
 .. _io.bigquery:
 
 Google BigQuery (Experimental)