
Commit 21f49da

DOC: Update "writing" docs with information about schema inference. (#259)
* DOC: Update "writing" docs with information about schema inference.

  This commit started as a clean-up change to remove the unnecessary
  pandas_gbq.gbq._update_bq_schema method, but I then also updated the docs
  for to_gbq to be clearer about how the table_schema argument is to be used.
  I added a section about the table_schema parameter to the writing.rst
  how-to guide as well.

  Some of the "notes" in writing.rst were better as their own subsections. I
  moved the note about not using BigQuery as a transactional database to the
  landing page. I link to the BigQuery sandbox docs in the warning about
  creating a BigQuery account, because you can follow those instructions to
  use BigQuery without entering credit card information.

* Blacken
1 parent 547812e commit 21f49da


7 files changed, +115 -66 lines changed


docs/source/conf.py

Lines changed: 8 additions & 3 deletions
@@ -16,9 +16,12 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #
+import datetime
 import os
 import sys
 
+import pandas_gbq
+
 # sys.path.insert(0, os.path.abspath('.'))
 
 # -- General configuration ------------------------------------------------
@@ -62,17 +65,19 @@
 
 # General information about the project.
 project = u"pandas-gbq"
-copyright = u"2017, PyData Development Team"
+copyright = u"2017-{}, PyData Development Team".format(
+    datetime.datetime.now().year
+)
 author = u"PyData Development Team"
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.
 #
 # The short X.Y version.
-version = u"0.1.0"
+version = pandas_gbq.__version__
 # The full version, including alpha/beta/rc tags.
-release = u"0.1.0"
+release = version
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

docs/source/index.rst

Lines changed: 15 additions & 8 deletions
@@ -8,16 +8,24 @@ Welcome to pandas-gbq's documentation!
 
 The :mod:`pandas_gbq` module provides a wrapper for Google's BigQuery
 analytics web service to simplify retrieving results from BigQuery tables
-using SQL-like queries. Result sets are parsed into a pandas
-DataFrame with a shape and data types derived from the source table.
-Additionally, DataFrames can be inserted into new BigQuery tables or appended
-to existing tables.
+using SQL-like queries. Result sets are parsed into a :class:`pandas.DataFrame`
+with a shape and data types derived from the source table. Additionally,
+DataFrames can be inserted into new BigQuery tables or appended to existing
+tables.
 
 .. warning::
 
-   To use this module, you will need a valid BigQuery account. Refer to the
-   `BigQuery Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__
-   for details on the service itself.
+   To use this module, you will need a valid BigQuery account. Use the
+   `BigQuery sandbox <https://cloud.google.com/bigquery/docs/sandbox>`__ to
+   try the service for free.
+
+   While BigQuery uses standard SQL syntax, it has some important differences
+   from traditional databases both in functionality, API limitations (size and
+   quantity of queries or uploads), and how Google charges for use of the
+   service. BiqQuery is best for analyzing large sets of data quickly. It is not
+   a direct replacement for a transactional database. Refer to the `BigQuery
+   Documentation <https://cloud.google.com/bigquery/what-is-bigquery>`__ for
+   details on the service itself.
 
 Contents:
 
@@ -29,7 +37,6 @@ Contents:
    howto/authentication.rst
    reading.rst
    writing.rst
-   tables.rst
    api.rst
    contributing.rst
    changelog.rst

docs/source/tables.rst

Lines changed: 0 additions & 16 deletions
This file was deleted.

docs/source/writing.rst

Lines changed: 46 additions & 24 deletions
@@ -3,8 +3,8 @@
 Writing DataFrames
 ==================
 
-Assume we want to write a DataFrame ``df`` into a BigQuery table using
-:func:`~pandas_gbq.to_gbq`.
+Assume we want to write a :class:`~pandas.DataFrame` named ``df`` into a
+BigQuery table using :func:`~pandas_gbq.to_gbq`.
 
 .. ipython:: python
 
@@ -21,40 +21,62 @@ Assume we want to write a DataFrame ``df`` into a BigQuery table using
 
 .. code-block:: python
 
-   to_gbq(df, 'my_dataset.my_table', projectid)
+   import pandas_gbq
+   pandas_gbq.to_gbq(df, 'my_dataset.my_table', project_id=projectid)
 
-.. note::
+The destination table and destination dataset will automatically be created
+if they do not already exist.
 
-   The destination table and destination dataset will automatically be created if they do not already exist.
 
-   The ``if_exists`` argument can be used to dictate whether to ``'fail'``, ``'replace'``
-   or ``'append'`` if the destination table already exists. The default value is ``'fail'``.
+Writing to an Existing Table
+----------------------------
+
+Use the ``if_exists`` argument to dictate whether to ``'fail'``,
+``'replace'`` or ``'append'`` if the destination table already exists. The
+default value is ``'fail'``.
 
 For example, assume that ``if_exists`` is set to ``'fail'``. The following snippet will raise
 a ``TableCreationError`` if the destination table already exists.
 
 .. code-block:: python
 
-   to_gbq(df, 'my_dataset.my_table', projectid, if_exists='fail')
+   import pandas_gbq
+   pandas_gbq.to_gbq(
+       df, 'my_dataset.my_table', project_id=projectid, if_exists='fail',
+   )
+
+If the ``if_exists`` argument is set to ``'append'``, the destination
+dataframe will be written to the table using the defined table schema and
+column types. The dataframe must contain fields (matching name and type)
+currently in the destination table.
+
+
+.. _writing-schema:
+
+Inferring the Table Schema
+--------------------------
 
-.. note::
+The :func:`~pandas_gbq.to_gbq` method infers the BigQuery table schema based
+on the dtypes of the uploaded :class:`~pandas.DataFrame`.
 
-   If the ``if_exists`` argument is set to ``'append'``, the destination
-   dataframe will be written to the table using the defined table schema and
-   column types. The dataframe must contain fields (matching name and type)
-   currently in the destination table.
+========================= ==================
+dtype                     BigQuery Data Type
+========================= ==================
+i (integer)               INTEGER
+b (boolean)               BOOLEAN
+f (float)                 FLOAT
+O (object)                STRING
+S (zero-terminated bytes) STRING
+U (Unicode string)        STRING
+M (datetime)              TIMESTAMP
+========================= ==================
 
-.. note::
+If the data type inference does not suit your needs, supply a BigQuery schema
+as the ``table_schema`` parameter of :func:`~pandas_gbq.to_gbq`.
 
-   If an error occurs while streaming data to BigQuery, see
-   `Troubleshooting BigQuery Errors <https://cloud.google.com/bigquery/troubleshooting-errors>`__.
 
-.. note::
+Troubleshooting Errors
+----------------------
 
-   While BigQuery uses SQL-like syntax, it has some important differences
-   from traditional databases both in functionality, API limitations (size
-   and quantity of queries or uploads), and how Google charges for use of the
-   service. You should refer to `Google BigQuery documentation
-   <https://cloud.google.com/bigquery/docs>`__ often as the service is always
-   evolving. BiqQuery is best for analyzing large sets of data quickly, but
-   it is not a direct replacement for a transactional database.
+If an error occurs while writing data to BigQuery, see
+`Troubleshooting BigQuery Errors <https://cloud.google.com/bigquery/troubleshooting-errors>`__.
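
For readers following the new "Inferring the Table Schema" section above, a minimal sketch of the ``table_schema`` parameter in practice might look like the following. The column names, dataset, table, and project id are hypothetical; only one column is overridden, and the rest are inferred from the DataFrame dtypes as described in the table above.

    import pandas
    import pandas_gbq

    df = pandas.DataFrame(
        {
            "name": ["alpha", "beta"],                    # object dtype -> STRING (inferred)
            "signup_date": ["2018-01-01", "2018-06-01"],  # object dtype -> STRING by default
        }
    )

    # Override only "signup_date"; the "name" column keeps its inferred type.
    pandas_gbq.to_gbq(
        df,
        "my_dataset.my_table",      # hypothetical destination
        project_id="my-project",    # hypothetical project id
        table_schema=[{"name": "signup_date", "type": "DATE"}],
    )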

noxfile.py

Lines changed: 23 additions & 0 deletions
@@ -5,6 +5,7 @@
 
 import os
 import os.path
+import shutil
 
 import nox
 
@@ -51,6 +52,28 @@ def cover(session, python=latest_python):
     session.run("coverage", "erase")
 
 
+@nox.session(python=latest_python)
+def docs(session):
+    """Build the docs."""
+
+    session.install("-r", os.path.join("docs", "requirements-docs.txt"))
+    session.install("-e", ".")
+
+    shutil.rmtree(os.path.join("docs", "source", "_build"), ignore_errors=True)
+    session.run(
+        "sphinx-build",
+        "-W",  # warnings as errors
+        "-T",  # show full traceback on exception
+        "-N",  # no colors
+        "-b",
+        "html",
+        "-d",
+        os.path.join("docs", "source", "_build", "doctrees", ""),
+        os.path.join("docs", "source", ""),
+        os.path.join("docs", "source", "_build", "html", ""),
+    )
+
+
 @nox.session(python=supported_pythons)
 def system(session):
     session.install("pytest", "pytest-cov")
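
Assuming nox is installed in the development environment, the new session above can be run by name (for example ``nox --session docs``) to build the HTML documentation into docs/source/_build/html.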

pandas_gbq/gbq.py

Lines changed: 21 additions & 15 deletions
@@ -937,14 +937,18 @@ def to_gbq(
         List of BigQuery table fields to which according DataFrame
         columns conform to, e.g. ``[{'name': 'col1', 'type':
         'STRING'},...]``.
-        If schema is not provided, it will be
-        generated according to dtypes of DataFrame columns.
-        If schema is provided, it may contain all or a subset of DataFrame
-        columns. If a subset is provided, the rest will be inferred from
-        the DataFrame dtypes.
-        pandas_gbq.gbq._generate_bq_schema() may be used to create an
-        initial schema, though it doesn't preserve column order.
-        See BigQuery API documentation on available names of a field.
+
+        - If ``table_schema`` is provided, it may contain all or a subset of
+          DataFrame columns. If a subset is provided, the rest will be
+          inferred from the DataFrame dtypes.
+        - If ``table_schema`` is **not** provided, it will be
+          generated according to dtypes of DataFrame columns. See
+          `Inferring the Table Schema
+          <https://pandas-gbq.readthedocs.io/en/latest/writing.html#writing-schema>`__.
+          for a description of the schema inference.
+
+        See `BigQuery API documentation on valid column names
+        <https://cloud.google.com/bigquery/docs/schemas#column_names`>__.
 
         .. versionadded:: 0.3.1
     location : str, optional
@@ -985,6 +989,7 @@
     """
 
     _test_google_api_imports()
+    from pandas_gbq import schema
 
     if verbose is not None and SHOW_VERBOSE_DEPRECATION:
         warnings.warn(
@@ -1029,7 +1034,7 @@
     if not table_schema:
         table_schema = default_schema
     else:
-        table_schema = _update_bq_schema(
+        table_schema = schema.update_schema(
             default_schema, dict(fields=table_schema)
         )
 
@@ -1091,15 +1096,16 @@ def generate_bq_schema(df, default_type="STRING"):
 
 
 def _generate_bq_schema(df, default_type="STRING"):
-    from pandas_gbq import schema
+    """DEPRECATED: Given a dataframe, generate a Google BigQuery schema.
 
-    return schema.generate_bq_schema(df, default_type=default_type)
-
-
-def _update_bq_schema(schema_old, schema_new):
+    This is a private method, but was used in external code to work around
+    issues in the default schema generation. Now that individual columns can
+    be overridden: https://github.com/pydata/pandas-gbq/issues/218, this
+    method can be removed after there is time to migrate away from this
+    method. """
     from pandas_gbq import schema
 
-    return schema.update_schema(schema_old, schema_new)
+    return schema.generate_bq_schema(df, default_type=default_type)
 
 
 class _Table(GbqConnector):
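
To make the refactored schema-merging path above concrete, here is a rough sketch of how a partial ``table_schema`` is combined with the inferred default schema. It uses the same ``pandas_gbq.schema`` helpers that ``to_gbq`` now calls; the dictionary shapes shown in the comments are assumptions based on the mapping in pandas_gbq/schema.py, not verified output.

    import pandas
    from pandas_gbq import schema

    df = pandas.DataFrame({"id": [1, 2], "payload": ["a", "b"]})

    # Inferred from dtypes: int64 -> INTEGER, object -> STRING.
    default_schema = schema.generate_bq_schema(df)
    # Assumed shape: {'fields': [{'name': 'id', 'type': 'INTEGER'},
    #                            {'name': 'payload', 'type': 'STRING'}]}

    # A user-supplied subset overrides only the matching column, mirroring the
    # to_gbq() branch that calls schema.update_schema above.
    user_schema = [{"name": "payload", "type": "BYTES"}]
    merged = schema.update_schema(default_schema, dict(fields=user_schema))
    print(merged)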

pandas_gbq/schema.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ def generate_bq_schema(dataframe, default_type="STRING"):
         does not exist in the schema.
     """
 
+    # If you update this mapping, also update the table at
+    # `docs/source/writing.rst`.
     type_mapping = {
         "i": "INTEGER",
         "b": "BOOLEAN",

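One detail worth noting about the mapping above: dtype kinds that do not appear in ``type_mapping`` fall back to the ``default_type`` argument (``STRING`` unless overridden). A small, illustrative sketch; the timedelta column is an assumed example of a dtype kind absent from the mapping.

    import pandas
    from pandas_gbq import schema

    df = pandas.DataFrame({"wait": pandas.to_timedelta(["1 day", "2 days"])})

    # timedelta64 columns have dtype kind 'm', which is not in type_mapping,
    # so the generated field should use the default type.
    print(schema.generate_bq_schema(df, default_type="STRING"))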