The 2.0 release of the google-cloud-bigquery-storage client is a significant upgrade based on a next-gen code generator, and includes substantial interface changes. Existing code written for earlier versions of this library will likely require updates to use this version. This document describes the changes that have been made, and what you need to do to update your usage.
If you experience issues or have questions, please file an issue.
WARNING: Breaking change
The 2.0.0 release requires Python 3.6+.
The library's top-level namespace is google.cloud.bigquery_storage. Importing from google.cloud.bigquery_storage_v1 still works, but it is advisable to use the google.cloud.bigquery_storage path in order to reduce the chance of future compatibility issues should the library be restructured internally.
Before:
from google.cloud.bigquery_storage_v1 import BigQueryReadClient
After:
from google.cloud.bigquery_storage import BigQueryReadClient
WARNING: Breaking change
Enum types have been moved. Access them through the types module.
Before:
from google.cloud.bigquery_storage_v1 import enums
data_format = enums.DataFormat.ARROW
data_format = BigQueryReadClient.enums.DataFormat.ARROW
After:
from google.cloud.bigquery_storage import types
data_format = types.DataFormat.ARROW
Additionally, enums can no longer be accessed through the client. The following code will not work:
data_format = BigQueryReadClient.enums.DataFormat.ARROW
WARNING: Breaking change
Clients for beta APIs have been removed. The following imports will not work:
from google.cloud.bigquery_storage_v1beta1 import BigQueryStorageClient
from google.cloud.bigquery_storage_v1beta2.gapic.big_query_read_client import BigQueryReadClient
The beta APIs are still available on the server side, but you will need to use the 1.x version of the library to access them.
The client.read_rows() method no longer accepts None as a valid value for the optional metadata argument. If not given, an empty tuple is used, but if you want to explicitly pass an "empty" value, you should use an empty tuple, too.
Before:
client.read_rows("stream_name", metadata=None)
After:
client.read_rows("stream_name", metadata=())
OR
client.read_rows("stream_name")
WARNING: Breaking change
Most of the client methods that send requests to the backend expect request objects. We provide a script that will convert most common use cases.
One exception is BigQueryReadClient.read_rows(), which is a hand-written wrapper around the auto-generated read_rows() method.
- Install the library
python3 -m pip install google-cloud-bigquery-storage
- The script fixup_bigquery_storage_v1_keywords.py is shipped with the library. It requires libcst to be installed. It expects an input directory (with the code to convert) and an empty destination directory.
$ fixup_bigquery_storage_v1_keywords.py --input-directory .samples/ --output-directory samples/
Before:
from google.cloud import bigquery_storage_v1
client = bigquery_storage_v1.BigQueryReadClient()
requested_session = bigquery_storage_v1.types.ReadSession()
requested_session.table = "projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
requested_session.data_format = bigquery_storage_v1.enums.DataFormat.ARROW
session = client.create_read_session(
    "projects/parent_project",
    requested_session,
    max_stream_count=1,
)
After:
from google.cloud import bigquery_storage
client = bigquery_storage.BigQueryReadClient()
requested_session = bigquery_storage.types.ReadSession(
    table="projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID",
    data_format=bigquery_storage.types.DataFormat.ARROW,
)
session = client.create_read_session(
    request={
        "parent": "projects/parent_project",
        "read_session": requested_session,
        "max_stream_count": 1,
    },
)
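With the session created, reading rows works much as before. A minimal sketch, assuming the session was created with at least one stream and that pyarrow is installed for the ARROW data format:
reader = client.read_rows(session.streams[0].name)
for row in reader.rows(session):
    print(row)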
In google-cloud-bigquery-storage<2.0.0, parameters required by the API were positional parameters and optional parameters were keyword parameters.
Before:
def create_read_session(
    self,
    parent,
    read_session,
    max_stream_count=None,
    retry=google.api_core.gapic_v1.method.DEFAULT,
    timeout=google.api_core.gapic_v1.method.DEFAULT,
    metadata=None,
):
In the 2.0.0 release, methods that interact with the backend have a single positional parameter, request. Method docstrings indicate whether a parameter is required or optional.
Some methods have additional keyword-only parameters. The available parameters depend on the google.api.method_signature annotation specified by the API producer.
After:
def create_read_session(
    self,
    request: storage.CreateReadSessionRequest = None,
    *,
    parent: str = None,
    read_session: stream.ReadSession = None,
    max_stream_count: int = None,
    retry: retries.Retry = gapic_v1.method.DEFAULT,
    timeout: float = None,
    metadata: Sequence[Tuple[str, str]] = (),
) -> stream.ReadSession:
NOTE: The request parameter and flattened keyword parameters for the API are mutually exclusive. Passing both will result in an error.
Both of these calls are valid:
session = client.create_read_session(
    request={
        "parent": "projects/parent_project",
        "read_session": requested_session,
        "max_stream_count": 1,
    },
)
response = client.create_read_session(
    parent="projects/parent_project",
    read_session=requested_session,
    max_stream_count=1,
)
This call is invalid because it mixes request with the keyword argument max_stream_count. Executing this code will result in an error:
session = client.create_read_session(
    request={
        "parent": "projects/parent_project",
        "read_session": requested_session,
    },
    max_stream_count=1,
)
NOTE: The request parameter of some methods can also contain a richer set of options that are otherwise not available as explicit keyword-only parameters; these options must be passed through request.
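Instead of a dict, the request can also be built as a message object. A minimal sketch using the generated CreateReadSessionRequest type from the types module:
request = bigquery_storage.types.CreateReadSessionRequest(
    parent="projects/parent_project",
    read_session=requested_session,
    max_stream_count=1,
)
session = client.create_read_session(request=request)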
WARNING: Breaking change
Several utility methods such as project_path() and table_path() have been removed. These paths must now be constructed manually:
project_path = f"projects/{PROJECT_ID}"
table_path = f"projects/{PROJECT_ID}/datasets/{DATASET_ID}/tables/{TABLE_ID}"
The two path helpers that remain are read_session_path() and read_stream_path().
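These remaining helpers can still be used to build resource names. A minimal sketch; the argument order below is an assumption based on the resource patterns projects/{project}/locations/{location}/sessions/{session}/streams/{stream}, and the values are placeholders, so verify against the method docstrings:
session_path = BigQueryReadClient.read_session_path("PROJECT_ID", "LOCATION", "SESSION_ID")
stream_path = BigQueryReadClient.read_stream_path("PROJECT_ID", "LOCATION", "SESSION_ID", "STREAM_ID")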
The client can no longer be constructed with channel or client_config arguments; these deprecated parameters have been removed.
If you used client_config to customize retry and timeout settings for a particular method, you now need to do it upon method invocation by passing the custom timeout and retry arguments, respectively.
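For example, a per-call override might look like the following. A minimal sketch; the retry predicate and the deadline and timeout values are purely illustrative:
from google.api_core import exceptions
from google.api_core import retry as retries

custom_retry = retries.Retry(
    predicate=retries.if_exception_type(exceptions.ServiceUnavailable),
    deadline=600.0,
)

session = client.create_read_session(
    request={
        "parent": "projects/parent_project",
        "read_session": requested_session,
    },
    retry=custom_retry,
    timeout=120.0,
)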