Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-426] Update googledrive provider to use googledrive v3 API #276

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from
Open

Conversation

TomBaxter
Copy link
Member

@TomBaxter TomBaxter commented Oct 2, 2017

Ticket

SVCS-426

Purpose

Update GoogleDrive provider to v3 of the GoogleDrive API

Changes

Substantial changes to all aspects of the provider.

Of particular note:

GD v3 API has no method of downloading GoogleDoc revisions.
This PR leaves behind GD v2 calls, in order to maintain this
functionality as long as possible.

GD v3 returns minimal representations of resources. Fields must be
specified, to be returned.

Many field names changed. Most commonly in the provider:
title -> name
modifiedDate -> modifiedTime
fileSize -> size

etags are no longer available from GoogleDrive

exportLinks are no longer available in the 'file' representation of
GoogleDoc files.

GD v3 returns lists of resources as either 'files' or 'revisions' as
opposed to GD v2 which returned 'items' for all resource lists.

Side effects

None expected

QA Notes

This provider will need a full test of all functionality.

Deployment Notes

We will need to keep an eye on 'End of Life' announcements for GoogleDrive v2 API. As we have kept v2 call for GoogleDocs revisions downloads.

@coveralls
Copy link

coveralls commented Oct 17, 2017

Coverage Status

Coverage increased (+0.08%) to 89.091% when pulling d7c8a4e on TomBaxter:feature/osf-426 into 481c9d9 on CenterForOpenScience:develop.

@TomBaxter TomBaxter changed the title [WIP] Update googledrive provider to use googledrive v3 API Update googledrive provider to use googledrive v3 API Oct 17, 2017
Copy link
Contributor

@AddisonSchiller AddisonSchiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly style changes + a few questions to start.

@cslzchen cslzchen changed the title Update googledrive provider to use googledrive v3 API [SVCS-426] Update googledrive provider to use googledrive v3 API Oct 30, 2017
Copy link
Contributor

@AddisonSchiller AddisonSchiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very long line length change needed.
One moving of functions change.
after this, will be good for next phase

@@ -75,10 +74,12 @@ class GoogleDriveProvider(provider.BaseProvider):
NAME = 'googledrive'
BASE_URL = settings.BASE_URL
FOLDER_MIME_TYPE = 'application/vnd.google-apps.folder'
FILE_FIELDS = {'fields': 'version, id, name, size, modifiedTime, createdTime, mimeType, webViewLink, webContentLink, md5Checksum, capabilities(canDelete, canEdit, canTrash, canDownload, canRename, canReadRevisions, canShare, canCopy)'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line length

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed.

@@ -140,6 +141,45 @@ def can_intra_move(self,
path=None) -> bool:
return self == other

def build_move_url(self, src_path, dest_path):
Copy link
Contributor

@AddisonSchiller AddisonSchiller Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: you split up can_intra_move and can_intra_copy with all the build URL methods.
Move all the can_<do something> methods so they are all right by each other again (including can_duplicate_names

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed.

@coveralls
Copy link

coveralls commented Nov 13, 2017

Coverage Status

Coverage increased (+0.9%) to 90.003% when pulling e790d6d on TomBaxter:feature/osf-426 into 473191c on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 13, 2017

Coverage Status

Coverage increased (+0.06%) to 90.003% when pulling b0e31a2 on TomBaxter:feature/osf-426 into 26bf209 on CenterForOpenScience:develop.

Copy link
Contributor

@AddisonSchiller AddisonSchiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Ready for next phase.

@cslzchen
Copy link
Contributor

@TomBaxter Let's sit down and do an in-person review for this.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick look view a few questions. Let's discuss in person @TomBaxter 🎆

@@ -159,7 +159,8 @@ def created_utc(self):

@property
def etag(self):
return self.raw['etag']
# Google Doc revision representations do not return etag
return '{}::{}'.format(self.raw['id'], self.raw['modifiedTime'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we use etag or its alternative?
etag tracks both file and metadata change while modifiedTime only tracks file content change. Can this be a problem? Is it possible to still use v2 API for etag?
It is possible for file have the different modifiedTime but the same hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I brought this up with @felliott at the time of writing PR. Can't remember the specifics of the conversation though. Let's revisit with him. Not sure which is the less of two evils. Sub par etag or staying on V2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More question for @felliott: What do we use etag for?

'id': dest_path.parent.identifier
}],
'title': dest_path.name
'name': dest_path.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dropping 'parents' intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For myself, double check where parents is retrieved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_docs_file = drive_utils.is_docs_file(metadata.raw)
# GoogleDrive v3 API does not allow the download of GoogleDoc revisions
# Use v2 API for this functionality
if revision and is_docs_file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The if..elif...else... makes the logic much clearer than metadata.raw.get('downloadUrl') or drive_utils.get_export_link(metadata.raw). I used to have a hard time understanding the how it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments to make the meaning of the four conditions more clear.

@@ -146,6 +149,45 @@ def can_intra_copy(self,
# gdrive doesn't support intra-copy on folders
return self == other and (path and path.is_file)

def build_move_url(self, src_path, dest_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many methods here. It is not intuitive to use. Let's discuss if we can make only two util methods: build_v3_url() and build_v2_url()? The actions are parsed as arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I made them separate for ease of use in tests. Have now re-factored them all out. Complete.

# GoogleDrive v3 API does not allow the download of GoogleDoc revisions
# Use v2 API for this functionality
if revision and is_docs_file:
meta_url = self.build_url('files', path.identifier, 'revisions', revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, let's make a build_url_v2().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do this, however, I was holding off creating a new method for logic that will only be run in one place and likely removed entirely in the future. If it turns out that we want to use v2 for retrieving etags then it might make more sense to me to make build_url_v2 it's own thing.

meta_url = self.build_url('files', path.identifier, 'revisions', revision)
meta_url = meta_url.replace('/v3/', '/v2/', 1)

async with self.request(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some difference between the old and the new logic. Let's discuss.


def _build_upload_metadata(self, folder_id: str, name: str) -> dict:
return {
'parents': [
{
'kind': 'drive#parentReference',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to V3 upgrade?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why it was coded that way previously. I don't believe there was a change in the API. And I don't believe it was coded correctly originally.

@@ -24,6 +24,7 @@
'type': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
},
]
DOCS_MIMES = [format['mime_type'] for format in DOCS_FORMATS]
DOCS_DEFAULT_FORMAT = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add 'mime_type': None to the default one?

Copy link
Member Author

@TomBaxter TomBaxter Dec 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked code. mime_type is only referenced after a check for is_google_doc , so adding mime_type = '' for consistency, though it currently will never be referenced.

@@ -685,22 +763,22 @@ def _build_upload_metadata(self, folder_id: str, name: str) -> dict:
resp = await self.make_request(
'GET',
self.build_url('files',
q="'{}' in parents".format(file_id),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have trashed = false before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prevents returning deleted files.

if drive_utils.is_docs_file(data):
if can_access_revisions:
if data['capabilities'].get('canReadRevisions', None) and data['capabilities']['canReadRevisions']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

if data.get('capabilities', {}).get('canReadRevisions', None):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These do the same thing, but yours is shorter. Change made.

@coveralls
Copy link

coveralls commented Dec 18, 2017

Coverage Status

Coverage increased (+0.03%) to 89.802% when pulling f3d87c5 on TomBaxter:feature/osf-426 into 3ec764a on CenterForOpenScience:develop.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. Looks good to me. I will test it fully locally (and make a PR with minor changes if necessary).

@coveralls
Copy link

coveralls commented Jan 3, 2018

Coverage Status

Coverage increased (+0.03%) to 89.898% when pulling 85d497b on TomBaxter:feature/osf-426 into cae41fc on CenterForOpenScience:develop.

[SVCS-426 ]
GoogleDrive migration guide
https://developers.google.com/drive/v3/web/migration

Of particular note:

GD v3 API has no method of downloading GoogleDoc revisions.
This PR leaves behind GD v2 calls, in order to maintain this
functionality as long as possible.

GD v3 returns minimal representations of resources. Fields must be
specified, to be returned.

Many field names changed. Most commonly in the provider:
title -> name
modifiedDate -> modifiedTime
fileSize -> size

etags are no longer available from GoogleDrive

exportLinks are no longer available in the 'file' representation of
GoogleDoc files.

GD v3 returns lists of resources as either 'files' or 'revisions' as
opposed to GD v2 which returned 'items' for all resource lists.
@coveralls
Copy link

coveralls commented Jan 5, 2018

Coverage Status

Coverage increased (+0.03%) to 89.907% when pulling ac5dc54 on TomBaxter:feature/osf-426 into f17d276 on CenterForOpenScience:develop.

@cslzchen cslzchen removed the Add'l Dev label Jan 5, 2018
@cslzchen
Copy link
Contributor

cslzchen commented Jan 5, 2018

Update: secondary PR https://github.com/TomBaxter/waterbutler/pull/1 pending.

@coveralls
Copy link

coveralls commented Jan 8, 2018

Coverage Status

Coverage increased (+0.03%) to 89.912% when pulling 6be0f45 on TomBaxter:feature/osf-426 into f17d276 on CenterForOpenScience:develop.

@cslzchen
Copy link
Contributor

cslzchen commented Jan 9, 2018

Update: a second secondary PR https://github.com/TomBaxter/waterbutler/pull/2 pending.

- gdrive metadata and provider tests
[SVCS-426] Update Style and Trivial Code Change for Tests
@coveralls
Copy link

coveralls commented Jan 9, 2018

Coverage Status

Coverage increased (+0.1%) to 90.019% when pulling 9f314e9 on TomBaxter:feature/osf-426 into f17d276 on CenterForOpenScience:develop.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and move to PCR. 🎆 🎆 🎆 🎆 🎆

@felliott The side effect of the style refactor we did is that the diffs are harder to read. You can refer to the old commit: ac5dc54 for better diffs.

]

# Use dummy ID if no revisions found
metadata = await self.metadata(path, raw=True)
# TODO: considering keep using v2 to obtain etag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 2: etag is no longer available through v3, but we can still use v2.\

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 2: please be ware that using id, modifiedTime (and even with md5Checksum) is not equivalent to etag. Here is a useful discussion I found: https://stackoverflow.com/questions/42174600/alternative-for-etag-field-in-google-drive-v3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants