-
Notifications
You must be signed in to change notification settings - Fork 656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[copilot][flytedirectory] multipart blob download #5715
[copilot][flytedirectory] multipart blob download #5715
Conversation
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5715 +/- ##
==========================================
+ Coverage 36.85% 36.90% +0.04%
==========================================
Files 1310 1310
Lines 131246 131372 +126
==========================================
+ Hits 48377 48477 +100
- Misses 78670 78682 +12
- Partials 4199 4213 +14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wayner0628 - i think this is good. I want to get @eapolinario or @EngHabu to take a quick look at this as well though. This is a pretty core interface that's changing in this PR.
flytestdlib/storage/storage.go
Outdated
@@ -78,6 +78,9 @@ type RawStore interface { | |||
// Head gets metadata about the reference. This should generally be a light weight operation. | |||
Head(ctx context.Context, reference DataReference) (Metadata, error) | |||
|
|||
// GetItems retrieves the paths of all items from the Blob store or an error | |||
GetItems(ctx context.Context, reference DataReference) ([]string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be more accurately named ListItems? Also what is retrieved? The relative path to the reference input? can we add comment?
flytestdlib/storage/mem_store.go
Outdated
@@ -54,6 +55,23 @@ func (s *InMemoryStore) Head(ctx context.Context, reference DataReference) (Meta | |||
}, nil | |||
} | |||
|
|||
func (s *InMemoryStore) GetItems(ctx context.Context, reference DataReference) ([]string, error) { | |||
var items []string | |||
prefix := string(reference) + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will reference ever already have a /?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @wayner0628
Can you test cases like this PR?
flyteorg/flytekit#2258
To be more specifically, this case
flyte_dir_io = ContainerTask(
name="flyte_dir_io",
input_data_dir="/var/inputs",
output_data_dir="/var/outputs",
inputs=kwtypes(inputs=FlyteDirectory),
outputs=kwtypes(out=FlyteDirectory),
image="futureoutlier/rawcontainer:0320",
command=[
"python",
"write_flytedir.py",
"{{.inputs.inputs}}",
"/var/outputs/out",
],
)
If possible, please proivde screenshot, thank you.
There is also this PR, https://github.com/flyteorg/flyte/pull/5674/files which I think we should merge first. The change to core api should probably be done separately. |
@wayner0628 #5741 this was just merged, adding a list api to the storage client. mind using the new interface to do this? |
@wild-endeavor No problem, I'll update this PR to align with the new interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tips to develop copilot in single binary.
- config
plugins:
logs:
dynamic-log-links:
- comet-ml-execution-id:
displayName: Comet
templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .executionName }}{{ .nodeId }}{{ .taskRetryAttempt }}{{ .taskConfig.link_suffix }}"
- comet-ml-custom-id:
displayName: Comet
templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .taskConfig.experiment_key }}"
kubernetes-enabled: true
kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
cloudwatch-enabled: false
stackdriver-enabled: false
k8s:
default-env-vars:
- FLYTE_AWS_ENDPOINT: "http://flyte-sandbox-minio.flyte:9000"
- FLYTE_AWS_ACCESS_KEY_ID: minio
- FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
- MLFLOW_TRACKING_URI: postgresql+psycopg2://postgres:@postgres.flyte.svc.cluster.local:5432/flyteadmin
co-pilot:
image: "localhost:30000/copilot-flytefile:0603"
- how to build copilot image?
useDockerfile.flytecopilot
to build it.
…ltipart-blob Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Hi all, I've tested the downloader with a Python task that writes to a FlyteDirectory, and the raw container is able to read it successfully, as shown below: Additionally, I tested the scenario where a raw container writes a directory, and a downstream Python task reads it. It appears that the sidecar (uploader for the container task) does not currently support multi-blob tasks. Addressing this limitation will require more time, but since we already have the downloader with directory support, I suggest we proceed with merging this PR and address the sidecar support in a follow-up PR. (#5924) I'll update the testing .py code and this screenshot in the description, thank you! |
It's looking pretty good, will come here at Thursday or Friday, thank you Wayner. |
Signed-off-by: wayner0628 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo:
- 1 download error, return error in
handleBlob
- upload todo: recursive directory upload (create an issue first)
os.MkdirAll(dir, 0777)
add comments (it will handle case if dir exist already)- success count error handling?
- readerSuccess, writer success
- add comments to explain life cycles and error handling
https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L2177-L2201 - comments (go routien leaks, error handling...)
- add comments to tell others list api handle recursive case already
- add comments to tell user the down code is single blob
- todo: add comments that we should have timeout
- create an issue about sidecar rename to upload
- ping fabio and Buğra Gedik to take a look at mem store changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review from @eapolinario
- can stow api install more than 1 file a time (We think not)
- let's not set a limit of files we can download now (may be add comments to do this if someone hit rate limit in the future)
- follow up: use chunk to download it (don't need to do it in this PR)
Thank you @wayner0628 , lots of people want it for a LONG TIME.
let's add comments and fix the error handling then merge it
Signed-off-by: wayner0628 <[email protected]>
Hi @Future-Outlier, I’ve pushed some modifications based on your feedback—please take a look!
I’m not sure what this means. Could you clarify what changes you’d like here? |
Yes no problem, will do it today. |
Signed-off-by: Wei-Yu Kao <[email protected]>
Co-authored-by: Han-Ru Chen (Future-Outlier) <[email protected]> Signed-off-by: Wei-Yu Kao <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you Wayner
Tracking issue
#3632
Why are the changes needed?
Supporting multipart blob downloads allows us to completely copy the specified directory into the input path.
What changes were proposed in this pull request?
List
api to collect items under container before downloadList
api for memory storageHow was this patch tested?
Unit testing and E2E testing
Setup process
docker build -f Dockerfile.flytecopilot -t my-flytecopilot-app:latest . docker tag my-flytecopilot-app:latest localhost:30000/my-flytecopilot-app:latest docker push localhost:30000/my-flytecopilot-app:latest
Screenshots
Unit testing
E2E testing
Check all the applicable boxes
Related PRs
flyteorg/flytekit#2258
Docs link
NA