[QST] Really slow csv process #220

Open
BlueFelix opened this issue Nov 4, 2019 · 1 comment
Labels
question Further information is requested

Comments

@BlueFelix

What is your question?
I'm trying to run the NYCTaxi-E2E notebook and noticed that CSV processing is very slow.

The part below takes 2 min 34 s on 16 V100 GPUs. Is that normal?

```python
%%time
from dask.distributed import wait  # needed for wait() below, if not imported in an earlier cell

X_train = taxi_df.query('day < 25').persist()

# create a Y_train ddf with just the target variable
Y_train = X_train[['fare_amount']].persist()
# drop the target variable from the training ddf
X_train = X_train[X_train.columns.difference(['fare_amount'])]

# this won't return until all data is in GPU memory
done = wait([X_train, Y_train])
```
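Part of why this cell looks expensive is that Dask evaluates lazily: calls such as `read_csv` and `query` only build a task graph, and the real work starts when `persist()` is called. With a distributed client that work runs in the background, which is why the cell then blocks on `wait`. Assuming `taxi_df` was created in an earlier cell with a lazy reader such as `dask_cudf.read_csv`, the `%%time` above would therefore cover reading and parsing the CSVs, not just the `day < 25` filter. The toy below is plain Python, not the Dask API: the hypothetical `LazyFrame` class and `slow_read` function only mimic deferred execution to show where the time lands (and, unlike Dask on a cluster, this `persist` runs synchronously).

```python
import time
from typing import Callable, List, Optional

Step = Callable[[list], list]


class LazyFrame:
    """Hypothetical toy, not Dask: records steps and runs them only on persist()."""

    def __init__(self, steps: Optional[List[Step]] = None):
        self.steps = list(steps or [])

    def then(self, fn: Step) -> "LazyFrame":
        # Return a new frame with one more pending step; no work happens here.
        return LazyFrame(self.steps + [fn])

    def persist(self) -> list:
        # Run every pending step in order, starting from an empty input.
        data: list = []
        for fn in self.steps:
            data = fn(data)
        return data


def slow_read(_: list) -> list:
    # Hypothetical stand-in for reading CSV files over the network.
    time.sleep(0.2)
    return [{"day": d, "fare_amount": 10.0 + d} for d in range(1, 31)]


# Building the pipeline is cheap: nothing has been read yet.
t0 = time.perf_counter()
frame = LazyFrame().then(slow_read).then(lambda rows: [r for r in rows if r["day"] < 25])
build_seconds = time.perf_counter() - t0

# The read cost only shows up here, where the pipeline is executed.
t0 = time.perf_counter()
rows = frame.persist()
persist_seconds = time.perf_counter() - t0
```

Timing the read on its own (for example, persisting `taxi_df` before the query) would be one way to check how much of the 2 min 34 s is the load itself.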

BlueFelix added the question (Further information is requested) label on Nov 4, 2019
@taureandyernv
Contributor

Hey @BlueFelix, this part may be slow because it's downloading ~300GB of data into GPU memory, and bandwidth/speed can vary. It does take some time for me too; I've found that getting the data is sometimes the longest part of a notebook. :) Does this help?
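To put the numbers in perspective, here is a back-of-the-envelope throughput check. It takes the ~300GB estimate from this reply and the 2 min 34 s timing and 16 GPUs from the question at face value, and uses decimal units (1 GB = 1000 MB); `effective_throughput` is a hypothetical helper, not part of any library.

```python
def effective_throughput(total_gb: float, seconds: float, n_gpus: int) -> tuple:
    """Return (aggregate GB/s, per-GPU MB/s) for loading total_gb in the given time."""
    aggregate_gb_s = total_gb / seconds
    per_gpu_mb_s = aggregate_gb_s * 1000 / n_gpus
    return aggregate_gb_s, per_gpu_mb_s


# ~300 GB (estimate from the reply), 2 min 34 s = 154 s, 16 V100 GPUs (from the question)
agg, per_gpu = effective_throughput(300, 2 * 60 + 34, 16)
print(f"{agg:.2f} GB/s aggregate, {per_gpu:.0f} MB/s per GPU")
# prints "1.95 GB/s aggregate, 122 MB/s per GPU"
```

Comparing that aggregate rate with the read bandwidth available from wherever the CSVs live (network storage or local disk) is one way to judge whether the cell is limited by I/O rather than by the GPUs.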
