Performance considerations when reading through http #109

Open
tischi opened this issue Jul 30, 2022 · 5 comments

Comments

@tischi
Collaborator

tischi commented Jul 30, 2022

Currently, we are using this code from java.net.URL:

    /**
     * Opens a connection to this {@code URL} and returns an
     * {@code InputStream} for reading from that connection. This
     * method is a shorthand for:
     * <blockquote><pre>
     *     openConnection().getInputStream()
     * </pre></blockquote>
     *
     * @return     an input stream for reading from the URL connection.
     * @exception  IOException  if an I/O exception occurs.
     * @see        java.net.URL#openConnection()
     * @see        java.net.URLConnection#getInputStream()
     */
    public final InputStream openStream() throws java.io.IOException {
        return openConnection().getInputStream();
    }

I wonder what this actually does.
Specifically, does the returned InputStream (a) already contain all of the downloaded data, or (b) not?
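To make the two steps hidden in openStream() explicit, here is a minimal long-hand sketch. The class name, method name and timeout values are hypothetical, chosen only for illustration. My understanding (for HTTP URLs) is that openConnection() only creates a connection object without contacting the server, while getInputStream() sends the request and returns a stream over the response body, which is then transferred as it is read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class OpenStreamLonghand {

    /**
     * Long-hand version of URL.openStream(): the same two steps, with explicit timeouts added.
     * The timeout parameters are an illustrative addition, not something openStream() sets.
     */
    static InputStream openWithTimeouts(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        // Step 1: create the connection object. For http(s) URLs no request is sent yet.
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(connectTimeoutMs);
        connection.setReadTimeout(readTimeoutMs);
        // Step 2: connect, send the request and return a stream over the response body.
        // The body itself is only transferred as the caller reads from the stream.
        return connection.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL(args[0]);
        try (InputStream in = openWithTimeouts(url, 5_000, 30_000)) {
            int first = in.read(); // pulls the first byte of the body
            System.out.println("first byte: " + first);
        }
    }
}
```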

@tischi
Collaborator Author

tischi commented Jul 30, 2022

Here is something to read: https://www.baeldung.com/java-download-file

@tischi
Collaborator Author

tischi commented Jul 30, 2022

@axtimwalde, do you know the most performant way to completely load a text file from a URL into memory?
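One common way to do this, sketched minimally under the assumption of Java 9+ (for InputStream.readAllBytes()) and a UTF-8 encoded file. ReadWholeText and readAll are hypothetical names, and this has not been benchmarked against the alternatives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ReadWholeText {

    /** Reads the complete resource behind the URL into a String, assuming UTF-8 content. */
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            // readAllBytes() keeps reading until end of stream,
            // so the whole body is pulled through the single open connection.
            byte[] bytes = in.readAllBytes();
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new URL(args[0]));
        System.out.println(text.length() + " characters read");
    }
}
```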

@axtimwalde
Contributor

The InputStream does not yet contain all the downloaded data, but it can deliver the data on request. I haven't done a performance evaluation. I believe the most significant difference between the various approaches is whether you have to load the entire file or only parts of it via random access. This article is pretty comprehensive and includes loading from URLs: https://www.baeldung.com/reading-file-in-java
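For the random-access case, an HTTP server that supports byte ranges can return just part of a file. Below is a minimal sketch, assuming Java 11+ (for InputStream.readNBytes(int)). ReadPrefix and readPrefix are hypothetical names. Not every server honours the Range header, so the read is also capped on the client side.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class ReadPrefix {

    /**
     * Reads at most maxBytes from the start of the resource behind the URL.
     * For HTTP(S) URLs it also asks the server for only that byte range.
     */
    static byte[] readPrefix(URL url, int maxBytes) throws IOException {
        URLConnection connection = url.openConnection();
        if (connection instanceof HttpURLConnection && maxBytes > 0) {
            // Byte ranges are inclusive: "bytes=0-99" requests the first 100 bytes.
            connection.setRequestProperty("Range", "bytes=0-" + (maxBytes - 1));
        }
        try (InputStream in = connection.getInputStream()) {
            // A server that ignores the Range header sends the whole file (status 200),
            // so stop after maxBytes either way.
            return in.readNBytes(maxBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] head = readPrefix(new URL(args[0]), 1024);
        System.out.println("Read " + head.length + " bytes");
    }
}
```

A server that honours the header replies with status 206 (Partial Content) instead of 200.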

@tischi
Collaborator Author

tischi commented Aug 1, 2022

"The InputStream does not yet contain all the downloaded data but can deliver it at request"

@axtimwalde, this is interesting, because I think HTTP requests can have a significant overhead that is independent of the amount of data transferred.

For example, here in your code: https://github.com/saalfeldlab/n5-google-cloud/blob/master/src/main/java/org/janelia/saalfeldlab/n5/googlecloud/N5GoogleCloudStorageReader.java#L206

I would be worried that this code currently makes two HTTP requests (one on line 206 and another on line 207) just to read a small text file. Downloading all the information in one go (if possible) might be more performant. What do you think?
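For the java.net.URL code path (as opposed to the Google Cloud client library linked above, whose request pattern this does not address), my understanding is that opening the stream and reading from it belong to the same HTTP request: getInputStream() sends one GET (plus any redirects it follows), and each subsequent read() pulls more of that one response body. The hypothetical helper below drains a stream in fixed-size chunks and counts the read() calls, to make the incremental delivery visible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class ChunkedRead {

    /** Total number of bytes received and how many read() calls that took. */
    static final class Stats {
        final long bytes;
        final int reads;

        Stats(long bytes, int reads) {
            this.bytes = bytes;
            this.reads = reads;
        }
    }

    /** Opens the URL once and drains it with a fixed-size buffer. */
    static Stats drain(URL url, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int reads = 0;
        try (InputStream in = url.openStream()) {
            int n;
            // Each read() returns up to bufferSize bytes of the same response body;
            // -1 signals that the end of the body has been reached.
            while ((n = in.read(buffer)) != -1) {
                total += n;
                reads++;
            }
        }
        return new Stats(total, reads);
    }

    public static void main(String[] args) throws IOException {
        Stats stats = drain(new URL(args[0]), 8192);
        System.out.println(stats.bytes + " bytes in " + stats.reads + " read() calls");
    }
}
```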

@tischi
Collaborator Author

tischi commented Aug 1, 2022

I could not find a method that does it "in one go"; there always seems to be a first step of opening the InputStream.
I tried to benchmark this by reading a not-so-small file:

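// Requires Apache Commons IO (org.apache.commons.io.IOUtils) on the classpath
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

public class ReadTableBenchmark {
	public static void main( String[] args ) throws Exception {
		final String tableURL = "https://raw.githubusercontent.com/mobie/platybrowser-project/main/data/1.0.1/tables/sbem-6dpf-1-whole-segmented-cells/default.tsv";

		long start = System.currentTimeMillis();
		final URL url = new URL( tableURL );
		// Connects, sends the GET request and reads the response headers; the body is streamed later
		try ( InputStream inputStream = url.openStream() ) {
			System.out.println( "Open Table InputStream [ms]: " + ( System.currentTimeMillis() - start ) );

			start = System.currentTimeMillis();
			// Reads the whole response body (the actual data transfer) into a String
			final String s = IOUtils.toString( inputStream, StandardCharsets.UTF_8 );
			System.out.println( "Read InputStream into String [ms]: " + ( System.currentTimeMillis() - start ) );
		}
	}
}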

and I am getting:

Open Table InputStream [ms]: 766
Read InputStream into String [ms]: 2703

More things to explore: https://stackoverflow.com/questions/309424/how-do-i-read-convert-an-inputstream-into-a-string-in-java
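For comparison, Java 11's java.net.http.HttpClient can send a request and collect the whole body as a String in a single call via HttpResponse.BodyHandlers.ofString(). This is a minimal sketch: FetchString and fetch are hypothetical names and the timeout is an arbitrary choice. Internally it still receives the headers first and then the body, so it hides the two phases rather than removing them, and it has not been measured against url.openStream() plus IOUtils.toString().

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class FetchString {

    // One shared client, so connections can be reused across requests.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Sends a single GET and returns the whole response body decoded as a String. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // ofString() decodes with the charset from Content-Type, falling back to UTF-8.
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        String s = fetch(URI.create(args[0]));
        System.out.println("Fetched " + s.length() + " chars in [ms]: " + (System.currentTimeMillis() - start));
    }
}
```

Reusing a single HttpClient instance lets it keep connections open between requests, which may matter more than the choice of read method when many small files are fetched from the same host.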
