Performance considerations when reading through http #109

tischi · 2022-07-30T08:10:04Z

Currently we are using this code from java.net:

/**
     * Opens a connection to this {@code URL} and returns an
     * {@code InputStream} for reading from that connection. This
     * method is a shorthand for:
     * <blockquote><pre>
     *     openConnection().getInputStream()
     * </pre></blockquote>
     *
     * @return     an input stream for reading from the URL connection.
     * @exception  IOException  if an I/O exception occurs.
     * @see        java.net.URL#openConnection()
     * @see        java.net.URLConnection#getInputStream()
     */
    public final InputStream openStream() throws java.io.IOException {
        return openConnection().getInputStream();
    }

I wonder what that actually does?
Specifically, does the InputStream (a) already contain all the downloaded data or (b) not?


tischi · 2022-07-30T08:11:39Z

Here is something to read: https://www.baeldung.com/java-download-file

tischi · 2022-07-30T08:14:03Z

@axtimwalde do you know the most performant way to completely load a text file from a URL into memory?

axtimwalde · 2022-07-31T17:21:58Z

The InputStream does not yet contain all the downloaded data but can deliver it on request. I haven't done a performance evaluation. I believe the most significant difference between the various approaches is whether you have to load the entire file or only some parts of it via random access. This article is pretty comprehensive and includes loading from URLs: https://www.baeldung.com/reading-file-in-java
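
To illustrate the "deliver on request" behaviour, here is a minimal sketch that pulls a stream into memory chunk by chunk. The class name `StreamToString` and the buffer size are assumptions for illustration, not code from this repository. Each call to `read` blocks until some bytes are available, so for a URL stream the data arrives while the loop runs rather than when the stream is opened.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class StreamToString {

    /** Drains the stream into memory and decodes the bytes as UTF-8. */
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192]; // assumed chunk size
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

It could be used as `StreamToString.readFully(url.openStream())`, ideally inside a try-with-resources block so the stream gets closed. On Java 9 and later, `InputStream.readAllBytes()` should do the same draining step without the explicit loop.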

tischi · 2022-08-01T06:47:24Z

"The InputStream does not yet contain all the downloaded data but can deliver it on request"

@axtimwalde
This is interesting, because I think HTTP requests can have a significant overhead that is independent of the amount of data transferred.

For example here in your code: https://github.com/saalfeldlab/n5-google-cloud/blob/master/src/main/java/org/janelia/saalfeldlab/n5/googlecloud/N5GoogleCloudStorageReader.java#L206

I would be worried that this code currently entails two HTTP requests (one in line 206 and another one in line 207), just for reading a small text file. Downloading all the information in one go (if possible) might be more performant. What do you think?
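
For the plain `java.net.URL` case (not the Google Cloud client linked above, which may behave differently), one way to check how many requests a read causes is to point it at a local server that counts them. The following is only a sketch: the class name `RequestCounter`, the path `/table.tsv` and the use of the JDK's built-in `com.sun.net.httpserver.HttpServer` are assumptions made for this test setup.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test harness, for illustration only.
final class RequestCounter {

    /** Serves a non-empty body locally, reads it once via URL.openStream(), returns the request count. */
    static int countRequestsForOneRead(byte[] body) throws IOException {
        if (body.length == 0) {
            throw new IllegalArgumentException("this sketch assumes a non-empty body");
        }
        AtomicInteger requests = new AtomicInteger();
        // Port 0 lets the operating system pick a free port
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/table.tsv", exchange -> {
            requests.incrementAndGet();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/table.tsv");
            byte[] buffer = new byte[8192];
            try (InputStream in = url.openStream()) {
                // Drain the body; the bytes themselves are not needed here
                while (in.read(buffer) != -1) {
                    // nothing to do
                }
            }
            return requests.get();
        } finally {
            server.stop(0);
        }
    }
}
```

Without redirects, I would expect opening the stream and draining it to show up as a single request in this setup, which would mean the two steps are two phases of one HTTP exchange rather than two round trips.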

tischi · 2022-08-01T08:02:40Z

I could not find a method that does it "in one go". There always seems to be a first step of opening the InputStream.
I tried to benchmark this by reading a not-so-small file:

		// Needs: java.io.InputStream, java.net.URL,
		// java.nio.charset.StandardCharsets, org.apache.commons.io.IOUtils
		long start;

		final String tableURL = "https://raw.githubusercontent.com/mobie/platybrowser-project/main/data/1.0.1/tables/sbem-6dpf-1-whole-segmented-cells/default.tsv";

		// Phase 1: open the connection and obtain the stream
		start = System.currentTimeMillis();
		URL url = new URL(tableURL);
		final InputStream inputStream = url.openStream();
		System.out.println("Open Table InputStream [ms]: " + ( System.currentTimeMillis() - start ));

		// Phase 2: read the whole stream into a String using Apache Commons IO
		start = System.currentTimeMillis();
		final String s = IOUtils.toString(inputStream, StandardCharsets.UTF_8.name());
		System.out.println("Read InputStream into String [ms]: " + ( System.currentTimeMillis() - start ));
		inputStream.close();

and I am getting:

Open Table InputStream [ms]: 766
Read InputStream into String [ms]: 2703

More things to explore: https://stackoverflow.com/questions/309424/how-do-i-read-convert-an-inputstream-into-a-string-in-java
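
A variant of the benchmark above that does not depend on Apache Commons IO and skips the String decoding could look like the sketch below, so that only the opening and the byte transfer are timed. The class name `PhaseTimer` and the returned array layout are assumptions for illustration. Note that a first run typically also pays for one-off costs such as DNS lookup, connection setup and class loading, so repeating the measurement a few times is probably worthwhile.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, for illustration only.
final class PhaseTimer {

    /**
     * Returns { nanoseconds until the stream was available,
     *           nanoseconds spent draining the stream,
     *           number of bytes read }.
     */
    static long[] timePhases(URL url) throws IOException {
        long t0 = System.nanoTime();
        URLConnection connection = url.openConnection();
        try (InputStream in = connection.getInputStream()) {
            long t1 = System.nanoTime();
            byte[] buffer = new byte[8192]; // assumed chunk size
            long total = 0;
            int n;
            // Drain the stream without decoding or storing the bytes
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            long t2 = System.nanoTime();
            return new long[] { t1 - t0, t2 - t1, total };
        }
    }
}
```

Calling `PhaseTimer.timePhases(new URL(tableURL))` several times in a row and comparing the first call with the later ones might show how much of the 766 ms is one-time setup.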

tischi added the discussion label Jul 30, 2022
