
Navigating sub-directories/buckets? #61

Open
justinperkins opened this issue Mar 2, 2012 · 9 comments

@justinperkins

Once you dig into a bucket and retrieve the object contents, you just get a giant list of everything. When you're trying to read/list contents on a per-directory basis this proves difficult.

Any way or future plan to allow navigating through sub-directories (or buckets, if that's what they really are)?
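(For context: S3 has no true directories. Every object key is a flat string, and "folders" are just shared prefixes up to a delimiter, usually "/". A minimal, network-free sketch of that grouping, using made-up keys rather than a real bucket:)

```ruby
# S3 keys are flat strings; "directories" are only shared prefixes.
# Grouping keys by their first path segment client-side mimics what
# S3's delimiter-based listing does server-side.
keys = [
  "images/cats/1.jpg",
  "images/cats/2.jpg",
  "images/dogs/1.jpg",
  "docs/readme.txt"
]

top_level = keys.group_by { |k| k.split("/").first }

top_level.keys.sort # => ["docs", "images"]
```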

@qoobaa
Owner

qoobaa commented Mar 2, 2012

Hm, that'd be a nice thing to have. If you have an idea how to implement it, please submit a pull request.

Cheers.

@chouck

chouck commented Mar 3, 2012

I had the same issue yesterday. The comment about sending a :delimiter to find_all in objects_extension.rb implies that's the way to do it, but the code doesn't parse the returned data correctly.

I ended up adding the following code locally to gain this functionality:

module S3
  class Bucket
    def directory_list(options = {})
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)
      parse_directory_list_result(response.body)
    end

    def parse_directory_list_result(xml)
      names = []
      rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") { |e| names << e.text }
      names
    end
  end
end

And then just call it with

bucket.directory_list :prefix => "foo/bar/baz/"

Sorry it's not a full-blown pull request, but I don't have the time to make one right now. You are, of course, welcome to do whatever you want with this. I'd also suggest removing the documentation about :delimiter from objects_extension.rb, since it's a bit of a red herring.

Thanks for writing this module in the first place,
-Chris

@qoobaa
Owner

qoobaa commented Mar 5, 2012

Chouck, can you try to add some tests, and create a pull request for that?

@justinperkins
Author

I'm not sure that patch solves the issue I was experiencing. I want to list just the top-level directories within a given bucket; from there the patch becomes effective, since you can take each directory/sub-bucket and pass it into the directory_list method.

UPDATE: This works wonderfully. Sorry for the preemptive comment.

@justinperkins
Author

(sorry for spam)

To get objects within a given subdirectory, this patch doesn't totally solve the problem: you still have to iterate over the entire collection and select just the objects you care about, à la ...

all_objects_in_my_bucket = s3_service.buckets.find('some bucket').objects
objects_grouped_by_sub_dir = s3_service.buckets.find('some bucket').directory_list(:prefix => 'some directory with many sub directories').inject({}) { |memo, dir|
  memo[dir] = all_objects_in_my_bucket.select { |o| o.key.include?(dir) }
  memo
}

There's got to be a better way.
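(A network-free sketch of the same client-side grouping, with hypothetical keys standing in for bucket.objects. Note that start_with? avoids the false positives include? would give when a directory name happens to appear in the middle of a key:)

```ruby
# Hypothetical stand-ins for the bucket's flat object list and the
# prefixes returned by directory_list.
keys = ["foo/a.txt", "foo/b.txt", "bar/c.txt"]
dirs = ["foo/", "bar/"]

# Same shape as the inject above, but matching on the key's prefix
# rather than substring inclusion.
grouped = dirs.each_with_object({}) do |dir, memo|
  memo[dir] = keys.select { |k| k.start_with?(dir) }
end

grouped["foo/"] # => ["foo/a.txt", "foo/b.txt"]
```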

@chouck

chouck commented Mar 8, 2012

True, but I think you and I are trying to solve different problems.

I have a huge tree of data and I wanted a list of sub-directories three layers down (which are dynamically generated, so I don't have a fixed list). I don't want all of the files in each of those sub-directories; in fact, I only want one or two out of the thousands in each sub-tree.

If I'm understanding what you are saying, it sounds like you want something more like:

my_bucket = s3_service.buckets.find('some bucket')
prefix_list = my_bucket.directory_list(:prefix => 'some directory with many sub directories')
prefix_list.each { |prefix| objects_grouped_by_sub_dir[prefix] = my_bucket.objects.find_all(:prefix => prefix)}

@justinperkins
Author

Yes! Guess I should've dug in on the source some more. Thanks.

@qoobaa qoobaa added the bug label Jul 6, 2015
@ericmwalsh

Sorry to resurrect an old thread, but I needed this feature very much. Any progress on this, or a linked PR? I wouldn't mind creating one!

@ericmwalsh

ericmwalsh commented Feb 9, 2017

Also, I expanded on what @chouck created:

module S3
  class Bucket
    # This method recurses while the response coming back
    # from S3 includes a truncation flag (IsTruncated == 'true'),
    # then parses the combined response bodies' XML
    # for CommonPrefixes/Prefix, AKA directories.
    def directory_list(options = {}, responses = [])
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)

      if is_truncated?(response.body)
        directory_list(options.merge({:marker => next_marker(response.body)}), responses << response.body)
      else
        parse_xml_array(responses + [response.body], options)
      end
    end

    private

    def parse_xml_array(xml_array, options = {}, clean_path = true)
      names = []
      xml_array.each do |xml|
        rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
          if clean_path
            names << e.text.gsub((options[:prefix] || ''), '').gsub((options[:delimiter] || ''), '')
          else
            names << e.text
          end
        end
      end
      names
    end

    def next_marker(xml)
      marker = nil
      rexml_document(xml).elements.each("ListBucketResult/NextMarker") {|e| marker ||= e.text }
      if marker.nil?
        raise StandardError, "response was truncated but contained no NextMarker"
      else
        marker
      end
    end

    def is_truncated?(xml)
      is_truncated = nil
      rexml_document(xml).elements.each("ListBucketResult/IsTruncated") {|e| is_truncated ||= e.text }
      is_truncated == 'true'
    end
  end
end

This handles listing out directories when you run into a key limit (due to the S3 API MaxKeys hard limit of 1000 keys). The request will recurse and grab all responses before parsing and returning them. I also added the ability to return "clean directory names" (folder names only) in lieu of returning the entire key/path.
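(The pagination control flow above can be sketched without the network by stubbing the pages. The hash below is a made-up stand-in for S3's XML responses, not the gem's API; it only demonstrates the follow-NextMarker-while-truncated loop:)

```ruby
# Each stubbed "page" mirrors one S3 list response: the prefixes it
# contains, whether it was truncated, and the marker for the next page.
PAGES = {
  nil  => { prefixes: ["a/", "b/"], truncated: true,  next_marker: "b/" },
  "b/" => { prefixes: ["c/"],       truncated: false, next_marker: nil  }
}

# Recurse while a page is truncated, accumulating prefixes, exactly
# like directory_list accumulates response bodies before parsing.
def directory_list(marker = nil, acc = [])
  page = PAGES.fetch(marker)
  acc += page[:prefixes]
  page[:truncated] ? directory_list(page[:next_marker], acc) : acc
end

directory_list # => ["a/", "b/", "c/"]
```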
