Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we use resource as an argument? #251

Open
peterdesmet opened this issue Jul 18, 2024 · 2 comments
Open

Should we use resource as an argument? #251

peterdesmet opened this issue Jul 18, 2024 · 2 comments
Labels
function:get Functions get_schema(), get_resource() refactor

Comments

@peterdesmet
Copy link
Member

The following helper functions have package and resource_name as arguments:

  • locale()
  • read_from_path()
  • write_resource()

But they are called from functions that already have the resource. That means that currently, they recall get_resource(), in the case for locale() for a third time: read_resource() -> read_from_path() -> locale()

This could be simplified by passing resource as an argument

  • locale(): needs resource$encoding and fields. Calls get_schema()
  • read_from_path(): needs resource$dialect, schema and field_names. Calls get_schema()
  • write_resource(): needs resource

get_schema() needs package$directory

The problem with get_schema() is that is needs package$directory, so merely passing the resource is not enough. Solutions:

  1. Adding a directory argument to get_schema(): no, this is clunky
  2. Copying the package directory to resource as part of get_resource(). That way, get_schema() can get it from there. It can be removed as part of write_resource()
  3. Other?

Using resource for get_schema()

If the helper functions use resource as an argument, one could argue to also use it for the public function get_schema(resource). If we do that, get_resource() should become a public function too, which can be useful (especially for developers). The downside is that (public) functions that use resource should then check if it is valid. Most of the get_resource() checks should then move to a check_resource() function.

@damianooldoni @khusmann others... Thoughts?

@peterdesmet peterdesmet added function:get Functions get_schema(), get_resource() refactor labels Jul 18, 2024
@damianooldoni
Copy link
Contributor

Thanks @peterdesmet for the nice summary. Here below my two cents

get_schema() needs package$directory

I like option 2, adding directory to resource. Maybe you would like to add it as attribute instead? Benefit: you don't need to remove it in write_resource. It is in some way "hidden", but still accessible, e.g. attr(x$resources[[1]], "package_directory").

Here a reprex:

library(frictionless)

x <- example_package()

# Add directory as attribute of the resources
x$resources <- purrr::map(x$resources, ~ {
  attr(.x, "package_directory") <- x$directory
  .x
})

# Show the added attribute by printing all attributes of the resources
purrr::walk(x$resources, ~ print(attributes(.x)))
#> $names
#> [1] "name"      "path"      "profile"   "title"     "format"    "mediatype"
#> [7] "encoding"  "schema"   
#> 
#> $package_directory
#> [1] "C:/R/library/frictionless/extdata"
#> 
#> $names
#> [1] "name"      "path"      "profile"   "title"     "format"    "mediatype"
#> [7] "encoding"  "schema"   
#> 
#> $package_directory
#> [1] "C:/R/library/frictionless/extdata"
#> 
#> $names
#> [1] "name"    "data"    "profile" "title"   "schema" 
#> 
#> $package_directory
#> [1] "C:/R/library/frictionless/extdata"

Created on 2024-07-18 with reprex v2.1.0

Using resource for get_schema()

Indeed, I agree with all the statements here. In particular, it sounds to me not completely logical that the a user can read the data of a resource, but not getting it "as it is".

Moving the resource package checks to a check_resource() is also very nice as it makes the checks more modular and so better software speaking (better to debug/refactor/read...).

@khusmann
Copy link
Contributor

Maybe you would like to add it as attribute instead?

For a while I've been thinking about an approach close to this, but where resources are loaded by read_resource as subclassed tibbles and contain all the resource metadata in a "frictionless" attribute. (This would scope all the frictionless attrs inside the tibble, separating them from other tibble attrs related to its display, e.g. class, row.names, etc).

But I guess here we're talking about just resource descriptors, without their data loaded. Still, I think a resource class would be an interesting approach -- this class could be upgraded to the subclassed tibble when the user decided to load data (but then after loading data the object would still work with all the original descriptor fns)

The downside is that (public) functions that use resource should then check if it is valid.

The nice thing about making the resource descriptor its own class, is that you wouldn't need to always be scanning the descriptor to see if it was valid if you only allowed valid resources to be stamped with the custom class. (So the class operates in a sort of newtype pattern)

This would mean that you would not want to allow resource descriptors to be mutated directly; instead you'd provide an API that would allow the user to add / remove descriptor properties but always verify they resulted in a valid descriptor. (Similar to what I suggest in #198)

I'll see about pulling together an example to demonstrate more of what I mean...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
function:get Functions get_schema(), get_resource() refactor
Projects
None yet
Development

No branches or pull requests

3 participants