Releases: r-lib/vctrs
vctrs 0.3.0
This version features an overhaul of the coercion system to make it
more consistent and easier to implement. See the Breaking changes
and Type system sections for details.
There are three new documentation topics if you'd like to learn how to
implement coercion methods to make your class compatible with
tidyverse packages like dplyr:
- https://vctrs.r-lib.org/reference/theory-faq-coercion.html for an
  overview of the coercion mechanism in vctrs.
- https://vctrs.r-lib.org/reference/howto-faq-coercion.html for a
  practical guide about implementing methods for vectors.
- https://vctrs.r-lib.org/reference/howto-faq-coercion-data-frame.html
  for a practical guide about implementing methods for data frames.
Reverse dependencies troubleshooting
The following errors are caused by breaking changes.
- "Can't convert <character> to <list>."
  `vec_cast()` no longer converts to list. Use `vec_chop()` or
  `as.list()` instead.
- "Can't convert <integer> to <character>."
  `vec_cast()` no longer converts to character. Use `as.character()`
  to deparse objects.
- "names for target but not for current"
  Names of list-columns are now preserved by `vec_rbind()`. Adjust
  tests accordingly.
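As a rough illustration of the first two errors, the sketch below triggers each failing cast and then applies the suggested replacement. It assumes a vctrs version with the 0.3.0 semantics or later; the variable names are illustrative only.

```r
library(vctrs)

x <- c("a", "b")

# Casting a character vector to a list is no longer allowed.
cast_to_list <- tryCatch(vec_cast(x, list()), error = function(e) e)

# Suggested replacements: one list element per vector element.
chopped <- vec_chop(x)
as_listed <- as.list(x)

# Casting integers to character is no longer allowed either.
cast_to_chr <- tryCatch(vec_cast(1:3, character()), error = function(e) e)

# Use as.character() to deparse instead.
deparsed <- as.character(1:3)
```

Both `tryCatch()` calls return the condition object rather than a converted vector, so package code relying on the old behaviour needs one of the explicit replacements.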
Breaking changes
- Double-dispatch methods for `vec_ptype2()` and `vec_cast()` are no
  longer inherited (#710). Class implementers must implement one set
  of methods for each compatible class.

  For example, a tibble subclass no longer inherits from the
  `vec_ptype2()` methods between `tbl_df` and `data.frame`. This means
  that you explicitly need to implement `vec_ptype2()` methods with
  `tbl_df` and `data.frame`.

  This change requires a bit more work from class maintainers but is
  safer because the coercion hierarchies are generally different from
  class hierarchies. See the S3 dispatch section of `?vec_ptype2` for
  more information.
- `vec_cast()` is now restricted to the same conversions as
  `vec_ptype2()` methods (#606, #741). This change is motivated by
  safety and performance:
  - It is generally sloppy to generically convert arbitrary inputs to
    one type. Restricted coercions are more predictable and allow your
    code to fail earlier when there is a type issue.
  - When unrestricted conversions are useful, this is generally
    towards a known type. For example, `glue::glue()` needs to convert
    arbitrary inputs to the known character type. In this case, using
    double dispatch instead of a single dispatch generic like
    `as.character()` is wasteful.
  - To implement the useful semantics of coercible casts (already used
    in `vec_assign()`), two double dispatches were needed. Now it can
    be done with one double dispatch by calling `vec_cast()` directly.
- `stop_incompatible_cast()` now throws an error of class
  `vctrs_error_incompatible_type` rather than
  `vctrs_error_incompatible_cast`. This means that `vec_cast()` also
  throws errors of this class, which better aligns it with
  `vec_ptype2()` now that they are restricted to the same conversions.
- The `y` argument of `stop_incompatible_cast()` has been renamed to
  `to` to better match `to_arg`.
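A minimal sketch of what the shared error class means for calling code: one classed handler can catch both a failed cast and a failed common-type computation. It assumes vctrs 0.3.0 or later; the handler's return value is made up for illustration.

```r
library(vctrs)

# Handle a failed cast by its condition class.
cast_result <- tryCatch(
  vec_cast("a", integer()),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)

# The same handler also catches a failed vec_ptype2().
ptype_result <- tryCatch(
  vec_ptype2("a", 1L),
  vctrs_error_incompatible_type = function(cnd) "incompatible"
)
```

Code that previously matched on `vctrs_error_incompatible_cast` would need to switch to the class used here.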
Type system
- Double-dispatch methods for `vec_ptype2()` and `vec_cast()` are now
  easier to implement. They no longer need any boilerplate.
  Implementing a method for classes `foo` and `bar` is now as simple as:

      #' @export
      vec_ptype2.foo.bar <- function(x, y, ...) new_foo()

  vctrs also takes care of implementing the default and unspecified
  methods. If you have implemented these methods, they are no longer
  called and can now be removed.

  One consequence of the new dispatch mechanism is that `NextMethod()`
  is now completely unsupported. This is for the best as it never
  worked correctly in a double-dispatch setting. Parent methods must
  now be called manually.
- `vec_ptype2()` methods now get zero-size prototypes as inputs. This
  guarantees that methods do not peek at the data to determine the
  richer type.
- `vec_is_list()` no longer allows S3 lists that implement a
  `vec_proxy()` method to automatically be considered lists. An S3
  list must explicitly inherit from `"list"` in the base class to be
  considered a list.
- `vec_restore()` no longer restores row names if the target is not a
  data frame. This fixes an issue where `POSIXlt` objects would carry
  a `row.names` attribute after a proxy/restore roundtrip.
- `vec_cast()` to and from data frames preserves the row names of
  inputs.
- The internal function `vec_names()` now returns row names if the
  input is a data frame. Similarly, `vec_set_names()` sets row names
  on data frames. This is part of a general effort at making row names
  the vector names of data frames in vctrs.

  If necessary, the row names are repaired verbosely but without error
  to make them unique. This should be a mostly harmless change for
  users, but it could break unit tests in packages if they make
  assumptions about the row names.
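The double-dispatch item above can be sketched end to end with a toy class. Everything named `my_pct`, `new_pct()` and `register()` is hypothetical and only serves the illustration. The sketch registers the methods by hand with base R's `registerS3method()` so that it runs as a plain script; in a package you would normally write top-level functions such as `vec_ptype2.my_pct.double` and export them as shown in the notes.

```r
library(vctrs)

# Hypothetical class for illustration: a double vector tagged "my_pct".
new_pct <- function(x = double()) new_vctr(x, class = "my_pct")

# Hypothetical helper: registers a method in the S3 table of the vctrs
# generics. In a package, use exported top-level methods instead.
register <- function(generic, classes, fn) {
  registerS3method(generic, classes, fn, envir = asNamespace("vctrs"))
}

# Common type: my_pct is the richer type, in both argument orders.
# The methods receive zero-size prototypes, so they only build a prototype.
register("vec_ptype2", "my_pct.my_pct", function(x, y, ...) new_pct())
register("vec_ptype2", "my_pct.double", function(x, y, ...) new_pct())
register("vec_ptype2", "double.my_pct", function(x, y, ...) new_pct())

# Casts: method names follow the pattern vec_cast.<to>.<from>.
register("vec_cast", "my_pct.my_pct", function(x, to, ...) x)
register("vec_cast", "my_pct.double", function(x, to, ...) new_pct(x))
register("vec_cast", "double.my_pct", function(x, to, ...) vec_data(x))

# Combining a my_pct with a bare double now yields a my_pct vector.
combined <- vec_c(new_pct(0.1), 0.2)
```

Note that each pair of classes gets its own explicit method, with no reliance on `NextMethod()` or on inheritance from a parent class.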
Compatibility and fallbacks
- With the double dispatch changes, the coercion methods are no longer
  inherited from parent classes. This is because the coercion
  hierarchy is in principle different from the S3 hierarchy. A
  consequence of this change is that subclasses that don't implement
  coercion methods are now in principle incompatible.

  This is particularly problematic with subclasses of data frames for
  which throwing incompatible errors would be too inconvenient for
  users. To work around this, we have implemented a fallback to the
  relevant base data frame class (either `data.frame` or `tbl_df`) in
  coercion methods (#981). This fallback is silent unless you set the
  `vctrs:::warn_on_fallback` option to `TRUE`.

  In the future we may extend this fallback principle to other base
  types when they are explicitly included in the class vector (such as
  `"list"`).
- Improved support for foreign classes in the combining operations
  `vec_c()`, `vec_rbind()`, and `vec_unchop()`. A foreign class is a
  class that doesn't implement `vec_ptype2()`. When all the objects to
  combine have the same foreign class, one of these fallbacks is
  invoked:
  - If the class implements a `base::c()` method, the method is used
    for the combination. (FIXME: `vec_rbind()` currently doesn't use
    this fallback.)
  - Otherwise if the objects have identical attributes and the same
    base type, we consider them to be compatible. The vectors are
    concatenated and the attributes are restored (#776).

  These fallbacks do not make your class completely compatible with
  vctrs-powered packages, but they should help in many simple cases.
- `vec_c()` and `vec_unchop()` now fall back to `base::c()` for S4
  objects if the object doesn't implement `vec_ptype2()` but sets an
  S4 `c()` method (#919).
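The second foreign-class fallback (identical attributes, same base type) can be sketched as follows. The class name `my_foreign` is hypothetical: it has no `vec_ptype2()` method and no `c()` method, which is the situation the fallback is meant for.

```r
library(vctrs)

# Hypothetical foreign class with no vctrs methods and no c() method.
x <- structure(1:2, class = "my_foreign")
y <- structure(3:4, class = "my_foreign")

# Same class, same attributes, same base type: the inputs are treated as
# compatible, concatenated, and the class is restored on the result.
out <- vec_c(x, y)
```

If the two inputs carried different attributes, they would no longer count as the same type and this fallback would not apply.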
Vector operations
- `vec_rbind()` and `vec_c()` with data frame inputs now consistently
  preserve the names of list-columns, df-columns, and matrix-columns
  (#689). This can cause some false positives in unit tests, if they
  are sensitive to internal names (#1007).
- `vec_rbind()` now repairs row names silently to avoid confusing
  messages when the row names are not informative and were not created
  on purpose.
- `vec_rbind()` gains an option to treat input names as row names.
  This is disabled by default (#966).
- New `vec_rep()` and `vec_rep_each()` for repeating an entire vector
  and elements of a vector, respectively. These two functions provide
  a clearer interface for the functionality of `vec_repeat()`, which
  is now deprecated.
- `vec_cbind()` now calls `vec_restore()` on inputs emptied of their
  columns before computing the common type. This has consequences for
  data frame classes with special columns that devolve into simpler
  classes when the columns are subsetted out. These classes are now
  always simplified by `vec_cbind()`.

  For instance, column-binding a grouped data frame with a data frame
  now produces a tibble (the simplified class of a grouped data
  frame).
- `vec_match()` and `vec_in()` gain parameters for argument tags
  (#944).
- The internal version of `vec_assign()` now has support for assigning
  names and inner names. For data frames, the names are assigned
  recursively.
- `vec_assign()` gains `x_arg` and `value_arg` parameters (#918).
- `vec_group_loc()`, which powers `dplyr::group_by()`, now has more
  efficient vector access (#911).
- `vec_ptype()` gained an `x_arg` argument.
- New `list_sizes()` for computing the size of every element in a
  list. `list_sizes()` is to `vec_size()` as `lengths()` is to
  `length()`, except that it only supports lists. Atomic vectors and
  data frames result in an error.
- `new_data_frame()` infers size from row names when `n = NULL`
  (#894).
- `vec_c()` now accepts `rlang::zap()` as `.name_spec` input. The
  returned vector is then always unnamed, and the names do not cause
  errors when they can't be combined. They are still used to create
  more informative messages when the inputs have incompatible types
  (#232).
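To make the difference between the new repetition helpers concrete, and to show what `list_sizes()` measures, here is a small sketch. The input values are arbitrary.

```r
library(vctrs)

x <- 1:2

# Repeat the whole vector three times.
whole <- vec_rep(x, 3)

# Repeat each element three times before moving to the next one.
each <- vec_rep_each(x, 3)

# Size of every list element; a data frame's size is its number of rows.
sizes <- list_sizes(list(1:3, c("a", "b"), data.frame(x = 1:4)))
```

Here `whole` is `1 2 1 2 1 2`, `each` is `1 1 1 2 2 2`, and `sizes` is `3 2 4`, whereas `lengths()` would report 1 for the data frame element because it counts columns.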
Classes
- vctrs now supports the `data.table` class. The common type of a data
  frame and a data table is a data table.
- `new_vctr()` now always appends a base `"list"` class to list `.data`
  to be compatible with changes to `vec_is_list()`. This affects
  `new_list_of()`, which now returns an object with a base class of
  `"list"`.
- dplyr methods are now implemented for `vec_restore()`,
  `vec_ptype2()`, and `vec_cast()`. The user-visible consequence (and
  breaking change) is that row-binding a grouped data frame and a data
  frame or tibble now returns a grouped data frame. It would
  previously return a tibble.
- The `is.na<-()` method for `vctrs_vctr` now supp...
vctrs 0.2.4
- Factors and dates methods are now implemented in C for efficiency.
- `new_data_frame()` now correctly updates attributes and supports
  merging of the `"names"` and `"row.names"` arguments (#883).
- `vec_match()` gains an `na_equal` argument (#718).
- `vec_chop()`'s `indices` argument has been restricted to positive
  integer vectors. Character and logical subscripts haven't proven
  useful, and this aligns `vec_chop()` with `vec_unchop()`, for which
  only positive integer vectors make sense.
- New `vec_unchop()` for combining a list of vectors into a single
  vector. It is similar to `vec_c()`, but gives greater control over
  how the elements are placed in the output through the use of a
  secondary `indices` argument.
- Breaking change: When `.id` is supplied, `vec_rbind()` now creates
  the identifier column at the start of the data frame rather than at
  the end.
- `numeric_version` and `package_version` lists are now treated as
  vectors (#723).
- `vec_slice()` now properly handles symbols and S3 subscripts.
- `vec_as_location()` and `vec_as_subscript()` are now fully
  implemented in C for efficiency.
- `num_as_location()` gains a new argument, `zero`, for controlling
  whether to `"remove"`, `"ignore"`, or `"error"` on zero values
  (#852).
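A short sketch of three of the items above: chopping with positive integer `indices`, the `na_equal` argument of `vec_match()`, and the `zero` argument of `num_as_location()`. Inputs are arbitrary and the variable names are illustrative.

```r
library(vctrs)

x <- c("a", "b", "c", "d")

# Each element of `indices` holds the positive locations of one piece.
pieces <- vec_chop(x, indices = list(1:2, 3:4))

# By default missing values match each other; na_equal = FALSE
# propagates them instead.
match_default <- vec_match(c(1, NA), c(NA, 1))
match_strict <- vec_match(c(1, NA), c(NA, 1), na_equal = FALSE)

# Zeros in a numeric subscript can be dropped or kept as-is.
loc_removed <- num_as_location(c(1, 0, 2), n = 3, zero = "remove")
loc_ignored <- num_as_location(c(1, 0, 2), n = 3, zero = "ignore")
```

With `zero = "error"`, the same call would fail instead of returning locations.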
vctrs 0.2.3
- The main feature of this release is considerable performance
  improvements with factors and dates.
- `vec_c()` now falls back to `base::c()` if the vector doesn't
  implement `vec_ptype2()` but implements `c()`. This should improve
  the compatibility of vctrs-based functions with foreign classes
  (#801).
- `new_data_frame()` is now faster.
- New `vec_is_list()` for detecting if a vector is a list in the vctrs
  sense. For instance, objects of class `lm` are not lists. In general,
  classes need to explicitly inherit from `"list"` to be considered as
  lists by vctrs.
- Unspecified vectors of `NA` can now be assigned into a list (#819).

      x <- list(1, 2)
      vec_slice(x, 1) <- NA
      x
      #> [[1]]
      #> NULL
      #>
      #> [[2]]
      #> [1] 2
- `vec_ptype()` now errors on scalar inputs (#807).
- `vec_ptype_finalise()` is now recursive over all data frame types,
  ensuring that unspecified columns are correctly finalised to logical
  (#800).
- `vec_ptype()` now correctly handles unspecified columns in data
  frames, and will always return an unspecified column type (#800).
- `vec_slice()` and `vec_chop()` now work correctly with
  `bit64::integer64()` objects when an `NA` subscript is supplied. By
  extension, this means that `vec_init()` now works with these objects
  as well (#813).
- `vec_rbind()` now binds row names. When named inputs are supplied
  and `names_to` is `NULL`, the names define row names. If `names_to`
  is supplied, they are assigned in the column name as before.
- `vec_cbind()` now binds row names if they are congruent across
  inputs. If the row names are not identical that's an error.
- The `c()` method for `vctrs_vctr` now throws an error when
  `recursive` or `use.names` is supplied (#791).
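The list detection rule introduced in this release can be checked with a few inputs. This is only a sketch: the `my_rec` class is hypothetical, and the model is fitted on the built-in `cars` dataset purely to obtain an `lm` object.

```r
library(vctrs)

# A bare list is a list in the vctrs sense.
bare_is_list <- vec_is_list(list(1, 2))

# A model fit is built on a list but does not inherit from "list".
fit <- lm(dist ~ speed, data = cars)
lm_is_list <- vec_is_list(fit)

# Hypothetical S3 list that inherits explicitly from "list".
rec <- structure(list(1, 2), class = c("my_rec", "list"))
rec_is_list <- vec_is_list(rec)
```

Only the first and last inputs are expected to count as lists, which is what keeps objects like model fits from being treated as vectors.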
vctrs 0.2.2
- New `vec_as_subscript()` function to cast inputs to the base type
  of a subscript (logical, numeric, or character). `vec_as_index()`
  has been renamed to `vec_as_location()`. Use `num_as_location()` if
  you need more options to control how numeric subscripts are
  converted to a vector of locations.
- New `vec_as_subscript2()`, `vec_as_location2()`, and
  `num_as_location2()` variants for validating scalar subscripts and
  locations (e.g. for indexing with `[[`).
- `vec_as_location()` now preserves names of its inputs if possible.
- `vec_ptype2()` methods for base classes now prevent
  inheritance. This makes sense because the subtyping graph created by
  `vec_ptype2()` methods is generally not the same as the inheritance
  relationships defined by S3 classes. For instance, subclasses are
  often a richer type than their superclasses, and should often be
  declared as supertypes (e.g. `vec_ptype2()` should return the
  subclass).

  We introduced this breaking change in a patch release because
  `new_vctr()` now adds the base type to the class vector by default,
  which caused `vec_ptype2()` to dispatch erroneously to the methods
  for base types. We'll finish switching to this approach in vctrs
  0.3.0 for the rest of the base S3 classes (dates, data frames, ...).
- `vec_equal_na()` now works with complex vectors.
- `vctrs_vctr` class gains an `as.POSIXlt()` method (#717).
- `vec_is()` now ignores names and row names (#707).
- `vec_slice()` now supports Altvec vectors (@jimhester, #696).
- `vec_proxy_equal()` is now applied recursively across the columns of
  data frames (#641).
- `vec_split()` no longer returns the `val` column as a `list_of`. It
  is now returned as a bare list (#660).
- Complex numbers are now coercible with integer and double (#564).
- zeallot has been moved from Imports to Suggests, meaning that `%<-%`
  is no longer re-exported from vctrs.
- `vec_equal()` no longer propagates missing values when comparing list
  elements. This means that `vec_equal(list(NULL), list(NULL))` will
  continue to return `NA` because `NULL` is the missing element for a
  list, but now `vec_equal(list(NA), list(NA))` returns `TRUE` because
  the `NA` values are compared directly without checking for
  missingness.
- Lists of expressions are now supported in `vec_equal()` and functions
  that compare elements, such as `vec_unique()` and `vec_match()`. This
  ensures that they work with the result of modeling functions like
  `glm()` and `mgcv::gam()` which store "family" objects containing
  expressions (#643).
- `new_vctr()` gains an experimental `inherit_base_type` argument
  which determines whether or not the class of the underlying type
  will be included in the class.
- `list_of()` now inherits explicitly from "list" (#593).
- `vec_ptype()` has relaxed default behaviour for base types; now if
  two vectors both inherit from (e.g.) "character", the common type is
  also "character" (#497).
- `vec_equal()` now correctly treats `NULL` as the missing value
  element for lists (#653).
- `vec_cast()` now casts data frames to lists rowwise, i.e. to a list
  of data frames of size 1. This preserves the invariant of
  `vec_size(vec_cast(x, to)) == vec_size(x)` (#639).
- Positive and negative 0 are now considered equivalent by all
  functions that check for equality or uniqueness (#637).
- New experimental functions `vec_group_rle()` for returning run
  length encoded groups; `vec_group_id()` for constructing group
  identifiers from a vector; `vec_group_loc()` for computing the
  locations of unique groups in a vector (#514).
- New `vec_chop()` for repeatedly slicing a vector. It efficiently
  captures the pattern of `map(indices, vec_slice, x = x)`.
- Support for multiple character encodings has been added to functions
  that compare elements within a single vector, such as
  `vec_unique()`, and across multiple vectors, such as `vec_match()`.
  When multiple encodings are encountered, a translation to UTF-8 is
  performed before any comparisons are made (#600, #553).
- Equality and ordering methods are now implemented for raw and
  complex vectors (@romainfrancois).
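A brief sketch of the grouping helpers and of `vec_as_location()` from this release. The inputs are arbitrary; `unname()` is applied to the character lookup only to keep the comparison independent of any names carried over from the input.

```r
library(vctrs)

x <- c("a", "b", "a", "c", "b")

# One integer identifier per element, numbered by first appearance.
ids <- vec_group_id(x)

# Data frame with the unique keys and the locations of each group.
groups <- vec_group_loc(x)

# Logical, negative and character subscripts all become positive
# integer locations.
loc_lgl <- vec_as_location(c(TRUE, FALSE, TRUE), n = 3)
loc_neg <- vec_as_location(-1, n = 3)
loc_chr <- unname(vec_as_location(c("b", "c"), n = 3, names = c("a", "b", "c")))
```

`groups$key` holds `"a"`, `"b"`, `"c"` and `groups$loc` is a list with the matching positions, which is the shape that grouping code can iterate over.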
vctrs 0.2.1
Maintenance release for CRAN checks.
vctrs 0.2.0
With the 0.2.0 release, many vctrs functions have been rewritten with
native C code to improve performance. Functions like `vec_c()` and
`vec_rbind()` should now be fast enough to be used in packages. This
is an ongoing effort: for instance, the handling of factors and dates
has not been rewritten yet. These classes still slow down vctrs
primitives.
The API in 0.2.0 has been updated. Please see the list of breaking
changes below. vctrs has now graduated from experimental to a maturing
package (see the lifecycle of tidyverse packages).
Please note that API changes are still planned for future releases.
For instance, `vec_ptype2()` and `vec_cast()` might need to return a
sentinel instead of failing with an error when there is no common type
or possible cast.
Breaking changes
- Lossy casts now throw errors of type `vctrs_error_cast_lossy`.
  Previously these were warnings. You can suppress these errors
  selectively with `allow_lossy_cast()` to get the partial cast
  results. To implement your own lossy cast operation, call the new
  exported function `maybe_lossy_cast()`.
- `vec_c()` now fails when an input is supplied with a name but has
  internal names or is length > 1:

      vec_c(foo = c(a = 1))
      #> Error: Can't merge the outer name `foo` with a named vector.
      #> Please supply a `.name_spec` specification.

      vec_c(foo = 1:3)
      #> Error: Can't merge the outer name `foo` with a vector of length > 1.
      #> Please supply a `.name_spec` specification.

  You can supply a name specification that describes how to combine
  the external name of the input with its internal names or positions:

      # Name spec as glue string:
      vec_c(foo = c(a = 1), .name_spec = "{outer}_{inner}")

      # Name spec as a function:
      vec_c(foo = c(a = 1), .name_spec = function(outer, inner) paste(outer, inner, sep = "_"))
      vec_c(foo = c(a = 1), .name_spec = ~ paste(.x, .y, sep = "_"))
- `vec_empty()` has been renamed to `vec_is_empty()`.
- `vec_dim()` and `vec_dims()` are no longer exported.
- `vec_na()` has been renamed to `vec_init()`, as the primary use case
  is to initialize an output container.
- `vec_slice<-` is now type stable (#140). It always returns the same
  type as the LHS. If needed, the RHS is cast to the correct type, but
  only if both inputs are coercible. See examples in `?vec_slice`.
- We have renamed the `type` particle to `ptype`:
  - `vec_type()` => `vec_ptype()`
  - `vec_type2()` => `vec_ptype2()`
  - `vec_type_common()` => `vec_ptype_common()`

  Consequently, `vec_ptype()` was renamed to `vec_ptype_show()`.
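The lossy-cast and type-stability changes in this section can be sketched as below. The handler's return value is made up for illustration; the rest uses only the functions named in the notes.

```r
library(vctrs)

# A lossy cast is an error by default and can be caught by its class.
lossy <- tryCatch(
  vec_cast(1.5, integer()),
  vctrs_error_cast_lossy = function(cnd) "lossy"
)

# Allowing the lossy cast explicitly returns the partial result.
truncated <- allow_lossy_cast(vec_cast(1.5, integer()))

# `vec_slice<-` keeps the type of the left-hand side: the double 10 is
# cast to integer before being assigned.
x <- 1:3
vec_slice(x, 1) <- 10
```

After the assignment `x` is still an integer vector, whereas base R's `x[1] <- 10` would silently turn it into a double.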
New features
- New `vec_proxy()` generic. This is the main customisation point in
  vctrs along with `vec_restore()`. You should only implement it when
  your type is designed around a non-vector class (the vector classes
  being atomic vectors, bare lists, and data frames). In this case,
  `vec_proxy()` should return such a vector class. The vctrs
  operations will be applied on the proxy and `vec_restore()` is
  called to restore the original representation of your type.

  The most common case where you need to implement `vec_proxy()` is
  for S3 lists. In vctrs, S3 lists are treated as scalars by
  default. This way we don't treat objects like model fits as
  vectors. To prevent vctrs from treating your S3 list as a scalar,
  unclass it from the `vec_proxy()` method. For instance here is the
  definition for `list_of`:

      #' @export
      vec_proxy.vctrs_list_of <- function(x) {
        unclass(x)
      }

  If you inherit from `vctrs_vctr` or `vctrs_rcrd` you don't need to
  implement `vec_proxy()`.
- `vec_c()`, `vec_rbind()`, and `vec_cbind()` gain a `.name_repair`
  argument (#227, #229).
- `vec_c()`, `vec_rbind()`, `vec_cbind()`, and all functions relying
  on `vec_ptype_common()` now have more informative error messages
  when some of the inputs have nested data frames that are not
  convergent:

      df1 <- tibble(foo = tibble(bar = tibble(x = 1:3, y = letters[1:3])))
      df2 <- tibble(foo = tibble(bar = tibble(x = 1:3, y = 4:6)))

      vec_rbind(df1, df2)
      #> Error: No common type for `..1$foo$bar$y` <character> and `..2$foo$bar$y` <integer>.
- `vec_cbind()` now turns named data frames to packed columns.

      data <- tibble::tibble(x = 1:3, y = letters[1:3])
      data <- vec_cbind(data, packed = data)
      data
      # A tibble: 3 x 3
            x y     packed$x $y
        <int> <chr>    <int> <chr>
      1     1 a            1 a
      2     2 b            2 b
      3     3 c            3 c

  Packed data frames are nested in a single column. This makes it
  possible to access it through a single name:

      data$packed
      # A tibble: 3 x 2
            x y
        <int> <chr>
      1     1 a
      2     2 b
      3     3 c

  We are planning to use this syntax more widely in the tidyverse.
- New `vec_is()` function to check whether a vector conforms to a
  prototype and/or a size. Unlike `vec_assert()`, it doesn't throw
  errors but returns `TRUE` or `FALSE` (#79).

  Called without a specific type or size, `vec_assert()` tests whether
  an object is a data vector or a scalar. S3 lists are treated as
  scalars by default. Implement a `vec_is_vector()` method for your
  class to override this property (or derive from `vctrs_vctr`).
- New `vec_order()` and `vec_sort()` for ordering and sorting
  generalised vectors.
- New `.names_to` parameter for `vec_rbind()`. If supplied, this
  should be the name of a column where the names of the inputs are
  copied. This is similar to the `.id` parameter of
  `dplyr::bind_rows()`.
- New `vec_seq_along()` and `vec_init_along()` create useful sequences
  (#189).
- `vec_slice()` now preserves character row names, if present.
- New `vec_split(x, by)` is a generalisation of `split()` that can
  divide a vector into groups formed by the unique values of another
  vector. Returns a two-column data frame containing unique values of
  `by` aligned with matching `x` values (#196).
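Several of the new functions above return plain values that are easy to inspect. The sketch below uses arbitrary inputs and illustrative variable names.

```r
library(vctrs)

# vec_is() returns TRUE or FALSE instead of throwing.
same_type <- vec_is(1:3, integer())
other_type <- vec_is(1:3, double())
wrong_size <- vec_is(1:3, integer(), 2L)

x <- c(3, 1, 2)
ord <- vec_order(x)                      # locations that would sort x
sorted <- vec_sort(x)                    # x in sorted order
idx <- vec_seq_along(c("a", "b", "c"))   # 1, 2, ..., vec_size(x)

# Split a vector by the unique values of another vector.
groups <- vec_split(1:4, c("a", "b", "a", "b"))
```

`groups` is a two-column data frame: `key` holds `"a"` and `"b"`, and `val` holds the matching slices of the first argument.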
Other features and bug fixes
- Using classed errors of class `"vctrs_error_assert"` for failed
  assertions, and of class `"vctrs_error_incompatible"` (with
  subclasses `_type`, `_cast` and `_op`) for errors on incompatible
  types (#184).
- Character indexing is now only supported for named objects; an error
  is raised for unnamed objects (#171).
- Predicate generics now consistently return logical vectors when
  passed a `vctrs_vctr` class. They used to restore the output to
  their input type (#251).
- `list_of()` now has an `as.character()` method. It uses
  `vec_ptype_abbr()` to collapse complex objects into their type
  representation (tidyverse/tidyr#654).
- New `stop_incompatible_size()` to signal a failure due to mismatched
  sizes.
- New `validate_list_of()` (#193).
- `vec_arith()` is consistent with base R when combining `difftime`
  and `date`, with a warning if casts are lossy (#192).
- `vec_c()` and `vec_rbind()` now handle data.frame columns properly
  (@yutannihilation, #182).
- `vec_cast(x, data.frame())` preserves the number of rows in `x`.
- `vec_equal()` now handles missing values symmetrically (#204).
- `vec_equal_na()` now returns `TRUE` for data frames and records when
  every component is missing, not when any component is missing
  (#201).
- `vec_init()` checks that its input is a vector.
- `vec_proxy_compare()` gains an experimental `relax` argument, which
  allows data frames to be orderable even if all their columns are not
  (#210).
- `vec_size()` now works with positive short row names. This fixes
  issues with data frames created with jsonlite (#220).
- `vec_slice<-` now has a `vec_assign()` alias. Use `vec_assign()`
  when you don't want to modify the original input.
- `vec_slice()` now calls `vec_restore()` automatically. Unlike the
  default `[` method from base R, attributes are preserved by default.
- `vec_slice()` can correctly slice 0-row data frames (#179).
- New `vec_repeat()` for repeating each element of a vector the same
  number of times.
- `vec_type2(x, data.frame())` ensures that the returned object has
  names that are a length-0 character vector.
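The difference between `vec_slice<-` and its `vec_assign()` alias is easiest to see side by side. A minimal sketch with arbitrary values:

```r
library(vctrs)

x <- 1:3

# vec_assign() returns a modified copy and leaves `x` untouched.
y <- vec_assign(x, 2, 10L)

# vec_slice() on a data frame selects rows.
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
rows <- vec_slice(df, c(1, 3))
```

Here `y` is `1 10 3` while `x` is still `1 2 3`, and `rows` keeps both columns with the first and third rows only.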
vctrs 0.1.0
v0.1.0 Fix obvious description issue