CCCL 2.5.0
What's New
This release includes several notable improvements and new features:
- CUB device-level algorithms now support NVTX ranges in Nsight Systems. This integration makes it easier to identify and analyze the time spent in CUB algorithms. Please note that profiling with this feature requires at least C++14.
- We have added a new cub::DeviceSelect::FlaggedIf API, which selects items by applying a predicate to their corresponding flags. This gives you more flexibility and control over item selection.
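
To illustrate what "applying a predicate to flags" means, here is a minimal host-side sketch of the selection semantics in plain C++. It is a reference model only, not the CUB implementation or its device API; the names `flagged_if_reference` and `demo_even_flags` and the sample data are hypothetical and exist purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference model (NOT the CUB implementation): keep items[i]
// whenever pred(flags[i]) is true, preserving the input order.
template <typename T, typename Flag, typename Pred>
std::vector<T> flagged_if_reference(const std::vector<T>& items,
                                    const std::vector<Flag>& flags,
                                    Pred pred)
{
  // Assumption of this sketch: one flag per item.
  assert(items.size() == flags.size());

  std::vector<T> selected;
  for (std::size_t i = 0; i < items.size(); ++i)
  {
    if (pred(flags[i])) // the predicate inspects the flag, not the item
    {
      selected.push_back(items[i]);
    }
  }
  return selected;
}

// Hypothetical example data: select the items whose flag is even.
inline std::vector<int> demo_even_flags()
{
  const std::vector<int> items{10, 20, 30, 40, 50, 60};
  const std::vector<int> flags{1, 2, 3, 4, 5, 6};
  return flagged_if_reference(items, flags, [](int f) { return f % 2 == 0; });
}
```

In this sketch, `demo_even_flags()` keeps the items at positions whose flag is even (20, 40 and 60). The difference from a plain flagged selection is that the flags need not be booleans: any flag type works as long as the predicate can map it to a keep/discard decision.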
What's Changed
- Clean up libcu++ docs landing page by @jrhemstad in #1492
- PTX: Add cuda::ptx::elect_sync by @ahendriksen in #1537
- Print a summary of all tests sorted by execution time. by @alliepiper in #1539
- Fix unused variable warning for __can_use_complete_tx by @wmaxey in #1547
- Fix usage of naked array with 0 elements in sm90 barrier tests. by @wmaxey in #1546
- Add support for stream operators for complex by @miscco in #1538
- Fix __half for older architectures by @miscco in #1543
- Feat 565 remove redundant thrust dialect conditional by @ZelboK in #566
- fix missing device hint in WarpMergeSort Documentation by @MARD1NO in #1553
- Minor fixes and additions on cub developer guides by @gonidelis in #1559
- Consolidate handling of constexpr and if constexpr by @miscco in #1562
- Ensure that cuda::aligned_size_t is usable in a constexpr context by @miscco in #1564
- Group CUB docs by @gevtushenko in #1565
- Update toolkit to 12.4 by @miscco in #1554
- Work around change in cuTensorMapEncode by @miscco in #1567
- Remove stdlib arg from .clangd. by @alliepiper in #1569
- Add the DeviceSelect::FlaggedIf algorithm by @gonidelis in #1533
- Catch2 segmented sort by @alliepiper in #1484
- Do not emit diagnostic with extended device lambdas with preserved re… by @Revaj in #1495
- Use absolute includes for libcu++ by @miscco in #1560
- [NFC] Modularize <exception> by @miscco in #199
- Add test support for launching kernels with cluster size > 1 by @ahendriksen in #416
- Fix typo in README.md by @bprb in #1574
- [FEA]: Modularize <cuda/memory_resource> by @miscco in #1532
- Cleanup_complex by @miscco in #1555
- Add missing comma in barrier __try_wait by @miscco in #1593
- Segmented sort test fix by @alliepiper in #1591
- Add pre-commit configuration by @bdice in #1596
- Preserve .devcontainer/img/ when cleaning. by @alliepiper in #1604
- Add some documentation for recent additions to libcu++ by @miscco in #1594
- Ensure cuda::std::nullopt is visible in device code by @trxcllnt in #1598
- Fix ordering of alignas and __shared__ by @miscco in #1601
- Update Thrust CI tests. by @alliepiper in #1605
- Implement tuple interface for cuda vector types by @miscco in #1410
- Inspect PR changes to determine if subproject builds are needed. by @alliepiper in #1572
- Apply clang-format to cub by @bdice in #1602
- Add missing non-volatile atomic overloads. by @wmaxey in #1582
- Drop unused libcxx files by @miscco in #1606
- Apply formatting to libcudacxx by @miscco in #1610
- Add conda documentation to the README. by @bdice in #1581
- Allow jobs to be skipped. by @alliepiper in #1611
- Make libcu++ work with exceptions by @miscco in #1607
- Implement cuda::mr::cuda_memory_resource by @miscco in #1578
- Implement cuda::mr::managed_memory_resource by @miscco in #1579
- Apply formatting to thrust by @miscco in #1616
- Update example_device_radix_sort.cu by @eriktedhamre in #1608
- Implement cuda::mr::pinned_memory_resource by @miscco in #1580
- Set the devcontainers to format on save. by @miscco in #1624
- Enable internal use of std::allocator related functionality by @miscco in #1583
- Adds tests for large number of items for cub::DeviceSelect by @elstehle in #1612
- Add pre-commit docs to CONTRIBUTING.md. by @bdice in #1627
- Move visibility attributes to cccl by @miscco in #1595
- Work around thrust/memory.h circular include by @dkolsen-pgi in #1634
- Fix mbarrier.init addressing by @ahendriksen in #1636
- Trim trailing whitespace and normalize newlines. by @bdice in #1633
- Add a git-blame-ignore-revs file by @miscco in #1629
- Revert "PTX: Add cuda::ptx::elect_sync (#1537)" by @ahendriksen in #1638
- Address potential oob in cub when passing in an invalid device counter by @miscco in #1641
- Allow ninja_summary to fail by @jrhemstad in #1644
- Mostly flatten the folder structure of libcu++ by @miscco in #1630
- Make --cmake-options="" always override others. by @alliepiper in #1648
- Fix invalid _CCCL_CUDACC definition for clang cuda by @miscco in #1656
- Add missing #pragma once in some headers by @bernhardmgruber in #1668
- Add NVTX ranges for all CUB algorithms by @bernhardmgruber in #1657
- Implement LWG-3843 and LWG-3940 by @miscco in #1621
- Modularize <memory> by @miscco in #1639
- Expose <cuda/std/numeric> to be publicly available by @miscco in #1671
- Add nsight support for automated debugging by @gonidelis in #1660
- Format core headers by @miscco in #1670
- Guard resource_ref and friends behind feature flag by @miscco in #1675
- Create major version 2.5.0 by @wmaxey in #1677
- Install CUB headers with .hpp extension by @bernhardmgruber in #1687
- Update CMakePresets.json by @alliepiper in #1686
- Fix deprecated status by @gevtushenko in #1692
- Test combined internal/user-side use of NVTX by @bernhardmgruber in #1690
- CI Overhaul, new nightly workflow by @alliepiper in #1654
- Fix CMake option handling. by @alliepiper in #1698
- Fix issues that came up with building cuDF with main by @miscco in #1643
- Drop new properties until we are certain about the design by @miscco in #1681
- Remove more uses of __cuda_std__ by @miscco in #1669
- Fix usage of result_of in thrust by @miscco in #1705
- Fix thrust::optional<T&>::emplace() by @Snektron in #1707
- Remove old f(void) function signatures by @bernhardmgruber in #1708
- Fix code sample in README and docs by @pauleonix in #1652
- Format libcudacxx/include files without extensions by @bdice in #1676
- Several improvements to zip_iterator/zip_function by @bernhardmgruber in #1710
- Expose thrust's contiguous iterator unwrap helpers by @bernhardmgruber in #1717
- Fix flakey heterogeneous tests by @wmaxey in #1712
- Ensure that we can use cuda::std::optional with types that are not __host__ __device__ by @miscco in #1663
- Fix a typo in barrier docs and update the godbolt link by @PointKernel in #1718
- Massively improve test times in heterogeneous atomics tests by @wmaxey in #1719
- Consolidate more common functionality by @miscco in #1716
- Increase timeout for the libcu++ test runs by @miscco in #1720
- Fix nightly CI: H100 runners are not in a testing pool. by @alliepiper in #1723
- Add a new CUDA Next library and a first entry in it with hierarchy_dimensions type template by @pciolkosz in #1485
- Atomics backend refactor by @wmaxey in #1631
- Const-qualify half_t::operator+/* by @bernhardmgruber in #1726
- Reenable previously failing histogram test for icc by @bernhardmgruber in #1725
- Enable testing for the other half of the heterogeneous managed memory tests on MSVC. by @wmaxey in #1729
- PTX: mark cp_async_bulk*_multicast functions sm_90a by @ahendriksen in #1734
- Improve libcu++ documentation a bit more by @miscco in #1732
- Make atomic_ref ctor constexpr. again. by @wmaxey in #1737
- Various and sundry fixes for Thrust's CPP backends. by @alliepiper in #1722
- Avoid ABI issues due to MSVC EBCO issues by @miscco in #1739
- Drop unused header from ptx by @miscco in #1740
- Allow an override matrix to reduce CI workload. by @alliepiper in #1701
- Fix docs generation by @miscco in #1741
- Add docs instructions on how to utilize CMake Presets by @gonidelis in #1694
- Ensure that {cr}begin works with types that pull in namespace std via ADL by @miscco in #1685
- Merge prep jobs for verify-devcontainers CI. by @alliepiper in #1754
- Fix typo in ci docs. by @alliepiper in #1756
- Add runtime + sccache info to CI comment by @alliepiper in #1744
- Add section about SSH signing keys to developer docs. by @alliepiper in #1755
- Add sm100 support to <nv/target> for NVCC by @wmaxey in #1745
- Fix_duplicate_job_checks by @alliepiper in #1759
- Const-qualify histogram pointer input parameters by @bernhardmgruber in #1762
- Return demangled name in c2h::type_name by @bernhardmgruber in #1773
- Simplify argument forwarding in CUB histogram entry-points by @bernhardmgruber in #1776
- Add guard against half support by @miscco in #1735
- Refactor CUB test launch helpers by @bernhardmgruber in #1770
- Replace cub::ArrayWrapper by cuda::std::array and deprecate it by @bernhardmgruber in #1764
- Fix missing qualification of pow in two instances by @miscco in #1784
- Add mechanism to split project tests into parallel jobs. by @alliepiper in #1696
- Fix __half conversion to float in histogram by @miscco in #1785
- Implement P3029R1: deduction from integral_constant by @miscco in #1786
- Revert to showing skipped jobs to WAR GHA bug. by @alliepiper in #1794
- Port to Catch2 and rework device histogram test by @bernhardmgruber in #1695
- Add gcc13, clang17, clang18 to CI by @jrhemstad in #1757
- Drop more of thrust type traits by @miscco in #1721
- Show workflow walltime, job max time in CI comment. by @alliepiper in #1795
- Fix span for non-ranges by @miscco in #1840
- Drop all internal implementations of exceptions (#1806) by @miscco in #1839
- Backport atomic regression fix #1801 by @wmaxey in #1833
- [BACKPORT] Symbol visibility is now invariant in regards to __cuda_std__ definition (#1832) by @miscco in #1864
New Contributors
- @MARD1NO made their first contribution in #1553
- @Revaj made their first contribution in #1495
- @bprb made their first contribution in #1574
- @eriktedhamre made their first contribution in #1608
- @Snektron made their first contribution in #1707
Full Changelog: v2.4.0...v2.5.0