Skip to content

Commit

Permalink
wip
Browse files Browse the repository at this point in the history
  • Loading branch information
biojppm committed Apr 3, 2024
1 parent d6e0b5d commit 795910a
Show file tree
Hide file tree
Showing 11 changed files with 796 additions and 146 deletions.
14 changes: 13 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,11 @@ c4_project(VERSION 0.5.0 STANDALONE

option(RYML_WITH_TAB_TOKENS "Enable parsing of tabs after ':' and '-'. This is costly and disabled by default." OFF)
option(RYML_DEFAULT_CALLBACKS "Enable ryml's default implementation of callbacks: allocate(), free(), error()" ON)
option(RYML_BUILD_TOOLS "build tools" OFF)
if(RYML_DEFAULT_CALLBACKS)
option(RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS "Throw exceptions instead of calling abort in the default error handler provided by ryml" OFF)
endif()
option(RYML_USE_ASSERT "Enable assertions regardless of build type. Default is only when NDEBUG is not defined (which is in release builds). This causes a slowdown of the code." OFF)
option(RYML_BUILD_TOOLS "Build tools" OFF)
option(RYML_BUILD_API "Enable API generation (python, etc)" OFF)
option(RYML_DBG "Enable (very verbose) ryml debug prints." OFF)

Expand Down Expand Up @@ -66,12 +70,20 @@ endif()

if(NOT RYML_DEFAULT_CALLBACKS)
target_compile_definitions(ryml PRIVATE RYML_NO_DEFAULT_CALLBACKS)
else()
if(RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS)
target_compile_definitions(ryml PRIVATE RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS)
endif()
endif()

if(RYML_DBG)
target_compile_definitions(ryml PRIVATE RYML_DBG)
endif()

if(RYML_USE_ASSERT)
target_compile_definitions(ryml PUBLIC RYML_USE_ASSERT=1)
endif()


#-------------------------------------------------------

Expand Down
37 changes: 36 additions & 1 deletion changelog/current.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,42 @@

### Fixes

- Fix [#390](https://github.com/biojppm/rapidyaml/pull/390) - `csubstr::first_real_span()` failed on number with one digit in the exponent.
- Fix major error handling problem reported in [#389](https://github.com/biojppm/rapidyaml/issues/389) ([PR#411](https://github.com/biojppm/rapidyaml/pull/411)):
- The `NodeRef` and `ConstNodeRef` classes had many methods marked `noexcept` that were doing assertions which could throw exceptions, causing an abort instead of a throw whenever the assertion called an exception-throwing error callback.
- Also, this problem was compounded by exceptions being enabled in every build type -- despite the intention to have them only in debug builds. There was a problem in the preprocessor code to enable assertions which led to assertions being enabled in release builds even when `RYML_USE_ASSERT` was defined to 0.
- Although the code is and was extensively tested, the testing was addressing mostly the happy path. In the fix, I added tests to ensure that the error behavior is as intended.
- Together with this changeset, a major revision was carried out of the asserting/checking status of each function in the node classes. In most cases, assertions were added to cases that were missing them. So *beware** - user code that was invalid will now assert or error out.
- A new method was added to the node class:
```
/** Get a child by name, with error checking; complexity is
* O(num_children).
*
* Behaves as operator[](csubstr) const, but always raises an
* error (even when RYML_USE_ASSERT is set to false) when the
* returned node does not exist, or when this node is not
* readable, or when it is not a map. This behaviour is similar to
* std::vector::at(), but the error consists in calling the error
* callback instead of directly raising an exception. */
ConstNodeRef ConstNodeRef::at(csubstr key) const;
ConstNodeRef NodeRef::at(csubstr key) const;
/** Get a child by position, with error checking; complexity is
* O(pos).
*
* Behaves as operator[](size_t) const, but always raises an error
* (even when RYML_USE_ASSERT is set to false) when the returned
* node does not exist, or when this node is not readable, or when
* it is not a map. This behaviour is similar to
* std::vector::at(), but the error consists in calling the error
* callback instead of directly raising an exception. */
ConstNodeRef ConstNodeRef::at(size_t pos) const;
ConstNodeRef NodeRef::at(size_t pos) const;
```
- Also, `RYML_DEBUG_BREAK()` is now enabled only if `RYML_DBG` is defined, as reported in [#362](https://github.com/biojppm/rapidyaml/issues/362).
- Added cmake options to control error handling:
- `RYML_USE_ASSERT` - defines the same macro, which will enable assertions regardless of build type. This is disabled by default.
- `RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS` - defines the same macro, which will make the default error handler provided by ryml throw exceptions instead of calling `std::abort()`. This is disabled by default.
- Fix [#390](https://github.com/biojppm/rapidyaml/pull/390) - `csubstr::first_real_span()` failed on scientific numbers with one digit in the exponent.
- Fix [#361](https://github.com/biojppm/rapidyaml/pull/361) - parse error on map scalars containing `:` and starting on the next line:
```yaml
---
Expand Down
135 changes: 95 additions & 40 deletions samples/quickstart.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -165,21 +165,31 @@ struct CheckPredicate {
/** a brief tour over most features */
void sample_quick_overview()
{
// Parse YAML code in place, potentially mutating the buffer.
// It is also possible to:
// - parse a read-only buffer using parse_in_arena()
// - reuse an existing tree (advised)
// - reuse an existing parser (advised)
// Parse YAML code in place, potentially mutating the buffer:
char yml_buf[] = "{foo: 1, bar: [2, 3], john: doe}";
ryml::Tree tree = ryml::parse_in_place(yml_buf);

// Note: it will always be significantly faster to use mutable
// buffers and reuse tree+parser.
// The resulting tree contains only views to the parsed string. If
// the string was parsed in place, then the string must outlive
// the tree! This works in this case because `yml_buf` and `tree`
// live on the same scope, so have the same lifetime.

// It is also possible to:
//
// - parse a read-only buffer using parse_in_arena(). This
// copies the YAML buffer to the tree's arena, and spares the
// headache of the string's lifetime.
//
// - reuse an existing tree (advised)
//
// - reuse an existing parser (advised)
//
// Note: it will always be significantly faster to parse in place
// and reuse tree+parser.
//
// Below you will find samples that show how to achieve reuse; but
// please note that for brevity and clarity, many of the examples
// here are parsing immutable buffers, and not reusing tree or
// parser.
// here are parsing in the arena, and not reusing tree or parser.


//------------------------------------------------------------------
Expand All @@ -197,10 +207,11 @@ void sample_quick_overview()

// The node API is a lightweight abstraction sitting on top of the
// index API, but offering a much more convenient interaction:
ryml::ConstNodeRef root = tree.rootref();
ryml::ConstNodeRef root = tree.rootref(); // a const node reference
ryml::ConstNodeRef bar = tree["bar"];
CHECK(root.is_map());
CHECK(bar.is_seq());

// A node ref is a lightweight handle to the tree and associated id:
CHECK(root.tree() == &tree); // a node ref points at its tree, WITHOUT refcount
CHECK(root.id() == root_id); // a node ref's id is the index of the node
Expand All @@ -209,36 +220,35 @@ void sample_quick_overview()
// The node API translates very cleanly to the index API, so most
// of the code examples below are using the node API.

// One significant point of the node API is that it holds a raw
// pointer to the tree. Care must be taken to ensure the lifetimes
// match, so that a node will never access the tree after the tree
// went out of scope.
// WARNING. A node ref holds a raw pointer to the tree. Care must
// be taken to ensure the lifetimes match, so that a node will
// never access the tree after the goes out of scope.


//------------------------------------------------------------------
// To read the parsed tree

// ConstNodeRef::operator[] does a lookup, is O(num_children[node]).
CHECK(tree["foo"].is_keyval());
CHECK(tree["foo"].key() == "foo");
CHECK(tree["foo"].val() == "1");
CHECK(tree["foo"].val() == "1"); // get the val of a node (must be leaf node, otherwise it is a container and has no val)
CHECK(tree["foo"].key() == "foo"); // get the key of a node (must be child of a map, otherwise it has no key)
CHECK(tree["bar"].is_seq());
CHECK(tree["bar"].has_key());
CHECK(tree["bar"].key() == "bar");
// maps use string keys, seqs use integral keys:
// maps use string keys, seqs use index keys:
CHECK(tree["bar"][0].val() == "2");
CHECK(tree["bar"][1].val() == "3");
CHECK(tree["john"].val() == "doe");
// An integral key is the position of the child within its parent,
// An index key is the position of the child within its parent,
// so even maps can also use int keys, if the key position is
// known.
CHECK(tree[0].id() == tree["foo"].id());
CHECK(tree[1].id() == tree["bar"].id());
CHECK(tree[2].id() == tree["john"].id());
// Tree::operator[](int) searches a ***root*** child by its position.
CHECK(tree[0].id() == tree["foo"].id()); // 0: first child of root
CHECK(tree[1].id() == tree["bar"].id()); // 1: first child of root
CHECK(tree[2].id() == tree["john"].id()); // 2: first child of root
CHECK(tree[1].id() == tree["bar"].id()); // 1: second child of root
CHECK(tree[2].id() == tree["john"].id()); // 2: third child of root
// NodeRef::operator[](int) searches a ***node*** child by its position:
CHECK(bar[0].val() == "2"); // 0 means first child of bar
CHECK(bar[1].val() == "3"); // 1 means second child of bar
Expand All @@ -257,11 +267,12 @@ void sample_quick_overview()
CHECK(root["bar"].id() == root[1].id());
CHECK(root["john"].id() == root[2].id());

// IMPORTANT. The ryml tree uses indexed linked lists for storing
// children, so the complexity of `Tree::operator[csubstr]` and
// `Tree::operator[size_t]` is linear on the number of root
// children. If you use `Tree::operator[]` with a large tree where
// the root has many children, you will see a performance hit.
// IMPORTANT. The ryml tree uses an index-based linked list for
// storing children, so the complexity of
// `Tree::operator[csubstr]` and `Tree::operator[size_t]` is O(n),
// linear on the number of root children. If you use
// `Tree::operator[]` with a large tree where the root has many
// children, you will see a performance hit.
//
// To avoid this hit, you can create your own accelerator
// structure. For example, before doing a lookup, do a single
Expand All @@ -281,6 +292,7 @@ void sample_quick_overview()
// depending on the data, that number may be very different from
// one to another.


//------------------------------------------------------------------
// Hierarchy:

Expand Down Expand Up @@ -340,12 +352,48 @@ void sample_quick_overview()

//------------------------------------------------------------------
// Gotchas:
CHECK(!tree["bar"].has_val()); // seq is a container, so no val
CHECK(!tree["bar"][0].has_key()); // belongs to a seq, so no key
CHECK(!tree["bar"][1].has_key()); // belongs to a seq, so no key
//CHECK(tree["bar"].val() == BOOM!); // ... so attempting to get a val is undefined behavior
//CHECK(tree["bar"][0].key() == BOOM!); // ... so attempting to get a key is undefined behavior
//CHECK(tree["bar"][1].key() == BOOM!); // ... so attempting to get a key is undefined behavior

// ryml uses assertions to prevent you from trying to obtain
// things that do not exist. For example:

{
ryml::ConstNodeRef seq_node = tree["bar"];
ryml::ConstNodeRef val_node = seq_node[0];

CHECK(seq_node.is_seq()); // seq is a container
CHECK(!seq_node.has_val()); // ... so it has no val
//CHECK(seq_node.val() == BOOM!); // ... so attempting to get a val is undefined behavior

CHECK(val_node.parent() == seq_node); // belongs to a seq
CHECK(!val_node.has_key()); // ... so it has no key
//CHECK(val_node.key() == BOOM!); // ... so attempting to get a key is undefined behavior

CHECK(val_node.is_val()); // this node is a val
//CHECK(val_node.first_child() == BOOM!); // ... so attempting to get a child is undefined behavior

// assertions are also present in methods that /may/ read the val:
CHECK(seq_node.is_seq()); // seq is a container
//CHECK(seq_node.val_is_null() BOOM!); // so cannot get the val to check
}


// By default, assertions are enabled unless the NDEBUG macro is
// defined (which happens in release builds).
//
// This adheres to the pay-only-for-what-you-use philosophy: if
// you are sure that your intent is correct, why would you need to
// pay the runtime cost for the assertions?
//
// The downside, of course, is that when you are not sure, release
// builds may be doing something crazy.
//
// So you can override this behavior and enable/disable
// assertions, by defining the macro RYML_USE_ASSERT to a proper
// value (see c4/yml/common.hpp).
//
// Also, to be clear, this does not apply to parse errors
// happening when the YAML is parsed. Checking for these errors is
// always enabled and cannot be switched off.


//------------------------------------------------------------------
Expand All @@ -368,7 +416,7 @@ void sample_quick_overview()


//------------------------------------------------------------------
// Modifying existing nodes: operator<< vs operator=
// Modifying existing nodes: operator= vs operator<<

// As implied by its name, ConstNodeRef is a reference to a const
// node. It can be used to read from the node, but not write to it
Expand All @@ -377,20 +425,21 @@ void sample_quick_overview()
ryml::NodeRef wroot = tree.rootref();

// operator= assigns an existing string to the receiving node.
// This pointer will be in effect until the tree goes out of scope
// so beware to only assign from strings outliving the tree.
// The contents are NOT copied, and this pointer will be in effect
// until the tree goes out of scope! So beware to only assign from
// strings outliving the tree.
wroot["foo"] = "says you";
wroot["bar"][0] = "-2";
wroot["bar"][1] = "-3";
wroot["john"] = "ron";
// Now the tree is _pointing_ at the memory of the strings above.
// That is OK because those are static strings and will outlive
// the tree.
// In this case it is OK because those are static strings and will
// outlive the tree.
CHECK(root["foo"].val() == "says you");
CHECK(root["bar"][0].val() == "-2");
CHECK(root["bar"][1].val() == "-3");
CHECK(root["john"].val() == "ron");
// WATCHOUT: do not assign from temporary objects:
// But WATCHOUT: do not assign from temporary objects:
// {
// std::string crash("will dangle");
// root["john"] = ryml::to_csubstr(crash);
Expand Down Expand Up @@ -3929,6 +3978,10 @@ d: 3
// However, it is important to note that the error callback must never
// return to the caller! Otherwise, an infinite loop or program crash
// may occur.
//
// The default error handler calls std::abort(). You can use the cmake
// option and the macro RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS to have
// the default error handler throw a std::runtime_error instead.


struct ErrorHandlerExample
Expand Down Expand Up @@ -3989,11 +4042,13 @@ void sample_error_handler()
// crash.
ryml::set_callbacks(errh.callbacks());
errh.check_effect(/*committed*/true);
errh.check_error_occurs([]{
bool had_parse_error = true;
errh.check_error_occurs([&had_parse_error]{
had_parse_error = true;
ryml::Tree tree = ryml::parse_in_arena("errorhandler.yml", "[a: b\n}");
std::cout << tree;
had_parse_error = false; // this line is not executed

Check warning on line 4049 in samples/quickstart.cpp

View check run for this annotation

Codecov / codecov/patch

samples/quickstart.cpp#L4049

Added line #L4049 was not covered by tests
});

CHECK(had_parse_error);
ryml::set_callbacks(errh.defaults); // restore defaults.
errh.check_effect(/*committed*/false);
}
Expand Down
7 changes: 7 additions & 0 deletions src/c4/yml/common.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@
#ifndef RYML_NO_DEFAULT_CALLBACKS
# include <stdlib.h>
# include <stdio.h>
# ifdef RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS
# include <stdexcept>
# endif
#endif // RYML_NO_DEFAULT_CALLBACKS

namespace c4 {
Expand Down Expand Up @@ -39,7 +42,11 @@ void report_error_impl(const char* msg, size_t length, Location loc, FILE *f)
void error_impl(const char* msg, size_t length, Location loc, void * /*user_data*/)
{
report_error_impl(msg, length, loc, nullptr);
#ifdef RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS
throw std::runtime_error(std::string(msg, length));
#else
::abort();
#endif
}

void* allocate_impl(size_t length, void * /*hint*/, void * /*user_data*/)
Expand Down
8 changes: 5 additions & 3 deletions src/c4/yml/common.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@


#ifndef RYML_USE_ASSERT
/** define this macro with a boolean value to force enable/disable
assertions. This causes a slowdown of the code.*/
# define RYML_USE_ASSERT C4_USE_ASSERT
#endif

Expand All @@ -24,16 +26,16 @@
#endif


#if defined(NDEBUG) || defined(C4_NO_DEBUG_BREAK) || (!defined(RYML_DBG))
# define RYML_DEBUG_BREAK()
#else
#if defined(RYML_DBG) && !defined(NDEBUG) && !defined(C4_NO_DEBUG_BREAK)
# define RYML_DEBUG_BREAK() \
{ \
if(c4::get_error_flags() & c4::ON_ERROR_DEBUGBREAK) \
{ \
C4_DEBUG_BREAK(); \
} \
}
#else
# define RYML_DEBUG_BREAK()
#endif


Expand Down
Loading

0 comments on commit 795910a

Please sign in to comment.