Add yaml parsing & emitting support #707

Open
wants to merge 42 commits into
base: master
Changes from 24 commits
Commits
e06286d
kphp-yaml v0.1: parse, emit, parse_file, emit_file, c++ tests
Dec 18, 2022
6cd1ccf
khhp-yaml: codestyle fixes & memory optimizations
Dec 23, 2022
da3ee16
yaml: add empty string tests and number-as-string tests
Feb 26, 2023
4e9cc2e
yaml: fix number-as-string parsing and rewrite mixed_to_yaml() function
Feb 26, 2023
a4422d8
kphp-yaml: fix boolean emitting & parsing
Feb 26, 2023
d7322b5
kphp-yaml: add PHP test
Feb 27, 2023
48febd9
kphp-yaml: add yaml.so extension to kphp_run_once.py for PHP tests to…
Feb 27, 2023
51771d2
kphp yaml: return an accidentally deleted line (tests/python/lib/kphp…
Feb 27, 2023
66131a7
kphp-yaml: temporarily deleted changes in _functions.txt
Feb 28, 2023
dc33da4
Merge branch 'VKCOM:master' into master
NikOsint Feb 28, 2023
3c1d0dd
kphp-yaml: bring back changes in _functions.txt [conflict resolved]
Feb 28, 2023
c456682
yaml: delete obvious comments in runtime/yaml.h
Mar 11, 2023
a5adf3f
yaml: support for empty arrays, new comments, codestyle fixes
Mar 11, 2023
47d382a
yaml: add null and empty array C++ tests
Mar 11, 2023
4d88e37
yaml: add PHP test for docs/_config.yml file
Mar 11, 2023
a2ac35d
yaml: add comma in tests/python/lib/kphp_run_once.py:44
Mar 11, 2023
9f157d9
Merge branch 'VKCOM:master' into master
NikOsint Apr 21, 2023
252cb58
yaml: minor codestyle fixes according to @unserialize review
Apr 23, 2023
61eeb80
Merge branch 'VKCOM:master' into master
NikOsint May 22, 2023
17b3d3e
YAML: implement special characters escaping
May 22, 2023
d1f58a4
Merge branch 'VKCOM:master' into master
NikOsint May 28, 2023
3dcf68a
YAML: add PHP tests for symbols that should be escaped
May 29, 2023
c0f6174
Merge branch 'VKCOM:master' into master
NikOsint Jun 11, 2023
562f7f6
Merge branch 'VKCOM:master' into master
NikOsint Sep 17, 2023
343a9c9
yaml: add a linebreak in test 03
Sep 19, 2023
4d71001
Merge branch 'master' of github.com:NikOsint/kphp
Sep 19, 2023
ecf7065
yaml: make yaml an extension, turning on by CMake option `-DYAML=ON
Oct 1, 2023
1f511fb
Merge branch 'VKCOM:master' into master
NikOsint Oct 1, 2023
361adda
yaml: minor changes
Oct 7, 2023
1ed14d1
Merge branch 'VKCOM:master' into master
NikOsint Oct 7, 2023
29797e4
yaml: move cpp tests to common dir, delete `extensions-tests` dir
Oct 12, 2023
0efca0a
Merge branch 'VKCOM:master' into master
NikOsint Oct 12, 2023
e25efd2
Merge branch 'VKCOM:master' into master
NikOsint Oct 22, 2023
12b31da
yaml: fix issues after @unserialize review, mostly in implementation …
Oct 22, 2023
97474f6
yaml: little codestyle fixes
Oct 29, 2023
f320626
Merge branch 'VKCOM:master' into master
NikOsint Nov 14, 2023
c07c1aa
yaml: add test for strings in single quotes
Nov 14, 2023
a842143
Merge branch 'VKCOM:master' into master
NikOsint Dec 19, 2023
5b3d285
Merge branch 'VKCOM:master' into master
NikOsint Dec 26, 2023
3bf24b8
Merge branch 'VKCOM:master' into master
NikOsint Jan 16, 2024
a8117ec
Merge branch 'VKCOM:master' into master
NikOsint May 2, 2024
1606199
Merge branch 'VKCOM:master' into master
NikOsint May 12, 2024
5 changes: 5 additions & 0 deletions builtin-functions/_functions.txt
@@ -1661,3 +1661,8 @@ class DateTimeImmutable implements DateTimeInterface {
}

function getenv(string $varname = '', bool $local_only = false): mixed;

function yaml_emit_file ($filename ::: string, $data ::: mixed) ::: bool;
Contributor

Use (string $filename, mixed $data) here and in the other signatures: plain type hints, no :::. The same goes for the return type: just ':' after ')'.

function yaml_emit ($data ::: mixed) ::: string;
function yaml_parse_file ($filename ::: string, $pos ::: int = 0) ::: mixed;
function yaml_parse ($data ::: string, $pos ::: int = 0) ::: mixed;
1 change: 1 addition & 0 deletions runtime/runtime.cmake
@@ -121,6 +121,7 @@ prepend(KPHP_RUNTIME_SOURCES ${BASE_DIR}/runtime/
vkext.cpp
vkext_stats.cpp
ffi.cpp
yaml.cpp
zlib.cpp
zstd.cpp)

192 changes: 192 additions & 0 deletions runtime/yaml.cpp
@@ -0,0 +1,192 @@
#include <yaml-cpp/yaml.h>

#include "runtime/optional.h"
#include "runtime/streams.h"
#include "runtime/critical_section.h"
#include "runtime/yaml.h"

/*
* convert YAML::Node to mixed after parsing a YAML document into YAML::Node
*/
static void yaml_node_to_mixed(const YAML::Node &node, mixed &data, const string &source) noexcept {
data.clear(); // sets data to NULL
if (node.IsScalar()) {
const string string_data(node.as<std::string>().c_str());
// check whether the primitive is put in quotes in the source YAML
const bool string_data_has_quotes = (source[node.Mark().pos] == '"' && source[node.Mark().pos + string_data.size() + 1] == '"');
Contributor

What about single quotes? I suppose they also denote a string in YAML.

// if so, it is a string
if (string_data_has_quotes) {
data = string_data;
} else if (string_data == string("true")) {
data = true; // "true" without quotes is boolean(1)
} else if (string_data == string("false")) {
data = false; // "false" without quotes is boolean(0)
} else if (string_data.is_int()) {
data = string_data.to_int();
} else {
double float_data = 0.0;
if (string_data.try_to_float(&float_data)) {
data = float_data;
} else {
data = string_data;
}
}
} else if (node.size() == 0 && node.IsDefined() && !node.IsNull()) {
// if node is defined, is not null or scalar and has size 0, then it is an empty array
array<mixed> empty_array;
data = empty_array;
} else if (node.IsSequence()) {
for (auto it = node.begin(); it != node.end(); ++it) {
mixed data_piece;
yaml_node_to_mixed(*it, data_piece, source);
data.push_back(data_piece);
}
} else if (node.IsMap()) {
for (const auto &it : node) {
mixed data_piece;
yaml_node_to_mixed(it.second, data_piece, source);
data.set_value(string(it.first.as<std::string>().c_str()), data_piece);
}
}
// else node is Null or Undefined, so data is Null
}
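
The reviewer's question about single quotes in yaml_node_to_mixed() could be addressed along the following lines. This is only a sketch: `is_quoted_scalar` is a hypothetical helper that is not part of the PR, it works on std::string rather than the KPHP `string` type, and it keeps the limitation of the check under review (it assumes the quoted text contains no escape sequences, so the raw and decoded lengths match). It also adds the bounds checks that the original expression lacks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the PR): returns true when the scalar that
// starts at `pos` in `source` is wrapped in a matching pair of single or
// double quotes. `decoded_len` is the length of the decoded scalar.
// Assumption: the quoted text has no escape sequences, so the closing quote
// sits exactly `decoded_len + 1` characters after the opening one.
static bool is_quoted_scalar(const std::string &source, std::size_t pos, std::size_t decoded_len) {
  if (pos >= source.size()) {
    return false; // mark points outside the source
  }
  const char open = source[pos];
  if (open != '"' && open != '\'') {
    return false; // plain (unquoted) scalar
  }
  const std::size_t close_pos = pos + decoded_len + 1;
  // the closing quote must exist and be the same kind as the opening one
  return close_pos < source.size() && source[close_pos] == open;
}
```

With such a helper, the call site would pass node.Mark().pos and string_data.size(), so that `'123'` and `"123"` both stay strings while a bare 123 is still parsed as an integer.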

/*
* print tabs in quantity of nesting_level (used to print nested YAML entries)
*/
static string yaml_print_tabs(const uint8_t nesting_level) noexcept {
Contributor

Why do you use uint8_t for a number? There is no reason not to use plain int unless memory alignment is something you care about.

Author: NikOsint, Oct 22, 2023

Well, I thought it was unrealistic to exceed 255 levels of nesting in a YAML document, so I decided to save a little memory on the variable.
Also, uint8_t is safer in terms of sign.

Author

changed to int

return string(2 * nesting_level, ' ');
}

/*
* print the key of a YAML map entry
*/
static string yaml_print_key(const mixed &data_key) noexcept {
if (data_key.is_string()) {
return data_key.as_string();
}
return string(data_key.as_int()); // key can not be an array; bool and float keys are cast to int
}

/*
* escape special characters in a string entry
*/
static string yaml_escape(const string &data) noexcept {
string escaped_data;
for (size_t i = 0; i < data.size(); ++i) {
if (data[i] == 10) { // line feed - code 10
Contributor

  1. Extract data[i] into a separate (char) variable: you use it a dozen times.
  2. Why do you compare a char with a numeric code? Why not `data[i] == '\n'`, and get rid of the comments everywhere?

escaped_data.push_back(92); // backslash
Contributor

`'\\'` instead of 92 (here and below)?

escaped_data.push_back('n');
} else if (data[i] == 8) { // backspace - code 8
escaped_data.push_back(92); // backslash
escaped_data.push_back('b');
} else if (data[i] == 9) { // horizontal tab - code 9
escaped_data.push_back(92); // backslash
escaped_data.push_back('t');
} else if (data[i] == 11) { // vertical tab - code 11
escaped_data.push_back(92); // backslash
escaped_data.push_back('v');
} else if (data[i] == 34) { // double quotation mark - code 34
escaped_data.push_back(92); // backslash
escaped_data.push_back(34); // double quotation mark
} else if (data[i] == 92) { // backslash - code 92
escaped_data.push_back(92);
escaped_data.push_back(92); // double backslash
} else {
escaped_data.push_back(data[i]);
}
}
return escaped_data;
}
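
Taken together, the two review comments on yaml_escape() suggest something like the sketch below. It is not the code that ended up in the PR: `yaml_escape_sketch` is a hypothetical name, and std::string stands in for the KPHP `string` type so the example compiles on its own. It handles the same six characters as the version under review, using character literals instead of numeric codes.

```cpp
#include <string>

// Hypothetical rewrite of yaml_escape() (not from the PR): the current
// character is read once per iteration and matched against char literals.
static std::string yaml_escape_sketch(const std::string &data) {
  std::string escaped;
  escaped.reserve(data.size()); // the result is at least as long as the input
  for (const char c : data) {
    switch (c) {
      case '\n': escaped += "\\n";  break; // line feed
      case '\b': escaped += "\\b";  break; // backspace
      case '\t': escaped += "\\t";  break; // horizontal tab
      case '\v': escaped += "\\v";  break; // vertical tab
      case '"':  escaped += "\\\""; break; // double quotation mark
      case '\\': escaped += "\\\\"; break; // backslash
      default:   escaped += c;      break; // everything else is copied as-is
    }
  }
  return escaped;
}
```

The switch keeps one line per escaped character, so supporting another escape later means adding a single case.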

NikOsint marked this conversation as resolved.
/*
* get a YAML representation of mixed in a string variable
*/
static void mixed_to_string(const mixed &data, string &string_data, const uint8_t nesting_level = 0) noexcept {
string buffer;
if (!data.is_array()) {
if (data.is_null()) {
buffer.push_back('~'); // tilde is a YAML representation of NULL
} else if (data.is_string()) {
buffer.push_back('"'); // cover string entry in double quotes
buffer.append(yaml_escape(data.as_string())); // escape special characters
buffer.push_back('"');
} else if (data.is_int()) {
buffer.append(data.as_int());
} else if (data.is_float()) {
buffer.append(data.as_double());
} else if (data.is_bool()) {
buffer = (data.as_bool()) ? string("true") : string("false");
}
string_data.append(buffer);
Contributor

Why do you need buffer? Why not append to string_data directly?

string_data.push_back('\n');
return;
}
const array<mixed> &data_array = data.as_array();
if (data_array.empty()) {
string_data.append("[]\n"); // an empty array is represented as [] in YAML
tolk-vm marked this conversation as resolved.
return;
}
const bool data_array_is_vector = data_array.is_pseudo_vector(); // check if an array has keys increasing by 1 starting from 0
for (const auto &it : data_array) {
const mixed &data_piece = it.get_value();
buffer = yaml_print_tabs(nesting_level);
if (data_array_is_vector) {
buffer.push_back('-');
} else {
buffer.append(yaml_print_key(it.get_key()));
buffer.push_back(':');
}
if (data_piece.is_array() && !data_piece.as_array().empty()) {
buffer.push_back('\n'); // if an element of an array is also a non-empty array, print it on the next line
} else {
buffer.push_back(' '); // if an element of an array is a primitive or an empty array, print it after a space
}
string_data.append(buffer);
mixed_to_string(data_piece, string_data, nesting_level + 1); // for entries of an array, increase nesting level
}
}
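
The layout rules that mixed_to_string() applies (two spaces per nesting level, "-" for list items, "key:" for map entries, non-empty nested arrays starting on the next line) can be illustrated in isolation. The sketch below is not the PR's code: `Node` is a hypothetical stand-in for KPHP's `mixed`, scalars are assumed to be already formatted as text, and the function follows the review suggestions by appending straight to the output and taking the nesting level as a plain int.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for KPHP's `mixed`, just enough to show the layout.
struct Node {
  bool is_array = false;          // false: `scalar` holds ready-to-print text
  bool is_vector = false;         // true: entries print as "- item", else "key: item"
  std::string scalar;
  std::vector<std::string> keys;  // parallel to `items`; used only for maps
  std::vector<Node> items;
};

// Appends the YAML layout of `node` to `out`, with no intermediate buffer.
static void emit_node(const Node &node, std::string &out, int nesting_level = 0) {
  if (!node.is_array) {
    out += node.scalar;
    out += '\n';
    return;
  }
  if (node.items.empty()) {
    out += "[]\n"; // an empty array is written inline
    return;
  }
  for (std::size_t i = 0; i < node.items.size(); ++i) {
    const Node &item = node.items[i];
    out.append(static_cast<std::size_t>(2 * nesting_level), ' '); // indentation
    if (node.is_vector) {
      out += '-';
    } else {
      out += node.keys[i];
      out += ':';
    }
    // a non-empty nested array starts on the next line, anything else follows a space
    out += (item.is_array && !item.items.empty()) ? '\n' : ' ';
    emit_node(item, out, nesting_level + 1);
  }
}
```

For a map with a scalar entry `x`, a two-element list `list`, and an empty array `e`, this layout puts `x: 1` and `e: []` on single lines and prints each list item as an indented `- 1` line under `list:`.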

bool f$yaml_emit_file(const string &filename, const mixed &data) {
if (filename.empty()) {
php_warning("Filename cannot be empty");
return false;
}
string emitted_data = f$yaml_emit(data);
Optional<int64_t> size = f$file_put_contents(filename, emitted_data);
if (size.is_false()) {
php_warning("Error while writing to file \"%s\"", filename.c_str());
return false;
}
return true;
}

string f$yaml_emit(const mixed &data) {
string string_data("---\n"); // beginning of a YAML document
mixed_to_string(data, string_data);
string_data.append("...\n"); // ending of a YAML document
return string_data;
}

mixed f$yaml_parse_file(const string &filename, int pos) {
if (filename.empty()) {
php_warning("Filename cannot be empty");
return {};
Contributor

In PHP, false is returned on failure.

}
Optional<string> data = f$file_get_contents(filename);
if (data.is_false()) {
php_warning("Error while reading file \"%s\"", filename.c_str());
return {};
}
return f$yaml_parse(data.ref(), pos);
}

mixed f$yaml_parse(const string &data, int pos) {
if (pos != 0) {
php_warning("Argument \"pos\" = %d. Values other than 0 are not supported yet. Setting to default (pos = 0)", pos);
}
YAML::Node node = YAML::Load(data.c_str());
mixed parsed_data;
yaml_node_to_mixed(node, parsed_data, data);
return parsed_data;
}
11 changes: 11 additions & 0 deletions runtime/yaml.h
@@ -0,0 +1,11 @@
#pragma once

#include "runtime/kphp_core.h"

bool f$yaml_emit_file(const string &filename, const mixed &data);

string f$yaml_emit(const mixed &data);

mixed f$yaml_parse_file(const string &filename, int pos = 0);

mixed f$yaml_parse(const string &data, int pos = 0);
1 change: 1 addition & 0 deletions tests/cpp/runtime/runtime-tests.cmake
@@ -21,6 +21,7 @@ prepend(RUNTIME_TESTS_SOURCES ${BASE_DIR}/tests/cpp/runtime/
memory_resource/unsynchronized_pool_resource-test.cpp
string-list-test.cpp
string-test.cpp
yaml-test.cpp
zstd-test.cpp)

allow_deprecated_declarations_for_apple(${BASE_DIR}/tests/cpp/runtime/inter-process-mutex-test.cpp)