Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add yaml parsing & emitting support #707

Open
wants to merge 42 commits into
base: master
Choose a base branch
from
Open

Conversation

NikOsint
Copy link

@NikOsint NikOsint commented Dec 18, 2022

KPHP YAML

Solution for Issue #639

Implementation of 4 functions from PHP YAML using yaml-cpp:

yaml_emit_file(string $filename, mixed $data): bool
yaml_emit(mixed $data): string
yaml_parse_file(string $filename, int $pos = 0): mixed
yaml_parse(string $data, int $pos = 0): mixed

C++ tests are written and implemented.

There is one major issue: strings like "10", "98.13' are emitted without quotes, so then they are parsed as integers and doubles respectively.

More informative README can be found here.

@Danil42Russia Danil42Russia linked an issue Dec 19, 2022 that may be closed by this pull request
Copy link

@Tsygankov-Slava Tsygankov-Slava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found several places where something can be improved. It would be great if you took these comments into attention.

runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
@tolk-vm tolk-vm requested a review from DrDet December 20, 2022 19:11
@NikOsint
Copy link
Author

@Tsygankov-Slava Thank you for your review! All the comments were taken into account and the corresponding parts of the code were corrected. Some other small improvements were done.

For now, the problem of emitting number strings such as "10", "5.34" without quotes remains due to poor documentation of yaml-cpp library. I will let you know when a solution is found.

@tolk-vm tolk-vm added the missing functionality Missing PHP functions, classes, constants and etc label Feb 2, 2023
@NikOsint
Copy link
Author

NikOsint commented Feb 27, 2023

@Tsygankov-Slava @unserialize The problem of emitting numbers and numbers-as-strings has been solved.

Converting from mixed to a YAML document has been rewritten without use of yaml-cpp as it does not provide any way to emit strings with quotes and numbers without.

Also now booleans are parsed&emitted correctly (as booleans, not strings or integers).

More C++ tests has been written and a complex PHP test has been implemented, too.

For the PHP test to work, a 'yaml.so' extension has been added to the tests/python/lib/kphp_run_once.py file. Without it, the test would throw an error when running with PHP because PHP does not have its YAML module by default.

In the PHP test, serialize() method is used to output mixed. var_dump() is not used as its output is different in PHP and KPHP.

@NikOsint
Copy link
Author

Merged VKCOM:master into my master branch so that there are no conflicts.

@NikOsint NikOsint requested review from Tsygankov-Slava and removed request for quasilyte, DrDet and tolk-vm March 6, 2023 10:02
Copy link

@Tsygankov-Slava Tsygankov-Slava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked your PR and found a few points that you might want to fix.

runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.h Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
runtime/yaml.cpp Outdated Show resolved Hide resolved
@tolk-vm
Copy link
Contributor

tolk-vm commented Mar 9, 2023

I'll try to review it next week

@@ -1652,3 +1652,8 @@ class DateTimeImmutable implements DateTimeInterface {
}

function getenv(string $varname = '', bool $local_only = false): mixed;

function yaml_emit_file ($filename ::: string, $data ::: mixed) ::: bool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(string $filename, mixed $data) and others, no :::, just type hints. Same for return — just ':' after ')'

@@ -49,6 +49,11 @@ prepend(KPHP_RUNTIME_PDO_PGSQL_SOURCES pdo/pgsql/
pgsql_pdo_emulated_statement.cpp)
endif()

if (YAML)
prepend(KPHP_RUNTIME_YAML_SOURCES /
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing identation in if

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically copied this if from above ifs, there is also no identation
should I fix it there too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it honestly looks worse with indentation inside the if-clause
I think that's why there is no indentation in above ifs

@@ -6,7 +6,7 @@

#include "runtime/kphp_core.h"

using Stream =mixed;
using Stream = mixed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very strange change in this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed a missing space there

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted this change

runtime/yaml.cpp Outdated

#include "runtime/optional.h"
#include "runtime/streams.h"
#include "yaml.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"runtime/yaml.h"

runtime/yaml.cpp Outdated
if (node.IsScalar()) {
const string string_data(node.as<std::string>().c_str());
// check whether the primitive is put in quotes in the source YAML
const bool string_data_has_quotes = (source[node.Mark().pos] == '"' && source[node.Mark().pos + string_data.size() + 1] == '"');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about single quotes? I suppose, they also mean a string in yaml

runtime/yaml.cpp Outdated
/*
* print tabs in quantity of nesting_level (used to print nested YAML entries)
*/
static string yaml_print_tabs(const uint8_t nesting_level) noexcept {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you use uint8_t for a number? There is no reason not to use just int unless memory alignment is a case you care for

Copy link
Author

@NikOsint NikOsint Oct 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I thought it is unrealistic to exceed 255 levels of nesting in a YAML document, so I decided to limit memory for variable a little bit
and uint8_t is safer in terms of sign

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to int

runtime/yaml.cpp Outdated
string escaped_data;
for (size_t i = 0; i < data.size(); ++i) {
if (data[i] == 10) { // line feed - code 10
escaped_data.push_back(92); // backslash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'\' instead of 92 (and below)?

runtime/yaml.cpp Outdated
static string yaml_escape(const string &data) noexcept {
string escaped_data;
for (size_t i = 0; i < data.size(); ++i) {
if (data[i] == 10) { // line feed - code 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. extract data[i] to a separate (char) variable: you use in dozen of times
  2. why do you compare char with a numeric code, why not `data[i] == '\n' and get rid of comments everywher?

runtime/yaml.cpp Outdated
} else if (data.is_bool()) {
buffer = (data.as_bool()) ? string("true") : string("false");
}
string_data.append(buffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What for do you need buffer, why not to append into string_data directly?

runtime/yaml.cpp Outdated
mixed f$yaml_parse_file(const string &filename, int pos) {
if (filename.empty()) {
php_warning("Filename cannot be empty");
return {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PHP, false in returned on failure

@tolk-vm
Copy link
Contributor

tolk-vm commented Oct 19, 2023

There is one major issue: strings like "10", "98.13' are emitted without quotes, so then they are parsed as integers and doubles respectively.

Is this issue (from PR description) still actual? I can't see such a behaviour looking at source code

@NikOsint NikOsint closed this Oct 22, 2023
@NikOsint NikOsint reopened this Oct 22, 2023
@NikOsint
Copy link
Author

There is one major issue: strings like "10", "98.13' are emitted without quotes, so then they are parsed as integers and doubles respectively.

Is this issue (from PR description) still actual? I can't see such a behaviour looking at source code

No, it is outdated, now everything in double quotes is emitted as string
Thanks for this remark, updated description

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing functionality Missing PHP functions, classes, constants and etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for parsing yaml files.
5 participants