Incorrectly read the version number #152

Open
xrgzs opened this issue Nov 5, 2024 · 7 comments

xrgzs commented Nov 5, 2024

I'm trying to read a YAML configuration file that defines a version number, but I'm getting unexpected results.

Simplified as follows:

PS D:\> "version: 3.10" | ConvertFrom-Yaml

Name                           Value
----                           -----
version                        3.1

PS D:\> ("version: 3.10" | ConvertFrom-Yaml).version
3.1

The value's type is System.Double, so my script sees the wrong version number.

PS D:\> ('version: 3.10' | convertFrom-Yaml).version.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Double                                   System.ValueType
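
For illustration, the same loss can be reproduced without any YAML parser. This is a minimal sketch using only built-in .NET types; the variable name `$asDouble` is made up for the example. It shows why a version such as 3.10 cannot survive being parsed as a floating-point number.

```powershell
# Parsing "3.10" as a floating-point number drops the trailing zero.
$asDouble = [double]::Parse('3.10', [cultureinfo]::InvariantCulture)
$asDouble.ToString([cultureinfo]::InvariantCulture)    # prints 3.1

# As a number, 3.10 sorts below 3.9; as a version, it sorts above.
$asDouble -gt 3.9                     # False
[version]'3.10' -gt [version]'3.9'    # True
```

Once the scalar has become a double, the original text is gone, so any fix has to keep the value as a string at parse time.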

Then I tried converting a JSON string containing the same data. ConvertFrom-Yaml returns a different result from ConvertFrom-Json.

PS D:\> '{"version": 3.10}' | ConvertFrom-Yaml

Name                           Value
----                           -----
version                        3.1

PS D:\> '{"version": 3.10}' | ConvertFrom-Json

version
-------
   3.10

This is my environment:

PS D:\> Get-Module -Name powershell-yaml

ModuleType Version    PreRelease Name                                ExportedCommands
---------- -------    ---------- ----                                ----------------
Script     0.4.7                 powershell-yaml                     {ConvertFrom-Yaml, ConvertTo-Yaml, cfy, cty}

PS D:\> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.4.6
PSEdition                      Core
GitCommitId                    7.4.6
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

I think it would be better to provide a switch to treat numbers as strings.

Or can someone give me a solution? I don't want to use regex.😭

Author

xrgzs commented Nov 5, 2024

yq also has this issue when converting YAML to JSON, but reading the value directly is fine. 🤔

PS D:\> "version: 3.10" | yq .version
3.10
PS D:\> "version: 3.10" | yq -o yaml
version: 3.10
PS D:\> "version: 3.10" | yq -o json
{
  "version": 3.1
}

Author

xrgzs commented Nov 5, 2024

yq correctly converts YAML to XML.

PS D:\> "version: 3.10" | yq -o xml
<version>3.10</version>

PS D:\> [xml]("version: 3.10" | yq -o xml)

version
-------
3.10

In more complex cases, PowerShell cannot convert the output to XML. With more than one top-level key, yq emits several sibling elements and no single root element.

PS D:\> [xml](@"
>> version: 3.10
>> name: "Python 3.10"
>> "@| yq -o xml)
InvalidArgument: Cannot convert value "System.Object[]" to type "System.Xml.XmlDocument". Error: "This document already has a 'DocumentElement' node."

After wrapping the output in a root element manually, it works.

function ConvertFrom-Yaml {
    param (
        [parameter(Mandatory, ValueFromPipeline)]
        [string]
        $InputObject
    )
    $xml = $InputObject | yq -o xml
    $xml = [xml] "<data>$xml</data>"
    return $xml.data
}

$xml = @"
version: 3.10
name: "Python 3.10"
installer:
    amd64:
        url: "https://www.python.org/ftp/python/3.10.8/python-3.10.8-amd64.exe"
"@ | ConvertFrom-Yaml

PS D:\> $xml

version name        installer
------- ----        ---------
3.10    Python 3.10 installer

PS D:\>
PS D:\> $xml.installer.amd64

url
---
https://www.python.org/ftp/python/3.10.8/python-3.10.8-amd64.exe

However, this implementation may introduce XML-specific problems. For example, ConvertTo-Json doesn't work on the result.

Author

xrgzs commented Nov 5, 2024

As yq supports format selection, it can be further modified:

PS D:\> yq | Select-String format

# yq tries to auto-detect the file format based off the extension, and defaults to YAML if it's unknown (or piping through STDIN)
# Use the '-p/--input-format' flag to specify a format type.
  -p, --input-format string           [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|lua|l] parse format for input. (default "auto")
  -o, --output-format string          [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|shell|s|lua|l] output format type. (default "auto")
  -V, --version                       Print version information and quit
Use "yq [command] --help" for more information about a command.

Now choose an intermediate format. The direct YAML-to-JSON conversion is known to drop the trailing zero, so don't use it as the first step.

flowchart TD
    A[Start] --> B[Receive YAML string from pipeline]
    B --> C[Convert YAML to XML using yq -- toString]
    C --> D[Convert XML to JSON using yq]
    D --> E[Convert JSON to PSCustomObject using ConvertFrom-Json]
    E --> F[Return the converted object]
function ConvertFrom-Yaml {
    param (
        [parameter(Mandatory, ValueFromPipeline)]
        [string]
        $InputObject
    )
    return $InputObject | yq -o xml | yq -p xml -o json | ConvertFrom-Json
}

With my payloads, both YAML-XML-JSON-PSCustomObject and YAML-LUA-JSON-PSCustomObject work.
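
A slightly more defensive variant of the function above could look like the sketch below. It assumes the same yq binary whose help output is shown earlier, and the name `ConvertFrom-YamlViaYq` is hypothetical, chosen so it does not shadow the module's own ConvertFrom-Yaml. It collects every pipeline line before calling yq, so multi-line input that arrives one line at a time (for example from Get-Content) is converted as one document.

```powershell
function ConvertFrom-YamlViaYq {
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]   # blank lines are legal inside a YAML document
        [string]
        $InputObject
    )
    begin {
        # Buffer the whole document; yq needs all of it at once.
        $lines = [System.Collections.Generic.List[string]]::new()
    }
    process {
        $lines.Add($InputObject)
    }
    end {
        # Fail early with a clear message if yq is not installed.
        if (-not (Get-Command yq -ErrorAction SilentlyContinue)) {
            throw 'yq was not found on PATH.'
        }
        $yaml = $lines -join "`n"
        # YAML -> XML -> JSON, the same chain as above.
        $yaml | yq -o xml | yq -p xml -o json | ConvertFrom-Json
    }
}
```

Because the values pass through XML, other scalars in the document may also come back as strings rather than numbers, so callers would need to cast the fields they actually want as numbers.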

Member

gabriel-samfira commented Nov 5, 2024

Is this a yaml that you craft or is it something you consume? If it's something you define, is it possible to add the !!str tag or just quote the value and see if that makes a difference?

"version: !!str 3.10" | ConvertFrom-Yaml

Or just quote the scalar:

'version: "3.10"' | ConvertFrom-Yaml

This happens because the bare scalar (unquoted) is converted to a float. This probably happens in most yaml parsers that automatically coerce types.

When serializing to yaml, parsers have a choice to use bare scalars (values without quotes) or quoted scalars. A bare scalar can be ambiguous in some cases, if there are no tags associated to hint at what the original type was.

An easy way to disambiguate strings is to just quote them. That's why in most parsers (python, go, powershell-yaml), when you convert a string that might be easily converted to any other type, it's quoted.

$aNumber = 100
$aString = "100"

ConvertTo-Yaml $aNumber
ConvertTo-Yaml $aString
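
The quoting rule described above can be sketched in a few lines. This is only an illustration of the idea, not powershell-yaml's actual logic: `Format-YamlScalar` is a hypothetical helper, and its list of ambiguous words is an assumption that covers only a few common cases.

```powershell
# Hypothetical helper, not part of powershell-yaml.
function Format-YamlScalar {
    param (
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        $Value
    )

    if ($Value -isnot [string]) {
        # Numbers and other non-strings are emitted bare.
        return [string]$Value
    }

    # A string that would parse as a number is ambiguous when left bare.
    $parsed = 0.0
    $looksNumeric = [double]::TryParse(
        $Value,
        [System.Globalization.NumberStyles]::Float,
        [cultureinfo]::InvariantCulture,
        [ref]$parsed)

    # A few words that YAML parsers commonly coerce to bool or null.
    $looksSpecial = $Value.ToLowerInvariant() -in 'true', 'false', 'null', '~', ''

    if ($looksNumeric -or $looksSpecial) {
        return '"' + $Value + '"'
    }
    return $Value
}

Format-YamlScalar 100       # 100
Format-YamlScalar '100'     # "100"
Format-YamlScalar '3.10'    # "3.10"
Format-YamlScalar 'hello'   # hello
```

A serializer that follows this rule lets a string such as "3.10" survive a round trip, which is why quoting at the source is the cheapest fix when you control the YAML.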

Author

xrgzs commented Nov 5, 2024

> Is this a yaml that you craft or is it something you consume? If it's something you define, is it possible to add the !!str tag or just quote the value and see if that makes a difference?

It's something I consume. The test content above is simplified; the original content is obtained through Invoke-RestMethod and is much more complex.

Also, thank you for the analysis.

Member

gabriel-samfira commented Nov 5, 2024

Okay. There are two potential solutions to this: a simple one that will break round-tripping, or a more complicated one in the form of something like this:

The simple one means adding a switch that disables type coercion and returns all scalars as strings. Serializing the result back to YAML would then quote every scalar, which breaks round-tripping.

@gabriel-samfira
Member

I wish YAML authors would use tags or at least quote ambiguous scalars. Dynamically typed languages always end up guessing the type. It's a damned-if-you-do, damned-if-you-don't situation. If you don't coerce types, everyone needs to duplicate the coercion. If you do coerce types, ambiguous scalars can be misinterpreted.

The WiP branch above tries to add some modeling, but people don't seem to be too interested.
