Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not able to marshal a slice of large structs #235

Open
abiliojr opened this issue Aug 29, 2018 · 20 comments
Open

Not able to marshal a slice of large structs #235

abiliojr opened this issue Aug 29, 2018 · 20 comments

Comments

@abiliojr
Copy link

I'm trying to marshal an "heterogeneous" slice as an experiment (one of interfaces), and I'm hitting a wall with it. If I fill it with structs of up to 3 elements, it will encode them fine. If I add a single struct with 4 elements, it fails at runtime, saying:

msgp: type "main.FailureObject" not supported

FailureObject being the problematic struct in my example to reproduce the problem, though I proved I can marshal an individual object of that type.

Note: I intentionally need the objects to be encoded as arrays, and that's why I'm adding the "tuple" on top.

I'm completely new in Go, so most probably is my mistake. Anyhow, can you please take a look at the code:

package main

import "fmt"

//go:generate msgp
type ListOfObjects []interface{}

//msgp:tuple OkObject
type OkObject struct {
	A uint8
	B uint8
	C uint8
}

//msgp:tuple FailureObject
type FailureObject struct {
	A uint8
	B uint8
	C uint8
	D uint8
}

func AppendOkObj(list ListOfObjects, b uint8, c uint8) ListOfObjects {
	return append(list, OkObject{0, b, c})
}

func AppendFailureObj(list ListOfObjects, b uint8, c uint8) ListOfObjects {
	return append(list, FailureObject{1, b, c, 1})
}

func main() {
	// This will work. I can marshal a "long" object
	thisWorks := FailureObject{1, 2, 3, 4}
	res, _ := thisWorks.MarshalMsg(nil)
	fmt.Printf("% X\n", res)

	// I can do this and it will work
	list := make(ListOfObjects, 0, 10)
	list = AppendOkObj(list, 0, 0)
	list = AppendOkObj(list, 1, 1)
	list = AppendOkObj(list, 2, 2)
	res, _ = list.MarshalMsg(nil)
	fmt.Println("as you can see, it works")
	fmt.Printf("% X\n", res)

	// now this will give an error and instead return a half baked message
	list = make(ListOfObjects, 0, 10)
	list = AppendFailureObj(list, 0, 0)
	list = AppendFailureObj(list, 0, 0)
	res, err := list.MarshalMsg(nil)
	fmt.Println("as you can see, this doesn't")
	fmt.Printf("% X\n", res)
	fmt.Println(err)

	// combination of short and long won't work either
	list = make(ListOfObjects, 0, 10)
	list = AppendOkObj(list, 1, 1)
	list = AppendFailureObj(list, 0, 0)
	res, err = list.MarshalMsg(nil)
	fmt.Println("neither this")
	fmt.Printf("% X\n", res)
	fmt.Println(err)
}

If I'm assuming something here that is wrong, what would be the correct way, cleanest and most efficient way of encoding an array of arrays that can have different sizes and that are mapped to existing types.

@erikdubbelboer
Copy link
Contributor

The problem is that it generates a method receiver on the pointer type:

func (z *FailureObject) EncodeMsg(en *msgp.Writer) (err error) {

While for OkObject it's on the value type:

func (z OkObject) EncodeMsg(en *msgp.Writer) (err error) {

The reason why is this piece of code:

msgp/gen/spec.go

Lines 233 to 244 in 53e4ad1

case *Struct:
// TODO(HACK): actually do real math here.
if len(e.Fields) <= 3 {
for i := range e.Fields {
if be, ok := e.Fields[i].FieldElem.(*BaseElem); !ok || (be.Value == IDENT || be.Value == Bytes) {
goto nope
}
}
return p.TypeName()
}
nope:
return "*" + p.TypeName()

Which was introduced in commit cd70faa which says allow some encode/marshal method recvrs to be values.

The 3 here makes the difference why your OkObject works and FailureObject fails.

I don't understand where the 3 comes from or what for loop is actually checking but changing it to just:

case *Struct:
  return p.TypeName()

still passes all tests and fixes this bug.

In theory all receivers for EncodeMsg should be able to be values seeing as it doesn't change the object.

@philhofer do you remember writing this code 3 years ago and why you did it this way?

@philhofer
Copy link
Member

So, originally I wrote the code to always use pointer receivers, since there's sometimes a substantial performance penalty to copying the whole struct into the method. Then someone complained that they wanted to use value receivers. The middle-ground that the code sits in right now is that it uses
value receivers for "small" structs (three words or under, so smaller than a slice header), and uses pointer receivers otherwise.

Looking back on things, I should have just stuck with pointer receivers, because the behavior is not intuitive or well-documented.

@abiliojr
Copy link
Author

I increased the value from 3 to 10 and now it works. Anything I can do to help?

@philhofer
Copy link
Member

philhofer commented Aug 29, 2018

You likely don't want to be copying (or boxing) 80-byte structures.

The simplest fix for you is:

func AppendFailureObj(list ListOfObjects, b uint8, c uint8) ListOfObjects {
-	return append(list, FailureObject{1, b, c, 1})
+	return append(list, &FailureObject{1, b, c, 1})
}

You can cause your example to fail during type-checking (rather than at run-time) by making your slice of type []msgp.Marshaler rather than []interface{}.

@abiliojr
Copy link
Author

I understand the first suggestion, but the second one isn't obvious to me (I read the wiki a couple of weekends ago actually looking for some detail on how to marshal a slice, and I don't remember reading anything like that, then I learned about slices of interfaces as a concept).

Anyhow, I tried []msgp.Marshaler, and now I get a different error:
demo.go: ListOfObjects: unresolved identifier: msgp.Marshaler

I'm using it as:
type ListOfObjects []msgp.Marshaler

What am I doing wrong?

As a suggestion, I think this info can be in the tips and tricks section of the wiki (unless you consider it too obvious). I would write it myself in order to help, but with less than 10 hours of experience, I don't consider it wise ;)

@philhofer
Copy link
Member

philhofer commented Aug 29, 2018

You'll need import "github.com/tinylib/msgp/msgp" in order to use the msgp.Marshaler identifier.

@abiliojr
Copy link
Author

I was really impressed about the editor I'm using (vscode) because it automatically detected the need for the import and added it. But still, I get that error message when I try to manually run:
~/go/bin/msgp --file demo.go

@philhofer
Copy link
Member

The full import is "github.com/tinylib/msgp/msgp" (note the second 'msgp'; the top-level one is the code generation tool, not a library). Your editor is probably selecting the wrong import.

@philhofer
Copy link
Member

philhofer commented Aug 29, 2018

Also, keep in mind that "unresolved identifier" is just a warning produced by the tool when it encounters a type not defined in the input file, not an error. But you'll probably want to avoid generating methods for that type.

@abiliojr
Copy link
Author

Yes, the editor actually included the right one (two msgp). Here a simplified example:

package main

import (
	"github.com/tinylib/msgp/msgp"
)

//go:generate msgp
type ListOfObjects []msgp.Marshaler

//msgp:tuple Obj
type Obj struct {
	A uint8
	B uint8
	C uint8
}

func AppendObj(list ListOfObjects, b uint8, c uint8) ListOfObjects {
	return append(list, Obj{0, b, c})
}

func main() {
	list := make(ListOfObjects, 0, 10)
	list = AppendObj(list, 0, 0)
	res, _ := list.MarshalMsg(nil)
}

The warning prints as usual, but if I run go build, I get compiler errors:

./demo_gen.go:22:21: (*z)[zb0001].DecodeMsg undefined (type msgp.Marshaler has no field or method DecodeMsg)
./demo_gen.go:37:18: z[zb0003].EncodeMsg undefined (type msgp.Marshaler has no field or method EncodeMsg)
./demo_gen.go:71:26: (*z)[zb0001].UnmarshalMsg undefined (type msgp.Marshaler has no field or method UnmarshalMsg)
./demo_gen.go:84:17: z[zb0003].Msgsize undefined (type msgp.Marshaler has no field or method Msgsize)

@philhofer
Copy link
Member

Stick a //msgp:ignore ListOfObjects and you should be fine.

(It's not possible to generate code for the ListOfObjects type.)

@erikdubbelboer
Copy link
Contributor

@philhofer is this something you are still willing to change?

You likely don't want to be copying (or boxing) 80-byte structures.

That actually takes only a couple of nanoseconds. I wrote a simple test: https://gist.github.com/erikdubbelboer/98c8a910b1151c6df0e1ecba56fffcec
The result:

BenchmarkValue-8     	50000000	        28.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkPointer-8   	100000000	        22.9 ns/op	       0 B/op	       0 allocs/op

As you can see the difference is neglectable.

@abiliojr
Copy link
Author

After adding the ignore as per instructions, the warning disappears as expected, but when I try to compile I still get an error:
./demo.go:27:16: list.MarshalMsg undefined (type ListOfObjects has no field or method MarshalMsg)

Which I kind of understand as valid

@erikdubbelboer
Copy link
Contributor

erikdubbelboer commented Aug 30, 2018

@philhofer what you want isn't possible. Since it's a slice you have to generate code for it. But the generated code assumes each element has the methods DecodeMsg, EncodeMsg, UnmarshalMsg and Msgsize which msg.Marshaler obviously doesn't have.

The closest you can get is:

type ListType interface {
  msgp.Decodable
  msgp.Encodable
  msgp.Marshaler
  msgp.Unmarshaler
  msgp.Sizer
}

type ListOfObjects []ListType

But this results in a:

./test.go:24:15: cannot use Obj literal (type Obj) as type ListType in append:
	Obj does not implement ListType (DecodeMsg method has pointer receiver)

@erikdubbelboer
Copy link
Contributor

I suggest changing msgp to always use a value receiver just like json.RawMessage for example.

@erikdubbelboer
Copy link
Contributor

@philhofer I updated my benchmark a bit: https://gist.github.com/erikdubbelboer/98c8a910b1151c6df0e1ecba56fffcec
Unless people are encoding huge structs passing by value should always be better as it allows escape analysis to put structs on the stack much more often.

@philhofer
Copy link
Member

Why would generating code that passes structs by value cause any difference in escape analysis? Escape analysis should pretty much always conclude that the struct does not escape, since the generated code doesn't maintain a pointer anywhere into the structure...

Your benchmark seems to show performance parity because of the compiler being clever enough to inline the calls. The generated code is rarely (never, I think) a candidate for inlining.

I've always viewed passing structures as value receivers as a code smell, perhaps because I have written too much C. (But, also, perhaps because I see beginner Go programmers coming from Java do it too frequently because they don't really grok pointers.) So, I may be unreasonably biased.

@erikdubbelboer
Copy link
Contributor

I looked into it more and I think you are right about it not affecting escape analysis. Escape analysis is only affected by pointer cycles and interfaces which both aren't used in the generated code. So why don't you change the code to always use pass by reference? I still think the current solution of switching methods after 4 or more fields is not the best solution.

It doesn't seem like the compiler is inlining the calls. -gcflags '-m -m' prints cannot inline Big.byValue: unhandled op FOR and cannot inline (*Big).byPointer: unhandled op FOR. Adding //go:noinline comments doesn't change the result either.

I don't think passing structs as value is code smell. I think in Go it is somewhat similar as to using const in C where you want to show that the argument shouldn't be modified. I'm not sure which makes it easier to spot bugs, passing by value and having the copy modifier by accident or passing by pointer and having the original modified by accident.

@skaldesh
Copy link

skaldesh commented Sep 5, 2024

Beware of the following:
When trying to assert whether an any type implements the msgp.Marshaler interface, it can get really tricky when the method receivers of the generated methods are sometimes value types:

type Test struct {
     S *string // <- causes pointer receiver on generated msgp.Marshaler method
}

// Generated msgp code...

func main() {
    var a any
    a = Test{}
    if m, ok := a.(msgp.Marshaler); ok {
         // Not reached because Test struct has a generated method with a pointer receiver.
    }
}

@klauspost
Copy link
Collaborator

We can implement a fairly generic wrapper that can be used for all types that are generated.

There is a minimal amount of (AFAICT necessary) reflection, for creating new instances.

package arr

import (
	"errors"
	"math"
	"reflect"

	"github.com/tinylib/msgp/msgp"
)

// RoundTripper provides an interface for type roundtrip serialization.
type RoundTripper interface {
	msgp.Unmarshaler
	msgp.Marshaler
	msgp.Sizer
	msgp.Encodable
	msgp.Decodable

	comparable
}

// Array provides a wrapper for an underlying array of serializable objects.
type Array[T RoundTripper] struct {
	value []T
}

// Msgsize returns the size of the array in bytes.
func (j *Array[T]) Msgsize() int {
	if j.value == nil {
		return msgp.NilSize
	}
	sz := msgp.ArrayHeaderSize
	for _, v := range j.value {
		sz += v.Msgsize()
	}
	return sz
}

// Value returns the underlying value.
// Regular append mechanics should be observed.
func (j *Array[T]) Value() []T {
	return j.value
}

// Append a value to the underlying array.
// The returned Array is always the same as the one called.
func (j *Array[T]) Append(v ...T) *Array[T] {
	if j.value == nil {
		j.value = make([]T, 0, len(v))
	}
	j.value = append(j.value, v...)
	return j
}

// Set the underlying value.
func (j *Array[T]) Set(val []T) {
	j.value = val
}

// MarshalMsg implements msgp.Marshaler
func (j *Array[T]) MarshalMsg(b []byte) (o []byte, err error) {
	if j.value == nil {
		return msgp.AppendNil(b), nil
	}
	if uint64(len(j.value)) > math.MaxUint32 {
		return b, errors.New("array: length of array exceeds math.MaxUint32")
	}
	b = msgp.AppendArrayHeader(b, uint32(len(j.value)))
	for _, v := range j.value {
		b, err = v.MarshalMsg(b)
		if err != nil {
			return b, err
		}
	}
	return b, err
}


// EncodeMsg implements msgp.Encoder
func (j *Array[T]) EncodeMsg(w *msgp.Writer) error {
	if j.value == nil {
		return w.WriteNil()
	}
	if uint64(len(j.value)) > math.MaxUint32 {
		return errors.New("array: length of array exceeds math.MaxUint32")
	}
	if err := w.WriteArrayHeader(uint32(len(j.value))); err != nil {
		return err
	}
	for _, v := range j.value {
		err := v.EncodeMsg(w)
		if err != nil {
			return err
		}
	}
	return nil
}

// UnmarshalMsg will unmarshal the value into a typed array.
// Nil values are supported.
func (j *Array[T]) UnmarshalMsg(bytes []byte) ([]byte, error) {
	if bytes, err := msgp.ReadNilBytes(bytes); err == nil {
		j.value = nil
		return bytes, nil
	}
	l, bytes, err := msgp.ReadArrayHeaderBytes(bytes)
	if err != nil {
		return bytes, err
	}
	if j.value == nil {
		j.value = make([]T, 0, l)
	} else {
		j.value = j.value[:0]
	}
	for i := uint32(0); i < l; i++ {
		v := newRT[T]()
		bytes, err = v.UnmarshalMsg(bytes)
		if err != nil {
			return bytes, err
		}
		j.value = append(j.value, v)
	}
	return bytes, nil
}

// DecodeMsg will decode the value into a typed array.
// Nil values are supported.
func (j *Array[T]) DecodeMsg(r *msgp.Reader) error {
	if err := r.ReadNil(); err == nil {
		j.value = nil
		return nil
	}
	l, err := r.ReadArrayHeader()
	if err != nil {
		return err
	}
	if j.value == nil {
		j.value = make([]T, 0, l)
	} else {
		j.value = j.value[:0]
	}
	for i := uint32(0); i < l; i++ {
		v := newRT[T]()
		err = v.DecodeMsg(r)
		if err != nil {
			return err
		}
		j.value = append(j.value, v)
	}
	return nil
}

// newRT will create a new RoundTripper, which is a pointer that points to a zero value element.
func newRT[T RoundTripper]() T {
	var t T
	// Use reflection to get the type of T
	ptrType := reflect.TypeOf(t)

	// T will always have a pointer type, so we create a new element.
	elemType := ptrType.Elem()

	// Create a new instance of T using reflect.New
	// This will create a pointer to the element.
	newValue := reflect.New(elemType)

	return newValue.Interface().(T)
}

Usage:

package main

import (
	"fmt"

	".path.to./arr"
)

//go:generate msgp

type Obj struct {
	A uint8
	B uint8
	C uint8
}

func main() {
	var a arr.Array[*Obj]
	a.Append(&Obj{A: 1, B: 2, C: 3})
	a.Append(&Obj{A: 4, B: 5, C: 6})

	msg, err := a.MarshalMsg(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("msg:", msg)

	var b arr.Array[*Obj]
	_, err = b.UnmarshalMsg(msg)
	if err != nil {
		panic(err)
	}
	for _, v := range b.Value() {
		fmt.Printf("%+v", *v)
	}
}

// OUTPUT: msg: [146 131 161 65 1 161 66 2 161 67 3 131 161 65 4 161 66 5 161 67 6]
// {A:1 B:2 C:3}{A:4 B:5 C:6}

The only annoying part is that array elements must be pointers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants