[BUG] Stack Smashing on stack allocated tensors #14

Open
Dando18 opened this issue Jul 11, 2019 · 3 comments

Dando18 commented Jul 11, 2019

Describe the bug
Passing stack-allocated tensor objects to some MagmaDNN functions causes stack smashing errors. The bug is intermittent: it does not occur on every run.

To Reproduce

model::NeuralNetwork<float> model( ... );

/* stack allocated */
Tensor<float> x( ... );
Tensor<float> y( ... );

/* this causes a stack smashing error in a non-negligible percentage of runs */
model.fit(&x, &y, ... );

Expected behavior
No stack smashing.

Environment:

  • OS: Ubuntu 18.10
  • CUDA Version: 10.1
  • CBLAS: OpenBLAS
  • Magma Version: 2.5.2
  • CuDNN Version: 7.6
  • MagmaDNN Version: 1.0

Additional context
This appears to be caused by magmadnn::Tensor<T>'s copy constructor and the copying of its MemoryManager.
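
For illustration, here is a minimal hypothetical sketch (not MagmaDNN's actual code) of the pitfall this points at: a type whose implicit copy constructor shallow-copies an owning pointer, so a pass-by-value copy and the original both free the same buffer.

#include <cstddef>

/* hypothetical type, for illustration only */
struct BadTensor {
    float* data;
    std::size_t size;

    explicit BadTensor(std::size_t n) : data(new float[n]), size(n) {}

    /* no user-defined copy constructor: the implicit one copies `data`
       as a raw pointer, so two objects end up owning the same buffer */

    ~BadTensor() { delete[] data; } /* second destruction corrupts memory */
};

void consume(BadTensor t) { /* pass-by-value makes a shallow copy */ }

int main() {
    BadTensor x(16);
    consume(x); /* the copy frees x's buffer when it goes out of scope */
    return 0;   /* x's destructor frees the same buffer again: UB */
}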

Dando18 added the bug label Jul 11, 2019
Dando18 added this to the MagmaDNN-v1.2 milestone Jul 11, 2019

Dando18 commented Jul 18, 2019

A new branch has been created to overhaul MagmaDNN's memory system.

New Branch: memory-fixes

The goal of this branch is to replace most pointer use in MagmaDNN with references or smart pointers. This aligns the codebase with modern C++ practice.

Dando18 added a commit that referenced this issue Jul 22, 2019
    this is in effort for #14 and milestone v1.2
Dando18 added a commit that referenced this issue Jul 22, 2019
Dando18 added a commit that referenced this issue Jul 23, 2019
    TODO -- a lot of the old macros need to be changed out to consider Tensor or Tensor*.
    Part of work towards #14.
Dando18 added a commit that referenced this issue Jul 23, 2019
    better support for the binary and unary ops in addition to GPU support
    part of #14.
Dando18 added a commit that referenced this issue Jul 23, 2019
    compile the math portion of the library as it comes up to speed with the rest of the framework
    in effort to fix issue #14.
Dando18 added a commit that referenced this issue Jul 23, 2019
    some math:: functions are refactored to take const-correct Tensor& references.
    This is part of the changes to address issue #14.
Dando18 added a commit that referenced this issue Jul 24, 2019
Dando18 added a commit that referenced this issue Jul 24, 2019
    … const correct.
    This is part of the memory fixes for #14.

Dando18 commented Jul 24, 2019

Here is an outline of how MagmaDNN's memory management will be refactored into a more modern C++ style.

Remove pointers where possible

In MagmaDNN v1.0, nearly everything is passed by pointer. All of the ::magmadnn::math functions take Tensor<T>* pointers rather than plain tensors. This was necessary due to the lack of copy/move/assign semantics for the ::magmadnn::Tensor<T> class. Operators and other tensor containers also allocate, instantiate, and manage raw tensor pointers.

This is a C-style, unsafe way to program. Modern C++ favors references over pointers, and where resource management is necessary (::magmadnn::MemoryManager) smart pointers are preferable to raw pointers.

First we shall change Tensors and the MemoryManager class to use smart pointers. Tensors will use reference counting to avoid copying of data pointers.

Next, proper copy/move/assign semantics are required for the Tensor class. This should be simple if std::shared_ptr is used for the MemoryManager: the compiler-generated defaults should then suffice.
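
A minimal sketch of this design (illustrative, not the final implementation): when the only owning member is a std::shared_ptr, the compiler-generated copy/move/assign operations are already correct, and copies share the underlying buffer through reference counting.

#include <memory>
#include <vector>

class MemoryManager { /* owns the host/device buffer */ };

class Tensor {
public:
    explicit Tensor(const std::vector<unsigned int>& shape)
        : shape_(shape), mem_(std::make_shared<MemoryManager>()) {}

    /* rule of zero: the defaults are correct because ownership
       lives entirely in the shared_ptr */
    Tensor(const Tensor&) = default;
    Tensor(Tensor&&) = default;
    Tensor& operator=(const Tensor&) = default;
    Tensor& operator=(Tensor&&) = default;

private:
    std::vector<unsigned int> shape_;
    std::shared_ptr<MemoryManager> mem_;
};

int main() {
    Tensor x({2, 3});
    Tensor y = x; /* shares x's memory: no deep copy, no double free */
    return 0;
}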

Finally, the ::magmadnn::math library needs to be rewritten to take tensor references rather than pointers. This is a more idiomatic C++ interface, and it also removes the possibility of nullptr arguments.

An example function:

void add(const Tensor& a, const Tensor& b, Tensor& out) { ... }

Operations will still need pointers to their children and parents; this is inherent to the graph data structure. However, rather than ::magmadnn::op::add( ... ) returning a raw allocated pointer, it should now add a std::unique_ptr to a central Graph class and return the non-owning raw pointer. The raw pointer can then be used algorithmically rather than for resource management.
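
A sketch of this ownership scheme (Graph, add_node, and AddOp are hypothetical names, not the actual MagmaDNN API): the graph owns every operation through std::unique_ptr, and callers receive non-owning raw pointers for wiring up parents and children.

#include <memory>
#include <utility>
#include <vector>

class Operation {
public:
    virtual ~Operation() = default;
};

class AddOp : public Operation { /* illustrative concrete operation */ };

class Graph {
public:
    /* construct an operation in place; the Graph becomes its sole owner */
    template <typename Op, typename... Args>
    Op* add_node(Args&&... args) {
        auto node = std::make_unique<Op>(std::forward<Args>(args)...);
        Op* raw = node.get(); /* non-owning handle for the caller */
        nodes_.push_back(std::move(node));
        return raw;
    }

private:
    std::vector<std::unique_ptr<Operation>> nodes_; /* owns all nodes */
};

int main() {
    Graph g;
    AddOp* add = g.add_node<AddOp>(); /* use for wiring, never delete */
    (void)add;
    return 0;
}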

Const Correctness

A significant portion of MagmaDNN routines are not const correct. We will be enforcing const correctness on member functions and on ::magmadnn::math function parameters. This improves type safety, prevents bugs, and provides a cleaner, more complete interface.
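
For illustration (hypothetical signatures, not the finalized API), const on member functions and parameters lets the compiler enforce which tensors a routine may modify:

#include <cstddef>
#include <vector>

class Tensor {
public:
    explicit Tensor(std::size_t n) : data_(n, 0.0f) {}

    /* const member functions are callable through const Tensor& */
    float get(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    void set(std::size_t i, float v) { data_[i] = v; } /* mutating */

private:
    std::vector<float> data_;
};

/* inputs are read-only; only `out` may be written -- calling set()
   on `a` or `b` here would be a compile-time error */
void add(const Tensor& a, const Tensor& b, Tensor& out) {
    for (std::size_t i = 0; i < out.size(); ++i)
        out.set(i, a.get(i) + b.get(i));
}

int main() {
    Tensor a(4), b(4), out(4);
    add(a, b, out);
    return 0;
}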

Remove Type Templating

Currently, Operations and math functions are templated on element type. We will remove the type template argument from most of these. For example, ::magmadnn::Tensor<T> will become just ::magmadnn::Tensor, with the element type signified by an enum DataType that enumerates the supported types.
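
A sketch of what this could look like (the enumerator names are assumptions, not the finalized enum): the element type becomes a runtime data member instead of a template parameter, so math routines no longer need one instantiation per type.

enum class DataType { FLOAT32, FLOAT64, INT32 }; /* illustrative values */

class Tensor {
public:
    explicit Tensor(DataType dtype) : dtype_(dtype) {}
    DataType dtype() const { return dtype_; }

private:
    DataType dtype_;
    /* the underlying buffer is interpreted according to dtype_ */
};

/* a single non-template entry point, dispatching internally on dtype() */
void add(const Tensor& a, const Tensor& b, Tensor& out);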

Valgrind and Cuda-memcheck Testing

We will run various MagmaDNN routines under valgrind and cuda-memcheck to be certain of their memory behaviour and performance.

Dando18 added a commit that referenced this issue Jul 25, 2019
    Now only the device type is templated. (#14)
Dando18 added a commit that referenced this issue Jul 25, 2019
    This is part of the efforts described in #14.

Dando18 commented Aug 2, 2019

Just to comment on the memory strategy proposed above for MagmaDNN: it will break a lot of (pretty much all) existing code. This is unfortunate, but the memory update is necessary for growing the framework, and there are currently not many users.

Dando18 added a commit that referenced this issue Aug 9, 2019