[BUG] Stack Smashing on stack allocated tensors #14

Open
Dando18 opened this issue Jul 11, 2019 · 3 comments

Dando18 commented Jul 11, 2019

Describe the bug
Passing stack-allocated tensor objects to some MagmaDNN functions causes stack smashing errors. The bug is intermittent: it does not occur on every run.

To Reproduce

model::NeuralNetwork<float> model( ... );

/* stack allocated */
Tensor<float> x( ... );
Tensor<float> y( ... );

/* this causes a stack smashing error in a non-negligible percentage of runs */
model.fit(&x, &y, ... );

Expected behavior
No stack smashing.

Environment:

  • OS: Ubuntu 18.10
  • CUDA Version: 10.1
  • CBLAS: OpenBLAS
  • Magma Version: 2.5.2
  • CuDNN Version: 7.6
  • MagmaDNN Version: 1.0

Additional context
This appears to be caused by magmadnn::Tensor<T>'s copy constructor and the copying of its MemoryManager.
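
For illustration, here is a minimal hypothetical sketch (not MagmaDNN's actual code) of the pitfall this points at: a type whose implicit copy constructor shallow-copies an owning pointer, so a pass-by-value copy and the original both free the same buffer.

#include <cstddef>

/* hypothetical type, for illustration only */
struct BadTensor {
    float* data;
    std::size_t size;

    explicit BadTensor(std::size_t n) : data(new float[n]), size(n) {}

    /* no user-defined copy constructor: the implicit one copies `data`
       as a raw pointer, so two objects end up owning the same buffer */

    ~BadTensor() { delete[] data; } /* second destruction corrupts memory */
};

void consume(BadTensor t) { /* pass-by-value makes a shallow copy */ }

int main() {
    BadTensor x(16);
    consume(x); /* the copy frees x's buffer when it goes out of scope */
    return 0;   /* x's destructor frees the same buffer again: UB */
}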

Dando18 added the bug label Jul 11, 2019
Dando18 added this to the MagmaDNN-v1.2 milestone Jul 11, 2019

Dando18 commented Jul 18, 2019

A new branch has been created to overhaul MagmaDNN's memory system.

New Branch: memory-fixes

The goal of this branch is to replace most pointer use in MagmaDNN with references or smart pointers. This aligns the codebase with modern C++ practice.

Dando18 added a commit that referenced this issue Jul 22, 2019
    this is in effort for #14 and milestone v1.2
Dando18 added a commit that referenced this issue Jul 22, 2019
Dando18 added a commit that referenced this issue Jul 23, 2019
    TODO -- a lot of the old macros need to be changed out to consider Tensor or Tensor*.
    Part of work towards #14.
Dando18 added a commit that referenced this issue Jul 23, 2019
    better support for the binary and unary ops in addition to GPU support
    part of #14.
Dando18 added a commit that referenced this issue Jul 23, 2019
    compile the math portion of the library as it comes up to speed with the rest of the framework
    in effort to fix issue #14.
Dando18 added a commit that referenced this issue Jul 23, 2019
    some math:: functions are refactored to take const-correct Tensor& references.
    This is part of the changes to address issue #14.
Dando18 added a commit that referenced this issue Jul 24, 2019
Dando18 added a commit that referenced this issue Jul 24, 2019
    … const correct.
    This is part of the memory fixes for #14.

Dando18 commented Jul 24, 2019

Here is an outline of how MagmaDNN's memory management will be refactored into a more modern C++ style.

Remove pointers where possible

In MagmaDNN v1.0, nearly everything is passed by pointer. All of the ::magmadnn::math functions take Tensor<T>* pointers rather than plain tensors. This was necessary due to the lack of copy/move/assign semantics for the ::magmadnn::Tensor<T> class. Operators and other tensor containers also allocate, instantiate, and manage raw tensor pointers.

This is a C-style, unsafe way to program. Modern C++ favors references over pointers, and where resource management is necessary (::magmadnn::MemoryManager) smart pointers are preferable to raw pointers.

First we shall change Tensors and the MemoryManager class to use smart pointers. Tensors will use reference counting to avoid copying of data pointers.

Next, proper copy/move/assign semantics are required for the Tensor class. This should be simple if std::shared_ptr is used for the MemoryManager: the compiler-generated defaults should then suffice.
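
A minimal sketch of this design (illustrative, not the final implementation): when the only owning member is a std::shared_ptr, the compiler-generated copy/move/assign operations are already correct, and copies share the underlying buffer through reference counting.

#include <memory>
#include <vector>

class MemoryManager { /* owns the host/device buffer */ };

class Tensor {
public:
    explicit Tensor(const std::vector<unsigned int>& shape)
        : shape_(shape), mem_(std::make_shared<MemoryManager>()) {}

    /* rule of zero: the defaults are correct because ownership
       lives entirely in the shared_ptr */
    Tensor(const Tensor&) = default;
    Tensor(Tensor&&) = default;
    Tensor& operator=(const Tensor&) = default;
    Tensor& operator=(Tensor&&) = default;

private:
    std::vector<unsigned int> shape_;
    std::shared_ptr<MemoryManager> mem_;
};

int main() {
    Tensor x({2, 3});
    Tensor y = x; /* shares x's memory: no deep copy, no double free */
    return 0;
}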

Finally, the ::magmadnn::math library needs to be rewritten to take tensor references rather than pointers. This is a more idiomatic C++ interface, and it also removes the possibility of nullptr arguments.

An example function:

void add(const Tensor& a, const Tensor& b, Tensor& out) { ... }

Operations will still need pointers to their children and parents; this is inherent to the graph data structure. However, rather than ::magmadnn::op::add( ... ) returning a raw allocated pointer, it should now add a std::unique_ptr to a central Graph class and return the non-owning raw pointer. The raw pointer can then be used algorithmically rather than for resource management.
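
A sketch of this ownership scheme (Graph, add_node, and AddOp are hypothetical names, not the actual MagmaDNN API): the graph owns every operation through std::unique_ptr, and callers receive non-owning raw pointers for wiring up parents and children.

#include <memory>
#include <utility>
#include <vector>

class Operation {
public:
    virtual ~Operation() = default;
};

class AddOp : public Operation { /* illustrative concrete operation */ };

class Graph {
public:
    /* construct an operation in place; the Graph becomes its sole owner */
    template <typename Op, typename... Args>
    Op* add_node(Args&&... args) {
        auto node = std::make_unique<Op>(std::forward<Args>(args)...);
        Op* raw = node.get(); /* non-owning handle for the caller */
        nodes_.push_back(std::move(node));
        return raw;
    }

private:
    std::vector<std::unique_ptr<Operation>> nodes_; /* owns all nodes */
};

int main() {
    Graph g;
    AddOp* add = g.add_node<AddOp>(); /* use for wiring, never delete */
    (void)add;
    return 0;
}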

Const Correctness

A significant portion of MagmaDNN routines are not const correct. We will be enforcing const correctness on member functions and on ::magmadnn::math function parameters. This improves type safety, prevents bugs, and provides a cleaner, more complete interface.
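
For illustration (hypothetical signatures, not the finalized API), const on member functions and parameters lets the compiler enforce which tensors a routine may modify:

#include <cstddef>
#include <vector>

class Tensor {
public:
    explicit Tensor(std::size_t n) : data_(n, 0.0f) {}

    /* const member functions are callable through const Tensor& */
    float get(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    void set(std::size_t i, float v) { data_[i] = v; } /* mutating */

private:
    std::vector<float> data_;
};

/* inputs are read-only; only `out` may be written -- calling set()
   on `a` or `b` here would be a compile-time error */
void add(const Tensor& a, const Tensor& b, Tensor& out) {
    for (std::size_t i = 0; i < out.size(); ++i)
        out.set(i, a.get(i) + b.get(i));
}

int main() {
    Tensor a(4), b(4), out(4);
    add(a, b, out);
    return 0;
}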

Remove Type Templating

Currently, Operations and math functions are templated on element type. We will remove the type template argument from most of these. For example, ::magmadnn::Tensor<T> will become just ::magmadnn::Tensor, with the element type signified by an enum DataType that enumerates the supported types.
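
A sketch of what this could look like (the enumerator names are assumptions, not the finalized enum): the element type becomes a runtime data member instead of a template parameter, so math routines no longer need one instantiation per type.

enum class DataType { FLOAT32, FLOAT64, INT32 }; /* illustrative values */

class Tensor {
public:
    explicit Tensor(DataType dtype) : dtype_(dtype) {}
    DataType dtype() const { return dtype_; }

private:
    DataType dtype_;
    /* the underlying buffer is interpreted according to dtype_ */
};

/* a single non-template entry point, dispatching internally on dtype() */
void add(const Tensor& a, const Tensor& b, Tensor& out);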

Valgrind and Cuda-memcheck Testing

We will run various MagmaDNN routines under valgrind and cuda-memcheck to be certain of their memory behaviour and performance.

Dando18 added a commit that referenced this issue Jul 25, 2019
    Now only the device type is templated. (#14)
Dando18 added a commit that referenced this issue Jul 25, 2019
    This is part of the efforts described in #14.

Dando18 commented Aug 2, 2019

Just to comment on the memory strategy proposed above for MagmaDNN: it will break a lot of (pretty much all) existing code. This is unfortunate, but the memory update is necessary for growing the framework, and there are currently not many users.

Dando18 added a commit that referenced this issue Aug 9, 2019