Machine Learning: Best Practices for C++ Development

Introduction

The intent of this document is to present rules and guidelines detailing best practices for developers writing C++ code for Machine Learning at Elastic. It is not intended to be an exhaustive, definitive set of coding standards covering style, format etc. nor is it intended to replace careful study of "Best Practice" handbooks such as the "Effective C++" series of books by Scott Meyers.

The rules contained in this document have deliberately been made as generic as possible. They should be agnostic to development platform, compiler type and version etc. with the caveat that the compiler supports the C++11 standard as a minimum. If a rule applies to a C++ standard version higher than this it will be specified as e.g. From C++14.

The guiding principle is that consistency with existing code is paramount. In the interests of brevity, detailed justification for the rules has been omitted.

The use of the words SHOULD, MUST etc. complies with RFC 2119.

Naming Conventions

Code SHOULD be consistent with its surrounding context.
Names MUST be meaningful and concise
Variable names MUST reflect use not type
Local variables MUST be named as per variableName
Classes, structs and unions MUST be named as per CClassName, SStructName, UUnionName respectively
1. Implementation files MUST have a .cc extension
2. Header files MUST have a .h extension
Member variables MUST be named as per m_ClassMember, s_StructMember, u_UnionMember for class, structure and union member variables respectively
Static members MUST be named as per ms_ClassStatic, ss_StructStatic
Methods SHOULD be named as per methodName
Template classes MUST be named as per CClassName, <typename TYPE_NAME>
Enumerations MUST be named as per EEnumName, E_MemberName
Type aliases MUST be named as per TTypeName
Type aliases referring to a template SHOULD identify the template instantiation concisely (we have a consistent naming convention in such cases which SHOULD be followed) e.g.
```
using TDoubleVec = std::vector<double>;
using TSizeDoubleMap = std::map<std::size_t, double>;
```
Constants MUST be named as per CONSTANT_NAME
Macros MUST be named as per MACRO_NAME(...). However...
Macros SHOULD NOT be used unless unavoidable
Files MUST be named as per CClassName.cc, CClassName.h if they contain a single or principal class named CClassName
Files containing primarily global typedefs SHOULD be named as per <Identifier>Types.h where <Identifier> pertains to the file contents.
Files containing primarily constants SHOULD be named Constants.h
Files containing the function main() MUST be called Main[Xxx].cc, where Xxx is used only if necessary to distinguish multiple such files in the same directory
Non-boolean accessor functions MUST be named as clientId NOT getClientId
Boolean accessor functions MUST be named as isComplete NOT complete
Parameter names in function declaration and definition MUST be identical
Parameters in constructor initialiser lists SHOULD be as per classMember
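
As a rough illustration, the sketch below applies several of the naming rules above to one small class. CClientSession, EState, MAX_SAMPLES and all member names are hypothetical, invented for this example; they are not taken from the ML codebase.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Type alias: T prefix plus a concise description of the instantiation.
using TDoubleVec = std::vector<double>;

// Constant: CONSTANT_NAME.
const std::size_t MAX_SAMPLES{1000};

// Hypothetical class used only to illustrate the conventions.
class CClientSession {
public:
    // Enumeration: EEnumName with E_MemberName values.
    enum EState { E_Idle, E_Active, E_Complete };

public:
    // Constructor parameters follow the classMember form of the member names.
    CClientSession(std::string clientId, EState state)
        : m_ClientId(std::move(clientId)), m_State(state) {
        ++ms_SessionCount;
    }

    // Non-boolean accessor: clientId, not getClientId.
    const std::string& clientId() const { return m_ClientId; }

    // Boolean accessor: isComplete, not complete.
    bool isComplete() const { return m_State == E_Complete; }

    static std::size_t sessionCount() { return ms_SessionCount; }

private:
    // Member variables: m_ClassMember.
    std::string m_ClientId;
    EState m_State;

    // Static member: ms_ClassStatic.
    static std::size_t ms_SessionCount;
};

std::size_t CClientSession::ms_SessionCount{0};
```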

Project and Library Structure

Existing namespace usage MUST be followed: Production code in a subdirectory foo resides in namespace foo or foo_t (or a nested namespace)
Namespaces MUST be used for logical groupings of code
Namespaces MUST NOT span libraries
Namespaces SHOULD NOT be imported with using directives (like using namespace ml;), except in unit test implementation files. In particular, using namespace std; and using namespace boost; MUST NOT be used anywhere
Files SHOULD be kept short
Multiple classes defined in a single file SHOULD be avoided, except where they pertain to closely related functionality
Shared constants and typedefs MUST be in a 'Types' namespace of the form
```
<library namespace>_t
```
Libraries MUST NOT have circular dependencies
Platform specific code SHOULD be in a separate file of the form
```
<class name>_<platform name>.cc
```
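
The split between a library namespace and its 'Types' namespace might look roughly like the following. The library name foo is borrowed from the rule above; CHistogram, TDoubleVec and DEFAULT_BUCKET_COUNT are hypothetical names chosen for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Shared type aliases and constants for the library live in foo_t.
namespace foo_t {
using TDoubleVec = std::vector<double>;
const std::size_t DEFAULT_BUCKET_COUNT{10};
}

// Production code for the same library lives in namespace foo.
namespace foo {

// Hypothetical class that consumes the shared types.
class CHistogram {
public:
    CHistogram() : m_Counts(foo_t::DEFAULT_BUCKET_COUNT, 0.0) {}

    std::size_t bucketCount() const { return m_Counts.size(); }

private:
    foo_t::TDoubleVec m_Counts;
};
}
```

Names are qualified explicitly (foo_t::TDoubleVec) rather than imported with a using directive.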

File Structure

All source files MUST be formatted with the clang-format tool prior to check in.
1. This procedure can be simplified by use of the dev-tools/clang-format.sh script.
2. The same specific version of clang-format used by the clang-format.sh script MUST be used. It is recommended that this be obtained from the pre-built binary packages of LLVM available from http://releases.llvm.org/download.html
The standard header file layout MUST be observed

Header files MUST contain the following items in the order defined below
1. Copyright statement
2. Include guard of the form
```
#ifndef INCLUDED_[<namespace>_]<class name>_h
```
  Note that test files are in the global namespace and hence the part in square brackets should be omitted in this case.
3. Include files SHOULD be avoided in header files.
4. Include files SHOULD be in the recommended order (see below) if present
5. Forward declarations. These MUST be used wherever possible to reduce include file requirements
6. Class declarations. These MUST be in the recommended order (see below)
7. End of include guard. This SHOULD be followed by a comment indicating the name of the guard to which this pertains.
8. Judicious use of blank lines SHOULD be used to separate each of the above items.
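
A header following this layout could be sketched as below. The guard is written as the conventional #ifndef/#define pair; the namespace ml::example and the classes CSampleCounter and CSampleSource are hypothetical and exist only for this illustration.

```cpp
// Copyright statement goes here.

#ifndef INCLUDED_ml_example_CSampleCounter_h
#define INCLUDED_ml_example_CSampleCounter_h

#include <cstddef>

namespace ml {
namespace example {

// Forward declaration in place of including the (hypothetical) CSampleSource header.
class CSampleSource;

//! \brief
//! Counts samples (illustrative only).
class CSampleCounter {
public:
    CSampleCounter() : m_Count(0) {}

    //! Record one more sample.
    void increment() { ++m_Count; }

    //! \return The number of samples recorded so far.
    std::size_t count() const { return m_Count; }

    //! Declared only: a forward declaration is enough for a reference parameter.
    void addAllFrom(const CSampleSource& source);

private:
    std::size_t m_Count;
};
}
}

#endif // INCLUDED_ml_example_CSampleCounter_h
```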
The standard implementation file layout MUST be observed. Implementation files MUST contain the following items in the order defined below
1. Elastic commercial code file header
2. Class include file
3. Other include files, in the recommended order (see below)
4. Unnamed namespace local declarations (use of this is preferred to private declarations in the header file)
5. Beginning of namespace for this library/application
6. Constructor Implementation
7. Destructor Implementation
8. Copy/Move constructors (if present)
9. Class operators (if present)
10. Other method implementations
11. End of namespace for this library/application
12. Judicious use of blank lines SHOULD be used to separate each of the above items.
Standard ordering of #include statements SHOULD be followed
1. Own include file - for .cc files including their own .h
2. Other ML include files
3. 3rd party library include files (including Boost)
4. Standard C++ include files
5. Standard C include files. However C++ header wrappers SHOULD be included in preference to the equivalent C header, e.g. include cstdlib in preference to stdlib.h
Include files SHOULD be grouped by library/subdirectory, with a blank line between each grouping. clang-format is then able to sort in alphabetical order within each grouped section. It is best practice to list the ML include files in the build order of the libraries they relate to, as this helps to catch accidental circular dependencies.

Class Structure

Class headers SHOULD be broken into sections

Elements SHOULD be placed in sections according to a number of criteria:
1. Scope: public, protected, private or hidden (private + unimplemented functions)
2. Type: constants, typedefs, methods, variables, or nested classes
3. Static or non-static
Header sections MUST be ordered according to scope - public, protected, private
Each section in a class MUST be prefixed with its scope keyword (public/protected/private) even if this repeats the scope already in effect
Structs and Unions MUST NOT be used for encapsulation
1. Access specifiers MUST NOT appear
2. Constructors (if present) MUST be trivial
3. Other methods SHOULD NOT be used. This includes any explicit destructor.
Declarations SHOULD be given minimal scope
1. This is a general rule, that should guide the scoping of classes, data, enums and functions
2. It has a number of corollaries:
  1. Class variable data MUST NOT be public - this rule does not apply to static constants
  2. A class which is used by only one other class SHOULD be nested inside it
  3. A class which is instantiated in only one function MAY be nested inside the function definition. A lambda SHOULD be used in preference in this case.
  4. Typedefs, enums and static constants SHOULD be given minimal scope
Copy constructor and assignment operator SHOULD be hidden unless appropriate - new classes SHOULD use delete for this purpose
Functions SHOULD NOT use default arguments
Functions and variable data not belonging to a class SHOULD be defined in a detail or unnamed namespace within implementation files
Destructor of a polymorphic class MUST be virtual
Destructor of a non-polymorphic class MUST NOT be virtual
Interdependencies between static objects MUST be avoided
Headers MUST NOT define static variables of non-built-in types
Implementation details SHOULD reside in the .cc file
Template code SHOULD be either in the implementation file or inside the class declaration
Function parameters MUST be ordered as [in] [in,out] [out]
Classes SHOULD be forward declared rather than included
Logically const methods SHOULD be const
Headers SHOULD NOT define static variables of non-built-in types
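
Several of the class structure rules can be seen together in the following sketch: scope-ordered sections, a virtual destructor on a polymorphic class, copying hidden with delete, [in] before [out] parameters and const accessors. CDataSink and CCountingSink are hypothetical classes written for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical polymorphic base class.
class CDataSink {
public:
    CDataSink() = default;

    // Polymorphic class: the destructor is virtual.
    virtual ~CDataSink() = default;

    // Copying is hidden using delete.
    CDataSink(const CDataSink&) = delete;
    CDataSink& operator=(const CDataSink&) = delete;

public:
    // [in] parameter first, [out] parameter last.
    virtual bool write(const std::string& record, std::size_t& bytesWritten) = 0;
};

// Hypothetical implementation that only counts what it is given.
class CCountingSink : public CDataSink {
public:
    bool write(const std::string& record, std::size_t& bytesWritten) override {
        bytesWritten = record.size();
        m_TotalBytes += bytesWritten;
        return true;
    }

    // Logically const method is declared const.
    std::size_t totalBytes() const { return m_TotalBytes; }

private:
    // Class variable data is never public.
    std::size_t m_TotalBytes = 0;
};
```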

Language Fundamentals

nullptr MUST be used in preference to 0 or NULL
Unreachable or ineffectual code MUST NOT be included (this also applies to 'commented out' code)
Functions returning a value MUST return a default value at the end
Exceptions SHOULD NOT be thrown - use return codes to explicitly handle error conditions
Exceptions thrown from 3rd party code MUST be caught in the smallest enclosing scope - such exceptions MUST be converted to an appropriate error code
Multi-value error codes SHOULD be returned as an enumeration
Switch statements SHOULD switch over enums and MUST cover all the cases
1. When switching over a non-enum a default case SHOULD be used
2. When switching over an enum a default case SHOULD NOT be used
Error conditions SHOULD always be returned early
The declaration of all variables SHOULD be given the minimal possible scope
Return codes SHOULD preserve the abstraction and encapsulation of the class - functions should not return an error specific to the internal implementation
assert() MUST NOT be used
C-style casts MUST NOT be used
Macros SHOULD NOT be used
Objects and references SHOULD be used in preference to pointers
Smart pointers SHOULD be used in preference to raw pointers
Dynamically acquired resources SHOULD be released in the same scope
A class wrapper SHOULD be used to avoid dangling resources
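
The error handling rules above (return codes as an enumeration, early return on error, a switch over an enum with no default case, and a default return at the end) might be combined as in this minimal sketch. EParseStatus, parseCount and statusName are hypothetical names, and overflow is ignored to keep the example short.

```cpp
#include <cstddef>
#include <string>

// Multi-value error codes are returned as an enumeration.
enum EParseStatus { E_Ok, E_EmptyInput, E_NotANumber };

// Parse a non-negative decimal integer. Error conditions are returned
// early and count is only written on success.
EParseStatus parseCount(const std::string& input, std::size_t& count) {
    if (input.empty()) {
        return E_EmptyInput;
    }
    std::size_t value{0};
    for (char c : input) {
        if (c < '0' || c > '9') {
            return E_NotANumber;
        }
        // C++ style cast rather than a C-style cast.
        value = 10 * value + static_cast<std::size_t>(c - '0');
    }
    count = value;
    return E_Ok;
}

// Switching over an enum: every case is covered and there is no default,
// so a compiler can warn when a new enumerator is left unhandled.
std::string statusName(EParseStatus status) {
    switch (status) {
    case E_Ok:
        return "ok";
    case E_EmptyInput:
        return "empty input";
    case E_NotANumber:
        return "not a number";
    }
    // Functions returning a value return a default at the end.
    return "unknown";
}
```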

Resource ownership MUST be indicated by parameter type

| Argument passed by | Memory ownership |
| --- | --- |
| Value | N/A |
| Pointer | Called function |
| Const pointer (not pointer to const object) | Calling function |
| Reference | Calling function |
| Const reference | Calling function |
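
One possible reading of these ownership conventions is sketched below. CMessageStore and its methods are hypothetical; the sketch only shows how each parameter type signals who owns the object.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class illustrating ownership by parameter type.
class CMessageStore {
public:
    // Pointer parameter: the called function takes ownership, so the
    // pointer is wrapped in a smart pointer straight away.
    void adopt(std::string* message) {
        std::unique_ptr<std::string> owned{message};
        m_Messages.push_back(std::move(owned));
    }

    // Const reference parameter: the caller keeps ownership.
    bool contains(const std::string& text) const {
        for (const auto& message : m_Messages) {
            if (*message == text) {
                return true;
            }
        }
        return false;
    }

    // Const pointer (not pointer to const object): the caller keeps
    // ownership, but the object pointed to may be modified.
    void appendCountTo(std::string* const summary) const {
        *summary += std::to_string(m_Messages.size());
    }

private:
    using TStrUPtrVec = std::vector<std::unique_ptr<std::string>>;

private:
    TStrUPtrVec m_Messages;
};
```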

Explicit integer definitions, specifying size, SHOULD be used
Member functions MUST be scoped with this-> when called
Floating point variables SHOULD be double
Lambdas SHOULD be used in preference to any form of bind
Default lambda capture modes SHOULD be avoided
The override keyword SHOULD be used consistently within a source file
Type aliases MUST be used in preference to typedefs - use using to create a type alias not typedef
Rvalue references SHOULD only be used in the following cases
1. For implementing move semantics
2. For forwarding references in template code
Emplace operations SHOULD be used to add items to containers wherever applicable i.e. prefer emplace_back over push_back for vectors and emplace over insert for maps
Containers SHOULD have their capacity reserved in advance if applicable
Range based for loops SHOULD be preferred over their explicit counterparts
Uniform (braced) initialisers SHOULD be preferred in new code units
The auto keyword SHOULD be used liberally for variable assignments when the resulting code is less verbose
auto SHOULD NOT be used where the assigned type is not clear in the context e.g.
```
auto obj = doSomething(someVariable); // bad
```
Variables declared with auto SHOULD always be assigned with operator=
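
The following sketch gathers a few of the rules above: type aliases declared with using, reserved capacity, range based for loops, emplace operations, and a lambda with an explicit capture list. The helper functions squares, wordCounts and countAbove are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Type aliases are created with using, not typedef.
using TDoubleVec = std::vector<double>;
using TStrVec = std::vector<std::string>;
using TStrSizeMap = std::map<std::string, std::size_t>;

// Square every input value.
TDoubleVec squares(const TDoubleVec& values) {
    TDoubleVec result;
    // The final size is known, so capacity is reserved in advance.
    result.reserve(values.size());
    for (double value : values) {
        result.emplace_back(value * value);
    }
    return result;
}

// Count how often each word occurs.
TStrSizeMap wordCounts(const TStrVec& words) {
    TStrSizeMap counts;
    for (const auto& word : words) {
        // emplace returns an (iterator, inserted) pair; the entry is only
        // created if the key is absent.
        auto inserted = counts.emplace(word, 0);
        ++inserted.first->second;
    }
    return counts;
}

// Count the values strictly above a threshold.
std::size_t countAbove(const TDoubleVec& values, double threshold) {
    // Explicit capture list rather than a default capture mode.
    auto isAbove = [threshold](double value) { return value > threshold; };
    return static_cast<std::size_t>(
        std::count_if(values.begin(), values.end(), isAbove));
}
```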

Language Extensions

Language features MUST NOT be used until supported by all compilers used for a given ML build version
1. The current lowest common denominator compiler version is Visual Studio 2013
2. C++14 features may be used as of version 7.0
Where both Boost and the standard library contain equivalent implementations of the same feature the same provider SHOULD be used consistently across the codebase
make_shared & make_unique (From C++14) SHOULD be used to create the corresponding smart pointer types
Use of 3rd party libraries (including Boost) MUST be approved - this applies to new features used for the first time.
STL algorithms SHOULD be used wherever appropriate
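
A minimal sketch of make_unique and make_shared alongside STL algorithms, assuming a C++14 compiler. The functions makeRamp and sum and the alias names are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

using TDoubleVec = std::vector<double>;
using TDoubleVecUPtr = std::unique_ptr<TDoubleVec>;
using TDoubleVecSPtr = std::shared_ptr<TDoubleVec>;

// Build the sequence 0, 1, ..., size - 1 on the heap.
TDoubleVecUPtr makeRamp(std::size_t size) {
    // make_unique (from C++14) rather than a bare new expression.
    auto ramp = std::make_unique<TDoubleVec>(size);
    // std::iota replaces a hand-written counting loop.
    std::iota(ramp->begin(), ramp->end(), 0.0);
    return ramp;
}

// Sum the values using std::accumulate instead of an explicit loop.
double sum(const TDoubleVec& values) {
    return std::accumulate(values.begin(), values.end(), 0.0);
}
```

A shared pointer can be created in the same way, e.g. `std::make_shared<TDoubleVec>(3, 2.0)` for three elements equal to 2.0.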

Documentation

Source code MUST be readable. This is of primary importance
Doxygen MUST be used to comment all source files
Header files MUST be commented for Doxygen in standard format. The following MUST be included for all top-level classes:
1. Brief summary
2. Detailed description - what the class does.
3. Implementation decisions - what has been done and why.

In addition, the following MAY be used if required:
1. Future enhancements - what we should do to this class in the future.
2. Resource ownership - should be used where this class manages any resources (e.g. objects on the heap)
3. Example - usage example etc
Exclamation mark style MUST be used for Doxygen
Implementation files SHOULD be commented in C++ style (not C, and not Doxygen)
All class members SHOULD be documented
All public and protected methods MUST be documented with \param, \return
Parameters MUST be documented by name
Non-trivial resource ownership SHOULD be documented
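
A class documented along these lines might look like the sketch below. CRunningMean is a hypothetical class, and the exact section headings (DESCRIPTION, IMPLEMENTATION DECISIONS) are an assumption about the "standard format", which this document does not spell out.

```cpp
#include <cstddef>

//! \brief
//! Maintains the mean of the values added to it.
//!
//! DESCRIPTION:\n
//! Hypothetical class used to illustrate exclamation mark style
//! Doxygen comments.
//!
//! IMPLEMENTATION DECISIONS:\n
//! The mean is updated incrementally so that individual values need
//! not be stored.
class CRunningMean {
public:
    //! Add a value to the running mean.
    //!
    //! \param[in] value The value to add.
    void add(double value) {
        ++m_Count;
        m_Mean += (value - m_Mean) / static_cast<double>(m_Count);
    }

    //! Get the mean of all values added so far.
    //!
    //! \return The mean, or zero if no values have been added.
    double mean() const { return m_Mean; }

private:
    //! The number of values added.
    std::size_t m_Count = 0;

    //! The current mean.
    double m_Mean = 0.0;
};
```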

Testing

The CPPUNIT framework is used for testing the Elastic Machine Learning C++ code.

Every class SHOULD have a corresponding unit test suite
Test classes SHOULD belong to the parent or global namespace
Unit tests SHOULD exist for every public method in the corresponding class

General Comments

The number of lines in a method SHOULD be kept to a minimum
C++ functions SHOULD be used in preference to C
reinterpret_cast MUST only be used when interfacing with 3rd party code
dynamic_cast SHOULD be used judiciously
const_cast SHOULD be used judiciously
Classes SHOULD NOT inherit from more than one base class
All code SHOULD compile with no warnings
Overloaded function parameters SHOULD NOT be implicitly convertible
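
The rule on overloaded function parameters guards against pitfalls such as the one sketched here, where a string literal silently selects the bool overload. All four function names are hypothetical.

```cpp
#include <string>

// Pitfall: a string literal converts to bool through standard conversions
// (array to pointer, then pointer to bool), which overload resolution
// prefers to the user-defined conversion to std::string.
std::string overloaded(bool) {
    return "bool";
}
std::string overloaded(const std::string&) {
    return "string";
}

// Alternative: distinct names leave no room for a surprising conversion.
std::string fromFlag(bool) {
    return "bool";
}
std::string fromText(const std::string&) {
    return "string";
}
```

Calling `overloaded("text")` selects the bool overload, whereas `fromText("text")` can only do what its name says.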
