diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 000000000..72a2fb57e
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,42 @@
+# Description
+<!---
+Please include a summary of the changes and the related issue, if relevant. Please also include relevant motivation and context. List any new dependencies that are required for this change.
+-->
+
+<!--- Fixes known issue # (if relevant) -->
+
+
+## Type of change
+
+Please select all relevant options
+
+- [ ] New feature (non-breaking change which adds functionality)
+- [ ] Bug fix (non-breaking change which fixes an issue)
+- [ ] Formatting and/or style fixes
+- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
+- [ ] This change requires a documentation update
+
+# How Has This Been Tested?
+<!---
+Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
+-->
+
+- [ ] Test A : 
+- [ ] Test B : 
+
+**Test Configuration**:
+* OS version:
+* Hardware:
+* Compiler:
+
+# Checklist:
+
+- [ ] My code follows the style guidelines of this project
+- [ ] I have performed a self-review of my code
+- [ ] I have commented my code, particularly in hard-to-understand areas
+- [ ] I have made corresponding changes to the documentation
+- [ ] My changes generate no new warnings
+- [ ] The code builds from scratch with my new changes
+- [ ] I have added tests that prove my fix is effective or that my feature works
+- [ ] New and existing unit tests pass locally with my changes
+- [ ] Any dependent changes have been merged and published in downstream modules
diff --git a/README.md b/README.md
index 102f80a18..a5a598e3c 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ GPU hardware will be leveraged according to the distribution of Trilinos that **
 You are welcome to only compile one solver or the other, and the one(s) that you did compile will be available through the CLI.
 
 ## Building dependencies
-**Fierro** depends on both ELEMENTS and Trilinos. If you are building **Fierro** for hardware optimizations, you should also also build ELEMENTS from source. ELEMENTS is included in the **Fierro** source distribution and building ELEMENTS is enabled by default when building **Fierro**.
+**Fierro** depends on both ELEMENTS and Trilinos. If you are building **Fierro** for hardware optimizations, you should also build ELEMENTS from source. ELEMENTS is included in the **Fierro** source distribution and building ELEMENTS is enabled by default when building **Fierro**.
 
 As for Trilinos, we recommend installing the Anaconda package for the desired build into a new Anaconda environment to satisfy **Fierro**'s dependency rather than building it from source. If you do wish to build it from source, however, sample build scripts for Trilinos can be found in `Fierro/Trilinos-Build-Scripts`. Build scripts for all Anaconda packages can be found in `Fierro/Anaconda-Packages/`.
 
@@ -81,9 +81,54 @@ The ELEMENTS library and MATAR library can be updated to the newest release usin
 git submodule update --remote --merge
 ```
 
+# Contributing to Fierro
+As an open source software team, we greatly appreciate any and all community involvement. There are many ways to contribute to Fierro, from tidying up code sections to implementing novel solvers. 
+To streamline the integration of community contributions, we follow the following guidelines.
 
+## Writing commit messages
 
+Write your commit messages using these standard prefixes:
+- `BUG:` Fix for runtime crash or incorrect result
+- `COMP:` Compiler error or warning fix
+- `DOC:` Documentation change
+- `ENH:` New functionality
+- `PERF:` Performance improvement
+- `STYLE:` No logic impact (indentation, comments)
+- `WIP:` Work In Progress not ready for merge
 
+The commit message should assist in the review process and allow future developers to understand the change.  To that end, please follow these few rules:
 
+1. Try to keep the subject line below 72 characters, ideally 50.
+2. Be concise, but honor the change.
+3. If the change is uncharacteristically large (or diverse), consider breaking the change into multiple distinct commits.
+4. If the commit references a specific issue, link it in the message
 
+[This](https://cbea.ms/git-commit/) is a great post on how to write a good commit message. 
 
+## The PR Process
+
+If you are new to Fierro development and don't have push access to the repository, here are the steps:
+1. [Fork and clone](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) the repository.
+2. Create a branch for your work.
+3. [Push](https://docs.github.com/en/get-started/using-git/pushing-commits-to-a-remote-repository) the branch to your GitHub fork.
+4. Build and test your changes.
+5. Update any necessary documentation.
+6. Create a [Pull Request](https://github.com/lanl/Fierro/pulls).
+
+This corresponds to the **Fork & Pull Model** described in the [GitHub collaborative development](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/getting-started/about-collaborative-development-models) documentation.
+
+## Integrating a PR
+
+Integrating your contributions to Fierro is relatively straightforward; here is the checklist:
+- All test pass
+- The changes build with no new compiler warnings/errors
+- All feedback has been addressed
+- Consensus is reached. This usually means that at least two reviewers approved the changes (or added a `LGTM` comment) and at least one business day passed without anyone objecting. `LGTM` is an acronym for Looks Good to Me.
+- If you do NOT have push access, a Fierro core developer will integrate your PR. 
+
+## Benevolent dictators for life
+
+The [benevolent dictators](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life) can integrate changes to keep the platform healthy and help interpret or address conflict related to the contribution guidelines and the platform as a whole.
+
+These currently include:
+- [Nathaniel Morgan](https://github.com/nathanielmorgan)
diff --git a/docs/formatting/README.md b/docs/formatting/README.md
new file mode 100644
index 000000000..8c7e3a307
--- /dev/null
+++ b/docs/formatting/README.md
@@ -0,0 +1,7 @@
+# Formatting
+
+Fierro uses [Uncrustify](https://github.com/uncrustify/uncrustify/tree/master) to standardize the syle and clean up code. Currently, code formatting is not
+strictly enforced, but that is the current direction. The Uncrustify [GitHub](https://github.com/uncrustify/uncrustify/tree/master) page has extensive
+documentation for building and running Uncrustify. The supplied uncrustify.cfg file should be used when formatting with Uncrustify.
+
+A future goal is to automate this process during a push.
diff --git a/docs/formatting/uncrustify.cfg b/docs/formatting/uncrustify.cfg
new file mode 100644
index 000000000..043d49f3f
--- /dev/null
+++ b/docs/formatting/uncrustify.cfg
@@ -0,0 +1,447 @@
+# Control what to do with the UTF-8 BOM (recommend 'remove')
+utf8_bom                        = remove   # ignore/add/remove/force
+
+# The number of columns to indent per level.
+# Usually 2, 3, 4, or 8. Default=8
+indent_columns                  = 4        # number
+
+# The continuation indent. If non-zero, this overrides the indent of '(' and '=' continuation indents.
+# For FreeBSD, this is set to 4. Negative value is absolute and not increased for each ( level
+indent_continue                 = 0        # number
+
+# How to use tabs when indenting code
+# 0=spaces only
+# 1=indent with tabs to brace level, align with spaces (default)
+# 2=indent and align with tabs, using spaces when not on a tabstop
+indent_with_tabs                = 0        # number
+
+# Indent the code after an access specifier by one level.
+# If set, this option forces 'indent_access_spec=0'
+indent_access_spec_body         = true     # false/true
+
+# True:  indent continued function call parameters one indent level
+# False: align parameters under the open paren
+indent_func_call_param          = true     # false/true
+
+# Same as indent_func_call_param, but for function defs
+indent_func_def_param           = false    # false/true
+
+# Same as indent_func_call_param, but for function protos
+indent_func_proto_param         = false    # false/true
+
+# Same as indent_func_call_param, but for class declarations
+indent_func_class_param         = false    # false/true
+
+# Same as indent_func_call_param, but for class variable constructors
+indent_func_ctor_var_param      = false    # false/true
+
+# Same as indent_func_call_param, but for templates
+indent_template_param           = false    # false/true
+
+# Double the indent for indent_func_xxx_param options
+indent_func_param_double        = false    # false/true
+
+# True:  indent_func_call_param will be used (default)
+# False: indent_func_call_param will NOT be used
+use_indent_func_call_param      = false    # false/true
+
+# Add or remove newline at the end of the file
+nl_end_of_file                  = add      # ignore/add/remove/force
+
+# The maximum consecutive newlines (3 = 2 blank lines)
+nl_max                          = 2        # number
+
+# Whether to remove blank lines after '{'
+eat_blanks_after_open_brace     = true     # false/true
+
+# Whether to remove blank lines before '}'
+eat_blanks_before_close_brace   = true     # false/true
+
+# Add or remove braces on single-line 'for' statement
+mod_full_brace_for              = force      # ignore/add/remove/force
+
+# Add or remove braces on single-line 'if' statement. Will not remove the braces if they contain an 'else'.
+mod_full_brace_if               = force      # ignore/add/remove/force
+
+# Add or remove braces on single-line 'while' statement
+mod_full_brace_while            = force      # ignore/add/remove/force
+
+# Whether to remove superfluous semicolons
+mod_remove_extra_semicolon      = false    # false/true
+
+# Whether to sort consecutive single-line '#include' statements (C/C++) and
+# '#import' statements (Objective-C). Be aware that this has the potential to
+# break your code if your includes/imports have ordering dependencies.
+# mod_sort_include                = true    # true/false
+
+# Indeed, the mode above actually break the code
+# Need to work on the code
+
+
+# Add or remove newline between 'enum' and '{'
+nl_enum_brace                   = force      # ignore/add/remove/force
+
+# Newline between namespace and {.
+nl_namespace_brace              = force   # ignore/add/remove/force
+
+# Add or remove newline between 'class' and '{'.
+nl_class_brace                  = force   # ignore/add/remove/force
+
+# Add or remove newline between 'struct and '{'
+nl_struct_brace                 = force      # ignore/add/remove/force
+
+# Add or remove newline between 'union' and '{'
+nl_union_brace                  = force      # ignore/add/remove/force
+
+# Add or remove newline between 'if' and '{'
+nl_if_brace                     = force      # ignore/add/remove/force
+
+# Add or remove newline between '}' and 'else'
+nl_brace_else                   = force      # ignore/add/remove/force
+
+# Add or remove newline between 'else' and '{'
+nl_else_brace                   = force      # ignore/add/remove/force
+
+# Add or remove newline between 'else' and 'if'
+nl_else_if                      = remove   # ignore/add/remove/force
+
+# Add or remove newline between '}' and 'finally'
+nl_brace_finally                = force      # ignore/add/remove/force
+
+# Add or remove newline between 'finally' and '{'
+nl_finally_brace                = force      # ignore/add/remove/force
+
+# Add or remove newline between 'try' and '{'
+nl_try_brace                    = force      # ignore/add/remove/force
+
+# Add or remove newline between 'for' and '{'
+nl_for_brace                    = force      # ignore/add/remove/force
+
+# Add or remove newline between 'catch' and '{'
+nl_catch_brace                  = force      # ignore/add/remove/force
+
+# Add or remove newline between '}' and 'catch'
+nl_brace_catch                  = force      # ignore/add/remove/force
+
+# Add or remove newline between 'do' and '{'
+nl_do_brace                     = force      # ignore/add/remove/force
+
+# Add or remove newline between '}' and 'while' of 'do' statement
+nl_brace_while                  = force      # ignore/add/remove/force
+
+# Add or remove newline between 'switch' and '{'
+nl_switch_brace                 = force      # ignore/add/remove/force
+
+# Add or remove newline between function signature and '{'
+nl_fdef_brace                   = force      # ignore/add/remove/force
+
+# Don't split one-line braced statements inside a class xx { } body
+nl_class_leave_one_liners       = true     # false/true
+
+# Don't split one-line C++11 lambdas - '[]() { return 0; }'
+nl_cpp_lambda_leave_one_liners  = true     # false/true
+
+# The number of newlines after '}' of a multi-line function body.
+nl_after_func_body              = 2        # unsigned number
+
+# Add or remove newline between return type and function name in a function
+# definition.
+nl_func_type_name               = remove   # ignore/add/remove/force
+
+# Add or remove newline between return type and function name in a prototype.
+nl_func_proto_type_name         = remove   # ignore/add/remove/force
+
+# Add or remove newline between return type and function name inside a class
+# definition. If set to ignore, nl_func_type_name or nl_func_proto_type_name
+# is used instead.
+nl_func_type_name_class         = remove   # ignore/add/remove/force
+
+# Add or remove space around compare operator '<', '>', '==', etc
+sp_compare                      = force      # ignore/add/remove/force
+
+# Add or remove space around non-assignment symbolic operators ('+', '/', '%',
+# '<<', and so forth).
+sp_arith                        = force   # ignore/add/remove/force
+
+# Add or remove space around boolean operators '&&' and '||'.
+sp_bool                         = force   # ignore/add/remove/force
+
+# Add or remove space after ',', i.e. 'a,b' vs. 'a, b'.
+sp_after_comma                  = force   # ignore/add/remove/force
+
+# Add or remove space inside a non-empty '[' and ']'.
+sp_inside_square                = remove   # ignore/add/remove/force
+
+# Add or remove space before '(' of control statements ('if', 'for', 'switch',
+# 'while', etc.).
+sp_before_sparen                = force   # ignore/add/remove/force
+
+# Add or remove space inside '(' and ')' of control statements.
+sp_inside_sparen                = remove   # ignore/add/remove/force
+
+# Add or remove space after '(' of control statements.
+#
+# Overrides sp_inside_sparen.
+sp_inside_sparen_open           = remove   # ignore/add/remove/force
+
+# Add or remove space before ')' of control statements.
+#
+# Overrides sp_inside_sparen.
+sp_inside_sparen_close          = remove   # ignore/add/remove/force
+
+# Add or remove space after ')' of control statements.
+sp_after_sparen                 = remove   # ignore/add/remove/force
+
+# Add or remove space between ')' and '{' of of control statements.
+sp_sparen_brace                 = force   # ignore/add/remove/force
+
+# Add or remove space before a trailing comment.
+sp_before_tr_cmt                = add   # ignore/add/remove/force/not_defined
+
+# If true, cpp lambda body will be indentedDefault=False.
+indent_cpp_lambda_body          = true    # false/true
+
+# Add or remove space around assignment operator '=', '+=', etc
+sp_assign                       = add      # ignore/add/remove/force
+
+# Add or remove space between '>' and '>' in '>>' (template stuff C++/C# only). Default=Add
+sp_angle_shift                  = remove   # ignore/add/remove/force
+
+# Permit removal of the space between '>>' in 'foo<bar<int> >' (C++11 only). Default=False
+# sp_angle_shift cannot remove the space without this option.
+sp_permit_cpp11_shift           = true     # false/true
+
+# Add or remove space inside '{' and '}'.
+sp_inside_braces                = force   # ignore/add/remove/force
+
+# Add or remove space inside enum '{' and '}'.
+sp_inside_braces_enum           = force   # ignore/add/remove/force
+
+# Add or remove space inside struct/union '{' and '}'.
+sp_inside_braces_struct         = force   # ignore/add/remove/force
+
+# Add or remove space inside '(' and ')'.
+sp_inside_paren                 = remove   # ignore/add/remove/force
+
+# Add or remove space inside empty function '()'.
+sp_inside_fparens               = remove   # ignore/add/remove/force
+
+# Add or remove space inside function '(' and ')'.
+sp_inside_fparen                = remove   # ignore/add/remove/force
+
+# Add or remove space between function name and '(' on function definition.
+sp_func_def_paren               = remove   # ignore/add/remove/force
+
+# Add or remove space between function name and '()' on function definition
+# without parameters.
+sp_func_def_paren_empty         = remove   # ignore/add/remove/force
+
+# Add or remove space between function name and '(' on function calls.
+sp_func_call_paren              = remove   # ignore/add/remove/force
+
+# Add or remove space between function name and '()' on function calls without
+# parameters. If set to 'ignore' (the default), sp_func_call_paren is used.
+sp_func_call_paren_empty        = remove   # ignore/add/remove/force
+
+# Add or remove space between function name and '(' on function declaration.
+sp_func_proto_paren             = remove   # ignore/add/remove/force
+
+# Add or remove space between function name and '()' on function declaration
+# without parameters.
+sp_func_proto_paren_empty       = remove   # ignore/add/remove/force
+
+# Control the space after the opening of a C++ comment '// A' vs '//A'.
+sp_cmt_cpp_start                = add   # ignore/add/remove/force
+
+# Add or remove space between ')' and '{'.
+sp_paren_brace                  = force   # ignore/add/remove/force
+
+# Add or remove space between ')' and '{' of function.
+sp_fparen_brace                 = force   # ignore/add/remove/force
+
+# Add or remove space between ')' and a qualifier such as 'const'.
+# Note: this option is currently not avaialbe in uncrustify 0.64
+# sp_paren_qualifier              = force   # ignore/add/remove/force
+
+# Add or remove space between ')' and 'noexcept'.
+# Note: this option is currently not avaialbe in uncrustify 0.64
+#sp_paren_noexcept               = force   # ignore/add/remove/force
+
+# Add or remove space before pointer star '*'.
+sp_before_ptr_star              = remove   # ignore/add/remove/force
+
+# Add or remove space before pointer star '*' that isn't followed by a
+# variable name. If set to 'ignore', sp_before_ptr_star is used instead.
+sp_before_unnamed_ptr_star      = remove   # ignore/add/remove/force
+
+# Add or remove space after pointer star '*', if followed by a word.
+sp_after_ptr_star               = force   # ignore/add/remove/force
+
+# Add or remove space after a pointer star '*', if followed by a function
+# prototype or function definition.
+sp_after_ptr_star_func          = force   # ignore/add/remove/force
+
+# Add or remove space after a pointer star '*', if followed by an open
+# parenthesis, as in 'void* (*)().
+sp_ptr_star_paren               = force   # ignore/add/remove/force
+
+# Add or remove space before a reference sign '&'.
+sp_before_byref                 = remove   # ignore/add/remove/force
+
+# Add or remove space before a reference sign '&' that isn't followed by a
+# variable name. If set to 'ignore', sp_before_byref is used instead.
+sp_before_unnamed_byref         = remove   # ignore/add/remove/force
+
+# Add or remove space after reference sign '&', if followed by a word.
+sp_after_byref                  = force   # ignore/add/remove/force
+
+# Add or remove space after a reference sign '&', if followed by a function
+# prototype or function definition.
+sp_after_byref_func             = force   # ignore/add/remove/force
+
+# Add or remove space before a reference sign '&', if followed by a function
+# prototype or function definition.
+sp_before_byref_func            = remove   # ignore/add/remove/force
+
+# Add or remove space between 'template' and '<'.
+# If set to ignore, sp_before_angle is used.
+sp_template_angle               = remove   # ignore/add/remove/force
+
+# Add or remove space before '<'.
+sp_before_angle                 = remove   # ignore/add/remove/force
+
+# Add or remove space inside '<' and '>'.
+sp_inside_angle                 = remove   # ignore/add/remove/force
+
+# Add or remove space inside '<>'.
+# Note: this option is currently not avaialbe in uncrustify 0.64
+#sp_inside_angle_empty           = remove   # ignore/add/remove/force
+
+# Add or remove space between '>' and ':'.
+# Note: this option is currently not avaialbe in uncrustify 0.64
+#sp_angle_colon                  = remove   # ignore/add/remove/force
+
+# Add or remove space after '<>'.
+sp_after_angle                  = remove   # ignore/add/remove/force
+
+# Add or remove space between '>' and '(' as found in 'new List<byte>(foo);'.
+sp_angle_paren                  = remove   # ignore/add/remove/force
+
+# Add or remove space between '>' and '()' as found in 'new List<byte>();'.
+sp_angle_paren_empty            = remove   # ignore/add/remove/force
+
+# Add or remove space between '>' and a word as in 'List<byte> m;' or
+# 'template <typename T> static ...'.
+sp_angle_word                   = force   # ignore/add/remove/force
+
+# Add or remove space before '[' (except '[]').
+sp_before_square                = remove   # ignore/add/remove/force
+
+# Add or remove space before '[]'.
+sp_before_squares               = remove   # ignore/add/remove/force
+
+# Add or remove space inside a non-empty '[' and ']'.
+sp_inside_square                = remove   # ignore/add/remove/force
+
+# The span for aligning variable definitions.
+#
+# 0 = Don't align (default).
+align_var_def_span              = 1        # unsigned number
+
+# How to align the '&' in variable definitions.
+#  0=Part of the type
+#  1=Part of the variable
+#  2=Dangling
+align_var_def_amp_style         = 0        # number
+
+# The span for aligning comments that end lines.
+#
+# 0 = Don't align (default).
+align_right_cmt_span            = 0        # unsigned number
+
+# The span for aligning class (0=don't align)
+align_var_class_span            = 1        # number
+
+# The threshold for aligning class member definitions.
+# Use a negative number for absolute thresholds.
+#
+# 0 = No limit (default).
+align_var_class_thresh          = 6        # number
+
+# Whether to align variable definitions in prototypes and functions.
+align_func_params               = true    # true/false
+
+# The span for aligning on '=' in assignments.
+#
+# 0 = Don't align (default).
+align_assign_span               = 1        # unsigned number
+
+# The threshold for aligning on '=' in assignments.
+# Use a negative number for absolute thresholds.
+#
+# 0 = No limit (default).
+align_assign_thresh             = 6        # number
+
+# Whether to align lines that start with '<<' with previous '<<'.
+#
+# Default: true
+align_left_shift                = true     # true/false
+
+# Try to limit code width to N number of columns
+code_width                      = 200      # number
+
+#
+# Positioning options
+#
+
+# The position of arithmetic operators in wrapped expressions.
+pos_arith                       = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of assignment in wrapped expressions. Do not affect '='
+# followed by '{'.
+pos_assign                      = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of Boolean operators in wrapped expressions.
+pos_bool                        = lead     # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of comparison operators in wrapped expressions.
+pos_compare                     = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of conditional operators, as in the '?' and ':' of
+# 'expr ? stmt : stmt', in wrapped expressions.
+pos_conditional                 = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of the comma in wrapped expressions.
+pos_comma                       = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of the comma in enum entries.
+pos_enum_comma                  = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of the comma in the base class list if there is more than one
+# line. Affects nl_class_init_args.
+pos_class_comma                 = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of the comma in the constructor initialization list.
+# Related to nl_constr_colon, nl_constr_init_args and pos_constr_colon.
+pos_constr_comma                = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of trailing/leading class colon, between class and base class
+# list. Affects nl_class_colon.
+pos_class_colon                 = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# The position of colons between constructor and member initialization.
+# Related to nl_constr_colon, nl_constr_init_args and pos_constr_comma.
+pos_constr_colon                = ignore   # ignore/break/force/lead/trail/join/lead_break/lead_force/trail_break/trail_force
+
+# If false, disable all multi-line comment changes, including cmt_width. keyword substitution and leading chars.
+# Default=True.
+# Cound interfere with license header
+cmt_indent_multi                = false    # false/true
+
+# The filename that contains text to insert at the head of a file if the file doesn't start with a C/C++ comment.
+# Will substitute $(filename) with the current file's name.
+# cmt_insert_file_header          = "../HeaderForSources.txt"
+
+# If a namespace body exceeds the specified number of newlines and doesn't have a comment after
+# the close brace, a comment will be added.
+mod_add_long_namespace_closebrace_comment = 1        # number
diff --git a/scripts/build-fierro.sh b/scripts/build-fierro.sh
index eecc3aa85..6e76c4941 100755
--- a/scripts/build-fierro.sh
+++ b/scripts/build-fierro.sh
@@ -2,7 +2,7 @@
 show_help() {
     echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
     echo "Valid options:"
-    echo "  --solver=<all|explicit|explicit-evpfft|explicit-evp|implicit>. Default is 'explicit'"
+    echo "  --solver=<all|explicit|explicit-evpfft|explicit-ls-evpfft|explicit-evp|implicit>. Default is 'explicit'"
     echo "  --kokkos_build_type=<serial|openmp|pthreads|cuda|hip>. Default is 'serial'"
     echo "  --build_action=<full-app|set-env|install-trilinos|install-hdf5|install-heffte|fierro>. Default is 'full-app'"
     echo "  --machine=<darwin|chicoma|linux|mac>. Default is 'linux'"
@@ -26,6 +26,7 @@ show_help() {
     echo "          all                         builds both the explicit (non EVPFFT) and implicit solvers. Generally for debugging purposes"
     echo "          explicit                    builds the explicit solver"
     echo "          explicit-evpfft             builds the explicit solver with the EVPFFT material model"
+    echo "          explicit-ls-evpfft          builds the explicit solver with the LS-EVPFFT material model"
     echo "          explicit-evp                builds the explicit solver with the EVP material model"    
     echo "          implicit                    builds the explicit solver"
     echo " "
@@ -63,7 +64,7 @@ build_cores="1"
 
 # Define arrays of valid options
 valid_build_action=("full-app" "set-env" "install-trilinos" "install-hdf5" "install-heffte" "fierro")
-valid_solver=("all" "explicit" "explicit-evpfft" "explicit-evp" "implicit")
+valid_solver=("all" "explicit" "explicit-evpfft" "explicit-ls-evpfft" "explicit-evp" "implicit")
 valid_kokkos_build_types=("serial" "openmp" "pthreads" "cuda" "hip")
 valid_heffte_build_types=("fftw" "cufft" "rocfft")
 valid_machines=("darwin" "chicoma" "linux" "mac")
@@ -167,7 +168,7 @@ echo "Building based on these argument options:"
 echo "Build action - ${build_action}"
 echo "Solver - ${solver}"
 echo "Kokkos backend - ${kokkos_build_type}"
-if [ "${solver}" = "explicit-evpfft" ]; then
+if [ "${solver}" = "explicit-evpfft" ] || [ "${solver}" = "explicit-ls-evpfft" ]; then
     echo "HEFFTE - ${heffte_build_type}"
 fi
 echo "make -j ${build_cores}"
@@ -180,7 +181,7 @@ source setup-env.sh ${machine} ${kokkos_build_type} ${build_cores}
 # Next, do action based on args
 if [ "$build_action" = "full-app" ]; then
     source trilinos-install.sh ${kokkos_build_type}
-    if [ "$solver" = "explicit-evpfft" ]; then
+    if [ "$solver" = "explicit-evpfft" ] || [ "${solver}" = "explicit-ls-evpfft" ]; then
         source hdf5-install.sh
         source heffte-install.sh ${heffte_build_type} ${machine}
     fi
diff --git a/scripts/cmake_build.sh b/scripts/cmake_build.sh
index 6a2a33818..04d9c2a26 100644
--- a/scripts/cmake_build.sh
+++ b/scripts/cmake_build.sh
@@ -56,12 +56,23 @@ elif [ "$solver" = "implicit" ]; then
         -D BUILD_PARALLEL_EXPLICIT_SOLVER=OFF
         -D BUILD_IMPLICIT_SOLVER=ON
     )
-elif [ "$solver" = "explicit-evpfft" ]; then
+elif [ "$solver" = "explicit-evpfft" ] || [ "$solver" = "explicit-ls-evpfft" ]; then
+    if [ "$solver" = "explicit-evpfft" ]; then
+        cmake_options+=(
+            -D BUILD_EVPFFT_FIERRO=ON
+        )
+    elif [ "$solver" = "explicit-ls-evpfft" ]; then
+        cmake_options+=(
+            -D BUILD_LS_EVPFFT_FIERRO=ON
+        )
+    fi
+
+    # below options work for both explicit-evpfft and explicit-ls-evpfft
     cmake_options+=(
         -D BUILD_PARALLEL_EXPLICIT_SOLVER=ON
         -D BUILD_IMPLICIT_SOLVER=OFF
-        -D BUILD_EVPFFT_FIERRO=ON
         -D CMAKE_PREFIX_PATH="$HEFFTE_INSTALL_DIR;$HDF5_INSTALL_DIR;$ELEMENTS_INSTALL_DIR"
+        -D ABSOLUTE_NO_OUTPUT=ON
     )
     if [ "$heffte_build_type" = "cufft" ]; then
         cmake_options+=(
diff --git a/scripts/machines/chicoma-env.sh b/scripts/machines/chicoma-env.sh
new file mode 100644
index 000000000..f4f6a9a40
--- /dev/null
+++ b/scripts/machines/chicoma-env.sh
@@ -0,0 +1,26 @@
+#!/bin/bash -e
+### Load environment modules here
+### Assign names as relevant
+
+# DO NOT DO module purge ON CHICOMA
+# INSTEAD USE module swap
+
+# On chicoma here are what I have found that works for cpu build of either Fierro or EVPFFT
+# You would need to choose which are relevant for your build
+    # module swap PrgEnv-cray PrgEnv-gnu
+    # module load cray-fftw
+    # module load cray-mpich
+    # module load cray-hdf5-parallel
+    # module load cmake
+  # Note: use latest cmake according to code requirment. As of Feb. 13th, 2024, cmake/3.25.1
+
+# For GPU build of either Fierro or EVPFFT
+    # module swap PrgEnv-cray PrgEnv-gnu
+    # module load cray-fftw
+    # module load cray-mpich
+    # module load cray-hdf5-parallel
+    # module load cmake
+    # module load cuda
+  # Note: make sure the gcc and cuda are compatible. As of Feb. 13th, 2024, these work
+  # gcc/10.3.0, cuda/11.6
+
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/boundary.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/boundary.cpp
old mode 100755
new mode 100644
index 5b6d2c373..464198cc8
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/boundary.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/boundary.cpp
@@ -1,82 +1,81 @@
 // -------------------------------------------------------
 // This function applys the boundary condition
 // to points on a list of patches created at setup
-//--------------------------------------------------------
+// --------------------------------------------------------
 
 #include "mesh.h"
 #include "state.h"
 
-
-void boundary_velocity(const mesh_t &mesh,
-                       const CArrayKokkos <boundary_t> &boundary,
-                       DViewCArrayKokkos <double> &node_vel,
-                       const double time_value){
-
-    
+/**
+ * \brief Apply velocity boundary conditions
+ *
+ * This function loops over all boundary nodes and applies boundary conditions
+ * depending on the definition of boundary set.
+ *
+ * \param mesh The mesh that boundary conditions are applied to
+ * \param boundary The type of boundary condition being applied
+ * \param node_vel A view into the nodal velocity
+ * \param time_value Current simulation time, uesd to apply transient boundary conditions
+ */
+void boundary_velocity(const mesh_t&                   mesh,
+                       const CArrayKokkos<boundary_t>& boundary,
+                       DViewCArrayKokkos<double>&      node_vel,
+                       const double                    time_value)
+{
     // Loop over boundary sets
-    for (size_t bdy_set=0; bdy_set<mesh.num_bdy_sets; bdy_set++){
-        
+    for (size_t bdy_set = 0; bdy_set < mesh.num_bdy_sets; bdy_set++)
+    {
         // Loop over boundary nodes in a boundary set
         FOR_ALL(bdy_node_lid, 0, mesh.num_bdy_nodes_in_set.host(bdy_set), {
-                
             // reflected (boundary array is on the device)
-            if (boundary(bdy_set).hydro_bc == bdy::reflected){
-            
+            if (boundary(bdy_set).hydro_bc == bdy::reflected)
+            {
                 // directions with hydro_bc:
                 // x_plane  = 0,
                 // y_plane  = 1,
                 // z_plane  = 2,
                 size_t direction = boundary(bdy_set).surface;
-                
+
                 size_t bdy_node_gid = mesh.bdy_nodes_in_set(bdy_set, bdy_node_lid);
-                    
+
                 // Set velocity to zero in that directdion
                 node_vel(1, bdy_node_gid, direction) = 0.0;
-                        
             }
-            else if (boundary(bdy_set).hydro_bc == bdy::fixed){
-                
-                
+            else if (boundary(bdy_set).hydro_bc == bdy::fixed)
+            {
                 size_t bdy_node_gid = mesh.bdy_nodes_in_set(bdy_set, bdy_node_lid);
-                
-                for(size_t dim=0; dim<mesh.num_dims; dim++){
+
+                for (size_t dim = 0; dim < mesh.num_dims; dim++)
+                {
                     // Set velocity to zero
                     node_vel(1, bdy_node_gid, dim) = 0.0;
                 }
-                
-            }// end if
-            else if (boundary(bdy_set).hydro_bc == bdy::velocity){
-                
-                
+            } // end if
+            else if (boundary(bdy_set).hydro_bc == bdy::velocity)
+            {
                 size_t bdy_node_gid = mesh.bdy_nodes_in_set(bdy_set, bdy_node_lid);
-                
-                
+
                 // directions with hydro_bc:
                 // x_plane  = 0,
                 // y_plane  = 1,
                 // z_plane  = 2,
                 size_t direction = boundary(bdy_set).surface;
-                
+
                 // Set velocity to that directdion to specified value
                 // if t_end > time > t_start
                 // v(t) = v0 exp(-v1*(time - time_start) )
-                if (time_value >= boundary(bdy_set).hydro_bc_vel_t_start &&
-                    time_value <= boundary(bdy_set).hydro_bc_vel_t_end){
-                    
+                if (time_value >= boundary(bdy_set).hydro_bc_vel_t_start
+                    && time_value <= boundary(bdy_set).hydro_bc_vel_t_end)
+                {
                     double time_delta = time_value - boundary(bdy_set).hydro_bc_vel_t_start;
-                    
+
                     node_vel(1, bdy_node_gid, direction) =
                         boundary(bdy_set).hydro_bc_vel_0 *
-                        exp(-boundary(bdy_set).hydro_bc_vel_1 * time_delta );
-                    
+                        exp(-boundary(bdy_set).hydro_bc_vel_1 * time_delta);
                 } // end if on time
-                
-            }// end if
-            
-                
+            } // end if
         }); // end for bdy_node_lid
-	    
     } // end for bdy_set
-    
+
     return;
 } // end boundary_velocity function
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/driver.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/driver.cpp
index 4e6a1faf6..a290c3a52 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/driver.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/driver.cpp
@@ -1,25 +1,19 @@
 // -----------------------------------------------------------------------------
 // This is the main function
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include <stdio.h>
 #include <iostream>
 #include <cmath>
 #include <sys/stat.h>
 #include <ctime>
 
-
 #include "mesh.h"
 #include "state.h"
 #include "matar.h"
 
-
-
-
-//==============================================================================
+// ==============================================================================
 //   Variables, setting default inputs
-//==============================================================================
-
-
+// ==============================================================================
 
 // --- num vars ----
 size_t num_dims = 3;
@@ -30,48 +24,42 @@ size_t num_state_vars;
 size_t num_fills;
 size_t num_bcs;
 
-
 // --- Graphics output variables ---
-size_t graphics_id = 0;
+size_t graphics_id       = 0;
 size_t graphics_cyc_ival = 50;
 
-CArray <double> graphics_times(2000);
-double graphics_dt_ival = 1.0e8;
-double graphics_time = graphics_dt_ival;  // the times for writing graphics dump
-
+CArray<double> graphics_times(2000);
+double         graphics_dt_ival = 1.0e8;
+double         graphics_time    = graphics_dt_ival; // the times for writing graphics dump
 
 // --- Time and cycling variables ---
 double time_value = 0.0;
 double time_final = 1.e16;
-double dt = 1.e-8;
-double dt_max = 1.0e-2;
-double dt_min = 1.0e-8;
-double dt_cfl = 0.4;
+double dt       = 1.e-8;
+double dt_max   = 1.0e-2;
+double dt_min   = 1.0e-8;
+double dt_cfl   = 0.4;
 double dt_start = 1.0e-8;
 
 size_t rk_num_stages = 2;
-size_t rk_num_bins = 2;
+size_t rk_num_bins   = 2;
 
-size_t cycle = 0;
+size_t cycle      = 0;
 size_t cycle_stop = 1000000000;
 
-
 // --- Precision variables ---
-double fuzz = 1.0e-16;  // machine precision
-double tiny = 1.0e-12;  // very very small (between real_t and single)
-double small= 1.0e-8;   // single precision
-
+double fuzz  = 1.0e-16; // machine precision
+double tiny  = 1.0e-12; // very very small (between real_t and single)
+double small = 1.0e-8;   // single precision
 
-
-
-//==============================================================================
+// ==============================================================================
 //    main
-//==============================================================================
-int main(int argc, char *argv[]){
-    
-    
+// ==============================================================================
+int main(int argc, char* argv[])
+{
     // check to see of a mesh was supplied when running the code
-    if (argc == 1) {
+    if (argc == 1)
+    {
         std::cout << "\n\n**********************************\n\n";
         std::cout << " ERROR:\n";
         std::cout << " Please supply a mesh \n";
@@ -80,35 +68,30 @@ int main(int argc, char *argv[]){
         std::exit(EXIT_FAILURE);
     } // end if
 
-
     printf("Starting Lagrangian SGH code\n");
 
-
     // The kokkos scope
     Kokkos::initialize();
     {
-     
         // ---------------------------------------------------------------------
         //    state data type declarations
         // ---------------------------------------------------------------------
-        node_t  node;
-        elem_t  elem;
-        corner_t  corner;
-        CArrayKokkos <material_t> material;
-        CArrayKokkos <double> state_vars; // array to hold init model variables
-        
-        
+        node_t                   node;
+        elem_t                   elem;
+        corner_t                 corner;
+        CArrayKokkos<material_t> material;
+        CArrayKokkos<double>     state_vars; // array to hold init model variables
+
         // ---------------------------------------------------------------------
         //    mesh data type declarations
         // ---------------------------------------------------------------------
-        mesh_t mesh;
-        CArrayKokkos <mat_fill_t> mat_fill;
-        CArrayKokkos <boundary_t> boundary;
-        
+        mesh_t                   mesh;
+        CArrayKokkos<mat_fill_t> mat_fill;
+        CArrayKokkos<boundary_t> boundary;
 
         // ---------------------------------------------------------------------
         //    read the input file
-        // ---------------------------------------------------------------------  
+        // ---------------------------------------------------------------------
         input(material,
               mat_fill,
               boundary,
@@ -127,117 +110,106 @@ int main(int argc, char *argv[]){
               graphics_cyc_ival,
               cycle_stop,
               rk_num_stages
-              );
-        
-
-
+            );
 
         // ---------------------------------------------------------------------
         //    read in supplied mesh
-        // --------------------------------------------------------------------- 
+        // ---------------------------------------------------------------------
         read_mesh_ensight(argv[1], mesh, node, elem, corner, num_dims, rk_num_bins);
         mesh.build_corner_connectivity();
         mesh.build_elem_elem_connectivity();
         mesh.build_patch_connectivity();
         mesh.build_node_node_connectivity();
-        
-        
+
         // ---------------------------------------------------------------------
         //    allocate memory
         // ---------------------------------------------------------------------
 
         // shorthand names
-        const size_t num_nodes = mesh.num_nodes;
-        const size_t num_elems = mesh.num_elems;
+        const size_t num_nodes   = mesh.num_nodes;
+        const size_t num_elems   = mesh.num_elems;
         const size_t num_corners = mesh.num_corners;
 
-        
         // allocate elem_statev
-        elem.statev = CArray <double> (num_elems, num_state_vars);
+        elem.statev = CArray<double>(num_elems, num_state_vars);
 
         // --- make dual views of data on CPU and GPU ---
         //  Notes:
-        //     Instead of using a struct of dual types like the mesh type, 
-        //     individual dual views will be made for all the state 
-        //     variables.  The motivation is to reduce memory movement 
-        //     when passing state into a function.  Passing a struct by 
-        //     reference will copy the meta data and pointers for the 
-        //     variables held inside the struct.  Since all the mesh 
-        //     variables are typically used by most functions, a single 
-        //     mesh struct or passing the arrays will be roughly equivalent 
+        //     Instead of using a struct of dual types like the mesh type,
+        //     individual dual views will be made for all the state
+        //     variables.  The motivation is to reduce memory movement
+        //     when passing state into a function.  Passing a struct by
+        //     reference will copy the meta data and pointers for the
+        //     variables held inside the struct.  Since all the mesh
+        //     variables are typically used by most functions, a single
+        //     mesh struct or passing the arrays will be roughly equivalent
         //     for memory movement.
 
-        
         // create Dual Views of the individual node struct variables
-        DViewCArrayKokkos <double> node_coords(&node.coords(0,0,0),
-                                               rk_num_bins,
-                                               num_nodes,
-                                               num_dims);
+        DViewCArrayKokkos<double> node_coords(&node.coords(0, 0, 0),
+                                              rk_num_bins,
+                                              num_nodes,
+                                              num_dims);
 
-        DViewCArrayKokkos <double> node_vel(&node.vel(0,0,0),
-                                            rk_num_bins,
-                                            num_nodes,
-                                            num_dims);
+        DViewCArrayKokkos<double> node_vel(&node.vel(0, 0, 0),
+                                           rk_num_bins,
+                                           num_nodes,
+                                           num_dims);
+
+        DViewCArrayKokkos<double> node_mass(&node.mass(0),
+                                            num_nodes);
 
-        DViewCArrayKokkos <double> node_mass(&node.mass(0),
-                                             num_nodes);
-        
-        
         // create Dual Views of the individual elem struct variables
-        DViewCArrayKokkos <double> elem_den(&elem.den(0),
+        DViewCArrayKokkos<double> elem_den(&elem.den(0),
+                                           num_elems);
+
+        DViewCArrayKokkos<double> elem_pres(&elem.pres(0),
                                             num_elems);
 
-        DViewCArrayKokkos <double> elem_pres(&elem.pres(0),
-                                             num_elems);
+        DViewCArrayKokkos<double> elem_stress(&elem.stress(0, 0, 0, 0),
+                                              rk_num_bins,
+                                              num_elems,
+                                              3,
+                                              3);  // always 3D even in 2D-RZ
 
-        DViewCArrayKokkos <double> elem_stress(&elem.stress(0,0,0,0),
-                                               rk_num_bins,
-                                               num_elems,
-                                               3,
-                                               3); // always 3D even in 2D-RZ
+        DViewCArrayKokkos<double> elem_sspd(&elem.sspd(0),
+                                            num_elems);
 
-        DViewCArrayKokkos <double> elem_sspd(&elem.sspd(0),
-                                             num_elems);
+        DViewCArrayKokkos<double> elem_sie(&elem.sie(0, 0),
+                                           rk_num_bins,
+                                           num_elems);
 
-        DViewCArrayKokkos <double> elem_sie(&elem.sie(0,0),
-                                            rk_num_bins,
-                                            num_elems);
+        DViewCArrayKokkos<double> elem_vol(&elem.vol(0),
+                                           num_elems);
 
-        DViewCArrayKokkos <double> elem_vol(&elem.vol(0),
-                                            num_elems);
-        
-        DViewCArrayKokkos <double> elem_div(&elem.div(0),
+        DViewCArrayKokkos<double> elem_div(&elem.div(0),
+                                           num_elems);
+
+        DViewCArrayKokkos<double> elem_mass(&elem.mass(0),
                                             num_elems);
-        
 
-        DViewCArrayKokkos <double> elem_mass(&elem.mass(0),
-                                             num_elems);
+        DViewCArrayKokkos<size_t> elem_mat_id(&elem.mat_id(0),
+                                              num_elems);
 
-        DViewCArrayKokkos <size_t> elem_mat_id(&elem.mat_id(0),
-                                               num_elems);
+        DViewCArrayKokkos<double> elem_statev(&elem.statev(0, 0),
+                                              num_elems,
+                                              num_state_vars);
 
-        DViewCArrayKokkos <double> elem_statev(&elem.statev(0,0),
-                                               num_elems,
-                                               num_state_vars );
-        
         // create Dual Views of the corner struct variables
-        DViewCArrayKokkos <double> corner_force(&corner.force(0,0),
-                                                num_corners, 
-                                                num_dims);
-
-        DViewCArrayKokkos <double> corner_mass(&corner.mass(0),
-                                               num_corners);
-        
-        
+        DViewCArrayKokkos<double> corner_force(&corner.force(0, 0),
+                                               num_corners,
+                                               num_dims);
+
+        DViewCArrayKokkos<double> corner_mass(&corner.mass(0),
+                                              num_corners);
+
         // ---------------------------------------------------------------------
         //   calculate geometry
         // ---------------------------------------------------------------------
         node_coords.update_device();
         Kokkos::fence();
-        
-        
-        get_vol(elem_vol, node_coords, mesh);
 
+        get_vol(elem_vol, node_coords, mesh);
 
         // ---------------------------------------------------------------------
         //   setup the IC's and BC's
@@ -265,14 +237,13 @@ int main(int argc, char *argv[]){
               num_bcs,
               num_materials,
               num_state_vars);
-        
+
         // intialize time, time_step, and cycles
         time_value = 0.0;
         dt = dt_start;
-        graphics_id = 0;
+        graphics_id       = 0;
         graphics_times(0) = 0.0;
-        graphics_time = graphics_dt_ival;  // the times for writing graphics dump
-        
+        graphics_time     = graphics_dt_ival; // the times for writing graphics dump
 
         // ---------------------------------------------------------------------
         //   Calculate the SGH solution
@@ -312,32 +283,25 @@ int main(int argc, char *argv[]){
                   graphics_times,
                   graphics_id);
 
-
         // calculate total energy at time=t_end
-        
-        
     } // end of kokkos scope
     Kokkos::finalize();
-    
-
 
     printf("Finished\n");
 
-
     return 0;
-}// end of main function
-
+} // end of main function
 
 // for easy copy and paste to functions
-//DViewCArrayKokkos <double> node_coords,
-//DViewCArrayKokkos <double> node_vel,
-//DViewCArrayKokkos <double> node_mass,
-//DViewCArrayKokkos <double> elem_den,
-//DViewCArrayKokkos <double> elem_pres,
-//DViewCArrayKokkos <double> elem_stress,
-//DViewCArrayKokkos <double> elem_sspd, 
-//DViewCArrayKokkos <double> elem_sie,
-//DViewCArrayKokkos <double> elem_vol,
-//DViewCArrayKokkos <double> elem_mass,
-//DViewCArrayKokkos <size_t> elem_mat_id,
-//DViewCArrayKokkos <double> elem_statev
+// DViewCArrayKokkos <double> node_coords,
+// DViewCArrayKokkos <double> node_vel,
+// DViewCArrayKokkos <double> node_mass,
+// DViewCArrayKokkos <double> elem_den,
+// DViewCArrayKokkos <double> elem_pres,
+// DViewCArrayKokkos <double> elem_stress,
+// DViewCArrayKokkos <double> elem_sspd,
+// DViewCArrayKokkos <double> elem_sie,
+// DViewCArrayKokkos <double> elem_vol,
+// DViewCArrayKokkos <double> elem_mass,
+// DViewCArrayKokkos <size_t> elem_mat_id,
+// DViewCArrayKokkos <double> elem_statev
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/energy_sgh.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/energy_sgh.cpp
old mode 100755
new mode 100644
index f742e8301..1f4c0b2f7
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/energy_sgh.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/energy_sgh.cpp
@@ -1,55 +1,65 @@
-
 #include "mesh.h"
 #include "state.h"
 
-
-void update_energy_sgh(double rk_alpha,
-                       double dt,
-                       const mesh_t &mesh,
-                       const DViewCArrayKokkos <double> &node_vel,
-                       const DViewCArrayKokkos <double> &node_coords,
-                       DViewCArrayKokkos <double> &elem_sie,
-                       const DViewCArrayKokkos <double> &elem_mass,
-                       const DViewCArrayKokkos <double> &corner_force){
-
+/**
+ * \brief Update specific internal energy for the SGH solver
+ *
+ * This function updates the specific internal energy of each element by 
+ * integrating the power (Force dotted into velocity) over the element.
+ *
+ * \param rk_alpha Time integration partition
+ * \param dt time increment
+ * \param mesh simulation mesh
+ * \param node_vel view into the nodal velocity memory
+ * \param node_coords view into the nodal position memory
+ * \param elem_sie view into the specific internal energy of each element
+ * \param elem_mass view into the element mass
+ * \param corner_force view into the corner forces
+ */
+void update_energy_sgh(double                           rk_alpha,
+                       double                           dt,
+                       const mesh_t&                    mesh,
+                       const DViewCArrayKokkos<double>& node_vel,
+                       const DViewCArrayKokkos<double>& node_coords,
+                       DViewCArrayKokkos<double>&       elem_sie,
+                       const DViewCArrayKokkos<double>& elem_mass,
+                       const DViewCArrayKokkos<double>& corner_force)
+{
     // loop over all the elements in the mesh
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         double elem_power = 0.0;
 
         // --- tally the contribution from each corner to the element ---
 
         // Loop over the nodes in the element
-        for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++){
-            
+        for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++)
+        {
             size_t corner_lid = node_lid;
-            
+
             // Get node global id for the local node id
             size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
-            
+
             // Get the corner global id for the local corner id
             size_t corner_gid = mesh.corners_in_elem(elem_gid, corner_lid);
-            
+
             double node_radius = 1;
-            if(mesh.num_dims==2){
-                node_radius = node_coords(1,node_gid,1);
+            if (mesh.num_dims == 2)
+            {
+                node_radius = node_coords(1, node_gid, 1);
             }
 
             // calculate the Power=F dot V for this corner
-            for (size_t dim=0; dim<mesh.num_dims; dim++){
-                
-                double half_vel = (node_vel(1, node_gid, dim) + node_vel(0, node_gid, dim))*0.5;
-                elem_power += corner_force(corner_gid, dim)*node_radius*half_vel;
-                
+            for (size_t dim = 0; dim < mesh.num_dims; dim++)
+            {
+                double half_vel = (node_vel(1, node_gid, dim) + node_vel(0, node_gid, dim)) * 0.5;
+                elem_power     += corner_force(corner_gid, dim) * node_radius * half_vel;
             } // end for dim
-            
         } // end for node_lid
 
         // update the specific energy
         elem_sie(1, elem_gid) = elem_sie(0, elem_gid) -
-                                rk_alpha*dt/elem_mass(elem_gid) * elem_power;
-
+                                rk_alpha * dt / elem_mass(elem_gid) * elem_power;
     }); // end parallel loop over the elements
-    
+
     return;
 } // end subroutine
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/eos.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/eos.cpp
index 752f0f872..64316b649 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/eos.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/eos.cpp
@@ -1,47 +1,42 @@
 // -----------------------------------------------------------------------------
 // This code contains the equation of state information (constitutive relations)
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include "state.h"
 
-
-
-
 // -----------------------------------------------------------------------------
 // This is the gamma-law gas eos
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void ideal_gas(const DViewCArrayKokkos <double> &elem_pres,
-               const DViewCArrayKokkos <double> &elem_stress,
-               const size_t elem_gid,
-               const size_t mat_id,
-               const DViewCArrayKokkos <double> &elem_state_vars,
-               const DViewCArrayKokkos <double> &elem_sspd,
-               const double den,
-               const double sie){
-    
-
+void ideal_gas(const DViewCArrayKokkos<double>& elem_pres,
+               const DViewCArrayKokkos<double>& elem_stress,
+               const size_t                     elem_gid,
+               const size_t                     mat_id,
+               const DViewCArrayKokkos<double>& elem_state_vars,
+               const DViewCArrayKokkos<double>& elem_sspd,
+               const double                     den,
+               const double                     sie)
+{
     // statev(0) = gamma
     // statev(1) = minimum sound speed
     // statev(2) = specific heat c_v
     // statev(3) = ref temperature
     // statev(4) = ref density
     // statev(5) = ref specific internal energy
-    
-    double gamma = elem_state_vars(elem_gid,0);
-    double csmin = elem_state_vars(elem_gid,1);
-    
-    
+
+    double gamma = elem_state_vars(elem_gid, 0);
+    double csmin = elem_state_vars(elem_gid, 1);
+
     // pressure
-    elem_pres(elem_gid) = (gamma - 1.0)*sie*den;
-    
+    elem_pres(elem_gid) = (gamma - 1.0) * sie * den;
+
     // sound speed
-    elem_sspd(elem_gid) = sqrt(gamma*(gamma - 1.0)*sie);
-    
+    elem_sspd(elem_gid) = sqrt(gamma * (gamma - 1.0) * sie);
+
     // ensure soundspeed is great than min specified
-    if (elem_sspd(elem_gid) < csmin){
+    if (elem_sspd(elem_gid) < csmin)
+    {
         elem_sspd(elem_gid) = csmin;
     } // end if
 
     return;
 } // end of ideal_gas
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/force_sgh.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/force_sgh.cpp
index 2827182f5..20bff8d3a 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/force_sgh.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/force_sgh.cpp
@@ -2,94 +2,82 @@
 #include "mesh.h"
 #include "state.h"
 
-
-
-
-
 // -----------------------------------------------------------------------------
 // This function calculates the corner forces and the evolves stress (hypo)
-//------------------------------------------------------------------------------
-void get_force_sgh(const CArrayKokkos <material_t> &material,
-                   const mesh_t &mesh,
-                   const DViewCArrayKokkos <double> &node_coords,
-                   const DViewCArrayKokkos <double> &node_vel,
-                   const DViewCArrayKokkos <double> &elem_den,
-                   const DViewCArrayKokkos <double> &elem_sie,
-                   const DViewCArrayKokkos <double> &elem_pres,
-                   const DViewCArrayKokkos <double> &elem_stress,
-                   const DViewCArrayKokkos <double> &elem_sspd,
-                   const DViewCArrayKokkos <double> &elem_vol,
-                   const DViewCArrayKokkos <double> &elem_div,
-                   const DViewCArrayKokkos <size_t> &elem_mat_id,
-                   DViewCArrayKokkos <double> &corner_force,
-                   const double fuzz,
-                   const double small,
-                   const DViewCArrayKokkos <double> &elem_statev,
-                   const double dt,
-                   const double rk_alpha
-                   ){
-    
-
-    
+// ------------------------------------------------------------------------------
+void get_force_sgh(const CArrayKokkos<material_t>&  material,
+                   const mesh_t&                    mesh,
+                   const DViewCArrayKokkos<double>& node_coords,
+                   const DViewCArrayKokkos<double>& node_vel,
+                   const DViewCArrayKokkos<double>& elem_den,
+                   const DViewCArrayKokkos<double>& elem_sie,
+                   const DViewCArrayKokkos<double>& elem_pres,
+                   const DViewCArrayKokkos<double>& elem_stress,
+                   const DViewCArrayKokkos<double>& elem_sspd,
+                   const DViewCArrayKokkos<double>& elem_vol,
+                   const DViewCArrayKokkos<double>& elem_div,
+                   const DViewCArrayKokkos<size_t>& elem_mat_id,
+                   DViewCArrayKokkos<double>&       corner_force,
+                   const double                     fuzz,
+                   const double                     small,
+                   const DViewCArrayKokkos<double>& elem_statev,
+                   const double                     dt,
+                   const double                     rk_alpha
+                   )
+{
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-        
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         const size_t num_dims = 3;
         const size_t num_nodes_in_elem = 8;
-        
+
         // total Cauchy stress
         double tau_array[9];
-        
+
         // corner area normals
         double area_normal_array[24];
-        
+
         // estimate of shock direction
         double shock_dir_array[3];
-        
+
         // the sums in the Riemann solver
         double sum_array[4];
-        
+
         // corner shock impeadance x |corner area normal dot shock_dir|
         double muc_array[8];
-        
+
         // Riemann velocity
         double vel_star_array[3];
-        
+
         // velocity gradient
         double vel_grad_array[9];
-        
+
         // --- Create views of arrays to aid the force calculation ---
-    
-        ViewCArrayKokkos <double> tau(tau_array, num_dims, num_dims);
-        ViewCArrayKokkos <double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
-        ViewCArrayKokkos <double> shock_dir(shock_dir_array, num_dims);
-        ViewCArrayKokkos <double> sum(sum_array, 4);
-        ViewCArrayKokkos <double> muc(muc_array, num_nodes_in_elem);
-        ViewCArrayKokkos <double> vel_star(vel_star_array, num_dims);
-        ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
-
-        
+
+        ViewCArrayKokkos<double> tau(tau_array, num_dims, num_dims);
+        ViewCArrayKokkos<double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
+        ViewCArrayKokkos<double> shock_dir(shock_dir_array, num_dims);
+        ViewCArrayKokkos<double> sum(sum_array, 4);
+        ViewCArrayKokkos<double> muc(muc_array, num_nodes_in_elem);
+        ViewCArrayKokkos<double> vel_star(vel_star_array, num_dims);
+        ViewCArrayKokkos<double> vel_grad(vel_grad_array, num_dims, num_dims);
+
         // --- abviatations of variables ---
-        
+
         // element volume
         double vol = elem_vol(elem_gid);
-        
+
         // create a view of the stress_matrix
-        ViewCArrayKokkos <double> stress(&elem_stress(1, elem_gid, 0,0), 3, 3);
-        
-        
+        ViewCArrayKokkos<double> stress(&elem_stress(1, elem_gid, 0, 0), 3, 3);
+
         // cut out the node_gids for this element
-        ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
-        
-        
-        
+        ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
+
         // get the B matrix which are the OUTWARD corner area normals
         get_bmatrix(area_normal,
                     elem_gid,
                     node_coords,
                     elem_node_gids);
-    
-        
+
         // --- Calculate the velocity gradient ---
         get_velgrad(vel_grad,
                     elem_node_gids,
@@ -97,76 +85,74 @@ void get_force_sgh(const CArrayKokkos <material_t> &material,
                     area_normal,
                     vol,
                     elem_gid);
-        
-        
+
         // the -1 is for the inward surface area normal,
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-            for (size_t dim = 0; dim < num_dims; dim++){
-                area_normal(node_lid, dim) = (-1.0)*area_normal(node_lid,dim);
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
+            for (size_t dim = 0; dim < num_dims; dim++)
+            {
+                area_normal(node_lid, dim) = (-1.0) * area_normal(node_lid, dim);
             } // end for
         } // end for
-        
-    
-        
+
         double div = elem_div(elem_gid);
-        
-    
+
         // vel = [u,v,w]
         //            [du/dx,  du/dy,  du/dz]
         // vel_grad = [dv/dx,  dv/dy,  dv/dz]
         //            [dw/dx,  dw/dy,  dw/dz]
         double curl[3];
-        curl[0] = vel_grad(2,1) - vel_grad(1,2);  // dw/dy - dv/dz
-        curl[1] = vel_grad(0,2) - vel_grad(2,0);  // du/dz - dw/dx
-        curl[2] = vel_grad(1,0) - vel_grad(0,1);  // dv/dx - du/dy
-        
-        double mag_curl = sqrt(curl[0]*curl[0] + curl[1]*curl[1] + curl[2]*curl[2]);
-        
-        
+        curl[0] = vel_grad(2, 1) - vel_grad(1, 2);  // dw/dy - dv/dz
+        curl[1] = vel_grad(0, 2) - vel_grad(2, 0);  // du/dz - dw/dx
+        curl[2] = vel_grad(1, 0) - vel_grad(0, 1);  // dv/dx - du/dy
+
+        double mag_curl = sqrt(curl[0] * curl[0] + curl[1] * curl[1] + curl[2] * curl[2]);
+
         // --- Calculate the Cauchy stress ---
-        for (size_t i = 0; i < 3; i++){
-            for (size_t j = 0; j < 3; j++){
-                tau(i, j) = stress(i,j);
+        for (size_t i = 0; i < 3; i++)
+        {
+            for (size_t j = 0; j < 3; j++)
+            {
+                tau(i, j) = stress(i, j);
                 // artificial viscosity can be added here to tau
             } // end for
-        } //end for
+        } // end for
 
         // add the pressure
-        for (int i = 0; i < num_dims; i++){
+        for (int i = 0; i < num_dims; i++)
+        {
             tau(i, i) -= elem_pres(elem_gid);
         } // end for
-        
-        
-
 
         // ---- Multidirectional Approximate Riemann solver (MARS) ----
         // find the average velocity of the elem, it is an
         // estimate of the Riemann velocity
-            
+
         // initialize to Riemann velocity to zero
-        for (size_t dim = 0; dim < num_dims; dim++){
+        for (size_t dim = 0; dim < num_dims; dim++)
+        {
             vel_star(dim) = 0.0;
         }
 
         // loop over nodes and calculate an average velocity, which is
         // an estimate of Riemann velocity
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             // Get node gloabl index and create view of nodal velocity
             int node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
-            
-            ViewCArrayKokkos <double> vel(&node_vel(1, node_gid, 0), num_dims);
-            
-            vel_star(0) += 0.125*vel(0);
-            vel_star(1) += 0.125*vel(1);
-            vel_star(2) += 0.125*vel(2);
-                
+
+            ViewCArrayKokkos<double> vel(&node_vel(1, node_gid, 0), num_dims);
+
+            vel_star(0) += 0.125 * vel(0);
+            vel_star(1) += 0.125 * vel(1);
+            vel_star(2) += 0.125 * vel(2);
         } // end for loop over nodes
 
         // find shock direction and shock impedance associated with each node
-        
+
         // initialize sum term in MARS to zero
-        for (int i = 0; i < 4; i++){
+        for (int i = 0; i < 4; i++)
+        {
             sum(i) = 0.0;
         }
 
@@ -174,103 +160,102 @@ void get_force_sgh(const CArrayKokkos <material_t> &material,
         double mag_vel;   // magnitude of velocity
 
         // loop over the nodes of the elem
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++) {
-
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             // Get global node id
             size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
 
             // Create view of nodal velocity
-            ViewCArrayKokkos <double> vel(&node_vel(1, node_gid, 0), num_dims);
+            ViewCArrayKokkos<double> vel(&node_vel(1, node_gid, 0), num_dims);
 
             // Get an estimate of the shock direction.
-            mag_vel = sqrt( (vel(0) - vel_star(0) )*(vel(0) - vel_star(0) )
-                          + (vel(1) - vel_star(1) )*(vel(1) - vel_star(1) )
-                          + (vel(2) - vel_star(2) )*(vel(2) - vel_star(2) ) );
+            mag_vel = sqrt( (vel(0) - vel_star(0) ) * (vel(0) - vel_star(0) )
+                + (vel(1) - vel_star(1) ) * (vel(1) - vel_star(1) )
+                + (vel(2) - vel_star(2) ) * (vel(2) - vel_star(2) ) );
 
-     
-            if (mag_vel > small) {
-        
+            if (mag_vel > small)
+            {
                 // estimate of the shock direction, a unit normal
-                for (int dim = 0; dim < num_dims; dim++){
+                for (int dim = 0; dim < num_dims; dim++)
+                {
                     shock_dir(dim) = (vel(dim) - vel_star(dim)) / mag_vel;
                 }
             }
-
-            else {
-                
+            else
+            {
                 // if there is no velocity change, then use the surface area
                 // normal as the shock direction
-                mag = sqrt( area_normal(node_lid, 0)*area_normal(node_lid, 0)
-                          + area_normal(node_lid, 1)*area_normal(node_lid, 1)
-                          + area_normal(node_lid, 2)*area_normal(node_lid, 2) );
-                
+                mag = sqrt(area_normal(node_lid, 0) * area_normal(node_lid, 0)
+                    + area_normal(node_lid, 1) * area_normal(node_lid, 1)
+                    + area_normal(node_lid, 2) * area_normal(node_lid, 2) );
+
                 // estimate of the shock direction
-                for (int dim = 0; dim < num_dims; dim++){
-                    shock_dir(dim) = area_normal(node_lid, dim)/mag;
+                for (int dim = 0; dim < num_dims; dim++)
+                {
+                    shock_dir(dim) = area_normal(node_lid, dim) / mag;
                 }
-                
             } // end if mag_vel
-            
 
             // cell divergence indicates compression or expansions
             size_t mat_id = elem_mat_id(elem_gid);
-            if (div < 0){ // element in compression
+            if (div < 0)  // element in compression
+            {
                 muc(node_lid) = elem_den(elem_gid) *
-                               (material(mat_id).q1*elem_sspd(elem_gid) + material(mat_id).q2*mag_vel);
+                                (material(mat_id).q1 * elem_sspd(elem_gid) + material(mat_id).q2 * mag_vel);
             }
-            else { // element in expansion
+            else   // element in expansion
+            {
                 muc(node_lid) = elem_den(elem_gid) *
-                               (material(mat_id).q1ex*elem_sspd(elem_gid) + material(mat_id).q2ex*mag_vel);
+                                (material(mat_id).q1ex * elem_sspd(elem_gid) + material(mat_id).q2ex * mag_vel);
             } // end if on divergence sign
-           
 
             size_t use_shock_dir = 0;
             double mu_term;
-            
+
             // Coding to use shock direction
-            if (use_shock_dir == 1){
+            if (use_shock_dir == 1)
+            {
                 // this is denominator of the Riamann solver and the multiplier
                 // on velocity in the numerator.  It filters on the shock
                 // direction
-                mu_term = muc(node_lid)*
-                           fabs( shock_dir(0)*area_normal(node_lid,0)
-                               + shock_dir(1)*area_normal(node_lid,1)
-                               + shock_dir(2)*area_normal(node_lid,2) );
+                mu_term = muc(node_lid) *
+                          fabs(shock_dir(0) * area_normal(node_lid, 0)
+                    + shock_dir(1) * area_normal(node_lid, 1)
+                    + shock_dir(2) * area_normal(node_lid, 2) );
             }
-            else {
-               // Using a full tensoral Riemann jump relation
-               mu_term = muc(node_lid)
-                         * sqrt( area_normal(node_lid, 0)*area_normal(node_lid, 0)
-                               + area_normal(node_lid, 1)*area_normal(node_lid, 1)
-                               + area_normal(node_lid, 2)*area_normal(node_lid, 2) );
+            else
+            {
+                // Using a full tensoral Riemann jump relation
+                mu_term = muc(node_lid)
+                          * sqrt(area_normal(node_lid, 0) * area_normal(node_lid, 0)
+                    + area_normal(node_lid, 1) * area_normal(node_lid, 1)
+                    + area_normal(node_lid, 2) * area_normal(node_lid, 2) );
             }
-            
-            sum(0) += mu_term*vel(0);
-            sum(1) += mu_term*vel(1);
-            sum(2) += mu_term*vel(2);
+
+            sum(0) += mu_term * vel(0);
+            sum(1) += mu_term * vel(1);
+            sum(2) += mu_term * vel(2);
             sum(3) += mu_term;
 
             muc(node_lid) = mu_term; // the impeadance time surface area is stored here
-
         } // end for node_lid loop over nodes of the elem
 
-
-
-
         // The Riemann velocity, called vel_star
-        if (sum(3) > fuzz) {
-            for (size_t i = 0; i < num_dims; i++) {
-                vel_star(i) = sum(i)/sum(3);
+        if (sum(3) > fuzz)
+        {
+            for (size_t i = 0; i < num_dims; i++)
+            {
+                vel_star(i) = sum(i) / sum(3);
             }
         }
-        else {
-            for (int i = 0; i < num_dims; i++){
+        else
+        {
+            for (int i = 0; i < num_dims; i++)
+            {
                 vel_star(i) = 0.0;
             }
         } // end if
 
-            
-            
         // ---- Calculate the shock detector for the Riemann-solver ----
         //
         // The dissipation from the Riemann problem is limited by phi
@@ -283,7 +268,7 @@ void get_force_sgh(const CArrayKokkos <material_t> &material,
         //      phi = 0 highest-order solution
         //      phi = 1 first order solution
         //
-            
+
         double phi    = 0.0;  // the shock detector
         double r_face = 1.0;  // the ratio on the face
         double r_min  = 1.0;  // the min ratio for the cell
@@ -292,85 +277,77 @@ void get_force_sgh(const CArrayKokkos <material_t> &material,
         double n_coef = 1.0;  // the power on the limiting coefficient
                               //   (1=nominal, and n_coeff > 1 oscillatory)
 
-            
         // loop over the nieghboring cells
-        for (size_t elem_lid = 0; elem_lid < mesh.num_elems_in_elem(elem_gid); elem_lid++){
-            
+        for (size_t elem_lid = 0; elem_lid < mesh.num_elems_in_elem(elem_gid); elem_lid++)
+        {
             // Get global index for neighboring cell
             size_t neighbor_gid = mesh.elems_in_elem(elem_gid, elem_lid);
-            
+
             // calculate the velocity divergence in neighbor
             double div_neighbor = elem_div(neighbor_gid);
 
-            r_face = r_coef*(div_neighbor + small)/(div + small);
+            r_face = r_coef * (div_neighbor + small) / (div + small);
 
             // store the smallest face ratio
             r_min = fmin(r_face, r_min);
-            
         } // end for elem_lid
-        
 
         // calculate standard shock detector
         phi = 1.0 - fmax(0.0, r_min);
         phi = pow(phi, n_coef);
 
         //  Mach number shock detector
-        double omega = 20.0;//20.0;    // weighting factor on Mach number
-        double third = 1.0/3.0;
+        double omega    = 20.0; // 20.0;    // weighting factor on Mach number
+        double third    = 1.0 / 3.0;
         double c_length = pow(vol, third); // characteristic length
-        double alpha = fmin(1.0, omega * (c_length * fabs(div))/(elem_sspd(elem_gid) + fuzz) );
-        
+        double alpha    = fmin(1.0, omega * (c_length * fabs(div)) / (elem_sspd(elem_gid) + fuzz) );
+
         // use Mach based detector with standard shock detector
 
         // turn off dissipation in expansion
-        //alpha = fmax(-fabs(div0)/div0 * alpha, 0.0);  // this should be if(div0<0) alpha=alpha else alpha=0
-        
-        phi = alpha*phi;
-        
-        // curl limiter on Q
-        double phi_curl = fmin(1.0, 1.0*fabs(div)/(mag_curl + fuzz));  // disable Q when vorticity is high
-        //phi = phi_curl*phi;
+        // alpha = fmax(-fabs(div0)/div0 * alpha, 0.0);  // this should be if(div0<0) alpha=alpha else alpha=0
 
+        phi = alpha * phi;
+
+        // curl limiter on Q
+        double phi_curl = fmin(1.0, 1.0 * fabs(div) / (mag_curl + fuzz));  // disable Q when vorticity is high
+        // phi = phi_curl*phi;
 
         // ---- Calculate the Riemann force on each node ----
 
         // loop over the each node in the elem
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++) {
-            
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             size_t corner_lid = node_lid;
 
             // Get corner gid
             size_t corner_gid = mesh.corners_in_elem(elem_gid, corner_lid);
-            
+
             // Get node gid
             size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
-   
-            // loop over dimension
-            for (int dim = 0; dim < num_dims; dim++){
 
+            // loop over dimension
+            for (int dim = 0; dim < num_dims; dim++)
+            {
                 corner_force(corner_gid, dim) =
-                          area_normal(node_lid, 0)*tau(0, dim)
-                        + area_normal(node_lid, 1)*tau(1, dim)
-                        + area_normal(node_lid, 2)*tau(2, dim)
-                        + phi*muc(node_lid)*(vel_star(dim) - node_vel(1, node_gid, dim));
-
+                    area_normal(node_lid, 0) * tau(0, dim)
+                    + area_normal(node_lid, 1) * tau(1, dim)
+                    + area_normal(node_lid, 2) * tau(2, dim)
+                    + phi * muc(node_lid) * (vel_star(dim) - node_vel(1, node_gid, dim));
             } // end loop over dimension
-
         } // end for loop over nodes in elem
-        
-        
-        
+
         // --- Update Stress ---
         // calculate the new stress at the next rk level, if it is a hypo model
-        
+
         size_t mat_id = elem_mat_id(elem_gid);
-        
-        // hypo elastic plastic model
-        if(material(mat_id).strength_type == model::hypo){
 
+        // hypo elastic plastic model
+        if (material(mat_id).strength_type == model::hypo)
+        {
             // cut out the node_gids for this element
-            ViewCArrayKokkos <size_t>   elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
-            
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
+
             // --- call strength model ---
             material(mat_id).strength_model(elem_pres,
                                             elem_stress,
@@ -387,110 +364,96 @@ void get_force_sgh(const CArrayKokkos <material_t> &material,
                                             elem_vol(elem_gid),
                                             dt,
                                             rk_alpha);
-            
         } // end logical on hypo strength model
-        
-
     }); // end parallel for loop over elements
 
-    
     return;
-    
 } // end of routine
 
-
-
-
 // -----------------------------------------------------------------------------
 // This function calculates the corner forces and the evolves stress (hypo)
-//------------------------------------------------------------------------------
-void get_force_sgh2D(const CArrayKokkos <material_t> &material,
-                     const mesh_t &mesh,
-                     const DViewCArrayKokkos <double> &node_coords,
-                     const DViewCArrayKokkos <double> &node_vel,
-                     const DViewCArrayKokkos <double> &elem_den,
-                     const DViewCArrayKokkos <double> &elem_sie,
-                     const DViewCArrayKokkos <double> &elem_pres,
-                     const DViewCArrayKokkos <double> &elem_stress,
-                     const DViewCArrayKokkos <double> &elem_sspd,
-                     const DViewCArrayKokkos <double> &elem_vol,
-                     const DViewCArrayKokkos <double> &elem_div,
-                     const DViewCArrayKokkos <size_t> &elem_mat_id,
-                     DViewCArrayKokkos <double> &corner_force,
-                     const double fuzz,
-                     const double small,
-                     const DViewCArrayKokkos <double> &elem_statev,
-                     const double dt,
-                     const double rk_alpha
-                     ){
-    
-
-    
+// ------------------------------------------------------------------------------
+void get_force_sgh2D(const CArrayKokkos<material_t>&  material,
+                     const mesh_t&                    mesh,
+                     const DViewCArrayKokkos<double>& node_coords,
+                     const DViewCArrayKokkos<double>& node_vel,
+                     const DViewCArrayKokkos<double>& elem_den,
+                     const DViewCArrayKokkos<double>& elem_sie,
+                     const DViewCArrayKokkos<double>& elem_pres,
+                     const DViewCArrayKokkos<double>& elem_stress,
+                     const DViewCArrayKokkos<double>& elem_sspd,
+                     const DViewCArrayKokkos<double>& elem_vol,
+                     const DViewCArrayKokkos<double>& elem_div,
+                     const DViewCArrayKokkos<size_t>& elem_mat_id,
+                     DViewCArrayKokkos<double>&       corner_force,
+                     const double                     fuzz,
+                     const double                     small,
+                     const DViewCArrayKokkos<double>& elem_statev,
+                     const double                     dt,
+                     const double                     rk_alpha
+                     )
+{
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-        
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         const size_t num_dims = 2;
         const size_t num_nodes_in_elem = 4;
-        
+
         // total Cauchy stress
         double tau_array[9];
-        
+
         // corner area normals
         double area_normal_array[8]; // 4 corners and 2 directions
-        
+
         // estimate of shock direction
         double shock_dir_array[2];
-        
+
         // the sums in the Riemann solver
         double sum_array[4];
-        
+
         // corner shock impeadance x |corner area normal dot shock_dir|
         double muc_array[4];
-        
+
         // Riemann velocity
         double vel_star_array[2];
-        
+
         // velocity gradient
         double vel_grad_array[9];
-        
+
         // --- Create views of arrays to aid the force calculation ---
-    
-        ViewCArrayKokkos <double> tau(tau_array, 3, 3);
-        ViewCArrayKokkos <double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
-        ViewCArrayKokkos <double> shock_dir(shock_dir_array, num_dims);
-        ViewCArrayKokkos <double> sum(sum_array, 4);
-        ViewCArrayKokkos <double> muc(muc_array, num_nodes_in_elem);
-        ViewCArrayKokkos <double> vel_star(vel_star_array, num_dims);
-        ViewCArrayKokkos <double> vel_grad(vel_grad_array, 3, 3);
-
-    
+
+        ViewCArrayKokkos<double> tau(tau_array, 3, 3);
+        ViewCArrayKokkos<double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
+        ViewCArrayKokkos<double> shock_dir(shock_dir_array, num_dims);
+        ViewCArrayKokkos<double> sum(sum_array, 4);
+        ViewCArrayKokkos<double> muc(muc_array, num_nodes_in_elem);
+        ViewCArrayKokkos<double> vel_star(vel_star_array, num_dims);
+        ViewCArrayKokkos<double> vel_grad(vel_grad_array, 3, 3);
+
         // create a view of the stress_matrix
-        ViewCArrayKokkos <double> stress(&elem_stress(1, elem_gid, 0,0), 3, 3);
-        
-        
+        ViewCArrayKokkos<double> stress(&elem_stress(1, elem_gid, 0, 0), 3, 3);
+
         // cut out the node_gids for this element
-        ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
-        
+        ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
+
         // get the B matrix which are the OUTWARD corner area normals
         get_bmatrix2D(area_normal,
                       elem_gid,
                       node_coords,
                       elem_node_gids);
         // NOTE: I added a minus in bmatrix2D, it should be outward pointing now?
-        
+
         // facial area of the element
         double elem_area = get_area_quad(elem_gid, node_coords, elem_node_gids);
-        
+
         // facial area of the corners
         double corner_areas_array[4];
-        ViewCArrayKokkos <double> corner_areas(&corner_areas_array[0],4);
-        
+        ViewCArrayKokkos<double> corner_areas(&corner_areas_array[0], 4);
+
         get_area_weights2D(corner_areas,
                            elem_gid,
                            node_coords,
                            elem_node_gids);
-        
-        
+
         // --- Calculate the velocity gradient ---
         get_velgrad2D(vel_grad,
                       elem_node_gids,
@@ -499,71 +462,70 @@ void get_force_sgh2D(const CArrayKokkos <material_t> &material,
                       elem_vol(elem_gid),
                       elem_area,
                       elem_gid);
-        
-        
+
         // the -1 is for the inward surface area normal,
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-            for (size_t dim = 0; dim < num_dims; dim++){
-                area_normal(node_lid, dim) = (-1.0)*area_normal(node_lid,dim);
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
+            for (size_t dim = 0; dim < num_dims; dim++)
+            {
+                area_normal(node_lid, dim) = (-1.0) * area_normal(node_lid, dim);
             } // end for
         } // end for
-        
-    
-        
+
         double div = elem_div(elem_gid);
-        
-    
+
         // vel = [u,v]
         //            [du/dx,  du/dy]
         // vel_grad = [dv/dx,  dv/dy]
         double curl;
-        curl = vel_grad(1,0) - vel_grad(0,1);  // dv/dx - du/dy
-        
+        curl = vel_grad(1, 0) - vel_grad(0, 1);  // dv/dx - du/dy
+
         double mag_curl = curl;
-        
-        
+
         // --- Calculate the Cauchy stress ---
-        for (size_t i = 0; i < 3; i++){
-            for (size_t j = 0; j < 3; j++){
-                tau(i, j) = stress(i,j);
+        for (size_t i = 0; i < 3; i++)
+        {
+            for (size_t j = 0; j < 3; j++)
+            {
+                tau(i, j) = stress(i, j);
                 // artificial viscosity can be added here to tau
             } // end for
-        } //end for
+        } // end for
 
         // add the pressure
-        for (int i = 0; i < 3; i++){
+        for (int i = 0; i < 3; i++)
+        {
             tau(i, i) -= elem_pres(elem_gid);
         } // end for
-        
-
 
         // ---- Multidirectional Approximate Riemann solver (MARS) ----
         // find the average velocity of the elem, it is an
         // estimate of the Riemann velocity
-            
+
         // initialize to Riemann velocity to zero
-        for (size_t dim = 0; dim < num_dims; dim++){
+        for (size_t dim = 0; dim < num_dims; dim++)
+        {
             vel_star(dim) = 0.0;
         }
 
         // loop over nodes and calculate an average velocity, which is
         // an estimate of Riemann velocity
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             // Get node gloabl index and create view of nodal velocity
             int node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
-            
-            ViewCArrayKokkos <double> vel(&node_vel(1, node_gid, 0), num_dims);
-            
-            vel_star(0) += 0.25*vel(0);
-            vel_star(1) += 0.25*vel(1);
-                
+
+            ViewCArrayKokkos<double> vel(&node_vel(1, node_gid, 0), num_dims);
+
+            vel_star(0) += 0.25 * vel(0);
+            vel_star(1) += 0.25 * vel(1);
         } // end for loop over nodes
 
         // find shock direction and shock impedance associated with each node
-        
+
         // initialize sum term in MARS to zero
-        for (int i = 0; i < 4; i++){
+        for (int i = 0; i < 4; i++)
+        {
             sum(i) = 0.0;
         }
 
@@ -571,98 +533,97 @@ void get_force_sgh2D(const CArrayKokkos <material_t> &material,
         double mag_vel;   // magnitude of velocity
 
         // loop over the nodes of the elem
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++) {
-
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             // Get global node id
             size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
 
             // Create view of nodal velocity
-            ViewCArrayKokkos <double> vel(&node_vel(1, node_gid, 0), num_dims);
+            ViewCArrayKokkos<double> vel(&node_vel(1, node_gid, 0), num_dims);
 
             // Get an estimate of the shock direction.
-            mag_vel = sqrt( (vel(0) - vel_star(0) )*(vel(0) - vel_star(0) )
-                          + (vel(1) - vel_star(1) )*(vel(1) - vel_star(1) ) );
+            mag_vel = sqrt( (vel(0) - vel_star(0) ) * (vel(0) - vel_star(0) )
+                + (vel(1) - vel_star(1) ) * (vel(1) - vel_star(1) ) );
 
-     
-            if (mag_vel > small) {
-        
+            if (mag_vel > small)
+            {
                 // estimate of the shock direction, a unit normal
-                for (int dim = 0; dim < num_dims; dim++){
+                for (int dim = 0; dim < num_dims; dim++)
+                {
                     shock_dir(dim) = (vel(dim) - vel_star(dim)) / mag_vel;
                 }
             }
-
-            else {
-                
+            else
+            {
                 // if there is no velocity change, then use the surface area
                 // normal as the shock direction
-                mag = sqrt( area_normal(node_lid, 0)*area_normal(node_lid, 0)
-                          + area_normal(node_lid, 1)*area_normal(node_lid, 1) );
-                
+                mag = sqrt(area_normal(node_lid, 0) * area_normal(node_lid, 0)
+                    + area_normal(node_lid, 1) * area_normal(node_lid, 1) );
+
                 // estimate of the shock direction
-                for (int dim = 0; dim < num_dims; dim++){
-                    shock_dir(dim) = area_normal(node_lid, dim)/mag;
+                for (int dim = 0; dim < num_dims; dim++)
+                {
+                    shock_dir(dim) = area_normal(node_lid, dim) / mag;
                 }
-                
             } // end if mag_vel
-            
 
             // cell divergence indicates compression or expansions
             size_t mat_id = elem_mat_id(elem_gid);
-            if (div < 0){ // element in compression
+            if (div < 0)  // element in compression
+            {
                 muc(node_lid) = elem_den(elem_gid) *
-                               (material(mat_id).q1*elem_sspd(elem_gid) + material(mat_id).q2*mag_vel);
+                                (material(mat_id).q1 * elem_sspd(elem_gid) + material(mat_id).q2 * mag_vel);
             }
-            else { // element in expansion
+            else   // element in expansion
+            {
                 muc(node_lid) = elem_den(elem_gid) *
-                               (material(mat_id).q1ex*elem_sspd(elem_gid) + material(mat_id).q2ex*mag_vel);
+                                (material(mat_id).q1ex * elem_sspd(elem_gid) + material(mat_id).q2ex * mag_vel);
             } // end if on divergence sign
-           
 
             size_t use_shock_dir = 0;
             double mu_term;
-            
+
             // Coding to use shock direction
-            if (use_shock_dir == 1){
+            if (use_shock_dir == 1)
+            {
                 // this is denominator of the Riamann solver and the multiplier
                 // on velocity in the numerator.  It filters on the shock
                 // direction
-                mu_term = muc(node_lid)*
-                           fabs( shock_dir(0)*area_normal(0)
-                               + shock_dir(1)*area_normal(1) );
+                mu_term = muc(node_lid) *
+                          fabs(shock_dir(0) * area_normal(0)
+                    + shock_dir(1) * area_normal(1) );
             }
-            else {
-               // Using a full tensoral Riemann jump relation
-               mu_term = muc(node_lid)
-                         * sqrt( area_normal(node_lid, 0)*area_normal(node_lid, 0)
-                               + area_normal(node_lid, 1)*area_normal(node_lid, 1) );
+            else
+            {
+                // Using a full tensoral Riemann jump relation
+                mu_term = muc(node_lid)
+                          * sqrt(area_normal(node_lid, 0) * area_normal(node_lid, 0)
+                    + area_normal(node_lid, 1) * area_normal(node_lid, 1) );
             }
-            
-            sum(0) += mu_term*vel(0);
-            sum(1) += mu_term*vel(1);
+
+            sum(0) += mu_term * vel(0);
+            sum(1) += mu_term * vel(1);
             sum(3) += mu_term;
 
             muc(node_lid) = mu_term; // the impeadance time surface area is stored here
-
         } // end for node_lid loop over nodes of the elem
 
-
-
-
         // The Riemann velocity, called vel_star
-        if (sum(3) > fuzz) {
-            for (size_t i = 0; i < num_dims; i++) {
-                vel_star(i) = sum(i)/sum(3);
+        if (sum(3) > fuzz)
+        {
+            for (size_t i = 0; i < num_dims; i++)
+            {
+                vel_star(i) = sum(i) / sum(3);
             }
         }
-        else {
-            for (int i = 0; i < num_dims; i++){
+        else
+        {
+            for (int i = 0; i < num_dims; i++)
+            {
                 vel_star(i) = 0.0;
             }
         } // end if
-            
-            
-            
+
         // ---- Calculate the shock detector for the Riemann-solver ----
         //
         // The dissipation from the Riemann problem is limited by phi
@@ -675,7 +636,7 @@ void get_force_sgh2D(const CArrayKokkos <material_t> &material,
         //      phi = 0 highest-order solution
         //      phi = 1 first order solution
         //
-            
+
         double phi    = 0.0;  // the shock detector
         double r_face = 1.0;  // the ratio on the face
         double r_min  = 1.0;  // the min ratio for the cell
@@ -684,102 +645,92 @@ void get_force_sgh2D(const CArrayKokkos <material_t> &material,
         double n_coef = 1.0;  // the power on the limiting coefficient
                               //   (1=nominal, and n_coeff > 1 oscillatory)
 
-            
         // loop over the nieghboring cells
-        for (size_t elem_lid = 0; elem_lid < mesh.num_elems_in_elem(elem_gid); elem_lid++){
-            
+        for (size_t elem_lid = 0; elem_lid < mesh.num_elems_in_elem(elem_gid); elem_lid++)
+        {
             // Get global index for neighboring cell
             size_t neighbor_gid = mesh.elems_in_elem(elem_gid, elem_lid);
-            
+
             // calculate the velocity divergence in neighbor
             double div_neighbor = elem_div(neighbor_gid);
 
-            r_face = r_coef*(div_neighbor + small)/(div + small);
+            r_face = r_coef * (div_neighbor + small) / (div + small);
 
             // store the smallest face ratio
             r_min = fmin(r_face, r_min);
-            
         } // end for elem_lid
-        
 
         // calculate standard shock detector
         phi = 1.0 - fmax(0.0, r_min);
         phi = pow(phi, n_coef);
 
         //  Mach number shock detector
-        double omega = 20.0;//20.0;    // weighting factor on Mach number
+        double omega    = 20.0; // 20.0;    // weighting factor on Mach number
         double c_length = sqrt(elem_area); // characteristic length
-        double alpha = fmin(1.0, omega * (c_length * fabs(div))/(elem_sspd(elem_gid) + fuzz) );
-        
+        double alpha    = fmin(1.0, omega * (c_length * fabs(div)) / (elem_sspd(elem_gid) + fuzz) );
+
         // use Mach based detector with standard shock detector
 
         // turn off dissipation in expansion
-        //alpha = fmax(-fabs(div0)/div0 * alpha, 0.0);  // this should be if(div0<0) alpha=alpha else alpha=0
-        
-        phi = alpha*phi;
-        
-        //phi = 1.0;  // WARNING WARNING WARNING
-        
-        // curl limiter on Q
-        double phi_curl = fmin(1.0, 4.0*fabs(div)/(mag_curl + fuzz));  // disable Q when vorticity is high
-        //phi = phi_curl*phi;
-        
+        // alpha = fmax(-fabs(div0)/div0 * alpha, 0.0);  // this should be if(div0<0) alpha=alpha else alpha=0
+
+        phi = alpha * phi;
 
+        // phi = 1.0;  // WARNING WARNING WARNING
 
+        // curl limiter on Q
+        double phi_curl = fmin(1.0, 4.0 * fabs(div) / (mag_curl + fuzz));  // disable Q when vorticity is high
+        // phi = phi_curl*phi;
 
         // ---- Calculate the Riemann force on each node ----
 
         // loop over the each node in the elem
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++) {
-            
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             size_t corner_lid = node_lid;
 
             // Get corner gid
             size_t corner_gid = mesh.corners_in_elem(elem_gid, corner_lid);
-            
+
             // Get node gid
             size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
-   
-            // loop over dimension
-            for (int dim = 0; dim < num_dims; dim++){
 
+            // loop over dimension
+            for (int dim = 0; dim < num_dims; dim++)
+            {
                 corner_force(corner_gid, dim) =
-                          area_normal(node_lid, 0)*tau(0, dim)
-                        + area_normal(node_lid, 1)*tau(1, dim)
-                        + phi*muc(node_lid)*(vel_star(dim) - node_vel(1, node_gid, dim));
-
+                    area_normal(node_lid, 0) * tau(0, dim)
+                    + area_normal(node_lid, 1) * tau(1, dim)
+                    + phi * muc(node_lid) * (vel_star(dim) - node_vel(1, node_gid, dim));
             } // end loop over dimension
-            
-            
+
             // ---- add hoop stress terms ----
-            
-            double node_radius = node_coords(1,node_gid,1);
-            
+
+            double node_radius = node_coords(1, node_gid, 1);
+
             // Wilkins used elem_area*0.25 for the corner area, we will use the corner
             // areas calculated using Barlow's symmetry and energy preserving area partitioning
-            if(node_radius>1e-14){
+            if (node_radius > 1e-14)
+            {
                 // sigma_RZ / R_p
-                corner_force(corner_gid, 0) += tau(1,0)*corner_areas(corner_lid)/node_radius;
-                
+                corner_force(corner_gid, 0) += tau(1, 0) * corner_areas(corner_lid) / node_radius;
+
                 // (sigma_RR - sigma_theta) / R_p
-                corner_force(corner_gid, 1) += (tau(1,1) - tau(2,2))*corner_areas(corner_lid)/node_radius;
+                corner_force(corner_gid, 1) += (tau(1, 1) - tau(2, 2)) * corner_areas(corner_lid) / node_radius;
             } // end if radius >0
-            
         } // end for loop over nodes in elem
-        
-        
-        
+
         // --- Update Stress ---
         // calculate the new stress at the next rk level, if it is a hypo model
-        
+
         size_t mat_id = elem_mat_id(elem_gid);
-        
-        // hypo elastic plastic model
-        if(material(mat_id).strength_type == model::hypo){
 
+        // hypo elastic plastic model
+        if (material(mat_id).strength_type == model::hypo)
+        {
             // cut out the node_gids for this element
-            ViewCArrayKokkos <size_t>   elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
-            
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
+
             // --- call strength model ---
             material(mat_id).strength_model(elem_pres,
                                             elem_stress,
@@ -796,13 +747,8 @@ void get_force_sgh2D(const CArrayKokkos <material_t> &material,
                                             elem_vol(elem_gid),
                                             dt,
                                             rk_alpha);
-            
         } // end logical on hypo strength model
-        
-
     }); // end parallel for loop over elements
 
-    
     return;
-    
 } // end of routine for 2D force and stress update
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/geometry.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/geometry.cpp
index 95eeb2e7a..64213b796 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/geometry.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/geometry.cpp
@@ -1,31 +1,27 @@
 // -----------------------------------------------------------------------------
 // This code handles the geometric information for the mesh for the SHG solver
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include "matar.h"
 #include "mesh.h"
 #include "state.h"
 
-
-void update_position_sgh(double rk_alpha,
-                         double dt,
-                         const size_t num_dims,
-                         const size_t num_nodes,
-                         DViewCArrayKokkos <double> &node_coords,
-                         const DViewCArrayKokkos <double> &node_vel){
-    
+void update_position_sgh(double                           rk_alpha,
+                         double                           dt,
+                         const size_t                     num_dims,
+                         const size_t                     num_nodes,
+                         DViewCArrayKokkos<double>&       node_coords,
+                         const DViewCArrayKokkos<double>& node_vel)
+{
     // loop over all the nodes in the mesh
     FOR_ALL(node_gid, 0, num_nodes, {
-
-        for (int dim = 0; dim < num_dims; dim++){
-            double half_vel = (node_vel(1, node_gid, dim) + node_vel(0, node_gid, dim))*0.5;
-            node_coords(1, node_gid, dim) = node_coords(0, node_gid, dim) + rk_alpha*dt*half_vel;
+        for (int dim = 0; dim < num_dims; dim++)
+        {
+            double half_vel = (node_vel(1, node_gid, dim) + node_vel(0, node_gid, dim)) * 0.5;
+            node_coords(1, node_gid, dim) = node_coords(0, node_gid, dim) + rk_alpha * dt * half_vel;
         }
-        
     }); // end parallel for over nodes
-    
 } // end subroutine
 
-
 // -----------------------------------------------------------------------------
 //  This function claculates
 //    B_p =  J^{-T} \cdot (\nabla_{xi} \phi_p w
@@ -36,298 +32,293 @@ void update_position_sgh(double rk_alpha,
 //    \nabla_{xi} is the gradient opperator in the reference coordinates
 //
 //   B_p is the OUTWARD corner area normal at node p
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void get_bmatrix(const ViewCArrayKokkos <double> &B_matrix,
-                 const size_t elem_gid,
-                 const DViewCArrayKokkos <double> &node_coords,
-                 const ViewCArrayKokkos <size_t>  &elem_node_gids){
-
+void get_bmatrix(const ViewCArrayKokkos<double>&  B_matrix,
+                 const size_t                     elem_gid,
+                 const DViewCArrayKokkos<double>& node_coords,
+                 const ViewCArrayKokkos<size_t>&  elem_node_gids)
+{
     const size_t num_nodes = 8;
 
     double x_array[8];
     double y_array[8];
     double z_array[8];
-    
+
     // x, y, z coordinates of elem vertices
-    auto x  = ViewCArrayKokkos <double> (x_array, num_nodes);
-    auto y  = ViewCArrayKokkos <double> (y_array, num_nodes);
-    auto z  = ViewCArrayKokkos <double> (z_array, num_nodes);
+    auto x = ViewCArrayKokkos<double>(x_array, num_nodes);
+    auto y = ViewCArrayKokkos<double>(y_array, num_nodes);
+    auto z = ViewCArrayKokkos<double>(z_array, num_nodes);
 
     // get the coordinates of the nodes(rk,elem,node) in this element
-    for (int node_lid = 0; node_lid < num_nodes; node_lid++){
+    for (int node_lid = 0; node_lid < num_nodes; node_lid++)
+    {
         x(node_lid) = node_coords(1, elem_node_gids(node_lid), 0);
         y(node_lid) = node_coords(1, elem_node_gids(node_lid), 1);
         z(node_lid) = node_coords(1, elem_node_gids(node_lid), 2);
     } // end for
 
-    double twelth = 1./12.;
-        
-    B_matrix(0,0) = ( +y(1)*( -z(2) -z(3) +z(4) +z(5) )
-                      +y(2)*( +z(1) -z(3) )
-                      +y(3)*( +z(1) +z(2) -z(4) -z(7) )
-                      +y(4)*( -z(1) +z(3) -z(5) +z(7) )
-                      +y(5)*( -z(1) +z(4) )
-                      +y(7)*( +z(3) -z(4) ) )*twelth;
-
-    B_matrix(1,0) = ( +y(0)*( +z(2) +z(3) -z(4) -z(5) )
-                      +y(2)*( -z(0) -z(3) +z(5) +z(6) )
-                      +y(3)*( -z(0) +z(2) )
-                      +y(4)*( +z(0) -z(5) )
-                      +y(5)*( +z(0) -z(2) +z(4) -z(6) )
-                      +y(6)*( -z(2) +z(5) ) )*twelth;
-
-    B_matrix(2,0) = ( +y(0)*( -z(1) +z(3) )
-                      +y(1)*( +z(0) +z(3) -z(5) -z(6) )
-                      +y(3)*( -z(0) -z(1) +z(6) +z(7) )
-                      +y(5)*( +z(1) -z(6) )
-                      +y(6)*( +z(1) -z(3) +z(5) -z(7) )
-                      +y(7)*( -z(3) +z(6) ) )*twelth;
-
-    B_matrix(3,0) = ( +y(0)*( -z(1) -z(2) +z(4) +z(7) )
-                      +y(1)*( +z(0) -z(2) )
-                      +y(2)*( +z(0) +z(1) -z(6) -z(7) )
-                      +y(4)*( -z(0) +z(7) )
-                      +y(6)*( +z(2) -z(7) )
-                      +y(7)*( -z(0) +z(2) -z(4) +z(6) ) )*twelth;
-
-    B_matrix(4,0) = ( +y(0)*( +z(1) -z(3) +z(5) -z(7) )
-                      +y(1)*( -z(0) +z(5) )
-                      +y(3)*( +z(0) -z(7) )
-                      +y(5)*( -z(0) -z(1) +z(6) +z(7) )
-                      +y(6)*( -z(5) +z(7) )
-                      +y(7)*( +z(0) +z(3) -z(5) -z(6) ) )*twelth;
-
-    B_matrix(5,0) = ( +y(0)*( +z(1) -z(4) )
-                      +y(1)*( -z(0) +z(2) -z(4) +z(6) )
-                      +y(2)*( -z(1) +z(6) )
-                      +y(4)*( +z(0) +z(1) -z(6) -z(7) )
-                      +y(6)*( -z(1) -z(2) +z(4) +z(7) )
-                      +y(7)*( +z(4) -z(6) ) )*twelth;
-
-    B_matrix(6,0) = ( +y(1)*( +z(2) -z(5) )
-                      +y(2)*( -z(1) +z(3) -z(5) +z(7) )
-                      +y(3)*( -z(2) +z(7) )
-                      +y(4)*( +z(5) -z(7) )
-                      +y(5)*( +z(1) +z(2) -z(4) -z(7) )
-                      +y(7)*( -z(2) -z(3) +z(4) +z(5) ) )*twelth;
-
-    B_matrix(7,0) = ( +y(0)*( -z(3) +z(4) )
-                      +y(2)*( +z(3) -z(6) )
-                      +y(3)*( +z(0) -z(2) +z(4) -z(6) )
-                      +y(4)*( -z(0) -z(3) +z(5) +z(6) )
-                      +y(5)*( -z(4) +z(6) )
-                      +y(6)*( +z(2) +z(3) -z(4) -z(5) ) )*twelth;
-
-    B_matrix(0,1) = ( +z(1)*( -x(2) -x(3) +x(4) +x(5) )
-                      +z(2)*( +x(1) -x(3) )
-                      +z(3)*( +x(1) +x(2) -x(4) -x(7) )
-                      +z(4)*( -x(1) +x(3) -x(5) +x(7) )
-                      +z(5)*( -x(1) +x(4) )
-                      +z(7)*( +x(3) -x(4) ) )*twelth;
-
-    B_matrix(1,1) = ( +z(0)*( +x(2) +x(3) -x(4) -x(5) )
-                      +z(2)*( -x(0) -x(3) +x(5) +x(6) )
-                      +z(3)*( -x(0) +x(2) )
-                      +z(4)*( +x(0) -x(5) )
-                      +z(5)*( +x(0) -x(2) +x(4) -x(6) )
-                      +z(6)*( -x(2) +x(5) ) )*twelth;
-
-    B_matrix(2,1) = ( +z(0)*( -x(1) +x(3) )
-                      +z(1)*( +x(0) +x(3) -x(5) -x(6) )
-                      +z(3)*( -x(0) -x(1) +x(6) +x(7) )
-                      +z(5)*( +x(1) -x(6) )
-                      +z(6)*( +x(1) -x(3) +x(5) -x(7) )
-                      +z(7)*( -x(3) +x(6) ) )*twelth;
-
-    B_matrix(3,1) = ( +z(0)*( -x(1) -x(2) +x(4) +x(7) )
-                      +z(1)*( +x(0) -x(2) )
-                      +z(2)*( +x(0) +x(1) -x(6) -x(7) )
-                      +z(4)*( -x(0) +x(7) )
-                      +z(6)*( +x(2) -x(7) )
-                      +z(7)*( -x(0) +x(2) -x(4) +x(6) ) )*twelth;
-
-    B_matrix(4,1) = ( +z(0)*( +x(1) -x(3) +x(5) -x(7) )
-                      +z(1)*( -x(0) +x(5) )
-                      +z(3)*( +x(0) -x(7) )
-                      +z(5)*( -x(0) -x(1) +x(6) +x(7) )
-                      +z(6)*( -x(5) +x(7) )
-                      +z(7)*( +x(0) +x(3) -x(5) -x(6) ) )*twelth;
-
-    B_matrix(5,1) = ( +z(0)*( +x(1) -x(4) )
-                      +z(1)*( -x(0) +x(2) -x(4) +x(6) )
-                      +z(2)*( -x(1) +x(6) )
-                      +z(4)*( +x(0) +x(1) -x(6) -x(7) )
-                      +z(6)*( -x(1) -x(2) +x(4) +x(7) )
-                      +z(7)*( +x(4) -x(6) ) )*twelth;
-
-    B_matrix(6,1) = ( +z(1)*( +x(2) -x(5) )
-                      +z(2)*( -x(1) +x(3) -x(5) +x(7) )
-                      +z(3)*( -x(2) +x(7) )
-                      +z(4)*( +x(5) -x(7) )
-                      +z(5)*( +x(1) +x(2) -x(4) -x(7) )
-                      +z(7)*( -x(2) -x(3) +x(4) +x(5) ) )*twelth;
-
-    B_matrix(7,1) = ( +z(0)*( -x(3) +x(4) )
-                      +z(2)*( +x(3) -x(6) )
-                      +z(3)*( +x(0) -x(2) +x(4) -x(6) )
-                      +z(4)*( -x(0) -x(3) +x(5) +x(6) )
-                      +z(5)*( -x(4) +x(6) )
-                      +z(6)*( +x(2) +x(3) -x(4) -x(5) ) )*twelth;
-
-    B_matrix(0,2) = ( +x(1)*( -y(2) -y(3) +y(4) +y(5) )
-                      +x(2)*( +y(1) -y(3) )
-                      +x(3)*( +y(1) +y(2) -y(4) -y(7) )
-                      +x(4)*( -y(1) +y(3) -y(5) +y(7) )
-                      +x(5)*( -y(1) +y(4) )
-                      +x(7)*( +y(3) -y(4) ) )*twelth;
-
-    B_matrix(1,2) = ( +x(0)*( +y(2) +y(3) -y(4) -y(5) )
-                      +x(2)*( -y(0) -y(3) +y(5) +y(6) )
-                      +x(3)*( -y(0) +y(2) )
-                      +x(4)*( +y(0) -y(5) )
-                      +x(5)*( +y(0) -y(2) +y(4) -y(6) )
-                      +x(6)*( -y(2) +y(5) ) )*twelth;
-
-    B_matrix(2,2) = ( +x(0)*( -y(1) +y(3) )
-                      +x(1)*( +y(0) +y(3) -y(5) -y(6) )
-                      +x(3)*( -y(0) -y(1) +y(6) +y(7) )
-                      +x(5)*( +y(1) -y(6) )
-                      +x(6)*( +y(1) -y(3) +y(5) -y(7) )
-                      +x(7)*( -y(3) +y(6) ) )*twelth;
-
-    B_matrix(3,2) = ( +x(0)*( -y(1) -y(2) +y(4) +y(7) )
-                      +x(1)*( +y(0) -y(2) )
-                      +x(2)*( +y(0) +y(1) -y(6) -y(7) )
-                      +x(4)*( -y(0) +y(7) )
-                      +x(6)*( +y(2) -y(7) )
-                      +x(7)*( -y(0) +y(2) -y(4) +y(6) ) )*twelth;
-
-    B_matrix(4,2) = ( +x(0)*( +y(1) -y(3) +y(5) -y(7) )
-                      +x(1)*( -y(0) +y(5) )
-                      +x(3)*( +y(0) -y(7) )
-                      +x(5)*( -y(0) -y(1) +y(6) +y(7) )
-                      +x(6)*( -y(5) +y(7) )
-                      +x(7)*( +y(0) +y(3) -y(5) -y(6) ) )*twelth;
-
-    B_matrix(5,2) = ( +x(0)*( +y(1) -y(4) )
-                      +x(1)*( -y(0) +y(2) -y(4) +y(6) )
-                      +x(2)*( -y(1) +y(6) )
-                      +x(4)*( +y(0) +y(1) -y(6) -y(7) )
-                      +x(6)*( -y(1) -y(2) +y(4) +y(7) )
-                      +x(7)*( +y(4) -y(6) ) )*twelth;
-
-    B_matrix(6,2) = ( +x(1)*( +y(2) -y(5) )
-                      +x(2)*( -y(1) +y(3) -y(5) +y(7) )
-                      +x(3)*( -y(2) +y(7) )
-                      +x(4)*( +y(5) -y(7) )
-                      +x(5)*( +y(1) +y(2) -y(4) -y(7) )
-                      +x(7)*( -y(2) -y(3) +y(4) +y(5) ) )*twelth;
-
-    B_matrix(7,2) = ( +x(0)*( -y(3) +y(4) )
-                      +x(2)*( +y(3) -y(6) )
-                      +x(3)*( +y(0) -y(2) +y(4) -y(6) )
-                      +x(4)*( -y(0) -y(3) +y(5) +y(6) )
-                      +x(5)*( -y(4) +y(6) )
-                      +x(6)*( +y(2) +y(3) -y(4) -y(5) ) )*twelth;
-
+    double twelth = 1. / 12.;
+
+    B_matrix(0, 0) = (+y(1) * (-z(2) - z(3) + z(4) + z(5) )
+                      + y(2) * (+z(1) - z(3) )
+                      + y(3) * (+z(1) + z(2) - z(4) - z(7) )
+                      + y(4) * (-z(1) + z(3) - z(5) + z(7) )
+                      + y(5) * (-z(1) + z(4) )
+                      + y(7) * (+z(3) - z(4) ) ) * twelth;
+
+    B_matrix(1, 0) = (+y(0) * (+z(2) + z(3) - z(4) - z(5) )
+                      + y(2) * (-z(0) - z(3) + z(5) + z(6) )
+                      + y(3) * (-z(0) + z(2) )
+                      + y(4) * (+z(0) - z(5) )
+                      + y(5) * (+z(0) - z(2) + z(4) - z(6) )
+                      + y(6) * (-z(2) + z(5) ) ) * twelth;
+
+    B_matrix(2, 0) = (+y(0) * (-z(1) + z(3) )
+                      + y(1) * (+z(0) + z(3) - z(5) - z(6) )
+                      + y(3) * (-z(0) - z(1) + z(6) + z(7) )
+                      + y(5) * (+z(1) - z(6) )
+                      + y(6) * (+z(1) - z(3) + z(5) - z(7) )
+                      + y(7) * (-z(3) + z(6) ) ) * twelth;
+
+    B_matrix(3, 0) = (+y(0) * (-z(1) - z(2) + z(4) + z(7) )
+                      + y(1) * (+z(0) - z(2) )
+                      + y(2) * (+z(0) + z(1) - z(6) - z(7) )
+                      + y(4) * (-z(0) + z(7) )
+                      + y(6) * (+z(2) - z(7) )
+                      + y(7) * (-z(0) + z(2) - z(4) + z(6) ) ) * twelth;
+
+    B_matrix(4, 0) = (+y(0) * (+z(1) - z(3) + z(5) - z(7) )
+                      + y(1) * (-z(0) + z(5) )
+                      + y(3) * (+z(0) - z(7) )
+                      + y(5) * (-z(0) - z(1) + z(6) + z(7) )
+                      + y(6) * (-z(5) + z(7) )
+                      + y(7) * (+z(0) + z(3) - z(5) - z(6) ) ) * twelth;
+
+    B_matrix(5, 0) = (+y(0) * (+z(1) - z(4) )
+                      + y(1) * (-z(0) + z(2) - z(4) + z(6) )
+                      + y(2) * (-z(1) + z(6) )
+                      + y(4) * (+z(0) + z(1) - z(6) - z(7) )
+                      + y(6) * (-z(1) - z(2) + z(4) + z(7) )
+                      + y(7) * (+z(4) - z(6) ) ) * twelth;
+
+    B_matrix(6, 0) = (+y(1) * (+z(2) - z(5) )
+                      + y(2) * (-z(1) + z(3) - z(5) + z(7) )
+                      + y(3) * (-z(2) + z(7) )
+                      + y(4) * (+z(5) - z(7) )
+                      + y(5) * (+z(1) + z(2) - z(4) - z(7) )
+                      + y(7) * (-z(2) - z(3) + z(4) + z(5) ) ) * twelth;
+
+    B_matrix(7, 0) = (+y(0) * (-z(3) + z(4) )
+                      + y(2) * (+z(3) - z(6) )
+                      + y(3) * (+z(0) - z(2) + z(4) - z(6) )
+                      + y(4) * (-z(0) - z(3) + z(5) + z(6) )
+                      + y(5) * (-z(4) + z(6) )
+                      + y(6) * (+z(2) + z(3) - z(4) - z(5) ) ) * twelth;
+
+    B_matrix(0, 1) = (+z(1) * (-x(2) - x(3) + x(4) + x(5) )
+                      + z(2) * (+x(1) - x(3) )
+                      + z(3) * (+x(1) + x(2) - x(4) - x(7) )
+                      + z(4) * (-x(1) + x(3) - x(5) + x(7) )
+                      + z(5) * (-x(1) + x(4) )
+                      + z(7) * (+x(3) - x(4) ) ) * twelth;
+
+    B_matrix(1, 1) = (+z(0) * (+x(2) + x(3) - x(4) - x(5) )
+                      + z(2) * (-x(0) - x(3) + x(5) + x(6) )
+                      + z(3) * (-x(0) + x(2) )
+                      + z(4) * (+x(0) - x(5) )
+                      + z(5) * (+x(0) - x(2) + x(4) - x(6) )
+                      + z(6) * (-x(2) + x(5) ) ) * twelth;
+
+    B_matrix(2, 1) = (+z(0) * (-x(1) + x(3) )
+                      + z(1) * (+x(0) + x(3) - x(5) - x(6) )
+                      + z(3) * (-x(0) - x(1) + x(6) + x(7) )
+                      + z(5) * (+x(1) - x(6) )
+                      + z(6) * (+x(1) - x(3) + x(5) - x(7) )
+                      + z(7) * (-x(3) + x(6) ) ) * twelth;
+
+    B_matrix(3, 1) = (+z(0) * (-x(1) - x(2) + x(4) + x(7) )
+                      + z(1) * (+x(0) - x(2) )
+                      + z(2) * (+x(0) + x(1) - x(6) - x(7) )
+                      + z(4) * (-x(0) + x(7) )
+                      + z(6) * (+x(2) - x(7) )
+                      + z(7) * (-x(0) + x(2) - x(4) + x(6) ) ) * twelth;
+
+    B_matrix(4, 1) = (+z(0) * (+x(1) - x(3) + x(5) - x(7) )
+                      + z(1) * (-x(0) + x(5) )
+                      + z(3) * (+x(0) - x(7) )
+                      + z(5) * (-x(0) - x(1) + x(6) + x(7) )
+                      + z(6) * (-x(5) + x(7) )
+                      + z(7) * (+x(0) + x(3) - x(5) - x(6) ) ) * twelth;
+
+    B_matrix(5, 1) = (+z(0) * (+x(1) - x(4) )
+                      + z(1) * (-x(0) + x(2) - x(4) + x(6) )
+                      + z(2) * (-x(1) + x(6) )
+                      + z(4) * (+x(0) + x(1) - x(6) - x(7) )
+                      + z(6) * (-x(1) - x(2) + x(4) + x(7) )
+                      + z(7) * (+x(4) - x(6) ) ) * twelth;
+
+    B_matrix(6, 1) = (+z(1) * (+x(2) - x(5) )
+                      + z(2) * (-x(1) + x(3) - x(5) + x(7) )
+                      + z(3) * (-x(2) + x(7) )
+                      + z(4) * (+x(5) - x(7) )
+                      + z(5) * (+x(1) + x(2) - x(4) - x(7) )
+                      + z(7) * (-x(2) - x(3) + x(4) + x(5) ) ) * twelth;
+
+    B_matrix(7, 1) = (+z(0) * (-x(3) + x(4) )
+                      + z(2) * (+x(3) - x(6) )
+                      + z(3) * (+x(0) - x(2) + x(4) - x(6) )
+                      + z(4) * (-x(0) - x(3) + x(5) + x(6) )
+                      + z(5) * (-x(4) + x(6) )
+                      + z(6) * (+x(2) + x(3) - x(4) - x(5) ) ) * twelth;
+
+    B_matrix(0, 2) = (+x(1) * (-y(2) - y(3) + y(4) + y(5) )
+                      + x(2) * (+y(1) - y(3) )
+                      + x(3) * (+y(1) + y(2) - y(4) - y(7) )
+                      + x(4) * (-y(1) + y(3) - y(5) + y(7) )
+                      + x(5) * (-y(1) + y(4) )
+                      + x(7) * (+y(3) - y(4) ) ) * twelth;
+
+    B_matrix(1, 2) = (+x(0) * (+y(2) + y(3) - y(4) - y(5) )
+                      + x(2) * (-y(0) - y(3) + y(5) + y(6) )
+                      + x(3) * (-y(0) + y(2) )
+                      + x(4) * (+y(0) - y(5) )
+                      + x(5) * (+y(0) - y(2) + y(4) - y(6) )
+                      + x(6) * (-y(2) + y(5) ) ) * twelth;
+
+    B_matrix(2, 2) = (+x(0) * (-y(1) + y(3) )
+                      + x(1) * (+y(0) + y(3) - y(5) - y(6) )
+                      + x(3) * (-y(0) - y(1) + y(6) + y(7) )
+                      + x(5) * (+y(1) - y(6) )
+                      + x(6) * (+y(1) - y(3) + y(5) - y(7) )
+                      + x(7) * (-y(3) + y(6) ) ) * twelth;
+
+    B_matrix(3, 2) = (+x(0) * (-y(1) - y(2) + y(4) + y(7) )
+                      + x(1) * (+y(0) - y(2) )
+                      + x(2) * (+y(0) + y(1) - y(6) - y(7) )
+                      + x(4) * (-y(0) + y(7) )
+                      + x(6) * (+y(2) - y(7) )
+                      + x(7) * (-y(0) + y(2) - y(4) + y(6) ) ) * twelth;
+
+    B_matrix(4, 2) = (+x(0) * (+y(1) - y(3) + y(5) - y(7) )
+                      + x(1) * (-y(0) + y(5) )
+                      + x(3) * (+y(0) - y(7) )
+                      + x(5) * (-y(0) - y(1) + y(6) + y(7) )
+                      + x(6) * (-y(5) + y(7) )
+                      + x(7) * (+y(0) + y(3) - y(5) - y(6) ) ) * twelth;
+
+    B_matrix(5, 2) = (+x(0) * (+y(1) - y(4) )
+                      + x(1) * (-y(0) + y(2) - y(4) + y(6) )
+                      + x(2) * (-y(1) + y(6) )
+                      + x(4) * (+y(0) + y(1) - y(6) - y(7) )
+                      + x(6) * (-y(1) - y(2) + y(4) + y(7) )
+                      + x(7) * (+y(4) - y(6) ) ) * twelth;
+
+    B_matrix(6, 2) = (+x(1) * (+y(2) - y(5) )
+                      + x(2) * (-y(1) + y(3) - y(5) + y(7) )
+                      + x(3) * (-y(2) + y(7) )
+                      + x(4) * (+y(5) - y(7) )
+                      + x(5) * (+y(1) + y(2) - y(4) - y(7) )
+                      + x(7) * (-y(2) - y(3) + y(4) + y(5) ) ) * twelth;
+
+    B_matrix(7, 2) = (+x(0) * (-y(3) + y(4) )
+                      + x(2) * (+y(3) - y(6) )
+                      + x(3) * (+y(0) - y(2) + y(4) - y(6) )
+                      + x(4) * (-y(0) - y(3) + y(5) + y(6) )
+                      + x(5) * (-y(4) + y(6) )
+                      + x(6) * (+y(2) + y(3) - y(4) - y(5) ) ) * twelth;
 } // end subroutine
 
-
-
-void get_vol(const DViewCArrayKokkos <double> &elem_vol,
-             const DViewCArrayKokkos <double> &node_coords,
-             const mesh_t &mesh){
-    
+void get_vol(const DViewCArrayKokkos<double>& elem_vol,
+             const DViewCArrayKokkos<double>& node_coords,
+             const mesh_t&                    mesh)
+{
     const size_t num_dims = mesh.num_dims;
-    
-    if (num_dims == 2){
+
+    if (num_dims == 2)
+    {
         FOR_ALL(elem_gid, 0, mesh.num_elems, {
-            
             // cut out the node_gids for this element
-            ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
             get_vol_quad(elem_vol, elem_gid, node_coords, elem_node_gids);
-            
         });
         Kokkos::fence();
     }
-    else {
+    else
+    {
         FOR_ALL(elem_gid, 0, mesh.num_elems, {
-            
             // cut out the node_gids for this element
-            ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
             get_vol_hex(elem_vol, elem_gid, node_coords, elem_node_gids);
-            
         });
         Kokkos::fence();
     } // end if
-    
+
     return;
-    
 } // end subroutine
 
-
 // Exact volume for a hex element
 KOKKOS_FUNCTION
-void get_vol_hex(const DViewCArrayKokkos <double> &elem_vol,
-                 const size_t elem_gid,
-                 const DViewCArrayKokkos <double> &node_coords,
-                 const ViewCArrayKokkos <size_t>  &elem_node_gids){
-
+void get_vol_hex(const DViewCArrayKokkos<double>& elem_vol,
+                 const size_t                     elem_gid,
+                 const DViewCArrayKokkos<double>& node_coords,
+                 const ViewCArrayKokkos<size_t>&  elem_node_gids)
+{
     const size_t num_nodes = 8;
 
     double x_array[8];
     double y_array[8];
     double z_array[8];
-    
+
     // x, y, z coordinates of elem vertices
-    auto x  = ViewCArrayKokkos <double> (x_array, num_nodes);
-    auto y  = ViewCArrayKokkos <double> (y_array, num_nodes);
-    auto z  = ViewCArrayKokkos <double> (z_array, num_nodes);
-    
+    auto x = ViewCArrayKokkos<double>(x_array, num_nodes);
+    auto y = ViewCArrayKokkos<double>(y_array, num_nodes);
+    auto z = ViewCArrayKokkos<double>(z_array, num_nodes);
+
     // get the coordinates of the nodes(rk,elem,node) in this element
-    for (int node_lid = 0; node_lid < num_nodes; node_lid++){
+    for (int node_lid = 0; node_lid < num_nodes; node_lid++)
+    {
         x(node_lid) = node_coords(1, elem_node_gids(node_lid), 0);
         y(node_lid) = node_coords(1, elem_node_gids(node_lid), 1);
         z(node_lid) = node_coords(1, elem_node_gids(node_lid), 2);
     } // end for
 
-    double twelth = 1./12.;
-        
+    double twelth = 1. / 12.;
+
     // element volume
     elem_vol(elem_gid) =
-        (x(1)*(y(3)*(-z(0) + z(2)) + y(4)*( z(0) - z(5)) + y(0)*(z(2) + z(3) - z(4) - z(5)) + y(6)*(-z(2) + z(5)) + y(5)*(z(0) - z(2) + z(4) - z(6)) + y(2)*(-z(0) - z(3) + z(5) + z(6))) +
-         x(7)*(y(0)*(-z(3) + z(4)) + y(6)*( z(2) + z(3)  - z(4) - z(5)) + y(2)*(z(3) - z(6)) + y(3)*(z(0) - z(2) + z(4) - z(6)) + y(5)*(-z(4) + z(6)) + y(4)*(-z(0) - z(3) + z(5) + z(6))) +
-         x(3)*(y(1)*( z(0) - z(2)) + y(7)*(-z(0) + z(2)  - z(4) + z(6)) + y(6)*(z(2) - z(7)) + y(2)*(z(0) + z(1) - z(6) - z(7)) + y(4)*(-z(0) + z(7)) + y(0)*(-z(1) - z(2) + z(4) + z(7))) +
-         x(5)*(y(0)*( z(1) - z(4)) + y(7)*( z(4) - z(6)) + y(2)*(-z(1) + z(6)) + y(1)*(-z(0) + z(2) - z(4) + z(6)) + y(4)*(z(0) + z(1) - z(6) - z(7)) + y(6)*(-z(1) - z(2) + z(4) + z(7))) +
-         x(6)*(y(1)*( z(2) - z(5)) + y(7)*(-z(2) - z(3)  + z(4) + z(5)) + y(5)*(z(1) + z(2) - z(4) - z(7)) + y(4)*(z(5) - z(7)) + y(3)*(-z(2) + z(7)) + y(2)*(-z(1) + z(3) - z(5) + z(7))) +
-         x(0)*(y(2)*( z(1) - z(3)) + y(7)*( z(3) - z(4)) + y(5)*(-z(1) + z(4)) + y(1)*(-z(2) - z(3) + z(4) + z(5)) + y(3)*(z(1) + z(2) - z(4) - z(7)) + y(4)*(-z(1) + z(3) - z(5) + z(7))) +
-         x(2)*(y(0)*(-z(1) + z(3)) + y(5)*( z(1) - z(6)) + y(1)*(z(0) + z(3) - z(5) - z(6)) + y(7)*(-z(3) + z(6)) + y(6)*(z(1) - z(3) + z(5) - z(7)) + y(3)*(-z(0) - z(1) + z(6) + z(7))) +
-         x(4)*(y(1)*(-z(0) + z(5)) + y(7)*( z(0) + z(3)  - z(5) - z(6)) + y(3)*(z(0) - z(7)) + y(0)*(z(1) - z(3) + z(5) - z(7)) + y(6)*(-z(5) + z(7)) + y(5)*(-z(0) - z(1) + z(6) + z(7))))*twelth;
+        (x(1) * (y(3) * (-z(0) + z(2)) + y(4) * (z(0) - z(5)) + y(0) * (z(2) + z(3) - z(4) - z(5)) + y(6) * (-z(2) + z(5)) + y(5) * (z(0) - z(2) + z(4) - z(6)) + y(2) * (-z(0) - z(3) + z(5) + z(6))) +
+         x(7) * (y(0) * (-z(3) + z(4)) + y(6) * (z(2) + z(3) - z(4) - z(5)) + y(2) * (z(3) - z(6)) + y(3) * (z(0) - z(2) + z(4) - z(6)) + y(5) * (-z(4) + z(6)) + y(4) * (-z(0) - z(3) + z(5) + z(6))) +
+         x(3) * (y(1) * (z(0) - z(2)) + y(7) * (-z(0) + z(2) - z(4) + z(6)) + y(6) * (z(2) - z(7)) + y(2) * (z(0) + z(1) - z(6) - z(7)) + y(4) * (-z(0) + z(7)) + y(0) * (-z(1) - z(2) + z(4) + z(7))) +
+         x(5) * (y(0) * (z(1) - z(4)) + y(7) * (z(4) - z(6)) + y(2) * (-z(1) + z(6)) + y(1) * (-z(0) + z(2) - z(4) + z(6)) + y(4) * (z(0) + z(1) - z(6) - z(7)) + y(6) * (-z(1) - z(2) + z(4) + z(7))) +
+         x(6) * (y(1) * (z(2) - z(5)) + y(7) * (-z(2) - z(3) + z(4) + z(5)) + y(5) * (z(1) + z(2) - z(4) - z(7)) + y(4) * (z(5) - z(7)) + y(3) * (-z(2) + z(7)) + y(2) * (-z(1) + z(3) - z(5) + z(7))) +
+         x(0) * (y(2) * (z(1) - z(3)) + y(7) * (z(3) - z(4)) + y(5) * (-z(1) + z(4)) + y(1) * (-z(2) - z(3) + z(4) + z(5)) + y(3) * (z(1) + z(2) - z(4) - z(7)) + y(4) * (-z(1) + z(3) - z(5) + z(7))) +
+         x(2) * (y(0) * (-z(1) + z(3)) + y(5) * (z(1) - z(6)) + y(1) * (z(0) + z(3) - z(5) - z(6)) + y(7) * (-z(3) + z(6)) + y(6) * (z(1) - z(3) + z(5) - z(7)) + y(3) * (-z(0) - z(1) + z(6) + z(7))) +
+         x(4) *
+         (y(1) * (-z(0) + z(5)) + y(7) * (z(0) + z(3) - z(5) - z(6)) + y(3) * (z(0) - z(7)) + y(0) * (z(1) - z(3) + z(5) - z(7)) + y(6) * (-z(5) + z(7)) + y(5) * (-z(0) - z(1) + z(6) + z(7)))) *
+        twelth;
 
     return;
-    
 } // end subroutine
 
-
-
 KOKKOS_FUNCTION
-void get_bmatrix2D(const ViewCArrayKokkos <double> &B_matrix,
-                   const size_t elem_gid,
-                   const DViewCArrayKokkos <double> &node_coords,
-                   const ViewCArrayKokkos <size_t>  &elem_node_gids){
-
+void get_bmatrix2D(const ViewCArrayKokkos<double>&  B_matrix,
+                   const size_t                     elem_gid,
+                   const DViewCArrayKokkos<double>& node_coords,
+                   const ViewCArrayKokkos<size_t>&  elem_node_gids)
+{
     const size_t num_nodes = 4;
 
     double x_array[4];
     double y_array[4];
-    
+
     // x, y coordinates of elem vertices
-    auto x  = ViewCArrayKokkos <double> (x_array, num_nodes);
-    auto y  = ViewCArrayKokkos <double> (y_array, num_nodes);
+    auto x = ViewCArrayKokkos<double>(x_array, num_nodes);
+    auto y = ViewCArrayKokkos<double>(y_array, num_nodes);
 
     // get the coordinates of the nodes(rk,elem,node) in this element
-    for (int node_lid = 0; node_lid < num_nodes; node_lid++){
+    for (int node_lid = 0; node_lid < num_nodes; node_lid++)
+    {
         x(node_lid) = node_coords(1, elem_node_gids(node_lid), 0);
         y(node_lid) = node_coords(1, elem_node_gids(node_lid), 1);
     } // end for
@@ -335,26 +326,23 @@ void get_bmatrix2D(const ViewCArrayKokkos <double> &B_matrix,
     /* ensight node order   0 1 2 3
        Flanaghan node order 3 4 1 2
     */
-    
-    
-    B_matrix(0,0) = -0.5*(y(3)-y(1));
-                  
-    B_matrix(1,0) = -0.5*(y(0)-y(2));
-                  
-    B_matrix(2,0) = -0.5*(y(1)-y(3));
-                  
-    B_matrix(3,0) = -0.5*(y(2)-y(0));
-                 
-
-    B_matrix(0,1) = -0.5*(x(1)-x(3));
-                 
-    B_matrix(1,1) = -0.5*(x(2)-x(0));
-                  
-    B_matrix(2,1) = -0.5*(x(3)-x(1));
-                  
-    B_matrix(3,1) = -0.5*(x(0)-x(2));
-
-    
+
+    B_matrix(0, 0) = -0.5 * (y(3) - y(1));
+
+    B_matrix(1, 0) = -0.5 * (y(0) - y(2));
+
+    B_matrix(2, 0) = -0.5 * (y(1) - y(3));
+
+    B_matrix(3, 0) = -0.5 * (y(2) - y(0));
+
+    B_matrix(0, 1) = -0.5 * (x(1) - x(3));
+
+    B_matrix(1, 1) = -0.5 * (x(2) - x(0));
+
+    B_matrix(2, 1) = -0.5 * (x(3) - x(1));
+
+    B_matrix(3, 1) = -0.5 * (x(0) - x(2));
+
     //
     /*
      The Flanagan and Belytschko paper has:
@@ -363,65 +351,62 @@ void get_bmatrix2D(const ViewCArrayKokkos <double> &B_matrix,
        node 2: 0.5*(y3 - y1)  ,  0.5*(x1 - x3)
        node 3: 0.5*(y4 - y2)  ,  0.5*(x2 - x4)
        node 4: 0.5*(y1 - y3)  ,  0.5*(x3 - x1)
-     
+
      Ensight order would be
-     
+
        node 2: 0.5*(y3 - y1)  ,  0.5*(x1 - x3)
        node 3: 0.5*(y0 - y2)  ,  0.5*(x2 - x0)
        node 0: 0.5*(y1 - y3)  ,  0.5*(x3 - x1)
        node 1: 0.5*(y2 - y0)  ,  0.5*(x0 - x2)
-     
+
      */
 
     return;
-    
 } // end subroutine
 
-
 // true volume of a quad in RZ coords
 KOKKOS_FUNCTION
-void get_vol_quad(const DViewCArrayKokkos <double> &elem_vol,
-                  const size_t elem_gid,
-                  const DViewCArrayKokkos <double> &node_coords,
-                  const ViewCArrayKokkos <size_t>  &elem_node_gids){
-
-    
+void get_vol_quad(const DViewCArrayKokkos<double>& elem_vol,
+                  const size_t                     elem_gid,
+                  const DViewCArrayKokkos<double>& node_coords,
+                  const ViewCArrayKokkos<size_t>&  elem_node_gids)
+{
     // --- testing here ---
     /*
     double test_vol = 0.0;
     // getting the corner facial area
     double corner_areas_array[4];
     ViewCArrayKokkos <double> corner_areas(&corner_areas_array[0],4);
-    
+
     get_area_weights2D(corner_areas,
                        elem_gid,
                        node_coords,
                        elem_node_gids);
 
-    
+
     for(size_t node_lid=0; node_lid<4; node_lid++){
         double y = node_coords(1, elem_node_gids(node_lid), 1);  // node radius
         test_vol += corner_areas(node_lid)*y;
     } // end for
-     
+
      test_vol matches the Barlow volume formula
     */
     // -------------------
-    
-    
+
     elem_vol(elem_gid) = 0.0;
-    
+
     const size_t num_nodes = 4;
 
     double x_array[4];
     double y_array[4];
-    
+
     // x, y coordinates of elem vertices
-    auto x  = ViewCArrayKokkos <double> (x_array, num_nodes);
-    auto y  = ViewCArrayKokkos <double> (y_array, num_nodes);
-     
+    auto x = ViewCArrayKokkos<double>(x_array, num_nodes);
+    auto y = ViewCArrayKokkos<double>(y_array, num_nodes);
+
     // get the coordinates of the nodes(rk,elem,node) in this element
-    for (int node_lid = 0; node_lid < num_nodes; node_lid++){
+    for (int node_lid = 0; node_lid < num_nodes; node_lid++)
+    {
         x(node_lid) = node_coords(1, elem_node_gids(node_lid), 0);
         y(node_lid) = node_coords(1, elem_node_gids(node_lid), 1);
     } // end for
@@ -430,33 +415,32 @@ void get_vol_quad(const DViewCArrayKokkos <double> &elem_vol,
        Flanaghan node order 3 4 1 2
     */
     elem_vol(elem_gid) =
-           ( (y(2)+y(3)+y(0))*((y(2)-y(3))*(x(0)-x(3))-(y(0)-y(3))*(x(2)-x(3)) )
-          +  (y(0)+y(1)+y(2))*((y(0)-y(1))*(x(2)-x(1))-(y(2)-y(1))*(x(0)-x(1))) )/6.0;
+        ( (y(2) + y(3) + y(0)) * ((y(2) - y(3)) * (x(0) - x(3)) - (y(0) - y(3)) * (x(2) - x(3)) )
+          + (y(0) + y(1) + y(2)) * ((y(0) - y(1)) * (x(2) - x(1)) - (y(2) - y(1)) * (x(0) - x(1))) ) / 6.0;
 
     return;
-    
 } // end subroutine
 
-
 // element facial area
 KOKKOS_FUNCTION
-double get_area_quad(const size_t elem_gid,
-                     const DViewCArrayKokkos <double> &node_coords,
-                     const ViewCArrayKokkos <size_t>  &elem_node_gids){
+double get_area_quad(const size_t                     elem_gid,
+                     const DViewCArrayKokkos<double>& node_coords,
+                     const ViewCArrayKokkos<size_t>&  elem_node_gids)
+{
+    double elem_area = 0.0;
 
-    double elem_area=0.0;
-    
     const size_t num_nodes = 4;
 
     double x_array[4];
     double y_array[4];
-    
+
     // x, y coordinates of elem vertices
-    auto x  = ViewCArrayKokkos <double> (x_array, num_nodes);
-    auto y  = ViewCArrayKokkos <double> (y_array, num_nodes);
-     
+    auto x = ViewCArrayKokkos<double>(x_array, num_nodes);
+    auto y = ViewCArrayKokkos<double>(y_array, num_nodes);
+
     // get the coordinates of the nodes(rk,elem,node) in this element
-    for (int node_lid = 0; node_lid < num_nodes; node_lid++){
+    for (int node_lid = 0; node_lid < num_nodes; node_lid++)
+    {
         x(node_lid) = node_coords(1, elem_node_gids(node_lid), 0);
         y(node_lid) = node_coords(1, elem_node_gids(node_lid), 1);
     } // end for
@@ -464,15 +448,13 @@ double get_area_quad(const size_t elem_gid,
     /* ensight node order   0 1 2 3
        Flanaghan node order 3 4 1 2
     */
-    
+
     // element facial area
-    elem_area = 0.5*((x(0)-x(2))*(y(1)-y(3))+(x(3)-x(1))*(y(0)-y(2)));
+    elem_area = 0.5 * ((x(0) - x(2)) * (y(1) - y(3)) + (x(3) - x(1)) * (y(0) - y(2)));
 
     return elem_area;
-    
 } // end subroutine
 
-
 KOKKOS_FUNCTION
 double heron(const double x1,
              const double y1,
@@ -481,68 +463,63 @@ double heron(const double x1,
              const double x3,
              const double y3)
 {
-  double S,a,b,c,area;
+    double S, a, b, c, area;
 
-  S=0.0;
-  a=sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
-  S+=a;
-  b=sqrt((x3-x2)*(x3-x2)+(y3-y2)*(y3-y2));
-  S+=b;
-  c=sqrt((x3-x1)*(x3-x1)+(y3-y1)*(y3-y1));
-  S+=c;
+    S  = 0.0;
+    a  = sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1));
+    S += a;
+    b  = sqrt((x3 - x2) * (x3 - x2) + (y3 - y2) * (y3 - y2));
+    S += b;
+    c  = sqrt((x3 - x1) * (x3 - x1) + (y3 - y1) * (y3 - y1));
+    S += c;
 
-  S*=0.5;
-  area=sqrt(S*(S-a)*(S-b)*(S-c));
+    S   *= 0.5;
+    area = sqrt(S * (S - a) * (S - b) * (S - c));
 
-  return area;
+    return area;
 }
-  
-  
 
 KOKKOS_FUNCTION
-void get_area_weights2D(const ViewCArrayKokkos <double> &corner_areas,
-                        const size_t elem_gid,
-                        const DViewCArrayKokkos <double> &node_coords,
-                        const ViewCArrayKokkos <size_t>  &elem_node_gids){
-
+void get_area_weights2D(const ViewCArrayKokkos<double>&  corner_areas,
+                        const size_t                     elem_gid,
+                        const DViewCArrayKokkos<double>& node_coords,
+                        const ViewCArrayKokkos<size_t>&  elem_node_gids)
+{
     const size_t num_nodes = 4;
 
     double x_array[4];
     double y_array[4];
 
-    double rc,zc;
-    double A12,A23,A34,A41;
-    
+    double rc, zc;
+    double A12, A23, A34, A41;
+
     // x, y coordinates of elem vertices
-    ViewCArrayKokkos <double> x(x_array, num_nodes);
-    ViewCArrayKokkos <double> y(y_array, num_nodes);
+    ViewCArrayKokkos<double> x(x_array, num_nodes);
+    ViewCArrayKokkos<double> y(y_array, num_nodes);
 
     // get the coordinates of the nodes(rk,elem,node) in this element
-    rc=zc=0.0;
-    for (int node_lid = 0; node_lid < num_nodes; node_lid++){
+    rc = zc = 0.0;
+    for (int node_lid = 0; node_lid < num_nodes; node_lid++)
+    {
         x(node_lid) = node_coords(1, elem_node_gids(node_lid), 0);
         y(node_lid) = node_coords(1, elem_node_gids(node_lid), 1);
-    rc+=0.25*y(node_lid);
-    zc+=0.25*x(node_lid);
+        rc += 0.25 * y(node_lid);
+        zc += 0.25 * x(node_lid);
     } // end for
 
     /* ensight node order   0 1 2 3
        Barlow node order    1 2 3 4
     */
-    
-    A12 = heron(x(0),y(0),zc,rc,x(1),y(1));
-    A23 = heron(x(1),y(1),zc,rc,x(2),y(2));
-    A34 = heron(x(2),y(2),zc,rc,x(3),y(3));
-    A41 = heron(x(3),y(3),zc,rc,x(0),y(0));
-
-    corner_areas(0)=(5.*A41+5.*A12+A23+A34)/12.;
-    corner_areas(1)=(A41+5.*A12+5.*A23+A34)/12.;
-    corner_areas(2)=(A41+A12+5.*A23+5.*A34)/12.;
-    corner_areas(3)=(5.*A41+A12+A23+5.*A34)/12.;
 
+    A12 = heron(x(0), y(0), zc, rc, x(1), y(1));
+    A23 = heron(x(1), y(1), zc, rc, x(2), y(2));
+    A34 = heron(x(2), y(2), zc, rc, x(3), y(3));
+    A41 = heron(x(3), y(3), zc, rc, x(0), y(0));
 
+    corner_areas(0) = (5. * A41 + 5. * A12 + A23 + A34) / 12.;
+    corner_areas(1) = (A41 + 5. * A12 + 5. * A23 + A34) / 12.;
+    corner_areas(2) = (A41 + A12 + 5. * A23 + 5. * A34) / 12.;
+    corner_areas(3) = (5. * A41 + A12 + A23 + 5. * A34) / 12.;
 
     return;
-    
 } // end subroutine
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/input.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/input.cpp
index 574614f7e..7e8fbc0b7 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/input.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/input.cpp
@@ -1,12 +1,11 @@
-                                                           
+
 // -----------------------------------------------------------------------------
 // This code is the input file
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include "matar.h"
 #include "state.h"
 #include "mesh.h"
 
-
 // Problem choice:
 //     sedov 3D and RZ
 //     sod3D x  (set up for 3D 200x1x1 mesh)
@@ -20,100 +19,92 @@
 //     Taylor Anvil
 namespace test
 {
-    
-    // applying initial conditions
-    enum setup
-    {
-        none = 0,
-        Sedov3D = 1,
-        SedovRZ = 2,
-        
-        Noh3D = 3,
-        NohRZ = 4,
-        
-        SodZ = 5,
-        Sod3DX = 6,
-        Sod3DY = 7,
-        Sod3DZ = 8,
-        
-        TriplePoint = 9,
-        TaylorAnvil = 10,
-        
-        box = 11,
-    };
-    
+// applying initial conditions
+enum setup
+{
+    none = 0,
+    Sedov3D = 1,
+    SedovRZ = 2,
+
+    Noh3D = 3,
+    NohRZ = 4,
+
+    SodZ = 5,
+    Sod3DX = 6,
+    Sod3DY = 7,
+    Sod3DZ = 8,
+
+    TriplePoint = 9,
+    TaylorAnvil = 10,
+
+    box = 11,
+};
 } // end of initial conditions namespace
 
 test::setup test_problem;
 
 // -----------------------------------------------------------------------------
 // The user must input their parameters inside this function
-//------------------------------------------------------------------------------
-void input(CArrayKokkos <material_t> &material,
-           CArrayKokkos <mat_fill_t> &mat_fill,
-           CArrayKokkos <boundary_t> &boundary,
-           CArrayKokkos <double> &state_vars,
-           size_t &num_materials,
-           size_t &num_fills,
-           size_t &num_bcs,
-           size_t &num_dims,
-           size_t &max_num_state_vars,
-           double &dt_start,
-           double &time_final,
-           double &dt_max,
-           double &dt_min,
-           double &dt_cfl,
-           double &graphics_dt_ival,
-           size_t &graphics_cyc_ival,
-           size_t &cycle_stop,
-           size_t &rk_num_stages){
-    
+// ------------------------------------------------------------------------------
+void input(CArrayKokkos<material_t>& material,
+           CArrayKokkos<mat_fill_t>& mat_fill,
+           CArrayKokkos<boundary_t>& boundary,
+           CArrayKokkos<double>&     state_vars,
+           size_t&                   num_materials,
+           size_t&                   num_fills,
+           size_t&                   num_bcs,
+           size_t&                   num_dims,
+           size_t&                   max_num_state_vars,
+           double&                   dt_start,
+           double&                   time_final,
+           double&                   dt_max,
+           double&                   dt_min,
+           double&                   dt_cfl,
+           double&                   graphics_dt_ival,
+           size_t&                   graphics_cyc_ival,
+           size_t&                   cycle_stop,
+           size_t&                   rk_num_stages)
+{
     // Dimensions
     num_dims = 3;
-    
+
     // ---- time varaibles and cycle info ----
     time_final = 1.0;  // 1.0 for Sedov
-    dt_min = 1.e-8;
-    dt_max = 1.e-2;
-    dt_start = 1.e-5;
+    dt_min     = 1.e-8;
+    dt_max     = 1.e-2;
+    dt_start   = 1.e-5;
     cycle_stop = 100000;
 
-
     // ---- graphics information ----
     graphics_cyc_ival = 1000000;
     graphics_dt_ival  = 0.25;
 
-    
     // --- number of material regions ---
     num_materials = 1;
-    material = CArrayKokkos <material_t> (num_materials); // create material
-    
-    
+    material      = CArrayKokkos<material_t>(num_materials); // create material
+
     // --- declare model state variable array size ---
     max_num_state_vars = 6;  // it is a memory block
-    state_vars = CArrayKokkos <double> (num_materials, max_num_state_vars); // init values
-    
-    
+    state_vars = CArrayKokkos<double>(num_materials, max_num_state_vars); // init values
+
     // --- number of fill regions ---
     num_fills = 2;  // =2 for Sedov
-    mat_fill = CArrayKokkos <mat_fill_t> (num_fills); // create fills
-    
-    
+    mat_fill  = CArrayKokkos<mat_fill_t>(num_fills);  // create fills
+
     // --- number of boundary conditions ---
-    num_bcs=6;  // =6 for Sedov
-    boundary = CArrayKokkos <boundary_t> (num_bcs);  // create boundaries
-    
+    num_bcs  = 6; // =6 for Sedov
+    boundary = CArrayKokkos<boundary_t>(num_bcs);  // create boundaries
+
     // --- test problems ---
     test_problem = test::Sedov3D;
-    
-    
+
     // ---- fill instructions and intial conditions ---- //
 
-    
     // Sedov blast wave test case
-    if (test_problem == test::Sedov3D){
+    if (test_problem == test::Sedov3D)
+    {
         time_final = 1.0;  // 1.0 for Sedov
-        
+
         RUN({
             // gamma law model
             // statev(0) = gamma
@@ -122,89 +113,85 @@ void input(CArrayKokkos <material_t> &material,
             // statev(3) = ref temperature
             // statev(4) = ref density
             // statev(5) = ref specific internal energy
-            
+
             material(0).eos_model = ideal_gas; // EOS model is required
-            
-            material(0).strength_type = model::none;
+
+            material(0).strength_type  = model::none;
             material(0).strength_setup = model_init::input; // not need, the input is the default
             material(0).strength_model = NULL;  // not needed, but illistrates the syntax
-            
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat
+
             // global initial conditions
             mat_fill(0).volume = region::global; // fill everywhere
             mat_fill(0).mat_id = 0;              // material id
-            mat_fill(0).den = 1.0;               // intial density
-            mat_fill(0).sie = 1.e-10;            // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;            // intial density
+            mat_fill(0).sie    = 1.e-10;         // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::cartesian;
             mat_fill(0).u = 0.0;   // initial x-dir velocity
             mat_fill(0).v = 0.0;   // initial y-dir velocity
             mat_fill(0).w = 0.0;   // initial z-dir velocity
-            
+
             // energy source initial conditions
-            mat_fill(1).volume = region::sphere; // fill a sphere
-            mat_fill(1).mat_id = 0;              // material id
+            mat_fill(1).volume  = region::sphere; // fill a sphere
+            mat_fill(1).mat_id  = 0;             // material id
             mat_fill(1).radius1 = 0.0;           // inner radius of fill region
-            mat_fill(1).radius2 = 1.2/8.0;       // outer radius of fill region
-            mat_fill(1).den = 1.0;               // initial density
-            mat_fill(1).sie = (963.652344*
-                               pow((1.2/30.0),3))/pow((mat_fill(1).radius2),3);
-            
+            mat_fill(1).radius2 = 1.2 / 16.0;       // outer radius of fill region
+            mat_fill(1).den     = 1.0;           // initial density
+            mat_fill(1).sie     = (963.652344 *
+                                   pow((1.2 / 30.0), 3)) / pow((mat_fill(1).radius2), 3);
+
             mat_fill(1).velocity = init_conds::cartesian;
             mat_fill(1).u = 0.0;   // initial x-dir velocity
             mat_fill(1).v = 0.0;   // initial y-dir velocity
             mat_fill(1).w = 0.0;   // initial z-dir velocity
 
-
-
             // ---- boundary conditions ---- //
-            
+
             // Tag X plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
+
             // Tag Z plane
-            boundary(2).surface = bdy::z_plane;
-            boundary(2).value = 0.0;
+            boundary(2).surface  = bdy::z_plane;
+            boundary(2).value    = 0.0;
             boundary(2).hydro_bc = bdy::reflected;
-            
-            
+
             // Tag X plane
-            boundary(3).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(3).value = 1.2;
+            boundary(3).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(3).value    = 1.2;
             boundary(3).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(4).surface = bdy::y_plane;
-            boundary(4).value = 1.2;
+            boundary(4).surface  = bdy::y_plane;
+            boundary(4).value    = 1.2;
             boundary(4).hydro_bc = bdy::reflected;
-            
+
             // Tag Z plane
-            boundary(5).surface = bdy::z_plane;
-            boundary(5).value = 1.2;
+            boundary(5).surface  = bdy::z_plane;
+            boundary(5).value    = 1.2;
             boundary(5).hydro_bc = bdy::reflected;
-            
         });  // end RUN
-
     } // end if Sedov
-    
+
     // 2D RZ Sedov blast wave test case
-    if (test_problem == test::SedovRZ){
+    if (test_problem == test::SedovRZ)
+    {
         time_final = 1.0;  // 1.0 for Sedov
         RUN({
             // gamma law model
@@ -214,189 +201,173 @@ void input(CArrayKokkos <material_t> &material,
             // statev(3) = ref temperature
             // statev(4) = ref density
             // statev(5) = ref specific internal energy
-            
+
             material(0).eos_model = ideal_gas; // EOS model is required
-            
+
             material(0).strength_type  = model::none;
             material(0).strength_model = NULL;  // not needed, but illistrates the syntax
-            
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat
+
             // global initial conditions
             mat_fill(0).volume = region::global; // fill everywhere
             mat_fill(0).mat_id = 0;              // material id
-            mat_fill(0).den = 1.0;               // intial density
-            mat_fill(0).sie = 1.e-10;            // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;            // intial density
+            mat_fill(0).sie    = 1.e-10;         // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::cartesian;
             mat_fill(0).u = 0.0;   // initial x-dir velocity
             mat_fill(0).v = 0.0;   // initial y-dir velocity
             mat_fill(0).w = 0.0;   // initial z-dir velocity
-            
+
             // energy source initial conditions
-            mat_fill(1).volume = region::sphere; // fill a sphere
-            mat_fill(1).mat_id = 0;              // material id
+            mat_fill(1).volume  = region::sphere; // fill a sphere
+            mat_fill(1).mat_id  = 0;             // material id
             mat_fill(1).radius1 = 0.0;           // inner radius of fill region
-            mat_fill(1).radius2 = 1.2/8.0;       // outer radius of fill region
-            mat_fill(1).den = 1.0;               // initial density
-            double vol = PI*( pow((mat_fill(1).radius2),3)
-                            - pow((mat_fill(1).radius1),3) );
-            //vol = 4./3.* PI * ( pow((mat_fill(1).radius2),3) - pow((mat_fill(1).radius1),3) )/2.0;
-            mat_fill(1).sie = (0.5*0.49339/vol);
-            
+            mat_fill(1).radius2 = 1.2 / 8.0;       // outer radius of fill region
+            mat_fill(1).den     = 1.0;           // initial density
+            double vol = PI * (pow((mat_fill(1).radius2), 3)
+                               - pow((mat_fill(1).radius1), 3) );
+            // vol = 4./3.* PI * ( pow((mat_fill(1).radius2),3) - pow((mat_fill(1).radius1),3) )/2.0;
+            mat_fill(1).sie = (0.5 * 0.49339 / vol);
+
             mat_fill(1).velocity = init_conds::cartesian;
             mat_fill(1).u = 0.0;   // initial x-dir velocity
             mat_fill(1).v = 0.0;   // initial y-dir velocity
             mat_fill(1).w = 0.0;   // initial z-dir velocity
 
-
-
             // ---- boundary conditions ---- //
-            
+
             // Tag X plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
-            
+
             // Tag X plane
-            //boundary(2).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            //boundary(2).value = 1.2;
-            //boundary(2).hydro_bc = bdy::reflected;
+            // boundary(2).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
+            // boundary(2).value = 1.2;
+            // boundary(2).hydro_bc = bdy::reflected;
             //
             //// Tag Y plane
-            //boundary(3).surface = bdy::y_plane;
-            //boundary(3).value = 1.2;
-            //boundary(3).hydro_bc = bdy::reflected;
-            
-            
+            // boundary(3).surface = bdy::y_plane;
+            // boundary(3).value = 1.2;
+            // boundary(3).hydro_bc = bdy::reflected;
+
             // Tag inner cylinder
-            boundary(2).surface = bdy::cylinder;
-            boundary(2).value = 0.01;
+            boundary(2).surface  = bdy::cylinder;
+            boundary(2).value    = 0.01;
             boundary(2).hydro_bc = bdy::fixed;
-            
         });  // end RUN
-
     } // end if Sedov
-    
-    
-    // Noh 3D
-    if (test_problem == test::Noh3D){
 
+    // Noh 3D
+    if (test_problem == test::Noh3D)
+    {
         time_final = 0.6;
-        
+
         RUN({
-            
             material(0).eos_model = ideal_gas; // EOS model
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 6;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat c_v
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat c_v
+
             // Global instructions
             mat_fill(0).volume = region::global;   // fill everywhere
             mat_fill(0).mat_id = 0;                // material id
-            mat_fill(0).den = 1.0;                   // intial density
-            mat_fill(0).sie = 1e-9;             // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;                // intial density
+            mat_fill(0).sie    = 1e-9;          // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::spherical;
-            mat_fill(0).speed = -1.0;
-            
+            mat_fill(0).speed    = -1.0;
+
             // ---- boundary conditions ---- //
-            
+
             // Tag X plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
+
             // Tag Z plane
-            boundary(2).surface = bdy::z_plane;
-            boundary(2).value = 0.0;
+            boundary(2).surface  = bdy::z_plane;
+            boundary(2).value    = 0.0;
             boundary(2).hydro_bc = bdy::reflected;
-            
         });  // end RUN
-            
     } // end if Noh
-    
-    
-    // Noh 2D
-    if (test_problem == test::NohRZ){
 
+    // Noh 2D
+    if (test_problem == test::NohRZ)
+    {
         time_final = 0.6;
-        
+
         RUN({
-            
             material(0).eos_model = ideal_gas; // EOS model
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat c_v
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat c_v
+
             // Global instructions
             mat_fill(0).volume = region::global;   // fill everywhere
             mat_fill(0).mat_id = 0;                // material id
-            mat_fill(0).den = 1.0;                   // intial density
-            mat_fill(0).sie = 1e-9;             // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;                // intial density
+            mat_fill(0).sie    = 1e-9;          // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::radial;
-            mat_fill(0).speed = -1.0;
-            
+            mat_fill(0).speed    = -1.0;
+
             // ---- boundary conditions ---- //
-            
+
             // Tag X plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(2).surface = bdy::cylinder;
-            boundary(2).value = 0.01;
+            boundary(2).surface  = bdy::cylinder;
+            boundary(2).value    = 0.01;
             boundary(2).hydro_bc = bdy::fixed;
-            
-            
         });  // end RUN
-            
     } // end if Noh
-    
-    
+
     // Sod in Z direction in RZ coordinates (eq. to x-dir)
-    if (test_problem == test::SodZ){
-        
+    if (test_problem == test::SodZ)
+    {
         time_final = 0.2;  // 1.0 for Sedov
-        
+
         RUN({
             // gamma law model
             // statev(0) = gamma
@@ -405,86 +376,79 @@ void input(CArrayKokkos <material_t> &material,
             // statev(3) = ref temperature
             // statev(4) = ref density
             // statev(5) = ref specific internal energy
-            
+
             material(0).eos_model = ideal_gas; // EOS model is required
-            
-            material(0).strength_type = model::none;
+
+            material(0).strength_type  = model::none;
             material(0).strength_model = NULL;  // not needed, but illistrates the syntax
-            
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 1.4; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat
-            
+            state_vars(0, 0) = 1.4; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat
+
             // global initial conditions
             mat_fill(0).volume = region::global; // fill everywhere
             mat_fill(0).mat_id = 0;              // material id
-            mat_fill(0).den = 1.0;               // intial density
-            mat_fill(0).sie = 2.5;            // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;            // intial density
+            mat_fill(0).sie    = 2.5;         // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::cartesian;
             mat_fill(0).u = 0.0;   // initial x-dir velocity
             mat_fill(0).v = 0.0;   // initial y-dir velocity
             mat_fill(0).w = 0.0;   // initial z-dir velocity
-            
+
             // energy source initial conditions
             mat_fill(1).volume = region::box;    // fill a box
             mat_fill(1).mat_id = 0;              // material id
-            mat_fill(1).x1 = 0.5;           //
-            mat_fill(1).x2 = 1.0;           //
-            mat_fill(1).y1 = 0.0;           //
-            mat_fill(1).y2 = 1.0;           //
-            mat_fill(1).z1 = 0.0;           //
-            mat_fill(1).z2 = 1.0;           //
-            mat_fill(1).den = 0.125;        // initial density
-            mat_fill(1).sie = 2.5;          // initial specific internal energy
-            
-            
+            mat_fill(1).x1     = 0.5;       //
+            mat_fill(1).x2     = 1.0;       //
+            mat_fill(1).y1     = 0.0;       //
+            mat_fill(1).y2     = 1.0;       //
+            mat_fill(1).z1     = 0.0;       //
+            mat_fill(1).z2     = 1.0;       //
+            mat_fill(1).den    = 0.125;     // initial density
+            mat_fill(1).sie    = 2.5;       // initial specific internal energy
+
             mat_fill(1).velocity = init_conds::cartesian;
             mat_fill(1).u = 0.0;   // initial x-dir velocity
             mat_fill(1).v = 0.0;   // initial y-dir velocity
             mat_fill(1).w = 0.0;   // initial z-dir velocity
 
-
-
             // ---- boundary conditions ---- //
-            
+
             // Tag X plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
+
             // Tag Y plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
-            
+
             // Tag X plane
-            boundary(2).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(2).value = 1.0;
+            boundary(2).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(2).value    = 1.0;
             boundary(2).hydro_bc = bdy::reflected;
 
             // Tag Y plane
-            boundary(3).surface = bdy::y_plane;
-            boundary(3).value = 0.1;
+            boundary(3).surface  = bdy::y_plane;
+            boundary(3).value    = 0.1;
             boundary(3).hydro_bc = bdy::reflected;
-            
         });  // end RUN
-
     } // end if SodZ
-    
-    
+
     // Triple point
-    if (test_problem == test::TriplePoint){
-        
-        time_final = 4.0; 
-        
+    if (test_problem == test::TriplePoint)
+    {
+        time_final = 4.0;
+
         RUN({
             // gamma law model
             // statev(0) = gamma
@@ -493,166 +457,153 @@ void input(CArrayKokkos <material_t> &material,
             // statev(3) = ref temperature
             // statev(4) = ref density
             // statev(5) = ref specific internal energy
-            
+
             material(0).eos_model = ideal_gas; // EOS model is required
-            
-            material(0).strength_type = model::none;
+
+            material(0).strength_type  = model::none;
             material(0).strength_model = NULL;  // not needed, but illistrates the syntax
-            
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat
+
             // global initial conditions
             mat_fill(0).volume = region::global; // fill everywhere
             mat_fill(0).mat_id = 0;              // material id
-            mat_fill(0).den = 1.0;               // intial density
-            mat_fill(0).sie = 2.5;            // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;            // intial density
+            mat_fill(0).sie    = 2.5;         // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::cartesian;
             mat_fill(0).u = 0.0;   // initial x-dir velocity
             mat_fill(0).v = 0.0;   // initial y-dir velocity
             mat_fill(0).w = 0.0;   // initial z-dir velocity
-            
-            
+
             // initial conditions, region 1
             mat_fill(1).volume = region::box;    // fill a box
             mat_fill(1).mat_id = 0;              // material id
-            mat_fill(1).x1 = 1.0;           //
-            mat_fill(1).x2 = 7.0;           //
-            mat_fill(1).y1 = 0.0;           //
-            mat_fill(1).y2 = 1.5;           //
-            mat_fill(1).z1 = 0.0;           //
-            mat_fill(1).z2 = 1.0;           //
-            mat_fill(1).den = 1.0;          // initial density
-            mat_fill(1).sie = 0.25;         // initial specific internal energy
-            
+            mat_fill(1).x1     = 1.0;       //
+            mat_fill(1).x2     = 7.0;       //
+            mat_fill(1).y1     = 0.0;       //
+            mat_fill(1).y2     = 1.5;       //
+            mat_fill(1).z1     = 0.0;       //
+            mat_fill(1).z2     = 1.0;       //
+            mat_fill(1).den    = 1.0;       // initial density
+            mat_fill(1).sie    = 0.25;      // initial specific internal energy
+
             mat_fill(1).velocity = init_conds::cartesian;
             mat_fill(1).u = 0.0;   // initial x-dir velocity
             mat_fill(1).v = 0.0;   // initial y-dir velocity
             mat_fill(1).w = 0.0;   // initial z-dir velocity
-            
+
             // initial conditions, region 2
             mat_fill(2).volume = region::box;    // fill a box
             mat_fill(2).mat_id = 0;              // material id
-            mat_fill(2).x1 = 1.0;           //
-            mat_fill(2).x2 = 7.0;           //
-            mat_fill(2).y1 = 1.5;           //
-            mat_fill(2).y2 = 3.0;           //
-            mat_fill(2).z1 = 0.0;           //
-            mat_fill(2).z2 = 1.0;           //
-            mat_fill(2).den = 0.1;        // initial density
-            mat_fill(2).sie = 2.5;          // initial specific internal energy
-            
+            mat_fill(2).x1     = 1.0;       //
+            mat_fill(2).x2     = 7.0;       //
+            mat_fill(2).y1     = 1.5;       //
+            mat_fill(2).y2     = 3.0;       //
+            mat_fill(2).z1     = 0.0;       //
+            mat_fill(2).z2     = 1.0;       //
+            mat_fill(2).den    = 0.1;     // initial density
+            mat_fill(2).sie    = 2.5;       // initial specific internal energy
+
             mat_fill(2).velocity = init_conds::cartesian;
             mat_fill(2).u = 0.0;   // initial x-dir velocity
             mat_fill(2).v = 0.0;   // initial y-dir velocity
             mat_fill(2).w = 0.0;   // initial z-dir velocity
 
-
-
             // ---- boundary conditions ---- //
-            
+
             // Tag X = 0 plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
+
             // Tag Y = 0 plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
+
             // Tag Z = 0 plane
-            boundary(2).surface = bdy::z_plane;
-            boundary(2).value = 0.0;
+            boundary(2).surface  = bdy::z_plane;
+            boundary(2).value    = 0.0;
             boundary(2).hydro_bc = bdy::reflected;
 
-
             // Tag X = 7 plane
-            boundary(3).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(3).value = 6.0;  // some meshes are 7 and others are 6
+            boundary(3).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(3).value    = 6.0; // some meshes are 7 and others are 6
             boundary(3).hydro_bc = bdy::reflected;
-            
+
             // Tag Y = 3 plane
-            boundary(4).surface = bdy::y_plane;
-            boundary(4).value = 3.0;
+            boundary(4).surface  = bdy::y_plane;
+            boundary(4).value    = 3.0;
             boundary(4).hydro_bc = bdy::reflected;
-            
+
             // Tag Z = 1 plane
-            boundary(5).surface = bdy::z_plane;
-            boundary(5).value = 1.0;
+            boundary(5).surface  = bdy::z_plane;
+            boundary(5).value    = 1.0;
             boundary(5).hydro_bc = bdy::reflected;
-            
         });  // end RUN
-
     } // end if SodZ
-    
-    
-    // Taylor Anvil
-    if (test_problem == test::TaylorAnvil){
 
+    // Taylor Anvil
+    if (test_problem == test::TaylorAnvil)
+    {
         time_final = 25.0;
-        
+
         RUN({
-            
             material(0).eos_model = ideal_gas; // EOS model
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat c_v
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat c_v
+
             // Global instructions
             mat_fill(0).volume = region::global;   // fill everywhere
             mat_fill(0).mat_id = 0;                // material id
-            mat_fill(0).den = 1.0;                   // intial density
-            mat_fill(0).sie = 1e-9;             // intial specific internal energy
-            
+            mat_fill(0).den    = 1.0;                // intial density
+            mat_fill(0).sie    = 1e-9;          // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::cartesian;
             mat_fill(0).u = 0.0;
             mat_fill(0).v = 0.0;
             mat_fill(0).w = -1.0;
-            
+
             // ---- boundary conditions ---- //
-            
+
             // Tag X plane
-            boundary(0).surface = bdy::x_plane; // planes, cylinder, spheres, or a files
-            boundary(0).value = 0.0;
+            boundary(0).surface  = bdy::x_plane; // planes, cylinder, spheres, or a files
+            boundary(0).value    = 0.0;
             boundary(0).hydro_bc = bdy::reflected;
-            
-            
+
             // Tag Y plane
-            boundary(1).surface = bdy::y_plane;
-            boundary(1).value = 0.0;
+            boundary(1).surface  = bdy::y_plane;
+            boundary(1).value    = 0.0;
             boundary(1).hydro_bc = bdy::reflected;
-            
+
             // Tag Z plane
-            boundary(2).surface = bdy::z_plane;
-            boundary(2).value = 0.0;
+            boundary(2).surface  = bdy::z_plane;
+            boundary(2).value    = 0.0;
             boundary(2).hydro_bc = bdy::reflected;
-            
-            
         });  // end RUN
-        
     } // end if Taylor Anvil
-    
-    
+
     // a voxel box mesh
-    if (test_problem == test::box){
+    if (test_problem == test::box)
+    {
         time_final = 10.0;  // 1.0 for Sedov
-        
-        
+
         RUN({
             // gamma law model
             // statev(0) = gamma
@@ -661,43 +612,40 @@ void input(CArrayKokkos <material_t> &material,
             // statev(3) = ref temperature
             // statev(4) = ref density
             // statev(5) = ref specific internal energy
-            
+
             material(0).eos_model = ideal_gas; // EOS model is required
-            
-            material(0).strength_type = model::none;
+
+            material(0).strength_type  = model::none;
             material(0).strength_setup = model_init::input; // not need, the input is the default
             material(0).strength_model = NULL;  // not needed, but illistrates the syntax
-            
-            material(0).q1        = 1.0;       // accoustic coefficient
-            material(0).q2        = 1.3333;    // linear slope of UsUp for Riemann solver
-            material(0).q1ex      = 1.0;       // accoustic coefficient in expansion
-            material(0).q2ex      = 0.0;       // linear slope of UsUp in expansion
-            
+
+            material(0).q1   = 1.0;            // accoustic coefficient
+            material(0).q2   = 1.3333;         // linear slope of UsUp for Riemann solver
+            material(0).q1ex = 1.0;            // accoustic coefficient in expansion
+            material(0).q2ex = 0.0;            // linear slope of UsUp in expansion
+
             material(0).num_state_vars = 3;  // actual num_state_vars
-            state_vars(0,0) = 5.0/3.0; // gamma value
-            state_vars(0,1) = 1.0E-14; // minimum sound speed
-            state_vars(0,2) = 1.0;     // specific heat
-            
-            
-            
-            
+            state_vars(0, 0) = 5.0 / 3.0; // gamma value
+            state_vars(0, 1) = 1.0E-14; // minimum sound speed
+            state_vars(0, 2) = 1.0;     // specific heat
+
             // global initial conditions
             mat_fill(0).volume = region::global; // fill everywhere
             mat_fill(0).mat_id = 0;              // material id
-            mat_fill(0).den = 0.5;               // intial density
-            mat_fill(0).sie = 1.e-10;            // intial specific internal energy
-            
+            mat_fill(0).den    = 0.5;            // intial density
+            mat_fill(0).sie    = 1.e-10;         // intial specific internal energy
+
             mat_fill(0).velocity = init_conds::cartesian;
             mat_fill(0).u = 0.0;   // initial x-dir velocity
             mat_fill(0).v = 0.0;   // initial y-dir velocity
             mat_fill(0).w = 0.0;   // initial z-dir velocity
-            
+
             // read a voxel mesh and paint material onto the mesh
             mat_fill(1).volume = region::readVoxelFile; // fill in voxel structure
             mat_fill(1).mat_id = 0;              // material id
-            mat_fill(1).den = 2.7;               // intial density
-            mat_fill(1).sie = 1.e-10;            // intial specific internal energy
-            
+            mat_fill(1).den    = 2.7;            // intial density
+            mat_fill(1).sie    = 1.e-10;         // intial specific internal energy
+
             mat_fill(1).velocity = init_conds::cartesian;
             mat_fill(1).u = 0.0;   // initial x-dir velocity
             mat_fill(1).v = 0.0;   // initial y-dir velocity
@@ -705,30 +653,21 @@ void input(CArrayKokkos <material_t> &material,
 
             // ---- boundary conditions ---- //
 
-            
             // Tag Y plane
-            boundary(0).surface = bdy::y_plane;
-            boundary(0).value = -0.381;
+            boundary(0).surface  = bdy::y_plane;
+            boundary(0).value    = -0.381;
             boundary(0).hydro_bc = bdy::reflected;
-            
-            
-            
+
             // Tag top y plane
-            boundary(1).surface = bdy::y_plane; // planes, cylinder, spheres, or a files
-            boundary(1).value = 37.719;
-            boundary(1).hydro_bc = bdy::velocity;
-            boundary(1).hydro_bc_vel_0 = -1.0;  // velocity is downward
-            boundary(1).hydro_bc_vel_1 = 0.5;
+            boundary(1).surface              = bdy::y_plane; // planes, cylinder, spheres, or a files
+            boundary(1).value                = 37.719;
+            boundary(1).hydro_bc             = bdy::velocity;
+            boundary(1).hydro_bc_vel_0       = -1.0; // velocity is downward
+            boundary(1).hydro_bc_vel_1       = 0.5;
             boundary(1).hydro_bc_vel_t_start = 0;
-            boundary(1).hydro_bc_vel_t_end = 10000.0; // mu seconds
-            
+            boundary(1).hydro_bc_vel_t_end   = 10000.0; // mu seconds
         });  // end RUN
-
     } // end if box
-        
 
     return;
-
 } // end of input
-
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/mesh.h b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/mesh.h
index babcceb27..941a2a8c2 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/mesh.h
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/mesh.h
@@ -1,7 +1,9 @@
+// -------------------------------------------------------
+// Mesh data
+// --------------------------------------------------------
 #ifndef MESH_H
 #define MESH_H
 
-
 #include "matar.h"
 #include "state.h"
 
@@ -50,29 +52,29 @@ patch 6: [4,5,6,7]  zeta-plus  dir
 
 // sort in ascending order using bubble sort
 KOKKOS_INLINE_FUNCTION
-void bubble_sort(size_t arr[], const size_t num){
-    
-    for (size_t i=0; i<(num-1); i++){
-        for (size_t j=0; j<(num-i-1); j++){
-            
-            if (arr[j]>arr[j+1]){
+void bubble_sort(size_t arr[], const size_t num)
+{
+    for (size_t i = 0; i < (num - 1); i++)
+    {
+        for (size_t j = 0; j < (num - i - 1); j++)
+        {
+            if (arr[j] > arr[j + 1])
+            {
                 size_t temp = arr[j];
-                arr[j] = arr[j+1];
-                arr[j+1] = temp;
+                arr[j]     = arr[j + 1];
+                arr[j + 1] = temp;
             } // end if
-            
         } // end for j
     } // end for i
 } // end function
 
-
 // mesh sizes and connectivity data structures
-struct mesh_t {
-    
+struct mesh_t
+{
     size_t num_dims;
-    
+
     size_t num_nodes;
-    
+
     size_t num_elems;
     size_t num_nodes_in_elem;
     size_t num_patches_in_elem;
@@ -80,174 +82,160 @@ struct mesh_t {
     size_t num_patches_in_surf;  // high-order mesh class
 
     size_t num_corners;
-    
+
     size_t num_patches;
     size_t num_surfs;           // high_order mesh class
-    
+
     size_t num_bdy_patches;
     size_t num_bdy_nodes;
     size_t num_bdy_sets;
     size_t num_nodes_in_patch;
 
-    
     // ---- nodes ----
-    
+
     // corner ids in node
-    RaggedRightArrayKokkos <size_t> corners_in_node;
-    CArrayKokkos <size_t> num_corners_in_node;
-    
+    RaggedRightArrayKokkos<size_t> corners_in_node;
+    CArrayKokkos<size_t> num_corners_in_node;
+
     // elem ids in node
-    RaggedRightArrayKokkos <size_t> elems_in_node;
-    
+    RaggedRightArrayKokkos<size_t> elems_in_node;
+
     // node ids in node
-    RaggedRightArrayKokkos <size_t> nodes_in_node;
-    CArrayKokkos <size_t> num_nodes_in_node;
-    
-    
+    RaggedRightArrayKokkos<size_t> nodes_in_node;
+    CArrayKokkos<size_t> num_nodes_in_node;
+
     // ---- elems ----
-    
+
     // node ids in elem
-    DCArrayKokkos <size_t> nodes_in_elem;
-    
+    DCArrayKokkos<size_t> nodes_in_elem;
+
     // corner ids in elem
-    CArrayKokkos <size_t> corners_in_elem;
-    
+    CArrayKokkos<size_t> corners_in_elem;
+
     // elem ids in elem
-    RaggedRightArrayKokkos <size_t> elems_in_elem;
-    CArrayKokkos <size_t> num_elems_in_elem;
-    
+    RaggedRightArrayKokkos<size_t> elems_in_elem;
+    CArrayKokkos<size_t> num_elems_in_elem;
+
     // patch ids in elem
-    CArrayKokkos <size_t> patches_in_elem;
-    
+    CArrayKokkos<size_t> patches_in_elem;
+
     // surface ids in elem
-    CArrayKokkos <size_t> surfs_in_elem;  // high-order mesh class
-    
+    CArrayKokkos<size_t> surfs_in_elem;  // high-order mesh class
+
     // surface ids in elem
-    CArrayKokkos <size_t> zones_in_elem;     // high-order mesh class
-    
-    
-    
+    CArrayKokkos<size_t> zones_in_elem;     // high-order mesh class
+
     // ---- patches / surfaces ----
-    
+
     // node ids in a patch
-    CArrayKokkos <size_t> nodes_in_patch;
-    
+    CArrayKokkos<size_t> nodes_in_patch;
+
     // element ids in a patch
-    CArrayKokkos <size_t> elems_in_patch;
-    
+    CArrayKokkos<size_t> elems_in_patch;
+
     // the two element ids sharing a surface
-    CArrayKokkos <size_t> elems_in_surf;
-    
+    CArrayKokkos<size_t> elems_in_surf;
+
     // patch ids in a surface
-    CArrayKokkos <size_t> patches_in_surf;  // high-order mesh class
-    
-    CArrayKokkos <size_t> surf_in_patch;       // high-order mesh class
-    
-    
+    CArrayKokkos<size_t> patches_in_surf;  // high-order mesh class
+
+    CArrayKokkos<size_t> surf_in_patch;       // high-order mesh class
+
     // ---- bdy ----
-    
+
     // bdy_patches
-    CArrayKokkos <size_t> bdy_patches;
-    
+    CArrayKokkos<size_t> bdy_patches;
+
     // bdy nodes
-    CArrayKokkos <size_t> bdy_nodes;
-    
+    CArrayKokkos<size_t> bdy_nodes;
+
     // patch ids in bdy set
-    DynamicRaggedRightArrayKokkos <size_t> bdy_patches_in_set;
-    
+    DynamicRaggedRightArrayKokkos<size_t> bdy_patches_in_set;
+
     // node ids in bdy_patch set
-    RaggedRightArrayKokkos <size_t> bdy_nodes_in_set;
-    DCArrayKokkos <size_t> num_bdy_nodes_in_set;
-    
+    RaggedRightArrayKokkos<size_t> bdy_nodes_in_set;
+    DCArrayKokkos<size_t> num_bdy_nodes_in_set;
+
     // initialization methods
     void initialize_nodes(const size_t num_nodes_inp)
     {
         num_nodes = num_nodes_inp;
-        
+
         return;
-        
     }; // end method
-    
-    
+
     // initialization methods
     void initialize_elems(const size_t num_elems_inp, const size_t num_dims_inp)
     {
         num_dims = num_dims_inp;
         num_nodes_in_elem = 1;
-        for (int dim=0; dim<num_dims; dim++){
+        for (int dim = 0; dim < num_dims; dim++)
+        {
             num_nodes_in_elem *= 2;
         }
-        num_elems = num_elems_inp;
-        nodes_in_elem = DCArrayKokkos <size_t> (num_elems, num_nodes_in_elem);
-        corners_in_elem = CArrayKokkos <size_t> (num_elems, num_nodes_in_elem);
-        
+        num_elems       = num_elems_inp;
+        nodes_in_elem   = DCArrayKokkos<size_t>(num_elems, num_nodes_in_elem);
+        corners_in_elem = CArrayKokkos<size_t>(num_elems, num_nodes_in_elem);
+
         return;
-        
     }; // end method
-    
-    
+
     // initialization methods
     void initialize_corners(const size_t num_corners_inp)
     {
         num_corners = num_corners_inp;
-        
+
         return;
-        
     }; // end method
-    
-    
+
     // build the corner mesh connectivity arrays
-    void build_corner_connectivity(){
-        
-        num_corners_in_node = CArrayKokkos <size_t> (num_nodes); // stride sizes
-        
+    void build_corner_connectivity()
+    {
+        num_corners_in_node = CArrayKokkos<size_t>(num_nodes); // stride sizes
+
         // initializing the number of corners (node-cell pair) to be zero
         FOR_ALL_CLASS(node_gid, 0, num_nodes, {
             num_corners_in_node(node_gid) = 0;
         });
-        
-        
-        for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++){
-            FOR_ALL_CLASS(node_lid, 0, num_nodes_in_elem,{
-                
+
+        for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++)
+        {
+            FOR_ALL_CLASS(node_lid, 0, num_nodes_in_elem, {
                 // get the global_id of the node
                 size_t node_gid = nodes_in_elem(elem_gid, node_lid);
 
                 // increment the number of corners attached to this point
                 num_corners_in_node(node_gid) = num_corners_in_node(node_gid) + 1;
-                
             });  // end FOR_ALL over nodes in element
         } // end for elem_gid
-        
-        
+
         // the stride sizes are the num_corners_in_node at the node
-        corners_in_node = RaggedRightArrayKokkos <size_t> (num_corners_in_node);
+        corners_in_node = RaggedRightArrayKokkos<size_t>(num_corners_in_node);
 
-        CArrayKokkos <size_t> count_saved_corners_in_node(num_nodes);
+        CArrayKokkos<size_t> count_saved_corners_in_node(num_nodes);
 
         // reset num_corners to zero
         FOR_ALL_CLASS(node_gid, 0, num_nodes, {
             count_saved_corners_in_node(node_gid) = 0;
         });
-        
-        
+
         // he elems_in_elem data type
-        elems_in_node = RaggedRightArrayKokkos <size_t> (num_corners_in_node);
-        
+        elems_in_node = RaggedRightArrayKokkos<size_t>(num_corners_in_node);
+
         // populate the elems connected to a node list and corners in a node
-        for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++){
+        for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++)
+        {
             FOR_ALL_CLASS(node_lid, 0, num_nodes_in_elem, {
-                
                 // get the global_id of the node
                 size_t node_gid = nodes_in_elem(elem_gid, node_lid);
-                
+
                 // the column index is the num corners saved
                 size_t j = count_saved_corners_in_node(node_gid);
 
                 // Save corner index to this node_gid
-                size_t corner_gid = node_lid + elem_gid*num_nodes_in_elem;
+                size_t corner_gid = node_lid + elem_gid * num_nodes_in_elem;
                 corners_in_node(node_gid, j) = corner_gid;
-                
+
                 elems_in_node(node_gid, j) = elem_gid; // save the elem_gid
 
                 // Save corner index to element
@@ -256,136 +244,129 @@ struct mesh_t {
 
                 // increment the number of corners saved to this node_gid
                 count_saved_corners_in_node(node_gid) = count_saved_corners_in_node(node_gid) + 1;
-
             });  // end FOR_ALL over nodes in element
         } // end for elem_gid
-     
+
         return;
-        
     } // end of build_corner_connectivity
-    
-    
+
     // build elem connectivity arrays
-    void build_elem_elem_connectivity(){
-        
+    void build_elem_elem_connectivity()
+    {
         // find the max number of elems around a node
         size_t max_num_elems_in_node;
         size_t max_num_lcl;
         REDUCE_MAX_CLASS(node_gid, 0, num_nodes, max_num_lcl, {
-            
             // num_corners_in_node = num_elems_in_node
             size_t max_num = num_corners_in_node(node_gid);
-            
-            if (max_num > max_num_lcl) max_num_lcl = max_num;
-                    
+
+            if (max_num > max_num_lcl)
+            {
+                max_num_lcl = max_num;
+            }
         }, max_num_elems_in_node); // end parallel reduction on max
         Kokkos::fence();
-        
+
         // a temporary ragged array to save the elems around an elem
-        DynamicRaggedRightArrayKokkos <size_t> temp_elems_in_elem(num_nodes, num_nodes_in_elem*max_num_elems_in_node);
-        
-        num_elems_in_elem = CArrayKokkos <size_t> (num_elems);
+        DynamicRaggedRightArrayKokkos<size_t> temp_elems_in_elem(num_nodes, num_nodes_in_elem * max_num_elems_in_node);
+
+        num_elems_in_elem = CArrayKokkos<size_t>(num_elems);
         FOR_ALL_CLASS(elem_gid, 0, num_elems, {
             num_elems_in_elem(elem_gid) = 0;
         });
         Kokkos::fence();
-        
+
         // find and save neighboring elem_gids of an elem
         FOR_ALL_CLASS(elem_gid, 0, num_elems, {
-            for (int node_lid=0; node_lid<num_nodes_in_elem; node_lid++){
-                
+            for (int node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+            {
                 // get the gid for the node
                 size_t node_id = nodes_in_elem(elem_gid, node_lid);
-                
+
                 // loop over all elems connected to node_gid
-                for (int elem_lid = 0; elem_lid < num_corners_in_node(node_id); elem_lid++){
-                    
+                for (int elem_lid = 0; elem_lid < num_corners_in_node(node_id); elem_lid++)
+                {
                     // get the global id for the neighboring elem
                     size_t neighbor_elem_gid = elems_in_node(node_id, elem_lid);
-                    
+
                     // a flag to save (=1) or not (=0)
                     size_t save = 1;
-                    
+
                     // a true neighbor_elem_id is not equal to elem_gid
-                    if (neighbor_elem_gid == elem_gid ){
+                    if (neighbor_elem_gid == elem_gid)
+                    {
                         save = 0;  // don't save
                     } // end if
-                    
+
                     // check to see if the neighbor_elem_gid has been saved already
                     size_t num_saved = temp_elems_in_elem.stride(elem_gid);
-                    for (size_t i=0; i<num_saved; i++){
-                        
-                        if (neighbor_elem_gid == temp_elems_in_elem(elem_gid,i)){
-                            save=0;   // don't save, it has been saved already
+                    for (size_t i = 0; i < num_saved; i++)
+                    {
+                        if (neighbor_elem_gid == temp_elems_in_elem(elem_gid, i))
+                        {
+                            save = 0;   // don't save, it has been saved already
                         } // end if
-                        
                     } // end for i
-                    
-                    if (save==1){
-                        
+
+                    if (save == 1)
+                    {
                         // increment the number of neighboring elements saved
                         temp_elems_in_elem.stride(elem_gid)++;
-                        
+
                         // save the neighboring elem_gid
                         temp_elems_in_elem(elem_gid, num_saved) = neighbor_elem_gid;
-                        
-                        
                     } // end if save
-                    
                 } // end for elem_lid in a node
-
             }  // end for node_lid in an elem
-            
+
             // save the actial stride size
             num_elems_in_elem(elem_gid) = temp_elems_in_elem.stride(elem_gid);
-            
         }); // end FOR_ALL elems
         Kokkos::fence();
-        
-        
+
         // compress out the extra space in the temp_elems_in_elem
-        elems_in_elem = RaggedRightArrayKokkos <size_t> (num_elems_in_elem);
-        
+        elems_in_elem = RaggedRightArrayKokkos<size_t>(num_elems_in_elem);
+
         FOR_ALL_CLASS(elem_gid, 0, num_elems, {
-            for (size_t i=0; i<num_elems_in_elem(elem_gid); i++){
+            for (size_t i = 0; i < num_elems_in_elem(elem_gid); i++)
+            {
                 elems_in_elem(elem_gid, i) = temp_elems_in_elem(elem_gid, i);
             } // end for i
         });  // end FOR_ALL elems
         Kokkos::fence();
-      
+
         return;
-        
     } // end of build_elem_elem_connectivity
-        
-    
+
     // build the patches
-    void build_patch_connectivity(){
-        
+    void build_patch_connectivity()
+    {
         size_t high_order = 0;
-        
+
         // building patches
-        DViewCArrayKokkos <size_t> node_ordering_in_elem;  // node lids in a patch
-        
-        num_nodes_in_patch = 2*(num_dims-1);  // 2 (2D) or 4 (3D)
-        num_patches_in_elem = 2*num_dims; // 4 (2D) or 6 (3D)
-        
-        
+        DViewCArrayKokkos<size_t> node_ordering_in_elem;  // node lids in a patch
+
+        num_nodes_in_patch  = 2 * (num_dims - 1); // 2 (2D) or 4 (3D)
+        num_patches_in_elem = 2 * num_dims; // 4 (2D) or 6 (3D)
+
         size_t node_lids_in_patch_in_elem[24];
-        
-        if(num_dims == 3) {
-            size_t temp_node_lids[24] = {0,4,7,3,
-                                         1,2,6,5,
-                                         0,1,5,4,
-                                         2,3,7,6,
-                                         0,3,2,1,
-                                         4,5,6,7};
-            
-            for (size_t i=0; i<24; i++){
+
+        if (num_dims == 3)
+        {
+            size_t temp_node_lids[24] = { 0, 4, 7, 3,
+                                          1, 2, 6, 5,
+                                          0, 1, 5, 4,
+                                          2, 3, 7, 6,
+                                          0, 3, 2, 1,
+                                          4, 5, 6, 7 };
+
+            for (size_t i = 0; i < 24; i++)
+            {
                 node_lids_in_patch_in_elem[i] = temp_node_lids[i];
             } // end for i
-
         }
-        else {
+        else
+        {
             //   J
             //   |
             // 3---2
@@ -393,582 +374,515 @@ struct mesh_t {
             // 0---1
             //
             size_t temp_node_lids[8] =
-               {0,3,
-                1,2,
-                0,1,
-                3,2};
-            
-            for (size_t i=0; i<8; i++){
+            { 0, 3,
+              1, 2,
+              0, 1,
+              3, 2 };
+
+            for (size_t i = 0; i < 8; i++)
+            {
                 node_lids_in_patch_in_elem[i] = temp_node_lids[i];
             } // end for i
-
         } // end if on dims
-        
-        node_ordering_in_elem = DViewCArrayKokkos <size_t> (&node_lids_in_patch_in_elem[0],num_patches_in_elem,num_nodes_in_patch);
-        
-        
+
+        node_ordering_in_elem = DViewCArrayKokkos<size_t>(&node_lids_in_patch_in_elem[0], num_patches_in_elem, num_nodes_in_patch);
+
         // for saviong the hash keys of the patches and then the nighboring elem_gid
-        CArrayKokkos <int> hash_keys_in_elem (num_elems, num_patches_in_elem, num_nodes_in_patch); // always 4 ids in 3D
-        
-        
-        
+        CArrayKokkos<int> hash_keys_in_elem(num_elems, num_patches_in_elem, num_nodes_in_patch); // always 4 ids in 3D
+
         // for saving the adjacient patch_lid, which is the slide_lid
-        //CArrayKokkos <size_t> neighboring_side_lids (num_elems, num_patches_in_elem);
-        
+        // CArrayKokkos <size_t> neighboring_side_lids (num_elems, num_patches_in_elem);
 
-        
         // allocate memory for the patches in the elem
-        patches_in_elem = CArrayKokkos <size_t> (num_elems, num_patches_in_elem);
-        
-        
+        patches_in_elem = CArrayKokkos<size_t>(num_elems, num_patches_in_elem);
+
         // a temporary storaage for the patch_gids that are on the mesh boundary
-        CArrayKokkos <size_t> temp_bdy_patches(num_elems*num_patches_in_elem);
-        
+        CArrayKokkos<size_t> temp_bdy_patches(num_elems * num_patches_in_elem);
+
         // step 1) calculate the hash values for each patch in the element
         FOR_ALL_CLASS(elem_gid, 0, num_elems, {
-            
-            for (size_t patch_lid = 0; patch_lid<num_patches_in_elem; patch_lid++) {
-                
+            for (size_t patch_lid = 0; patch_lid < num_patches_in_elem; patch_lid++)
+            {
                 size_t sorted_patch_nodes[4];  // note: cannot be allocated with num_nodes_in_patch
-                
+
                 // first save the patch nodes
-                for (size_t patch_node_lid = 0; patch_node_lid<num_nodes_in_patch; patch_node_lid++){
-                    
+                for (size_t patch_node_lid = 0; patch_node_lid < num_nodes_in_patch; patch_node_lid++)
+                {
                     // get the local node index of the element for this patch and node in patch
                     size_t node_lid = node_ordering_in_elem(patch_lid, patch_node_lid);
-                    
+
                     // get and save the global index of the node
                     sorted_patch_nodes[patch_node_lid] = nodes_in_elem(elem_gid, node_lid);
-                    
                 }  // end for node_lid
-                
+
                 // sort nodes from smallest to largest
                 bubble_sort(sorted_patch_nodes, num_nodes_in_patch);
-                
+
                 // save hash_keys in the this elem
-                for (size_t key_lid=0; key_lid<num_nodes_in_patch; key_lid++){
+                for (size_t key_lid = 0; key_lid < num_nodes_in_patch; key_lid++)
+                {
                     hash_keys_in_elem(elem_gid, patch_lid, key_lid) = sorted_patch_nodes[key_lid];  // 4 node values are keys
                 } // for
-                
             } // end for patch_lid
-            
         }); // end FOR_ALL elem_gid
-        
-        DCArrayKokkos <size_t> num_values(2);
-        
-    // 8x8x8 mesh
-    // num_patches = 8*8*9*3 = 1728
-	// bdy_patches = 8*8*6 = 384
-    //
-        
+
+        DCArrayKokkos<size_t> num_values(2);
+
+        // 8x8x8 mesh
+        // num_patches = 8*8*9*3 = 1728
+        // bdy_patches = 8*8*6 = 384
+        //
+
         // step 2: walk around the elements and save the elem pairs that have the same hash_key
         RUN_CLASS({
             // serial execution on GPU
-            
-            size_t patch_gid = 0;
+
+            size_t patch_gid     = 0;
             size_t bdy_patch_gid = 0;
-            
-            for (size_t elem_gid = 0; elem_gid<num_elems; elem_gid++) {
-                
+
+            for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++)
+            {
                 // loop over the patches in this elem
-                for (size_t patch_lid = 0; patch_lid<num_patches_in_elem; patch_lid++) {
-                    
+                for (size_t patch_lid = 0; patch_lid < num_patches_in_elem; patch_lid++)
+                {
                     size_t exit = 0;
-                    
+
                     // negative values mean the patch has not been saved
-                    if (hash_keys_in_elem(elem_gid, patch_lid, 0)>=0){
-                        
+                    if (hash_keys_in_elem(elem_gid, patch_lid, 0) >= 0)
+                    {
                         // find the nighboring patch with the same hash_key
-                        
-                        for (size_t neighbor_elem_lid=0; neighbor_elem_lid<num_elems_in_elem(elem_gid); neighbor_elem_lid++){
-                             
-                            
+
+                        for (size_t neighbor_elem_lid = 0; neighbor_elem_lid < num_elems_in_elem(elem_gid); neighbor_elem_lid++)
+                        {
                             // get the neighboring element global index
                             size_t neighbor_elem_gid = elems_in_elem(elem_gid, neighbor_elem_lid);
-                            
-                            for (size_t neighbor_patch_lid=0; neighbor_patch_lid<num_patches_in_elem; neighbor_patch_lid++){
-                                
+
+                            for (size_t neighbor_patch_lid = 0; neighbor_patch_lid < num_patches_in_elem; neighbor_patch_lid++)
+                            {
                                 size_t save_it = 0;
-                                for (size_t key_lid=0; key_lid<num_nodes_in_patch; key_lid++){
-                                    
-                                    if (hash_keys_in_elem(neighbor_elem_gid, neighbor_patch_lid, key_lid) == hash_keys_in_elem(elem_gid, patch_lid, key_lid)){
-                                        save_it ++; // if save_it == num_nodes after this loop, then it is a match
+                                for (size_t key_lid = 0; key_lid < num_nodes_in_patch; key_lid++)
+                                {
+                                    if (hash_keys_in_elem(neighbor_elem_gid, neighbor_patch_lid, key_lid) == hash_keys_in_elem(elem_gid, patch_lid, key_lid))
+                                    {
+                                        save_it++; // if save_it == num_nodes after this loop, then it is a match
                                     }
-                                    
                                 } // end key loop
-                                
-                                
-                                
+
                                 // this hash is from the nodes on the patch
-                                if (save_it == num_nodes_in_patch){
-                    
-                                    
+                                if (save_it == num_nodes_in_patch)
+                                {
                                     // make it negative, because we saved it
                                     hash_keys_in_elem(elem_gid, patch_lid, 0) = -1;
                                     hash_keys_in_elem(neighbor_elem_gid, neighbor_patch_lid, 0) = -1;
-                                    
+
                                     // save the patch_lids for the adjacient sides
-                                    //neighboring_side_lids(elem_gid, patch_lid) = neighbor_patch_lid;
-                                    //neighboring_side_lids(neighbor_elem_gid, neighbor_patch_lid) = patch_lid;
-                                    
+                                    // neighboring_side_lids(elem_gid, patch_lid) = neighbor_patch_lid;
+                                    // neighboring_side_lids(neighbor_elem_gid, neighbor_patch_lid) = patch_lid;
+
                                     // save the patch_gid
                                     patches_in_elem(elem_gid, patch_lid) = patch_gid;
                                     patches_in_elem(neighbor_elem_gid, neighbor_patch_lid) = patch_gid;
-                    
+
                                     patch_gid++;
-                                    
+
                                     exit = 1;
                                     break;
                                 } // end if
-                    
                             } // end for loop over a neighbors patch set
-                            
-                            if (exit==1) break;
-                            
+
+                            if (exit == 1)
+                            {
+                                break;
+                            }
                         } // end for loop over elem neighbors
-                    
-                    
                     } // end if hash<0
-    
                 } // end for patch_lid
-                
-                
+
                 // loop over the patches in this element again
                 // remaining postive hash key values are the boundary patches
-                for (size_t patch_lid=0; patch_lid<num_patches_in_elem; patch_lid++) {
-                
-                    if( hash_keys_in_elem(elem_gid, patch_lid, 0)>=0 ){
-                
+                for (size_t patch_lid = 0; patch_lid < num_patches_in_elem; patch_lid++)
+                {
+                    if (hash_keys_in_elem(elem_gid, patch_lid, 0) >= 0)
+                    {
                         hash_keys_in_elem(elem_gid, patch_lid, 0) = -1;  // make it negative, because we saved it
-                        
-                        //neighboring_side_lids(elem_gid, patch_lid) = patch_lid;
-                        
+
+                        // neighboring_side_lids(elem_gid, patch_lid) = patch_lid;
+
                         patches_in_elem(elem_gid, patch_lid) = patch_gid;
-                        temp_bdy_patches(bdy_patch_gid) = patch_gid;
-                        
+                        temp_bdy_patches(bdy_patch_gid)      = patch_gid;
+
                         patch_gid++;
-                        bdy_patch_gid++;  
-                
+                        bdy_patch_gid++;
                     } // end if
-		    
-                
                 }  // end for over patch_lid
-                
             }  // end for over elem_gid
-            
-	    
+
             // the num_values is because the values passed in are const, so a const pointer is needed
             num_values(0) = patch_gid;     // num_patches = patch_gid;
             num_values(1) = bdy_patch_gid; // num_bdy_patches = bdy_patch_gid;
-	    
         }); // end RUN
         Kokkos::fence();
-        
+
         num_values.update_host();
         Kokkos::fence();
-        
-        num_patches = num_values.host(0);
+
+        num_patches     = num_values.host(0);
         num_bdy_patches = num_values.host(1);
-	
-        
-        //size_t mesh_1D = 60;
-        //size_t exact_num_patches = (mesh_1D*mesh_1D)*(mesh_1D+1)*3;
-        //size_t exact_num_bdy_patches = (mesh_1D*mesh_1D)*6;
-        //printf("num_patches = %lu, exact = %lu \n", num_patches, exact_num_patches);
-        //printf("num_bdy_patches = %lu exact = %lu \n", num_bdy_patches, exact_num_bdy_patches);
+
+        // size_t mesh_1D = 60;
+        // size_t exact_num_patches = (mesh_1D*mesh_1D)*(mesh_1D+1)*3;
+        // size_t exact_num_bdy_patches = (mesh_1D*mesh_1D)*6;
+        // printf("num_patches = %lu, exact = %lu \n", num_patches, exact_num_patches);
+        // printf("num_bdy_patches = %lu exact = %lu \n", num_bdy_patches, exact_num_bdy_patches);
         printf("Num patches = %lu \n", num_patches);
         printf("Num boundary patches = %lu \n", num_bdy_patches);
-        
-        elems_in_patch = CArrayKokkos <size_t> (num_patches, 2);
-        nodes_in_patch = CArrayKokkos <size_t> (num_patches, num_nodes_in_patch);
-    
+
+        elems_in_patch = CArrayKokkos<size_t>(num_patches, 2);
+        nodes_in_patch = CArrayKokkos<size_t>(num_patches, num_nodes_in_patch);
+
         // a temporary variable to help populate patch structures
-        CArrayKokkos <size_t> num_elems_in_patch_saved (num_patches);
-        
+        CArrayKokkos<size_t> num_elems_in_patch_saved(num_patches);
+
         // initialize the number of elems in a patch saved to zero
         FOR_ALL_CLASS(patch_gid, 0, num_patches, {
             num_elems_in_patch_saved(patch_gid) = 0;
         });
-        
-        for(size_t elem_gid=0; elem_gid<num_elems; elem_gid++) {
-            
+
+        for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++)
+        {
             FOR_ALL_CLASS(patch_lid, 0, num_patches_in_elem, {
-                
                 size_t patch_gid = patches_in_elem(elem_gid, patch_lid);
-                
+
                 size_t num_saved = num_elems_in_patch_saved(patch_gid);
-                
+
                 elems_in_patch(patch_gid, num_saved) = elem_gid;
-                
+
                 // record that an elem_gid was saved
-                num_elems_in_patch_saved(patch_gid) ++;
-                
+                num_elems_in_patch_saved(patch_gid)++;
+
                 // save the nodes on this patch
-                for (size_t patch_node_lid = 0; patch_node_lid<num_nodes_in_patch; patch_node_lid++){
-                    
+                for (size_t patch_node_lid = 0; patch_node_lid < num_nodes_in_patch; patch_node_lid++)
+                {
                     // get the local node index of the element for this patch and node in patch
                     size_t node_lid = node_ordering_in_elem(patch_lid, patch_node_lid);
-                    
+
                     // get and save the global index of the node
                     nodes_in_patch(patch_gid, patch_node_lid) = nodes_in_elem(elem_gid, node_lid);
                 }  // end for node_lid
-                
             }); // end FOR_ALL patch_lid
-            
         } // end for
-        
-        
-        
+
         // Surfaces and patches in surface
-        if(high_order==1) {
-            
-            num_surfs_in_elem = 2*num_dims; // 4 (2D) or 6 (3D)
+        if (high_order == 1)
+        {
+            num_surfs_in_elem   = 2 * num_dims; // 4 (2D) or 6 (3D)
             num_patches_in_surf = 1;  // =Pn_order, must update for high-order mesh class
-        
+
             // allocate memory for the surfaces in the elem
-            surfs_in_elem = CArrayKokkos <size_t> (num_elems, num_surfs_in_elem);
+            surfs_in_elem = CArrayKokkos<size_t>(num_elems, num_surfs_in_elem);
 
-            
             // allocate memory for surface data structures
-            num_surfs = num_patches/num_patches_in_surf;
-            
-            patches_in_surf = CArrayKokkos <size_t> (num_surfs, num_patches_in_surf);
-            elems_in_surf = CArrayKokkos <size_t> (num_surfs, 2);
-            surf_in_patch = CArrayKokkos <size_t> (num_patches);
-            
+            num_surfs = num_patches / num_patches_in_surf;
+
+            patches_in_surf = CArrayKokkos<size_t>(num_surfs, num_patches_in_surf);
+            elems_in_surf   = CArrayKokkos<size_t>(num_surfs, 2);
+            surf_in_patch   = CArrayKokkos<size_t>(num_patches);
+
             FOR_ALL_CLASS(surf_gid, 0, num_surfs, {
-                
                 // loop over the patches in this surface
-                for(size_t patch_lid = 0; patch_lid < num_patches_in_surf; patch_lid++){
-                    
+                for (size_t patch_lid = 0; patch_lid < num_patches_in_surf; patch_lid++)
+                {
                     // get patch_gid
-                    size_t patch_gid = patch_lid + surf_gid*num_patches_in_surf;
-                    
+                    size_t patch_gid = patch_lid + surf_gid * num_patches_in_surf;
+
                     // save the patch_gids
                     patches_in_surf(surf_gid, patch_lid) = patch_gid;
-                    
+
                     // save the surface this patch belongs to
                     surf_in_patch(patch_gid) = surf_gid;
-                    
                 } // end for
-                
-                
+
                 // get first patch in the surface, and populate elem surface structures
-                size_t this_patch_gid = surf_gid*num_patches_in_surf;
-  
-                elems_in_surf(surf_gid,0) = elems_in_patch(this_patch_gid,0);  // elem_gid0
-                elems_in_surf(surf_gid,1) = elems_in_patch(this_patch_gid,1);  // elem_gid1
-                
+                size_t this_patch_gid = surf_gid * num_patches_in_surf;
+
+                elems_in_surf(surf_gid, 0) = elems_in_patch(this_patch_gid, 0);  // elem_gid0
+                elems_in_surf(surf_gid, 1) = elems_in_patch(this_patch_gid, 1);  // elem_gid1
             }); // end FOR_ALL over surfaces
-            
-            
+
             // save surfaces in elem
             FOR_ALL_CLASS(elem_gid, 0, num_elems, {
-                
-                for(size_t surf_lid = 0; surf_lid < num_surfs_in_elem; surf_lid++){
-                    
+                for (size_t surf_lid = 0; surf_lid < num_surfs_in_elem; surf_lid++)
+                {
                     // get the local patch_lid
-                    size_t patch_lid = surf_lid*num_patches_in_surf;
-                    
+                    size_t patch_lid = surf_lid * num_patches_in_surf;
+
                     // get the patch_gids in this element
                     size_t patch_gid = patches_in_elem(elem_gid, patch_lid);
-                    
-                    // save the surface gid
-                    surfs_in_elem(elem_gid,surf_lid) = surf_in_patch(patch_gid);
 
+                    // save the surface gid
+                    surfs_in_elem(elem_gid, surf_lid) = surf_in_patch(patch_gid);
                 } // end surf_lid
-                
             });
-            
-        
         } // end of high-order mesh objects
-                          
+
         // ----------------
-        
-        
+
         // allocate memory for boundary patches
-        bdy_patches = CArrayKokkos <size_t> (num_bdy_patches);
-        
+        bdy_patches = CArrayKokkos<size_t>(num_bdy_patches);
+
         FOR_ALL_CLASS(bdy_patch_gid, 0, num_bdy_patches, {
             bdy_patches(bdy_patch_gid) = temp_bdy_patches(bdy_patch_gid);
         }); // end FOR_ALL bdy_patch_gid
-        
-        
-        
-        
+
         // find and store the boundary nodes
-        CArrayKokkos <size_t> temp_bdy_nodes(num_nodes);
-        CArrayKokkos <long long int> hash_bdy_nodes(num_nodes);
-        
-        FOR_ALL_CLASS (node_gid, 0, num_nodes, {
+        CArrayKokkos<size_t>        temp_bdy_nodes(num_nodes);
+        CArrayKokkos<long long int> hash_bdy_nodes(num_nodes);
+
+        FOR_ALL_CLASS(node_gid, 0, num_nodes, {
             hash_bdy_nodes(node_gid) = -1;
         }); // end for node_gid
-        
+
         // Parallel loop over boundary patches
-        DCArrayKokkos <size_t> num_bdy_nodes_saved(1);
-        
+        DCArrayKokkos<size_t> num_bdy_nodes_saved(1);
+
         RUN_CLASS({
             num_bdy_nodes_saved(0) = 0;
-            for (size_t bdy_patch_gid=0; bdy_patch_gid<num_bdy_patches; bdy_patch_gid++) {
-                
+            for (size_t bdy_patch_gid = 0; bdy_patch_gid < num_bdy_patches; bdy_patch_gid++)
+            {
                 // get the global index of the patch that is on the boundary
                 size_t patch_gid = bdy_patches(bdy_patch_gid);
-                
+
                 // tag the boundary nodes
-                for (size_t node_lid=0; node_lid<num_nodes_in_patch; node_lid++){
-                    
+                for (size_t node_lid = 0; node_lid < num_nodes_in_patch; node_lid++)
+                {
                     size_t node_gid = nodes_in_patch(patch_gid, node_lid);
-                    
-                    if (hash_bdy_nodes(node_gid) < 0){
+
+                    if (hash_bdy_nodes(node_gid) < 0)
+                    {
                         hash_bdy_nodes(node_gid) = node_gid;
                         temp_bdy_nodes(num_bdy_nodes_saved(0)) = node_gid;
-                        
-                        //printf("bdy_node = %lu \n", node_gid);
+
+                        // printf("bdy_node = %lu \n", node_gid);
                         num_bdy_nodes_saved(0)++;
                     } // end if
-                    
                 } // end for node_lid
-                
             } // end for loop over bdy_patch_gid
-            
         });  // end RUN
         Kokkos::fence();
-        
+
         // copy value to host (CPU)
         num_bdy_nodes_saved.update_host();
         Kokkos::fence();
-        
+
         // save the number of bdy_nodes to mesh_t
         num_bdy_nodes = num_bdy_nodes_saved.host(0);
-        
-        bdy_nodes = CArrayKokkos <size_t> (num_bdy_nodes);
-        
-        FOR_ALL_CLASS (node_gid, 0, num_bdy_nodes, {
+
+        bdy_nodes = CArrayKokkos<size_t>(num_bdy_nodes);
+
+        FOR_ALL_CLASS(node_gid, 0, num_bdy_nodes, {
             bdy_nodes(node_gid) = temp_bdy_nodes(node_gid);
         }); // end for boundary node_gid
-        
+
         printf("Num boundary nodes = %lu \n", num_bdy_nodes);
-        
+
         return;
-        
     } // end patch connectivity method
-    
-    
-    
+
     // build the patches
-    void build_node_node_connectivity(){
-        
-        
+    void build_node_node_connectivity()
+    {
         // find the max number of elems around a node
         size_t max_num_elems_in_node;
         size_t max_num_lcl;
         REDUCE_MAX_CLASS(node_gid, 0, num_nodes, max_num_lcl, {
-            
             // num_corners_in_node = num_elems_in_node
             size_t max_num = num_corners_in_node(node_gid);
-            
-            if (max_num > max_num_lcl) max_num_lcl = max_num;
-                    
+
+            if (max_num > max_num_lcl)
+            {
+                max_num_lcl = max_num;
+            }
         }, max_num_elems_in_node); // end parallel reduction on max
         Kokkos::fence();
-        
+
         // each elem corner will contribute 3 edges to the node. Those edges will likely be the same
         // ones from an adjacent element so it is a safe estimate to multiply by 3
-        DynamicRaggedRightArrayKokkos <size_t> temp_nodes_in_nodes(num_nodes, max_num_elems_in_node*3);
-        
-        num_nodes_in_node = CArrayKokkos <size_t> (num_nodes);
-        
+        DynamicRaggedRightArrayKokkos<size_t> temp_nodes_in_nodes(num_nodes, max_num_elems_in_node * 3);
+
+        num_nodes_in_node = CArrayKokkos<size_t>(num_nodes);
+
         // walk over the patches and save the node node connectivity
         RUN_CLASS({
-            if (num_dims==3){
-                
-                for (size_t patch_gid=0; patch_gid<num_patches; patch_gid++){
-                    for (size_t node_lid=0; node_lid<num_nodes_in_patch; node_lid++){
-                        
+            if (num_dims == 3)
+            {
+                for (size_t patch_gid = 0; patch_gid < num_patches; patch_gid++)
+                {
+                    for (size_t node_lid = 0; node_lid < num_nodes_in_patch; node_lid++)
+                    {
                         // the first node on the edge
                         size_t node_gid_0 = nodes_in_patch(patch_gid, node_lid);
-                        
+
                         // second node on this edge
                         size_t node_gid_1;
-                        
-                        if (node_lid == num_nodes_in_patch-1){
+
+                        if (node_lid == num_nodes_in_patch - 1)
+                        {
                             node_gid_1 = nodes_in_patch(patch_gid, 0);
                         }
-                        else {
-                            node_gid_1 = nodes_in_patch(patch_gid, node_lid+1);
+                        else
+                        {
+                            node_gid_1 = nodes_in_patch(patch_gid, node_lid + 1);
                         } // end if
-                        
+
                         size_t num_saved_0 = temp_nodes_in_nodes.stride(node_gid_0);
                         size_t num_saved_1 = temp_nodes_in_nodes.stride(node_gid_1);
-                        
+
                         size_t save_0 = 1;
                         size_t save_1 = 1;
-                        
+
                         // check to see if the node_gid_1 was already saved
-                        for (size_t contents_lid=0; contents_lid<num_saved_0; contents_lid++){
-                            if (temp_nodes_in_nodes(node_gid_0, contents_lid) == node_gid_1){
+                        for (size_t contents_lid = 0; contents_lid < num_saved_0; contents_lid++)
+                        {
+                            if (temp_nodes_in_nodes(node_gid_0, contents_lid) == node_gid_1)
+                            {
                                 save_0 = 0; // don't save, it was already saved
                             }
                         }
-                        
+
                         // check to see if the node_gid_0 was already saved
-                        for (size_t contents_lid=0; contents_lid<num_saved_1; contents_lid++){
-                            if (temp_nodes_in_nodes(node_gid_1, contents_lid) == node_gid_0){
+                        for (size_t contents_lid = 0; contents_lid < num_saved_1; contents_lid++)
+                        {
+                            if (temp_nodes_in_nodes(node_gid_1, contents_lid) == node_gid_0)
+                            {
                                 save_1 = 0;  // don't save, it was already saved
                             }
                         }
-                        
-                        if (save_0 == 1){
+
+                        if (save_0 == 1)
+                        {
                             // increment the number of nodes in a node saved
                             temp_nodes_in_nodes.stride(node_gid_0)++;
-                            
+
                             // save the second node to the first node
                             temp_nodes_in_nodes(node_gid_0, num_saved_0) = node_gid_1;
-                            
-                            
                         }
-                        
-                        if (save_1 == 1){
+
+                        if (save_1 == 1)
+                        {
                             // increment the number of nodes in a node saved
                             temp_nodes_in_nodes.stride(node_gid_1)++;
-                            
+
                             // save the first node to the second node
                             temp_nodes_in_nodes(node_gid_1, num_saved_1) = node_gid_0;
-                            
-                            
                         }
-                        
+
                         // save the strides
                         num_nodes_in_node(node_gid_0) = temp_nodes_in_nodes.stride(node_gid_0);
                         num_nodes_in_node(node_gid_1) = temp_nodes_in_nodes.stride(node_gid_1);
-                        
                     } // end for node in patch
                 } // end for patches
-                
             } // end if 3D
-            else {
-                for (size_t patch_gid=0; patch_gid<num_patches; patch_gid++){
-                    
+            else
+            {
+                for (size_t patch_gid = 0; patch_gid < num_patches; patch_gid++)
+                {
                     // the first node on the edge
                     size_t node_gid_0 = nodes_in_patch(patch_gid, 0);
-                    
+
                     // second node on this edge
                     size_t node_gid_1 = nodes_in_patch(patch_gid, 1);
-                    
+
                     size_t num_saved_0 = temp_nodes_in_nodes.stride(node_gid_0);
                     size_t num_saved_1 = temp_nodes_in_nodes.stride(node_gid_1);
-                    
+
                     // increment the number of nodes in a node saved
                     temp_nodes_in_nodes.stride(node_gid_0)++;
                     temp_nodes_in_nodes.stride(node_gid_1)++;
-                    
+
                     // save the second node to the first node
                     temp_nodes_in_nodes(node_gid_0, num_saved_0) = node_gid_1;
-                    
+
                     // save the first node to the second node
                     temp_nodes_in_nodes(node_gid_1, num_saved_1) = node_gid_0;
-                    
-                    
+
                     // save the strides
                     num_nodes_in_node(node_gid_0) = temp_nodes_in_nodes.stride(node_gid_0);
                     num_nodes_in_node(node_gid_1) = temp_nodes_in_nodes.stride(node_gid_1);
-                        
                 } // end for patches
-                
             } // end if 2D
-            
         });  // end RUN
         Kokkos::fence();
-        
-        nodes_in_node = RaggedRightArrayKokkos <size_t> (num_nodes_in_node);
-        
+
+        nodes_in_node = RaggedRightArrayKokkos<size_t>(num_nodes_in_node);
+
         // save the connectivity
         FOR_ALL_CLASS(node_gid, 0, num_nodes, {
-            
             size_t num_saved = 0;
-            for (size_t node_lid=0; node_lid<num_nodes_in_node(node_gid); node_lid++){
-                
+            for (size_t node_lid = 0; node_lid < num_nodes_in_node(node_gid); node_lid++)
+            {
                 nodes_in_node(node_gid, num_saved) = temp_nodes_in_nodes(node_gid, num_saved);
-                
+
                 // increment the number of nodes in node saved
-                num_saved ++;
-                
+                num_saved++;
             } // end for node_lid
-            
-            
         }); // end parallel for over nodes
-        
     } // end of node node connectivity
-    
-    
-
-    
-    
-    void init_bdy_sets (size_t num_bcs){
-        
-        if(num_bcs == 0){
+
+    void init_bdy_sets(size_t num_bcs)
+    {
+        if (num_bcs == 0)
+        {
             printf("ERROR: number of boundary sets = 0, set it = 1");
             num_bcs = 1;
         }
-        num_bdy_sets = num_bcs;
-        bdy_patches_in_set = DynamicRaggedRightArrayKokkos <size_t> (num_bcs, num_bdy_patches);
-        
+        num_bdy_sets       = num_bcs;
+        bdy_patches_in_set = DynamicRaggedRightArrayKokkos<size_t>(num_bcs, num_bdy_patches);
+
         return;
-        
     } // end of init_bdy_sets method
-    
-
-    
 }; // end mesh_t
 
-
 namespace region
 {
-
-    // for tagging boundary faces
-    enum vol_tag
-    {
-        global = 0,     // tag every elements in the mesh
-        box = 1,        // tag all elements inside a box
-        cylinder = 2,   // tag all elements inside a cylinder
-        sphere = 3,     // tag all elements inside a sphere
-        readVoxelFile = 4       // tag all elements in a voxel mesh input
-    };
-
+// for tagging boundary faces
+enum vol_tag
+{
+    global = 0,         // tag every elements in the mesh
+    box = 1,            // tag all elements inside a box
+    cylinder = 2,       // tag all elements inside a cylinder
+    sphere = 3,         // tag all elements inside a sphere
+    readVoxelFile = 4           // tag all elements in a voxel mesh input
+};
 } // end of namespace
 
-
 namespace init_conds
 {
-    
-    // applying initial conditions
-    enum init_velocity_conds
-    {
-        // uniform
-        cartesian = 0,   // cart velocity
-        radial = 1,      // radial in the (x,y) plane where x=r*cos(theta) and y=r*sin(theta)
-        spherical = 2,   // spherical
-    
-        // linear variation
-        radial_linear = 3,     // linear variation from 0,0,0
-        spherical_linear = 4,   // linear variation from 0,0,0
-    
-        // vortical initial conditions
-        tg_vortex = 5
-    };
-    
-} // end of initial conditions namespace
+// applying initial conditions
+enum init_velocity_conds
+{
+    // uniform
+    cartesian = 0,       // cart velocity
+    radial = 1,          // radial in the (x,y) plane where x=r*cos(theta) and y=r*sin(theta)
+    spherical = 2,       // spherical
 
+    // linear variation
+    radial_linear = 3,         // linear variation from 0,0,0
+    spherical_linear = 4,       // linear variation from 0,0,0
+
+    // vortical initial conditions
+    tg_vortex = 5
+};
+} // end of initial conditions namespace
 
 // fill instructions
-struct mat_fill_t {
-    
+struct mat_fill_t
+{
     // type
     region::vol_tag volume; // 1 is global, 2 are planes, 3 is a sphere
-    
+
     // material id
     size_t mat_id;
-    
+
     // planes
     double x1;
     double x2;
@@ -976,68 +890,61 @@ struct mat_fill_t {
     double y2;
     double z1;
     double z2;
-    
+
     // radius
     double radius1;
     double radius2;
 
-    
     // initial conditions
     init_conds::init_velocity_conds velocity;
-    
+
     // velocity coefficients by component
-    double u,v,w;
-    
+    double u, v, w;
+
     // velocity magnitude for radial velocity initialization
     double speed;
-    
+
     double sie;  // specific internal energy
     double den;  // density
 };
 
-
 namespace bdy
 {
-    
-    // for tagging boundary faces
-    enum bdy_tag
-    {
-        x_plane  = 0,   // tag an x-plane
-        y_plane  = 1,   // tag an y-plane
-        z_plane  = 2,   // tag an z-plane
-        cylinder = 3,   // tag an cylindrical surface
-        sphere   = 4,   // tag a spherical surface
-        readFile = 5    // read from a file
-    };
-    
-    
-    
-    // for enforcing boundary conditions
-    enum bdy_hydro_conds
-    {
-        fixed = 0,          // zero velocity
-        reflected = 1,      // reflected or wall condition
-        velocity = 2,       // constant velocity
-        pressure = 3,       // constant pressure
-        acceleration = 4,   // constant acceleration
-        contact = 5         // contact surface
-    };
+// for tagging boundary faces
+enum bdy_tag
+{
+    x_plane = 0,        // tag an x-plane
+    y_plane = 1,        // tag an y-plane
+    z_plane = 2,        // tag an z-plane
+    cylinder = 3,       // tag an cylindrical surface
+    sphere = 4,         // tag a spherical surface
+    readFile = 5        // read from a file
+};
 
+// for enforcing boundary conditions
+enum bdy_hydro_conds
+{
+    fixed = 0,              // zero velocity
+    reflected = 1,          // reflected or wall condition
+    velocity = 2,           // constant velocity
+    pressure = 3,           // constant pressure
+    acceleration = 4,       // constant acceleration
+    contact = 5             // contact surface
+};
 } // end of bdy namespace
 
-
 // tag mesh points on bdy's and set the BC type
-struct boundary_t {
-
+struct boundary_t
+{
     // tag surface type
     bdy::bdy_tag surface;    // 0=xplane, 1=yplane, 2=zplane, 3=cylinder, 4=sphere, 5=read file
-    
+
     // tag surface value or radius
     real_t value;
-    
+
     // BC type
     bdy::bdy_hydro_conds hydro_bc;
-    
+
     // velocity
     // if t_end > time > t_start
     // v(t) = v0 exp(-v1*(time - time_start) )
@@ -1047,469 +954,428 @@ struct boundary_t {
     double hydro_bc_vel_t_end;
 };
 
-
-void read_mesh_ensight(char* MESH,
-                       mesh_t &mesh,
-                       node_t &node,
-                       elem_t &elem,
-                       corner_t &corner,
+void read_mesh_ensight(char*        MESH,
+                       mesh_t&      mesh,
+                       node_t&      node,
+                       elem_t&      elem,
+                       corner_t&    corner,
                        const size_t num_dims,
                        const size_t rk_num_bins);
 
-
-void input(CArrayKokkos <material_t> &material,
-           CArrayKokkos <mat_fill_t> &mat_fill,
-           CArrayKokkos <boundary_t> &boundary,
-           CArrayKokkos <double> &state_vars,
-           size_t &num_materials,
-           size_t &num_fills,
-           size_t &num_boundaries,
-           size_t &num_dims,
-           size_t &num_state_vars,
-           double &dt_start,
-           double &time_final,
-           double &dt_max,
-           double &dt_min,
-           double &dt_cfl,
-           double &graphics_dt_ival,
-           size_t &graphics_cyc_ival,
-           size_t &cycle_stop,
-           size_t &rk_num_stages);
-
-
-void get_vol(const DViewCArrayKokkos <double> &elem_vol,
-             const DViewCArrayKokkos <double> &node_coords,
-             const mesh_t &mesh);
-
+void input(CArrayKokkos<material_t>& material,
+           CArrayKokkos<mat_fill_t>& mat_fill,
+           CArrayKokkos<boundary_t>& boundary,
+           CArrayKokkos<double>&     state_vars,
+           size_t&                   num_materials,
+           size_t&                   num_fills,
+           size_t&                   num_boundaries,
+           size_t&                   num_dims,
+           size_t&                   num_state_vars,
+           double&                   dt_start,
+           double&                   time_final,
+           double&                   dt_max,
+           double&                   dt_min,
+           double&                   dt_cfl,
+           double&                   graphics_dt_ival,
+           size_t&                   graphics_cyc_ival,
+           size_t&                   cycle_stop,
+           size_t&                   rk_num_stages);
+
+void get_vol(const DViewCArrayKokkos<double>& elem_vol,
+             const DViewCArrayKokkos<double>& node_coords,
+             const mesh_t&                    mesh);
 
 KOKKOS_FUNCTION
-void get_vol_hex(const DViewCArrayKokkos <double> &elem_vol,
-                 const size_t elem_gid,
-                 const DViewCArrayKokkos <double> &node_coords,
-                 const ViewCArrayKokkos <size_t>  &elem_node_gids);
-
+void get_vol_hex(const DViewCArrayKokkos<double>& elem_vol,
+                 const size_t                     elem_gid,
+                 const DViewCArrayKokkos<double>& node_coords,
+                 const ViewCArrayKokkos<size_t>&  elem_node_gids);
 
 KOKKOS_FUNCTION
-void get_vol_quad(const DViewCArrayKokkos <double> &elem_vol,
-                  const size_t elem_gid,
-                  const DViewCArrayKokkos <double> &node_coords,
-                  const ViewCArrayKokkos <size_t>  &elem_node_gids);
-
+void get_vol_quad(const DViewCArrayKokkos<double>& elem_vol,
+                  const size_t                     elem_gid,
+                  const DViewCArrayKokkos<double>& node_coords,
+                  const ViewCArrayKokkos<size_t>&  elem_node_gids);
 
 KOKKOS_FUNCTION
-double get_area_quad(const size_t elem_gid,
-                     const DViewCArrayKokkos <double> &node_coords,
-                     const ViewCArrayKokkos <size_t>  &elem_node_gids);
-
+double get_area_quad(const size_t                     elem_gid,
+                     const DViewCArrayKokkos<double>& node_coords,
+                     const ViewCArrayKokkos<size_t>&  elem_node_gids);
 
 KOKKOS_FUNCTION
-void get_bmatrix(const ViewCArrayKokkos <double> &B_matrix,
-                 const size_t elem_gid,
-                 const DViewCArrayKokkos <double> &node_coords,
-                 const ViewCArrayKokkos <size_t>  &elem_node_gids);
-
+void get_bmatrix(const ViewCArrayKokkos<double>&  B_matrix,
+                 const size_t                     elem_gid,
+                 const DViewCArrayKokkos<double>& node_coords,
+                 const ViewCArrayKokkos<size_t>&  elem_node_gids);
 
 KOKKOS_FUNCTION
-void get_bmatrix2D(const ViewCArrayKokkos <double> &B_matrix,
-                   const size_t elem_gid,
-                   const DViewCArrayKokkos <double> &node_coords,
-                   const ViewCArrayKokkos <size_t>  &elem_node_gids);
-
-
-void setup(const CArrayKokkos <material_t> &material,
-           const CArrayKokkos <mat_fill_t> &mat_fill,
-           const CArrayKokkos <boundary_t> &boundary,
-           mesh_t &mesh,
-           const DViewCArrayKokkos <double> &node_coords,
-           DViewCArrayKokkos <double> &node_vel,
-           DViewCArrayKokkos <double> &node_mass,
-           const DViewCArrayKokkos <double> &elem_den,
-           const DViewCArrayKokkos <double> &elem_pres,
-           const DViewCArrayKokkos <double> &elem_stress,
-           const DViewCArrayKokkos <double> &elem_sspd,
-           const DViewCArrayKokkos <double> &elem_sie,
-           const DViewCArrayKokkos <double> &elem_vol,
-           const DViewCArrayKokkos <double> &elem_mass,
-           const DViewCArrayKokkos <size_t> &elem_mat_id,
-           const DViewCArrayKokkos <double> &elem_statev,
-           const CArrayKokkos <double> &state_vars,
-           const DViewCArrayKokkos <double> &corner_mass,
-           const size_t num_fills,
-           const size_t rk_num_bins,
-           const size_t num_bdy_sets,
-           const size_t num_materials,
-           const size_t num_state_vars);
-
-
-void write_outputs (const mesh_t &mesh,
-                    DViewCArrayKokkos <double> &node_coords,
-                    DViewCArrayKokkos <double> &node_vel,
-                    DViewCArrayKokkos <double> &node_mass,
-                    DViewCArrayKokkos <double> &elem_den,
-                    DViewCArrayKokkos <double> &elem_pres,
-                    DViewCArrayKokkos <double> &elem_stress,
-                    DViewCArrayKokkos <double> &elem_sspd,
-                    DViewCArrayKokkos <double> &elem_sie,
-                    DViewCArrayKokkos <double> &elem_vol,
-                    DViewCArrayKokkos <double> &elem_mass,
-                    DViewCArrayKokkos <size_t> &elem_mat_id,
-                    CArray <double> &graphics_times,
-                    size_t &graphics_id,
-                    const double time_value);
-
-
-void ensight(const mesh_t &mesh,
-             const DViewCArrayKokkos <double> &node_coords,
-             const DViewCArrayKokkos <double> &node_vel,
-             const DViewCArrayKokkos <double> &node_mass,
-             const DViewCArrayKokkos <double> &elem_den,
-             const DViewCArrayKokkos <double> &elem_pres,
-             const DViewCArrayKokkos <double> &elem_stress,
-             const DViewCArrayKokkos <double> &elem_sspd,
-             const DViewCArrayKokkos <double> &elem_sie,
-             const DViewCArrayKokkos <double> &elem_vol,
-             const DViewCArrayKokkos <double> &elem_mass,
-             const DViewCArrayKokkos <size_t> &elem_mat_id,
-             CArray <double> &graphics_times,
-             size_t &graphics_id,
-             const double time_value);
-
-
-void state_file(const mesh_t &mesh,
-                const DViewCArrayKokkos <double> &node_coords,
-                const DViewCArrayKokkos <double> &node_vel,
-                const DViewCArrayKokkos <double> &node_mass,
-                const DViewCArrayKokkos <double> &elem_den,
-                const DViewCArrayKokkos <double> &elem_pres,
-                const DViewCArrayKokkos <double> &elem_stress,
-                const DViewCArrayKokkos <double> &elem_sspd,
-                const DViewCArrayKokkos <double> &elem_sie,
-                const DViewCArrayKokkos <double> &elem_vol,
-                const DViewCArrayKokkos <double> &elem_mass,
-                const DViewCArrayKokkos <size_t> &elem_mat_id,
-                const double time_value );
-
-
-void tag_bdys(const CArrayKokkos <boundary_t> &boundary,
-              mesh_t &mesh,
-              const DViewCArrayKokkos <double> &node_coords);
-
+void get_bmatrix2D(const ViewCArrayKokkos<double>&  B_matrix,
+                   const size_t                     elem_gid,
+                   const DViewCArrayKokkos<double>& node_coords,
+                   const ViewCArrayKokkos<size_t>&  elem_node_gids);
+
+void setup(const CArrayKokkos<material_t>&  material,
+           const CArrayKokkos<mat_fill_t>&  mat_fill,
+           const CArrayKokkos<boundary_t>&  boundary,
+           mesh_t&                          mesh,
+           const DViewCArrayKokkos<double>& node_coords,
+           DViewCArrayKokkos<double>&       node_vel,
+           DViewCArrayKokkos<double>&       node_mass,
+           const DViewCArrayKokkos<double>& elem_den,
+           const DViewCArrayKokkos<double>& elem_pres,
+           const DViewCArrayKokkos<double>& elem_stress,
+           const DViewCArrayKokkos<double>& elem_sspd,
+           const DViewCArrayKokkos<double>& elem_sie,
+           const DViewCArrayKokkos<double>& elem_vol,
+           const DViewCArrayKokkos<double>& elem_mass,
+           const DViewCArrayKokkos<size_t>& elem_mat_id,
+           const DViewCArrayKokkos<double>& elem_statev,
+           const CArrayKokkos<double>&      state_vars,
+           const DViewCArrayKokkos<double>& corner_mass,
+           const size_t                     num_fills,
+           const size_t                     rk_num_bins,
+           const size_t                     num_bdy_sets,
+           const size_t                     num_materials,
+           const size_t                     num_state_vars);
+
+void write_outputs(const mesh_t&              mesh,
+                   DViewCArrayKokkos<double>& node_coords,
+                   DViewCArrayKokkos<double>& node_vel,
+                   DViewCArrayKokkos<double>& node_mass,
+                   DViewCArrayKokkos<double>& elem_den,
+                   DViewCArrayKokkos<double>& elem_pres,
+                   DViewCArrayKokkos<double>& elem_stress,
+                   DViewCArrayKokkos<double>& elem_sspd,
+                   DViewCArrayKokkos<double>& elem_sie,
+                   DViewCArrayKokkos<double>& elem_vol,
+                   DViewCArrayKokkos<double>& elem_mass,
+                   DViewCArrayKokkos<size_t>& elem_mat_id,
+                   CArray<double>&            graphics_times,
+                   size_t&                    graphics_id,
+                   const double               time_value);
+
+void ensight(const mesh_t&                    mesh,
+             const DViewCArrayKokkos<double>& node_coords,
+             const DViewCArrayKokkos<double>& node_vel,
+             const DViewCArrayKokkos<double>& node_mass,
+             const DViewCArrayKokkos<double>& elem_den,
+             const DViewCArrayKokkos<double>& elem_pres,
+             const DViewCArrayKokkos<double>& elem_stress,
+             const DViewCArrayKokkos<double>& elem_sspd,
+             const DViewCArrayKokkos<double>& elem_sie,
+             const DViewCArrayKokkos<double>& elem_vol,
+             const DViewCArrayKokkos<double>& elem_mass,
+             const DViewCArrayKokkos<size_t>& elem_mat_id,
+             CArray<double>&                  graphics_times,
+             size_t&                          graphics_id,
+             const double                     time_value);
+
+void state_file(const mesh_t&                    mesh,
+                const DViewCArrayKokkos<double>& node_coords,
+                const DViewCArrayKokkos<double>& node_vel,
+                const DViewCArrayKokkos<double>& node_mass,
+                const DViewCArrayKokkos<double>& elem_den,
+                const DViewCArrayKokkos<double>& elem_pres,
+                const DViewCArrayKokkos<double>& elem_stress,
+                const DViewCArrayKokkos<double>& elem_sspd,
+                const DViewCArrayKokkos<double>& elem_sie,
+                const DViewCArrayKokkos<double>& elem_vol,
+                const DViewCArrayKokkos<double>& elem_mass,
+                const DViewCArrayKokkos<size_t>& elem_mat_id,
+                const double                     time_value);
+
+void tag_bdys(const CArrayKokkos<boundary_t>&  boundary,
+              mesh_t&                          mesh,
+              const DViewCArrayKokkos<double>& node_coords);
 
 KOKKOS_FUNCTION
-size_t check_bdy(const size_t patch_gid,
-                 const int this_bc_tag,
-                 const double val,
-                 const mesh_t &mesh,
-                 const DViewCArrayKokkos <double> &node_coords);
-
+size_t check_bdy(const size_t                     patch_gid,
+                 const int                        this_bc_tag,
+                 const double                     val,
+                 const mesh_t&                    mesh,
+                 const DViewCArrayKokkos<double>& node_coords);
 
-void build_boundry_node_sets(const CArrayKokkos <boundary_t> &boundary,
-                             mesh_t &mesh);
-
-
-void boundary_velocity(const mesh_t &mesh,
-                       const CArrayKokkos <boundary_t> &boundary,
-                       DViewCArrayKokkos <double> &node_vel,
-                       const double time_value);
+void build_boundry_node_sets(const CArrayKokkos<boundary_t>& boundary,
+                             mesh_t&                         mesh);
 
+void boundary_velocity(const mesh_t&                   mesh,
+                       const CArrayKokkos<boundary_t>& boundary,
+                       DViewCArrayKokkos<double>&      node_vel,
+                       const double                    time_value);
 
 KOKKOS_FUNCTION
-void ideal_gas(const DViewCArrayKokkos <double> &elem_pres,
-               const DViewCArrayKokkos <double> &elem_stress,
-               const size_t elem_gid,
-               const size_t mat_id,
-               const DViewCArrayKokkos <double> &elem_state_vars,
-               const DViewCArrayKokkos <double> &elem_sspd,
-               const double den,
-               const double sie);
-
+void ideal_gas(const DViewCArrayKokkos<double>& elem_pres,
+               const DViewCArrayKokkos<double>& elem_stress,
+               const size_t                     elem_gid,
+               const size_t                     mat_id,
+               const DViewCArrayKokkos<double>& elem_state_vars,
+               const DViewCArrayKokkos<double>& elem_sspd,
+               const double                     den,
+               const double                     sie);
 
 KOKKOS_FUNCTION
-void user_eos_model(const DViewCArrayKokkos <double> &elem_pres,
-                    const DViewCArrayKokkos <double> &elem_stress,
-                    const size_t elem_gid,
-                    const size_t mat_id,
-                    const DViewCArrayKokkos <double> &elem_state_vars,
-                    const DViewCArrayKokkos <double> &elem_sspd,
-                    const double den,
-                    const double sie);
-
+void user_eos_model(const DViewCArrayKokkos<double>& elem_pres,
+                    const DViewCArrayKokkos<double>& elem_stress,
+                    const size_t                     elem_gid,
+                    const size_t                     mat_id,
+                    const DViewCArrayKokkos<double>& elem_state_vars,
+                    const DViewCArrayKokkos<double>& elem_sspd,
+                    const double                     den,
+                    const double                     sie);
 
 KOKKOS_FUNCTION
-void user_strength_model(const DViewCArrayKokkos <double> &elem_pres,
-                         const DViewCArrayKokkos <double> &elem_stress,
-                         const size_t elem_gid,
-                         const size_t mat_id,
-                         const DViewCArrayKokkos <double> &elem_state_vars,
-                         const DViewCArrayKokkos <double> &elem_sspd,
-                         const double den,
-                         const double sie,
-                         const ViewCArrayKokkos <double> &vel_grad,
-                         const ViewCArrayKokkos <size_t> &elem_node_gids,
-                         const DViewCArrayKokkos <double> &node_coords,
-                         const DViewCArrayKokkos <double> &node_vel,
-                         const double vol,
-                         const double dt,
-                         const double rk_alpha);
-
+void user_strength_model(const DViewCArrayKokkos<double>& elem_pres,
+                         const DViewCArrayKokkos<double>& elem_stress,
+                         const size_t                     elem_gid,
+                         const size_t                     mat_id,
+                         const DViewCArrayKokkos<double>& elem_state_vars,
+                         const DViewCArrayKokkos<double>& elem_sspd,
+                         const double                     den,
+                         const double                     sie,
+                         const ViewCArrayKokkos<double>&  vel_grad,
+                         const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                         const DViewCArrayKokkos<double>& node_coords,
+                         const DViewCArrayKokkos<double>& node_vel,
+                         const double                     vol,
+                         const double                     dt,
+                         const double                     rk_alpha);
 
 KOKKOS_FUNCTION
-void user_strength_model_vpsc(const DViewCArrayKokkos <double> &elem_pres,
-                              const DViewCArrayKokkos <double> &elem_stress,
-                              const size_t elem_gid,
-                              const size_t mat_id,
-                              const DViewCArrayKokkos <double> &elem_state_vars,
-                              const DViewCArrayKokkos <double> &elem_sspd,
-                              const double den,
-                              const double sie,
-                              const ViewCArrayKokkos <double> &vel_grad,
-                              const ViewCArrayKokkos <size_t> &elem_node_gids,
-                              const DViewCArrayKokkos <double> &node_coords,
-                              const DViewCArrayKokkos <double> &node_vel,
-                              const double vol,
-                              const double dt,
-                              const double rk_alpha);
-
-
-void user_model_init(const DCArrayKokkos <double> &file_state_vars,
-                     const size_t num_state_vars,
-                     const size_t mat_id,
-                     const size_t num_elems);
-
+void user_strength_model_vpsc(const DViewCArrayKokkos<double>& elem_pres,
+                              const DViewCArrayKokkos<double>& elem_stress,
+                              const size_t                     elem_gid,
+                              const size_t                     mat_id,
+                              const DViewCArrayKokkos<double>& elem_state_vars,
+                              const DViewCArrayKokkos<double>& elem_sspd,
+                              const double                     den,
+                              const double                     sie,
+                              const ViewCArrayKokkos<double>&  vel_grad,
+                              const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                              const DViewCArrayKokkos<double>& node_coords,
+                              const DViewCArrayKokkos<double>& node_vel,
+                              const double                     vol,
+                              const double                     dt,
+                              const double                     rk_alpha);
+
+void user_model_init(const DCArrayKokkos<double>& file_state_vars,
+                     const size_t                 num_state_vars,
+                     const size_t                 mat_id,
+                     const size_t                 num_elems);
 
 KOKKOS_FUNCTION
 double max_Eigen3D(const ViewCArrayKokkos<double> tensor);
 
-
 KOKKOS_FUNCTION
 double max_Eigen2D(const ViewCArrayKokkos<double> tensor);
 
-
-
-void sgh_solve(CArrayKokkos <material_t> &material,
-               CArrayKokkos <boundary_t> &boundary,
-               mesh_t &mesh,
-               DViewCArrayKokkos <double> &node_coords,
-               DViewCArrayKokkos <double> &node_vel,
-               DViewCArrayKokkos <double> &node_mass,
-               DViewCArrayKokkos <double> &elem_den,
-               DViewCArrayKokkos <double> &elem_pres,
-               DViewCArrayKokkos <double> &elem_stress,
-               DViewCArrayKokkos <double> &elem_sspd,
-               DViewCArrayKokkos <double> &elem_sie,
-               DViewCArrayKokkos <double> &elem_vol,
-               DViewCArrayKokkos <double> &elem_div,
-               DViewCArrayKokkos <double> &elem_mass,
-               DViewCArrayKokkos <size_t> &elem_mat_id,
-               DViewCArrayKokkos <double> &elem_statev,
-               DViewCArrayKokkos <double> &corner_force,
-               DViewCArrayKokkos <double> &corner_mass,
-               double &time_value,
-               const double time_final,
-               const double dt_max,
-               const double dt_min,
-               const double dt_cfl,
-               double &graphics_time,
-               size_t graphics_cyc_ival,
-               double graphics_dt_ival,
-               const size_t cycle_stop,
-               const size_t rk_num_stages,
-               double dt,
-               const double fuzz,
-               const double tiny,
-               const double small,
-               CArray <double> &graphics_times,
-               size_t &graphics_id);
-
-
-void rk_init(DViewCArrayKokkos <double> &node_coords,
-             DViewCArrayKokkos <double> &node_vel,
-             DViewCArrayKokkos <double> &elem_sie,
-             DViewCArrayKokkos <double> &elem_stress,
-             const size_t num_dims,
-             const size_t num_elems,
-             const size_t num_nodes);
-
-
-void get_timestep(mesh_t &mesh,
-                  DViewCArrayKokkos <double> &node_coords,
-                  DViewCArrayKokkos <double> &node_vel,
-                  DViewCArrayKokkos <double> &elem_sspd,
-                  DViewCArrayKokkos <double> &elem_vol,
-                  double time_value,
-                  const double graphics_time,
-                  const double time_final,
-                  const double dt_max,
-                  const double dt_min,
-                  const double dt_cfl,
-                  double &dt,
-                  const double fuzz);
-
-
-void get_timestep2D(mesh_t &mesh,
-                    DViewCArrayKokkos <double> &node_coords,
-                    DViewCArrayKokkos <double> &node_vel,
-                    DViewCArrayKokkos <double> &elem_sspd,
-                    DViewCArrayKokkos <double> &elem_vol,
-                    double time_value,
-                    const double graphics_time,
-                    const double time_final,
-                    const double dt_max,
-                    const double dt_min,
-                    const double dt_cfl,
-                    double &dt,
-                    const double fuzz);
-
-
-void get_divergence(DViewCArrayKokkos <double> &elem_div,
-                    const mesh_t mesh,
-                    const DViewCArrayKokkos <double> &node_coords,
-                    const DViewCArrayKokkos <double> &node_vel,
-                    const DViewCArrayKokkos <double> &elem_vol);
-
-
-void get_divergence2D(DViewCArrayKokkos <double> &elem_div,
-                      const mesh_t mesh,
-                      const DViewCArrayKokkos <double> &node_coords,
-                      const DViewCArrayKokkos <double> &node_vel,
-                      const DViewCArrayKokkos <double> &elem_vol);
-
+void sgh_solve(CArrayKokkos<material_t>&  material,
+               CArrayKokkos<boundary_t>&  boundary,
+               mesh_t&                    mesh,
+               DViewCArrayKokkos<double>& node_coords,
+               DViewCArrayKokkos<double>& node_vel,
+               DViewCArrayKokkos<double>& node_mass,
+               DViewCArrayKokkos<double>& elem_den,
+               DViewCArrayKokkos<double>& elem_pres,
+               DViewCArrayKokkos<double>& elem_stress,
+               DViewCArrayKokkos<double>& elem_sspd,
+               DViewCArrayKokkos<double>& elem_sie,
+               DViewCArrayKokkos<double>& elem_vol,
+               DViewCArrayKokkos<double>& elem_div,
+               DViewCArrayKokkos<double>& elem_mass,
+               DViewCArrayKokkos<size_t>& elem_mat_id,
+               DViewCArrayKokkos<double>& elem_statev,
+               DViewCArrayKokkos<double>& corner_force,
+               DViewCArrayKokkos<double>& corner_mass,
+               double&                    time_value,
+               const double               time_final,
+               const double               dt_max,
+               const double               dt_min,
+               const double               dt_cfl,
+               double&                    graphics_time,
+               size_t                     graphics_cyc_ival,
+               double                     graphics_dt_ival,
+               const size_t               cycle_stop,
+               const size_t               rk_num_stages,
+               double                     dt,
+               const double               fuzz,
+               const double               tiny,
+               const double               small,
+               CArray<double>&            graphics_times,
+               size_t&                    graphics_id);
+
+void rk_init(DViewCArrayKokkos<double>& node_coords,
+             DViewCArrayKokkos<double>& node_vel,
+             DViewCArrayKokkos<double>& elem_sie,
+             DViewCArrayKokkos<double>& elem_stress,
+             const size_t               num_dims,
+             const size_t               num_elems,
+             const size_t               num_nodes);
+
+void get_timestep(mesh_t&                    mesh,
+                  DViewCArrayKokkos<double>& node_coords,
+                  DViewCArrayKokkos<double>& node_vel,
+                  DViewCArrayKokkos<double>& elem_sspd,
+                  DViewCArrayKokkos<double>& elem_vol,
+                  double                     time_value,
+                  const double               graphics_time,
+                  const double               time_final,
+                  const double               dt_max,
+                  const double               dt_min,
+                  const double               dt_cfl,
+                  double&                    dt,
+                  const double               fuzz);
+
+void get_timestep2D(mesh_t&                    mesh,
+                    DViewCArrayKokkos<double>& node_coords,
+                    DViewCArrayKokkos<double>& node_vel,
+                    DViewCArrayKokkos<double>& elem_sspd,
+                    DViewCArrayKokkos<double>& elem_vol,
+                    double                     time_value,
+                    const double               graphics_time,
+                    const double               time_final,
+                    const double               dt_max,
+                    const double               dt_min,
+                    const double               dt_cfl,
+                    double&                    dt,
+                    const double               fuzz);
+
+void get_divergence(DViewCArrayKokkos<double>&       elem_div,
+                    const mesh_t                     mesh,
+                    const DViewCArrayKokkos<double>& node_coords,
+                    const DViewCArrayKokkos<double>& node_vel,
+                    const DViewCArrayKokkos<double>& elem_vol);
+
+void get_divergence2D(DViewCArrayKokkos<double>&       elem_div,
+                      const mesh_t                     mesh,
+                      const DViewCArrayKokkos<double>& node_coords,
+                      const DViewCArrayKokkos<double>& node_vel,
+                      const DViewCArrayKokkos<double>& elem_vol);
 
 KOKKOS_FUNCTION
-void get_velgrad(ViewCArrayKokkos <double> &vel_grad,
-                 const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                 const DViewCArrayKokkos <double> &node_vel,
-                 const ViewCArrayKokkos <double> &b_matrix,
-                 const double elem_vol,
-                 const size_t elem_gid);
-
+void get_velgrad(ViewCArrayKokkos<double>&        vel_grad,
+                 const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                 const DViewCArrayKokkos<double>& node_vel,
+                 const ViewCArrayKokkos<double>&  b_matrix,
+                 const double                     elem_vol,
+                 const size_t                     elem_gid);
 
 KOKKOS_FUNCTION
-void get_velgrad2D(ViewCArrayKokkos <double> &vel_grad,
-                   const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                   const DViewCArrayKokkos <double> &node_vel,
-                   const ViewCArrayKokkos <double> &b_matrix,
-                   const double elem_vol,
-                   const double elem_area,
-                   const size_t elem_gid);
+void get_velgrad2D(ViewCArrayKokkos<double>&        vel_grad,
+                   const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                   const DViewCArrayKokkos<double>& node_vel,
+                   const ViewCArrayKokkos<double>&  b_matrix,
+                   const double                     elem_vol,
+                   const double                     elem_area,
+                   const size_t                     elem_gid);
 
 KOKKOS_FUNCTION
-void decompose_vel_grad(ViewCArrayKokkos <double> &D_tensor,
-                        ViewCArrayKokkos <double> &W_tensor,
-                        const ViewCArrayKokkos <double> &vel_grad,
-                        const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                        const size_t elem_gid,
-                        const DViewCArrayKokkos <double> &node_coords,
-                        const DViewCArrayKokkos <double> &node_vel,
-                        const double vol);
-
-
-void get_force_sgh(const CArrayKokkos <material_t> &material,
-                   const mesh_t &mesh,
-                   const DViewCArrayKokkos <double> &node_coords,
-                   const DViewCArrayKokkos <double> &node_vel,
-                   const DViewCArrayKokkos <double> &elem_den,
-                   const DViewCArrayKokkos <double> &elem_sie,
-                   const DViewCArrayKokkos <double> &elem_pres,
-                   const DViewCArrayKokkos <double> &elem_stress,
-                   const DViewCArrayKokkos <double> &elem_sspd,
-                   const DViewCArrayKokkos <double> &elem_vol,
-                   const DViewCArrayKokkos <double> &elem_div,
-                   const DViewCArrayKokkos <size_t> &elem_mat_id,
-                   DViewCArrayKokkos <double> &corner_force,
-                   const double fuzz,
-                   const double small,
-                   const DViewCArrayKokkos <double> &elem_statev,
-                   const double dt,
-                   const double rk_alpha);
-
-
-void get_force_sgh2D(const CArrayKokkos <material_t> &material,
-                     const mesh_t &mesh,
-                     const DViewCArrayKokkos <double> &node_coords,
-                     const DViewCArrayKokkos <double> &node_vel,
-                     const DViewCArrayKokkos <double> &elem_den,
-                     const DViewCArrayKokkos <double> &elem_sie,
-                     const DViewCArrayKokkos <double> &elem_pres,
-                     const DViewCArrayKokkos <double> &elem_stress,
-                     const DViewCArrayKokkos <double> &elem_sspd,
-                     const DViewCArrayKokkos <double> &elem_vol,
-                     const DViewCArrayKokkos <double> &elem_div,
-                     const DViewCArrayKokkos <size_t> &elem_mat_id,
-                     DViewCArrayKokkos <double> &corner_force,
-                     const double fuzz,
-                     const double small,
-                     const DViewCArrayKokkos <double> &elem_statev,
-                     const double dt,
-                     const double rk_alpha);
-
-
-void update_velocity_sgh(double rk_alpha,
-                         double dt,
-                         const mesh_t &mesh,
-                         DViewCArrayKokkos <double> &node_vel,
-                         const DViewCArrayKokkos <double> &node_mass,
-                         const DViewCArrayKokkos <double> &corner_force);
-
-
-void update_energy_sgh(double rk_alpha,
-                       double dt,
-                       const mesh_t &mesh,
-                       const DViewCArrayKokkos <double> &node_vel,
-                       const DViewCArrayKokkos <double> &node_coords,
-                       DViewCArrayKokkos <double> &elem_sie,
-                       const DViewCArrayKokkos <double> &elem_mass,
-                       const DViewCArrayKokkos <double> &corner_force);
-
-
-void update_position_sgh(double rk_alpha,
-                         double dt,
-                         const size_t num_dims,
-                         const size_t num_nodes,
-                         DViewCArrayKokkos <double> &node_coords,
-                         const DViewCArrayKokkos <double> &node_vel);
-
-
-void update_state(const CArrayKokkos <material_t> &material,
-                  const mesh_t &mesh,
-                  const DViewCArrayKokkos <double> &node_coords,
-                  const DViewCArrayKokkos <double> &node_vel,
-                  DViewCArrayKokkos <double> &elem_den,
-                  DViewCArrayKokkos <double> &elem_pres,
-                  DViewCArrayKokkos <double> &elem_stress,
-                  DViewCArrayKokkos <double> &elem_sspd,
-                  const DViewCArrayKokkos <double> &elem_sie,
-                  const DViewCArrayKokkos <double> &elem_vol,
-                  const DViewCArrayKokkos <double> &elem_mass,
-                  const DViewCArrayKokkos <size_t> &elem_mat_id,
-                  const DViewCArrayKokkos <double> &elem_statev,
-                  const double dt,
-                  const double rk_alpha);
-
-
-void update_state2D(const CArrayKokkos <material_t> &material,
-                    const mesh_t &mesh,
-                    const DViewCArrayKokkos <double> &node_coords,
-                    const DViewCArrayKokkos <double> &node_vel,
-                    DViewCArrayKokkos <double> &elem_den,
-                    DViewCArrayKokkos <double> &elem_pres,
-                    DViewCArrayKokkos <double> &elem_stress,
-                    DViewCArrayKokkos <double> &elem_sspd,
-                    const DViewCArrayKokkos <double> &elem_sie,
-                    const DViewCArrayKokkos <double> &elem_vol,
-                    const DViewCArrayKokkos <double> &elem_mass,
-                    const DViewCArrayKokkos <size_t> &elem_mat_id,
-                    const DViewCArrayKokkos <double> &elem_statev,
-                    const double dt,
-                    const double rk_alpha);
-
+void decompose_vel_grad(ViewCArrayKokkos<double>&        D_tensor,
+                        ViewCArrayKokkos<double>&        W_tensor,
+                        const ViewCArrayKokkos<double>&  vel_grad,
+                        const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                        const size_t                     elem_gid,
+                        const DViewCArrayKokkos<double>& node_coords,
+                        const DViewCArrayKokkos<double>& node_vel,
+                        const double                     vol);
+
+void get_force_sgh(const CArrayKokkos<material_t>&  material,
+                   const mesh_t&                    mesh,
+                   const DViewCArrayKokkos<double>& node_coords,
+                   const DViewCArrayKokkos<double>& node_vel,
+                   const DViewCArrayKokkos<double>& elem_den,
+                   const DViewCArrayKokkos<double>& elem_sie,
+                   const DViewCArrayKokkos<double>& elem_pres,
+                   const DViewCArrayKokkos<double>& elem_stress,
+                   const DViewCArrayKokkos<double>& elem_sspd,
+                   const DViewCArrayKokkos<double>& elem_vol,
+                   const DViewCArrayKokkos<double>& elem_div,
+                   const DViewCArrayKokkos<size_t>& elem_mat_id,
+                   DViewCArrayKokkos<double>&       corner_force,
+                   const double                     fuzz,
+                   const double                     small,
+                   const DViewCArrayKokkos<double>& elem_statev,
+                   const double                     dt,
+                   const double                     rk_alpha);
+
+void get_force_sgh2D(const CArrayKokkos<material_t>&  material,
+                     const mesh_t&                    mesh,
+                     const DViewCArrayKokkos<double>& node_coords,
+                     const DViewCArrayKokkos<double>& node_vel,
+                     const DViewCArrayKokkos<double>& elem_den,
+                     const DViewCArrayKokkos<double>& elem_sie,
+                     const DViewCArrayKokkos<double>& elem_pres,
+                     const DViewCArrayKokkos<double>& elem_stress,
+                     const DViewCArrayKokkos<double>& elem_sspd,
+                     const DViewCArrayKokkos<double>& elem_vol,
+                     const DViewCArrayKokkos<double>& elem_div,
+                     const DViewCArrayKokkos<size_t>& elem_mat_id,
+                     DViewCArrayKokkos<double>&       corner_force,
+                     const double                     fuzz,
+                     const double                     small,
+                     const DViewCArrayKokkos<double>& elem_statev,
+                     const double                     dt,
+                     const double                     rk_alpha);
+
+void update_velocity_sgh(double                           rk_alpha,
+                         double                           dt,
+                         const mesh_t&                    mesh,
+                         DViewCArrayKokkos<double>&       node_vel,
+                         const DViewCArrayKokkos<double>& node_mass,
+                         const DViewCArrayKokkos<double>& corner_force);
+
+void update_energy_sgh(double                           rk_alpha,
+                       double                           dt,
+                       const mesh_t&                    mesh,
+                       const DViewCArrayKokkos<double>& node_vel,
+                       const DViewCArrayKokkos<double>& node_coords,
+                       DViewCArrayKokkos<double>&       elem_sie,
+                       const DViewCArrayKokkos<double>& elem_mass,
+                       const DViewCArrayKokkos<double>& corner_force);
+
+void update_position_sgh(double                           rk_alpha,
+                         double                           dt,
+                         const size_t                     num_dims,
+                         const size_t                     num_nodes,
+                         DViewCArrayKokkos<double>&       node_coords,
+                         const DViewCArrayKokkos<double>& node_vel);
+
+void update_state(const CArrayKokkos<material_t>&  material,
+                  const mesh_t&                    mesh,
+                  const DViewCArrayKokkos<double>& node_coords,
+                  const DViewCArrayKokkos<double>& node_vel,
+                  DViewCArrayKokkos<double>&       elem_den,
+                  DViewCArrayKokkos<double>&       elem_pres,
+                  DViewCArrayKokkos<double>&       elem_stress,
+                  DViewCArrayKokkos<double>&       elem_sspd,
+                  const DViewCArrayKokkos<double>& elem_sie,
+                  const DViewCArrayKokkos<double>& elem_vol,
+                  const DViewCArrayKokkos<double>& elem_mass,
+                  const DViewCArrayKokkos<size_t>& elem_mat_id,
+                  const DViewCArrayKokkos<double>& elem_statev,
+                  const double                     dt,
+                  const double                     rk_alpha);
+
+void update_state2D(const CArrayKokkos<material_t>&  material,
+                    const mesh_t&                    mesh,
+                    const DViewCArrayKokkos<double>& node_coords,
+                    const DViewCArrayKokkos<double>& node_vel,
+                    DViewCArrayKokkos<double>&       elem_den,
+                    DViewCArrayKokkos<double>&       elem_pres,
+                    DViewCArrayKokkos<double>&       elem_stress,
+                    DViewCArrayKokkos<double>&       elem_sspd,
+                    const DViewCArrayKokkos<double>& elem_sie,
+                    const DViewCArrayKokkos<double>& elem_vol,
+                    const DViewCArrayKokkos<double>& elem_mass,
+                    const DViewCArrayKokkos<size_t>& elem_mat_id,
+                    const DViewCArrayKokkos<double>& elem_statev,
+                    const double                     dt,
+                    const double                     rk_alpha);
 
 KOKKOS_FUNCTION
-void get_area_weights2D(const ViewCArrayKokkos <double> &corner_areas,
-                        const size_t elem_gid,
-                        const DViewCArrayKokkos <double> &node_coords,
-                        const ViewCArrayKokkos <size_t>  &elem_node_gids);
-
+void get_area_weights2D(const ViewCArrayKokkos<double>&  corner_areas,
+                        const size_t                     elem_gid,
+                        const DViewCArrayKokkos<double>& node_coords,
+                        const ViewCArrayKokkos<size_t>&  elem_node_gids);
 
 KOKKOS_FUNCTION
 double heron(const double x1,
@@ -1519,5 +1385,4 @@ double heron(const double x1,
              const double x3,
              const double y3);
 
-
-#endif 
+#endif
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/momentum.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/momentum.cpp
index 1ea1cf6f5..78d5c4292 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/momentum.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/momentum.cpp
@@ -1,418 +1,385 @@
 
-
 #include "mesh.h"
 #include "state.h"
 
-
-
 // -----------------------------------------------------------------------------
 // This function evolves the velocity at the nodes of the mesh
-//------------------------------------------------------------------------------
-void update_velocity_sgh(double rk_alpha,
-                         double dt,
-                         const mesh_t &mesh,
-                         DViewCArrayKokkos <double> &node_vel,
-                         const DViewCArrayKokkos <double> &node_mass,
-                         const DViewCArrayKokkos <double> &corner_force
-                         ){
-
-    
+// ------------------------------------------------------------------------------
+void update_velocity_sgh(double                           rk_alpha,
+                         double                           dt,
+                         const mesh_t&                    mesh,
+                         DViewCArrayKokkos<double>&       node_vel,
+                         const DViewCArrayKokkos<double>& node_mass,
+                         const DViewCArrayKokkos<double>& corner_force
+                         )
+{
     const size_t num_dims = mesh.num_dims;
-    
+
     // walk over the nodes to update the velocity
     FOR_ALL(node_gid, 0, mesh.num_nodes, {
-
         double node_force[3];
-        for (size_t dim = 0; dim < num_dims; dim++){
+        for (size_t dim = 0; dim < num_dims; dim++)
+        {
             node_force[dim] = 0.0;
         } // end for dim
-        
+
         // loop over all corners around the node and calculate the nodal force
-        for (size_t corner_lid=0; corner_lid<mesh.num_corners_in_node(node_gid); corner_lid++){
-        
+        for (size_t corner_lid = 0; corner_lid < mesh.num_corners_in_node(node_gid); corner_lid++)
+        {
             // Get corner gid
             size_t corner_gid = mesh.corners_in_node(node_gid, corner_lid);
-            
+
             // loop over dimension
-            for (size_t dim = 0; dim < num_dims; dim++){
+            for (size_t dim = 0; dim < num_dims; dim++)
+            {
                 node_force[dim] += corner_force(corner_gid, dim);
             } // end for dim
-            
         } // end for corner_lid
-        
+
         // update the velocity
-        for (int dim = 0; dim < num_dims; dim++){
+        for (int dim = 0; dim < num_dims; dim++)
+        {
             node_vel(1, node_gid, dim) = node_vel(0, node_gid, dim) +
-                                         rk_alpha * dt*node_force[dim]/node_mass(node_gid);
+                                         rk_alpha * dt * node_force[dim] / node_mass(node_gid);
         } // end for dim
-        
     }); // end for parallel for over nodes
-    
+
     return;
-    
 } // end subroutine update_velocity
 
-
 // -----------------------------------------------------------------------------
 // This function calculates the velocity gradient
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void get_velgrad(ViewCArrayKokkos <double> &vel_grad,
-                 const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                 const DViewCArrayKokkos <double> &node_vel,
-                 const ViewCArrayKokkos <double> &b_matrix,
-                 const double elem_vol,
-                 const size_t elem_gid
-                 ){
-    
-
+void get_velgrad(ViewCArrayKokkos<double>&        vel_grad,
+                 const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                 const DViewCArrayKokkos<double>& node_vel,
+                 const ViewCArrayKokkos<double>&  b_matrix,
+                 const double                     elem_vol,
+                 const size_t                     elem_gid
+                 )
+{
     const size_t num_nodes_in_elem = 8;
-    
-    double u_array[num_nodes_in_elem];
-    double v_array[num_nodes_in_elem];
-    double w_array[num_nodes_in_elem];
-    ViewCArrayKokkos <double> u(u_array, num_nodes_in_elem); // x-dir vel component
-    ViewCArrayKokkos <double> v(v_array, num_nodes_in_elem); // y-dir vel component
-    ViewCArrayKokkos <double> w(w_array, num_nodes_in_elem); // z-dir vel component
-    
+
+    double                   u_array[num_nodes_in_elem];
+    double                   v_array[num_nodes_in_elem];
+    double                   w_array[num_nodes_in_elem];
+    ViewCArrayKokkos<double> u(u_array, num_nodes_in_elem); // x-dir vel component
+    ViewCArrayKokkos<double> v(v_array, num_nodes_in_elem); // y-dir vel component
+    ViewCArrayKokkos<double> w(w_array, num_nodes_in_elem); // z-dir vel component
+
     // get the vertex velocities for the cell
-    for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-        
+    for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+    {
         // Get node gid
         size_t node_gid = elem_node_gids(node_lid);
 
         u(node_lid) = node_vel(1, node_gid, 0);
         v(node_lid) = node_vel(1, node_gid, 1);
         w(node_lid) = node_vel(1, node_gid, 2);
-        
     } // end for
 
-    
     // --- calculate the velocity gradient terms ---
-    double inverse_vol = 1.0/elem_vol;
-    
+    double inverse_vol = 1.0 / elem_vol;
+
     // x-dir
-    vel_grad(0,0) = (u(0)*b_matrix(0,0) + u(1)*b_matrix(1,0)
-                   + u(2)*b_matrix(2,0) + u(3)*b_matrix(3,0)
-                   + u(4)*b_matrix(4,0) + u(5)*b_matrix(5,0)
-                   + u(6)*b_matrix(6,0) + u(7)*b_matrix(7,0))*inverse_vol;
-    
-    vel_grad(0,1) = (u(0)*b_matrix(0,1) + u(1)*b_matrix(1,1)
-                   + u(2)*b_matrix(2,1) + u(3)*b_matrix(3,1)
-                   + u(4)*b_matrix(4,1) + u(5)*b_matrix(5,1)
-                   + u(6)*b_matrix(6,1) + u(7)*b_matrix(7,1))*inverse_vol;
-    
-    vel_grad(0,2) = (u(0)*b_matrix(0,2) + u(1)*b_matrix(1,2)
-                   + u(2)*b_matrix(2,2) + u(3)*b_matrix(3,2)
-                   + u(4)*b_matrix(4,2) + u(5)*b_matrix(5,2)
-                   + u(6)*b_matrix(6,2) + u(7)*b_matrix(7,2))*inverse_vol;
-    
+    vel_grad(0, 0) = (u(0) * b_matrix(0, 0) + u(1) * b_matrix(1, 0)
+                      + u(2) * b_matrix(2, 0) + u(3) * b_matrix(3, 0)
+                      + u(4) * b_matrix(4, 0) + u(5) * b_matrix(5, 0)
+                      + u(6) * b_matrix(6, 0) + u(7) * b_matrix(7, 0)) * inverse_vol;
+
+    vel_grad(0, 1) = (u(0) * b_matrix(0, 1) + u(1) * b_matrix(1, 1)
+                      + u(2) * b_matrix(2, 1) + u(3) * b_matrix(3, 1)
+                      + u(4) * b_matrix(4, 1) + u(5) * b_matrix(5, 1)
+                      + u(6) * b_matrix(6, 1) + u(7) * b_matrix(7, 1)) * inverse_vol;
+
+    vel_grad(0, 2) = (u(0) * b_matrix(0, 2) + u(1) * b_matrix(1, 2)
+                      + u(2) * b_matrix(2, 2) + u(3) * b_matrix(3, 2)
+                      + u(4) * b_matrix(4, 2) + u(5) * b_matrix(5, 2)
+                      + u(6) * b_matrix(6, 2) + u(7) * b_matrix(7, 2)) * inverse_vol;
+
     // y-dir
-    vel_grad(1,0) = (v(0)*b_matrix(0,0) + v(1)*b_matrix(1,0)
-                   + v(2)*b_matrix(2,0) + v(3)*b_matrix(3,0)
-                   + v(4)*b_matrix(4,0) + v(5)*b_matrix(5,0)
-                   + v(6)*b_matrix(6,0) + v(7)*b_matrix(7,0))*inverse_vol;
-    
-    vel_grad(1,1) = (v(0)*b_matrix(0,1) + v(1)*b_matrix(1,1)
-                   + v(2)*b_matrix(2,1) + v(3)*b_matrix(3,1)
-                   + v(4)*b_matrix(4,1) + v(5)*b_matrix(5,1)
-                   + v(6)*b_matrix(6,1) + v(7)*b_matrix(7,1))*inverse_vol;
-    vel_grad(1,2) = (v(0)*b_matrix(0,2) + v(1)*b_matrix(1,2)
-                   + v(2)*b_matrix(2,2) + v(3)*b_matrix(3,2)
-                   + v(4)*b_matrix(4,2) + v(5)*b_matrix(5,2)
-                   + v(6)*b_matrix(6,2) + v(7)*b_matrix(7,2))*inverse_vol;
-    
+    vel_grad(1, 0) = (v(0) * b_matrix(0, 0) + v(1) * b_matrix(1, 0)
+                      + v(2) * b_matrix(2, 0) + v(3) * b_matrix(3, 0)
+                      + v(4) * b_matrix(4, 0) + v(5) * b_matrix(5, 0)
+                      + v(6) * b_matrix(6, 0) + v(7) * b_matrix(7, 0)) * inverse_vol;
+
+    vel_grad(1, 1) = (v(0) * b_matrix(0, 1) + v(1) * b_matrix(1, 1)
+                      + v(2) * b_matrix(2, 1) + v(3) * b_matrix(3, 1)
+                      + v(4) * b_matrix(4, 1) + v(5) * b_matrix(5, 1)
+                      + v(6) * b_matrix(6, 1) + v(7) * b_matrix(7, 1)) * inverse_vol;
+    vel_grad(1, 2) = (v(0) * b_matrix(0, 2) + v(1) * b_matrix(1, 2)
+                      + v(2) * b_matrix(2, 2) + v(3) * b_matrix(3, 2)
+                      + v(4) * b_matrix(4, 2) + v(5) * b_matrix(5, 2)
+                      + v(6) * b_matrix(6, 2) + v(7) * b_matrix(7, 2)) * inverse_vol;
+
     // z-dir
-    vel_grad(2,0) = (w(0)*b_matrix(0,0) + w(1)*b_matrix(1,0)
-                   + w(2)*b_matrix(2,0) + w(3)*b_matrix(3,0)
-                   + w(4)*b_matrix(4,0) + w(5)*b_matrix(5,0)
-                   + w(6)*b_matrix(6,0) + w(7)*b_matrix(7,0))*inverse_vol;
-    
-    vel_grad(2,1) = (w(0)*b_matrix(0,1) + w(1)*b_matrix(1,1)
-                   + w(2)*b_matrix(2,1) + w(3)*b_matrix(3,1)
-                   + w(4)*b_matrix(4,1) + w(5)*b_matrix(5,1)
-                   + w(6)*b_matrix(6,1) + w(7)*b_matrix(7,1))*inverse_vol;
-    
-    vel_grad(2,2) = (w(0)*b_matrix(0,2) + w(1)*b_matrix(1,2)
-                   + w(2)*b_matrix(2,2) + w(3)*b_matrix(3,2)
-                   + w(4)*b_matrix(4,2) + w(5)*b_matrix(5,2)
-                   + w(6)*b_matrix(6,2) + w(7)*b_matrix(7,2))*inverse_vol;
-
-    
+    vel_grad(2, 0) = (w(0) * b_matrix(0, 0) + w(1) * b_matrix(1, 0)
+                      + w(2) * b_matrix(2, 0) + w(3) * b_matrix(3, 0)
+                      + w(4) * b_matrix(4, 0) + w(5) * b_matrix(5, 0)
+                      + w(6) * b_matrix(6, 0) + w(7) * b_matrix(7, 0)) * inverse_vol;
+
+    vel_grad(2, 1) = (w(0) * b_matrix(0, 1) + w(1) * b_matrix(1, 1)
+                      + w(2) * b_matrix(2, 1) + w(3) * b_matrix(3, 1)
+                      + w(4) * b_matrix(4, 1) + w(5) * b_matrix(5, 1)
+                      + w(6) * b_matrix(6, 1) + w(7) * b_matrix(7, 1)) * inverse_vol;
+
+    vel_grad(2, 2) = (w(0) * b_matrix(0, 2) + w(1) * b_matrix(1, 2)
+                      + w(2) * b_matrix(2, 2) + w(3) * b_matrix(3, 2)
+                      + w(4) * b_matrix(4, 2) + w(5) * b_matrix(5, 2)
+                      + w(6) * b_matrix(6, 2) + w(7) * b_matrix(7, 2)) * inverse_vol;
+
     return;
-    
 } // end function
 
-
 // -----------------------------------------------------------------------------
 // This function calculates the velocity gradient
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void get_velgrad2D(ViewCArrayKokkos <double> &vel_grad,
-                   const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                   const DViewCArrayKokkos <double> &node_vel,
-                   const ViewCArrayKokkos <double> &b_matrix,
-                   const double elem_vol,
-                   const double elem_area,
-                   const size_t elem_gid
-                   ){
-    
-
+void get_velgrad2D(ViewCArrayKokkos<double>&        vel_grad,
+                   const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                   const DViewCArrayKokkos<double>& node_vel,
+                   const ViewCArrayKokkos<double>&  b_matrix,
+                   const double                     elem_vol,
+                   const double                     elem_area,
+                   const size_t                     elem_gid
+                   )
+{
     const size_t num_nodes_in_elem = 4;
-    
-    double u_array[num_nodes_in_elem];
-    double v_array[num_nodes_in_elem];
-    ViewCArrayKokkos <double> u(u_array, num_nodes_in_elem); // x-dir vel component
-    ViewCArrayKokkos <double> v(v_array, num_nodes_in_elem); // y-dir vel component
-    
+
+    double                   u_array[num_nodes_in_elem];
+    double                   v_array[num_nodes_in_elem];
+    ViewCArrayKokkos<double> u(u_array, num_nodes_in_elem); // x-dir vel component
+    ViewCArrayKokkos<double> v(v_array, num_nodes_in_elem); // y-dir vel component
+
     // get the vertex velocities for the cell
-    for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-        
+    for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+    {
         // Get node gid
         size_t node_gid = elem_node_gids(node_lid);
 
         u(node_lid) = node_vel(1, node_gid, 0); // x-comp
         v(node_lid) = node_vel(1, node_gid, 1); // y-comp
-        
     } // end for
-    
+
     // initialize to zero
-    for (size_t i=0; i<3; i++){
-        for (size_t j=0; j<3; j++){
-            vel_grad(i,j) = 0.0;
+    for (size_t i = 0; i < 3; i++)
+    {
+        for (size_t j = 0; j < 3; j++)
+        {
+            vel_grad(i, j) = 0.0;
         }
     }
-    
 
-    double mean_radius = elem_vol/elem_area;
-    double elem_vel_r = 0.25*(v(0) + v(1) + v(2) + v(3));
-    
-    
+    double mean_radius = elem_vol / elem_area;
+    double elem_vel_r  = 0.25 * (v(0) + v(1) + v(2) + v(3));
+
     // --- calculate the velocity gradient terms ---
-    double inverse_area = 1.0/elem_area;
-    
+    double inverse_area = 1.0 / elem_area;
+
     // x-dir
-    vel_grad(0,0) = (u(0)*b_matrix(0,0) + u(1)*b_matrix(1,0)
-                   + u(2)*b_matrix(2,0) + u(3)*b_matrix(3,0))*inverse_area;
-    
-    vel_grad(0,1) = (u(0)*b_matrix(0,1) + u(1)*b_matrix(1,1)
-                   + u(2)*b_matrix(2,1) + u(3)*b_matrix(3,1))*inverse_area;
-    
-    
+    vel_grad(0, 0) = (u(0) * b_matrix(0, 0) + u(1) * b_matrix(1, 0)
+                      + u(2) * b_matrix(2, 0) + u(3) * b_matrix(3, 0)) * inverse_area;
+
+    vel_grad(0, 1) = (u(0) * b_matrix(0, 1) + u(1) * b_matrix(1, 1)
+                      + u(2) * b_matrix(2, 1) + u(3) * b_matrix(3, 1)) * inverse_area;
+
     // y-dir
-    vel_grad(1,0) = (v(0)*b_matrix(0,0) + v(1)*b_matrix(1,0)
-                   + v(2)*b_matrix(2,0) + v(3)*b_matrix(3,0))*inverse_area;
-    
-    vel_grad(1,1) = (v(0)*b_matrix(0,1) + v(1)*b_matrix(1,1)
-                   + v(2)*b_matrix(2,1) + v(3)*b_matrix(3,1))*inverse_area;
-    
-    vel_grad(2,2) = elem_vel_r/mean_radius;  // + avg(vel_R)/R
-    
-    return;
-    
-} // end function
+    vel_grad(1, 0) = (v(0) * b_matrix(0, 0) + v(1) * b_matrix(1, 0)
+                      + v(2) * b_matrix(2, 0) + v(3) * b_matrix(3, 0)) * inverse_area;
 
+    vel_grad(1, 1) = (v(0) * b_matrix(0, 1) + v(1) * b_matrix(1, 1)
+                      + v(2) * b_matrix(2, 1) + v(3) * b_matrix(3, 1)) * inverse_area;
 
+    vel_grad(2, 2) = elem_vel_r / mean_radius;  // + avg(vel_R)/R
 
+    return;
+} // end function
 
 // -----------------------------------------------------------------------------
 // This subroutine to calculate the velocity divergence in all elements
-//------------------------------------------------------------------------------
-void get_divergence(DViewCArrayKokkos <double> &elem_div,
-                    const mesh_t mesh,
-                    const DViewCArrayKokkos <double> &node_coords,
-                    const DViewCArrayKokkos <double> &node_vel,
-                    const DViewCArrayKokkos <double> &elem_vol
-                    ){
-    
+// ------------------------------------------------------------------------------
+void get_divergence(DViewCArrayKokkos<double>&       elem_div,
+                    const mesh_t                     mesh,
+                    const DViewCArrayKokkos<double>& node_coords,
+                    const DViewCArrayKokkos<double>& node_vel,
+                    const DViewCArrayKokkos<double>& elem_vol
+                    )
+{
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-    
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         const size_t num_nodes_in_elem = 8;
         const size_t num_dims = 3;
-        
+
         double u_array[num_nodes_in_elem];
         double v_array[num_nodes_in_elem];
         double w_array[num_nodes_in_elem];
-        ViewCArrayKokkos <double> u(u_array, num_nodes_in_elem); // x-dir vel component
-        ViewCArrayKokkos <double> v(v_array, num_nodes_in_elem); // y-dir vel component
-        ViewCArrayKokkos <double> w(w_array, num_nodes_in_elem); // z-dir vel component
-        
+        ViewCArrayKokkos<double> u(u_array, num_nodes_in_elem); // x-dir vel component
+        ViewCArrayKokkos<double> v(v_array, num_nodes_in_elem); // y-dir vel component
+        ViewCArrayKokkos<double> w(w_array, num_nodes_in_elem); // z-dir vel component
+
         // cut out the node_gids for this element
-        ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
-        
+        ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 8);
+
         // The b_matrix are the outward corner area normals
         double b_matrix_array[24];
-        ViewCArrayKokkos <double> b_matrix(b_matrix_array, num_nodes_in_elem, num_dims);
+        ViewCArrayKokkos<double> b_matrix(b_matrix_array, num_nodes_in_elem, num_dims);
         get_bmatrix(b_matrix,
                     elem_gid,
                     node_coords,
                     elem_node_gids);
-        
+
         // get the vertex velocities for the elem
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-        
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             // Get node gid
             size_t node_gid = elem_node_gids(node_lid);
-    
+
             u(node_lid) = node_vel(1, node_gid, 0);
             v(node_lid) = node_vel(1, node_gid, 1);
             w(node_lid) = node_vel(1, node_gid, 2);
-        
         } // end for
-    
-        
+
         // --- calculate the velocity divergence terms ---
-        double inverse_vol = 1.0/elem_vol(elem_gid);
-        
+        double inverse_vol = 1.0 / elem_vol(elem_gid);
+
         elem_div(elem_gid) = 0.0;
-        
+
         // x-dir
-        elem_div(elem_gid) += (u(0)*b_matrix(0,0) + u(1)*b_matrix(1,0)
-                             + u(2)*b_matrix(2,0) + u(3)*b_matrix(3,0)
-                             + u(4)*b_matrix(4,0) + u(5)*b_matrix(5,0)
-                             + u(6)*b_matrix(6,0) + u(7)*b_matrix(7,0))*inverse_vol;
-        
+        elem_div(elem_gid) += (u(0) * b_matrix(0, 0) + u(1) * b_matrix(1, 0)
+                               + u(2) * b_matrix(2, 0) + u(3) * b_matrix(3, 0)
+                               + u(4) * b_matrix(4, 0) + u(5) * b_matrix(5, 0)
+                               + u(6) * b_matrix(6, 0) + u(7) * b_matrix(7, 0)) * inverse_vol;
+
         // y-dir
-        elem_div(elem_gid) += (v(0)*b_matrix(0,1) + v(1)*b_matrix(1,1)
-                             + v(2)*b_matrix(2,1) + v(3)*b_matrix(3,1)
-                             + v(4)*b_matrix(4,1) + v(5)*b_matrix(5,1)
-                             + v(6)*b_matrix(6,1) + v(7)*b_matrix(7,1))*inverse_vol;
-        
+        elem_div(elem_gid) += (v(0) * b_matrix(0, 1) + v(1) * b_matrix(1, 1)
+                               + v(2) * b_matrix(2, 1) + v(3) * b_matrix(3, 1)
+                               + v(4) * b_matrix(4, 1) + v(5) * b_matrix(5, 1)
+                               + v(6) * b_matrix(6, 1) + v(7) * b_matrix(7, 1)) * inverse_vol;
+
         // z-dir
-        elem_div(elem_gid) += (w(0)*b_matrix(0,2) + w(1)*b_matrix(1,2)
-                             + w(2)*b_matrix(2,2) + w(3)*b_matrix(3,2)
-                             + w(4)*b_matrix(4,2) + w(5)*b_matrix(5,2)
-                             + w(6)*b_matrix(6,2) + w(7)*b_matrix(7,2))*inverse_vol;
-        
+        elem_div(elem_gid) += (w(0) * b_matrix(0, 2) + w(1) * b_matrix(1, 2)
+                               + w(2) * b_matrix(2, 2) + w(3) * b_matrix(3, 2)
+                               + w(4) * b_matrix(4, 2) + w(5) * b_matrix(5, 2)
+                               + w(6) * b_matrix(6, 2) + w(7) * b_matrix(7, 2)) * inverse_vol;
     });  // end parallel for over elem_gid
-    
+
     return;
-    
 } // end subroutine
 
-
 // -----------------------------------------------------------------------------
 // This subroutine to calculate the velocity divergence in all elements
-//------------------------------------------------------------------------------
-void get_divergence2D(DViewCArrayKokkos <double> &elem_div,
-                      const mesh_t mesh,
-                      const DViewCArrayKokkos <double> &node_coords,
-                      const DViewCArrayKokkos <double> &node_vel,
-                      const DViewCArrayKokkos <double> &elem_vol
-                      ){
-    
+// ------------------------------------------------------------------------------
+void get_divergence2D(DViewCArrayKokkos<double>&       elem_div,
+                      const mesh_t                     mesh,
+                      const DViewCArrayKokkos<double>& node_coords,
+                      const DViewCArrayKokkos<double>& node_vel,
+                      const DViewCArrayKokkos<double>& elem_vol
+                      )
+{
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-    
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         const size_t num_nodes_in_elem = 4;
         const size_t num_dims = 2;
-        
+
         double u_array[num_nodes_in_elem];
         double v_array[num_nodes_in_elem];
-        ViewCArrayKokkos <double> u(u_array, num_nodes_in_elem); // x-dir vel component
-        ViewCArrayKokkos <double> v(v_array, num_nodes_in_elem); // y-dir vel component
-        
-        
+        ViewCArrayKokkos<double> u(u_array, num_nodes_in_elem); // x-dir vel component
+        ViewCArrayKokkos<double> v(v_array, num_nodes_in_elem); // y-dir vel component
+
         // true volume RZ
-        //double r_array[num_nodes_in_elem];
-        //ViewCArrayKokkos <double> r(r_array, num_nodes_in_elem); // r-dir coordinate
-        
-        
+        // double r_array[num_nodes_in_elem];
+        // ViewCArrayKokkos <double> r(r_array, num_nodes_in_elem); // r-dir coordinate
+
         // cut out the node_gids for this element
-        ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
-        
+        ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
+
         // The b_matrix are the outward corner area normals
         double b_matrix_array[24];
-        ViewCArrayKokkos <double> b_matrix(b_matrix_array, num_nodes_in_elem, num_dims);
+        ViewCArrayKokkos<double> b_matrix(b_matrix_array, num_nodes_in_elem, num_dims);
         get_bmatrix2D(b_matrix,
                       elem_gid,
                       node_coords,
                       elem_node_gids);
-        
+
         // calculate the area of the quad
         double elem_area = get_area_quad(elem_gid, node_coords, elem_node_gids);
         // true volume uses the elem_vol
-        
-        
+
         // get the vertex velocities and node coordinate for the elem
-        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-        
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
             // Get node gid
             size_t node_gid = elem_node_gids(node_lid);
-    
+
             u(node_lid) = node_vel(1, node_gid, 0);
             v(node_lid) = node_vel(1, node_gid, 1);
-            
-            //r(node_lid) = node_coords(1, node_gid, 1); // true volume RZ
-            
+
+            // r(node_lid) = node_coords(1, node_gid, 1); // true volume RZ
         } // end for
-    
-        
+
         // --- calculate the velocity divergence terms ---
-        double inverse_area = 1.0/elem_area;
-        
-        double mean_radius = elem_vol(elem_gid)/elem_area;
-        double elem_vel_r = 0.25*(v(0) + v(1) + v(2) + v(3));
-        
-        
+        double inverse_area = 1.0 / elem_area;
+
+        double mean_radius = elem_vol(elem_gid) / elem_area;
+        double elem_vel_r  = 0.25 * (v(0) + v(1) + v(2) + v(3));
+
         elem_div(elem_gid) = 0.0;
-        
+
         // x-dir
-        elem_div(elem_gid) += (u(0)*b_matrix(0,0)
-                             + u(1)*b_matrix(1,0)
-                             + u(2)*b_matrix(2,0)
-                             + u(3)*b_matrix(3,0))*inverse_area;
-        
-        // y-dir (i.e., r direction)
-        elem_div(elem_gid) += (v(0)*b_matrix(0,1)
-                             + v(1)*b_matrix(1,1)
-                             + v(2)*b_matrix(2,1)
-                             + v(3)*b_matrix(3,1))*inverse_area
-                             + elem_vel_r/mean_radius;  // + avg(u_R)/R
+        elem_div(elem_gid) += (u(0) * b_matrix(0, 0)
+                               + u(1) * b_matrix(1, 0)
+                               + u(2) * b_matrix(2, 0)
+                               + u(3) * b_matrix(3, 0)) * inverse_area;
 
+        // y-dir (i.e., r direction)
+        elem_div(elem_gid) += (v(0) * b_matrix(0, 1)
+                               + v(1) * b_matrix(1, 1)
+                               + v(2) * b_matrix(2, 1)
+                               + v(3) * b_matrix(3, 1)) * inverse_area
+                              + elem_vel_r / mean_radius; // + avg(u_R)/R
     });  // end parallel for over elem_gid
-    
+
     return;
-    
 } // end subroutine
 
-
 // The velocity gradient can be decomposed into symmetric and antisymmetric tensors
 // L = vel_grad
 // D = sym(L)
 // W = antisym(L)
 KOKKOS_FUNCTION
-void decompose_vel_grad(ViewCArrayKokkos <double> &D_tensor,
-                        ViewCArrayKokkos <double> &W_tensor,
-                        const ViewCArrayKokkos <double> &vel_grad,
-                        const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                        const size_t elem_gid,
-                        const DViewCArrayKokkos <double> &node_coords,
-                        const DViewCArrayKokkos <double> &node_vel,
-                        const double vol
-                        ){
-    
-    
+void decompose_vel_grad(ViewCArrayKokkos<double>&        D_tensor,
+                        ViewCArrayKokkos<double>&        W_tensor,
+                        const ViewCArrayKokkos<double>&  vel_grad,
+                        const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                        const size_t                     elem_gid,
+                        const DViewCArrayKokkos<double>& node_coords,
+                        const DViewCArrayKokkos<double>& node_vel,
+                        const double                     vol
+                        )
+{
     // --- Calculate the velocity gradient ---
-    
+
     const size_t num_dims = 3;
-    
+
     // initialize to zero
-    for(size_t i=0; i<num_dims; i++){
-        for(size_t j=0; j<num_dims; j++){
-            D_tensor(i,j) = 0.0;
-            W_tensor(i,j) = 0.0;
+    for (size_t i = 0; i < num_dims; i++)
+    {
+        for (size_t j = 0; j < num_dims; j++)
+        {
+            D_tensor(i, j) = 0.0;
+            W_tensor(i, j) = 0.0;
         }
     } // end for
-    
-    for(size_t i=0; i<num_dims; i++){
-        for(size_t j=0; j<num_dims; j++){
-            D_tensor(i,j) = 0.5*(vel_grad(i,j) + vel_grad(j,i));
-            W_tensor(i,j) = 0.5*(vel_grad(i,j) - vel_grad(j,i));
+
+    for (size_t i = 0; i < num_dims; i++)
+    {
+        for (size_t j = 0; j < num_dims; j++)
+        {
+            D_tensor(i, j) = 0.5 * (vel_grad(i, j) + vel_grad(j, i));
+            W_tensor(i, j) = 0.5 * (vel_grad(i, j) - vel_grad(j, i));
         }
     } // end for
-    
+
     return;
-    
 } // end function to calculate D and W
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/outputs.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/outputs.cpp
old mode 100755
new mode 100644
index 8fa7b8738..8ef21e67c
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/outputs.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/outputs.cpp
@@ -4,25 +4,22 @@
 #include <cstring>
 #include <sys/stat.h>
 
-
-
-
-void write_outputs (const mesh_t &mesh,
-                    DViewCArrayKokkos <double> &node_coords,
-                    DViewCArrayKokkos <double> &node_vel,
-                    DViewCArrayKokkos <double> &node_mass,
-                    DViewCArrayKokkos <double> &elem_den,
-                    DViewCArrayKokkos <double> &elem_pres,
-                    DViewCArrayKokkos <double> &elem_stress,
-                    DViewCArrayKokkos <double> &elem_sspd,
-                    DViewCArrayKokkos <double> &elem_sie,
-                    DViewCArrayKokkos <double> &elem_vol,
-                    DViewCArrayKokkos <double> &elem_mass,
-                    DViewCArrayKokkos <size_t> &elem_mat_id,
-                    CArray <double> &graphics_times,
-                    size_t &graphics_id,
-                    const double time_value){
-    
+void write_outputs(const mesh_t&              mesh,
+                   DViewCArrayKokkos<double>& node_coords,
+                   DViewCArrayKokkos<double>& node_vel,
+                   DViewCArrayKokkos<double>& node_mass,
+                   DViewCArrayKokkos<double>& elem_den,
+                   DViewCArrayKokkos<double>& elem_pres,
+                   DViewCArrayKokkos<double>& elem_stress,
+                   DViewCArrayKokkos<double>& elem_sspd,
+                   DViewCArrayKokkos<double>& elem_sie,
+                   DViewCArrayKokkos<double>& elem_vol,
+                   DViewCArrayKokkos<double>& elem_mass,
+                   DViewCArrayKokkos<size_t>& elem_mat_id,
+                   CArray<double>&            graphics_times,
+                   size_t&                    graphics_id,
+                   const double               time_value)
+{
     // ---------------------------------------------------------------------
     //   t=tval ensight and state output
     // ---------------------------------------------------------------------
@@ -34,12 +31,12 @@ void write_outputs (const mesh_t &mesh,
     elem_vol.update_host();
     elem_mass.update_host();
     elem_mat_id.update_host();
-    
+
     node_coords.update_host();
     node_vel.update_host();
     node_mass.update_host();
     Kokkos::fence();
-    
+
     // write out state file
     state_file(mesh,
                node_coords,
@@ -54,8 +51,7 @@ void write_outputs (const mesh_t &mesh,
                elem_mass,
                elem_mat_id,
                time_value);
-    
-    
+
     // write out ensight file
     ensight(mesh,
             node_coords,
@@ -72,100 +68,97 @@ void write_outputs (const mesh_t &mesh,
             graphics_times,
             graphics_id,
             time_value);
-    
+
     return;
-    
 } // end of write outputs
 
-
 // -----------------------------------------------------------------------------
 // This function write outs the data to an ensight case file
-//------------------------------------------------------------------------------
-
-void ensight( const mesh_t &mesh,
-              const DViewCArrayKokkos <double> &node_coords,
-              const DViewCArrayKokkos <double> &node_vel,
-              const DViewCArrayKokkos <double> &node_mass,
-              const DViewCArrayKokkos <double> &elem_den,
-              const DViewCArrayKokkos <double> &elem_pres,
-              const DViewCArrayKokkos <double> &elem_stress,
-              const DViewCArrayKokkos <double> &elem_sspd,
-              const DViewCArrayKokkos <double> &elem_sie,
-              const DViewCArrayKokkos <double> &elem_vol,
-              const DViewCArrayKokkos <double> &elem_mass,
-              const DViewCArrayKokkos <size_t> &elem_mat_id,
-              CArray <double> &graphics_times,
-              size_t &graphics_id,
-              const double time_value) {
-
+// ------------------------------------------------------------------------------
+
+void ensight(const mesh_t&                    mesh,
+             const DViewCArrayKokkos<double>& node_coords,
+             const DViewCArrayKokkos<double>& node_vel,
+             const DViewCArrayKokkos<double>& node_mass,
+             const DViewCArrayKokkos<double>& elem_den,
+             const DViewCArrayKokkos<double>& elem_pres,
+             const DViewCArrayKokkos<double>& elem_stress,
+             const DViewCArrayKokkos<double>& elem_sspd,
+             const DViewCArrayKokkos<double>& elem_sie,
+             const DViewCArrayKokkos<double>& elem_vol,
+             const DViewCArrayKokkos<double>& elem_mass,
+             const DViewCArrayKokkos<size_t>& elem_mat_id,
+             CArray<double>&                  graphics_times,
+             size_t&                          graphics_id,
+             const double                     time_value)
+{
     const int num_scalar_vars = 9;
-    const int num_vec_vars = 2;
-
+    const int num_vec_vars    = 2;
 
     std::string name_tmp;
     name_tmp = "Outputs_SGH";
 
-    char * name = new char [name_tmp.length()+1];
-    std::strcpy (name, name_tmp.c_str());
-
+    char* name = new char [name_tmp.length() + 1];
+    std::strcpy(name, name_tmp.c_str());
 
     const char scalar_var_names[num_scalar_vars][15] = {
-        "den" ,"pres","sie","vol", "mass", "sspd", "speed", "mat_id", "elem_switch"
+        "den", "pres", "sie", "vol", "mass", "sspd", "speed", "mat_id", "elem_switch"
     };
 
     const char vec_var_names[num_vec_vars][15] = {
         "pos", "vel"
     };
-    
+
     // short hand
     const size_t num_nodes = mesh.num_nodes;
     const size_t num_elems = mesh.num_elems;
     const size_t num_dims  = mesh.num_dims;
 
     // save the cell state to an array for exporting to graphics files
-    auto elem_fields = CArray <double> (num_elems, num_scalar_vars);
-    int elem_switch = 1;
-
+    auto elem_fields = CArray<double>(num_elems, num_scalar_vars);
+    int  elem_switch = 1;
 
-    DCArrayKokkos <double> speed(num_elems);
+    DCArrayKokkos<double> speed(num_elems);
     FOR_ALL(elem_gid, 0, num_elems, {
-            
         double elem_vel[3]; // note:initialization with a list won't work
         elem_vel[0] = 0.0;
         elem_vel[1] = 0.0;
         elem_vel[2] = 0.0;
         // get the coordinates of the element center
-        for (int node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++){
+        for (int node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++)
+        {
             elem_vel[0] += node_vel(1, mesh.nodes_in_elem(elem_gid, node_lid), 0);
             elem_vel[1] += node_vel(1, mesh.nodes_in_elem(elem_gid, node_lid), 1);
-            if (mesh.num_dims == 3){
+            if (mesh.num_dims == 3)
+            {
                 elem_vel[2] += node_vel(1, mesh.nodes_in_elem(elem_gid, node_lid), 2);
             }
-            else {
+            else
+            {
                 elem_vel[2] = 0.0;
             }
         } // end loop over nodes in element
-        elem_vel[0] = elem_vel[0]/mesh.num_nodes_in_elem;
-        elem_vel[1] = elem_vel[1]/mesh.num_nodes_in_elem;
-        elem_vel[2] = elem_vel[2]/mesh.num_nodes_in_elem;
+        elem_vel[0] = elem_vel[0] / mesh.num_nodes_in_elem;
+        elem_vel[1] = elem_vel[1] / mesh.num_nodes_in_elem;
+        elem_vel[2] = elem_vel[2] / mesh.num_nodes_in_elem;
 
         double speed_sqrd = 0.0;
-        for (int dim=0; dim<num_dims; dim++){
-           speed_sqrd += elem_vel[dim]*elem_vel[dim];
+        for (int dim = 0; dim < num_dims; dim++)
+        {
+            speed_sqrd += elem_vel[dim] * elem_vel[dim];
         }
         speed(elem_gid) = sqrt(speed_sqrd);
     }); // end parallel for
     speed.update_host();
 
-
     // save the output scale fields to a single 2D array
-    for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++){
-        
+    for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++)
+    {
         // save outputs
-        elem_fields(elem_gid, 0) = elem_den.host(elem_gid);   
-        elem_fields(elem_gid, 1) = elem_pres.host(elem_gid); 
-        elem_fields(elem_gid, 2) = elem_sie.host(1, elem_gid);  
-        elem_fields(elem_gid, 3) = elem_vol.host(elem_gid); 
+        elem_fields(elem_gid, 0) = elem_den.host(elem_gid);
+        elem_fields(elem_gid, 1) = elem_pres.host(elem_gid);
+        elem_fields(elem_gid, 2) = elem_sie.host(1, elem_gid);
+        elem_fields(elem_gid, 3) = elem_vol.host(elem_gid);
         elem_fields(elem_gid, 4) = elem_mass.host(elem_gid);
         elem_fields(elem_gid, 5) = elem_sspd.host(elem_gid);
         elem_fields(elem_gid, 6) = speed.host(elem_gid);
@@ -173,316 +166,325 @@ void ensight( const mesh_t &mesh,
         elem_fields(elem_gid, 8) = (double)elem_switch;
         elem_switch *= -1;
     } // end for elements
-    
-    
+
     // save the vertex vector fields to an array for exporting to graphics files
-    CArray <double> vec_fields(num_nodes, num_vec_vars, 3);
+    CArray<double> vec_fields(num_nodes, num_vec_vars, 3);
 
-    for (size_t node_gid = 0; node_gid < num_nodes; node_gid++){
-        
+    for (size_t node_gid = 0; node_gid < num_nodes; node_gid++)
+    {
         // position, var 0
-        vec_fields(node_gid,0,0) = node_coords.host(1, node_gid, 0); 
-        vec_fields(node_gid,0,1) = node_coords.host(1, node_gid, 1); 
-        if(num_dims == 2){ 
-            vec_fields(node_gid,0,2) = 0.0;
-        } else{
-            vec_fields(node_gid,0,2) = node_coords.host(1, node_gid, 2); 
+        vec_fields(node_gid, 0, 0) = node_coords.host(1, node_gid, 0);
+        vec_fields(node_gid, 0, 1) = node_coords.host(1, node_gid, 1);
+        if (num_dims == 2)
+        {
+            vec_fields(node_gid, 0, 2) = 0.0;
+        }
+        else
+        {
+            vec_fields(node_gid, 0, 2) = node_coords.host(1, node_gid, 2);
         }
-        
+
         // position, var 1
         vec_fields(node_gid, 1, 0) = node_vel.host(1, node_gid, 0);
         vec_fields(node_gid, 1, 1) = node_vel.host(1, node_gid, 1);
-        if(num_dims == 2){ 
-            vec_fields(node_gid,1,2) = 0.0;
-        } else{
+        if (num_dims == 2)
+        {
+            vec_fields(node_gid, 1, 2) = 0.0;
+        }
+        else
+        {
             vec_fields(node_gid, 1, 2) = node_vel.host(1, node_gid, 2);
         }
-
     } // end for loop over vertices
-     
-    
+
     //  ---------------------------------------------------------------------------
     //  Setup of file and directoring for exporting
     //  ---------------------------------------------------------------------------
-    FILE *out[20];   // the output files that are written to
-    char filename[128];
-    
+    FILE* out[20];   // the output files that are written to
+    char  filename[128];
+
     struct stat st;
-    
-    if(stat("ensight",&st) != 0)
+
+    if (stat("ensight", &st) != 0)
+    {
         system("mkdir ensight");
-    
-    if(stat("ensight/data",&st) != 0)
+    }
+
+    if (stat("ensight/data", &st) != 0)
+    {
         system("mkdir ensight/data");
-    
-    
+    }
+
     //  ---------------------------------------------------------------------------
     //  Write the Geometry file
     //  ---------------------------------------------------------------------------
     sprintf(filename, "ensight/data/%s.%05lu.geo", name, graphics_id);
     // filename has the full string
-    
+
     out[0] = fopen(filename, "w");
-    
-    
-    fprintf(out[0],"A graphics dump by Fierro \n");
-    fprintf(out[0],"%s","EnSight Gold geometry\n");
-    fprintf(out[0],"%s","node id assign\n");
-    fprintf(out[0],"%s","element id assign\n");
-    
-    fprintf(out[0],"part\n");
-    fprintf(out[0],"%10d\n", 1);
-    fprintf(out[0],"Mesh\n");
-
-    
+
+    fprintf(out[0], "A graphics dump by Fierro \n");
+    fprintf(out[0], "%s", "EnSight Gold geometry\n");
+    fprintf(out[0], "%s", "node id assign\n");
+    fprintf(out[0], "%s", "element id assign\n");
+
+    fprintf(out[0], "part\n");
+    fprintf(out[0], "%10d\n", 1);
+    fprintf(out[0], "Mesh\n");
+
     // --- vertices ---
-    fprintf(out[0],"coordinates\n");
-    fprintf(out[0],"%10lu\n",num_nodes);
-    
+    fprintf(out[0], "coordinates\n");
+    fprintf(out[0], "%10lu\n", num_nodes);
+
     // write all components of the point coordinates
-    for (int node_gid = 0; node_gid < num_nodes; node_gid++){
-        fprintf(out[0],"%12.5e\n",node_coords.host(1, node_gid, 0));
+    for (int node_gid = 0; node_gid < num_nodes; node_gid++)
+    {
+        fprintf(out[0], "%12.5e\n", node_coords.host(1, node_gid, 0));
     }
-    
-    for (int node_gid = 0; node_gid < num_nodes; node_gid++){
-        fprintf(out[0],"%12.5e\n",node_coords.host(1, node_gid, 1));
+
+    for (int node_gid = 0; node_gid < num_nodes; node_gid++)
+    {
+        fprintf(out[0], "%12.5e\n", node_coords.host(1, node_gid, 1));
     }
-    
-    for (int node_gid = 0; node_gid < num_nodes; node_gid++){
-        if(num_dims==3){
-            fprintf(out[0],"%12.5e\n",node_coords.host(1, node_gid, 2));
+
+    for (int node_gid = 0; node_gid < num_nodes; node_gid++)
+    {
+        if (num_dims == 3)
+        {
+            fprintf(out[0], "%12.5e\n", node_coords.host(1, node_gid, 2));
         }
         else
         {
-            fprintf(out[0],"%12.5e\n",0.0);
+            fprintf(out[0], "%12.5e\n", 0.0);
         }
     }
-    
-    
+
     // --- elements ---
-    if(num_dims==3){
-        fprintf(out[0],"hexa8\n");
+    if (num_dims == 3)
+    {
+        fprintf(out[0], "hexa8\n");
     }
-    else {
-        fprintf(out[0],"quad4\n");
+    else
+    {
+        fprintf(out[0], "quad4\n");
     }
-    fprintf(out[0],"%10lu\n",num_elems);
+    fprintf(out[0], "%10lu\n", num_elems);
 
-    
     // write all global point numbers for this cell
-    for (int elem_gid = 0; elem_gid < num_elems; elem_gid++) {
-        for (int node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++){
-
-            fprintf(out[0],"%10lu\t",mesh.nodes_in_elem.host(elem_gid, node_lid) + 1); // note: node_gid starts at 1
+    for (int elem_gid = 0; elem_gid < num_elems; elem_gid++)
+    {
+        for (int node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++)
+        {
+            fprintf(out[0], "%10lu\t", mesh.nodes_in_elem.host(elem_gid, node_lid) + 1); // note: node_gid starts at 1
         }
-        fprintf(out[0],"\n");
+        fprintf(out[0], "\n");
     }
-    
+
     fclose(out[0]);
-    
- 
+
     // ---------------------------------------------------------------------------
     // Write the Scalar variable files
     // ---------------------------------------------------------------------------
-    
+
     // ensight_vars = (den, pres,...)
-    for (int var=0; var<num_scalar_vars; var++){
-        
+    for (int var = 0; var < num_scalar_vars; var++)
+    {
         // write a scalar value
-        sprintf(filename,"ensight/data/%s.%05lu.%s", name, graphics_id, scalar_var_names[var]);
-
-        out[0]=fopen(filename,"w");
-        
-        fprintf(out[0],"Per_elem scalar values\n");
-        fprintf(out[0],"part\n");
-        fprintf(out[0],"%10d\n",1);
-        if(num_dims==3){
-            fprintf(out[0],"hexa8\n");
+        sprintf(filename, "ensight/data/%s.%05lu.%s", name, graphics_id, scalar_var_names[var]);
+
+        out[0] = fopen(filename, "w");
+
+        fprintf(out[0], "Per_elem scalar values\n");
+        fprintf(out[0], "part\n");
+        fprintf(out[0], "%10d\n", 1);
+        if (num_dims == 3)
+        {
+            fprintf(out[0], "hexa8\n");
         }
-        else {
-            fprintf(out[0],"quad4\n");
+        else
+        {
+            fprintf(out[0], "quad4\n");
         }
-        
-        for (int elem_id=0; elem_id<num_elems; elem_id++) {
-            fprintf(out[0],"%12.5e\n",elem_fields(elem_id, var));
+
+        for (int elem_id = 0; elem_id < num_elems; elem_id++)
+        {
+            fprintf(out[0], "%12.5e\n", elem_fields(elem_id, var));
         }
-        
+
         fclose(out[0]);
-        
     } // end for var
-    
-    
-    
+
     //  ---------------------------------------------------------------------------
     //  Write the Vector variable files
     //  ---------------------------------------------------------------------------
-    
+
     // ensight vector vars = (position, velocity, force)
-    for (int var=0; var<num_vec_vars; var++){
-        
-        sprintf(filename,"ensight/data/%s.%05lu.%s", name, graphics_id, vec_var_names[var]);
-        
-        out[0]=fopen(filename,"w");
+    for (int var = 0; var < num_vec_vars; var++)
+    {
+        sprintf(filename, "ensight/data/%s.%05lu.%s", name, graphics_id, vec_var_names[var]);
+
+        out[0] = fopen(filename, "w");
         // fprintf(out[0],"Per_node vector values\n");
         // fprintf(out[0],"part\n");
         // fprintf(out[0],"%10d \n",1);
         // fprintf(out[0],"hexa8\n"); // WARNING, maybe bug here?
 
-        fprintf(out[0],"Per_node vector values\n");
-        fprintf(out[0],"part\n");
-        fprintf(out[0],"%10d\n",1);
-        fprintf(out[0],"block\n");
-        
-        for (int node_gid=0; node_gid<num_nodes; node_gid++){
-            fprintf(out[0],"%12.5e\n",vec_fields(node_gid, var, 0));
+        fprintf(out[0], "Per_node vector values\n");
+        fprintf(out[0], "part\n");
+        fprintf(out[0], "%10d\n", 1);
+        fprintf(out[0], "block\n");
+
+        for (int node_gid = 0; node_gid < num_nodes; node_gid++)
+        {
+            fprintf(out[0], "%12.5e\n", vec_fields(node_gid, var, 0));
         }
-        
-        for (int node_gid=0; node_gid<num_nodes; node_gid++){
-            fprintf(out[0],"%12.5e\n",vec_fields(node_gid, var, 1));
+
+        for (int node_gid = 0; node_gid < num_nodes; node_gid++)
+        {
+            fprintf(out[0], "%12.5e\n", vec_fields(node_gid, var, 1));
         }
-        
-        for (int node_gid=0; node_gid<num_nodes; node_gid++){
-            fprintf(out[0],"%12.5e\n",vec_fields(node_gid, var, 2));
+
+        for (int node_gid = 0; node_gid < num_nodes; node_gid++)
+        {
+            fprintf(out[0], "%12.5e\n", vec_fields(node_gid, var, 2));
         }
-        
-        fclose(out[0]);
 
+        fclose(out[0]);
     } // end for var
-    
 
-    
     // ---------------------------------------------------------------------------
     // Write the case file
     // ---------------------------------------------------------------------------
-    
-    sprintf(filename,"ensight/%s.case",name);
-    out[0]=fopen(filename,"w");
-    
-    fprintf(out[0],"FORMAT\n");
-    fprintf(out[0],"type: ensight gold\n");
-    fprintf(out[0],"GEOMETRY\n");
-    
-    sprintf(filename,"model: data/%s.*****.geo\n",name);
-    fprintf(out[0],"%s",filename);
-    fprintf(out[0],"VARIABLE\n");
-    
-    for (int var=0; var<num_scalar_vars; var++){
+
+    sprintf(filename, "ensight/%s.case", name);
+    out[0] = fopen(filename, "w");
+
+    fprintf(out[0], "FORMAT\n");
+    fprintf(out[0], "type: ensight gold\n");
+    fprintf(out[0], "GEOMETRY\n");
+
+    sprintf(filename, "model: data/%s.*****.geo\n", name);
+    fprintf(out[0], "%s", filename);
+    fprintf(out[0], "VARIABLE\n");
+
+    for (int var = 0; var < num_scalar_vars; var++)
+    {
         sprintf(filename,
                 "scalar per element: %s data/%s.*****.%s\n",
                 scalar_var_names[var], name, scalar_var_names[var]);
-        fprintf(out[0],"%s",filename);
+        fprintf(out[0], "%s", filename);
     }
-    
-    for (int var=0; var<num_vec_vars; var++){
+
+    for (int var = 0; var < num_vec_vars; var++)
+    {
         sprintf(filename,
                 "vector per node: %s data/%s.*****.%s\n",
                 vec_var_names[var], name, vec_var_names[var]);
-        fprintf(out[0],"%s",filename);
+        fprintf(out[0], "%s", filename);
     }
-    
-    fprintf(out[0],"TIME\n");
-    fprintf(out[0],"time set: 1\n");
-    fprintf(out[0],"number of steps: %4lu\n",graphics_id+1);
-    fprintf(out[0],"filename start number: 0\n");
-    fprintf(out[0],"filename increment: 1\n");
-    fprintf(out[0],"time values: \n");
-    
-    graphics_times(graphics_id)=time_value;
-    
-    for (int i=0;i<=graphics_id;i++) {
-        fprintf(out[0],"%12.5e\n",graphics_times(i));
+
+    fprintf(out[0], "TIME\n");
+    fprintf(out[0], "time set: 1\n");
+    fprintf(out[0], "number of steps: %4lu\n", graphics_id + 1);
+    fprintf(out[0], "filename start number: 0\n");
+    fprintf(out[0], "filename increment: 1\n");
+    fprintf(out[0], "time values: \n");
+
+    graphics_times(graphics_id) = time_value;
+
+    for (int i = 0; i <= graphics_id; i++)
+    {
+        fprintf(out[0], "%12.5e\n", graphics_times(i));
     }
     fclose(out[0]);
-    
-    
+
     // ---------------------------------------------------------------------------
     // Done writing the graphics dump
     // ---------------------------------------------------------------------------
-    
+
     // increment graphics id counter
     graphics_id++;
-    
+
     delete[] name;
-    
+
     return;
-    
 } // end of Ensight function
 
-
-
-void state_file( const mesh_t &mesh,
-                 const DViewCArrayKokkos <double> &node_coords,
-                 const DViewCArrayKokkos <double> &node_vel,
-                 const DViewCArrayKokkos <double> &node_mass,
-                 const DViewCArrayKokkos <double> &elem_den,
-                 const DViewCArrayKokkos <double> &elem_pres,
-                 const DViewCArrayKokkos <double> &elem_stress,
-                 const DViewCArrayKokkos <double> &elem_sspd,
-                 const DViewCArrayKokkos <double> &elem_sie,
-                 const DViewCArrayKokkos <double> &elem_vol,
-                 const DViewCArrayKokkos <double> &elem_mass,
-                 const DViewCArrayKokkos <size_t> &elem_mat_id,
-                 const double time_value ) {
-    
+void state_file(const mesh_t&                    mesh,
+                const DViewCArrayKokkos<double>& node_coords,
+                const DViewCArrayKokkos<double>& node_vel,
+                const DViewCArrayKokkos<double>& node_mass,
+                const DViewCArrayKokkos<double>& elem_den,
+                const DViewCArrayKokkos<double>& elem_pres,
+                const DViewCArrayKokkos<double>& elem_stress,
+                const DViewCArrayKokkos<double>& elem_sspd,
+                const DViewCArrayKokkos<double>& elem_sie,
+                const DViewCArrayKokkos<double>& elem_vol,
+                const DViewCArrayKokkos<double>& elem_mass,
+                const DViewCArrayKokkos<size_t>& elem_mat_id,
+                const double                     time_value)
+{
     struct stat st;
-    
-    if(stat("state",&st) != 0)
+
+    if (stat("state", &st) != 0)
+    {
         system("mkdir state");
-    
+    }
+
     size_t num_dims = mesh.num_dims;
-    
+
     //  ---------------------------------------------------------------------------
     //  Setup of file and directoring for exporting
     //  ---------------------------------------------------------------------------
-    
+
     // output file
-    FILE *out_elem_state;  //element average state
-    char filename[128];
-    
+    FILE* out_elem_state;  // element average state
+    char  filename[128];
+
     sprintf(filename, "state/elem_state_t%6.5e.txt", time_value);
-    
+
     // output files
-    out_elem_state  = fopen(filename, "w");
+    out_elem_state = fopen(filename, "w");
 
     // write state dump
     fprintf(out_elem_state, "# state dump file\n");
     fprintf(out_elem_state, "# x  y  z  radius_2D  radius_3D  den  pres  sie  sspd  vol  mass \n");
-    
 
-    
     // write out values for the elem
-    for (size_t elem_gid=0; elem_gid<mesh.num_elems; elem_gid++){
-        
+    for (size_t elem_gid = 0; elem_gid < mesh.num_elems; elem_gid++)
+    {
         double elem_coords[3];
         elem_coords[0] = 0.0;
         elem_coords[1] = 0.0;
         elem_coords[2] = 0.0;
-	
-	
+
         // get the coordinates of the element center
-        for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++){
+        for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++)
+        {
             elem_coords[0] += node_coords.host(1, mesh.nodes_in_elem.host(elem_gid, node_lid), 0);
             elem_coords[1] += node_coords.host(1, mesh.nodes_in_elem.host(elem_gid, node_lid), 1);
-            if(num_dims == 3){
+            if (num_dims == 3)
+            {
                 elem_coords[2] += node_coords.host(1, mesh.nodes_in_elem.host(elem_gid, node_lid), 2);
             }
-            else {
+            else
+            {
                 elem_coords[2] = 0.0;
             }
         } // end loop over nodes in element
-	
-        elem_coords[0] = elem_coords[0]/mesh.num_nodes_in_elem;
-        elem_coords[1] = elem_coords[1]/mesh.num_nodes_in_elem;
-        elem_coords[2] = elem_coords[2]/mesh.num_nodes_in_elem;
-        
-        double rad2 = sqrt(elem_coords[0]*elem_coords[0] +
-                           elem_coords[1]*elem_coords[1]);
-        
-        double rad3 = sqrt(elem_coords[0]*elem_coords[0] +
-                           elem_coords[1]*elem_coords[1] +
-                           elem_coords[2]*elem_coords[2]);
-        
-        fprintf( out_elem_state,"%f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t \n",
+
+        elem_coords[0] = elem_coords[0] / mesh.num_nodes_in_elem;
+        elem_coords[1] = elem_coords[1] / mesh.num_nodes_in_elem;
+        elem_coords[2] = elem_coords[2] / mesh.num_nodes_in_elem;
+
+        double rad2 = sqrt(elem_coords[0] * elem_coords[0] +
+                           elem_coords[1] * elem_coords[1]);
+
+        double rad3 = sqrt(elem_coords[0] * elem_coords[0] +
+                           elem_coords[1] * elem_coords[1] +
+                           elem_coords[2] * elem_coords[2]);
+
+        fprintf(out_elem_state, "%f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t %f\t \n",
                  elem_coords[0],
                  elem_coords[1],
                  elem_coords[2],
@@ -490,24 +492,14 @@ void state_file( const mesh_t &mesh,
                  rad3,
                  elem_den.host(elem_gid),
                  elem_pres.host(elem_gid),
-                 elem_sie.host(1,elem_gid),
+                 elem_sie.host(1, elem_gid),
                  elem_sspd.host(elem_gid),
                  elem_vol.host(elem_gid),
                  elem_mass.host(elem_gid) );
-	
-        
-    }; // end for
-    
-    
+    }
+    ;  // end for
+
     fclose(out_elem_state);
- 
+
     return;
-    
 }; // end of state output
-
-
-
-
-
-
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/properties.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/properties.cpp
index 1f4325f82..d6252cad0 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/properties.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/properties.cpp
@@ -1,71 +1,64 @@
-                                                           
+
 // -----------------------------------------------------------------------------
 // This calls the models to update state
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include "state.h"
 #include "mesh.h"
 
-
-void update_state(const CArrayKokkos <material_t> &material,
-                  const mesh_t &mesh,
-                  const DViewCArrayKokkos <double> &node_coords,
-                  const DViewCArrayKokkos <double> &node_vel,
-                  DViewCArrayKokkos <double> &elem_den,
-                  DViewCArrayKokkos <double> &elem_pres,
-                  DViewCArrayKokkos <double> &elem_stress,
-                  DViewCArrayKokkos <double> &elem_sspd,
-                  const DViewCArrayKokkos <double> &elem_sie,
-                  const DViewCArrayKokkos <double> &elem_vol,
-                  const DViewCArrayKokkos <double> &elem_mass,
-                  const DViewCArrayKokkos <size_t> &elem_mat_id,
-                  const DViewCArrayKokkos <double> &elem_statev,
-                  const double dt,
-                  const double rk_alpha
-                  ){
-
-
-    
-    
+void update_state(const CArrayKokkos<material_t>&  material,
+                  const mesh_t&                    mesh,
+                  const DViewCArrayKokkos<double>& node_coords,
+                  const DViewCArrayKokkos<double>& node_vel,
+                  DViewCArrayKokkos<double>&       elem_den,
+                  DViewCArrayKokkos<double>&       elem_pres,
+                  DViewCArrayKokkos<double>&       elem_stress,
+                  DViewCArrayKokkos<double>&       elem_sspd,
+                  const DViewCArrayKokkos<double>& elem_sie,
+                  const DViewCArrayKokkos<double>& elem_vol,
+                  const DViewCArrayKokkos<double>& elem_mass,
+                  const DViewCArrayKokkos<size_t>& elem_mat_id,
+                  const DViewCArrayKokkos<double>& elem_statev,
+                  const double                     dt,
+                  const double                     rk_alpha
+                  )
+{
     // loop over all the elements in the mesh
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-        
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         const size_t num_dims = mesh.num_dims;
         const size_t num_nodes_in_elem = mesh.num_nodes_in_elem;
 
         // cut out the node_gids for this element
-        ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
-        
+        ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
+
         // --- Density ---
-        elem_den(elem_gid) = elem_mass(elem_gid)/elem_vol(elem_gid);
-        
+        elem_den(elem_gid) = elem_mass(elem_gid) / elem_vol(elem_gid);
+
         size_t mat_id = elem_mat_id(elem_gid);
-        
-        
+
         // --- Stress ---
         // hyper elastic plastic model
-        if(material(mat_id).strength_type == model::hyper){
-
+        if (material(mat_id).strength_type == model::hyper)
+        {
             // cut out the node_gids for this element
-            ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
-            
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
+
             // --- Density ---
-            elem_den(elem_gid) = elem_mass(elem_gid)/elem_vol(elem_gid);
-            
+            elem_den(elem_gid) = elem_mass(elem_gid) / elem_vol(elem_gid);
+
             // corner area normals
             double area_array[24];
-            ViewCArrayKokkos <double> area(area_array, num_nodes_in_elem, num_dims);
-            
+            ViewCArrayKokkos<double> area(area_array, num_nodes_in_elem, num_dims);
+
             // velocity gradient
             double vel_grad_array[9];
-            ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
-            
+            ViewCArrayKokkos<double> vel_grad(vel_grad_array, num_dims, num_dims);
+
             // get the B matrix which are the OUTWARD corner area normals
             get_bmatrix(area,
                         elem_gid,
                         node_coords,
                         elem_node_gids);
-        
-            
+
             // --- Calculate the velocity gradient ---
             get_velgrad(vel_grad,
                         elem_node_gids,
@@ -73,8 +66,7 @@ void update_state(const CArrayKokkos <material_t> &material,
                         area,
                         elem_vol(elem_gid),
                         elem_gid);
-            
-            
+
             // --- call strength model ---
             material(mat_id).strength_model(elem_pres,
                                             elem_stress,
@@ -91,10 +83,8 @@ void update_state(const CArrayKokkos <material_t> &material,
                                             elem_vol(elem_gid),
                                             dt,
                                             rk_alpha);
-            
         } // end logical on hyper strength model
-        
-        
+
         // --- Pressure ---
         material(mat_id).eos_model(elem_pres,
                                    elem_stress,
@@ -103,76 +93,67 @@ void update_state(const CArrayKokkos <material_t> &material,
                                    elem_statev,
                                    elem_sspd,
                                    elem_den(elem_gid),
-                                   elem_sie(1,elem_gid));
-        
-        
+                                   elem_sie(1, elem_gid));
     }); // end parallel for
     Kokkos::fence();
-    
+
     return;
-    
 } // end method to update state
 
-
-
-void update_state2D(const CArrayKokkos <material_t> &material,
-                    const mesh_t &mesh,
-                    const DViewCArrayKokkos <double> &node_coords,
-                    const DViewCArrayKokkos <double> &node_vel,
-                    DViewCArrayKokkos <double> &elem_den,
-                    DViewCArrayKokkos <double> &elem_pres,
-                    DViewCArrayKokkos <double> &elem_stress,
-                    DViewCArrayKokkos <double> &elem_sspd,
-                    const DViewCArrayKokkos <double> &elem_sie,
-                    const DViewCArrayKokkos <double> &elem_vol,
-                    const DViewCArrayKokkos <double> &elem_mass,
-                    const DViewCArrayKokkos <size_t> &elem_mat_id,
-                    const DViewCArrayKokkos <double> &elem_statev,
-                    const double dt,
-                    const double rk_alpha
-                    ){
-    
-    
+void update_state2D(const CArrayKokkos<material_t>&  material,
+                    const mesh_t&                    mesh,
+                    const DViewCArrayKokkos<double>& node_coords,
+                    const DViewCArrayKokkos<double>& node_vel,
+                    DViewCArrayKokkos<double>&       elem_den,
+                    DViewCArrayKokkos<double>&       elem_pres,
+                    DViewCArrayKokkos<double>&       elem_stress,
+                    DViewCArrayKokkos<double>&       elem_sspd,
+                    const DViewCArrayKokkos<double>& elem_sie,
+                    const DViewCArrayKokkos<double>& elem_vol,
+                    const DViewCArrayKokkos<double>& elem_mass,
+                    const DViewCArrayKokkos<size_t>& elem_mat_id,
+                    const DViewCArrayKokkos<double>& elem_statev,
+                    const double                     dt,
+                    const double                     rk_alpha
+                    )
+{
     // loop over all the elements in the mesh
-    FOR_ALL (elem_gid, 0, mesh.num_elems, {
-        
+    FOR_ALL(elem_gid, 0, mesh.num_elems, {
         const size_t num_dims = mesh.num_dims;
         const size_t num_nodes_in_elem = mesh.num_nodes_in_elem;
 
         // cut out the node_gids for this element
-        ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
-        
+        ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
+
         // --- Density ---
-        elem_den(elem_gid) = elem_mass(elem_gid)/elem_vol(elem_gid);
-        
+        elem_den(elem_gid) = elem_mass(elem_gid) / elem_vol(elem_gid);
+
         size_t mat_id = elem_mat_id(elem_gid);
-        
-        
+
         // --- Stress ---
         // hyper elastic plastic model
-        if(material(mat_id).strength_type == model::hyper){
-
+        if (material(mat_id).strength_type == model::hyper)
+        {
             // cut out the node_gids for this element
-            ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
-            
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), num_nodes_in_elem);
+
             // --- Density ---
-            elem_den(elem_gid) = elem_mass(elem_gid)/elem_vol(elem_gid);
-            
+            elem_den(elem_gid) = elem_mass(elem_gid) / elem_vol(elem_gid);
+
             // corner area normals
             double area_array[8];
-            ViewCArrayKokkos <double> area(area_array, num_nodes_in_elem, num_dims);
-            
+            ViewCArrayKokkos<double> area(area_array, num_nodes_in_elem, num_dims);
+
             // velocity gradient
             double vel_grad_array[4];
-            ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
-            
+            ViewCArrayKokkos<double> vel_grad(vel_grad_array, num_dims, num_dims);
+
             // get the B matrix which are the OUTWARD corner area normals
             get_bmatrix(area,
                         elem_gid,
                         node_coords,
                         elem_node_gids);
-        
-            
+
             // --- Calculate the velocity gradient ---
             get_velgrad(vel_grad,
                         elem_node_gids,
@@ -180,8 +161,7 @@ void update_state2D(const CArrayKokkos <material_t> &material,
                         area,
                         elem_vol(elem_gid),
                         elem_gid);
-            
-            
+
             // --- call strength model ---
             material(mat_id).strength_model(elem_pres,
                                             elem_stress,
@@ -198,10 +178,8 @@ void update_state2D(const CArrayKokkos <material_t> &material,
                                             elem_vol(elem_gid),
                                             dt,
                                             rk_alpha);
-            
         } // end logical on hyper strength model
-        
-        
+
         // --- Pressure ---
         material(mat_id).eos_model(elem_pres,
                                    elem_stress,
@@ -210,15 +188,9 @@ void update_state2D(const CArrayKokkos <material_t> &material,
                                    elem_statev,
                                    elem_sspd,
                                    elem_den(elem_gid),
-                                   elem_sie(1,elem_gid));
-        
-        
+                                   elem_sie(1, elem_gid));
     }); // end parallel for
     Kokkos::fence();
-    
+
     return;
-    
 } // end method to update state
-
-
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/read_mesh.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/read_mesh.cpp
index 0035dc96d..e313da41e 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/read_mesh.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/read_mesh.cpp
@@ -1,115 +1,115 @@
 // -----------------------------------------------------------------------------
 // This code reads the mesh in different formats
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include "mesh.h"
 #include "state.h"
 
-
-
 // -----------------------------------------------------------------------------
 // Reads an ensight .geo mesh file
-//------------------------------------------------------------------------------
-void read_mesh_ensight(char* MESH,
-                       mesh_t &mesh,
-                       node_t &node,
-                       elem_t &elem,
-                       corner_t &corner,
+// ------------------------------------------------------------------------------
+void read_mesh_ensight(char*        MESH,
+                       mesh_t&      mesh,
+                       node_t&      node,
+                       elem_t&      elem,
+                       corner_t&    corner,
                        const size_t num_dims,
-                       const size_t rk_num_bins){
-
+                       const size_t rk_num_bins)
+{
     const size_t rk_level = 0;
 
-	FILE *in;
-    char ch;
-    
-    
+    FILE* in;
+    char  ch;
+
     size_t num_nodes_in_elem = 1;
-    for (int dim=0; dim<num_dims; dim++){
+    for (int dim = 0; dim < num_dims; dim++)
+    {
         num_nodes_in_elem *= 2;
     }
 
+    // read the mesh    WARNING: assumes a .geo file
+    in = fopen(MESH, "r");
 
-    //read the mesh    WARNING: assumes a .geo file
-    in = fopen(MESH,"r");  
-    
-    //skip 8 lines
-    for (int j=1; j<=8;j++) {
-        int i=0;
-        while ((ch=(char)fgetc(in))!='\n') {
+    // skip 8 lines
+    for (int j = 1; j <= 8; j++)
+    {
+        int i = 0;
+        while ((ch = (char)fgetc(in)) != '\n') {
             i++;
-            //printf("%c",ch);
+            // printf("%c",ch);
         }
-        //printf("\n");
-    }  
-
-    
+        // printf("\n");
+    }
 
     // --- Read in the nodes in the mesh ---
-    
+
     size_t num_nodes = 0;
-    
-    fscanf(in,"%lu",&num_nodes);
-    printf("Num nodes read in %lu\n" , num_nodes);
-    
+
+    fscanf(in, "%lu", &num_nodes);
+    printf("Num nodes read in %lu\n", num_nodes);
+
     // intialize node variables
     mesh.initialize_nodes(num_nodes);
     node.initialize(rk_num_bins, num_nodes, num_dims);
-    
 
     // read the initial mesh coordinates
     // x-coords
-    for (int node_id = 0; node_id < mesh.num_nodes; node_id++) {
-        fscanf(in,"%le",&node.coords(rk_level,node_id, 0));
+    for (int node_id = 0; node_id < mesh.num_nodes; node_id++)
+    {
+        fscanf(in, "%le", &node.coords(rk_level, node_id, 0));
     }
 
     // y-coords
-    for (int node_id = 0; node_id < mesh.num_nodes; node_id++) {
-        fscanf(in,"%le",&node.coords(rk_level,node_id, 1));
-    }  
+    for (int node_id = 0; node_id < mesh.num_nodes; node_id++)
+    {
+        fscanf(in, "%le", &node.coords(rk_level, node_id, 1));
+    }
 
     // z-coords
-    for (int node_id = 0; node_id < mesh.num_nodes; node_id++) {
-        if(num_dims==3){
-            fscanf(in,"%le",&node.coords(rk_level,node_id, 2));
-        } else
+    for (int node_id = 0; node_id < mesh.num_nodes; node_id++)
+    {
+        if (num_dims == 3)
+        {
+            fscanf(in, "%le", &node.coords(rk_level, node_id, 2));
+        }
+        else
         {
             double dummy;
-            fscanf(in,"%le",&dummy);
-            
-            //printf("dummy = %le\n", dummy);
+            fscanf(in, "%le", &dummy);
+
+            // printf("dummy = %le\n", dummy);
         }
     } // end for
 
-    
     ch = (char)fgetc(in);
-    //printf("%c",ch);
+    // printf("%c",ch);
 
-    //skip 1 line
-    for (int j=1; j<=1; j++) {
-        int i=0;
-        while ((ch=(char)fgetc(in))!='\n') {
+    // skip 1 line
+    for (int j = 1; j <= 1; j++)
+    {
+        int i = 0;
+        while ((ch = (char)fgetc(in)) != '\n') {
             i++;
-            //printf("%c",ch);
+            // printf("%c",ch);
         }
-        //printf("\n");
+        // printf("\n");
     }
-    
 
     // --- read in the elements in the mesh ---
     size_t num_elem = 0;
-    
-    fscanf(in,"%lu",&num_elem);
-    printf("Num elements read in %lu\n" , num_elem);
+
+    fscanf(in, "%lu", &num_elem);
+    printf("Num elements read in %lu\n", num_elem);
 
     // intialize elem variables
     mesh.initialize_elems(num_elem, num_dims);
     elem.initialize(rk_num_bins, num_nodes, 3); // always 3D here, even for 2D
 
     // for each cell read the list of associated nodes
-    for (int elem_gid = 0; elem_gid < num_elem; elem_gid++) {
-        for (int node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
-            
-            fscanf(in,"%lu",&mesh.nodes_in_elem.host(elem_gid, node_lid));  // %d vs zu
+    for (int elem_gid = 0; elem_gid < num_elem; elem_gid++)
+    {
+        for (int node_lid = 0; node_lid < num_nodes_in_elem; node_lid++)
+        {
+            fscanf(in, "%lu", &mesh.nodes_in_elem.host(elem_gid, node_lid));  // %d vs zu
 
             // shift to start node index space at 0
             mesh.nodes_in_elem.host(elem_gid, node_lid) -= 1;
@@ -117,31 +117,26 @@ void read_mesh_ensight(char* MESH,
     }
     // update device side
     mesh.nodes_in_elem.update_device();
-    
-    
+
     // intialize corner variables
-    int num_corners = num_elem*mesh.num_nodes_in_elem;
+    int num_corners = num_elem * mesh.num_nodes_in_elem;
     mesh.initialize_corners(num_corners);
     corner.initialize(num_corners, num_dims);
 
-    
     // save the node coords to the current RK value
-    for (size_t node_gid=0; node_gid<num_nodes; node_gid++){
-        
-        for(int rk=1; rk<rk_num_bins; rk++){
-            for (int dim = 0; dim < num_dims; dim++){
+    for (size_t node_gid = 0; node_gid < num_nodes; node_gid++)
+    {
+        for (int rk = 1; rk < rk_num_bins; rk++)
+        {
+            for (int dim = 0; dim < num_dims; dim++)
+            {
                 node.coords(rk, node_gid, dim) = node.coords(0, node_gid, dim);
             } // end for dim
         } // end for rk
-        
     } // end parallel for
 
-
-
-
     // Close mesh input file
     fclose(in);
 
     return;
-    
 }
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/setup.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/setup.cpp
index b4e85a936..955f350ba 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/setup.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/setup.cpp
@@ -1,7 +1,7 @@
-                                                           
+
 // -----------------------------------------------------------------------------
 // This code to setup the ICs and BCs
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 
 #include <string.h>
 #include <sys/stat.h>
@@ -15,182 +15,168 @@
 #include <vector>
 #include <algorithm>
 
-
 #include "matar.h"
 #include "state.h"
 #include "mesh.h"
 
-
-
-//==============================================================================
+// ==============================================================================
 //   Functions to read voxel mesh
-//==============================================================================
-void user_voxel_init(DCArrayKokkos <size_t> &elem_values,
-                     double &dx,
-                     double &dy,
-                     double &dz,
-                     double &orig_x,
-                     double &orig_y,
-                     double &orig_z,
-                     size_t &voxel_num_i,
-                     size_t &voxel_num_j,
-                     size_t &voxel_num_k);
+// ==============================================================================
+void user_voxel_init(DCArrayKokkos<size_t>& elem_values,
+                     double&                dx,
+                     double&                dy,
+                     double&                dz,
+                     double&                orig_x,
+                     double&                orig_y,
+                     double&                orig_z,
+                     size_t&                voxel_num_i,
+                     size_t&                voxel_num_j,
+                     size_t&                voxel_num_k);
 
 // for string delimiter parsing
-std::vector<std::string> split (std::string s, std::string delimiter);
+std::vector<std::string> split(std::string s, std::string delimiter);
 
 // retrieves multiple values between [ ]
 std::vector<double> extract_list(std::string str);
 
 const std::string WHITESPACE = " ";
 
-std::string ltrim(const std::string &s);
- 
-std::string rtrim(const std::string &s);
- 
-std::string trim(const std::string &s);
+std::string ltrim(const std::string& s);
 
-KOKKOS_FUNCTION
-int get_id(int i, int j, int k, int num_i, int num_j);
+std::string rtrim(const std::string& s);
 
+std::string trim(const std::string& s);
 
+KOKKOS_FUNCTION
+int get_id(int i, int j, int k, int num_i, int num_j);
 
-void setup(const CArrayKokkos <material_t> &material,
-           const CArrayKokkos <mat_fill_t> &mat_fill,
-           const CArrayKokkos <boundary_t> &boundary,
-           mesh_t &mesh,
-           const DViewCArrayKokkos <double> &node_coords,
-           DViewCArrayKokkos <double> &node_vel,
-           DViewCArrayKokkos <double> &node_mass,
-           const DViewCArrayKokkos <double> &elem_den,
-           const DViewCArrayKokkos <double> &elem_pres,
-           const DViewCArrayKokkos <double> &elem_stress,
-           const DViewCArrayKokkos <double> &elem_sspd,
-           const DViewCArrayKokkos <double> &elem_sie,
-           const DViewCArrayKokkos <double> &elem_vol,
-           const DViewCArrayKokkos <double> &elem_mass,
-           const DViewCArrayKokkos <size_t> &elem_mat_id,
-           const DViewCArrayKokkos <double> &elem_statev,
-           const CArrayKokkos <double> &state_vars,
-           const DViewCArrayKokkos <double> &corner_mass,
-           const size_t num_fills,
-           const size_t rk_num_bins,
-           const size_t num_bcs,
-           const size_t num_materials,
-           const size_t num_state_vars
-           ){
-
-    
-    //--- calculate bdy sets ---//
+void setup(const CArrayKokkos<material_t>&  material,
+           const CArrayKokkos<mat_fill_t>&  mat_fill,
+           const CArrayKokkos<boundary_t>&  boundary,
+           mesh_t&                          mesh,
+           const DViewCArrayKokkos<double>& node_coords,
+           DViewCArrayKokkos<double>&       node_vel,
+           DViewCArrayKokkos<double>&       node_mass,
+           const DViewCArrayKokkos<double>& elem_den,
+           const DViewCArrayKokkos<double>& elem_pres,
+           const DViewCArrayKokkos<double>& elem_stress,
+           const DViewCArrayKokkos<double>& elem_sspd,
+           const DViewCArrayKokkos<double>& elem_sie,
+           const DViewCArrayKokkos<double>& elem_vol,
+           const DViewCArrayKokkos<double>& elem_mass,
+           const DViewCArrayKokkos<size_t>& elem_mat_id,
+           const DViewCArrayKokkos<double>& elem_statev,
+           const CArrayKokkos<double>&      state_vars,
+           const DViewCArrayKokkos<double>& corner_mass,
+           const size_t                     num_fills,
+           const size_t                     rk_num_bins,
+           const size_t                     num_bcs,
+           const size_t                     num_materials,
+           const size_t                     num_state_vars
+           )
+{
+    // --- calculate bdy sets ---//
     mesh.init_bdy_sets(num_bcs);
     printf("Num BC's = %lu\n", num_bcs);
-    
+
     // tag boundary patches in the set
     tag_bdys(boundary, mesh, node_coords);
 
     build_boundry_node_sets(boundary, mesh);
-    
+
     // loop over BCs
-    for (size_t this_bdy = 0; this_bdy < num_bcs; this_bdy++){
-        
+    for (size_t this_bdy = 0; this_bdy < num_bcs; this_bdy++)
+    {
         RUN({
             printf("Boundary Condition number %lu \n", this_bdy);
             printf("  Num bdy patches in this set = %lu \n", mesh.bdy_patches_in_set.stride(this_bdy));
             printf("  Num bdy nodes in this set = %lu \n", mesh.bdy_nodes_in_set.stride(this_bdy));
         });
         Kokkos::fence();
+    } // end for
 
-    }// end for
-    
-    
     // ---- Read model values from a file ----
     // check to see if state_vars come from an external file
-    DCArrayKokkos <size_t> read_from_file(num_materials);
+    DCArrayKokkos<size_t> read_from_file(num_materials);
     FOR_ALL(mat_id, 0, num_materials, {
-        
         read_from_file(mat_id) = material(mat_id).strength_setup;
-        
     }); // end parallel for
     Kokkos::fence();
-    
+
     read_from_file.update_host(); // copy to CPU if code is to read from a file
     Kokkos::fence();
-    
+
     // make memory to store state_vars from an external file
-       DCArrayKokkos <size_t> mat_num_state_vars(num_materials); // actual number of state_vars
+    DCArrayKokkos<size_t> mat_num_state_vars(num_materials);     // actual number of state_vars
     FOR_ALL(mat_id, 0, num_materials, {
-        
         mat_num_state_vars(mat_id) = material(mat_id).num_state_vars;
-        
     }); // end parallel for
     Kokkos::fence();
-    
+
     // copy actual number of state_vars to host
     mat_num_state_vars.update_host();
     Kokkos::fence();
-    
+
     // the state_vars from the file
-    DCArrayKokkos <double> file_state_vars(num_materials,mesh.num_elems,num_state_vars);
-    for (size_t mat_id=0; mat_id<num_materials; mat_id++){
-        
-        if (read_from_file.host(mat_id) == model_init::user_init){
-            
+    DCArrayKokkos<double> file_state_vars(num_materials, mesh.num_elems, num_state_vars);
+    for (size_t mat_id = 0; mat_id < num_materials; mat_id++)
+    {
+        if (read_from_file.host(mat_id) == model_init::user_init)
+        {
             size_t num_vars = mat_num_state_vars.host(mat_id);
-            
+
             user_model_init(file_state_vars,
                             num_vars,
                             mat_id,
                             mesh.num_elems);
-            
+
             // copy the values to the device
             file_state_vars.update_device();
             Kokkos::fence();
-            
         } // end if
-        
     } // end for
-    
 
     // for rading an external voxel mesh
-    DCArrayKokkos <size_t> voxel_elem_values;
-    double voxel_dx, voxel_dy, voxel_dz;           // voxel mesh resolution
-    double orig_x, orig_y, orig_z;                 // origin of voxel elem centers
-    size_t voxel_num_i, voxel_num_j, voxel_num_k;  // voxel elements in each direction
-    
+    DCArrayKokkos<size_t> voxel_elem_values;
+    double                voxel_dx, voxel_dy, voxel_dz; // voxel mesh resolution
+    double                orig_x, orig_y, orig_z;  // origin of voxel elem centers
+    size_t                voxel_num_i, voxel_num_j, voxel_num_k; // voxel elements in each direction
+
     // check to see if readVoxelFile
-    DCArrayKokkos <size_t> read_voxel_file(num_fills);
+    DCArrayKokkos<size_t> read_voxel_file(num_fills);
     FOR_ALL(f_id, 0, num_fills, {
-        if (mat_fill(f_id).volume == region::readVoxelFile){ 
-            read_voxel_file(f_id) = 1; 
-        }    
-        else {
-            read_voxel_file(f_id) = 0; 
-        }    
+        if (mat_fill(f_id).volume == region::readVoxelFile)
+        {
+            read_voxel_file(f_id) = 1;
+        }
+        else
+        {
+            read_voxel_file(f_id) = 0;
+        }
     }); // end parallel for
     read_voxel_file.update_host(); // copy to CPU if code is to read from a file
     Kokkos::fence();
-    
-    //--- apply the fill instructions over the Elements---//
-    
+
+    // --- apply the fill instructions over the Elements---//
+
     // loop over the fill instructures
-    for (int f_id = 0; f_id < num_fills; f_id++){
-        
-        
+    for (int f_id = 0; f_id < num_fills; f_id++)
+    {
         // voxel mesh setup
-        if (read_voxel_file.host(f_id) == 1){
+        if (read_voxel_file.host(f_id) == 1)
+        {
             // read voxel mesh
             user_voxel_init(voxel_elem_values,
                             voxel_dx, voxel_dy, voxel_dz,
                             orig_x, orig_y, orig_z,
                             voxel_num_i, voxel_num_j, voxel_num_k);
-            
+
             // copy values read from file to device
             voxel_elem_values.update_device();
         } // endif
-            
+
         // parallel loop over elements in mesh
         FOR_ALL(elem_gid, 0, mesh.num_elems, {
-
             const size_t rk_level = 1;
 
             // calculate the coordinates and radius of the element
@@ -200,133 +186,140 @@ void setup(const CArrayKokkos <material_t> &material,
             elem_coords[2] = 0.0;
 
             // get the coordinates of the element center
-            for (int node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++){
+            for (int node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++)
+            {
                 elem_coords[0] += node_coords(rk_level, mesh.nodes_in_elem(elem_gid, node_lid), 0);
                 elem_coords[1] += node_coords(rk_level, mesh.nodes_in_elem(elem_gid, node_lid), 1);
-                if (mesh.num_dims == 3){
+                if (mesh.num_dims == 3)
+                {
                     elem_coords[2] += node_coords(rk_level, mesh.nodes_in_elem(elem_gid, node_lid), 2);
-                } else
+                }
+                else
                 {
                     elem_coords[2] = 0.0;
                 }
             } // end loop over nodes in element
-            elem_coords[0] = elem_coords[0]/mesh.num_nodes_in_elem;
-            elem_coords[1] = elem_coords[1]/mesh.num_nodes_in_elem;
-            elem_coords[2] = elem_coords[2]/mesh.num_nodes_in_elem;
-                
-            
+            elem_coords[0] = elem_coords[0] / mesh.num_nodes_in_elem;
+            elem_coords[1] = elem_coords[1] / mesh.num_nodes_in_elem;
+            elem_coords[2] = elem_coords[2] / mesh.num_nodes_in_elem;
+
             // spherical radius
-            double radius = sqrt( elem_coords[0]*elem_coords[0] +
-                                  elem_coords[1]*elem_coords[1] +
-                                  elem_coords[2]*elem_coords[2] );
-                
+            double radius = sqrt(elem_coords[0] * elem_coords[0] +
+                                  elem_coords[1] * elem_coords[1] +
+                                  elem_coords[2] * elem_coords[2]);
+
             // cylinderical radius
-            double radius_cyl = sqrt( elem_coords[0]*elem_coords[0] +
-                                      elem_coords[1]*elem_coords[1] );   
-            
+            double radius_cyl = sqrt(elem_coords[0] * elem_coords[0] +
+                                      elem_coords[1] * elem_coords[1]);
+
             // default is not to fill the element
             size_t fill_this = 0;
-           
+
             // check to see if this element should be filled
-            switch(mat_fill(f_id).volume)
+            switch (mat_fill(f_id).volume)
             {
                 case region::global:
-                {
-                    fill_this = 1;
-                    break;
-                }
-                case region::box:
-                {
-                    if ( elem_coords[0] >= mat_fill(f_id).x1 && elem_coords[0] <= mat_fill(f_id).x2
-                      && elem_coords[1] >= mat_fill(f_id).y1 && elem_coords[1] <= mat_fill(f_id).y2
-                      && elem_coords[2] >= mat_fill(f_id).z1 && elem_coords[2] <= mat_fill(f_id).z2 )
+                    {
                         fill_this = 1;
-                    break;
-                }
+                        break;
+                    }
+                case region::box:
+                    {
+                        if (elem_coords[0] >= mat_fill(f_id).x1 && elem_coords[0] <= mat_fill(f_id).x2
+                            && elem_coords[1] >= mat_fill(f_id).y1 && elem_coords[1] <= mat_fill(f_id).y2
+                            && elem_coords[2] >= mat_fill(f_id).z1 && elem_coords[2] <= mat_fill(f_id).z2)
+                        {
+                            fill_this = 1;
+                        }
+                        break;
+                    }
                 case region::cylinder:
-                {
-                    if ( radius_cyl >= mat_fill(f_id).radius1
-                      && radius_cyl <= mat_fill(f_id).radius2 ) fill_this = 1;
-                    break;
-                }
+                    {
+                        if (radius_cyl >= mat_fill(f_id).radius1
+                            && radius_cyl <= mat_fill(f_id).radius2)
+                        {
+                            fill_this = 1;
+                        }
+                        break;
+                    }
                 case region::sphere:
-                {
-                    if ( radius >= mat_fill(f_id).radius1
-                      && radius <= mat_fill(f_id).radius2 ) fill_this = 1;
-                    break;
-                }
+                    {
+                        if (radius >= mat_fill(f_id).radius1
+                            && radius <= mat_fill(f_id).radius2)
+                        {
+                            fill_this = 1;
+                        }
+                        break;
+                    }
                 case region::readVoxelFile:
-                {
-                    fill_this = 0;  // default is no, don't fill it
-                    
-                    // find the closest element in the voxel mesh to this element
-                    double i0_real = (elem_coords[0] - orig_x)/(voxel_dx);
-                    double j0_real = (elem_coords[1] - orig_y)/(voxel_dy);
-                    double k0_real = (elem_coords[2] - orig_z)/(voxel_dz);
-                    
-                    int i0 = (int)i0_real;
-                    int j0 = (int)j0_real;
-                    int k0 = (int)k0_real;
-                    
-                    // look for the closest element in the voxel mesh
-                    int elem_id0 = get_id(i0,j0,k0,voxel_num_i,voxel_num_j);
-                    
-                    // if voxel mesh overlaps this mesh, then fill it if =1
-                    if (elem_id0 < voxel_elem_values.size() && elem_id0>=0){
-                        
-                        // voxel mesh elem values = 0 or 1
-                        fill_this = voxel_elem_values(elem_id0);  // values from file
-                        
-                    } // end if
-                    
-                } // end case
-                    
+                    {
+                        fill_this = 0; // default is no, don't fill it
+
+                        // find the closest element in the voxel mesh to this element
+                        double i0_real = (elem_coords[0] - orig_x) / (voxel_dx);
+                        double j0_real = (elem_coords[1] - orig_y) / (voxel_dy);
+                        double k0_real = (elem_coords[2] - orig_z) / (voxel_dz);
+
+                        int i0 = (int)i0_real;
+                        int j0 = (int)j0_real;
+                        int k0 = (int)k0_real;
+
+                        // look for the closest element in the voxel mesh
+                        int elem_id0 = get_id(i0, j0, k0, voxel_num_i, voxel_num_j);
+
+                        // if voxel mesh overlaps this mesh, then fill it if =1
+                        if (elem_id0 < voxel_elem_values.size() && elem_id0 >= 0)
+                        {
+                            // voxel mesh elem values = 0 or 1
+                            fill_this = voxel_elem_values(elem_id0); // values from file
+                        } // end if
+                    } // end case
             } // end of switch
 
-                 
             // paint the material state on the element
-            if (fill_this == 1){
-                    
+            if (fill_this == 1)
+            {
                 // density
                 elem_den(elem_gid) = mat_fill(f_id).den;
-                
+
                 // mass
-                elem_mass(elem_gid) = elem_den(elem_gid)*elem_vol(elem_gid);
-                
+                elem_mass(elem_gid) = elem_den(elem_gid) * elem_vol(elem_gid);
+
                 // specific internal energy
                 elem_sie(rk_level, elem_gid) = mat_fill(f_id).sie;
-		
+
                 elem_mat_id(elem_gid) = mat_fill(f_id).mat_id;
                 size_t mat_id = elem_mat_id(elem_gid); // short name
-                
+
                 // get state_vars from the input file or read them in
-                if (material(mat_id).strength_setup == model_init::user_init){
-                    
+                if (material(mat_id).strength_setup == model_init::user_init)
+                {
                     // use the values read from a file to get elem state vars
-                    for (size_t var=0; var<material(mat_id).num_state_vars; var++){
-                        elem_statev(elem_gid,var) = file_state_vars(mat_id,elem_gid,var);
+                    for (size_t var = 0; var < material(mat_id).num_state_vars; var++)
+                    {
+                        elem_statev(elem_gid, var) = file_state_vars(mat_id, elem_gid, var);
                     } // end for
-                    
                 }
-                else{
+                else
+                {
                     // use the values in the input file
                     // set state vars for the region where mat_id resides
-                    for (size_t var=0; var<material(mat_id).num_state_vars; var++){
-                        elem_statev(elem_gid,var) = state_vars(mat_id,var);
+                    for (size_t var = 0; var < material(mat_id).num_state_vars; var++)
+                    {
+                        elem_statev(elem_gid, var) = state_vars(mat_id, var);
                     } // end for
-                    
                 } // end logical on type
-                
+
                 // --- stress tensor ---
                 // always 3D even for 2D-RZ
-                for (size_t i=0; i<3; i++){
-                    for (size_t j=0; j<3; j++){
-                        elem_stress(rk_level,elem_gid,i,j) = 0.0;
-                    }        
+                for (size_t i = 0; i < 3; i++)
+                {
+                    for (size_t j = 0; j < 3; j++)
+                    {
+                        elem_stress(rk_level, elem_gid, i, j) = 0.0;
+                    }
                 }  // end for
-                
-                
-                
+
                 // --- Pressure and stress ---
                 material(mat_id).eos_model(elem_pres,
                                            elem_stress,
@@ -335,839 +328,774 @@ void setup(const CArrayKokkos <material_t> &material,
                                            elem_statev,
                                            elem_sspd,
                                            elem_den(elem_gid),
-                                           elem_sie(1,elem_gid));
-					    
-                
-                // loop over the nodes of this element and apply velocity
-                for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++){
+                                           elem_sie(1, elem_gid));
 
+                // loop over the nodes of this element and apply velocity
+                for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_elem; node_lid++)
+                {
                     // get the mesh node index
                     size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
 
-                
                     // --- Velocity ---
-                    switch(mat_fill(f_id).velocity)
+                    switch (mat_fill(f_id).velocity)
                     {
                         case init_conds::cartesian:
-                        {
-                        
-                            node_vel(rk_level, node_gid, 0) = mat_fill(f_id).u;
-                            node_vel(rk_level, node_gid, 1) = mat_fill(f_id).v;
-                            if (mesh.num_dims == 3) node_vel(rk_level, node_gid, 2) = mat_fill(f_id).w;
-                            
-                        
-                            break;
-                        }
-                        case init_conds::radial:
-                        {
-                            // Setting up cylindrical
-                            double dir[2]; 
-                            dir[0] = 0.0;
-                            dir[1] = 0.0;
-                            double radius_val = 0.0;
-                        
-                            for(int dim=0; dim<2; dim++){
-                                dir[dim] = node_coords(rk_level, node_gid, dim);
-                                radius_val += node_coords(rk_level, node_gid, dim)*node_coords(rk_level, node_gid, dim);
-                            } // end for
-                            radius_val = sqrt(radius_val);
-                        
-                            for(int dim=0; dim<2; dim++){
-                                if (radius_val > 1.0e-14){
-                                    dir[dim] /= (radius_val);
+                            {
+                                node_vel(rk_level, node_gid, 0) = mat_fill(f_id).u;
+                                node_vel(rk_level, node_gid, 1) = mat_fill(f_id).v;
+                                if (mesh.num_dims == 3)
+                                {
+                                    node_vel(rk_level, node_gid, 2) = mat_fill(f_id).w;
                                 }
-                                else{
-                                    dir[dim] = 0.0;
+
+                                break;
+                            }
+                        case init_conds::radial:
+                            {
+                                // Setting up cylindrical
+                                double dir[2];
+                                dir[0] = 0.0;
+                                dir[1] = 0.0;
+                                double radius_val = 0.0;
+
+                                for (int dim = 0; dim < 2; dim++)
+                                {
+                                    dir[dim]    = node_coords(rk_level, node_gid, dim);
+                                    radius_val += node_coords(rk_level, node_gid, dim) * node_coords(rk_level, node_gid, dim);
+                                } // end for
+                                radius_val = sqrt(radius_val);
+
+                                for (int dim = 0; dim < 2; dim++)
+                                {
+                                    if (radius_val > 1.0e-14)
+                                    {
+                                        dir[dim] /= (radius_val);
+                                    }
+                                    else
+                                    {
+                                        dir[dim] = 0.0;
+                                    }
+                                } // end for
+
+                                node_vel(rk_level, node_gid, 0) = mat_fill(f_id).speed * dir[0];
+                                node_vel(rk_level, node_gid, 1) = mat_fill(f_id).speed * dir[1];
+                                if (mesh.num_dims == 3)
+                                {
+                                    node_vel(rk_level, node_gid, 2) = 0.0;
                                 }
-                            } // end for
-                        
-                        
-                            node_vel(rk_level, node_gid, 0) = mat_fill(f_id).speed*dir[0];
-                            node_vel(rk_level, node_gid, 1) = mat_fill(f_id).speed*dir[1];
-                            if (mesh.num_dims == 3) node_vel(rk_level, node_gid, 2) = 0.0;
-                            
-                            break;
-                        }
+
+                                break;
+                            }
                         case init_conds::spherical:
-                        {
-                            
-                            // Setting up spherical
-                            double dir[3];
-                            dir[0] = 0.0;
-                            dir[1] = 0.0;
-                            dir[2] = 0.0;
-                            double radius_val = 0.0;
-                        
-                            for(int dim=0; dim<3; dim++){
-                                dir[dim] = node_coords(rk_level, node_gid, dim);
-                                radius_val += node_coords(rk_level, node_gid, dim)*node_coords(rk_level, node_gid, dim);
-                            } // end for
-                            radius_val = sqrt(radius_val);
-                        
-                            for(int dim=0; dim<3; dim++){
-                                if (radius_val > 1.0e-14){
-                                    dir[dim] /= (radius_val);
-                                }
-                                else{
-                                    dir[dim] = 0.0;
+                            {
+                                // Setting up spherical
+                                double dir[3];
+                                dir[0] = 0.0;
+                                dir[1] = 0.0;
+                                dir[2] = 0.0;
+                                double radius_val = 0.0;
+
+                                for (int dim = 0; dim < 3; dim++)
+                                {
+                                    dir[dim]    = node_coords(rk_level, node_gid, dim);
+                                    radius_val += node_coords(rk_level, node_gid, dim) * node_coords(rk_level, node_gid, dim);
+                                } // end for
+                                radius_val = sqrt(radius_val);
+
+                                for (int dim = 0; dim < 3; dim++)
+                                {
+                                    if (radius_val > 1.0e-14)
+                                    {
+                                        dir[dim] /= (radius_val);
+                                    }
+                                    else
+                                    {
+                                        dir[dim] = 0.0;
+                                    }
+                                } // end for
+
+                                node_vel(rk_level, node_gid, 0) = mat_fill(f_id).speed * dir[0];
+                                node_vel(rk_level, node_gid, 1) = mat_fill(f_id).speed * dir[1];
+                                if (mesh.num_dims == 3)
+                                {
+                                    node_vel(rk_level, node_gid, 2) = mat_fill(f_id).speed * dir[2];
                                 }
-                            } // end for
-                        
-                            node_vel(rk_level, node_gid, 0) = mat_fill(f_id).speed*dir[0];
-                            node_vel(rk_level, node_gid, 1) = mat_fill(f_id).speed*dir[1];
-                            if (mesh.num_dims == 3) node_vel(rk_level, node_gid, 2) = mat_fill(f_id).speed*dir[2];
 
-                            break;
-                        }
+                                break;
+                            }
                         case init_conds::radial_linear:
-                        {
-                        
-                            break;
-                        }
+                            {
+                                break;
+                            }
                         case init_conds::spherical_linear:
-                        {
-                        
-                            break;
-                        }
+                            {
+                                break;
+                            }
                         case init_conds::tg_vortex:
-                        {
-                        
-                            node_vel(rk_level, node_gid, 0) = sin(PI * node_coords(rk_level,node_gid, 0)) * cos(PI * node_coords(rk_level,node_gid, 1)); 
-                            node_vel(rk_level, node_gid, 1) =  -1.0*cos(PI * node_coords(rk_level,node_gid, 0)) * sin(PI * node_coords(rk_level,node_gid, 1)); 
-                            if (mesh.num_dims == 3) node_vel(rk_level, node_gid, 2) = 0.0;
+                            {
+                                node_vel(rk_level, node_gid, 0) = sin(PI * node_coords(rk_level, node_gid, 0)) * cos(PI * node_coords(rk_level, node_gid, 1));
+                                node_vel(rk_level, node_gid, 1) =  -1.0 * cos(PI * node_coords(rk_level, node_gid, 0)) * sin(PI * node_coords(rk_level, node_gid, 1));
+                                if (mesh.num_dims == 3)
+                                {
+                                    node_vel(rk_level, node_gid, 2) = 0.0;
+                                }
 
-                            break;
-                        }
+                                break;
+                            }
                     } // end of switch
+                } // end loop over nodes of element
 
-                }// end loop over nodes of element
-                
-                
-                if(mat_fill(f_id).velocity == init_conds::tg_vortex)
+                if (mat_fill(f_id).velocity == init_conds::tg_vortex)
                 {
-                    elem_pres(elem_gid) = 0.25*( cos(2.0*PI*elem_coords[0]) + cos(2.0*PI*elem_coords[1]) ) + 1.0;
-                
+                    elem_pres(elem_gid) = 0.25 * (cos(2.0 * PI * elem_coords[0]) + cos(2.0 * PI * elem_coords[1]) ) + 1.0;
+
                     // p = rho*ie*(gamma - 1)
                     size_t mat_id = f_id;
-                    double gamma = elem_statev(elem_gid,4); // gamma value
+                    double gamma  = elem_statev(elem_gid, 4); // gamma value
                     elem_sie(rk_level, elem_gid) =
-                                    elem_pres(elem_gid)/(mat_fill(f_id).den*(gamma - 1.0));
+                        elem_pres(elem_gid) / (mat_fill(f_id).den * (gamma - 1.0));
                 } // end if
-
             } // end if fill
-          
         }); // end FOR_ALL element loop
         Kokkos::fence();
-        
-  
     } // end for loop over fills
-    
-   
-    
+
     // apply BC's to velocity
     boundary_velocity(mesh, boundary, node_vel, 0.0);
-    
-    
+
     // calculate the corner massess if 2D
-    if(mesh.num_dims==2){
-        
+    if (mesh.num_dims == 2)
+    {
         FOR_ALL(elem_gid, 0, mesh.num_elems, {
-            
             // facial area of the corners
             double corner_areas_array[4];
-            
-            ViewCArrayKokkos <double> corner_areas(&corner_areas_array[0],4);
-            ViewCArrayKokkos <size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
-            
+
+            ViewCArrayKokkos<double> corner_areas(&corner_areas_array[0], 4);
+            ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid, 0), 4);
+
             get_area_weights2D(corner_areas,
                                elem_gid,
                                node_coords,
                                elem_node_gids);
-            
+
             // loop over the corners of the element and calculate the mass
-            for (size_t corner_lid=0; corner_lid<4; corner_lid++){
-                
-                size_t corner_gid = mesh.corners_in_elem(elem_gid, corner_lid);
-                corner_mass(corner_gid) = corner_areas(corner_lid)*elem_den(elem_gid); // node radius is added later
-                
+            for (size_t corner_lid = 0; corner_lid < 4; corner_lid++)
+            {
+                size_t corner_gid       = mesh.corners_in_elem(elem_gid, corner_lid);
+                corner_mass(corner_gid) = corner_areas(corner_lid) * elem_den(elem_gid); // node radius is added later
             } // end for over corners
         });
-    
     } // end of
-    
-    
+
     // calculate the nodal mass
     FOR_ALL(node_gid, 0, mesh.num_nodes, {
-        
         node_mass(node_gid) = 0.0;
-        
-        if(mesh.num_dims==3){
-            
-            for(size_t elem_lid=0; elem_lid<mesh.num_corners_in_node(node_gid); elem_lid++){
-                size_t elem_gid = mesh.elems_in_node(node_gid,elem_lid);
-                node_mass(node_gid) += 1.0/8.0*elem_mass(elem_gid);
+
+        if (mesh.num_dims == 3)
+        {
+            for (size_t elem_lid = 0; elem_lid < mesh.num_corners_in_node(node_gid); elem_lid++)
+            {
+                size_t elem_gid      = mesh.elems_in_node(node_gid, elem_lid);
+                node_mass(node_gid) += 1.0 / 8.0 * elem_mass(elem_gid);
             } // end for elem_lid
-            
-        }// end if dims=3
-        else {
-            
+        } // end if dims=3
+        else
+        {
             // 2D-RZ
-            for(size_t corner_lid=0; corner_lid<mesh.num_corners_in_node(node_gid); corner_lid++){
-                
-                size_t corner_gid = mesh.corners_in_node(node_gid, corner_lid);
+            for (size_t corner_lid = 0; corner_lid < mesh.num_corners_in_node(node_gid); corner_lid++)
+            {
+                size_t corner_gid    = mesh.corners_in_node(node_gid, corner_lid);
                 node_mass(node_gid) += corner_mass(corner_gid);  // sans the radius so it is areal node mass
-                
-                corner_mass(corner_gid) *= node_coords(1,node_gid,1); // true corner mass now
+
+                corner_mass(corner_gid) *= node_coords(1, node_gid, 1); // true corner mass now
             } // end for elem_lid
-            
         } // end else
-        
     }); // end FOR_ALL
     Kokkos::fence();
-    
+
     return;
-    
 } // end of setup
 
-
-
 // set planes for tagging sub sets of boundary patches
 // bc_tag = 0 xplane, 1 yplane, 2 zplane, 3 cylinder, 4 is shell
 // val = plane value, cyl radius, sphere radius
-void tag_bdys(const CArrayKokkos <boundary_t> &boundary,
-              mesh_t &mesh,
-              const DViewCArrayKokkos <double> &node_coords){
-
+void tag_bdys(const CArrayKokkos<boundary_t>&  boundary,
+              mesh_t&                          mesh,
+              const DViewCArrayKokkos<double>& node_coords)
+{
     size_t num_dims = mesh.num_dims;
-    
-    //if (bdy_set == mesh.num_bdy_sets){
+
+    // if (bdy_set == mesh.num_bdy_sets){
     //    printf(" ERROR: number of boundary sets must be increased by %zu",
     //              bdy_set-mesh.num_bdy_sets+1);
     //    exit(0);
-    //} // end if
-    
+    // } // end if
+
     FOR_ALL(bdy_set, 0, mesh.num_bdy_sets, {
-        
         // tag boundaries
         int bc_tag_id = boundary(bdy_set).surface;
-        double val = boundary(bdy_set).value;
-        
+        double val    = boundary(bdy_set).value;
+
         // save the boundary patches to this set that are on the plane, spheres, etc.
-        for (size_t bdy_patch_lid=0; bdy_patch_lid<mesh.num_bdy_patches; bdy_patch_lid++){
-            
+        for (size_t bdy_patch_lid = 0; bdy_patch_lid < mesh.num_bdy_patches; bdy_patch_lid++)
+        {
             // save the patch index
             size_t bdy_patch_gid = mesh.bdy_patches(bdy_patch_lid);
-            
-            
+
             // check to see if this patch is on the specified plane
             size_t is_on_bdy = check_bdy(bdy_patch_gid,
                                          bc_tag_id,
                                          val,
                                          mesh,
                                          node_coords); // no=0, yes=1
-            
-            if (is_on_bdy == 1){
-                
+
+            if (is_on_bdy == 1)
+            {
                 size_t index = mesh.bdy_patches_in_set.stride(bdy_set);
-                
+
                 // increment the number of boundary patches saved
-                mesh.bdy_patches_in_set.stride(bdy_set) ++;
-                
-                
+                mesh.bdy_patches_in_set.stride(bdy_set)++;
+
                 mesh.bdy_patches_in_set(bdy_set, index) = bdy_patch_gid;
             } // end if
-            
-            
         } // end for bdy_patch
-        
     });  // end FOR_ALL bdy_sets
-   
+
     return;
 } // end tag
 
-
-
 // routine for checking to see if a vertex is on a boundary
 // bc_tag = 0 xplane, 1 yplane, 2 zplane, 3 cylinder, 4 is shell
 // val = plane value, radius, radius
 KOKKOS_FUNCTION
-size_t check_bdy(const size_t patch_gid,
-                 const int this_bc_tag,
-                 const double val,
-                 const mesh_t &mesh,
-                 const DViewCArrayKokkos <double> &node_coords){
-    
-    
+size_t check_bdy(const size_t                     patch_gid,
+                 const int                        this_bc_tag,
+                 const double                     val,
+                 const mesh_t&                    mesh,
+                 const DViewCArrayKokkos<double>& node_coords)
+{
     size_t num_dims = mesh.num_dims;
-    
+
     // default bool is not on the boundary
     size_t is_on_bdy = 0;
-    
+
     // the patch coordinates
     double these_patch_coords[3];  // Note: cannot allocated array with num_dims
-    
+
     // loop over the nodes on the patch
-    for (size_t patch_node_lid=0; patch_node_lid<mesh.num_nodes_in_patch; patch_node_lid++){
-        
+    for (size_t patch_node_lid = 0; patch_node_lid < mesh.num_nodes_in_patch; patch_node_lid++)
+    {
         // get the nodal_gid for this node in the patch
         size_t node_gid = mesh.nodes_in_patch(patch_gid, patch_node_lid);
-        
-        for (size_t dim = 0; dim < num_dims; dim++){
+
+        for (size_t dim = 0; dim < num_dims; dim++)
+        {
             these_patch_coords[dim] = node_coords(1, node_gid, dim);  // (rk, node_gid, dim)
         } // end for dim
-        
-        
+
         // a x-plane
-        if (this_bc_tag == 0){
-            
-            if ( fabs(these_patch_coords[0] - val) <= 1.0e-7 ) is_on_bdy += 1;
-            
-        }// end if on type
-        
+        if (this_bc_tag == 0)
+        {
+            if (fabs(these_patch_coords[0] - val) <= 1.0e-7)
+            {
+                is_on_bdy += 1;
+            }
+        } // end if on type
         // a y-plane
-        else if (this_bc_tag == 1){
-            
-            if ( fabs(these_patch_coords[1] - val) <= 1.0e-7 ) is_on_bdy += 1;
-            
-        }// end if on type
-        
+        else if (this_bc_tag == 1)
+        {
+            if (fabs(these_patch_coords[1] - val) <= 1.0e-7)
+            {
+                is_on_bdy += 1;
+            }
+        } // end if on type
         // a z-plane
-        else if (this_bc_tag == 2){
-            
-            if ( fabs(these_patch_coords[2] - val) <= 1.0e-7 ) is_on_bdy += 1;
-            
-        }// end if on type
-        
-        
+        else if (this_bc_tag == 2)
+        {
+            if (fabs(these_patch_coords[2] - val) <= 1.0e-7)
+            {
+                is_on_bdy += 1;
+            }
+        } // end if on type
         // cylinderical shell where radius = sqrt(x^2 + y^2)
-        else if (this_bc_tag == 3){
-            
-            real_t R = sqrt(these_patch_coords[0]*these_patch_coords[0] +
-                            these_patch_coords[1]*these_patch_coords[1]);
-            
-            if ( fabs(R - val) <= 1.0e-7 ) is_on_bdy += 1;
-            
-            
-        }// end if on type
-        
+        else if (this_bc_tag == 3)
+        {
+            real_t R = sqrt(these_patch_coords[0] * these_patch_coords[0] +
+                            these_patch_coords[1] * these_patch_coords[1]);
+
+            if (fabs(R - val) <= 1.0e-7)
+            {
+                is_on_bdy += 1;
+            }
+        } // end if on type
         // spherical shell where radius = sqrt(x^2 + y^2 + z^2)
-        else if (this_bc_tag == 4){
-            
-            real_t R = sqrt(these_patch_coords[0]*these_patch_coords[0] +
-                            these_patch_coords[1]*these_patch_coords[1] +
-                            these_patch_coords[2]*these_patch_coords[2]);
-            
-            if ( fabs(R - val) <= 1.0e-7 ) is_on_bdy += 1;
-            
+        else if (this_bc_tag == 4)
+        {
+            real_t R = sqrt(these_patch_coords[0] * these_patch_coords[0] +
+                            these_patch_coords[1] * these_patch_coords[1] +
+                            these_patch_coords[2] * these_patch_coords[2]);
+
+            if (fabs(R - val) <= 1.0e-7)
+            {
+                is_on_bdy += 1;
+            }
         } // end if on type
-        
     } // end for nodes in the patch
-    
+
     // if all nodes in the patch are on the surface
-    if (is_on_bdy == mesh.num_nodes_in_patch){
+    if (is_on_bdy == mesh.num_nodes_in_patch)
+    {
         is_on_bdy = 1;
     }
-    else {
+    else
+    {
         is_on_bdy = 0;
     }
-    
-    
+
     return is_on_bdy;
-    
 } // end method to check bdy
 
+void build_boundry_node_sets(const CArrayKokkos<boundary_t>& boundary,
+                             mesh_t&                         mesh)
+{
+    // build boundary nodes in each boundary set
 
+    mesh.num_bdy_nodes_in_set = DCArrayKokkos<size_t>(mesh.num_bdy_sets);
+    CArrayKokkos<long long int> temp_count_num_bdy_nodes_in_set(mesh.num_bdy_sets, mesh.num_nodes);
+
+    DynamicRaggedRightArrayKokkos<size_t> temp_nodes_in_set(mesh.num_bdy_sets, mesh.num_bdy_patches * mesh.num_nodes_in_patch);
 
-void build_boundry_node_sets(const CArrayKokkos <boundary_t> &boundary,
-                             mesh_t &mesh){
-    
-    
-    
-    // build boundary nodes in each boundary set
-    
-    mesh.num_bdy_nodes_in_set = DCArrayKokkos <size_t> (mesh.num_bdy_sets);
-    CArrayKokkos <long long int> temp_count_num_bdy_nodes_in_set(mesh.num_bdy_sets, mesh.num_nodes);
-    
-    DynamicRaggedRightArrayKokkos <size_t> temp_nodes_in_set (mesh.num_bdy_sets, mesh.num_bdy_patches*mesh.num_nodes_in_patch);
-    
-    
     // Parallel loop over boundary sets on device
     FOR_ALL(bdy_set, 0, mesh.num_bdy_sets, {
-	
         // finde the number of patches_in_set
         size_t num_bdy_patches_in_set = mesh.bdy_patches_in_set.stride(bdy_set);
-        
+
         // Loop over boundary patches in boundary set
-        for (size_t bdy_patch_gid = 0; bdy_patch_gid<num_bdy_patches_in_set; bdy_patch_gid++){
-            
-                // get the global id for this boundary patch
-                size_t patch_gid = mesh.bdy_patches_in_set(bdy_set, bdy_patch_gid);
-                
-                // apply boundary condition at nodes on boundary
-                for(size_t node_lid = 0; node_lid < mesh.num_nodes_in_patch; node_lid++){
-                    
-                    size_t node_gid = mesh.nodes_in_patch(patch_gid, node_lid);
-                    
-                    temp_count_num_bdy_nodes_in_set(bdy_set, node_gid) = -1;
-                        
-                } // end for node_lid
-            
+        for (size_t bdy_patch_gid = 0; bdy_patch_gid < num_bdy_patches_in_set; bdy_patch_gid++)
+        {
+            // get the global id for this boundary patch
+            size_t patch_gid = mesh.bdy_patches_in_set(bdy_set, bdy_patch_gid);
+
+            // apply boundary condition at nodes on boundary
+            for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_patch; node_lid++)
+            {
+                size_t node_gid = mesh.nodes_in_patch(patch_gid, node_lid);
+
+                temp_count_num_bdy_nodes_in_set(bdy_set, node_gid) = -1;
+            }     // end for node_lid
         } // end for bdy_patch_gid
-        
-        
+
         // Loop over boundary patches in boundary set
-        for (size_t bdy_patch_gid = 0; bdy_patch_gid<num_bdy_patches_in_set; bdy_patch_gid++){
-            
-                // get the global id for this boundary patch
-                size_t patch_gid = mesh.bdy_patches_in_set(bdy_set, bdy_patch_gid);
-                
-                // apply boundary condition at nodes on boundary
-                for(size_t node_lid = 0; node_lid < mesh.num_nodes_in_patch; node_lid++){
-                    
-                    size_t node_gid = mesh.nodes_in_patch(patch_gid, node_lid);
-                    
-                    if (temp_count_num_bdy_nodes_in_set(bdy_set, node_gid) == -1){
-                        
-                        size_t num_saved = mesh.num_bdy_nodes_in_set(bdy_set);
-                        
-                        mesh.num_bdy_nodes_in_set(bdy_set)++;
-                        
-                        // replace -1 with node_gid to denote the node was already saved
-                        temp_count_num_bdy_nodes_in_set(bdy_set, node_gid) = node_gid;
-                        
-                        // increment the number of saved nodes, create memory
-                        temp_nodes_in_set.stride(bdy_set)++;
-                        temp_nodes_in_set(bdy_set, num_saved) = node_gid;
-                        
-                    } // end if
-                    
-                } // end for node_lid
-            
+        for (size_t bdy_patch_gid = 0; bdy_patch_gid < num_bdy_patches_in_set; bdy_patch_gid++)
+        {
+            // get the global id for this boundary patch
+            size_t patch_gid = mesh.bdy_patches_in_set(bdy_set, bdy_patch_gid);
+
+            // apply boundary condition at nodes on boundary
+            for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_patch; node_lid++)
+            {
+                size_t node_gid = mesh.nodes_in_patch(patch_gid, node_lid);
+
+                if (temp_count_num_bdy_nodes_in_set(bdy_set, node_gid) == -1)
+                {
+                    size_t num_saved = mesh.num_bdy_nodes_in_set(bdy_set);
+
+                    mesh.num_bdy_nodes_in_set(bdy_set)++;
+
+                    // replace -1 with node_gid to denote the node was already saved
+                    temp_count_num_bdy_nodes_in_set(bdy_set, node_gid) = node_gid;
+
+                    // increment the number of saved nodes, create memory
+                    temp_nodes_in_set.stride(bdy_set)++;
+                    temp_nodes_in_set(bdy_set, num_saved) = node_gid;
+                }     // end if
+            }     // end for node_lid
         } // end for bdy_patch_gid
-        
     }); // end FOR_ALL bdy_set
     Kokkos::fence();
-    
-   
+
     // allocate the RaggedRight bdy_nodes_in_set array
-    mesh.bdy_nodes_in_set = RaggedRightArrayKokkos <size_t> (mesh.num_bdy_nodes_in_set);
+    mesh.bdy_nodes_in_set = RaggedRightArrayKokkos<size_t>(mesh.num_bdy_nodes_in_set);
 
-    FOR_ALL (bdy_set, 0, mesh.num_bdy_sets, {
-	
+    FOR_ALL(bdy_set, 0, mesh.num_bdy_sets, {
         // Loop over boundary patches in boundary set
-        for (size_t bdy_node_lid=0; bdy_node_lid<mesh.num_bdy_nodes_in_set(bdy_set); bdy_node_lid++){
-
+        for (size_t bdy_node_lid = 0; bdy_node_lid < mesh.num_bdy_nodes_in_set(bdy_set); bdy_node_lid++)
+        {
             // save the bdy_node_gid
             mesh.bdy_nodes_in_set(bdy_set, bdy_node_lid) = temp_nodes_in_set(bdy_set, bdy_node_lid);
-            
         } // end for
-        
     }); // end FOR_ALL bdy_set
-    
+
     // update the host side for the number nodes in a bdy_set
     mesh.num_bdy_nodes_in_set.update_host();
-    
+
     return;
 } // end method to build boundary nodes
 
-
-
-
-
 // -----------------------------------------------------------------------------
 // The function to read a voxel vtk file from Dream3d and intialize the mesh
-//------------------------------------------------------------------------------
-void user_voxel_init(DCArrayKokkos <size_t> &elem_values,
-                     double &dx,
-                     double &dy,
-                     double &dz,
-                     double &orig_x,
-                     double &orig_y,
-                     double &orig_z,
-                     size_t &num_elems_i,
-                     size_t &num_elems_j,
-                     size_t &num_elems_k) {
-
-    
+// ------------------------------------------------------------------------------
+void user_voxel_init(DCArrayKokkos<size_t>& elem_values,
+                     double&                dx,
+                     double&                dy,
+                     double&                dz,
+                     double&                orig_x,
+                     double&                orig_y,
+                     double&                orig_z,
+                     size_t&                num_elems_i,
+                     size_t&                num_elems_j,
+                     size_t&                num_elems_k)
+{
     std::string MESH = "voxel.vtk"; // user specified
-    
+
     std::ifstream in;  // FILE *in;
     in.open(MESH);
-    
+
     // check to see of a mesh was supplied when running the code
-    if (in){
+    if (in)
+    {
         printf("\nReading the 3D voxel mesh: ");
         std::cout << MESH << std::endl;
     }
-    else{
+    else
+    {
         std::cout << "\n\n**********************************\n\n";
         std::cout << " ERROR:\n";
         std::cout << " Voxel vtk input does not exist \n";
         std::cout << "**********************************\n\n" << std::endl;
         std::exit(EXIT_FAILURE);
     } // end if
-    
-    
 
     size_t i;           // used for writing information to file
     size_t point_id;    // the global id for the point
     size_t elem_id;     // the global id for the elem
     size_t this_point;   // a local id for a point in a elem (0:7 for a Hexahedral elem)
-    
+
     size_t num_points_i;
     size_t num_points_j;
     size_t num_points_k;
-    
+
     size_t num_dims = 3;
-    
 
     std::string token;
-    
+
     bool found = false;
-    
 
     // look for POINTS
     i = 0;
-    while (found==false) {
+    while (found == false) {
         std::string str;
         std::string delimiter = " ";
         std::getline(in, str);
-        std::vector<std::string> v = split (str, delimiter);
-        
+        std::vector<std::string> v = split(str, delimiter);
+
         // looking for the following text:
         //      POINTS %d float
-        if(v[0] == "DIMENSIONS"){
+        if (v[0] == "DIMENSIONS")
+        {
             num_points_i = std::stoi(v[1]);
             num_points_j = std::stoi(v[2]);
             num_points_k = std::stoi(v[3]);
             printf("Num voxel nodes read in = %d, %d, %d\n", num_points_i, num_points_j, num_points_k);
-            
-            found=true;
+
+            found = true;
         } // end if
-        
-        
-        if (i>1000){
+
+        if (i > 1000)
+        {
             printf("ERROR: Failed to find POINTS \n");
             break;
         } // end if
-        
+
         i++;
     } // end while
-    
-    found=false;
-    
-    int num_points = num_points_i*num_points_j*num_points_k;
-    CArray <double> pt_coords_x(num_points_i);
-    CArray <double> pt_coords_y(num_points_j);
-    CArray <double> pt_coords_z(num_points_k);
-    
-    
-    while (found==false) {
+
+    found = false;
+
+    int            num_points = num_points_i * num_points_j * num_points_k;
+    CArray<double> pt_coords_x(num_points_i);
+    CArray<double> pt_coords_y(num_points_j);
+    CArray<double> pt_coords_z(num_points_k);
+
+    while (found == false) {
         std::string str;
         std::string str0;
         std::string delimiter = " ";
         std::getline(in, str);
-        std::vector<std::string> v = split (str, delimiter);
-        
+        std::vector<std::string> v = split(str, delimiter);
+
         // looking for the following text:
-        if(v[0] == "X_COORDINATES"){
+        if (v[0] == "X_COORDINATES")
+        {
+            size_t num_saved = 0;
 
-            size_t num_saved =0;
-            
-            while (num_saved < num_points_i-1){
+            while (num_saved < num_points_i - 1) {
                 // get next line
                 std::getline(in, str0);
-                
+
                 // remove starting and trailing spaces
                 str = trim(str0);
-                std::vector<std::string> v_coords = split (str, delimiter);
-                
-                
+                std::vector<std::string> v_coords = split(str, delimiter);
+
                 // loop over the contents of the vector v_coords
-                for (size_t this_point=0; this_point<v_coords.size(); this_point++){
+                for (size_t this_point = 0; this_point < v_coords.size(); this_point++)
+                {
                     pt_coords_x(num_saved) = std::stod(v_coords[this_point]);
                     num_saved++;
                 } // end for
-                
             } // end while
-            
-            
-            found=true;
+
+            found = true;
         } // end if
-        
-        
-        if (i>1000){
+
+        if (i > 1000)
+        {
             printf("ERROR: Failed to find X_COORDINATES \n");
             break;
         } // end if
-        
+
         i++;
     } // end while
-    found=false;
-    
-    
-    while (found==false) {
+    found = false;
+
+    while (found == false) {
         std::string str;
         std::string str0;
         std::string delimiter = " ";
         std::getline(in, str);
-        std::vector<std::string> v = split (str, delimiter);
-        
+        std::vector<std::string> v = split(str, delimiter);
+
         // looking for the following text:
-        if(v[0] == "Y_COORDINATES"){
+        if (v[0] == "Y_COORDINATES")
+        {
+            size_t num_saved = 0;
 
-            size_t num_saved =0;
-            
-            while (num_saved < num_points_j-1){
+            while (num_saved < num_points_j - 1) {
                 // get next line
                 std::getline(in, str0);
-                
+
                 // remove starting and trailing spaces
                 str = trim(str0);
-                std::vector<std::string> v_coords = split (str, delimiter);
-                
+                std::vector<std::string> v_coords = split(str, delimiter);
+
                 // loop over the contents of the vector v_coords
-                for (size_t this_point=0; this_point<v_coords.size(); this_point++){
-                    
+                for (size_t this_point = 0; this_point < v_coords.size(); this_point++)
+                {
                     pt_coords_y(num_saved) = std::stod(v_coords[this_point]);
                     num_saved++;
-                    
                 } // end for
-                
             } // end while
-            
-            
-            found=true;
+
+            found = true;
         } // end if
-        
-        
-        if (i>1000){
+
+        if (i > 1000)
+        {
             printf("ERROR: Failed to find Y_COORDINATES \n");
             break;
         } // end if
-        
+
         i++;
     } // end while
-    found=false;
+    found = false;
 
-    
-    
-    while (found==false) {
+    while (found == false) {
         std::string str;
         std::string str0;
         std::string delimiter = " ";
         std::getline(in, str);
-        std::vector<std::string> v = split (str, delimiter);
-        
+        std::vector<std::string> v = split(str, delimiter);
+
         // looking for the following text:
-        if(v[0] == "Z_COORDINATES"){
+        if (v[0] == "Z_COORDINATES")
+        {
+            size_t num_saved = 0;
 
-            size_t num_saved =0;
-            
-            while (num_saved < num_points_k-1){
+            while (num_saved < num_points_k - 1) {
                 // get next line
                 std::getline(in, str0);
-                
+
                 // remove starting and trailing spaces
                 str = trim(str0);
-                std::vector<std::string> v_coords = split (str, delimiter);
-                
+                std::vector<std::string> v_coords = split(str, delimiter);
+
                 // loop over the contents of the vector v_coords
-                for (size_t this_point=0; this_point<v_coords.size(); this_point++){
-                    
+                for (size_t this_point = 0; this_point < v_coords.size(); this_point++)
+                {
                     pt_coords_z(num_saved) = std::stod(v_coords[this_point]);
                     num_saved++;
-                    
                 } // end for
-                
             } // end while
-            
-            
-            found=true;
+
+            found = true;
         } // end if
-        
-        
-        if (i>1000){
+
+        if (i > 1000)
+        {
             printf("ERROR: Failed to find Z_COORDINATES \n");
             break;
         } // end if
-        
+
         i++;
     } // end while
-    found=false;
-    
+    found = false;
 
-
-    
     size_t num_elems;
     num_elems_i = num_points_i - 1;
     num_elems_j = num_points_j - 1;
     num_elems_k = num_points_k - 1;
-    
-    
+
     // center to center distance between first and last elem along each edge
-    double Lx = (pt_coords_x(num_points_i-2) - pt_coords_x(0));
-    double Ly = (pt_coords_y(num_points_j-2) - pt_coords_y(0));
-    double Lz = (pt_coords_z(num_points_k-2) - pt_coords_z(0));
-    
+    double Lx = (pt_coords_x(num_points_i - 2) - pt_coords_x(0));
+    double Ly = (pt_coords_y(num_points_j - 2) - pt_coords_y(0));
+    double Lz = (pt_coords_z(num_points_k - 2) - pt_coords_z(0));
+
     // spacing between elems
-    dx = Lx/((double) num_elems_i);
-    dy = Ly/((double) num_elems_j);
-    dz = Lz/((double) num_elems_k);
-    
+    dx = Lx / ((double) num_elems_i);
+    dy = Ly / ((double) num_elems_j);
+    dz = Lz / ((double) num_elems_k);
+
     // element mesh origin
-    orig_x = 0.5*(pt_coords_x(0) + pt_coords_x(1)),
-    orig_y = 0.5*(pt_coords_y(0) + pt_coords_y(1)),
-    orig_z = 0.5*(pt_coords_z(0) + pt_coords_z(1)),
-    
+    orig_x = 0.5 * (pt_coords_x(0) + pt_coords_x(1)),
+    orig_y = 0.5 * (pt_coords_y(0) + pt_coords_y(1)),
+    orig_z = 0.5 * (pt_coords_z(0) + pt_coords_z(1)),
 
-    
     // look for CELLS
     i = 0;
-    while (found==false) {
+    while (found == false) {
         std::string str;
         std::getline(in, str);
-        
-        std::string delimiter = " ";
-        std::vector<std::string> v = split (str, delimiter);
-        
-        
+
+        std::string              delimiter = " ";
+        std::vector<std::string> v = split(str, delimiter);
+
         // looking for the following text:
         //      CELLS num_elems size
-        if(v[0] == "CELL_DATA"){
+        if (v[0] == "CELL_DATA")
+        {
             num_elems = std::stoi(v[1]);
             printf("Num voxel elements read in %d\n", num_elems);
-            
-            found=true;
+
+            found = true;
         } // end if
-        
-        if (i>1000){
+
+        if (i > 1000)
+        {
             printf("ERROR: Failed to find CELL_DATA \n");
             break;
         } // end if
-        
+
         i++;
     } // end while
-    found=false;
-
+    found = false;
 
-    
     // allocate memory for element voxel values
-    elem_values = DCArrayKokkos <size_t> (num_elems);
-    
+    elem_values = DCArrayKokkos<size_t>(num_elems);
+
     // reading the cell data
-    while (found==false) {
+    while (found == false) {
         std::string str;
         std::string str0;
-        
+
         std::string delimiter = " ";
         std::getline(in, str);
-        std::vector<std::string> v = split (str, delimiter);
-        
+        std::vector<std::string> v = split(str, delimiter);
+
         // looking for the following text:
-        if(v[0] == "LOOKUP_TABLE"){
+        if (v[0] == "LOOKUP_TABLE")
+        {
+            size_t num_saved = 0;
 
-            size_t num_saved =0;
-            
-            while (num_saved < num_elems-1){
+            while (num_saved < num_elems - 1) {
                 // get next line
                 std::getline(in, str0);
-                
+
                 // remove starting and trailing spaces
                 str = trim(str0);
-                std::vector<std::string> v_values = split (str, delimiter);
-                
-                
+                std::vector<std::string> v_values = split(str, delimiter);
+
                 // loop over the contents of the vector v_coords
-                for (size_t this_elem=0; this_elem<v_values.size(); this_elem++){
-                    
+                for (size_t this_elem = 0; this_elem < v_values.size(); this_elem++)
+                {
                     // save integers (0 or 1) to host side
                     elem_values.host(num_saved) = std::stoi(v_values[this_elem]);
                     num_saved++;
-                    
                 } // end for
-                
-               // printf(" done with one row of data \n");
-                
+
+                // printf(" done with one row of data \n");
             } // end while
-            
-            
-            found=true;
+
+            found = true;
         } // end if
-        
-        
-        if (i>1000){
+
+        if (i > 1000)
+        {
             printf("ERROR: Failed to find LOOKUP_TABLE data \n");
             break;
         } // end if
-        
+
         i++;
     } // end while
-    found=false;
-    
+    found = false;
+
     printf("\n");
-    
-    in.close();
-    
 
+    in.close();
 } // end routine
 
-
-
 // Code from stackover flow for string delimiter parsing
-std::vector<std::string> split (std::string s, std::string delimiter) {
-    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
-    std::string token;
+std::vector<std::string> split(std::string s, std::string delimiter)
+{
+    size_t                   pos_start = 0, pos_end, delim_len = delimiter.length();
+    std::string              token;
     std::vector<std::string> res;
 
-    while ((pos_end = s.find (delimiter, pos_start)) != std::string::npos) {
-        token = s.substr (pos_start, pos_end - pos_start);
+    while ((pos_end = s.find(delimiter, pos_start)) != std::string::npos) {
+        token     = s.substr(pos_start, pos_end - pos_start);
         pos_start = pos_end + delim_len;
-        res.push_back (token);
+        res.push_back(token);
     }
 
-    res.push_back (s.substr (pos_start));
+    res.push_back(s.substr(pos_start));
     return res;
-    
 } // end of split
 
-
 // retrieves multiple values between [ ]
-std::vector<double> extract_list(std::string str) {
-    
+std::vector<double> extract_list(std::string str)
+{
     // replace '[' with a space and ']' with a space
     std::replace(str.begin(), str.end(), '[', ' ');
     std::replace(str.begin(), str.end(), ']', ' ');
-    
+
     std::vector<std::string> str_values;
-    std::vector<double> values;
+    std::vector<double>      values;
 
     // exact the str values into a vector
     str_values = split(str, ",");
-    
+
     // convert the text values into double values
-    for (auto &word : str_values) {
-        values.push_back( atof(word.c_str()) );
+    for (auto& word : str_values)
+    {
+        values.push_back(atof(word.c_str()) );
     } // end for
-    
+
     return values;
-    
 }  // end of extract_list
 
- 
-std::string ltrim(const std::string &s)
+std::string ltrim(const std::string& s)
 {
     size_t start = s.find_first_not_of(WHITESPACE);
     return (start == std::string::npos) ? "" : s.substr(start);
 }
- 
-std::string rtrim(const std::string &s)
+
+std::string rtrim(const std::string& s)
 {
     size_t end = s.find_last_not_of(WHITESPACE);
     return (end == std::string::npos) ? "" : s.substr(0, end + 1);
 }
- 
-std::string trim(const std::string &s) {
+
+std::string trim(const std::string& s)
+{
     return rtrim(ltrim(s));
 }
 
@@ -1175,11 +1103,11 @@ std::string trim(const std::string &s) {
 // This gives the index value of the point or the elem
 // the elem = i + (j)*(num_points_i-1) + (k)*(num_points_i-1)*(num_points_j-1)
 // the point = i + (j)*num_points_i + (k)*num_points_i*num_points_j
-//--------------------------------------------------------
+// --------------------------------------------------------
 //
 // Returns a global id for a given i,j,k
 KOKKOS_FUNCTION
 int get_id(int i, int j, int k, int num_i, int num_j)
 {
-    return i + j*num_i + k*num_i*num_j;
+    return i + j * num_i + k * num_i * num_j;
 }
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/sgh_solve.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/sgh_solve.cpp
index b933e5776..ebae82c2e 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/sgh_solve.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/sgh_solve.cpp
@@ -1,46 +1,44 @@
 // Call cycle loop for the SGH solver
 
-
 #include "state.h"
 #include "mesh.h"
 #include <chrono>
 
-void sgh_solve(CArrayKokkos <material_t> &material,
-               CArrayKokkos <boundary_t> &boundary,
-               mesh_t &mesh,
-               DViewCArrayKokkos <double> &node_coords,
-               DViewCArrayKokkos <double> &node_vel,
-               DViewCArrayKokkos <double> &node_mass,
-               DViewCArrayKokkos <double> &elem_den,
-               DViewCArrayKokkos <double> &elem_pres,
-               DViewCArrayKokkos <double> &elem_stress,
-               DViewCArrayKokkos <double> &elem_sspd,
-               DViewCArrayKokkos <double> &elem_sie,
-               DViewCArrayKokkos <double> &elem_vol,
-               DViewCArrayKokkos <double> &elem_div,
-               DViewCArrayKokkos <double> &elem_mass,
-               DViewCArrayKokkos <size_t> &elem_mat_id,
-               DViewCArrayKokkos <double> &elem_statev,
-               DViewCArrayKokkos <double> &corner_force,
-               DViewCArrayKokkos <double> &corner_mass,
-               double &time_value,
-               const double time_final,
-               const double dt_max,
-               const double dt_min,
-               const double dt_cfl,
-               double &graphics_time,
-               size_t graphics_cyc_ival,
-               double graphics_dt_ival,
-               const size_t cycle_stop,
-               const size_t rk_num_stages,
-               double dt,
-               const double fuzz,
-               const double tiny,
-               const double small,
-               CArray <double> &graphics_times,
-               size_t &graphics_id){
-    
-    
+void sgh_solve(CArrayKokkos<material_t>&  material,
+               CArrayKokkos<boundary_t>&  boundary,
+               mesh_t&                    mesh,
+               DViewCArrayKokkos<double>& node_coords,
+               DViewCArrayKokkos<double>& node_vel,
+               DViewCArrayKokkos<double>& node_mass,
+               DViewCArrayKokkos<double>& elem_den,
+               DViewCArrayKokkos<double>& elem_pres,
+               DViewCArrayKokkos<double>& elem_stress,
+               DViewCArrayKokkos<double>& elem_sspd,
+               DViewCArrayKokkos<double>& elem_sie,
+               DViewCArrayKokkos<double>& elem_vol,
+               DViewCArrayKokkos<double>& elem_div,
+               DViewCArrayKokkos<double>& elem_mass,
+               DViewCArrayKokkos<size_t>& elem_mat_id,
+               DViewCArrayKokkos<double>& elem_statev,
+               DViewCArrayKokkos<double>& corner_force,
+               DViewCArrayKokkos<double>& corner_mass,
+               double&                    time_value,
+               const double               time_final,
+               const double               dt_max,
+               const double               dt_min,
+               const double               dt_cfl,
+               double&                    graphics_time,
+               size_t                     graphics_cyc_ival,
+               double                     graphics_dt_ival,
+               const size_t               cycle_stop,
+               const size_t               rk_num_stages,
+               double                     dt,
+               const double               fuzz,
+               const double               tiny,
+               const double               small,
+               CArray<double>&            graphics_times,
+               size_t&                    graphics_id)
+{
     printf("Writing outputs to file at %f \n", time_value);
     write_outputs(mesh,
                   node_coords,
@@ -57,79 +55,76 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                   graphics_times,
                   graphics_id,
                   time_value);
-    
-    
-    
-    CArrayKokkos <double> node_extensive_mass(mesh.num_nodes);
-    
+
+    CArrayKokkos<double> node_extensive_mass(mesh.num_nodes);
+
     // extensive energy tallies over the entire mesh
     double IE_t0 = 0.0;
     double KE_t0 = 0.0;
     double TE_t0 = 0.0;
-    
+
     double IE_sum = 0.0;
     double KE_sum = 0.0;
-    
+
     double IE_loc_sum = 0.0;
     double KE_loc_sum = 0.0;
-    
+
     // extensive IE
     REDUCE_SUM(elem_gid, 0, mesh.num_elems, IE_loc_sum, {
-        
-        IE_loc_sum += elem_mass(elem_gid)*elem_sie(1,elem_gid);
-        
+        IE_loc_sum += elem_mass(elem_gid) * elem_sie(1, elem_gid);
     }, IE_sum);
     IE_t0 = IE_sum;
-    
+
     // extensive KE
     REDUCE_SUM(node_gid, 0, mesh.num_nodes, KE_loc_sum, {
-        
         double ke = 0;
-        for (size_t dim=0; dim<mesh.num_dims; dim++){
-            ke += node_vel(1,node_gid,dim)*node_vel(1,node_gid,dim); // 1/2 at end
+        for (size_t dim = 0; dim < mesh.num_dims; dim++)
+        {
+            ke += node_vel(1, node_gid, dim) * node_vel(1, node_gid, dim); // 1/2 at end
         } // end for
-        
-        if(mesh.num_dims==2){
-            KE_loc_sum += node_mass(node_gid)*node_coords(1,node_gid,1)*ke;
+
+        if (mesh.num_dims == 2)
+        {
+            KE_loc_sum += node_mass(node_gid) * node_coords(1, node_gid, 1) * ke;
         }
-        else{
-            KE_loc_sum += node_mass(node_gid)*ke;
+        else
+        {
+            KE_loc_sum += node_mass(node_gid) * ke;
         }
-        
     }, KE_sum);
     Kokkos::fence();
-    KE_t0 = 0.5*KE_sum;
-    
+    KE_t0 = 0.5 * KE_sum;
+
     // extensive TE
     TE_t0 = IE_t0 + KE_t0;
-    
-    
+
     // save the nodal mass
     FOR_ALL(node_gid, 0, mesh.num_nodes, {
-        
         double radius = 1.0;
-        if(mesh.num_dims == 2){
-            radius = node_coords(1,node_gid,1);
+        if (mesh.num_dims == 2)
+        {
+            radius = node_coords(1, node_gid, 1);
         }
-        node_extensive_mass(node_gid) = node_mass(node_gid)*radius;
-        
+        node_extensive_mass(node_gid) = node_mass(node_gid) * radius;
     }); // end parallel for
-    
-    
+
     // a flag to exit the calculation
-    size_t stop_calc=0;
-    
+    size_t stop_calc = 0;
+
     auto time_1 = std::chrono::high_resolution_clock::now();
-    
-	// loop over the max number of time integration cycles
-	for (size_t cycle = 0; cycle < cycle_stop; cycle++) {
 
-	    // stop calculation if flag
-	    if (stop_calc == 1) break;
-        
+    // loop over the max number of time integration cycles
+    for (size_t cycle = 0; cycle < cycle_stop; cycle++)
+    {
+        // stop calculation if flag
+        if (stop_calc == 1)
+        {
+            break;
+        }
 
-	    // get the step
-        if(mesh.num_dims==2){
+        // get the step
+        if (mesh.num_dims == 2)
+        {
             get_timestep2D(mesh,
                            node_coords,
                            node_vel,
@@ -144,7 +139,8 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                            dt,
                            fuzz);
         }
-        else {
+        else
+        {
             get_timestep(mesh,
                          node_coords,
                          node_vel,
@@ -159,16 +155,17 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                          dt,
                          fuzz);
         } // end if 2D
-        
-        
-        if (cycle==0){
+
+        if (cycle == 0)
+        {
             printf("cycle = %lu, time = %f, time step = %f \n", cycle, time_value, dt);
         }
         // print time step every 10 cycles
-        else if (cycle%20==0){
+        else if (cycle % 20 == 0)
+        {
             printf("cycle = %lu, time = %f, time step = %f \n", cycle, time_value, dt);
         } // end if
-        
+
         // ---------------------------------------------------------------------
         // build mesh colors for load balancing
         // ---------------------------------------------------------------------
@@ -180,38 +177,38 @@ void sgh_solve(CArrayKokkos <material_t> &material,
 
         // calculate mesh coloring on cycle 0 and every 1000 cycles
         if (cycle%1000==1){
-            
+
             // strain rate threshold to place elem_gid's in bin 0 or 1
             double threshold = 0.01;
-            
-            
+
+
             // 2D-RZ or 3D Cartesian coordinates
             if (mesh.num_dims == 2){
                 // RZ coordinates
                 FOR_ALL(elem_gid, 0, mesh.num_elems, {
-                    
+
                     const size_t num_nodes_in_elem = 4;
 
                     double area_normal_array[8];
                     double vel_grad_array[9];
                     double D_array[9];
-                    
+
                     ViewCArrayKokkos<double> area_normal(area_normal_array, num_nodes_in_elem, 2);
                     ViewCArrayKokkos<double> vel_grad(vel_grad_array,3,3);
                     ViewCArrayKokkos<double> D(D_array,3,3);
-                    
+
                     double vol = elem_vol(elem_gid);
-                    
+
                     ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid,0),num_nodes_in_elem);
-                    
+
                     get_bmatrix2D(area_normal,
                                   elem_gid,
                                   node_coords,
                                   elem_node_gids);
-                    
+
                     // facial area of the element
                     double elem_area = get_area_quad(elem_gid, node_coords, elem_node_gids);
-                    
+
                     // --- Calculate the velocity gradient ---
                     get_velgrad2D(vel_grad,
                                   elem_node_gids,
@@ -220,73 +217,73 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                                   elem_vol(elem_gid),
                                   elem_area,
                                   elem_gid);
-                    
+
                     for(size_t j = 0 ; j < 3 ; j++){
                         for(size_t k= 0; k < 3; k++){
                             D(j,k) = 0.5 * ( vel_grad(j,k) + vel_grad(k,j) );
                         } // end for
                     } // end for
-                    
-                    
+
+
                     // largest absolute value
                     max_eign_val(elem_gid) = max_Eigen3D(D);
 
                 }); // end parallel for
-                
+
             }
             else {
                 // 3D Cartesian coordinates
                 FOR_ALL(elem_gid, 0, mesh.num_elems, {
-                    
+
                     const size_t num_nodes_in_elem = 8;
 
                     double area_normal_array[24];
                     double vel_grad_array[9];
                     double D_array[9];
-                    
+
                     ViewCArrayKokkos<double> area_normal(area_normal_array, num_nodes_in_elem, 3);
                     ViewCArrayKokkos<double> vel_grad(vel_grad_array, 3, 3);
                     ViewCArrayKokkos<double> D(D_array,3,3);
-                    
+
                     double vol = elem_vol(elem_gid);
-                    
+
                     ViewCArrayKokkos<size_t> elem_node_gids(&mesh.nodes_in_elem(elem_gid,0), num_nodes_in_elem);
-                    
+
                     get_bmatrix(area_normal,
                                 elem_gid,
                                 node_coords,
                                 elem_node_gids);
-                    
+
                     get_velgrad(vel_grad,
                                 elem_node_gids,
                                 node_vel,
                                 area_normal,
                                 vol,
                                 elem_gid);
-                    
+
                   for(size_t j = 0 ; j < 3 ; j++){
                        for(size_t k= 0; k < 3; k++){
                             D(j,k) = 0.5 * ( vel_grad(j,k) + vel_grad(k,j) );
                         } // end for
                    } // end for
-                    
+
                     // largest absolute value
                     max_eign_val(elem_gid) = max_Eigen3D(D);
-                
-                    
+
+
                 }); // end parallel for
-                
+
             } //end if on dimensions
-            
-            
+
+
             // build coloring array
             RUN ({
-                
+
                 size_t count1 = 0;
                 size_t count2 = 0;
 
                 for( size_t elem_gid = 0; elem_gid < mesh.num_elems; elem_gid++ ){
-                        
+
                     if (max_eign_val(elem_gid) <= threshold){
                         // color(0) has the lower strain rate elements
                         elems_in_color(0, count1) = elem_gid;
@@ -297,43 +294,43 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                         elems_in_color(1, count2) = elem_gid;
                         count2 += 1;
                     } // end if
-                    
+
                 } // end serial for loop
-                
+
                 num_elems_in_color(0) = count1;
                 num_elems_in_color(1) = count2;
-                
+
             });  // end run serial on device
             Kokkos::fence();
             num_elems_in_color.update_host();
-            
-            
+
+
             // Using the mesh coloring as following
-            
-             
+
+
             for(size_t color = 0; color < 2; color++ ){
-                
+
                 // loop over the elems in each coloring
 
                 FOR_ALL(elem_lid, 0, num_elems_in_color.host(color), {
                     size_t elem_gid = elems_in_color(color, elem_lid);
-                    
+
                     // coding goes here
-                    
+
                 });
-                
+
             } // end for over colors
-             
-           
-            
+
+
+
         } // end if on cycle for making the mesh coloring
-        
+
          */
-        
+
         // ---------------------------------------------------------------------
         //  integrate the solution forward to t(n+1) via Runge Kutta (RK) method
         // ---------------------------------------------------------------------
-	     
+
         // save the values at t_n
         rk_init(node_coords,
                 node_vel,
@@ -342,34 +339,34 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                 mesh.num_dims,
                 mesh.num_elems,
                 mesh.num_nodes);
-	    
-        
-        
-        // integrate solution forward in time
-        for (size_t rk_stage = 0; rk_stage < rk_num_stages; rk_stage++){
 
-            
+        // integrate solution forward in time
+        for (size_t rk_stage = 0; rk_stage < rk_num_stages; rk_stage++)
+        {
             // ---- RK coefficient ----
-            double rk_alpha = 1.0/((double)rk_num_stages - (double)rk_stage);
-            
+            double rk_alpha = 1.0 / ((double)rk_num_stages - (double)rk_stage);
+
             // ---- Calculate velocity diveregence for the element ----
-            if(mesh.num_dims==2){
+            if (mesh.num_dims == 2)
+            {
                 get_divergence2D(elem_div,
                                  mesh,
                                  node_coords,
                                  node_vel,
                                  elem_vol);
             }
-            else {
+            else
+            {
                 get_divergence(elem_div,
                                mesh,
                                node_coords,
                                node_vel,
                                elem_vol);
             } // end if 2D
-            
+
             // ---- calculate the forces on the vertices and evolve stress (hypo model) ----
-            if(mesh.num_dims==2){
+            if (mesh.num_dims == 2)
+            {
                 get_force_sgh2D(material,
                                 mesh,
                                 node_coords,
@@ -389,7 +386,8 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                                 dt,
                                 rk_alpha);
             }
-            else {
+            else
+            {
                 get_force_sgh(material,
                               mesh,
                               node_coords,
@@ -409,7 +407,6 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                               dt,
                               rk_alpha);
             }
-            
 
             // ---- Update nodal velocities ---- //
             update_velocity_sgh(rk_alpha,
@@ -418,12 +415,10 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                                 node_vel,
                                 node_mass,
                                 corner_force);
-            
+
             // ---- apply force boundary conditions to the boundary patches----
             boundary_velocity(mesh, boundary, node_vel, time_value);
-            
-            
-            
+
             // ---- Update specific internal energy in the elements ----
             update_energy_sgh(rk_alpha,
                               dt,
@@ -433,8 +428,7 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                               elem_sie,
                               elem_mass,
                               corner_force);
-            
-            
+
             // ---- Update nodal positions ----
             update_position_sgh(rk_alpha,
                                 dt,
@@ -442,15 +436,13 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                                 mesh.num_nodes,
                                 node_coords,
                                 node_vel);
-            
-            
+
             // ---- Calculate cell volume for next time step ----
             get_vol(elem_vol, node_coords, mesh);
-            
-            
-            
+
             // ---- Calculate elem state (den, pres, sound speed, stress) for next time step ----
-            if(mesh.num_dims==2){
+            if (mesh.num_dims == 2)
+            {
                 update_state2D(material,
                                mesh,
                                node_coords,
@@ -467,7 +459,8 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                                dt,
                                rk_alpha);
             }
-            else{
+            else
+            {
                 update_state(material,
                              mesh,
                              node_coords,
@@ -490,23 +483,20 @@ void sgh_solve(CArrayKokkos <material_t> &material,
             //    2) hypo-elastic strength models are called in get_force
             //    3) strength models must be added by the user in user_mat.cpp
 
-            
             // calculate the new corner masses if 2D
-            if(mesh.num_dims==2){
-                
+            if (mesh.num_dims == 2)
+            {
                 // calculate the nodal areal mass
                 FOR_ALL(node_gid, 0, mesh.num_nodes, {
-                    
                     node_mass(node_gid) = 0.0;
-                    
-                    if (node_coords(1,node_gid,1) > tiny){
-                        node_mass(node_gid) = node_extensive_mass(node_gid)/node_coords(1,node_gid,1);
-                    }
 
+                    if (node_coords(1, node_gid, 1) > tiny)
+                    {
+                        node_mass(node_gid) = node_extensive_mass(node_gid) / node_coords(1, node_gid, 1);
+                    }
                 }); // end parallel for over node_gid
                 Kokkos::fence();
-                
-                
+
                 // -----------------------------------------------
                 // Calcualte the areal mass for nodes on the axis
                 // -----------------------------------------------
@@ -519,94 +509,88 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                 // 0---1
                 /*
                 FOR_ALL(elem_gid, 0, mesh.num_elems, {
-                    
+
                     // loop over the corners of the element and calculate the mass
                     for (size_t node_lid=0; node_lid<4; node_lid++){
-                        
+
                         size_t node_gid = mesh.nodes_in_elem(elem_gid, node_lid);
                         size_t node_minus_gid;
                         size_t node_plus_gid;
-                        
-                        
+
+
                         if (node_coords(1,node_gid,1) < tiny){
                             // node is on the axis
-                            
+
                             // minus node
                             if (node_lid==0){
                                 node_minus_gid = mesh.nodes_in_elem(elem_gid, 3);
                             } else {
                                 node_minus_gid = mesh.nodes_in_elem(elem_gid, node_lid-1);
                             }
-                            
+
                             // plus node
                             if (node_lid==3){
                                 node_plus_gid = mesh.nodes_in_elem(elem_gid, 0);
                             } else {
                                 node_plus_gid = mesh.nodes_in_elem(elem_gid, node_lid+1);
                             }
-                            
+
                             node_mass(node_gid) = fmax(node_mass(node_plus_gid), node_mass(node_minus_gid))/2.0;
-                            
+
                         } // end if
-                        
+
                     } // end for over corners
-                    
+
                 }); // end parallel for over elem_gid
                 Kokkos::fence();
                  */
-                
+
                 FOR_ALL(node_bdy_gid, 0, mesh.num_bdy_nodes, {
-                    
                     size_t node_gid = mesh.bdy_nodes(node_bdy_gid);
-                    
-                    if (node_coords(1,node_gid,1) < tiny){
+
+                    if (node_coords(1, node_gid, 1) < tiny)
+                    {
                         // node is on the axis
-                        
-                        for(size_t node_lid=0; node_lid<mesh.num_nodes_in_node(node_gid); node_lid++){
-                            
+
+                        for (size_t node_lid = 0; node_lid < mesh.num_nodes_in_node(node_gid); node_lid++)
+                        {
                             size_t node_neighbor_gid = mesh.nodes_in_node(node_gid, node_lid);
-                            
+
                             // if the node is off the axis, use it's areal mass on the boundary
-                            if (node_coords(1,node_neighbor_gid,1) > tiny){
-                                node_mass(node_gid) = fmax(node_mass(node_gid), node_mass(node_neighbor_gid)/2.0);
+                            if (node_coords(1, node_neighbor_gid, 1) > tiny)
+                            {
+                                node_mass(node_gid) = fmax(node_mass(node_gid), node_mass(node_neighbor_gid) / 2.0);
                             }
                         } // end for over neighboring nodes
-                        
                     } // end if
-                    
                 }); // end parallel for over elem_gid
-                
-                
-            
             } // end of if 2D-RZ
-            
-            
-            
         } // end of RK loop
-	        
 
-	    
+        // increment the time
+        time_value += dt;
 
-	    // increment the time
-	    time_value+=dt;
-    	
-        
         size_t write = 0;
-        if ((cycle+1)%graphics_cyc_ival == 0 && cycle>0){
+        if ((cycle + 1) % graphics_cyc_ival == 0 && cycle > 0)
+        {
             write = 1;
         }
-        else if (cycle == cycle_stop) {
+        else if (cycle == cycle_stop)
+        {
             write = 1;
         }
-        else if (time_value >= time_final){
+        else if (time_value >= time_final)
+        {
             write = 1;
         }
-        else if (time_value >= graphics_time){
+        else if (time_value >= graphics_time)
+        {
             write = 1;
         }
-            
+
         // write outputs
-        if (write == 1){
+        if (write == 1)
+        {
             printf("Writing outputs to file at %f \n", graphics_time);
             write_outputs(mesh,
                           node_coords,
@@ -623,162 +607,163 @@ void sgh_solve(CArrayKokkos <material_t> &material,
                           graphics_times,
                           graphics_id,
                           time_value);
-            
+
             graphics_time = time_value + graphics_dt_ival;
         } // end if
-        
-        
+
         // end of calculation
-        if (time_value>=time_final) break;
-        
+        if (time_value >= time_final)
+        {
+            break;
+        }
     } // end for cycle loop
-    
-    
-    auto time_2 = std::chrono::high_resolution_clock::now();
+
+    auto time_2    = std::chrono::high_resolution_clock::now();
     auto calc_time = std::chrono::duration_cast
-                           <std::chrono::nanoseconds>(time_2 - time_1).count();
-    
+                     <std::chrono::nanoseconds>(time_2 - time_1).count();
+
     printf("\nCalculation time in seconds: %f \n", calc_time * 1e-9);
-    
-    
+
     // ---- Calculate energy tallies ----
     double IE_tend = 0.0;
     double KE_tend = 0.0;
     double TE_tend = 0.0;
-    
+
     IE_loc_sum = 0.0;
     KE_loc_sum = 0.0;
-    IE_sum = 0.0;
-    KE_sum = 0.0;
-    
+    IE_sum     = 0.0;
+    KE_sum     = 0.0;
+
     // extensive IE
     REDUCE_SUM(elem_gid, 0, mesh.num_elems, IE_loc_sum, {
-        
-        IE_loc_sum += elem_mass(elem_gid)*elem_sie(1,elem_gid);
-        
+        IE_loc_sum += elem_mass(elem_gid) * elem_sie(1, elem_gid);
     }, IE_sum);
     IE_tend = IE_sum;
-    
+
     // extensive KE
     REDUCE_SUM(node_gid, 0, mesh.num_nodes, KE_loc_sum, {
-        
         double ke = 0;
-        for (size_t dim=0; dim<mesh.num_dims; dim++){
-            ke += node_vel(1,node_gid,dim)*node_vel(1,node_gid,dim); // 1/2 at end
+        for (size_t dim = 0; dim < mesh.num_dims; dim++)
+        {
+            ke += node_vel(1, node_gid, dim) * node_vel(1, node_gid, dim); // 1/2 at end
         } // end for
-        
-        if(mesh.num_dims==2){
-            KE_loc_sum += node_mass(node_gid)*node_coords(1,node_gid,1)*ke;
+
+        if (mesh.num_dims == 2)
+        {
+            KE_loc_sum += node_mass(node_gid) * node_coords(1, node_gid, 1) * ke;
         }
-        else{
-            KE_loc_sum += node_mass(node_gid)*ke;
+        else
+        {
+            KE_loc_sum += node_mass(node_gid) * ke;
         }
-            
-        
     }, KE_sum);
     Kokkos::fence();
-    KE_tend = 0.5*KE_sum;
-    
-    
+    KE_tend = 0.5 * KE_sum;
+
     // extensive TE
     TE_tend = IE_tend + KE_tend;
-    
+
     printf("Time=0:   KE = %f, IE = %f, TE = %f \n", KE_t0, IE_t0, TE_t0);
     printf("Time=End: KE = %f, IE = %f, TE = %f \n", KE_tend, IE_tend, TE_tend);
     printf("total energy conservation error = %e \n\n", TE_tend - TE_t0);
-    
+
     return;
-    
 } // end of SGH solve
 
-
-
 KOKKOS_FUNCTION
-double max_Eigen3D(const ViewCArrayKokkos<double> tensor) {
+double max_Eigen3D(const ViewCArrayKokkos<double> tensor)
+{
     // Compute largest eigenvalue of a 3x3 tensor
     // Algorithm only works if tensor is symmetric
-    double pi = 3.141592653589793;
-    size_t dim  = tensor.dims(0);
+    double pi  = 3.141592653589793;
+    size_t dim = tensor.dims(0);
     double trace, det;
-    trace = tensor(0,0) + tensor(1,1) + tensor(2,2);
-    det = tensor(0,0) * (tensor(1,1)*tensor(2,2) - tensor(1,2)*tensor(2,1));
-    det -= tensor(0,1) * (tensor(1,0)*tensor(2,2) - tensor(1,2)*tensor(2,0));
-    det += tensor(0,2) * (tensor(1,0)*tensor(2,1) - tensor(1,1)*tensor(2,0));
+    trace  = tensor(0, 0) + tensor(1, 1) + tensor(2, 2);
+    det    = tensor(0, 0) * (tensor(1, 1) * tensor(2, 2) - tensor(1, 2) * tensor(2, 1));
+    det   -= tensor(0, 1) * (tensor(1, 0) * tensor(2, 2) - tensor(1, 2) * tensor(2, 0));
+    det   += tensor(0, 2) * (tensor(1, 0) * tensor(2, 1) - tensor(1, 1) * tensor(2, 0));
     trace /= 3.; // easier for computation
-    double p2 = pow((tensor(0,0) - trace),2) + pow((tensor(1,1) - trace),2) +
-                pow((tensor(2,2) - trace),2);
-    p2 += 2. * (pow(tensor(0,1),2) + pow(tensor(0,2),2) + pow(tensor(1,2), 2));
-    
-    double p = sqrt(p2/6.);
-    
+    double p2 = pow((tensor(0, 0) - trace), 2) + pow((tensor(1, 1) - trace), 2) +
+                pow((tensor(2, 2) - trace), 2);
+    p2 += 2. * (pow(tensor(0, 1), 2) + pow(tensor(0, 2), 2) + pow(tensor(1, 2), 2));
+
+    double p = sqrt(p2 / 6.);
+
     // check for nan
-    if( det != det){
+    if (det != det)
+    {
         return 0;
     }
-    
-    if(det == 0){
+
+    if (det == 0)
+    {
         return 0;
     }
-    
-    double B_array[9];
-    ViewCArrayKokkos<double> B(B_array,3,3);
-    
-    for(int i = 0; i < 3; i++){
-        for(int j = 0; j < 3; j++){
-            
-            if(i == j){
-                B(i,i) = (1/p) * (tensor(i,i) - trace);
+
+    double                   B_array[9];
+    ViewCArrayKokkos<double> B(B_array, 3, 3);
+
+    for (int i = 0; i < 3; i++)
+    {
+        for (int j = 0; j < 3; j++)
+        {
+            if (i == j)
+            {
+                B(i, i) = (1 / p) * (tensor(i, i) - trace);
             }
-            else {
-                B(i,j) = (1/p) * tensor(i,j);
+            else
+            {
+                B(i, j) = (1 / p) * tensor(i, j);
             } // end if
-            
         } // end for j
     } // end for i
-    
+
     double r, phi;
-    r =  B(0,0) * (B(1,1)*B(2,2) - B(1,2)*B(2,1));
-    r -= B(0,1) * (B(1,0)*B(2,2) - B(1,2)*B(2,0));
-    r += B(0,2) * (B(1,0)*B(2,1) - B(1,1)*B(2,0));
+    r  =  B(0, 0) * (B(1, 1) * B(2, 2) - B(1, 2) * B(2, 1));
+    r -= B(0, 1) * (B(1, 0) * B(2, 2) - B(1, 2) * B(2, 0));
+    r += B(0, 2) * (B(1, 0) * B(2, 1) - B(1, 1) * B(2, 0));
     r /= 2;
     // first two cases are to handle numerical difficulties.
     // In exact math -1 <= r <= 1
-    if (r <= -1){
-        phi = pi/3;
+    if (r <= -1)
+    {
+        phi = pi / 3;
     }
-    else if (r >= 1){
+    else if (r >= 1)
+    {
         phi = 0;
     }
-    else {
+    else
+    {
         phi = acos(r) / 3.;
     } // end if
-    
+
     double eig1, eig2, eig3;
     eig1 = trace + 2 * p * cos(phi);
-    eig2 = trace + 2 * p * cos(phi + (2*pi/3));
-    eig3 = 3*trace - (eig1 + eig2);
-    
-    double abs_max_val = fmax( fmax(fabs(eig1), fabs(eig2)), fabs(eig3));
+    eig2 = trace + 2 * p * cos(phi + (2 * pi / 3));
+    eig3 = 3 * trace - (eig1 + eig2);
+
+    double abs_max_val = fmax(fmax(fabs(eig1), fabs(eig2)), fabs(eig3));
 
     return abs_max_val;
 }
 
-
 KOKKOS_FUNCTION
-double max_Eigen2D(const ViewCArrayKokkos<double> tensor) {
+double max_Eigen2D(const ViewCArrayKokkos<double> tensor)
+{
     // Compute largest eigenvalue of a 2x2 tensor
     // Algorithm only works if tensor is symmetric
-    size_t dim  = tensor.dims(0);
+    size_t dim = tensor.dims(0);
     double trace, det;
 
-    trace = tensor(0,0) + tensor(1,1);
-    det = tensor(0,0)*tensor(1,1) - tensor(0,1)*tensor(1,0);
-    
+    trace = tensor(0, 0) + tensor(1, 1);
+    det   = tensor(0, 0) * tensor(1, 1) - tensor(0, 1) * tensor(1, 0);
+
     double eig1, eig2;
-    
-    eig1 = (trace/2.) + sqrt(0.25*trace*trace - det);
-    eig2 = (trace/2.) - sqrt(0.25*trace*trace - det);
-    
+
+    eig1 = (trace / 2.) + sqrt(0.25 * trace * trace - det);
+    eig2 = (trace / 2.) - sqrt(0.25 * trace * trace - det);
+
     double abs_max_val = fmax(fabs(eig1), fabs(eig2));
     return abs_max_val;
 } // end 2D max eignen value
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/state.h b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/state.h
index d4afba8a6..7977f4f3b 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/state.h
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/state.h
@@ -1,183 +1,166 @@
 #ifndef STATE_H
-#define STATE_H  
-
+#define STATE_H
 
 #include "matar.h"
 
 using namespace mtr;
 
 // node_state
-struct node_t {
-
+struct node_t
+{
     // Position
-    CArray <double> coords;
+    CArray<double> coords;
 
     // velocity
-    CArray <double> vel;
+    CArray<double> vel;
 
     // mass at nodes
-    CArray <double> mass;
+    CArray<double> mass;
 
-    
     // initialization method (num_rk_storage_bins, num_nodes, num_dims)
     void initialize(size_t num_rk, size_t num_nodes, size_t num_dims)
     {
-        this->coords = CArray <double> (num_rk, num_nodes, num_dims);
-        this->vel    = CArray <double> (num_rk, num_nodes, num_dims);
-        this->mass   = CArray <double> (num_nodes);
+        this->coords = CArray<double>(num_rk, num_nodes, num_dims);
+        this->vel    = CArray<double>(num_rk, num_nodes, num_dims);
+        this->mass   = CArray<double>(num_nodes);
     }; // end method
-
 }; // end node_t
 
-
 // elem_state
-struct elem_t {
-
+struct elem_t
+{
     // den
-    CArray <double> den;
-    
+    CArray<double> den;
+
     // pres
-    CArray <double> pres;
-    
+    CArray<double> pres;
+
     // stress
-    CArray <double> stress;
-    
+    CArray<double> stress;
+
     // sspd
-    CArray <double> sspd;
-    
+    CArray<double> sspd;
+
     // sie
-    CArray <double> sie;
+    CArray<double> sie;
 
     // vol
-    CArray <double> vol;
-    
+    CArray<double> vol;
+
     // divergence of velocity
-    CArray <double> div;
-    
+    CArray<double> div;
+
     // mass of elem
-    CArray <double> mass;
-    
+    CArray<double> mass;
+
     // mat ids
-    CArray <size_t> mat_id;
-    
+    CArray<size_t> mat_id;
+
     // state variables
-    CArray <double> statev;
-    
+    CArray<double> statev;
+
     // initialization method (num_rk_storage_bins, num_cells, num_dims)
     void initialize(size_t num_rk, size_t num_elems, size_t num_dims)
     {
-        this->den    = CArray <double> (num_elems);
-        this->pres   = CArray <double> (num_elems);
-        this->stress = CArray <double> (num_rk, num_elems, num_dims, num_dims);
-        this->sspd   = CArray <double> (num_elems);
-        this->sie    = CArray <double> (num_rk, num_elems);
-        this->vol    = CArray <double> (num_elems);
-        this->div    = CArray <double> (num_elems);
-        this->mass   = CArray <double> (num_elems);
-        this->mat_id = CArray <size_t> (num_elems);
+        this->den    = CArray<double>(num_elems);
+        this->pres   = CArray<double>(num_elems);
+        this->stress = CArray<double>(num_rk, num_elems, num_dims, num_dims);
+        this->sspd   = CArray<double>(num_elems);
+        this->sie    = CArray<double>(num_rk, num_elems);
+        this->vol    = CArray<double>(num_elems);
+        this->div    = CArray<double>(num_elems);
+        this->mass   = CArray<double>(num_elems);
+        this->mat_id = CArray<size_t>(num_elems);
     }; // end method
-
 }; // end elem_t
 
-
 // corner_state
-struct corner_t {
-
+struct corner_t
+{
     // force
-    CArray <double> force;
-    
+    CArray<double> force;
+
     // mass of corner
-    CArray <double> mass;
+    CArray<double> mass;
 
-    
     // initialization method (num_corners, num_dims)
     void initialize(size_t num_corners, size_t num_dims)
     {
-        this->force = CArray <double> (num_corners, num_dims);
-        this->mass  = CArray <double> (num_corners);
+        this->force = CArray<double>(num_corners, num_dims);
+        this->mass  = CArray<double>(num_corners);
     }; // end method
-
 }; // end corner_t
 
-
 namespace model
 {
-
-    // strength model types
-    enum strength_tag
-    {
-        none = 0,
-        hypo = 1,     // hypoelastic plastic model
-        hyper = 2,    // hyperelastic plastic model
-    };
-
+// strength model types
+enum strength_tag
+{
+    none = 0,
+    hypo = 1,         // hypoelastic plastic model
+    hyper = 2,        // hyperelastic plastic model
+};
 } // end of namespace
 
 namespace model_init
 {
-
-    // strength model setup
-    enum strength_setup_tag
-    {
-        input = 0,
-        user_init = 1,
-    };
-
+// strength model setup
+enum strength_setup_tag
+{
+    input = 0,
+    user_init = 1,
+};
 } // end of namespace
 
-
 // material model parameters
-struct material_t {
-
+struct material_t
+{
     // statev(0) = gamma
     // statev(1) = minimum sound speed
     // statev(2) = specific heat c_v
     // statev(3) = ref temperature
     // statev(4) = ref density
     // statev(5) = ref specific internal energy
-    
+
     // eos fcn pointer
-    void (*eos_model) (const DViewCArrayKokkos <double> &elem_pres,
-                       const DViewCArrayKokkos <double> &elem_stress,
-                       const size_t elem_gid,
-                       const size_t mat_id,
-                       const DViewCArrayKokkos <double> &elem_state_vars,
-                       const DViewCArrayKokkos <double> &elem_sspd,
-                       const double den,
-                       const double sie) = NULL;
-    
+    void (*eos_model) (const DViewCArrayKokkos<double>& elem_pres,
+                       const DViewCArrayKokkos<double>& elem_stress,
+                       const size_t                     elem_gid,
+                       const size_t                     mat_id,
+                       const DViewCArrayKokkos<double>& elem_state_vars,
+                       const DViewCArrayKokkos<double>& elem_sspd,
+                       const double                     den,
+                       const double                     sie) = NULL;
+
     // strength fcn pointer
-    void (*strength_model) (const DViewCArrayKokkos <double> &elem_pres,
-                            const DViewCArrayKokkos <double> &elem_stress,
-                            const size_t elem_gid,
-                            const size_t mat_id,
-                            const DViewCArrayKokkos <double> &elem_state_vars,
-                            const DViewCArrayKokkos <double> &elem_sspd,
-                            const double den,
-                            const double sie,
-                            const ViewCArrayKokkos <double> &vel_grad,
-                            const ViewCArrayKokkos <size_t>  &elem_node_gids,
-                            const DViewCArrayKokkos <double> &node_coords,
-                            const DViewCArrayKokkos <double> &node_vel,
-                            const double vol,
-                            const double dt,
-                            const double alpha) = NULL;
-    
+    void (*strength_model) (const DViewCArrayKokkos<double>& elem_pres,
+                            const DViewCArrayKokkos<double>& elem_stress,
+                            const size_t                     elem_gid,
+                            const size_t                     mat_id,
+                            const DViewCArrayKokkos<double>& elem_state_vars,
+                            const DViewCArrayKokkos<double>& elem_sspd,
+                            const double                     den,
+                            const double                     sie,
+                            const ViewCArrayKokkos<double>&  vel_grad,
+                            const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                            const DViewCArrayKokkos<double>& node_coords,
+                            const DViewCArrayKokkos<double>& node_vel,
+                            const double                     vol,
+                            const double                     dt,
+                            const double                     alpha) = NULL;
+
     // hypo or hyper elastic plastic model
     model::strength_tag strength_type;
-    
+
     // setup the strength model via the input file for via a user_setup
-    model_init::strength_setup_tag strength_setup=model_init::input;
-    
+    model_init::strength_setup_tag strength_setup = model_init::input;
+
     size_t num_state_vars;
-    
+
     double q1;    // acoustic coefficient in Riemann solver for compresion
     double q1ex;  // acoustic coefficient in Riemann solver for expansion
     double q2;    // linear coefficient in Riemann solver for compression
     double q2ex;  // linear coefficient in Riemann solver for expansion
 }; // end material_t
 
-
-
-
-#endif 
+#endif
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/time_integration.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/time_integration.cpp
index 0134fe1c8..9df7e00d1 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/time_integration.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/time_integration.cpp
@@ -2,91 +2,82 @@
 #include "mesh.h"
 #include "state.h"
 
-
 // -----------------------------------------------------------------------------
 // This function saves the variables at rk_stage = 0, which is t_n
-//------------------------------------------------------------------------------
-void rk_init(DViewCArrayKokkos <double> &node_coords,
-             DViewCArrayKokkos <double> &node_vel,
-             DViewCArrayKokkos <double> &elem_sie,
-             DViewCArrayKokkos <double> &elem_stress,
-             const size_t num_dims,
-             const size_t num_elems,
-             const size_t num_nodes){
-
+// ------------------------------------------------------------------------------
+void rk_init(DViewCArrayKokkos<double>& node_coords,
+             DViewCArrayKokkos<double>& node_vel,
+             DViewCArrayKokkos<double>& elem_sie,
+             DViewCArrayKokkos<double>& elem_stress,
+             const size_t               num_dims,
+             const size_t               num_elems,
+             const size_t               num_nodes)
+{
     // save elem quantities
     FOR_ALL(elem_gid, 0, num_elems, {
-
         // stress is always 3D even with 2D-RZ
-        for(size_t i=0; i<3; i++){
-            for(size_t j=0; j<3; j++){
-                elem_stress(0,elem_gid,i,j) = elem_stress(1,elem_gid,i,j);
+        for (size_t i = 0; i < 3; i++)
+        {
+            for (size_t j = 0; j < 3; j++)
+            {
+                elem_stress(0, elem_gid, i, j) = elem_stress(1, elem_gid, i, j);
             }
         }  // end for
 
-        elem_sie(0,elem_gid) = elem_sie(1,elem_gid);
-
+        elem_sie(0, elem_gid) = elem_sie(1, elem_gid);
     }); // end parallel for
-    
-    
+
     // save nodal quantities
     FOR_ALL(node_gid, 0, num_nodes, {
-        
-        for(size_t i=0; i<num_dims; i++){
-            node_coords(0,node_gid,i) = node_coords(1,node_gid,i);
-            node_vel(0,node_gid,i) = node_vel(1,node_gid,i);
+        for (size_t i = 0; i < num_dims; i++)
+        {
+            node_coords(0, node_gid, i) = node_coords(1, node_gid, i);
+            node_vel(0, node_gid, i)    = node_vel(1, node_gid, i);
         }
     }); // end parallel for
     Kokkos::fence();
-    
+
     return;
-    
 } // end rk_init
 
-
-
-
 // -----------------------------------------------------------------------------
 // This function calculates the time step by finding the shortest distance
 // between any two nodes in the mesh
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 // WARNING WARNING :  Only works for 3D, 8 node elements
-void get_timestep(mesh_t &mesh,
-                  DViewCArrayKokkos <double> &node_coords,
-                  DViewCArrayKokkos <double> &node_vel,
-                  DViewCArrayKokkos <double> &elem_sspd,
-                  DViewCArrayKokkos <double> &elem_vol,
-                  double time_value,
-                  const double graphics_time,
-                  const double time_final,
-                  const double dt_max,
-                  const double dt_min,
-                  const double dt_cfl,
-                  double &dt,
-                  const double fuzz){
-
-    
+void get_timestep(mesh_t&                    mesh,
+                  DViewCArrayKokkos<double>& node_coords,
+                  DViewCArrayKokkos<double>& node_vel,
+                  DViewCArrayKokkos<double>& elem_sspd,
+                  DViewCArrayKokkos<double>& elem_vol,
+                  double                     time_value,
+                  const double               graphics_time,
+                  const double               time_final,
+                  const double               dt_max,
+                  const double               dt_min,
+                  const double               dt_cfl,
+                  double&                    dt,
+                  const double               fuzz)
+{
     // increase dt by 10%, that is the largest dt value
-    dt = dt*1.1;
+    dt = dt * 1.1;
 
     double dt_lcl;
     double min_dt_calc;
     REDUCE_MIN(elem_gid, 0, mesh.num_elems, dt_lcl, {
-        
         double coords0[24];  // element coords
-        ViewCArrayKokkos <double> coords(coords0, 8, 3);
+        ViewCArrayKokkos<double> coords(coords0, 8, 3);
 
-        
         double distance0[28];  // array for holding distances between each node
-        ViewCArrayKokkos <double> dist(distance0, 28);
-        
-        // Getting the coordinates of the element
-        for(size_t node_lid = 0; node_lid < 8; node_lid++){
+        ViewCArrayKokkos<double> dist(distance0, 28);
 
-            for (size_t dim = 0; dim < mesh.num_dims; dim++){
-                coords(node_lid, dim) = node_coords(1,  mesh.nodes_in_elem(elem_gid, node_lid), dim);
+        // Getting the coordinates of the element
+        for (size_t node_lid = 0; node_lid < 8; node_lid++)
+        {
+            for (size_t dim = 0; dim < mesh.num_dims; dim++)
+            {
+                coords(node_lid, dim) = node_coords(1, mesh.nodes_in_elem(elem_gid, node_lid), dim);
             } // end for dim
-            
         } // end for loop over node_lid
 
         // loop conditions needed for distance calculation
@@ -95,154 +86,155 @@ void get_timestep(mesh_t &mesh,
         size_t a;
         size_t b;
         size_t loop = 0;
-        
+
         // Only works for 3D
         // Solving for the magnitude of distance between each node
-        for (size_t i = 0; i < 28; i++){
-            
+        for (size_t i = 0; i < 28; i++)
+        {
             a = countA;
             b = countB;
-            
+
             // returns magnitude of distance between each node, 28 total options
-            dist(i) = fabs(sqrt(( pow((coords(b, 0) - coords(a, 0)), 2.0)
-                                + pow((coords(b, 1) - coords(a, 1)), 2.0)
-                                + pow((coords(b, 2) - coords(a, 2)), 2.0))));
+            dist(i) = fabs(sqrt((pow((coords(b, 0) - coords(a, 0)), 2.0)
+                                 + pow((coords(b, 1) - coords(a, 1)), 2.0)
+                                 + pow((coords(b, 2) - coords(a, 2)), 2.0))));
 
             countB++;
             countA++;
-            
+
             // tricky indexing
-            if (countB > 7) {
+            if (countB > 7)
+            {
                 loop++;
                 countB = 1 + loop;
                 countA = 0;
             }
-            
         } // endo for i
 
-
         double dist_min = dist(0);
-        
-        for(int i = 0; i < 28; ++i){
+
+        for (int i = 0; i < 28; ++i)
+        {
             dist_min = fmin(dist(i), dist_min);
         }
-        
+
         // local dt calc based on CFL
-        double dt_lcl_ = dt_cfl*dist_min/(elem_sspd(elem_gid) + fuzz);
-        
+        double dt_lcl_ = dt_cfl * dist_min / (elem_sspd(elem_gid) + fuzz);
+
         // make dt be in bounds
         dt_lcl_ = fmin(dt_lcl_, dt_max);               // make dt small than dt_max
         dt_lcl_ = fmax(dt_lcl_, dt_min);               // make dt larger than dt_min
-        
-       
-        if (dt_lcl_ < dt_lcl) dt_lcl = dt_lcl_;
-        
+
+        if (dt_lcl_ < dt_lcl)
+        {
+            dt_lcl = dt_lcl_;
+        }
     }, min_dt_calc);  // end parallel reduction
     Kokkos::fence();
-    
+
     // save the min dt
-    if(min_dt_calc < dt) dt = min_dt_calc;
-    
+    if (min_dt_calc < dt)
+    {
+        dt = min_dt_calc;
+    }
+
     // ensure time step hits the graphics time intervals
-    dt = fmin(dt, (graphics_time - time_value)+fuzz);
-    
+    dt = fmin(dt, (graphics_time - time_value) + fuzz);
+
     // make dt be exact for final time
-    dt = fmin(dt, time_final-time_value);
-    
+    dt = fmin(dt, time_final - time_value);
+
     return;
-    
 } // end get_timestep
 
 // -----------------------------------------------------------------------------
 // This function calculates the time step by finding the shortest distance
 // between any two nodes in the mesh
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 // WARNING WARNING :  Only works for 3D, 8 node elements
-void get_timestep2D(mesh_t &mesh,
-                    DViewCArrayKokkos <double> &node_coords,
-                    DViewCArrayKokkos <double> &node_vel,
-                    DViewCArrayKokkos <double> &elem_sspd,
-                    DViewCArrayKokkos <double> &elem_vol,
-                    double time_value,
-                    const double graphics_time,
-                    const double time_final,
-                    const double dt_max,
-                    const double dt_min,
-                    const double dt_cfl,
-                    double &dt,
-                    const double fuzz){
-
-    
+void get_timestep2D(mesh_t&                    mesh,
+                    DViewCArrayKokkos<double>& node_coords,
+                    DViewCArrayKokkos<double>& node_vel,
+                    DViewCArrayKokkos<double>& elem_sspd,
+                    DViewCArrayKokkos<double>& elem_vol,
+                    double                     time_value,
+                    const double               graphics_time,
+                    const double               time_final,
+                    const double               dt_max,
+                    const double               dt_min,
+                    const double               dt_cfl,
+                    double&                    dt,
+                    const double               fuzz)
+{
     // increase dt by 10%, that is the largest dt value
-    dt = dt*1.1;
+    dt = dt * 1.1;
 
     double dt_lcl;
     double min_dt_calc;
     REDUCE_MIN(elem_gid, 0, mesh.num_elems, dt_lcl, {
-        
         double coords0[8];  // element coords
-        ViewCArrayKokkos <double> coords(coords0, 4, 2);
+        ViewCArrayKokkos<double> coords(coords0, 4, 2);
 
-        
         double distance0[6];  // array for holding distances between each node
-        ViewCArrayKokkos <double> dist(distance0, 6);
-        
-        // Getting the coordinates of the nodes of the element
-        for(size_t node_lid = 0; node_lid < 4; node_lid++){
+        ViewCArrayKokkos<double> dist(distance0, 6);
 
-            for (size_t dim = 0; dim < mesh.num_dims; dim++){
-                coords(node_lid, dim) = node_coords(1,  mesh.nodes_in_elem(elem_gid, node_lid), dim);
+        // Getting the coordinates of the nodes of the element
+        for (size_t node_lid = 0; node_lid < 4; node_lid++)
+        {
+            for (size_t dim = 0; dim < mesh.num_dims; dim++)
+            {
+                coords(node_lid, dim) = node_coords(1, mesh.nodes_in_elem(elem_gid, node_lid), dim);
             } // end for dim
-            
         } // end for loop over node_lid
 
-        
         // Only works for 2D
         // Solving for the magnitude of distance between each node
         size_t count = 0;
-        for (size_t i = 0; i < 3; i++){
-            for (size_t j=i+1; j<=3; j++){
-                
+        for (size_t i = 0; i < 3; i++)
+        {
+            for (size_t j = i + 1; j <= 3; j++)
+            {
                 // returns magnitude of distance between each node, 6 total options
                 dist(count) = fabs(
-                                 sqrt( pow((coords(i, 0) - coords(j, 0)), 2.0)
-                                     + pow((coords(i, 1) - coords(j, 1)), 2.0) )
-                                  );
-                count ++;
+                                 sqrt(pow((coords(i, 0) - coords(j, 0)), 2.0)
+                        + pow((coords(i, 1) - coords(j, 1)), 2.0) )
+                    );
+                count++;
             } // end for j
         } // end for i
 
-
         double dist_min = dist(0);
-        
-        for(int i = 0; i < 6; ++i){
+
+        for (int i = 0; i < 6; ++i)
+        {
             dist_min = fmin(dist(i), dist_min);
         }
-        
+
         // local dt calc based on CFL
-        double dt_lcl_ = dt_cfl*dist_min/(elem_sspd(elem_gid) + fuzz);
-        
+        double dt_lcl_ = dt_cfl * dist_min / (elem_sspd(elem_gid) + fuzz);
+
         // make dt be in bounds
         dt_lcl_ = fmin(dt_lcl_, dt_max);    // make dt small than dt_max
         dt_lcl_ = fmax(dt_lcl_, dt_min);    // make dt larger than dt_min
-        
-       
-        if (dt_lcl_ < dt_lcl) dt_lcl = dt_lcl_;
-        
+
+        if (dt_lcl_ < dt_lcl)
+        {
+            dt_lcl = dt_lcl_;
+        }
     }, min_dt_calc);  // end parallel reduction
     Kokkos::fence();
-    
+
     // save the min dt
-    if(min_dt_calc < dt) dt = min_dt_calc;
-    
+    if (min_dt_calc < dt)
+    {
+        dt = min_dt_calc;
+    }
+
     // ensure time step hits the graphics time intervals
-    dt = fmin(dt, (graphics_time - time_value)+fuzz);
-    
+    dt = fmin(dt, (graphics_time - time_value) + fuzz);
+
     // make dt be exact for final time
-    dt = fmin(dt, time_final-time_value);
-    
+    dt = fmin(dt, time_final - time_value);
+
     return;
-    
 } // end get_timestep2D
-
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat.cpp
index 1a892dc68..fd0c43449 100644
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat.cpp
@@ -1,70 +1,63 @@
 // -----------------------------------------------------------------------------
 // This code contains the constitutive relation for a user supplied model
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include "state.h"
 #include "mesh.h"
 
-
 // -----------------------------------------------------------------------------
 // This is the user material model function for the equation of state
 // An eos function must be supplied or the code will fail to run.
 // The pressure and sound speed can be calculated from an analytic eos.
 // The pressure can also be calculated using p = -1/3 Trace(Stress)
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void user_eos_model(const DViewCArrayKokkos <double> &elem_pres,
-                    const DViewCArrayKokkos <double> &elem_stress,
-                    const size_t elem_gid,
-                    const size_t mat_id,
-                    const DViewCArrayKokkos <double> &elem_state_vars,
-                    const DViewCArrayKokkos <double> &elem_sspd,
-                    const double den,
-                    const double sie){
-    
+void user_eos_model(const DViewCArrayKokkos<double>& elem_pres,
+                    const DViewCArrayKokkos<double>& elem_stress,
+                    const size_t                     elem_gid,
+                    const size_t                     mat_id,
+                    const DViewCArrayKokkos<double>& elem_state_vars,
+                    const DViewCArrayKokkos<double>& elem_sspd,
+                    const double                     den,
+                    const double                     sie)
+{
     const int num_dims = 3;
-    
+
     // -----------------------------------------------------------------------------
     // Required variables are here
-    //------------------------------------------------------------------------------
+    // ------------------------------------------------------------------------------
     elem_pres(elem_gid) = 0.0;  // pressure
     elem_sspd(elem_gid) = 1.0e-8;  // sound speed
-    
+
     // pressure = 1/3tr(stress)
-    for (int i=0; i<num_dims; i++){
-        elem_pres(elem_gid) -= elem_stress(i,i);
+    for (int i = 0; i < num_dims; i++)
+    {
+        elem_pres(elem_gid) -= elem_stress(i, i);
     }
-    elem_pres(elem_gid) *= 1.0/3.0;
-    
-    
+    elem_pres(elem_gid) *= 1.0 / 3.0;
+
     return;
-    
 } // end for user_eos_model
 
-
-
-
-
 // -----------------------------------------------------------------------------
 // This is the user material model function for the stress tensor
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void user_strength_model(const DViewCArrayKokkos <double> &elem_pres,
-                         const DViewCArrayKokkos <double> &elem_stress,
-                         const size_t elem_gid,
-                         const size_t mat_id,
-                         const DViewCArrayKokkos <double> &elem_state_vars,
-                         const DViewCArrayKokkos <double> &elem_sspd,
-                         const double den,
-                         const double sie,
-                         const ViewCArrayKokkos <double> &vel_grad,
-                         const ViewCArrayKokkos <size_t> &elem_node_gids,
-                         const DViewCArrayKokkos <double> &node_coords,
-                         const DViewCArrayKokkos <double> &node_vel,
-                         const double vol,
-                         const double dt,
-                         const double rk_alpha){
-    
-
+void user_strength_model(const DViewCArrayKokkos<double>& elem_pres,
+                         const DViewCArrayKokkos<double>& elem_stress,
+                         const size_t                     elem_gid,
+                         const size_t                     mat_id,
+                         const DViewCArrayKokkos<double>& elem_state_vars,
+                         const DViewCArrayKokkos<double>& elem_sspd,
+                         const double                     den,
+                         const double                     sie,
+                         const ViewCArrayKokkos<double>&  vel_grad,
+                         const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                         const DViewCArrayKokkos<double>& node_coords,
+                         const DViewCArrayKokkos<double>& node_vel,
+                         const double                     vol,
+                         const double                     dt,
+                         const double                     rk_alpha)
+{
     // statev(0) = var_1
     //   :
     //   :
@@ -73,38 +66,33 @@ void user_strength_model(const DViewCArrayKokkos <double> &elem_pres,
 
     int num_dims = 3;
 
-    
     // -----------------------------------------------------------------------------
     // The user must coding goes here
-    //------------------------------------------------------------------------------
+    // ------------------------------------------------------------------------------
 
-    
     return;
-    
 } // end of user mat
 
-
 // -----------------------------------------------------------------------------
 // This is the user material model function
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 KOKKOS_FUNCTION
-void user_strength_model_vpsc(const DViewCArrayKokkos <double> &elem_pres,
-                              const DViewCArrayKokkos <double> &elem_stress,
-                              const size_t elem_gid,
-                              const size_t mat_id,
-                              const DViewCArrayKokkos <double> &elem_state_vars,
-                              const DViewCArrayKokkos <double> &elem_sspd,
-                              const double den,
-                              const double sie,
-                              const ViewCArrayKokkos <double> &vel_grad,
-                              const ViewCArrayKokkos <size_t> &elem_node_gids,
-                              const DViewCArrayKokkos <double> &node_coords,
-                              const DViewCArrayKokkos <double> &node_vel,
-                              const double vol,
-                              const double dt,
-                              const double rk_alpha){
-    
-
+void user_strength_model_vpsc(const DViewCArrayKokkos<double>& elem_pres,
+                              const DViewCArrayKokkos<double>& elem_stress,
+                              const size_t                     elem_gid,
+                              const size_t                     mat_id,
+                              const DViewCArrayKokkos<double>& elem_state_vars,
+                              const DViewCArrayKokkos<double>& elem_sspd,
+                              const double                     den,
+                              const double                     sie,
+                              const ViewCArrayKokkos<double>&  vel_grad,
+                              const ViewCArrayKokkos<size_t>&  elem_node_gids,
+                              const DViewCArrayKokkos<double>& node_coords,
+                              const DViewCArrayKokkos<double>& node_vel,
+                              const double                     vol,
+                              const double                     dt,
+                              const double                     rk_alpha)
+{
     // statev(0) = var_1
     //   :
     //   :
@@ -112,27 +100,25 @@ void user_strength_model_vpsc(const DViewCArrayKokkos <double> &elem_pres,
     // statev(N) = var_N
 
     const int num_dims = 3;
-    
+
     // -----------------------------------------------------------------------------
     // Required variables are here
-    //------------------------------------------------------------------------------
+    // ------------------------------------------------------------------------------
     elem_pres(elem_gid) = 1.0e-15;  // pressure
     elem_sspd(elem_gid) = 1.0e-15;  // sound speed
 
-    
     // -----------------------------------------------------------------------------
     // The user must coding goes here
-    //------------------------------------------------------------------------------
-    
+    // ------------------------------------------------------------------------------
+
     // For hypo-elastic models
     double D_tensor_values[9];
     double W_tensor_values[9];
 
     // convert to array syntax with the C-Language access pattern
-    ViewCArrayKokkos <double> D_tensor(D_tensor_values, num_dims, num_dims);  // D(i,j)
-    ViewCArrayKokkos <double> W_tensor(W_tensor_values, num_dims, num_dims);  // W(i,j)
-    
-    
+    ViewCArrayKokkos<double> D_tensor(D_tensor_values, num_dims, num_dims);  // D(i,j)
+    ViewCArrayKokkos<double> W_tensor(W_tensor_values, num_dims, num_dims);  // W(i,j)
+
     decompose_vel_grad(D_tensor,
                        W_tensor,
                        vel_grad,
@@ -141,28 +127,25 @@ void user_strength_model_vpsc(const DViewCArrayKokkos <double> &elem_pres,
                        node_coords,
                        node_vel,
                        vol);
-    
 
     // For hypo-elastic models
     double deps_values[9];
     double dW_values[9];
     double drot_values[9];
-    
-    ViewCMatrixKokkos <double> deps(&deps_values[0], num_dims, num_dims);
-    ViewCMatrixKokkos <double> dW(&dW_values[0], num_dims, num_dims);
-    ViewCMatrixKokkos <double> drot(&drot_values[0], num_dims, num_dims);
-    
+
+    ViewCMatrixKokkos<double> deps(&deps_values[0], num_dims, num_dims);
+    ViewCMatrixKokkos<double> dW(&dW_values[0], num_dims, num_dims);
+    ViewCMatrixKokkos<double> drot(&drot_values[0], num_dims, num_dims);
+
     // calculate strain and rotation increments
-    for (size_t i = 1; i <= 3; i++) {
-        for (size_t j = 1; j <= 3; j++) {
-            deps(i, j) = D_tensor(i-1,j-1)*dt; // infinitesimal strain increment
-            dW(i, j) = W_tensor(i-1,j-1)*dt;
+    for (size_t i = 1; i <= 3; i++)
+    {
+        for (size_t j = 1; j <= 3; j++)
+        {
+            deps(i, j) = D_tensor(i - 1, j - 1) * dt; // infinitesimal strain increment
+            dW(i, j)   = W_tensor(i - 1, j - 1) * dt;
         }
     } // end for
-    
-    
-    
+
     return;
-    
 } // end of ideal_gas
-
diff --git a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat_init.cpp b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat_init.cpp
old mode 100755
new mode 100644
index 18ea82e54..82f26a88f
--- a/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat_init.cpp
+++ b/single-node/src/Explicit-Lagrange-Kokkos/SGH_solver/user_mat_init.cpp
@@ -1,6 +1,6 @@
 // -----------------------------------------------------------------------------
 // This code contains the initialization of state vars for supplied models
-//------------------------------------------------------------------------------
+// ------------------------------------------------------------------------------
 #include <string.h>
 #include <sys/stat.h>
 #include <iostream>
@@ -13,31 +13,25 @@
 #include <vector>
 #include <algorithm>
 
-
-
 #include "mesh.h"
 #include "state.h"
 
-
-
-
 // -----------------------------------------------------------------------------
 // The function to read in the state vars for a user supplied model
-//------------------------------------------------------------------------------
-void user_model_init(const DCArrayKokkos <double> &file_state_vars,
-                     const size_t num_state_vars,
-                     const size_t mat_id,
-                     const size_t num_elems) {
-
+// ------------------------------------------------------------------------------
+void user_model_init(const DCArrayKokkos<double>& file_state_vars,
+                     const size_t                 num_state_vars,
+                     const size_t                 mat_id,
+                     const size_t                 num_elems)
+{
     // initialize to zero
-    for (size_t elem_gid = 0; elem_gid<num_elems; elem_gid++) {
-        for(size_t var=0; var<num_state_vars; var++){
-            file_state_vars.host(mat_id,elem_gid,var) = 0.0;
+    for (size_t elem_gid = 0; elem_gid < num_elems; elem_gid++)
+    {
+        for (size_t var = 0; var < num_state_vars; var++)
+        {
+            file_state_vars.host(mat_id, elem_gid, var) = 0.0;
         }
     }
-    
 
-	return;
+    return;
 }
-
-
diff --git a/src/CLI/src/main.cpp b/src/CLI/src/main.cpp
index 70cce4ada..c7cd3d018 100644
--- a/src/CLI/src/main.cpp
+++ b/src/CLI/src/main.cpp
@@ -11,10 +11,10 @@
 #include <string>
 
 std::vector<std::shared_ptr<FierroBackend>> BACKENDS {
-    std::shared_ptr<ParallelExplicit>(),
-    std::shared_ptr<ParallelImplicit>(),
-    std::shared_ptr<MeshBuilderBackend>(),
-    std::shared_ptr<VoxelizerBackend>(),
+    std::make_shared<ParallelExplicit>(),
+    std::make_shared<ParallelImplicit>(),
+    std::make_shared<MeshBuilderBackend>(),
+    std::make_shared<VoxelizerBackend>(),
 };
 
 std::vector<std::shared_ptr<FierroBackend>> find_fierro_backends() {
@@ -97,4 +97,4 @@ int main(int argc, char** argv) {
 
     // If they didn't select anything, give them some help.
     std::cout << parser << std::endl;
-}
\ No newline at end of file
+}
diff --git a/src/EVP/src/CMakeLists.txt b/src/EVP/src/CMakeLists.txt
index d3062cb86..f850a3a67 100644
--- a/src/EVP/src/CMakeLists.txt
+++ b/src/EVP/src/CMakeLists.txt
@@ -1,3 +1,4 @@
 set(SRC_Files UserEOSModel.cpp UserStrengthModel.cpp init_state_vars.cpp chg_basis.cpp evpal.cpp)
 add_library(user_material_models OBJECT ${SRC_Files})
 target_include_directories(user_material_models PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
+target_link_libraries(user_material_models Elements)
\ No newline at end of file
diff --git a/src/EVP/src/init_state_vars.cpp b/src/EVP/src/init_state_vars.cpp
index cb3d54b12..bf6154b32 100644
--- a/src/EVP/src/init_state_vars.cpp
+++ b/src/EVP/src/init_state_vars.cpp
@@ -32,7 +32,7 @@ std::vector<std::string> split (std::string s, std::string delimiter);
 void init_state_vars(
   const DCArrayKokkos <material_t> &material,
   const DViewCArrayKokkos <size_t> &elem_mat_id,
-  const DCArrayKokkos <double> &state_vars,
+  DCArrayKokkos <double> &state_vars,
   const DCArrayKokkos <double> &global_vars,
   const DCArrayKokkos <double> &elem_user_output_vars,
   const size_t num_elems)
@@ -412,7 +412,7 @@ void init_state_vars(
   //});
 	}
   Kokkos::fence();
-  //state_vars.update_host();
+  state_vars.update_host();
 
     printf("user_mat_init completed\n");
     
diff --git a/src/EVPFFT/README.md b/src/EVPFFT/README.md
index 33427bd1d..26602d3b3 100644
--- a/src/EVPFFT/README.md
+++ b/src/EVPFFT/README.md
@@ -152,7 +152,7 @@ Provide a vtk file type that contains information about which grid point is soli
 
 Run EVPFFT as:
 ```
-  mpirun -n 1 -f tension_11.txt -m 2
+  mpirun -n 1 evpfft -f tension_11.txt -m 2
 ```
 the `-m 2` option tells EVPFFT to use the vtk lattice file microstructure file type.
 
diff --git a/src/EVPFFT/example_input_files/taylor_anvil.yaml b/src/EVPFFT/example_input_files/taylor_anvil.yaml
index cdb8f0c8d..8ab8e24e4 100644
--- a/src/EVPFFT/example_input_files/taylor_anvil.yaml
+++ b/src/EVPFFT/example_input_files/taylor_anvil.yaml
@@ -1,19 +1,13 @@
-solver_type: Explicit 
-
-fea_modules:
-  - type: SGH 
-
 num_dims: 3
-rk_num_stages: 1
 
-time_variables:
-    time_final: 1.08e-04 #1.5e-04  #[s]
-    dt_min: 1.e-11  #[s]
-    dt_max: 1.e-3  #[s]
-    dt_start: 1.e-10  #[s]
-    dt_cfl: 0.01
-    cycle_stop: 200000000
-timer_output_level: thorough
+dynamic_options:
+  rk_num_stages: 1
+  time_final: 1.08e-04 #1.5e-04  #[s]
+  dt_min: 1.e-11  #[s]
+  dt_max: 1.e-3  #[s]
+  dt_start: 1.e-10  #[s]
+  dt_cfl: 0.01
+  cycle_stop: 200000000
 
 input_options:
     mesh_file_format: ensight
@@ -21,19 +15,43 @@ input_options:
     element_type: hex8
 
 output_options:
-    graphics_step: 2.4e-6 #1.2e-6 #6.0e-07  #[s]
-    output_file_format: vtk 
-    max_num_user_output_vars: 5
-    
-field_output:
-  - velocity
-  #- processor_id
-  #- element_id
-  #- stress
-  - user_vars
+  timer_output_level: thorough
+  output_file_format: vtk
+  graphics_step: 2.4e-6 #1.2e-6 #6.0e-07  #[s]
+  max_num_user_output_vars: 5
+  include_default_output_fields: false
+  output_fields:
+    - velocity
+    #- processor_id
+    #- element_id
+    #- stress
+    - user_vars
 
-material_options:
-  - eos_model: user_eos_model
+fea_module_parameters:
+  - type: SGH
+    material_id: 0
+    boundary_conditions:
+      # Tag X plane
+      - surface: 
+          type: x_plane
+          plane_position: 0.0
+        type: reflected
+                
+        # Tag Y plane
+      - surface: 
+          type: y_plane
+          plane_position: 0.0
+        type: reflected
+                
+        # Tag Z plane
+      - surface: 
+          type: z_plane
+          plane_position: 0.0
+        type: reflected
+
+materials:
+  - id: 0
+    eos_model: user_eos_model
     eos_run_location: device
     strength_model: user_strength_model
     strength_type: hypo
@@ -41,37 +59,21 @@ material_options:
     q1: 0.01 # 0.01 to 0.1
     q2: 0.5 # 0.5  q2/4
     q1ex: 0.01 # 0.01 to 0.1
-    q2ex: 0.0 
+    q2ex: 0.0
+    maximum_limiter: true
+    # num_state_vars: 5
     global_vars:
-      - 8 #N1
-      - 8 #N2
-      - 8 #N3
       - 0.0001 #udotAccTh
       - 3000000 #[mm/s] ref sound speed
 
-region_options:
-  - volume: global
-    mat_id: 0
+regions:
+  - volume:
+      type: global
+    material_id: 0
     den: 1.669e-8 #[tonne/mm3]
-    sie: 1.0e-11  #[mJ]
-
+    sie: 1.0e-12  #[mJ]
+        
     velocity: cartesian
     u: 0.0
     v: 0.0
-    w: -175000.0  #[mm/s]
-
-boundary_conditions:
-  # Tag X plane
-  - surface: x_plane
-    value: 0.0
-    condition_type: reflected
-
-    # Tag Y plane
-  - surface: y_plane
-    value: 0.0
-    condition_type: reflected
-
-    # Tag Z plane
-  - surface: z_plane
-    value: 0.0
-    condition_type: reflected
+    w: -175000.0  #[mm/s]
\ No newline at end of file
diff --git a/src/EVPFFT/example_input_files/tension_single_element.yaml b/src/EVPFFT/example_input_files/tension_single_element.yaml
index a0e3ca4cb..cab96c036 100644
--- a/src/EVPFFT/example_input_files/tension_single_element.yaml
+++ b/src/EVPFFT/example_input_files/tension_single_element.yaml
@@ -1,38 +1,53 @@
-solver_type: Explicit
-
-fea_modules:
-  - type: SGH
-
 num_dims: 3
-rk_num_stages: 1
 
-time_variables:
-    time_final: 1.0e+8  #[s]
-    dt_min: 1.e-11  #[s]
-    dt_max: 1.e-4  #[s]
-    dt_start: 1.e-6  #[s]
-    dt_cfl: 0.5
-    cycle_stop: 200000000
-timer_output_level: thorough
+dynamic_options:
+  rk_num_stages: 1
+  time_final: 1.0e-2 #1.0e+8  #[s]
+  dt_min: 1.e-11  #[s]
+  dt_max: 1.e-4  #[s]
+  dt_start: 1.e-6  #[s]
+  dt_cfl: 0.5
+  cycle_stop: 200000000
 
 input_options:
     mesh_file_format: ensight
-    mesh_file_name: /vast/home/cyenusah/github/Fierro/EVPFFT/input_files/mesh_single_element.geo
+    mesh_file_name: /vast/home/cyenusah/github/Fierro/src/EVPFFT/example_input_files/mesh_single_element.geo
     element_type: hex8
 
 output_options:
-  graphics_step: 1.0e-1  #[s]
+  timer_output_level: thorough
   output_file_format: vtk
+  graphics_step: 1.0e-1  #[s]
+  #output_fields:
+    #- velocity
+    #- processor_id
+    #- element_id
+    #- stress
+    #- user_vars
+
 
-#field_output:
-  #- velocity
-  #- processor_id
-  #- element_id
-  #- stress
-  #- user_vars
+fea_module_parameters:
+  - type: SGH
+    material_id: 0
+    boundary_conditions:
+      # Tag Z plane
+      - surface:
+          type: z_plane
+          plane_position: 0.0
+        type: fixed_position
+                
+      # Tag Z plane
+      - surface:
+          type: z_plane
+          plane_position: 1.0
+        type: velocity
+        u: 0.0  #[mm/s]
+        v: 0.0  #[mm/s]
+        w: 1.0  #[mm/s]
 
-material_options:
-  - eos_model: user_eos_model
+materials:
+  - id: 0
+    eos_model: user_eos_model
     eos_run_location: device
     strength_model: user_strength_model
     strength_type: hypo
@@ -41,16 +56,15 @@ material_options:
     q2: 1.3333
     q1ex: 1.0
     q2ex: 0.0
+    # num_state_vars: 5
     global_vars:
-      - 8 #N1
-      - 8 #N2
-      - 8 #N3
-      - 0.0001 #udotAccTh
+      - 0.0 #0.0001 #udotAccTh
       - 100000 #2400000 #[mm/s] ref sound speed
 
-region_options:
-  - volume: global
-    mat_id: 0
+regions:
+  - volume:
+      type: global
+    material_id: 0
     den: 8.96e-9 #[tonne/mm3]
     sie: 1.0e-12  #[mJ]
         
@@ -58,17 +72,3 @@ region_options:
     u: 0.0
     v: 0.0
     w: 0.0  #[mm/s]
-
-boundary_conditions:
-  # Tag Z plane
-  - surface: z_plane
-    value: 0.0
-    condition_type: fixed
-            
-  # Tag Z plane
-  - surface: z_plane
-    value: 1.0
-    condition_type: velocity
-    u: 0.0  #[mm/s]
-    v: 0.0  #[mm/s]
-    w: 1.0  #[mm/s]
diff --git a/src/EVPFFT/src/Fierro-EVPFFT-Link/init_state_vars.cpp b/src/EVPFFT/src/Fierro-EVPFFT-Link/init_state_vars.cpp
index 19d493a3c..f7cde4d41 100644
--- a/src/EVPFFT/src/Fierro-EVPFFT-Link/init_state_vars.cpp
+++ b/src/EVPFFT/src/Fierro-EVPFFT-Link/init_state_vars.cpp
@@ -6,7 +6,7 @@ using namespace mtr;
 void init_state_vars(
   const DCArrayKokkos <material_t> &material,
   const DViewCArrayKokkos <size_t> &elem_mat_id,
-  const DCArrayKokkos <double> &state_vars,
+  DCArrayKokkos <double> &state_vars,
   const DCArrayKokkos <double> &global_vars,
   const DCArrayKokkos <double> &elem_user_output_vars,
   const size_t num_elems)
@@ -20,6 +20,7 @@ void init_state_vars(
             state_vars.host(elem_gid,var) = 0.0;
         }
     }
+    state_vars.update_device();
     
 	  return;
 }
diff --git a/src/EVPFFT/src/data_grain.cpp b/src/EVPFFT/src/data_grain.cpp
index 9c386bbdf..d9c5fa0e6 100644
--- a/src/EVPFFT/src/data_grain.cpp
+++ b/src/EVPFFT/src/data_grain.cpp
@@ -131,6 +131,14 @@ void EVPFFT::data_grain(const std::string & filetext)
   MPI_Allreduce(MPI_IN_PLACE, c066.host_pointer(), c066.size(), MPI_REAL_T, MPI_SUM, mpi_comm);
   c066.update_device();
 
+  // for single crystal only. it leads to fast convergence of the outer while loop for stress and strain fields
+  // for (int i = 1; i <= 6; i++) {
+  //   for (int j = 1; j <= 6; j++) {
+  //     c066.host(i,j) *= 1000000.0;
+  //   }
+  // }
+  // c066.update_device();
+
 #if 0
   // debug print
   print_array_to_file(c066.host_pointer(), c066.size(), my_rank, "c066.txt");
diff --git a/src/EVPFFT/src/evpal.cpp b/src/EVPFFT/src/evpal.cpp
index 958645c37..02125321f 100644
--- a/src/EVPFFT/src/evpal.cpp
+++ b/src/EVPFFT/src/evpal.cpp
@@ -242,7 +242,7 @@ void EVPFFT::evpal(int imicro)
             } // end for jj
           } // end for ii
 
-#if 0  
+//#if 0  
           int error_flag = invert_matrix <6> (xjacobinv.pointer());
 
           // TODO: optimize indexing of this loop
@@ -251,8 +251,9 @@ void EVPFFT::evpal(int imicro)
               sg6(ii) += -xjacobinv(ii,jj)*res(jj);
             } // end for jj
           } // end for ii
-#endif
+//#endif
 
+#if 0
           // Calculate new stress by solving the system -[J][delt_sg] = [R]
           int error_flag = solve_linear_system(xjacobinv.pointer(), res.pointer(), 6);
           //if (error_flag != 0) {
@@ -264,6 +265,7 @@ void EVPFFT::evpal(int imicro)
           for (int ii = 1; ii <= 6; ii++) {
             sg6(ii) -= res(ii);
           } // end for ii
+#endif
 
           dsgnorm1 = 0.0;
           dsgnorm2 = 0.0;
diff --git a/src/EVPFFT/src/evpfft.cpp b/src/EVPFFT/src/evpfft.cpp
index 80860e21c..9355b7ae9 100644
--- a/src/EVPFFT/src/evpfft.cpp
+++ b/src/EVPFFT/src/evpfft.cpp
@@ -1,3 +1,41 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
+
 #include "evpfft.h"
 #include "utilities.h"
 #include <math.h>
diff --git a/src/EVPFFT/src/evpfft.h b/src/EVPFFT/src/evpfft.h
index 80fabd7ba..29e9c3b3f 100644
--- a/src/EVPFFT/src/evpfft.h
+++ b/src/EVPFFT/src/evpfft.h
@@ -1,3 +1,40 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
 #pragma once
 
 #include "mpi.h"
diff --git a/src/EVPFFT/src/main.cpp b/src/EVPFFT/src/main.cpp
index 2776e43ca..349a51a53 100644
--- a/src/EVPFFT/src/main.cpp
+++ b/src/EVPFFT/src/main.cpp
@@ -1,3 +1,41 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
+
 #include "evpfft.h"
 #include "utilities.h"
 #include "vm.h"
diff --git a/src/LS-EVPFFT/.DS_Store b/src/LS-EVPFFT/.DS_Store
new file mode 100644
index 000000000..76d55dfd1
Binary files /dev/null and b/src/LS-EVPFFT/.DS_Store differ
diff --git a/src/LS-EVPFFT/.gitignore b/src/LS-EVPFFT/.gitignore
new file mode 100644
index 000000000..2be64c536
--- /dev/null
+++ b/src/LS-EVPFFT/.gitignore
@@ -0,0 +1,13 @@
+build
+CMakeFiles
+Makefile
+CMakeCache.txt
+cmake_install.cmake
+.vscode
+*.swp
+ls-evpfft_*
+hdf5
+heffte
+kokkos
+MATAR
+lib
diff --git a/src/LS-EVPFFT/README.md b/src/LS-EVPFFT/README.md
new file mode 100644
index 000000000..85831347b
--- /dev/null
+++ b/src/LS-EVPFFT/README.md
@@ -0,0 +1,158 @@
+# Large Strain Elasto-Viscoplastic Fast Fourier Transform-based (LS-EVPFFT) Micromechanical Solver
+
+LS-EVPFFT is a parallel implementation of an elasto-viscoplastic fast Fourier transform-based (EVPFFT) micromechanical solver to facilitate computationally efficient crystal plasticit modeling of polycrystalline materials. Emphasis is placed on performance portability to allow for a single implementation of LS-EVPFFT to run optimally on variety of homogeneous architectures that are comprised of multi-core central processing units(CPUs), and on heterogeneous computer architectures comprised of multi-core CPUs and graphics processing units (GPUs) from different vendors. To achieve this, we utilize MATAR, a C++ software library that allows the straightforward creation and usage of multidimensional dense or sparse matrix and array data structures that are also portable across disparate architectures using Kokkos, a performance portable library. Also, we utilize the message passing interface (MPI) to distribute the computational workload amongst the processors. The heFFTe (Highly Efficient FFT for Exascale) library is used to facilitate the performance portability of the fast Fourier transforms (FFTs) computation.
+
+# Building LS-EVPFFT as a standalone program
+
+LS-EVPFFT depends on the following to build:
+
+1. MPI
+2. [MATAR](https://github.com/lanl/MATAR)
+3. [Kokkos](https://github.com/kokkos/kokkos)
+4. [HeFFTe](https://github.com/icl-utk-edu/heffte) (FFTW,CUFFT,ROCFFT)
+5. [HDF5](https://www.hdfgroup.org/solutions/hdf5/)
+
+To make it easy to build LS-EVPFFT, we have included `scripts/build-scripts/build_ls-evpfft.sh` which when executed will download and install all required dependencies and LS-EVPFFT. It is important to note that LS-EVPFFT depends on three main libraries which can be time consuming to download from scratch namely, MPI, FFTW (MPI version), and HDF5 (MPI version). Therefore the user is advised to download these packages themselves either using `sudo apt install` of Anaconda package manager.
+
+## Building LS-EVPFFT with Anaconda
+It is advised to use Anaconda package manager to build LS-EVPFFT as follows:
+
+1. Create an environment and activate:
+```
+conda create --name ls-evpfftEnv
+conda activate ls-evpfftEnv
+```
+
+2. Install needed packages:
+```
+conda install cxx-compiler -c conda-forge
+conda install cmake
+conda install "fftw=*=mpi_openmpi*" -c conda-forge
+conda install "hdf5=*=mpi_openmpi*" -c conda-forge
+```
+
+3. Run the build script as:
+```
+source build_ls-evpfft.sh --help
+```
+
+Which outputs:
+
+```
+Usage: source build_ls-evpfft.sh [OPTION]
+Required arguments:
+  --heffte_build_type=<fftw|cufft|rocfft>
+  --kokkos_build_type=<serial|openmp|pthreads|cuda|hip>
+
+Optional arguments:
+  --build_fftw: builds fftw from scratch
+  --build_hdf5: builds hdf5 from scratch
+  --machine=<darwin|chicoma|linux|mac> (default: none)
+  --num_jobs=<number>: Number of jobs for 'make' (default: 1, on Mac use 1)
+  --help: Display this help message
+```
+
+To build LS-EVPFFT you would need to provide both the `--heffte_build_type` and `--kokkos_build_type` options. The command below build LS-EVPFFT using FFTW and Serial version of Kokkos:
+
+```
+source build_ls-evpfft.sh --heffte_build_type=fftw --kokkos_build_type=serial
+```
+
+This will build LS-EVPFFT in the folder `ls-evpfft_{fftw}_{serial}`. The binary, `ls-evpfft` is found in that folder.
+
+## Building LS-EVPFFT without Anaconda
+Install HDF5 (MPI version) and FFTW (MPI version) with `sudo apt install` which will install the libraries in the default location. Then run the build script as above.
+
+If you would like to build HDF5 and FFTW from scratch then use `--build_hdf5` and `--build_fftw` options of the build script.
+
+If you do not want to build HDF5 and FFTW from scratch because you already have a version on your system remove the `--build_hdf5` and `--build_fftw` option from the command. If your installed HDF5 and FFTW are located in a directory other than the default linux directories for libraries, then specify the following before running the build script.
+
+```
+export HDF5_ROOT=/path/to/where/HDF5/lib/and/include/folders/are/located
+export FFTW_ROOT=/path/to/where/FFTW/lib/and/include/folders/are/located
+```
+
+# Using LS-EVPFFT as a standalone program
+
+To get help on how to run `ls-evpfft` use the `-h` or `--help` opton:
+
+```
+./ls-evpfft --help
+```
+
+Which gives the following output:
+
+```
+Required arguments:
+  -f, --infile   <path>     input file (full path)
+...
+```
+
+We have provided example input files in the `example_input_files` directory. In summary, LS-EVPFFT needs these files:
+
+1. input file (see `example_input_files/example_ls-evpfft_standalone_inputfile.txt`)
+2. plastic parameters file (see `example_input_files/example_plastic_parameters.txt`)
+3. elastic parameters file (see `example_input_files/example_elastic_parameters.txt`)
+4. microstructure RVE file (see `example_input_files/random_microstructure_8x8x8.txt`)
+
+With all these files available you can now run LS-EVPFFT as:
+
+```
+mpirun -np 4 ls-evpfft --infile=example_ls-evpfft_standalone_inputfile.txt
+```
+
+# Using LS-EVPFFT with Fierro
+
+To use LS-EVPFFT as a user material model for Fierro, build Fierro with the following cmake flags `-DBUILD_LS_EVPFFT_FIERRO=ON`, choose the FFT backend with `-D<USE_FFTW,USE_CUFFT,USE_ROCFFT>=ON`, and use `-DABSOLUTE_NO_OUTPUT=ON` to specify that LS-EVPFFT does not provide output per element which can be cumbersome and memory intensive. It is best to modify LS-EVPFFT to output for a few elements of interst.
+
+The yaml file for Fierro should include:
+
+```
+material_options:
+  - eos_model: user_eos_model
+    eos_run_location: device
+    strength_model: user_strength_model
+    strength_type: hypo
+    strength_run_location: host
+    global_vars:
+      - 0.0001 #udotAccTh (LS-EVPFFT accumulated strain threshold before full solve)
+      - 3000000 #[mm/s] (LS-EVPFFT material sound speed)
+```
+
+Also, input files for each material should be defined as `evpfft1.in`, `evpfft2.in`, etc. and should be provided in the directory Fierro is being run from.
+
+# Additional Information
+
+These information are for users interested in more customization of the build process. You can add to the `cmake` options in `build_ls-evpfft.sh` as needed.
+
+Scripts for building each dependency are available in the `install-scripts` folder and can be run individually.
+
+Note that MATAR is used as an include header file.
+
+If you already have an installation of HDF5 (parallel version) on you system you can use your installation, just comment out the part of the `build_ls-evpfft.sh` that calls `install_hdf5.sh`.
+
+During the installation of HeFFTe libray we assume the required FFT backend (FFTW, CUFFT, ROCM) is installed on the system. If this is not the case then look at the [HeFFTe installation page](https://github.com/icl-utk-edu/heffte) for more information.
+
+The LS-EVPFFT `CMakeLists.txt` uses the following default values to build LS-EVPFFT:
+
+```
+  TWO_SIGN_SLIP_SYSTEMS: OFF
+  NON_SCHMID_EFFECTS: OFF
+  ABSOLUTE_NO_OUTPUT: OFF
+  ENABLE_PROFILE: OFF
+```
+
+To change these default options include the `-D OPTION=<value>` in the `cmake` option, E.g. `-D ABSOLUTE_NO_OUTPUT=ON` in the `build_ls-evpfft.sh`.
+
+# Using LS-EVPFFT for Lattice Structure Homogenization
+
+Example for input files needed to run LS-EVPFFT for lattice structure homogenization is shown in `example_input_files/lattice_input_files`. In that file you will see how to set up evpft input file, elastic and plastic parameter files.
+
+Provide a vtk file type that contains information about which grid point is solid (1) or void (0), example is shown in `example_input_files/lattice_input_files/void_in_solid.vtk`.
+
+Run LS-EVPFFT as:
+```
+  mpirun -n 1 ls-evpfft -f tension_11.txt -m 2
+```
+the `-m 2` option tells LS-EVPFFT to use the vtk lattice file microstructure file type.
+
diff --git a/src/LS-EVPFFT/example_input_files/README.md b/src/LS-EVPFFT/example_input_files/README.md
new file mode 100644
index 000000000..dfa581ec1
--- /dev/null
+++ b/src/LS-EVPFFT/example_input_files/README.md
@@ -0,0 +1 @@
+Use the examples in EVPFFT/example_input_files
\ No newline at end of file
diff --git a/src/LS-EVPFFT/scripts/install-scripts/install_fftw.sh b/src/LS-EVPFFT/scripts/install-scripts/install_fftw.sh
new file mode 100644
index 000000000..072c2241a
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/install-scripts/install_fftw.sh
@@ -0,0 +1,116 @@
+#!/bin/bash -e
+
+show_help() {
+    echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
+    echo "Valid options:"
+    echo "  --num_jobs=<number>: Number of jobs for 'make' (default: 1, on Mac use 1)"
+    echo "  --help: Display this help message"
+    return 1
+}
+
+# Initialize variables with default values
+num_jobs=1
+
+# Parse command line arguments
+for arg in "$@"; do
+    case "$arg" in
+        --num_jobs=*)
+            num_jobs="${arg#*=}"
+            if ! [[ "$num_jobs" =~ ^[0-9]+$ ]]; then
+                echo "Error: Invalid --num_jobs value. Must be a positive integer."
+                show_help
+                return 1
+            fi
+            ;;
+        --help)
+            show_help
+            return 1
+            ;;
+        *)
+            echo "Error: Invalid argument or value specified."
+            show_help
+            return 1
+            ;;
+    esac
+done
+
+# Determine the script's directory
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Script location: $SCRIPT_DIR"
+
+# Determine the parent directory of the script's directory
+PARENT_DIR=$(dirname $(dirname "${SCRIPT_DIR}"))
+
+# make lib directory to store all dependencies
+LIB_DIR="$PARENT_DIR/lib"
+mkdir -p "$LIB_DIR"
+
+# Define FFTW directories
+FFTW_SOURCE_DIR="$LIB_DIR/fftw-3.3.10"
+FFTW_INSTALL_DIR="$LIB_DIR/fftw-3.3.10/install_fftw"
+FFTW_BUILD_DIR="$LIB_DIR/fftw-3.3.10/build_fftw"
+
+# Check if the 'fftw-3.3.10' directory exists in the lib directory; if not, clone it
+if [ ! -d "$FFTW_SOURCE_DIR" ]; then
+  echo "Directory 'fftw' does not exist in '$LIB_DIR', downloading 'fftw'...."
+  wget -P "$LIB_DIR" "http://www.fftw.org/fftw-3.3.10.tar.gz"
+  tar -C "$LIB_DIR" -zxvf "$LIB_DIR/fftw-3.3.10.tar.gz"
+else
+  echo "Directory 'fftw' exists in '$LIB_DIR', skipping 'fftw' download"
+fi
+
+# Check to avoid reinstalling FFTW which might take time
+if [ -d "$FFTW_INSTALL_DIR" ]; then
+    echo "FFTW already installed, to reinstall FFTW delete $FFTW_INSTALL_DIR and $FFTW_BUILD_DIR"
+    return 0
+fi
+
+# Configure fftw
+config_options=(
+    CFLAGS='-O3 -DNDEBUG -fPIC'
+    --prefix=${FFTW_INSTALL_DIR}
+    --enable-mpi
+    --enable-threads
+    --enable-openmp
+    #--enable-avx
+    #--enable-avx2
+    #--enable-avx512
+)
+
+current_dir=$(pwd)
+# have to mkdir and cd because configure does not provide option for build_dir
+mkdir -p "$FFTW_BUILD_DIR"
+cd "$FFTW_BUILD_DIR"
+# Run configure
+"$FFTW_SOURCE_DIR/configure" "${config_options[@]}"
+
+echo "Building double precision fftw..."
+make -C "$FFTW_BUILD_DIR" -j"$num_jobs"
+
+echo "Installing double precision fftw..."
+make -C "$FFTW_BUILD_DIR" install
+
+# cleanup before installing single precision
+make distclean
+
+# Configure for single precision
+config_options+=(
+    --enable-float
+)
+
+# Run configure for single precision
+"$FFTW_SOURCE_DIR/configure" "${config_options[@]}"
+
+echo "Building single precision fftw..."
+make -C "$FFTW_BUILD_DIR" -j"$num_jobs"
+
+echo "Installing single precision fftw..."
+make -C "$FFTW_BUILD_DIR" install
+
+#cleanup
+make distclean
+
+cd "$current_dir"
+
+echo "fftw installation complete."
+
diff --git a/src/LS-EVPFFT/scripts/install-scripts/install_hdf5.sh b/src/LS-EVPFFT/scripts/install-scripts/install_hdf5.sh
new file mode 100644
index 000000000..c75552f75
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/install-scripts/install_hdf5.sh
@@ -0,0 +1,92 @@
+#!/bin/bash -e
+
+show_help() {
+    echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
+    echo "Valid options:"
+    echo "  --num_jobs=<number>: Number of jobs for 'make' (default: 1, on Mac use 1)"
+    echo "  --help: Display this help message"
+    return 1
+}
+
+# Initialize variables with default values
+num_jobs=1
+
+# Parse command line arguments
+for arg in "$@"; do
+    case "$arg" in
+        --num_jobs=*)
+            num_jobs="${arg#*=}"
+            if ! [[ "$num_jobs" =~ ^[0-9]+$ ]]; then
+                echo "Error: Invalid --num_jobs value. Must be a positive integer."
+                show_help
+                return 1
+            fi
+            ;;
+        --help)
+            show_help
+            return 1
+            ;;
+        *)
+            echo "Error: Invalid argument or value specified."
+            show_help
+            return 1
+            ;;
+    esac
+done
+
+# Determine the script's directory
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Script location: $SCRIPT_DIR"
+
+# Determine the parent directory of the script's directory
+PARENT_DIR=$(dirname $(dirname "${SCRIPT_DIR}"))
+
+# make lib directory to store all dependencies
+LIB_DIR="$PARENT_DIR/lib"
+mkdir -p "$LIB_DIR"
+
+# Define hdf5 directories
+HDF5_SOURCE_DIR="$LIB_DIR/hdf5"
+HDF5_INSTALL_DIR="$LIB_DIR/hdf5/install_hdf5"
+HDF5_BUILD_DIR="$LIB_DIR/hdf5/build_hdf5"
+
+# Check if the 'hdf5' directory exists in the parent directory; if not, clone it
+if [ ! -d "$HDF5_SOURCE_DIR" ]; then
+  echo "Directory 'hdf5' does not exist in '$LIB_DIR', downloading 'hdf5'...."
+  git clone --depth 1 https://github.com/HDFGroup/hdf5.git "$HDF5_SOURCE_DIR"
+else
+  echo "Directory 'hdf5' exists in '$LIB_DIR', skipping 'hdf5' download"
+fi
+
+
+# Check to avoid reinstalling HDF5 which might take time
+if [ -d "$HDF5_INSTALL_DIR" ]; then
+    echo "HDF5 already installed, to reinstall HDF5 delete $HDF5_INSTALL_DIR and $HDF5_BUILD_DIR"
+    return 0
+fi
+
+# Configure hdf5 using CMake
+cmake_options=(
+    -D CMAKE_INSTALL_PREFIX="$HDF5_INSTALL_DIR"
+    -D CMAKE_BUILD_TYPE=Release
+    -D HDF5_BUILD_FORTRAN=ON
+    -D HDF5_ENABLE_PARALLEL=ON
+    -D BUILD_TESTING=OFF
+)
+
+# Print CMake options for reference
+echo "CMake Options: ${cmake_options[@]}"
+
+# Configure hdf5
+cmake "${cmake_options[@]}" -B "$HDF5_BUILD_DIR" -S "$HDF5_SOURCE_DIR"
+
+# Build hdf5
+echo "Building hdf5..."
+make -C "$HDF5_BUILD_DIR" -j"$num_jobs"
+
+# Install hdf5
+echo "Installing hdf5..."
+make -C "$HDF5_BUILD_DIR" install
+
+echo "hdf5 installation complete."
+
diff --git a/src/LS-EVPFFT/scripts/install-scripts/install_heffte.sh b/src/LS-EVPFFT/scripts/install-scripts/install_heffte.sh
new file mode 100644
index 000000000..a6cbe601a
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/install-scripts/install_heffte.sh
@@ -0,0 +1,147 @@
+#!/bin/bash -e
+
+show_help() {
+    echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
+    echo "Required arguments:"
+    echo "  --heffte_build_type=<fftw|cufft|rocfft>"
+    echo " "
+    echo "Optional arguments:"
+    echo "  --build_fftw: builds fftw from scratch"
+    echo "  --num_jobs=<number>: Number of jobs for 'make' (default: 1, on Mac use 1)"
+    echo "  --help: Display this help message"
+    return 1
+}
+
+# Initialize variables with default values
+heffte_build_type=""
+num_jobs=1
+build_fftw=0  # Default value for build_fftw
+
+# Define arrays of valid options
+valid_heffte_build_types=("fftw" "cufft" "rocfft")
+
+# Parse command line arguments
+for arg in "$@"; do
+    case "$arg" in
+        --heffte_build_type=*)
+            option="${arg#*=}"
+            if [[ " ${valid_heffte_build_types[*]} " == *" $option "* ]]; then
+                heffte_build_type="$option"
+            else
+                echo "Error: Invalid --heffte_build_type specified."
+                show_help
+                return 1
+            fi
+            ;;
+       --num_jobs=*)
+            num_jobs="${arg#*=}"
+            if ! [[ "$num_jobs" =~ ^[0-9]+$ ]]; then
+                echo "Error: Invalid --num_jobs value. Must be a positive integer."
+                show_help
+                return 1
+            fi
+            ;;
+        --build_fftw)  # New option without a value
+            build_fftw=1
+            ;;
+        --help)
+            show_help
+            return 1
+            ;;
+        *)
+            echo "Error: Invalid argument or value specified."
+            show_help
+            return 1
+            ;;
+    esac
+done
+
+# Check if required options are specified
+if [ -z "$heffte_build_type" ]; then
+    echo "Error: --heffte_build_type is a required option."
+    show_help
+    return 1
+fi
+
+# Now you can use $heffte_build_type in your code or build commands
+echo "Heffte build type will be: $heffte_build_type"
+
+# Determine the script's directory
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Script location: $SCRIPT_DIR"
+
+# Determine the parent directory of the script's directory
+PARENT_DIR=$(dirname $(dirname "${SCRIPT_DIR}"))
+
+# make lib directory to store all dependencies
+LIB_DIR="$PARENT_DIR/lib"
+mkdir -p "$LIB_DIR"
+
+# Define HeFFTe directories
+HEFFTE_SOURCE_DIR="$LIB_DIR/heffte"
+HEFFTE_INSTALL_DIR="$LIB_DIR/heffte/install_heffte_$heffte_build_type"
+HEFFTE_BUILD_DIR="$LIB_DIR/heffte/build_heffte_$heffte_build_type"
+
+# --------building fftw
+if [ "$heffte_build_type" = "fftw" ] && [ "$build_fftw" -eq 1 ]; then
+  FFTW_INSTALL_SCRIPT="$PARENT_DIR/scripts/install-scripts/install_fftw.sh"
+  source "$FFTW_INSTALL_SCRIPT" --num_jobs=$num_jobs
+  FFTW_INSTALL_DIR="$LIB_DIR/fftw-3.3.10/install_fftw"
+fi
+
+# Check if the 'heffte' directory exists in the parent directory; if not, clone it
+if [ ! -d "$HEFFTE_SOURCE_DIR" ]; then
+  echo "Directory 'heffte' does not exist in '$LIB_DIR', downloading 'heffte'...."
+  git clone --depth 1 https://github.com/icl-utk-edu/heffte.git "$HEFFTE_SOURCE_DIR"
+else
+  echo "Directory 'heffte' exists in '$LIB_DIR', skipping 'heffte' download"
+fi
+
+# Configure heffte using CMake
+cmake_options=(
+    -D CMAKE_BUILD_TYPE=Release
+    -D CMAKE_INSTALL_PREFIX="$HEFFTE_INSTALL_DIR"
+    -D BUILD_SHARED_LIBS=ON
+)
+
+if [ "$heffte_build_type" = "fftw" ]; then
+    cmake_options+=(
+        #-D Heffte_ENABLE_AVX=ON
+        #-D Heffte_ENABLE_AVX512=ON
+        -D Heffte_ENABLE_FFTW=ON
+        #-D FFTW_ROOT="$FFTW_DIR"
+    )
+    if [ "$build_fftw" -eq 1 ]; then
+      cmake_options+=(
+        -D FFTW_ROOT="$FFTW_INSTALL_DIR"
+      )
+    fi
+elif [ "$heffte_build_type" = "cufft" ]; then
+    cmake_options+=(
+        -D Heffte_ENABLE_CUDA=ON
+        -D Heffte_DISABLE_GPU_AWARE_MPI=ON
+    )
+elif [ "$heffte_build_type" = "rocfft" ]; then
+    cmake_options+=(
+        -D CMAKE_CXX_COMPILER=hipcc
+        -D Heffte_ENABLE_ROCM=ON
+        -D Heffte_DISABLE_GPU_AWARE_MPI=ON
+    )
+fi
+
+# Print CMake options for reference
+echo "CMake Options: ${cmake_options[@]}"
+
+# Configure HeFFTe
+cmake "${cmake_options[@]}" -B "$HEFFTE_BUILD_DIR" -S "$HEFFTE_SOURCE_DIR"
+
+# Build HeFFTe
+echo "Building HeFFTe..."
+make -C "$HEFFTE_BUILD_DIR" -j"$num_jobs"
+
+# Install HeFFTe
+echo "Installing HeFFTe..."
+make -C "$HEFFTE_BUILD_DIR" install
+
+echo "HeFFTe installation complete."
+
diff --git a/src/LS-EVPFFT/scripts/install-scripts/install_kokkos.sh b/src/LS-EVPFFT/scripts/install-scripts/install_kokkos.sh
new file mode 100644
index 000000000..6f6a21531
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/install-scripts/install_kokkos.sh
@@ -0,0 +1,135 @@
+#!/bin/bash -e
+
+show_help() {
+    echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
+    echo "Valid options:"
+    echo "  --kokkos_build_type=<serial|openmp|pthreads|cuda|hip>"
+    echo "  --num_jobs=<number>: Number of jobs for 'make' (default: 1, on Mac use 1)"
+    echo "  --help: Display this help message"
+    return 1
+}
+
+# Initialize variables with default values
+kokkos_build_type=""
+num_jobs=1
+
+# Define arrays of valid options
+valid_kokkos_build_types=("serial" "openmp" "pthreads" "cuda" "hip")
+
+# Parse command line arguments
+for arg in "$@"; do
+    case "$arg" in
+        --kokkos_build_type=*)
+            option="${arg#*=}"
+            if [[ " ${valid_kokkos_build_types[*]} " == *" $option "* ]]; then
+                kokkos_build_type="$option"
+            else
+                echo "Error: Invalid --kokkos_build_type specified."
+                show_help
+                return 1
+            fi
+            ;;
+        --num_jobs=*)
+            num_jobs="${arg#*=}"
+            if ! [[ "$num_jobs" =~ ^[0-9]+$ ]]; then
+                echo "Error: Invalid --num_jobs value. Must be a positive integer."
+                show_help
+                return 1
+            fi
+            ;;
+        --help)
+            show_help
+            return 1
+            ;;
+        *)
+            echo "Error: Invalid argument or value specified."
+            show_help
+            return 1
+            ;;
+    esac
+done
+
+# Check if required options are specified
+if [ -z "$kokkos_build_type" ]; then
+    echo "Error: --kokkos_build_type is a required options."
+    show_help
+    return 1
+fi
+
+# Now you can use $kokkos_build_type in your code or build commands
+echo "Kokkos build type will be: $kokkos_build_type"
+
+# Determine the script's directory
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Script location: $SCRIPT_DIR"
+
+# Determine the parent directory of the script's directory
+PARENT_DIR=$(dirname $(dirname "${SCRIPT_DIR}"))
+
+# make lib directory to store all dependencies
+LIB_DIR="$PARENT_DIR/lib"
+mkdir -p "$LIB_DIR"
+
+# Define kokkos directories
+KOKKOS_SOURCE_DIR="$LIB_DIR/kokkos"
+KOKKOS_INSTALL_DIR="$LIB_DIR/kokkos/install_kokkos_$kokkos_build_type"
+KOKKOS_BUILD_DIR="$LIB_DIR/kokkos/build_kokkos_$kokkos_build_type"
+
+# Check if the 'kokkos' directory exists in the parent directory; if not, clone it
+if [ ! -d "$KOKKOS_SOURCE_DIR" ]; then
+  echo "Directory 'kokkos' does not exist in '$LIB_DIR', downloading 'kokkos'...."
+  git clone --depth 1 https://github.com/kokkos/kokkos.git "$KOKKOS_SOURCE_DIR"
+else
+  echo "Directory 'kokkos' exists in '$LIB_DIR', skipping 'kokkos' download"
+fi
+
+# Configure kokkos using CMake
+cmake_options=(
+    -D CMAKE_BUILD_TYPE=Release
+    -D CMAKE_INSTALL_PREFIX="${KOKKOS_INSTALL_DIR}"
+    -D CMAKE_CXX_STANDARD=17
+    -D Kokkos_ENABLE_SERIAL=ON
+    -D Kokkos_ARCH_NATIVE=ON
+    -D Kokkos_ENABLE_TESTS=OFF
+    -D BUILD_TESTING=OFF
+)
+
+if [ "$kokkos_build_type" = "openmp" ]; then
+    cmake_options+=(
+        -D Kokkos_ENABLE_OPENMP=ON
+    )
+elif [ "$kokkos_build_type" = "pthreads" ]; then
+    cmake_options+=(
+        -D Kokkos_ENABLE_THREADS=ON
+    )
+elif [ "$kokkos_build_type" = "cuda" ]; then
+    cmake_options+=(
+        -D Kokkos_ENABLE_CUDA=ON
+        -D Kokkos_ENABLE_CUDA_CONSTEXPR=ON
+        -D Kokkos_ENABLE_CUDA_LAMBDA=ON
+        -D Kokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE=ON
+    )
+elif [ "$kokkos_build_type" = "hip" ]; then
+    cmake_options+=(
+        -D CMAKE_CXX_COMPILER=hipcc
+        -D Kokkos_ENABLE_HIP=ON
+        -D Kokkos_ENABLE_HIP_RELOCATABLE_DEVICE_CODE=ON
+    )
+fi
+
+# Print CMake options for reference
+echo "CMake Options: ${cmake_options[@]}"
+
+# Configure kokkos
+cmake "${cmake_options[@]}" -B "$KOKKOS_BUILD_DIR" -S "$KOKKOS_SOURCE_DIR"
+
+# Build kokkos
+echo "Building kokkos..."
+make -C "$KOKKOS_BUILD_DIR" -j"$num_jobs"
+
+# Install kokkos
+echo "Installing kokkos..."
+make -C "$KOKKOS_BUILD_DIR" install
+
+echo "kokkos installation complete."
+
diff --git a/src/LS-EVPFFT/scripts/install-scripts/install_matar.sh b/src/LS-EVPFFT/scripts/install-scripts/install_matar.sh
new file mode 100644
index 000000000..c1a38e749
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/install-scripts/install_matar.sh
@@ -0,0 +1,121 @@
+#!/bin/bash -e
+
+
+#!/bin/bash -e
+
+show_help() {
+    echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
+    echo "Valid options:"
+    echo "  --kokkos_build_type=<serial|openmp|pthreads|cuda|hip>"
+    echo "  --num_jobs=<number>: Number of jobs for 'make' (default: 1, on Mac use 1)"
+    echo "  --help: Display this help message"
+    return 1
+}
+
+# Initialize variables with default values
+kokkos_build_type=""
+num_jobs=1
+
+# Define arrays of valid options
+valid_kokkos_build_types=("serial" "openmp" "pthreads" "cuda" "hip")
+
+# Parse command line arguments
+for arg in "$@"; do
+    case "$arg" in
+        --kokkos_build_type=*)
+            option="${arg#*=}"
+            if [[ " ${valid_kokkos_build_types[*]} " == *" $option "* ]]; then
+                kokkos_build_type="$option"
+            else
+                echo "Error: Invalid --kokkos_build_type specified."
+                show_help
+                return 1
+            fi
+            ;;
+        --num_jobs=*)
+            num_jobs="${arg#*=}"
+            if ! [[ "$num_jobs" =~ ^[0-9]+$ ]]; then
+                echo "Error: Invalid --num_jobs value. Must be a positive integer."
+                show_help
+                return 1
+            fi
+            ;;
+        --help)
+            show_help
+            return 1
+            ;;
+        *)
+            echo "Error: Invalid argument or value specified."
+            show_help
+            return 1
+            ;;
+    esac
+done
+
+# Check if required options are specified
+if [ -z "$kokkos_build_type" ]; then
+    echo "Error: --kokkos_build_type is a required options."
+    show_help
+    return 1
+fi
+
+# Now you can use $kokkos_build_type in your code or build commands
+echo "Kokkos build type will be: $kokkos_build_type"
+
+# Determine the script's directory
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Script location: $SCRIPT_DIR"
+
+# Determine the parent directory of the script's directory
+PARENT_DIR=$(dirname $(dirname "${SCRIPT_DIR}"))
+
+# make lib directory to store all dependencies
+LIB_DIR="$PARENT_DIR/lib"
+mkdir -p "$LIB_DIR"
+
+# Define MATAR and kokkos directories
+MATAR_SOURCE_DIR="$LIB_DIR/MATAR"
+MATAR_INSTALL_DIR="$LIB_DIR/MATAR/install_MATAR_$kokkos_build_type"
+MATAR_BUILD_DIR="$LIB_DIR/MATAR/build_MATAR_$kokkos_build_type"
+KOKKOS_INSTALL_DIR="${LIB_DIR}/kokkos/install_kokkos_$kokkos_build_type"
+
+# Check if the 'MATAR' directory exists in the parent directory; if not, clone it
+if [ ! -d "$MATAR_SOURCE_DIR" ]; then
+  echo "Directory 'MATAR' does not exist in '$LIB_DIR', downloading 'MATAR'...."
+  git clone --depth 1 https://github.com/lanl/MATAR.git "$MATAR_SOURCE_DIR"
+else
+  echo "Directory 'MATAR' exists in '$LIB_DIR', skipping 'MATAR' download"
+fi
+
+cmake_options=(
+    -D CMAKE_BUILD_TYPE=Release
+    -D CMAKE_INSTALL_PREFIX="${MATAR_INSTALL_DIR}"
+    -D CMAKE_PREFIX_PATH="${KOKKOS_INSTALL_DIR}"
+)
+
+if [ "$kokkos_build_type" = "none" ]; then
+    cmake_options+=(
+        -D Matar_ENABLE_KOKKOS=OFF
+    )
+else
+    cmake_options+=(
+        -D Matar_ENABLE_KOKKOS=ON
+    )
+fi
+
+# Print CMake options for reference
+echo "CMake Options: ${cmake_options[@]}"
+
+# Configure Matar
+cmake "${cmake_options[@]}" -B "${MATAR_BUILD_DIR}" -S "${MATAR_SOURCE_DIR}"
+
+# Build Matar
+echo "Building Matar..."
+make -C ${MATAR_BUILD_DIR} -j"$num_jobs"
+
+# Install Matar
+echo "Installing Matar..."
+make -C ${MATAR_BUILD_DIR} install
+
+echo "Matar installation complete"
+
diff --git a/src/LS-EVPFFT/scripts/machines/darwin-env.sh b/src/LS-EVPFFT/scripts/machines/darwin-env.sh
new file mode 100644
index 000000000..6d3c9db44
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/machines/darwin-env.sh
@@ -0,0 +1,69 @@
+#!/bin/bash -e
+show_help() {
+    echo "Usage: source $(basename "$BASH_SOURCE") [OPTION]"
+    echo "Valid options:"
+    echo "  --env_type=<serial|openmp|pthreads|cuda|hip>"
+    echo "  --help : Display this help message"
+    return 1
+}
+
+# Initialize variables with default values
+env_type=""
+
+# Define arrays of valid options
+valid_env_types=("serial" "openmp" "pthreads" "cuda" "hip")
+
+# Parse command line arguments
+for arg in "$@"; do
+    case "$arg" in
+        --env_type=*)
+            option="${arg#*=}"
+            if [[ " ${valid_env_types[*]} " == *" $option "* ]]; then
+                env_type="$option"
+            else
+                echo "Error: Invalid --env_type specified."
+                show_help
+                return 1
+            fi
+            ;;
+        --help)
+            show_help
+            return 1
+            ;;
+        *)
+            echo "Error: Invalid argument or value specified."
+            show_help
+            return 1
+            ;;
+    esac
+done
+
+# Check if required options are specified
+if [ -z "$env_type" ]; then
+    echo "Error: --env_type are required options."
+    show_help
+    return 1
+fi
+### Load environment modules here
+### Assign names as relevant
+
+mygcc="gcc/9.4.0"
+#myclang="clang/13.0.0"
+mycuda="cuda/11.4.0"
+myrocm="rocm"
+#mympi="mpich/3.3.2-gcc_9.4.0"
+mympi="openmpi/3.1.6-gcc_9.4.0"
+
+module purge
+module load ${mympi}
+if [ "$env_type" = "cuda" ]; then
+    module load ${mygcc}
+    module load ${mycuda}
+elif [ "$env_type" = "hip" ]; then
+    module load ${mygcc}
+    module load ${myrocm}
+else
+    module load ${mygcc}
+fi
+module load cmake
+module -t list
diff --git a/src/LS-EVPFFT/scripts/machines/linux-env.sh b/src/LS-EVPFFT/scripts/machines/linux-env.sh
new file mode 100644
index 000000000..8171a54b8
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/machines/linux-env.sh
@@ -0,0 +1,13 @@
+# Assign path's to the current software you want to use.
+# These path's will need to be changed based on your computer
+# The default for homebrew should place packages as shown below
+
+#export CC=/opt/homebrew/opt/llvm/bin/clang
+#export CXX=/opt/homebrew/opt/llvm/bin/clang++
+#export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
+#export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
+#export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
+#export OMPI_CC=${CC}
+#export OMPI_CXX=${CXX}
+#export HDF5_ROOT=/path/to/where/HDF5/lib/and/include/folders/are/located
+#export FFTW_ROOT=/path/to/where/FFTW/lib/and/include/folders/are/located
diff --git a/src/LS-EVPFFT/scripts/machines/mac-env.sh b/src/LS-EVPFFT/scripts/machines/mac-env.sh
new file mode 100644
index 000000000..16f7e94a3
--- /dev/null
+++ b/src/LS-EVPFFT/scripts/machines/mac-env.sh
@@ -0,0 +1,13 @@
+# Assign path's to the current software you want to use.
+# These path's will need to be changed based on your computer
+# The default for homebrew should place packages as shown below
+
+export CC=/opt/homebrew/opt/llvm/bin/clang
+export CXX=/opt/homebrew/opt/llvm/bin/clang++
+export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
+export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
+export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
+export OMPI_CC=${CC}
+export OMPI_CXX=${CXX}
+#export HDF5_ROOT=/path/to/where/HDF5/lib/and/include/folders/are/located
+#export FFTW_ROOT=/path/to/where/FFTW/lib/and/include/folders/are/located
diff --git a/src/LS-EVPFFT/src/CMakeLists.txt b/src/LS-EVPFFT/src/CMakeLists.txt
new file mode 100644
index 000000000..7c4e3fc22
--- /dev/null
+++ b/src/LS-EVPFFT/src/CMakeLists.txt
@@ -0,0 +1,166 @@
+cmake_minimum_required(VERSION 3.17)
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
+
+if(NOT "${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")
+  add_definitions(-DBUILD_LS_EVPFFT_FIERRO=1)
+endif()
+
+if (BUILD_LS_EVPFFT_FIERRO)
+  set(This user_material_models)
+else()
+  set(This ls-evpfft)
+  project (${This})
+enable_language(C)
+
+# CMAKE_BUILD_TYPE:
+# 1. Release: `-O3 -DNDEBUG`
+# 2. Debug: `-O0 -g`
+# 3. RelWithDebInfo: `-O2 -g -DNDEBUG`
+# 4. MinSizeRel: `-Os -DNDEBUG`
+if (NOT CMAKE_BUILD_TYPE)
+  set(CMAKE_BUILD_TYPE RelWithDebInfo)
+endif(NOT CMAKE_BUILD_TYPE)
+endif()
+
+# Set default values for EVPFFT options
+set(DEFAULT_TWO_SIGN_SLIP_SYSTEMS OFF)
+set(DEFAULT_NON_SCHMID_EFFECTS OFF)
+set(DEFAULT_ABSOLUTE_NO_OUTPUT OFF)
+set(DEFAULT_ENABLE_PROFILING OFF)
+
+# User-configurable options
+set(TWO_SIGN_SLIP_SYSTEMS ${DEFAULT_TWO_SIGN_SLIP_SYSTEMS} CACHE BOOL "Enable two sign slip systems")
+set(NON_SCHMID_EFFECTS ${DEFAULT_NON_SCHMID_EFFECTS} CACHE BOOL "Enable non-Schmid effects")
+set(ABSOLUTE_NO_OUTPUT ${DEFAULT_ABSOLUTE_NO_OUTPUT} CACHE BOOL "Enable ABSOLUTE_NO_OUTPUT")
+set(ENABLE_PROFILING ${DEFAULT_ENABLE_PROFILING} CACHE BOOL "Enable profiling each function in EVPFFT")
+# ...
+
+# Print out the CMake options
+message("EVPFFT will be built With those CMAKE options:")
+message("  TWO_SIGN_SLIP_SYSTEMS: ${TWO_SIGN_SLIP_SYSTEMS}")
+message("  NON_SCHMID_EFFECTS: ${NON_SCHMID_EFFECTS}")
+message("  ABSOLUTE_NO_OUTPUT: ${ABSOLUTE_NO_OUTPUT}")
+message("  ENABLE_PROFILING: ${ENABLE_PROFILING}")
+
+if(TWO_SIGN_SLIP_SYSTEMS)
+  add_definitions(-DTWO_SIGN_SLIP_SYSTEMS=1)
+endif()
+
+if(NON_SCHMID_EFFECTS)
+  add_definitions(-DNON_SCHMID_EFFECTS=1)
+endif()
+
+if(ABSOLUTE_NO_OUTPUT)
+  add_definitions(-DABSOLUTE_NO_OUTPUT=1)
+endif()
+
+if (ENABLE_PROFILING)
+  add_definitions(-DENABLE_PROFILING=1)
+endif()
+
+# HAVE_KOKKOS must be defined for MATAR to build Kokkos types
+add_definitions(-DHAVE_KOKKOS=1)
+
+if (NOT BUILD_LS_EVPFFT_FIERRO)
+  if (USE_CUFFT)
+    find_package(CUDAToolkit REQUIRED)
+  endif()
+  find_package(Kokkos REQUIRED)
+endif()
+find_package(MPI REQUIRED)
+find_package(Heffte REQUIRED)
+# HDF5
+find_package(HDF5 REQUIRED)
+if (NOT HDF5_IS_PARALLEL)
+    message(FATAL_ERROR " -- HDF5 version is not parallel.")
+endif (NOT HDF5_IS_PARALLEL)
+
+set( Sources
+     main.cpp
+     command_line_args.cpp
+     evpfft.cpp
+     allocate_memory.cpp
+     utilities.cpp
+     math_functions.cpp
+     chg_basis.cpp
+     voigt.cpp
+     matrix_exp.cpp
+     determinant33.cpp
+     euler.cpp
+     defgrad_dcmp.cpp
+     vpsc_input.cpp
+     data_crystal.cpp
+     data_crystal_elast.cpp
+     data_grain.cpp
+     step.cpp
+     vm.cpp
+     update_orient.cpp
+     orient.cpp
+     update_schmid.cpp
+     update_defgradp.cpp
+     update_defgrad.cpp
+     update_defgrade.cpp
+     calc_c0.cpp
+     update_el_stiff.cpp
+     Cauchy_to_PK1.cpp
+     calc_c066mod.cpp
+     update_grid_velgrad.cpp
+     forward_backward_fft.cpp
+     inverse_the_greens.cpp
+     initialize_velgrad.cpp
+     evpal.cpp
+     get_smacro.cpp
+     kinhard_param.cpp
+     harden.cpp
+     output_file_manager.cpp
+     write_macro_state.cpp
+     write_micro_state.cpp
+     write_texture.cpp
+     hdf5_io_functions.cpp
+     XdmfUniformGridWriter.cpp
+     Profiler.cpp)
+
+if (BUILD_LS_EVPFFT_FIERRO)
+  list(APPEND Sources
+       Fierro-LS-EVPFFT-Link/UserStrengthModel.cpp 
+       Fierro-LS-EVPFFT-Link/UserEOSModel.cpp
+       Fierro-LS-EVPFFT-Link/init_state_vars.cpp)
+
+  add_library(${This} STATIC ${Sources})
+  target_link_libraries(${This} PRIVATE ${Trilinos_LIBRARIES} ${Trilinos_TPL_LIBRARIES} Elements)
+  target_include_directories(${This} PUBLIC Fierro-LS-EVPFFT-Link)
+  target_include_directories(${This} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
+else ()
+  find_package(Matar REQUIRED)
+  add_executable(${This} ${Sources})
+  target_link_libraries(${This} PRIVATE matar Kokkos::kokkos)
+endif ()
+target_link_libraries(${This} PRIVATE Heffte::Heffte MPI::MPI_CXX ${HDF5_LIBRARIES})
+
+install(
+  TARGETS ${This}
+)
+
+if (USE_FFTW)
+  add_definitions(-DUSE_FFTW=1)
+elseif (USE_MKL)
+  add_definitions(-DUSE_MKL=1)
+elseif (USE_CUFFT)
+  add_definitions(-DUSE_CUFFT=1)
+elseif (USE_ROCFFT)
+  add_definitions(-DUSE_ROCFFT=1)
+else ()
+  message(FATAL_ERROR "\nPlease specify what FFT backend to use: -DUSE_FFTW=1, -DUSE_MKL=1, -DUSE_CUFFT=1, or -DUSE_ROCFFT=1\n") 
+endif ()
+
+if (CUDA)
+  add_definitions(-DHAVE_CUDA=1)
+elseif (HIP)
+  add_definitions(-DHAVE_HIP=1)
+elseif (OPENMP)
+  add_definitions(-DHAVE_OPENMP=1)
+elseif (THREADS)
+  add_definitions(-DHAVE_THREADS=1)
+endif()
+
diff --git a/src/LS-EVPFFT/src/Cauchy_to_PK1.cpp b/src/LS-EVPFFT/src/Cauchy_to_PK1.cpp
new file mode 100644
index 000000000..9a4733093
--- /dev/null
+++ b/src/LS-EVPFFT/src/Cauchy_to_PK1.cpp
@@ -0,0 +1,66 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "math_functions.h"
+#include "reduction_data_structures.h"
+
+void EVPFFT::Cauchy_to_PK1()
+{
+  Profiler profiler(__FUNCTION__);
+
+  const size_t n = 9; // for sgavg (3,3)
+  ArrayType <real_t, n> all_reduce;
+
+  MatrixTypeRealDual sgavg(3,3);
+ 
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t,n> & loc_reduce) {
+
+    // averages packed in array
+    int ic;
+    ic = -1;
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        ic = ic + 1;
+        loc_reduce.array[ic] += sg(ii,jj,i,j,k) * wgtc(i,j,k);
+      }
+    }
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  // averages unpacked from array
+  int ic;
+  ic = -1;
+  
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      ic = ic + 1;
+      sgavg.host(ii,jj) = all_reduce.array[ic];
+    }
+  }
+  sgavg.update_device();
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        sgPK1(ii,jj,i,j,k) = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          sgPK1(ii,jj,i,j,k) += (sg(ii,kk,i,j,k) - sgavg(ii,kk))*defgradinv(jj,kk,i,j,k)*detF(i,j,k);
+        }
+      }
+    }
+  
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/FFT3D_R2C.h b/src/LS-EVPFFT/src/FFT3D_R2C.h
new file mode 100644
index 000000000..ef7f0480b
--- /dev/null
+++ b/src/LS-EVPFFT/src/FFT3D_R2C.h
@@ -0,0 +1,136 @@
+#pragma once
+
+#include <assert.h>
+
+#ifdef USE_FFTW
+#include "fftw3.h"
+#elif USE_CUFFT
+#include <cufft.h>
+#endif
+
+template <typename T>
+class FFT3D_R2C
+{    
+public:
+    const size_t N1;
+    const size_t N2;
+    const size_t N3;
+    int result;
+#ifdef USE_FFTW
+    fftwf_plan planf_forward;
+    fftwf_plan planf_backward;
+    fftw_plan plan_forward;
+    fftw_plan plan_backward;
+#elif USE_CUFFT
+    cufftHandle plan_forward;
+    cufftHandle plan_backward;
+#endif
+    FFT3D_R2C(size_t N1_, size_t N2_, size_t N3_);
+    void make_plan(T* input, T*output);
+    void forward(T* input, T* output);
+    void backward(T* input, T* output);
+
+private:
+    void make_plan_(float* input, float* output);
+    void make_plan_(double* input, double* output);
+    void forward_(float* input, float* output);
+    void forward_(double* input, double* output);
+    void backward_(float* input, float* output);
+    void backward_(double* input, double* output);
+};
+
+template <typename T>
+FFT3D_R2C<T>::FFT3D_R2C(size_t N1_, size_t N2_, size_t N3_)
+    : N1(N1_)
+    , N2(N2_)
+    , N3(N3_)
+{
+}
+
+template <typename T>
+void FFT3D_R2C<T>::make_plan(T* input, T* output)
+{
+    make_plan_(input, output);
+}
+
+template <typename T>
+void FFT3D_R2C<T>::forward(T* input, T* output)
+{
+    forward_(input, output);
+}
+
+template <typename T>
+void FFT3D_R2C<T>::backward(T* input, T* output)
+{
+    backward_(input, output);
+}
+
+template <typename T>
+void FFT3D_R2C<T>::make_plan_(float* input, float* output)
+{
+#ifdef USE_FFTW
+    planf_forward = fftwf_plan_dft_r2c_3d(N1, N2, N3, (float *) input, (fftwf_complex *) output, FFTW_ESTIMATE);
+    planf_backward = fftwf_plan_dft_c2r_3d(N1, N2, N3, (fftwf_complex *) output, (float *) input, FFTW_ESTIMATE);
+#elif USE_CUFFT
+    cufftPlan3d(&plan_forward, N1, N2, N3, CUFFT_R2C);
+    cufftPlan3d(&plan_backward, N1, N2, N3, CUFFT_C2R);
+#endif
+}
+
+template <typename T>
+void FFT3D_R2C<T>::make_plan_(double* input, double* output)
+{
+#ifdef USE_FFTW
+    plan_forward = fftw_plan_dft_r2c_3d(N1, N2, N3, (double *) input, (fftw_complex *) output, FFTW_ESTIMATE);
+    plan_backward = fftw_plan_dft_c2r_3d(N1, N2, N3, (fftw_complex *) output, (double *) input, FFTW_ESTIMATE);
+#elif USE_CUFFT
+    cufftPlan3d(&plan_forward, N1, N2, N3, CUFFT_D2Z);
+    cufftPlan3d(&plan_backward, N1, N2, N3, CUFFT_Z2D);
+#endif
+}
+
+template <typename T>
+void FFT3D_R2C<T>::forward_(float* input, float* output)
+{
+#ifdef USE_FFTW
+    fftwf_execute_dft_r2c(planf_forward, (float *) input, (fftwf_complex *) output);
+#elif USE_CUFFT
+    result = cufftExecR2C(plan_forward, (cufftReal *) input, (cufftComplex *) output);
+    assert(result == CUFFT_SUCCESS);
+#endif
+}
+
+template <typename T>
+void FFT3D_R2C<T>::forward_(double* input, double* output)
+{
+#ifdef USE_FFTW
+    fftw_execute_dft_r2c(plan_forward, (double *) input, (fftw_complex *) output);
+#elif USE_CUFFT
+    result = cufftExecD2Z(plan_forward, (cufftDoubleReal *) input, (cufftDoubleComplex *) output);
+    assert(result == CUFFT_SUCCESS);
+#endif
+}
+
+template <typename T>
+void FFT3D_R2C<T>::backward_(float* input, float* output)
+{
+#ifdef USE_FFTW
+    fftwf_execute_dft_c2r(planf_backward, (fftwf_complex *) input, (float *) output);
+#elif USE_CUFFT
+    result = cufftExecC2R(plan_backward, (cufftComplex *) input, (cufftReal *) output);
+    assert(result == CUFFT_SUCCESS);
+#endif
+}
+
+template <typename T>
+void FFT3D_R2C<T>::backward_(double* input, double* output)
+{
+if (!plan_backward) printf("Not Good!\n");
+#ifdef USE_FFTW
+    fftw_execute_dft_c2r(plan_backward, (fftw_complex *) input, (double *) output);
+#elif USE_CUFFT
+    result = cufftExecZ2D(plan_backward, (cufftDoubleComplex *) input, (cufftDoubleReal *) output);
+    assert(result == CUFFT_SUCCESS);
+#endif
+}
+
diff --git a/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserEOSModel.cpp b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserEOSModel.cpp
new file mode 100644
index 000000000..2d1aaa82f
--- /dev/null
+++ b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserEOSModel.cpp
@@ -0,0 +1,52 @@
+#include "UserEOSModel.h"
+
+KOKKOS_FUNCTION
+UserEOSModel::UserEOSModel(
+  const DCArrayKokkos <material_t> &material,
+  const DCArrayKokkos <double> &state_vars,
+  const DCArrayKokkos <double> &global_vars,
+  const DCArrayKokkos <double> &elem_user_output_vars,
+  const size_t mat_id,
+  const size_t elem_gid)
+{
+ sound_speed = global_vars(mat_id,1); 
+}
+
+KOKKOS_FUNCTION
+UserEOSModel::~UserEOSModel()
+{
+}
+
+KOKKOS_FUNCTION
+int UserEOSModel::calc_sound_speed(
+  const DViewCArrayKokkos <double> &elem_pres,
+  const DViewCArrayKokkos <double> &elem_stress,
+  const size_t elem_gid,
+  const size_t mat_id,
+  const DCArrayKokkos <double> &state_vars,
+  const DCArrayKokkos <double> &global_vars,
+  const DCArrayKokkos <double> &elem_user_output_vars,
+  const DViewCArrayKokkos <double> &elem_sspd,
+  const double den,
+  const double sie)
+{
+  elem_sspd(elem_gid) = sound_speed; // sound speed
+  return 0;
+}
+
+KOKKOS_FUNCTION
+int UserEOSModel::calc_pressure(
+  const DViewCArrayKokkos <double> &elem_pres,
+  const DViewCArrayKokkos <double> &elem_stress,
+  const size_t elem_gid,
+  const size_t mat_id,
+  const DCArrayKokkos <double> &state_vars,
+  const DCArrayKokkos <double> &global_vars,
+  const DCArrayKokkos <double> &elem_user_output_vars,
+  const DViewCArrayKokkos <double> &elem_sspd,
+  const double den,
+  const double sie)
+{
+   elem_pres(elem_gid) = 0.0;  // pressure
+  return 0;
+}
diff --git a/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserEOSModel.h b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserEOSModel.h
new file mode 100644
index 000000000..301b790b2
--- /dev/null
+++ b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserEOSModel.h
@@ -0,0 +1,50 @@
+#pragma once
+
+#include "material_models.h"
+
+
+/* UserEOSModel */
+class UserEOSModel : public EOSParent {
+public:
+
+  KOKKOS_FUNCTION
+  UserEOSModel(
+    const DCArrayKokkos <material_t> &material,
+    const DCArrayKokkos <double> &state_vars,
+    const DCArrayKokkos <double> &global_vars,
+    const DCArrayKokkos <double> &elem_user_output_vars,
+    const size_t mat_id,
+    const size_t elem_gid);
+
+  KOKKOS_FUNCTION
+  ~UserEOSModel();
+
+  KOKKOS_FUNCTION
+  int calc_sound_speed(
+    const DViewCArrayKokkos <double> &elem_pres,
+    const DViewCArrayKokkos <double> &elem_stress,
+    const size_t elem_gid,
+    const size_t mat_id,
+    const DCArrayKokkos <double> &state_vars,
+    const DCArrayKokkos <double> &global_vars,
+    const DCArrayKokkos <double> &elem_user_output_vars,
+    const DViewCArrayKokkos <double> &elem_sspd,
+    const double den,
+    const double sie) override;
+
+  KOKKOS_FUNCTION
+  int calc_pressure(
+    const DViewCArrayKokkos <double> &elem_pres,
+    const DViewCArrayKokkos <double> &elem_stress,
+    const size_t elem_gid,
+    const size_t mat_id,
+    const DCArrayKokkos <double> &state_vars,
+    const DCArrayKokkos <double> &global_vars,
+    const DCArrayKokkos <double> &elem_user_output_vars,
+    const DViewCArrayKokkos <double> &elem_sspd,
+    const double den,
+    const double sie) override;
+
+private:
+  double sound_speed;
+};
diff --git a/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserStrengthModel.cpp b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserStrengthModel.cpp
new file mode 100644
index 000000000..a78594e00
--- /dev/null
+++ b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserStrengthModel.cpp
@@ -0,0 +1,99 @@
+#include "UserStrengthModel.h"
+#include <string>
+#include "evpfft.h"
+
+MPI_Comm evpfft_mpi_comm = MPI_COMM_NULL;
+
+KOKKOS_FUNCTION
+UserStrengthModel::UserStrengthModel(
+  const DCArrayKokkos <material_t> &material,
+  const DCArrayKokkos <double> &state_vars,
+  const DCArrayKokkos <double> &global_vars,
+  const DCArrayKokkos <double> &elem_user_output_vars,
+  const size_t mat_id,
+  const size_t elem_gid)
+{
+    // First, lets create a new communicator with each rank having its own communicator containing only itself.
+    if (evpfft_mpi_comm == MPI_COMM_NULL) {
+      int global_rank;
+      MPI_Comm_rank(MPI_COMM_WORLD, &global_rank);
+      MPI_Comm_split(MPI_COMM_WORLD, global_rank, global_rank, &evpfft_mpi_comm);
+    }
+    //...
+
+    // Input files for for multiple materials should be names as evpfft1.in, evpfft2.in, etc.
+    std::string filename = "evpfft" + std::to_string(mat_id+1) + ".in";
+
+    real_t stress_scale = 1.0; // 1.0e-5; // used to convert MPa to MegaBar
+    real_t time_scale = 1.0; // 1.0e+6; // used to convert second to microsecond
+
+    CommandLineArgs cmd;
+    cmd.input_filename = filename; //"evpfft.in";
+    cmd.micro_filetype = 0;
+    cmd.check_cmd_args();
+
+    evpfft_ptr = new EVPFFT(evpfft_mpi_comm, cmd,stress_scale,time_scale);
+}
+
+KOKKOS_FUNCTION
+UserStrengthModel::~UserStrengthModel()
+{
+    delete evpfft_ptr;
+}
+
+KOKKOS_FUNCTION
+int UserStrengthModel::calc_stress(
+  const DViewCArrayKokkos <double> &elem_pres,
+  const DViewCArrayKokkos <double> &elem_stress,
+  const size_t elem_gid,
+  const size_t mat_id,
+  const DCArrayKokkos <double> &state_vars,
+  const DCArrayKokkos <double> &global_vars,
+  const DCArrayKokkos <double> &elem_user_output_vars,
+  const DViewCArrayKokkos <double> &elem_sspd,
+  const double den,
+  const double sie,
+  const ViewCArrayKokkos <double> &vel_grad,
+  const ViewCArrayKokkos <size_t> &elem_node_gids,
+  const DViewCArrayKokkos <double> &node_coords,
+  const DViewCArrayKokkos <double> &node_vel,
+  const double vol,
+  const double dt, 
+  const double rk_alpha,
+  const size_t cycle,
+  const size_t rk_level)
+{
+    real_t dt_rk = dt; // since using rk_num_stages = 1
+
+    // Note EVPFFT uses F-layout while Fierro uses C-layout
+    FArray <double> Fvel_grad(3,3);
+    FArray <double> Fstress(3,3);
+    // Transpose vel_grad
+    for (int i = 0; i < 3; i++) {
+      for (int j = 0; j < 3; j++) {
+        Fvel_grad(i,j) = vel_grad(i,j);
+        Fstress(i,j) = elem_stress.host(rk_level,elem_gid,i,j);
+      }
+    }
+
+    double udotAccTh = global_vars.host(mat_id,0); // Linear Aprox. Threshold
+    evpfft_ptr->solve(Fvel_grad.pointer(), Fstress.pointer(), dt_rk, cycle, elem_gid, udotAccTh);
+
+    // Transpose stress. Not needed, stress is symmetric. But why not.
+    for (int i = 0; i < 3; i++) {
+      for (int j = 0; j < 3; j++) {
+        elem_stress.host(rk_level,elem_gid,i,j) = Fstress(i,j);
+      }
+    }
+
+    // write into elem_state_vars for output
+    // evm,evmp,dvm,dvmp,svm
+    elem_user_output_vars.host(elem_gid,0) = evpfft_ptr->evm;
+    elem_user_output_vars.host(elem_gid,1) = evpfft_ptr->evmp;
+    elem_user_output_vars.host(elem_gid,2) = evpfft_ptr->dvm;
+    elem_user_output_vars.host(elem_gid,3) = evpfft_ptr->dvmp;
+    elem_user_output_vars.host(elem_gid,4) = evpfft_ptr->svm;
+
+    return 0;
+}
+
diff --git a/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserStrengthModel.h b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserStrengthModel.h
new file mode 100644
index 000000000..72b765ae1
--- /dev/null
+++ b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/UserStrengthModel.h
@@ -0,0 +1,46 @@
+#pragma once
+
+#include "material_models.h"
+
+class EVPFFT;
+
+/* UserStrengthModel */
+class UserStrengthModel : public StrengthParent {
+public:
+  KOKKOS_FUNCTION
+  UserStrengthModel(
+    const DCArrayKokkos <material_t> &material,
+    const DCArrayKokkos <double> &state_vars,
+    const DCArrayKokkos <double> &global_vars,
+    const DCArrayKokkos <double> &elem_user_output_vars,
+    const size_t mat_id,
+    const size_t elem_gid);
+
+  KOKKOS_FUNCTION
+  ~UserStrengthModel();
+
+  KOKKOS_FUNCTION
+  int calc_stress(
+    const DViewCArrayKokkos <double> &elem_pres,
+    const DViewCArrayKokkos <double> &elem_stress,
+    const size_t elem_gid,
+    const size_t mat_id,
+    const DCArrayKokkos <double> &state_vars,
+    const DCArrayKokkos <double> &global_vars,
+    const DCArrayKokkos <double> &elem_user_output_vars,
+    const DViewCArrayKokkos <double> &elem_sspd,
+    const double den,
+    const double sie,
+    const ViewCArrayKokkos <double> &vel_grad,
+    const ViewCArrayKokkos <size_t> &elem_node_gids,
+    const DViewCArrayKokkos <double> &node_coords,
+    const DViewCArrayKokkos <double> &node_vel,
+    const double vol,
+    const double dt, 
+    const double rk_alpha,
+    const size_t cycle,
+    const size_t rk_level) override;
+
+public:
+  EVPFFT *evpfft_ptr; 
+};
diff --git a/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/init_state_vars.cpp b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/init_state_vars.cpp
new file mode 100644
index 000000000..f7cde4d41
--- /dev/null
+++ b/src/LS-EVPFFT/src/Fierro-LS-EVPFFT-Link/init_state_vars.cpp
@@ -0,0 +1,26 @@
+#include "Simulation_Parameters/Material.h"
+#include "matar.h"
+using namespace mtr;
+
+
+void init_state_vars(
+  const DCArrayKokkos <material_t> &material,
+  const DViewCArrayKokkos <size_t> &elem_mat_id,
+  DCArrayKokkos <double> &state_vars,
+  const DCArrayKokkos <double> &global_vars,
+  const DCArrayKokkos <double> &elem_user_output_vars,
+  const size_t num_elems)
+{
+
+    // initialize to zero
+    for (size_t elem_gid = 0; elem_gid<num_elems; elem_gid++) {
+        size_t mat_id = elem_mat_id.host(elem_gid);
+        size_t num_state_vars = material.host(mat_id).num_state_vars;
+        for(size_t var=0; var<num_state_vars; var++){
+            state_vars.host(elem_gid,var) = 0.0;
+        }
+    }
+    state_vars.update_device();
+    
+	  return;
+}
diff --git a/src/LS-EVPFFT/src/Profiler.cpp b/src/LS-EVPFFT/src/Profiler.cpp
new file mode 100644
index 000000000..419a90031
--- /dev/null
+++ b/src/LS-EVPFFT/src/Profiler.cpp
@@ -0,0 +1,47 @@
+#include "Profiler.h"
+#include "Kokkos_Core.hpp"
+
+// Define the static member outside the class
+std::unordered_map<std::string, double> Profiler::profileMap;
+
+Profiler::Profiler(const std::string& functionName)
+  : functionName(functionName) {
+#if ENABLE_PROFILING
+  Kokkos::fence();
+  startTime = std::chrono::high_resolution_clock::now();
+#endif
+}
+
+Profiler::~Profiler() {
+#if ENABLE_PROFILING
+  Kokkos::fence();
+  auto endTime = std::chrono::high_resolution_clock::now();
+  double durationSeconds = std::chrono::duration_cast<std::chrono::microseconds>(endTime - startTime).count() / 1e6; // Convert to seconds
+
+  profileMap[functionName] += durationSeconds;
+#endif
+}
+
+void Profiler::print(const MPI_Comm mpiComm) {
+#if ENABLE_PROFILING
+  int myRank, numRanks;
+  MPI_Comm_rank(mpiComm, &myRank);
+  MPI_Comm_size(mpiComm, &numRanks);
+
+  std::map<std::string, double> sortedMap(profileMap.begin(), profileMap.end());
+  std::vector<double> timeValue;
+  for (const auto& pair : sortedMap) {
+    timeValue.push_back(pair.second/numRanks);
+  }
+  MPI_Allreduce(MPI_IN_PLACE, timeValue.data(), timeValue.size(), MPI_DOUBLE, MPI_SUM, mpiComm);
+
+  if (0 == myRank) {
+    int i = 0;
+    for (const auto& entry : sortedMap) {
+      std::cout << entry.first << " - Time (seconds): " << timeValue[i] << std::endl;
+      i++;
+    }
+  }
+#endif
+}
+
diff --git a/src/LS-EVPFFT/src/Profiler.h b/src/LS-EVPFFT/src/Profiler.h
new file mode 100644
index 000000000..a70994dea
--- /dev/null
+++ b/src/LS-EVPFFT/src/Profiler.h
@@ -0,0 +1,22 @@
+#pragma once // Include guard to prevent multiple inclusion
+
+#include <iostream>
+#include <chrono>
+#include <string>
+#include <unordered_map>
+#include "mpi.h"
+
+class Profiler {
+public:
+  Profiler(const std::string& functionName);
+
+  ~Profiler();
+
+  static void print(const MPI_Comm mpiComm);
+
+private:
+  std::string functionName;
+  std::chrono::time_point<std::chrono::high_resolution_clock> startTime;
+  static std::unordered_map<std::string, double> profileMap;
+};
+
diff --git a/src/LS-EVPFFT/src/VTKUniformGridWriter.h b/src/LS-EVPFFT/src/VTKUniformGridWriter.h
new file mode 100644
index 000000000..706955af4
--- /dev/null
+++ b/src/LS-EVPFFT/src/VTKUniformGridWriter.h
@@ -0,0 +1,69 @@
+#pragma once
+
+#include <cstdio>
+#include <array>
+#include <string>
+#include <stdexcept>
+
+
+class VTKUniformGridWriter
+{
+public:
+  size_t N1;
+  size_t N2;
+  size_t N3;
+  FILE *vtk_file;
+
+  VTKUniformGridWriter(size_t N1_, size_t N2_, size_t N3_);
+  void open_file(const char* vtk_filename);
+  void write_header();
+  template <typename T>
+  void write_scalar_dataset(T* data_ptr, const char* dataName, const char* dataType);
+  void close_file();
+};
+
+
+VTKUniformGridWriter::VTKUniformGridWriter(size_t N1_, size_t N2_, size_t N3_)
+  : N1 (N1_)
+  , N2 (N2_)
+  , N3 (N3_)
+  , vtk_file (nullptr)
+{
+}
+
+void VTKUniformGridWriter::open_file(const char* vtk_filename)
+{
+  vtk_file = fopen(vtk_filename, "w");
+}
+
+void VTKUniformGridWriter::write_header()
+{
+  // write vtk file heading
+  fprintf(vtk_file, "%s\n", "# vtk DataFile Version 3.0");
+  fprintf(vtk_file, "%s\n", "Simulation outputs");
+  fprintf(vtk_file, "%s\n", "ASCII");
+  fprintf(vtk_file, "%s\n", "DATASET STRUCTURED_POINTS");
+
+  fprintf(vtk_file, "%s %d  %d  %d\n", "DIMENSIONS", N1, N2, N3);
+  fprintf(vtk_file, "%s %d %d %d\n", "ORIGIN", 0, 0, 0);
+  fprintf(vtk_file, "%s %d %d %d\n", "SPACING", 1, 1, 1);
+  fprintf(vtk_file, "%s %d\n", "POINT_DATA", N1*N2*N3);
+}
+
+template <typename T>
+void VTKUniformGridWriter::write_scalar_dataset(T* data_ptr, const char* dataName, const char* dataType)
+{
+  fprintf(vtk_file, "%s %s %s\n", "SCALARS", dataName, dataType);
+  fprintf(vtk_file, "%s\n", "LOOKUP_TABLE default");
+
+  for (int i = 0; i < N1*N2*N3; i++) {
+    fprintf(vtk_file, " %12.6E\n", data_ptr[i]);
+  }
+}
+
+void VTKUniformGridWriter::close_file()
+{
+  if (vtk_file != nullptr) {
+    fclose(vtk_file);
+  }
+}
diff --git a/src/LS-EVPFFT/src/XdmfUniformGridWriter.cpp b/src/LS-EVPFFT/src/XdmfUniformGridWriter.cpp
new file mode 100644
index 000000000..65d242a12
--- /dev/null
+++ b/src/LS-EVPFFT/src/XdmfUniformGridWriter.cpp
@@ -0,0 +1,59 @@
+#include "XdmfUniformGridWriter.h"
+
+XdmfUniformGridWriter::XdmfUniformGridWriter(int N1_, int N2_, int N3_)
+  : N1 (N1_)
+  , N2 (N2_)
+  , N3 (N3_)
+  , xdmf (nullptr)
+{
+}
+
+void XdmfUniformGridWriter::open_file(const char* xdmf_filename)
+{
+ xdmf = fopen(xdmf_filename, "w");
+}
+
+void XdmfUniformGridWriter::write_header(const char* gridName)
+{
+    fprintf(xdmf, "<?xml version=\"1.0\"?>\n");
+    fprintf(xdmf, "<!DOCTYPE Xdmf SYSTEM \"Xdmf.dtd\"[]>\n");
+    fprintf(xdmf, "<Xdmf xmlns:xi=\"http://www.w3.org/2003/XInclude\" Version=\"2.2\">\n");
+
+    fprintf(xdmf, " <Domain>\n");
+    fprintf(xdmf, "   <Grid Name=\"%s\" GridType=\"Uniform\">\n", gridName);
+    fprintf(xdmf, "     <Topology TopologyType=\"3DCoRectMesh\" Dimensions=\"%d %d %d \"></Topology>\n", N1+1, N2+1, N3+1);
+    fprintf(xdmf, "       <Geometry Type=\"ORIGIN_DXDYDZ\">\n");
+    fprintf(xdmf, "         <!-- Origin  Z, Y, X -->\n"); // comment
+    fprintf(xdmf, "         <DataItem Format=\"XML\" Dimensions=\"3\">%d %d %d</DataItem>\n", 0, 0, 0);
+    fprintf(xdmf, "         <!-- DxDyDz (Spacing/Resolution) Z, Y, X -->\n"); // comment
+    fprintf(xdmf, "         <DataItem Format=\"XML\" Dimensions=\"3\">%d %d %d</DataItem>\n", 1, 1, 1);
+    fprintf(xdmf, "       </Geometry>\n");
+}
+
+void XdmfUniformGridWriter::write_attribute(const char* aName, const char* aType,
+        const char* aNumberType, const char* aPrecision, const char* aLocation)
+{
+    fprintf(xdmf, "       <Attribute Name=\"%s\" AttributeType=\"%s\" Center=\"Cell\">\n", aName, aType);
+    fprintf(xdmf, "         <DataItem Format=\"HDF\" Dimensions=\"%d %d %d\" NumberType=\"%s\" Precision=\"%s\" >\n", N1, N2, N3, aNumberType, aPrecision);
+    fprintf(xdmf, "           %s\n", aLocation);
+    fprintf(xdmf, "         </DataItem>\n");
+    fprintf(xdmf, "       </Attribute>\n");
+}
+
+void XdmfUniformGridWriter::write_footer()
+{
+    fprintf(xdmf, "   </Grid>\n");
+    fprintf(xdmf, " </Domain>\n");
+    fprintf(xdmf, "</Xdmf>\n");
+}
+
+void XdmfUniformGridWriter::close_file()
+{
+  if (xdmf != nullptr) {
+    fclose(xdmf);
+  } 
+}
+
+XdmfUniformGridWriter::~XdmfUniformGridWriter()
+{
+}
diff --git a/src/LS-EVPFFT/src/XdmfUniformGridWriter.h b/src/LS-EVPFFT/src/XdmfUniformGridWriter.h
new file mode 100644
index 000000000..713249bf3
--- /dev/null
+++ b/src/LS-EVPFFT/src/XdmfUniformGridWriter.h
@@ -0,0 +1,46 @@
+#pragma once
+
+#include <cstdio>
+#include <array>
+#include <string>
+#include <stdexcept>
+
+struct XdmfUniformGridWriter
+{
+  int N1;
+  int N2;
+  int N3;
+  FILE *xdmf;
+
+  XdmfUniformGridWriter(int N1_, int N2_, int N3_);
+  ~XdmfUniformGridWriter();
+  void open_file(const char* xdmf_filename);
+  void write_header(const char* gridName);
+  void write_attribute(const char* aName, const char* aType, const char* aNumberType,
+                       const char* aPrecision, const char* aLocation);
+  void write_footer();
+  void close_file();
+
+  template <typename T>
+  void deduce_attribute_number_type(T *data_ptr, std::string &aNumberType, std::string &aPrecision);
+};
+
+template <typename T>
+void XdmfUniformGridWriter::deduce_attribute_number_type(T *data_ptr, std::string &aNumberType, std::string &aPrecision)
+{
+    if constexpr (std::is_same<T, int>::value) {
+        aNumberType = "Int";
+        aPrecision = "4";
+    }
+    else if constexpr (std::is_same<T, float>::value) {
+        aNumberType = "Float";
+        aPrecision = "4";
+    }
+    else if constexpr (std::is_same<T, double>::value) {
+        aNumberType = "Float";
+        aPrecision = "8";
+    }
+    else {
+        throw std::runtime_error("attribute_number_type cannot be deduced.\n");
+    }
+}
diff --git a/src/LS-EVPFFT/src/allocate_memory.cpp b/src/LS-EVPFFT/src/allocate_memory.cpp
new file mode 100644
index 000000000..9bec4a12f
--- /dev/null
+++ b/src/LS-EVPFFT/src/allocate_memory.cpp
@@ -0,0 +1,160 @@
+#include "evpfft.h"
+
+void EVPFFT::allocate_memory()
+{
+
+  fft = std::make_shared<FFT3D_R2C<heffte_backend,real_t>>(mpi_comm, std::array<int,3>{npts1_g,npts2_g,npts3_g});
+
+  npts1 = fft->localRealBoxSizes[my_rank][0];
+  npts2 = fft->localRealBoxSizes[my_rank][1];
+  npts3 = fft->localRealBoxSizes[my_rank][2];
+  local_start1 = fft->localRealBoxes[my_rank].low[0];
+  local_start2 = fft->localRealBoxes[my_rank].low[1];
+  local_start3 = fft->localRealBoxes[my_rank].low[2];
+  local_end1 = fft->localRealBoxes[my_rank].high[0];
+  local_end2 = fft->localRealBoxes[my_rank].high[1];
+  local_end3 = fft->localRealBoxes[my_rank].high[2];
+  wgt = real_t(1.0) / real_t(npts1_g*npts2_g*npts3_g);
+
+  npts1_g_cmplx = fft->globalComplexBoxSize[0];
+  npts2_g_cmplx = fft->globalComplexBoxSize[0];
+  npts3_g_cmplx = fft->globalComplexBoxSize[0];
+  npts1_cmplx = fft->localComplexBoxSizes[my_rank][0];
+  npts2_cmplx = fft->localComplexBoxSizes[my_rank][1];
+  npts3_cmplx = fft->localComplexBoxSizes[my_rank][2];
+  local_start1_cmplx = fft->localComplexBoxes[my_rank].low[0];
+  local_start2_cmplx = fft->localComplexBoxes[my_rank].low[1];
+  local_start3_cmplx = fft->localComplexBoxes[my_rank].low[2];
+  local_end1_cmplx = fft->localComplexBoxes[my_rank].high[0];
+  local_end2_cmplx = fft->localComplexBoxes[my_rank].high[1];
+  local_end3_cmplx = fft->localComplexBoxes[my_rank].high[2];
+
+  udot  = MatrixTypeRealDual (3,3);
+  dsim = MatrixTypeRealHost (3,3);
+  scauchy = MatrixTypeRealHost (3,3);
+  sdeviat = MatrixTypeRealHost (3,3);
+  tomtot = MatrixTypeRealDual (3,3);
+  delt = MatrixTypeRealHost (3);
+
+  disgradmacro = MatrixTypeRealDual (3,3);
+  dvelgradmacro = MatrixTypeRealHost (3,3);
+  scauav = MatrixTypeRealHost (3,3);
+  dvelgradmacroacum = MatrixTypeRealHost (3,3);
+  velmax = MatrixTypeRealHost (3);
+  iudot = MatrixTypeIntHost (3,3);
+  idsim = MatrixTypeIntHost (6);
+  iscau = MatrixTypeIntHost (6);
+  defgradinvavgc_inv = MatrixTypeRealHost (3,3);
+
+  dnca = MatrixTypeRealDual (3,NSYSMX,NPHMX);
+  dbca = MatrixTypeRealDual (3,NSYSMX,NPHMX);
+  schca = MatrixTypeRealDual (5,NSYSMX,NPHMX);
+  tau = MatrixTypeRealDual (NSYSMX,3,NPHMX);
+  hard = MatrixTypeRealDual (NSYSMX,NSYSMX,NPHMX);
+  thet = MatrixTypeRealDual (NSYSMX,2,NPHMX);
+  nrs = MatrixTypeIntDual (NSYSMX,NPHMX);
+  gamd0 = MatrixTypeRealDual (NSYSMX,NPHMX);
+#ifdef NON_SCHMID_EFFECTS
+  dtca = MatrixTypeRealHost (3,NSYSMX,NPHMX);
+  cns = MatrixTypeRealHost (5,NMODMX,NPHMX);
+  schcnon = MatrixTypeRealDual (5,NSYSMX,NPHMX);
+#endif
+
+  twsh = MatrixTypeRealHost (NSYSMX,NPHMX);
+  nsm = MatrixTypeIntHost (NMODMX,NPHMX);
+  nmodes = MatrixTypeIntHost (NPHMX);
+  nsyst = MatrixTypeIntDual (NPHMX);
+  ntwmod = MatrixTypeIntHost (NPHMX);
+  ntwsys = MatrixTypeIntHost (NPHMX);
+  isectw = MatrixTypeIntHost (NSYSMX,NPHMX);
+  icryst = MatrixTypeStringHost (NPHMX);
+
+  igas = MatrixTypeIntDual (NPHMX);
+
+  cc = MatrixTypeRealHost (3,3,3,3,NPHMX);
+  c0 = MatrixTypeRealDual (3,3,3,3);
+  s0 = MatrixTypeRealDual (3,3,3,3);
+  c066 = MatrixTypeRealDual (6,6);
+
+  scauav1 = MatrixTypeRealHost (3,3);
+
+  xk_gb = MatrixTypeRealDual (npts1);
+  yk_gb = MatrixTypeRealDual (npts2);
+  zk_gb = MatrixTypeRealDual (npts3);
+
+  sg = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  disgrad = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  velgrad = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  edotp = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  cg66 = MatrixTypeRealDual (6, 6, npts1, npts2, npts3);
+  ept = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  ag = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  crss = MatrixTypeRealDual (NSYSMX, 2, npts1, npts2, npts3);
+  sch = MatrixTypeRealDual (5, NSYSMX, npts1, npts2, npts3);
+  sgt = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  defgradp = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  defgrade = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  defgrad = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  defgradinv = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  defgradini = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  wgtc = MatrixTypeRealDual (npts1, npts2, npts3);
+  detF = MatrixTypeRealDual (npts1, npts2, npts3);
+  sgPK1 = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  c066mod = MatrixTypeRealDual (6, 6, npts1, npts2, npts3);
+  velgradref = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  xnode = MatrixTypeRealDual (3, npts1 + 1, npts2 + 1, npts3 + 1);
+#ifdef NON_SCHMID_EFFECTS
+  schnon = MatrixTypeRealDual (5, NSYSMX, npts1, npts2, npts3);
+#endif
+
+  gamdot = MatrixTypeRealDual (NSYSMX, npts1, npts2, npts3);
+  gacumgr = MatrixTypeRealDual (npts1, npts2, npts3);
+  //trialtau = MatrixTypeRealDual (NSYSMX, 2, npts1, npts2, npts3);
+  xkin = MatrixTypeRealDual (NSYSMX, npts1, npts2, npts3);
+
+  ph_array = MatrixTypeRealHost (npts1, npts2, npts3);
+  th_array = MatrixTypeRealHost (npts1, npts2, npts3);
+  om_array = MatrixTypeRealHost (npts1, npts2, npts3);
+  jphase = MatrixTypeIntDual (npts1, npts2, npts3);
+  jgrain = MatrixTypeIntHost (npts1, npts2, npts3);
+
+  work = MatrixTypeRealDual (3, 3, npts1, npts2, npts3);
+  workim = MatrixTypeRealDual (3, 3, npts1_cmplx, npts2_cmplx, npts3_cmplx);
+  data = MatrixTypeRealDual (npts1, npts2, npts3);
+  data_cmplx = MatrixTypeRealDual (2, npts1_cmplx, npts2_cmplx, npts3_cmplx);
+
+  epav = MatrixTypeRealHost (3,3);
+  edotpav = MatrixTypeRealHost (3,3);
+  velgradmacroactual = MatrixTypeRealHost (3,3);
+  disgradmacrot = MatrixTypeRealHost (3,3);
+  velgradmacro = MatrixTypeRealDual (3,3);
+
+  // for fierro linking
+  M66 = MatrixTypeRealHost (6,6);
+  edotp_avg = MatrixTypeRealHost (3,3);
+  dedotp66_avg = MatrixTypeRealHost (6,6);
+  cg66_avg = MatrixTypeRealHost (6,6);
+  sg66_avg = MatrixTypeRealHost (6,6);
+  udotAcc = MatrixTypeRealHost (3,3);
+
+  //... For file management
+  mpi_io_real_t = std::make_shared<Manager_MPI_IO<real_t,MPI_ORDER_FORTRAN>>
+    (mpi_comm, fft->globalRealBoxSize, fft->localRealBoxSizes[my_rank], fft->localRealBoxes[my_rank].low, 3, 0);
+
+  mpi_io_int = std::make_shared<Manager_MPI_IO<int,MPI_ORDER_FORTRAN>>
+    (mpi_comm, fft->globalRealBoxSize, fft->localRealBoxSizes[my_rank], fft->localRealBoxes[my_rank].low, 3, 0);
+
+  micro_writer = std::make_shared<MicroOutputWriter<MPI_ORDER_FORTRAN>>
+    (mpi_comm, hdf5_filename.c_str(), 3, fft->globalRealBoxSize.data(), fft->localRealBoxSizes[my_rank].data(),
+     fft->localRealBoxes[my_rank].low.data());
+  //...
+
+
+  // Device memory space
+  rss = MatrixTypeRealDevice (NSYSMX, npts1, npts2, npts3);
+  rss1 = MatrixTypeRealDevice (NSYSMX, npts1, npts2, npts3);
+  rss2 = MatrixTypeRealDevice (NSYSMX, npts1, npts2, npts3);
+
+  set_some_voxels_arrays_to_zero();
+  return;
+}
diff --git a/src/LS-EVPFFT/src/calc_c0.cpp b/src/LS-EVPFFT/src/calc_c0.cpp
new file mode 100644
index 000000000..3ae78411f
--- /dev/null
+++ b/src/LS-EVPFFT/src/calc_c0.cpp
@@ -0,0 +1,208 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "matrix_exp.h"
+#include "determinant33.h"
+#include "math_functions.h"
+#include "reduction_data_structures.h"
+
+void EVPFFT::calc_c0()
+{
+  Profiler profiler(__FUNCTION__);
+
+  const size_t n = 6*6; // for dP6_dF6(6,6)
+  ArrayType <real_t, n> all_reduce;
+
+  real_t dP6_dF6_[6*6];
+
+  // create views of thread private arrays
+  ViewMatrixTypeReal dP6_dF6(dP6_dF6_,6,6);
+
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t,n> & loc_reduce) {
+
+    int jph;
+    int isign;
+
+    // thread private arrays
+    real_t sg6_[6];
+    real_t sgaux_[3*3];
+    real_t dedotp66_[6*6];
+    real_t sg66_[6*6];
+    real_t dsg6_de6_[6*6];
+    real_t dPg6_dF6_[6*6];
+    real_t dsg_de_[3*3*3*3];
+    real_t dPg_dF_[3*3*3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal sg6(sg6_,6);
+    ViewMatrixTypeReal sgaux(sgaux_,3,3);
+    ViewMatrixTypeReal dedotp66(dedotp66_,6,6);
+    ViewMatrixTypeReal sg66(sg66_,6,6);
+    ViewMatrixTypeReal dsg6_de6(dsg6_de6_,6,6);
+    ViewMatrixTypeReal dPg6_dF6(dPg6_dF6_,6,6);
+    ViewMatrixTypeReal dsg_de(dsg_de_,3,3,3,3);
+    ViewMatrixTypeReal dPg_dF(dPg_dF_,3,3,3,3);
+
+    jph = jphase(i,j,k);
+
+    if (igas(jph) == 0) {
+
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 6; ii++) {
+        for (int jj = 1; jj <= 6; jj++) {
+          sg66(ii,jj) = cg66(ii,jj,i,j,k);
+        }
+      }
+
+      invert_matrix <6> (sg66.pointer());
+      
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          sgaux(ii,jj)     = sg(ii,jj,i,j,k);
+        }
+      }
+
+      cb.chg_basis_2(sg6.pointer(), sgaux.pointer(), 2, 6, cb.B_basis_device_pointer());
+
+      //     GET RESOLVED SHEAR STRESSES 'rss' AND SHEAR RATES 'gamdot'.
+      //     SIGN(GAMDOT)=SIGN(RSS).
+      //     NRS CAN BE EVEN OR ODD.
+      //     RSS1 IS ALWAYS > 0 AND IS USED TO CALCULATE VISCOUS COMPLIANCE.
+  
+      for (int is = 1; is <= nsyst(jph); is++) {
+#ifdef NON_SCHMID_EFFECTS
+        rss(is,i,j,k) = schnon(1,is,,i,j,k)*sg6(1) + 
+                  schnon(2,is,i,j,k)*sg6(2) + 
+                  schnon(3,is,i,j,k)*sg6(3) + 
+                  schnon(4,is,i,j,k)*sg6(4) + 
+                  schnon(5,is,i,j,k)*sg6(5);
+#else
+        rss(is,i,j,k) = sch(1,is,i,j,k)*sg6(1) +
+                  sch(2,is,i,j,k)*sg6(2) + 
+                  sch(3,is,i,j,k)*sg6(3) + 
+                  sch(4,is,i,j,k)*sg6(4) + 
+                  sch(5,is,i,j,k)*sg6(5);
+#endif      isign = 1;
+        if ( (rss(is,i,j,k)-xkin(is,i,j,k)) < 0.0 ) {
+          isign = 2;
+        }
+     
+        rss(is,i,j,k) = (rss(is,i,j,k)-xkin(is,i,j,k))/crss(is,isign,i,j,k);
+  
+#ifdef TWO_SIGN_SLIP_SYSTEMS
+        if ( rss(is,i,j,k) < 0.0 ) {
+          rss(is,i,j,k) = 0.0;
+        }
+#endif
+  
+#ifdef TWO_SIGN_SLIP_SYSTEMS
+        rss1(is,i,j,k) = gamd0(is,jph) * nrs(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),(nrs(is,jph)-1))) / crss(is,isign,i,j,k);
+        rss2(is,i,j,k) = gamd0(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),nrs(is,jph))) * COPYSIGN(1.0,rss(is,i,j,k));
+#else
+        rss1(is,i,j,k) = gamd0(is,jph) * nrs(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),(nrs(is,jph)-1))) / crss(is,isign,i,j,k);
+        rss2(is,i,j,k) = gamd0(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),nrs(is,jph))) * COPYSIGN(1.0,rss(is,i,j,k));
+#endif
+  
+        gamdot(is,i,j,k) = rss2(is,i,j,k);
+      } // end for is
+  
+      for (int jj = 1; jj <= 5; jj++) {
+        for (int ii = 1; ii <= 5; ii++) {
+  
+          dedotp66(ii,jj) = 0.0;
+  
+          for (int k1 = 1; k1 <= nsyst(jph); k1++) {         
+#ifdef NON_SCHMID_EFFECTS   
+            dedotp66(ii,jj) += sch(ii,k1,i,j,k)*schnon(jj,k1,i,j,k)*rss1(k1,i,j,k);
+#else
+            dedotp66(ii,jj) += sch(ii,k1,i,j,k)*sch(jj,k1,i,j,k)*rss1(k1,i,j,k);
+#endif
+          } // end for k1
+
+        } // end for ii
+      } // end for jj
+
+      for (int ii = 1; ii <= 6; ii++) {
+        dedotp66(ii,6) = 0.0;
+        dedotp66(6,ii) = 0.0;
+      }
+
+      for (int ii = 1; ii <= 6; ii++) {
+        for (int jj = 1; jj <= 6; jj++) {
+          dsg6_de6(ii,jj) = dedotp66(ii,jj) + sg66(ii,jj)/tdot;
+        }
+      }
+      invert_matrix <6> (dsg6_de6.pointer());
+
+      cb.chg_basis_3(dsg6_de6.pointer(), dsg_de.pointer(), 3, 6, cb.B_basis_device_pointer());
+
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          for (int kk = 1; kk <= 3; kk++) {
+            for (int ll = 1; ll <=3; ll++) {
+              dPg_dF(ii,jj,kk,ll) = 0.0;
+              for (int p = 1; p <= 3; p++) {
+                for (int r = 1; r <= 3; r++) {
+                  dPg_dF(ii,jj,kk,ll) += dsg_de(ii,p,kk,r)*defgradinv(jj,p,i,j,k)*defgradinv(ll,r,i,j,k)*detF(i,j,k);
+                }
+              }
+            }
+          }
+        }
+      }
+
+      cb.chg_basis_4(dPg6_dF6.pointer(), dPg_dF.pointer(), 4, 6, cb.B_basis_device_pointer());
+
+      // averages packed in array
+      int ic;
+      ic = -1;
+      
+      for (int ii = 1; ii <= 6; ii++) {
+        for (int jj = 1; jj <= 6; jj++) {
+          ic = ic + 1;
+          loc_reduce.array[ic] += dPg6_dF6(ii,jj) * wgt;
+        }
+      }
+
+    }
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  // averages unpacked from array
+  int ic;
+  ic = -1;
+
+  for (int ii = 1; ii <= 6; ii++) {
+    for (int jj = 1; jj <= 6; jj++) {
+      ic = ic + 1;
+      dP6_dF6(ii,jj) = all_reduce.array[ic];
+    }
+  }
+
+  real_t s066_[6*6];
+  ViewMatrixTypeReal s066(s066_,6,6);
+  for (int ii = 1; ii <= 6; ii++) {
+    for (int jj = 1; jj <= 6; jj++) {
+      c066.host(ii,jj) = dP6_dF6(ii,jj);
+      s066(ii,jj) = dP6_dF6(ii,jj);
+    }
+  }
+  invert_matrix <6> (s066.pointer());
+
+  cb.chg_basis_3(c066.host_pointer(), c0.host_pointer(), 3, 6, cb.B_basis_host_pointer());
+  cb.chg_basis_3(s066.pointer(), s0.host_pointer(), 3, 6, cb.B_basis_host_pointer());
+
+  // update_device
+  c066.update_device();
+  c0.update_device();
+  s0.update_device();
+
+}
diff --git a/src/LS-EVPFFT/src/calc_c066mod.cpp b/src/LS-EVPFFT/src/calc_c066mod.cpp
new file mode 100644
index 000000000..488e2889f
--- /dev/null
+++ b/src/LS-EVPFFT/src/calc_c066mod.cpp
@@ -0,0 +1,67 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "math_functions.h"
+#include "determinant33.h"
+
+void EVPFFT::calc_c066mod()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    real_t dum;
+
+    // thread private arrays
+    real_t c0mod_[3*3*3*3];
+    real_t c0modsym_[3*3*3*3];
+    real_t c066modtmp_[6*6];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal c0mod(c0mod_,3,3,3,3);
+    ViewMatrixTypeReal c0modsym(c0modsym_,3,3,3,3);
+    ViewMatrixTypeReal c066modtmp(c066modtmp_,6,6);
+
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int kk = 1; kk <= 3; kk++) {
+          for (int ll = 1; ll <= 3; ll++) {
+            dum = 0.0;
+            for (int m = 1; m <= 3; m++) {
+              for (int n = 1; n <= 3; n++) {
+                dum += c0(ii,m,kk,n)*defgrad(jj,m,i,j,k)*defgrad(ll,n,i,j,k);
+              }
+            }
+            c0mod(ii,jj,kk,ll) = dum/detF(i,j,k);
+          }
+        }
+      }
+    }
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int kk = 1; kk <= 3; kk++) {
+          for (int ll = 1; ll <= 3; ll++) {
+            c0modsym(ii,jj,kk,ll) = 0.25*(c0mod(ii,jj,kk,ll) + c0mod(jj,ii,kk,ll) +
+                                  c0mod(ii,jj,ll,kk) + c0mod(jj,ii,ll,kk));
+          }
+        }
+      }
+    }
+
+    cb.chg_basis_4(c066modtmp.pointer(), c0modsym.pointer(), 4, 6, cb.B_basis_device_pointer());
+
+    for (int ii = 1; ii <= 6; ii++) {
+      for (int jj = 1; jj <= 6; jj++) {
+        c066mod(ii,jj,i,j,k) = c066modtmp(ii,jj);
+      }
+    }
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/chg_basis.cpp b/src/LS-EVPFFT/src/chg_basis.cpp
new file mode 100644
index 000000000..d303d59e2
--- /dev/null
+++ b/src/LS-EVPFFT/src/chg_basis.cpp
@@ -0,0 +1,153 @@
+#include "chg_basis.h"
+#include "utilities.h"
+
+ChgBasis::ChgBasis()
+  : B_basis_ (3,3,6)
+{
+  set_b_basis();
+}
+
+void ChgBasis::set_b_basis()
+{
+
+// *** CALCULATES BASIS TENSORS B(N)
+  for (int k = 1; k <= 6; k++) {
+    for (int j = 1; j <= 3; j++) {
+      for (int i = 1; i <= 3; i++) {
+        B_basis_.host(i,j,k) = 0.0;
+      }
+    }
+  }
+  
+  B_basis_.host(1,1,2) = -RSQ6_;
+  B_basis_.host(2,2,2) = -RSQ6_;
+  B_basis_.host(3,3,2) =  real_t(2.0)*RSQ6_;
+
+  B_basis_.host(1,1,1) = -RSQ2_;
+  B_basis_.host(2,2,1) =  RSQ2_;
+
+  B_basis_.host(2,3,3) =  RSQ2_;
+  B_basis_.host(3,2,3) =  RSQ2_;
+
+  B_basis_.host(1,3,4) =  RSQ2_;
+  B_basis_.host(3,1,4) =  RSQ2_;
+
+  B_basis_.host(1,2,5) =  RSQ2_;
+  B_basis_.host(2,1,5) =  RSQ2_;
+
+  B_basis_.host(1,1,6) =  RSQ3_;
+  B_basis_.host(2,2,6) =  RSQ3_;
+  B_basis_.host(3,3,6) =  RSQ3_;
+
+  // update device
+  B_basis_.update_device();
+  Kokkos::fence();
+
+}
+
+
+KOKKOS_FUNCTION
+void ChgBasis::chg_basis_1(real_t *CE2_, real_t *C2_, int IOPT, int KDIM, real_t *B_basis_ptr) const
+{
+  assert(IOPT == 1 && "ERROR: IOPT should be 1 for call to chg_basis_1");
+
+  ViewMatrixTypeReal CE2 (CE2_, KDIM);
+  ViewMatrixTypeReal C2  (C2_, 3,3);
+  ViewMatrixTypeReal B_basis (B_basis_ptr, 3,3,6);
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      C2(i,j) = 0.0;
+      for (int n = 1; n <= KDIM; n++) {
+        C2(i,j) += CE2(n) * B_basis(i,j,n);
+      }
+    }
+  }
+
+}
+
+
+KOKKOS_FUNCTION
+void ChgBasis::chg_basis_2(real_t *CE2_, real_t *C2_, int IOPT, int KDIM, real_t *B_basis_ptr) const
+{
+  assert(IOPT == 2 && "ERROR: IOPT should be 2 for call to chg_basis_2");
+
+  ViewMatrixTypeReal CE2 (CE2_, KDIM);
+  ViewMatrixTypeReal C2  (C2_, 3,3);
+  ViewMatrixTypeReal B_basis (B_basis_ptr, 3,3,6);
+
+  for (int n = 1; n <= KDIM; n++) {
+    CE2(n) = 0.0;
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        CE2(n) += C2(i,j) * B_basis(i,j,n);
+      }
+    }
+  }
+
+}
+
+
+KOKKOS_FUNCTION
+void ChgBasis::chg_basis_3(real_t *CE4_, real_t *C4_, int IOPT, int KDIM, real_t *B_basis_ptr) const
+{
+  assert(IOPT == 3 && "ERROR: IOPT should be 3 for call to chg_basis_3");
+
+  ViewMatrixTypeReal CE4 (CE4_, KDIM,KDIM);
+  ViewMatrixTypeReal C4  (C4_, 3,3,3,3);
+  ViewMatrixTypeReal B_basis (B_basis_ptr, 3,3,6);
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      for (int k = 1; k <= 3; k++) {
+        for (int l = 1; l <= 3; l++) {
+          C4(i,j,k,l) = 0.0;
+          for (int n = 1; n <= KDIM; n++) {
+            for (int m = 1; m <= KDIM; m++) {
+              C4(i,j,k,l) += CE4(n,m) * B_basis(i,j,n) * B_basis(k,l,m);
+            }
+          }
+        }
+      }
+    }
+  }
+
+}
+
+
+KOKKOS_FUNCTION
+void ChgBasis::chg_basis_4(real_t *CE4_, real_t *C4_, int IOPT, int KDIM, real_t *B_basis_ptr) const
+{
+  assert(IOPT == 4 && "ERROR: IOPT should be 4 for call to chg_basis_4");
+
+  ViewMatrixTypeReal CE4 (CE4_, KDIM,KDIM);
+  ViewMatrixTypeReal C4  (C4_, 3,3,3,3);
+  ViewMatrixTypeReal B_basis (B_basis_ptr, 3,3,6);
+
+  for (int n = 1; n <= KDIM; n++) {
+    for (int m = 1; m <= KDIM; m++) {
+      CE4(n,m) = 0.0;
+      for (int i = 1; i <= 3; i++) {
+        for (int j = 1; j <= 3; j++) {
+          for (int k = 1; k <= 3; k++) {
+            for (int l = 1; l <= 3; l++) {
+              CE4(n,m) += C4(i,j,k,l) * B_basis(i,j,n) * B_basis(k,l,m);
+            }
+          }
+        }
+      }
+    }
+  }
+
+}
+
+KOKKOS_FUNCTION
+real_t* ChgBasis::B_basis_device_pointer() const
+{
+  return B_basis_.device_pointer();
+}
+
+real_t* ChgBasis::B_basis_host_pointer() const
+{
+  return B_basis_.host_pointer();
+}
diff --git a/src/LS-EVPFFT/src/chg_basis.h b/src/LS-EVPFFT/src/chg_basis.h
new file mode 100644
index 000000000..83d5c93ed
--- /dev/null
+++ b/src/LS-EVPFFT/src/chg_basis.h
@@ -0,0 +1,57 @@
+
+
+// ************************************************************************
+//     SUBROUTINE CHG_BASIS    --->   VERSION 19/JUL/01
+//
+//     (modif. 06/FEB/98 - same convention as SELFPOLY - C.N.T.)
+//     (modif. 16/JUN/99 - same convention as Maudlin  - C.N.T.)
+//     (modif. 10/MAY/01 - KDIM version - R.L.)
+//
+//     KDIM=5 or 6, FOR DEVIATORIC or DEV+HYDROST TENSORS, RESPECTIVELY.
+//     IOPT=0: DEFINES A BASIS OF 6 SECOND ORDER TENSORS B(N).
+//     IOPT=1: CALCULATES SECOND ORDER TENSOR 'C2' AS AN EXPANSION IN TERMS
+//             OF VECTOR COMPONENTS CE2(KDIM) AND THE BASIS TENSORS B(KDIM).
+//     IOPT=2: CALCULATES COMPONENTS OF C2 AS A VECTOR CE2(KDIM).
+//     IOPT=3: CALCULATES FOURTH ORDER TENSOR 'C4' AS AN EXPANSION IN TERMS
+//             OF MATRIX COMPONENTS CE4(K,K) AND THE BASIS TENSORS B(KDIM).
+//     IOPT=4: CALCULATES MATRIX COMPONENTS CE4(K,K) OF TENSOR 'C4'.
+// **************************************************************************
+
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+class ChgBasis
+{
+  private:
+    const real_t SQR2_ = 1.41421356237309;
+    const real_t RSQ2_ = 0.70710678118654744;
+    const real_t RSQ3_ = 0.57735026918962584;
+    const real_t RSQ6_ = 0.40824829046386304;
+
+    MatrixTypeRealDual B_basis_;
+        
+  public:
+    ChgBasis();
+
+    void set_b_basis();
+
+    KOKKOS_FUNCTION
+    void chg_basis_1(real_t *CE2_, real_t *C2_, int IOPT, int KDIM, real_t *B_basis_ptr) const;
+
+    KOKKOS_FUNCTION
+    void chg_basis_2(real_t *CE2_, real_t *C2_, int IOPT, int KDIM, real_t *B_basis_ptr) const;
+
+    KOKKOS_FUNCTION
+    void chg_basis_3(real_t *CE4_, real_t *C4_, int IOPT, int KDIM, real_t *B_basis_ptr) const;
+
+    KOKKOS_FUNCTION
+    void chg_basis_4(real_t *CE4_, real_t *C4_, int IOPT, int KDIM, real_t *B_basis_ptr) const;
+
+    KOKKOS_FUNCTION
+    real_t* B_basis_device_pointer() const;
+
+    real_t* B_basis_host_pointer() const;
+};
diff --git a/src/LS-EVPFFT/src/command_line_args.cpp b/src/LS-EVPFFT/src/command_line_args.cpp
new file mode 100644
index 000000000..88e8e0895
--- /dev/null
+++ b/src/LS-EVPFFT/src/command_line_args.cpp
@@ -0,0 +1,110 @@
+#include "command_line_args.h"
+#include <getopt.h>
+#include <iostream>
+
+CommandLineArgs::CommandLineArgs()
+{
+}
+
+void CommandLineArgs::parse_command_line(int argc, char *argv[])
+{
+    const char* const short_opts = "f:m:e:g:p:u:h";
+    const option long_opts[] = {{"infile", required_argument, nullptr, 'f'},
+                                {"micro-filetype", required_argument, nullptr, 'm'},
+                                {"euler-angles", required_argument, nullptr, 'e'},
+                                {"feature-ids", required_argument, nullptr, 'g'},
+                                {"phase-ids", required_argument, nullptr, 'p'},
+                                {"euler-unit", required_argument, nullptr, 'u'},
+                                {"help", no_argument, nullptr, 'h'},
+                                {nullptr, no_argument, nullptr, 0}};
+
+    while (true)
+    {
+        const auto opt = getopt_long(argc, argv, short_opts, long_opts, nullptr);
+
+        if (-1 == opt)
+            break;
+
+        switch (opt)
+        {
+
+        case 'f':
+          input_filename = optarg;
+          break;
+
+        case 'm':
+          micro_filetype = atoi(optarg);
+          break;
+
+        case 'e':
+          EulerAnglesFullPath = optarg;
+          break;
+
+        case 'g':
+          FeatureIdsFullPath = optarg;
+          break;
+
+        case 'p':
+          PhaseIdsFullPath = optarg;
+          break;
+
+        case 'u':
+          EulerAnglesUnit = atoi(optarg);
+          break;
+
+        case 'h': // -h or --help
+          print_help(); 
+          break;
+
+        case '?': // Unrecognized option
+          print_help();
+          break;
+
+        default:
+          print_help();
+          break;
+        }  // end switch
+    }  // end while
+
+    check_cmd_args();
+}
+
+void CommandLineArgs::check_cmd_args()
+{
+    if (input_filename.size() == 0) {
+      print_help();
+    }
+
+    if (micro_filetype == 1) {
+      if (EulerAnglesFullPath.size() == 0 or
+          FeatureIdsFullPath.size() == 0 or
+          PhaseIdsFullPath.size() == 0 or
+          EulerAnglesUnit == -1)
+      {
+        print_help();
+      }
+    }
+}
+
+void CommandLineArgs::print_help()
+{
+std::cout << "Required arguments:\n"
+            "  -f, --infile   <path>     input file (full path)\n"
+             "\n"
+             "Optional arguments:\n"
+             "  -m, --micro-filetype <value>    microstructure file type (0: ASCII, 1: HDF5, 2: VTK)\n"
+             "  -e, --euler-angles   <path>     full path to euler angles dataset in HDF5 file\n"
+             "  -g, --feature-ids    <path>     full path to feature ids dataset in HDF5 file\n"
+             "  -p, --phase-ids      <path>     full path to phase ids dataset in HDF5 file\n"
+             "  -u, --euler-unit     <value>    unit of euler angles in HDF5 file (0: degree, 1: radian)\n"
+             "  -h, --help                      print this message\n"
+             "\n"
+            "Examples:\n"
+             "  Los Alamos FFT ASCII microstructure filetype:\n"
+             "    ./executable -f <inputfile_path>\n"
+             "    ./executable --infile=<inputfile_path>\n"
+             "  HDF5 microstructure filetype:\n"
+             "    ./executable -f <inputfile_path> -m 1 -e <eulerAngles_path> -g <featureIds_path> -p <phaseIds_path> -u 1\n";
+    
+   exit(1);
+}
diff --git a/src/LS-EVPFFT/src/command_line_args.h b/src/LS-EVPFFT/src/command_line_args.h
new file mode 100644
index 000000000..6e90428fa
--- /dev/null
+++ b/src/LS-EVPFFT/src/command_line_args.h
@@ -0,0 +1,26 @@
+#pragma once
+
+#include <array>
+#include <string>
+
+struct CommandLineArgs
+{
+    std::string input_filename; // name and path of input file (./fft.in)
+
+    // micro_filetype = 0 means classic Los Alamos FFT ASCII microstructure filetype
+    // micro_filetype = 1 means hdf5 microstructure filetype
+    int micro_filetype = 0; // default to ASCII
+
+    // if micro_filetype = 1 then information about the full path to locate 
+    // the euler angles, featureId, and phases in the hdf5 must be specified;
+    // also the euler angles units (degree:0 or radians:1) must be specified
+    std::string EulerAnglesFullPath;
+    std::string FeatureIdsFullPath;
+    std::string PhaseIdsFullPath;
+    int EulerAnglesUnit = -1; // set to -1 initially to check that user specified it
+
+    CommandLineArgs();
+    void parse_command_line(int argc, char *argv[]);
+    void check_cmd_args();
+    void print_help();
+};
diff --git a/src/LS-EVPFFT/src/data_crystal.cpp b/src/LS-EVPFFT/src/data_crystal.cpp
new file mode 100644
index 000000000..ab3831386
--- /dev/null
+++ b/src/LS-EVPFFT/src/data_crystal.cpp
@@ -0,0 +1,293 @@
+#include <iostream>
+
+#include "evpfft.h"
+#include "definitions.h"
+#include "utilities.h"
+
+using namespace utils;
+      
+void EVPFFT::data_crystal(int iph, const std::string & filecryspl)
+{
+
+  std::ifstream ur1;
+  ur1.open(filecryspl);
+  check_that_file_is_open(ur1, filecryspl.c_str());
+  std::string prosa;
+
+  MatrixTypeIntHost isn(NSYSMX,4);
+  MatrixTypeIntHost isb(NSYSMX,4);
+  MatrixTypeIntHost mode(10);
+  MatrixTypeRealHost sn(3);
+  MatrixTypeRealHost sb(3);
+  MatrixTypeRealHost cdim(3);
+  MatrixTypeRealHost aux5(5);
+  MatrixTypeRealHost aux33(3,3);
+  MatrixTypeRealHost aux55(5,5);
+  MatrixTypeRealHost aux3333(3,3,3,3);
+  MatrixTypeRealHost hselfx(10);
+  MatrixTypeRealHost hlatex(10,10);
+  int nmodesx,kount,nm,modex,nsmx,nrsx;
+  int isectwx,nsysx;
+  real_t covera,gamd0x,twshx,tau0xf,tau0xb,tau1x,thet0x,thet1x;
+  real_t snor,qnor,prod;
+
+  ur1 >> prosa; CLEAR_LINE(ur1);
+  ur1 >> icryst(iph); CLEAR_LINE(ur1);
+  for (int i = 1; i <= 3; i++) {
+    ur1 >> cdim(i);
+  }
+  CLEAR_LINE(ur1);
+  covera = cdim(3)/cdim(1);
+
+  ur1 >> nmodesx; CLEAR_LINE(ur1);
+  ur1 >> nmodes(iph); CLEAR_LINE(ur1);
+  for (int i = 1; i <= mode.size(); i++) mode(i) = 0;
+  for (int i = 1; i <= nmodes(iph); i++) {
+    ur1 >> mode(i);
+  }
+  CLEAR_LINE(ur1);
+
+  ntwmod(iph) = 0;
+  nsyst.host(iph)  = 0;
+  ntwsys(iph) = 0;
+  kount = 1;
+
+  //
+  //     START READING DEFORMATION MODES FROM FILECRYS
+  //
+  //nm = 0;
+  //label_100: {
+  //  do {
+  //    nm++;
+  for (int nm = 1; nm <= nmodesx; nm++) {
+
+      ur1 >> prosa; CLEAR_LINE(ur1);
+      ur1 >> modex >> nsmx >> nrsx >> gamd0x >> twshx >> isectwx; CLEAR_LINE(ur1);
+      ur1 >> tau0xf >> tau0xb >> tau1x >> thet0x >> thet1x; CLEAR_LINE(ur1);
+      ur1 >> hselfx(nm);
+      for (int jm = 1; jm <= nmodesx; jm++) {
+        ur1 >> hlatex(nm,jm);
+      }
+      CLEAR_LINE(ur1);
+
+      // scale variables
+      gamd0x /= time_scale;
+      tau0xf  *= stress_scale;
+      tau0xb *= stress_scale;
+      tau1x *= stress_scale;
+      thet0x *= stress_scale;
+      thet1x *= stress_scale;
+
+      //     SKIPS nsmx LINES IF THE MODE IS NOT IN THE LIST.
+      if (modex != mode(kount)) {
+        for (int iz = 1; iz <= nsmx; iz++) {
+          CLEAR_LINE(ur1);
+        } 
+        //goto label_100;
+        continue;
+      }
+
+      if (thet0x < thet1x) {
+        printf("INITIAL HARDENING LOWER THAN FINAL HARDENING FOR MODE %d IN PHASE %d\n", kount, iph);
+        exit(1);
+      }
+
+      //
+      //  CASE TAU1=0 CORRESPONDS TO LINEAR HARDENING AND IS INDEPENDENT OF TAU0.
+      //  AVOID DIVISION BY ZERO
+      if (tau1x <= 1.0e-6) {
+        tau1x = 1.0e-6;
+        thet0x = thet1x;
+      }
+      
+      // REORDER HARDENING COEFFICIENTS TO ACCOUNT ONLY FOR ACTIVE MODES
+      hselfx(kount) = hselfx(nm);
+      for (int i = 1; i <= nmodes(iph); i++) {
+        hlatex(kount,i) = hlatex(nm,mode(i));
+      }
+
+      // SYSTEMS GIVEN IN FOUR INDEX NOTATION: HEXAGONALS AND TRIGONALS
+      // SYSTEMS GIVEN IN THREE INDEX NOTATION: CUBIC AND ORTHORHOMBIC
+      if (icryst(iph).substr(0,3) == "HEX" || icryst(iph).substr(0,3) == "TRI" ||
+          icryst(iph).substr(0,3) == "hex" || icryst(iph).substr(0,3) == "tri") {
+        for (int j = 1; j <= nsmx; j++) {
+          for (int k = 1; k <= 4; k++) ur1 >> isn(j,k);
+          for (int k = 1; k <= 4; k++) ur1 >> isb(j,k);
+          CLEAR_LINE(ur1);
+        }
+      } else if (icryst(iph).substr(0,3) == "CUB" || icryst(iph).substr(0,3) == "ORT" ||
+                 icryst(iph).substr(0,3) == "cub" || icryst(iph).substr(0,3) == "ort") {
+        for (int j = 1; j <= nsmx; j++) {
+          for (int k = 1; k <= 3; k++) ur1 >> isn(j,k);
+          for (int k = 1; k <= 3; k++) ur1 >> isb(j,k);
+          CLEAR_LINE(ur1);
+        }
+      } else {
+        printf(" cannot identify the crystal symmetry of phase  %d\n", iph);
+        exit(1);
+      }
+
+      nsm(kount,iph) = nsmx;
+      if (twshx != 0) ntwmod(iph) = ntwmod(iph) + 1;
+
+#ifdef NON_SCHMID_EFFECTS
+    //===============non_schmid_effects ===================================    
+    // NOTE: the reading sequence here might be wrong.
+    // But was duplicated form the original fortran code verbatim. 
+    for (int kmo = 1; kmo <= nmodes(iph); kmo++) {
+      CLEAR_LINE(ur1);
+      ur1 >> cns(1,kmo,iph) >> cns(2,kmo,iph) 
+             >> cns(3,kmo,iph) >> cns(4,kmo,iph);
+      // FIXME: the commented out line below leads to use of uninitialised cns(_,_,_)
+      // in the later part of this function. But this is the way it appears
+      // in the original fortran code. It was modified as below. 
+      //cns(5,1,1) = -( cns(3,kmo,iph) + cns(4,kmo,iph) );
+      cns(5,kmo,iph) = -( cns(3,kmo,iph) + cns(4,kmo,iph) );
+    }
+    //=====================================================================
+#endif
+
+      for (int js = 1; js <= nsm(kount,iph); js++) {
+        nsyst.host(iph) = nsyst.host(iph) + 1;
+        nsysx = nsyst.host(iph);
+        if (twshx != 0) ntwsys(iph) = ntwsys(iph) + 1;
+
+        //
+        //   DEFINES RATE SENSITIVITY AND CRSS FOR EACH SYSTEM IN THE MODE
+        //
+        gamd0.host(nsysx,iph)  = gamd0x;
+        nrs.host(nsysx,iph)    = nrsx;
+        twsh(nsysx,iph)   = twshx;
+        tau.host(nsysx,1,iph)  = tau0xf;
+        tau.host(nsysx,2,iph)  = tau0xb;
+        tau.host(nsysx,3,iph)  = tau1x;
+        thet.host(nsysx,1,iph) = thet0x;
+        thet.host(nsysx,2,iph) = thet1x;
+
+        //
+        isectw(nsysx,iph) = isectwx;
+        //
+
+        if (icryst(iph).substr(0,3) == "HEX" || icryst(iph).substr(0,3) == "TRI" ||
+            icryst(iph).substr(0,3) == "hex" || icryst(iph).substr(0,3) == "tri") {
+          sn(1) = isn(js,1);
+          sn(2) = (isn(js,1)+2.0*isn(js,2))/sqrt(3.0);
+          sn(3) = isn(js,4)/covera;
+          sb(1) = 3.0/2.0*isb(js,1);
+          sb(2) = (isb(js,1)/2.0+isb(js,2))*sqrt(3.0);
+          sb(3) = isb(js,4)*covera;
+        } else if (icryst(iph).substr(0,3) == "CUB" || icryst(iph).substr(0,3) == "ORT" ||
+                   icryst(iph).substr(0,3) == "cub" || icryst(iph).substr(0,3) == "ort") {
+          for (int m = 1; m <= 3; m++) {
+            sn(m) = isn(js,m) / cdim(m);
+            sb(m) = isb(js,m) * cdim(m);
+          }
+        }
+
+        //
+        // *** NORMALIZES SYSTEM VECTORS AND CHECKS NORMALITY
+        //
+        snor = sqrt( sn(1)*sn(1) + sn(2)*sn(2) + sn(3)*sn(3) );
+        qnor = sqrt( sb(1)*sb(1) + sb(2)*sb(2) + sb(3)*sb(3) );
+        prod = 0.0;
+        for (int j = 1; j <= 3; j++) {
+          dnca.host(j,nsysx,iph) = sn(j) / snor;
+          dbca.host(j,nsysx,iph) = sb(j) / qnor;
+          if (abs(dnca.host(j,nsysx,iph)) < 1.e-03) dnca.host(j,nsysx,iph) = 0.0;
+          if (abs(dbca.host(j,nsysx,iph)) < 1.e-03) dbca.host(j,nsysx,iph) = 0.0;
+          prod += dnca.host(j,nsysx,iph) * dbca.host(j,nsysx,iph);
+        }
+
+        if (prod >= 1.0e-3) {
+          printf("SYSTEM %d IN MODE %d IN PHASE %d IS NOT ORTHOGONAL !!\n", js, nm, iph);
+          printf("%d %d %d", isb(js,1), isb(js,2), isb(js,3));
+          printf("%d %d %d", isn(js,1), isn(js,2), isn(js,3));
+          exit(1);
+        }
+
+#ifdef NON_SCHMID_EFFECTS
+        dtca(1,nsysx,iph) =  dnca.host(2,nsysx,iph)*dbca.host(3,nsysx,iph) - dnca.host(3,nsysx,iph)*dbca.host(2,nsysx,iph);
+        dtca(2,nsysx,iph) = -dnca.host(1,nsysx,iph)*dbca.host(3,nsysx,iph) + dnca.host(3,nsysx,iph)*dbca.host(1,nsysx,iph);
+        dtca(3,nsysx,iph) =  dnca.host(1,nsysx,iph)*dbca.host(2,nsysx,iph) - dnca.host(2,nsysx,iph)*dbca.host(1,nsysx,iph);
+#endif
+           
+        //
+        //   DEFINE SCHMID VECTOR IN CRYSTAL AXES FOR EACH SYSTEM
+        //
+
+        for (int i = 1; i <= 3; i++) {
+          for (int j = 1; j <= 3; j++) {
+            aux33(i,j) = (dnca.host(i,nsysx,iph)*dbca.host(j,nsysx,iph) + dnca.host(j,nsysx,iph)*dbca.host(i,nsysx,iph)) / 2.0;
+          }
+        }
+ 
+        cb.chg_basis_2(aux5.pointer(), aux33.pointer(), 2, 5, cb.B_basis_host_pointer());
+
+        for (int i = 1; i <= 5; i++) {
+          schca.host(i,nsysx,iph) = aux5(i);
+        }
+
+#ifdef NON_SCHMID_EFFECTS
+        for (int i = 1; i <= 3; i++) {
+          for (int j = 1; j <= 3; j++) {
+            aux33(i,j) = cns(1,nm,iph)*(dtca(i,nsysx,iph)*dbca.host(j,nsysx,iph) + 
+                        dtca(j,nsysx,iph)*dbca.host(i,nsysx,iph))/2.0 +
+                        cns(2,nm,iph)*(dtca(i,nsysx,iph)*dnca.host(j,nsysx,iph) +
+                        dtca(j,nsysx,iph)*dnca.host(i,nsysx,iph))/2.0 +
+                        cns(3,nm,iph)*(dnca.host(i,nsysx,iph)*dnca.host(j,nsysx,iph) +
+                        dnca.host(j,nsysx,iph)*dnca.host(i,nsysx,iph))/2.0 +
+                        cns(4,nm,iph)*(dtca(i,nsysx,iph)*dtca(j,nsysx,iph) +
+                        dtca(j,nsysx,iph)*dtca(i,nsysx,iph))/2.0 +
+                        cns(5,nm,iph)*(dbca.host(i,nsysx,iph)*dbca.host(j,nsysx,iph) +
+                        dbca.host(j,nsysx,iph)*dbca.host(i,nsysx,iph))/2.0 +
+                        (dnca.host(i,nsysx,iph)*dbca.host(j,nsysx,iph) + 
+                        dnca.host(j,nsysx,iph)*dbca.host(i,nsysx,iph))/2.0;
+          }
+        }
+
+        cb.chg_basis_2(aux5.pointer(), aux33.pointer(), 2, 5, cb.B_basis_host_pointer());
+
+        for (int i = 1; i <= 5; i++) {
+          schcnon.host(i,nsysx,iph) = aux5(i);
+        }
+#endif
+
+      } // end for js
+
+      kount++;
+    //} while (nm < nmodesx); // end of do/while loop
+  //} // end of label_100 scope
+  } // end for (nm)
+
+  //     INITIALIZE SELF & LATENT HARDENING COEFS FOR EACH SYSTEM OF THE PHASE.
+  //     ABSOLUTE UNITS ARE ACCOUNTED FOR BY MODULATING FACTOR IN HARDENING LAW.
+
+  int i = 0;
+  int j;
+  for (int im = 1; im <= nmodes(iph); im++) {
+    for (int is = 1; is <= nsm(im,iph); is++) {
+      i += 1;
+      j = 0;
+      for (int jm = 1; jm <= nmodes(iph); jm++) {
+        for (int js = 1; js <= nsm(jm,iph); js++) {
+          j += 1;
+          hard.host(i,j,iph) = hlatex(im,jm);
+        }
+      }
+      hard.host(i,i,iph) = hselfx(im);
+    }
+  }
+
+  //     VERIFICATION OF TWINNING DATA TO BE SURE PROGRAM WILL RUN PROPERLY
+  if (nmodes(iph) > 1) {
+    for (int i = 2; i <= nsyst.host(iph); i++) {
+      if (twsh(i,iph) == 0.0 && twsh(i-1,iph) != 0.0) {
+        printf(" WARNING! THE TWINNING MODES MUST FOLLOW THE \n");
+        printf(" SLIP MODES   -->   REORDER CRYSTAL FILE ");
+        exit(1);
+      }
+    }
+  }
+
+  ur1.close();  
+}
diff --git a/src/LS-EVPFFT/src/data_crystal_elast.cpp b/src/LS-EVPFFT/src/data_crystal_elast.cpp
new file mode 100644
index 000000000..8431baad8
--- /dev/null
+++ b/src/LS-EVPFFT/src/data_crystal_elast.cpp
@@ -0,0 +1,97 @@
+#include "evpfft.h"
+#include "voigt.h"
+#include "definitions.h"
+#include "utilities.h"
+
+using namespace utils;
+
+
+void EVPFFT::data_crystal_elast(int iph, const std::string & filecrysel)
+{
+  std::ifstream ur1;
+  ur1.open(filecrysel);
+  check_that_file_is_open(ur1, filecrysel.c_str());
+
+  MatrixTypeRealHost dde(3,3);
+  MatrixTypeRealHost xid4(3,3,3,3);
+  MatrixTypeRealHost cc66v(6,6);
+  MatrixTypeRealHost ccaux(3,3,3,3);
+  MatrixTypeRealHost aux6(6);
+  MatrixTypeRealHost aux33(3,3);
+  MatrixTypeRealHost aux66(6,6);
+  MatrixTypeRealHost aux3333(3,3,3,3); 
+  int iso;
+  real_t young,tnu,tmu,tla;
+  
+  //       UNITARY TENSORS
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      dde(i,j) = 0.0;
+	    if (i == j) dde(i,j) = 1.0;
+    }
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      for (int k = 1; k <= 3; k++) {
+        for (int l = 1; l <= 3; l++) {
+	        xid4(i,j,k,l) = (dde(i,k)*dde(j,l) + dde(i,l)*dde(j,k)) / real_t(2.0);
+        }
+      }
+    }
+  }
+
+  ur1 >> iso; CLEAR_LINE(ur1);
+
+	if (iso == 0) {
+    for (int i = 1; i <= 6; i++) {
+      for (int j = 1; j <= 6; j++) {
+        ur1 >> cc66v(i,j);
+      }
+      CLEAR_LINE(ur1);
+    }
+
+    voigt(cc66v.pointer(),ccaux.pointer(),1);
+
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        for (int k = 1; k <= 3; k++) {
+          for (int l = 1; l <= 3; l++) {
+            cc(i,j,k,l,iph) = ccaux(i,j,k,l);
+          }
+        }
+      }
+    }
+
+  } else {
+
+  	ur1 >> young >> tnu; CLEAR_LINE(ur1);
+    tmu = young / (real_t(2.0)*(real_t(1.0) + tnu));
+	  tla = real_t(2.0)*tmu*tnu / (real_t(1.0) - real_t(2.0)*tnu);
+
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        for (int k = 1; k <= 3; k++) {
+          for (int l = 1; l <= 3; l++) {
+            cc(i,j,k,l,iph) = tla*dde(i,j)*dde(k,l) + real_t(2.0)*tmu*xid4(i,j,k,l);
+          }
+        }
+      }
+    }
+
+  } // end if (iso == 0)
+
+
+  // scale stiffness tensor
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      for (int k = 1; k <= 3; k++) {
+        for (int l = 1; l <= 3; l++) {
+          cc(i,j,k,l,iph) *= stress_scale;
+        }
+      }
+    }
+  } 
+
+  ur1.close();
+}
diff --git a/src/LS-EVPFFT/src/data_grain.cpp b/src/LS-EVPFFT/src/data_grain.cpp
new file mode 100644
index 000000000..2fbded4f4
--- /dev/null
+++ b/src/LS-EVPFFT/src/data_grain.cpp
@@ -0,0 +1,353 @@
+#include <stdio.h>
+#include "evpfft.h"
+#include "definitions.h"
+#include "utilities.h"
+#include "euler.h"
+#include "math_functions.h"
+#include "voigt.h"
+#include "hdf5_io_functions.h"
+
+using namespace utils;
+
+int get_dataset_rank(hid_t hdf5_file_id, const char *dataset_name);
+
+void EVPFFT::data_grain(const std::string & filetext)
+{
+
+  switch (cmd.micro_filetype)
+  {
+    case 0:
+    read_classic_los_alamos_texture_file(filetext);
+    break;
+
+    case 1:
+    read_hdf5_texture_file(filetext);
+    break;
+
+    case 2:
+    read_vtk_lattice_structure(filetext);
+    break;
+
+    default:
+      throw std::runtime_error("invalid micro_filetype.");
+  }
+
+#if 0
+  // for debug
+  mpi_io_real_t->write_ascii("debug_ph.txt", ph_array.host_pointer(), "");
+  mpi_io_real_t->write_ascii("debug_th.txt", th_array.host_pointer(), "");
+  mpi_io_real_t->write_ascii("debug_om.txt", om_array.host_pointer(), "");
+  mpi_io_int->write_ascii("debug_jgr.txt", jgrain.host_pointer(), "");
+  mpi_io_int->write_ascii("debug_jph.txt", jphase.host_pointer(), "");
+#endif
+
+  int nph1 = 0;
+  for (int kk = 1; kk <= npts3; kk++) {
+    for (int jj = 1; jj <= npts2; jj++) {
+      for (int ii = 1; ii <= npts1; ii++) {
+                     
+      int jph;
+      real_t ph, th, om;
+      real_t ph_rad, th_rad, om_rad;
+      real_t dum;
+      
+      real_t aa_[3*3];
+      real_t caux3333_[3*3*3*3];
+      real_t caux66_[6*6];
+      ViewMatrixTypeReal aa(aa_,3,3);
+      ViewMatrixTypeReal caux3333(caux3333_,3,3,3,3);
+      ViewMatrixTypeReal caux66(caux66_,6,6);
+ 
+      ph  = ph_array(ii,jj,kk);
+      th  = th_array(ii,jj,kk);
+      om  = om_array(ii,jj,kk);
+      jph = jphase.host(ii,jj,kk);
+
+      if (jph == 1) nph1 = nph1 + 1;
+      if (igas.host(jph) == 1) continue; // nothing needs to be done if gas phase
+           
+      // CALCULATES THE TRANSFORMATION MATRIX AA WHICH TRANSFORMS FROM
+      // SAMPLE TO CRYSTAL. STORES AG, WHICH TRANSFORMS FROM CRYSTAL TO SAMPLE.
+      ph_rad = ph*pi/180.0;
+      th_rad = th*pi/180.0;
+      om_rad = om*pi/180.0;
+      euler(2, ph_rad, th_rad, om_rad, aa.pointer());
+                    
+      for (int j = 1; j <= 3; j++) {
+        for (int k = 1; k <= 3; k++) {
+          ag.host(j,k,ii,jj,kk) = aa(k,j);
+        }
+      }
+
+      for (int i1 = 1; i1 <= 3; i1++) {
+        for (int j1 = 1; j1 <= 3; j1++) {
+          for (int k1 = 1; k1 <= 3; k1++) {
+            for (int l1 = 1; l1 <= 3; l1++) {
+            
+              dum = 0.0;
+              
+              for (int i2 = 1; i2 <= 3; i2++) {
+                for (int j2 = 1; j2 <= 3; j2++) {
+                  for (int k2 = 1; k2 <= 3; k2++) {
+                    for (int l2 = 1; l2 <= 3; l2++) {
+                    
+                      dum += aa(i2,i1) *
+                             aa(j2,j1) *
+                             aa(k2,k1) *
+                             aa(l2,l1) *
+                             cc(i2,j2,k2,l2,jph);
+                             
+                    }
+                  }
+                }
+              }
+                            
+              caux3333(i1,j1,k1,l1) = dum;
+             
+            }
+          }
+        }
+      }
+            
+      cb.chg_basis_4(caux66.pointer(), caux3333.pointer(), 4, 6, cb.B_basis_host_pointer());
+      
+      for (int i = 1; i <= 6; i++) {
+        for (int j = 1; j <= 6; j++) {
+          cg66.host(i,j,ii,jj,kk) = caux66(i,j);
+          c066.host(i,j) += caux66(i,j)*wgt;  
+        }
+      }
+
+      } // end for ii
+    } // end for jj
+  } // end for kk
+
+  // update device
+  cg66.update_device();
+  ag.update_device();
+  Kokkos::fence();
+
+  // Allreduce on c066
+  MPI_Allreduce(MPI_IN_PLACE, c066.host_pointer(), c066.size(), MPI_REAL_T, MPI_SUM, mpi_comm);
+  c066.update_device();
+  
+  // for single crystal only. it leads to fast convergence of the outer while loop for stress and strain fields
+  // for (int i = 1; i <= 6; i++) {
+  //   for (int j = 1; j <= 6; j++) {
+  //     c066.host(i,j) *= 1000000.0;
+  //   }
+  // }
+  // c066.update_device();
+
+#if 0
+  // debug print
+  print_array_to_file(c066.host_pointer(), c066.size(), my_rank, "c066.txt");
+#endif
+
+  wph1 = (1.0*nph1)/(npts1_g*npts2_g*npts3_g);
+
+  MatrixTypeRealHost s066(6,6);
+  for (int i = 1; i <= 6; i++) {
+    for (int j = 1; j <= 6; j++) {
+      s066(i,j) = c066.host(i,j);
+    }
+  }
+  invert_matrix <6> (s066.pointer());
+
+  cb.chg_basis_3(c066.host_pointer(), c0.host_pointer(), 3, 6, cb.B_basis_host_pointer());
+  cb.chg_basis_3(s066.pointer(), s0.host_pointer(), 3, 6, cb.B_basis_host_pointer());
+
+  // update device
+  c0.update_device();
+  s0.update_device();
+  Kokkos::fence();
+}
+
+
+void EVPFFT::read_classic_los_alamos_texture_file(const std::string & filetext)
+{
+  std::ifstream ur2;
+  ur2.open(filetext);
+  check_that_file_is_open(ur2, filetext.c_str());
+
+  real_t ph, th, om;
+  int iiii, jjjj, kkkk, jgr, jph;
+  int i,j,k;
+
+  for (int kk = 0; kk < npts3_g; kk++) {
+    for (int jj = 0; jj < npts2_g; jj++) {
+      for (int ii = 0; ii < npts1_g; ii++) {
+      
+        ur2 >> ph >> th >> om >> iiii >> jjjj >> kkkk >> jgr >> jph; CLEAR_LINE(ur2);
+        if ( (ii >= local_start1 and ii <= local_end1) and
+             (jj >= local_start2 and jj <= local_end2) and
+             (kk >= local_start3 and kk <= local_end3) )
+        {
+          i = ii - local_start1 + 1;
+          j = jj - local_start2 + 1;
+          k = kk - local_start3 + 1;
+          ph_array(i,j,k) = ph;
+          th_array(i,j,k) = th;
+          om_array(i,j,k) = om;
+          jgrain(i,j,k)   = jgr;
+          jphase.host(i,j,k)   = jph;
+        }
+        
+      }
+    }
+  }
+
+  // update device
+  jphase.update_device();
+  Kokkos::fence();
+
+  ur2.close();
+}
+
+void EVPFFT::read_hdf5_texture_file(const std::string & filetext)
+{
+  // open hdf5 file
+  hid_t hdf5_file_id = open_hdf5_file(filetext.c_str(), mpi_comm);
+ 
+  // get all dataset ranks
+  int EulerAnglesRank = get_dataset_rank(hdf5_file_id, cmd.EulerAnglesFullPath.c_str());
+  int FeatureIdsRank = get_dataset_rank(hdf5_file_id, cmd.FeatureIdsFullPath.c_str());
+  int PhaseIdsRank = get_dataset_rank(hdf5_file_id, cmd.PhaseIdsFullPath.c_str());
+
+  if (EulerAnglesRank != 4) {
+    throw std::runtime_error("EulerAngles should be stored as rank 4 array in hdf5 file");
+  }
+    
+  // set dims and others
+  // Note that hdf5 is stored in C layout, and when read from a c or c++ program
+  // it reads in c-array-order but when read from fortran it reads in fortran-array-layout.
+  // Therefore, the above arrays of dims should be reversed
+  // Example: global_array_dims = [nx,ny,nz] should be [nz,ny,nx] 
+  int global_array_dims[4] = {npts3_g,npts2_g,npts1_g,1};
+  int EulerAngles_global_array_dims[4] = {npts3_g,npts2_g,npts1_g,3};
+  int local_array_dims[4] = {npts3,npts2,npts1,1};
+  int EulerAngles_local_array_dims[4] = {npts3,npts2,npts1,3};
+  int start_coordinates[4] = {local_start3,local_start2,local_start1,0};
+  int stride[4] = {1,1,1,1};
+  
+  // create filespace, memspace
+  hid_t FeatureIds_filespace = create_hdf5_filespace(FeatureIdsRank, global_array_dims, local_array_dims,
+                               start_coordinates, stride);
+  hid_t PhaseIds_filespace = create_hdf5_filespace(PhaseIdsRank, global_array_dims, local_array_dims,
+                             start_coordinates, stride);
+  hid_t EulerAngles_filespace = create_hdf5_filespace(EulerAnglesRank, EulerAngles_global_array_dims,
+                                EulerAngles_local_array_dims, start_coordinates, stride);
+  hid_t FeatureIds_memspace = create_hdf5_memspace(FeatureIdsRank, local_array_dims, stride, 0);
+  hid_t PhaseIds_memspace = create_hdf5_memspace(PhaseIdsRank, local_array_dims, stride, 0);
+  hid_t EulerAngles_memspace = create_hdf5_memspace(EulerAnglesRank, EulerAngles_local_array_dims, stride, 0);
+  
+  // array to hold datasets
+  FMatrix <real_t> EulerAngles (3,npts1,npts2,npts3);
+  FMatrix <int> FeatureIds (1,npts1,npts2,npts3);
+  FMatrix <int> PhaseIds (1,npts1,npts2,npts3);
+
+  // read datasets
+  read_hdf5_dataset <real_t> (cmd.EulerAnglesFullPath.c_str(), EulerAngles.pointer(), hdf5_file_id,
+                              EulerAngles_memspace, EulerAngles_filespace, mpi_comm);
+  read_hdf5_dataset <int> (cmd.FeatureIdsFullPath.c_str(), FeatureIds.pointer(), hdf5_file_id,
+                           FeatureIds_memspace, FeatureIds_filespace, mpi_comm);
+  read_hdf5_dataset <int> (cmd.PhaseIdsFullPath.c_str(), PhaseIds.pointer(), hdf5_file_id,
+                           PhaseIds_memspace, PhaseIds_filespace, mpi_comm);
+ 
+  // write data into the appropriate arrays
+  real_t ph, th, om;
+  int iiii, jjjj, kkkk, jgr, jph;
+  int i,j,k;
+  for (int kk = 1; kk <= npts3; kk++) {
+    for (int jj = 1; jj <= npts2; jj++) {
+      for (int ii = 1; ii <= npts1; ii++) {
+
+        iiii = ii;
+        jjjj = jj;
+        kkkk = kk;
+        ph = EulerAngles(1,ii,jj,kk);
+        th = EulerAngles(2,ii,jj,kk);
+        om = EulerAngles(3,ii,jj,kk);
+        jgr = FeatureIds(1,ii,jj,kk); 
+        jph = PhaseIds(1,ii,jj,kk);
+
+        if (cmd.EulerAnglesUnit == 1) {
+          // change from radian to degree
+          ph = ph*180.0/pi;
+          th = th*180.0/pi;
+          om = om*180.0/pi;
+        }
+
+        ph_array(ii,jj,kk) = ph;
+        th_array(ii,jj,kk) = th;
+        om_array(ii,jj,kk) = om;
+        jgrain(ii,jj,kk)   = jgr;
+        jphase.host(ii,jj,kk) = jph;
+        
+      }
+    }
+  }
+
+  // update device
+  jphase.update_device();
+  Kokkos::fence();
+  
+  H5Fclose(hdf5_file_id);
+}
+
+int get_dataset_rank(hid_t hdf5_file_id, const char *dataset_name)
+{
+  int dataset_rank;
+  hid_t dataset = H5Dopen2(hdf5_file_id, dataset_name, H5P_DEFAULT);
+  hid_t dataspace = H5Dget_space(dataset);
+  dataset_rank = H5Sget_simple_extent_ndims(dataspace);
+  H5Dclose(dataset);
+  return dataset_rank;
+}
+
+void EVPFFT::read_vtk_lattice_structure(const std::string & filetext)
+{
+  std::ifstream ur2;
+  ur2.open(filetext);
+  check_that_file_is_open(ur2, filetext.c_str());
+
+  // Skip header lines
+  std::string line;
+  while (std::getline(ur2, line)) {
+    if (line == "LOOKUP_TABLE default") {
+      break;
+    }
+  }
+
+  int i,j,k;
+  double grid_value;
+  for (int kk = 0; kk < npts3_g; kk++) {
+    for (int jj = 0; jj < npts2_g; jj++) {
+      for (int ii = 0; ii < npts1_g; ii++) {
+      
+        ur2 >> grid_value;
+        if ( (ii >= local_start1 and ii <= local_end1) and
+             (jj >= local_start2 and jj <= local_end2) and
+             (kk >= local_start3 and kk <= local_end3) )
+        {
+          i = ii - local_start1 + 1;
+          j = jj - local_start2 + 1;
+          k = kk - local_start3 + 1;
+          ph_array(i,j,k)    = 0;
+          th_array(i,j,k)    = 0;
+          om_array(i,j,k)    = 0;
+          jgrain(i,j,k)      = grid_value + 1;
+          jphase.host(i,j,k) = grid_value + 1;
+        }
+
+      }
+    }
+  }
+  // update device
+  jphase.update_device();
+  Kokkos::fence();
+
+  ur2.close();
+
+}
+
diff --git a/src/LS-EVPFFT/src/defgrad_dcmp.cpp b/src/LS-EVPFFT/src/defgrad_dcmp.cpp
new file mode 100644
index 000000000..51302de89
--- /dev/null
+++ b/src/LS-EVPFFT/src/defgrad_dcmp.cpp
@@ -0,0 +1,140 @@
+#include "defgrad_dcmp.h"
+#include "math_functions.h"
+#include "utilities.h"
+
+KOKKOS_FUNCTION
+real_t frobenius_norm(const real_t* A, int m, int n);
+
+KOKKOS_FUNCTION
+void defgrad_dcmp(real_t *F_, real_t *V_, real_t *R_)
+{
+
+  ViewMatrixTypeReal F (F_,3,3);
+  ViewMatrixTypeReal V (V_,3,3);
+  ViewMatrixTypeReal R (R_,3,3);
+
+  real_t A_curr_  [3*3];
+  real_t A_next_  [3*3];
+  real_t A_inv_   [3*3];
+  real_t A_trans_ [3*3];
+  real_t R_inv_   [3*3];
+  ViewMatrixTypeReal A_curr (A_curr_,3,3);
+  ViewMatrixTypeReal A_next (A_next_,3,3);
+  ViewMatrixTypeReal A_inv  (A_inv_,3,3);
+  ViewMatrixTypeReal A_trans  (A_trans_,3,3);
+  ViewMatrixTypeReal R_inv  (R_inv_,3,3);
+
+  const int iter_max = 50;
+  const real_t tol = 1.0e-12;
+  real_t conv_err = 1;
+
+  // Initialize A_curr as F before starting the iteration
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      A_curr(i,j) = F(i,j);
+      A_next(i,j) = 0.0;
+    }
+  }
+
+  // Do iterations
+  int iter;
+  for (iter = 1; iter <= iter_max; iter++)
+  {
+    // Store A_curr in A_inv before taking inverse
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        A_inv(i,j) = A_curr(i,j);
+      }
+    }
+
+    invert_matrix <3> (A_inv.pointer());
+
+    // Transpose A_inv
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        A_trans(i,j) = A_inv(j,i);
+      }
+    }
+
+    // Set scaling factor
+    real_t c;
+    if (conv_err < 10e-2) {
+      c = 1.0;
+    } else {
+      // c = 1.0;
+      c = SQRT(frobenius_norm(A_inv.pointer(),3,3) / 
+               frobenius_norm(A_curr.pointer(),3,3));
+    }
+
+    // Update A_next using the iterative formula
+    real_t c_inv = 1.0 / c;
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        A_next(i,j) = 0.5 * ( c * A_curr(i,j) + c_inv * A_trans(i,j) ) ;
+      }
+    }
+
+    // Check for convergence
+    conv_err = 0.0;
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        real_t temp = A_next(i,j) - A_curr(i,j);
+        conv_err += temp * temp;
+      }
+    }
+    if (conv_err < tol) {
+      break;
+    }
+
+    // Update A_curr for the next iteration
+    for (int i = 1; i <= 3; i++) {
+      for (int j = 1; j <= 3; j++) {
+        A_curr(i,j) = A_next(i,j);
+      }
+    }
+  }
+  // printf("\nTotal iter = %d\n", iter);
+
+  if (iter > iter_max) {
+    printf("Max number of iterations exceeded in %s\n", __FUNCTION__);
+    exit(1);
+  }
+
+  // The resulting matrix A_next is the rotation matrix R
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      R(i,j) = A_next(i,j);
+    }
+  }
+
+  // Calculate V using the relationship F = VR, V = F * R^{-1}
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      R_inv(i,j) = R(i,j);
+    }
+  }
+  invert_matrix <3> (R_inv.pointer());
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      V(i,j) = 0.0;
+      for (int k = 1; k <= 3; k++) {
+        V(i,j) += F(i,k) * R_inv(k,j);
+      }   
+    }   
+  }
+
+  return;
+}
+
+
+KOKKOS_FUNCTION
+real_t frobenius_norm(const real_t* A, int m, int n) {
+  real_t norm = 0.0;
+
+  for (int i = 0; i < m * n; i++) {
+    norm += A[i] * A[i];
+  }
+
+  return sqrt(norm);
+}
\ No newline at end of file
diff --git a/src/LS-EVPFFT/src/defgrad_dcmp.h b/src/LS-EVPFFT/src/defgrad_dcmp.h
new file mode 100644
index 000000000..02b1efa71
--- /dev/null
+++ b/src/LS-EVPFFT/src/defgrad_dcmp.h
@@ -0,0 +1,8 @@
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+KOKKOS_FUNCTION
+void defgrad_dcmp(real_t *F_, real_t *V_, real_t *R_);
\ No newline at end of file
diff --git a/src/LS-EVPFFT/src/definitions.h b/src/LS-EVPFFT/src/definitions.h
new file mode 100644
index 000000000..015fc7ec0
--- /dev/null
+++ b/src/LS-EVPFFT/src/definitions.h
@@ -0,0 +1,105 @@
+#pragma once
+
+#include <string>
+#include "matar.h"
+
+namespace utils {
+// SINGLE determines whether single or double precision is built
+#ifdef SINGLE
+  #define MPI_REAL_T MPI_FLOAT
+  typedef float real_t;  // define native type for EVPFFT as single precision
+  #define FMT1 "%24.14f" //"%g"      // format argument for floats    
+  #define EMT1 "%24.14E" // format argument for eng floats
+#else
+  #define MPI_REAL_T MPI_DOUBLE
+  typedef double real_t; // define native type for EVPFFT as double precision
+  #define FMT1 "%24.14f" //"%lg"     // format argument for doubles   
+  #define EMT1 "%24.14E" // format argument for eng doubles
+#endif
+} // end of namespace utils scope
+//
+using namespace utils;
+using namespace mtr;
+
+// math functions
+#define POW2(x) ( (x)*(x) )
+#define POW3(x) ( (x)*(x)*(x) )
+
+#ifdef SINGLE
+  #define SIN(x)          sinf(x)
+  #define COS(x)          cosf(x)
+  #define TAN(x)          tanf(x)
+
+  #define ASIN(x)         asinf(x)
+  #define ACOS(x)         acosf(x)
+  #define ATAN(x)         atanf(x)
+  #define ATAN2(x,y)      atan2f(x,y)
+
+  #define ABS(x)          fabsf(x)
+  #define SQRT(x)         sqrtf(x)
+  #define EXP(x)          expf(x)
+  #define LOG(x)          logf(x)
+  #define POW(x,y)        powf(x,y)
+  #define COPYSIGN(x,y)   copysignf(x,y)
+
+#else
+  #define SIN(x)          sin(x)
+  #define COS(x)          cos(x)
+  #define TAN(x)          tan(x)
+
+  #define ASIN(x)         asin(x)
+  #define ACOS(x)         acos(x)
+  #define ATAN(x)         atan(x)
+  #define ATAN2(x,y)      atan2(x,y)
+
+  #define ABS(x)          fabs(x)
+  #define SQRT(x)         sqrt(x)
+  #define EXP(x)          exp(x)
+  #define LOG(x)          log(x)
+  #define POW(x,y)        pow(x,y)
+  #define COPYSIGN(x,y)   copysign(x,y)
+#endif
+
+
+// aliases
+using MatrixTypeIntDevice   = FMatrixKokkos <int>;
+using MatrixTypeRealDevice  = FMatrixKokkos <real_t>;
+using MatrixTypeRealDual    = DFMatrixKokkos <real_t>;
+using MatrixTypeIntDual     = DFMatrixKokkos <int>;
+using MatrixTypeRealHost    = FMatrix <real_t>;
+using MatrixTypeIntHost     = FMatrix <int>;
+using MatrixTypeStringHost  = FMatrix <std::string>;
+// Views dont need Device or Host designations
+// Views can be created on both device or host.
+using ViewMatrixTypeInt     = ViewFMatrixKokkos <int>;
+using ViewMatrixTypeReal    = ViewFMatrixKokkos <real_t>;
+
+// CArray nested loop convention use Right, FArray use Left
+#define LOOP_ORDER Kokkos::Iterate::Right
+
+// Definitions for printing and file manipulation
+#define screenOut stdout
+#define CLEAR_LINE(ifstream) ( ifstream.ignore(std::numeric_limits<std::streamsize>::max(), '\n') );
+#define FULL_INPUT_PATH(filename) ( _INPUT_DIR filename )
+#define FULL_OUTPUT_PATH(filename) ( _OUTPUT_DIR filename )
+
+// *********************** For debuging ***********************
+#define PRINT_LINE Kokkos::fence(); printf("HERE LINE %d IN FILE %s\n", __LINE__, __FILE__); fflush(stdout);
+
+#define PRINT_ARRAY_DIMS(x) \
+        for (int i = 0; i < x.order(); i++) { \
+            printf("size of dim %d = %d\n", i, x.dims(i)); \
+            fflush(stdout); \
+        }
+
+#define PRINT_MATRIX_DIMS(x) \
+        for (int i = 1; i <= x.order(); i++) { \
+            printf("size of dim %d = %d\n", i, x.dims(i)); \
+            fflush(stdout); \
+        }
+        
+#define PAUSE Kokkos::fence(); std::cout << "\nC++ PAUSE: enter <return> or <ctrl>d to continue>\n"; std::cin.get();
+
+#define PRINT_ARRAY(arr) Kokkos::fence(); print_array(arr.pointer(), arr.size());
+        
+#define PRINT_DARRAY(arr) Kokkos::fence(); print_array(arr.host_pointer(), arr.size());
diff --git a/src/LS-EVPFFT/src/determinant33.cpp b/src/LS-EVPFFT/src/determinant33.cpp
new file mode 100644
index 000000000..2be7660b7
--- /dev/null
+++ b/src/LS-EVPFFT/src/determinant33.cpp
@@ -0,0 +1,18 @@
+#include "determinant33.h"
+
+KOKKOS_FUNCTION
+void determinant33(real_t *mat_, real_t &det_mat)
+{
+
+  ViewMatrixTypeReal mat(mat_, 3,3);
+
+  det_mat = 
+     mat(1,1)*mat(2,2)*mat(3,3) 
+    +mat(1,2)*mat(2,3)*mat(3,1) 
+    +mat(1,3)*mat(2,1)*mat(3,2) 
+    -mat(3,1)*mat(2,2)*mat(1,3) 
+    -mat(3,2)*mat(2,3)*mat(1,1) 
+    -mat(3,3)*mat(2,1)*mat(1,2);
+
+  
+}
diff --git a/src/LS-EVPFFT/src/determinant33.h b/src/LS-EVPFFT/src/determinant33.h
new file mode 100644
index 000000000..823712ac5
--- /dev/null
+++ b/src/LS-EVPFFT/src/determinant33.h
@@ -0,0 +1,8 @@
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+KOKKOS_FUNCTION
+void determinant33(real_t *mat_, real_t &det_mat);
diff --git a/src/LS-EVPFFT/src/euler.cpp b/src/LS-EVPFFT/src/euler.cpp
new file mode 100644
index 000000000..4752b56a7
--- /dev/null
+++ b/src/LS-EVPFFT/src/euler.cpp
@@ -0,0 +1,46 @@
+#include "euler.h"
+
+KOKKOS_FUNCTION
+void euler(int iopt, real_t &ph, real_t &th, real_t &tm, real_t *a_)
+{
+  ViewMatrixTypeReal a(a_,3,3);
+  real_t sth,sph,cph,cth,stm,ctm; 
+  const real_t pi = 4.*ATAN(1.0);
+
+  //  CALCULATE THE EULER ANGLES ASSOCIATED WITH THE TRANSFORMATION
+  //  MATRIX A(I,J) IF IOPT=1 AND VICEVERSA IF IOPT=2
+  //  A(i,j) TRANSFORMS FROM SYSTEM sa TO SYSTEM ca.
+  //  ph,th,om ARE THE EULER ANGLES OF ca REFERRED TO sa.
+
+  if (iopt == 1) {
+    th = ACOS(a(3,3));
+    if(ABS(a(3,3)) >= 0.9999) {
+      tm = 0.0;
+      ph = ATAN2(a(1,2),a(1,1));
+    } else {
+      sth = SIN(th);
+      tm  = ATAN2(a(1,3)/sth,a(2,3)/sth);
+      ph  = ATAN2(a(3,1)/sth,-a(3,2)/sth);
+    }
+    th = th*180.0/pi;
+    ph = ph*180.0/pi;
+    tm = tm*180.0/pi;
+  } else if (iopt == 2) {
+    sph = SIN(ph);
+    cph = COS(ph);
+    sth = SIN(th);
+    cth = COS(th);
+    stm = SIN(tm);
+    ctm = COS(tm);
+    a(1,1) =  ctm*cph-sph*stm*cth;
+    a(2,1) = -stm*cph-sph*ctm*cth;
+    a(3,1) =  sph*sth;
+    a(1,2) =  ctm*sph+cph*stm*cth;
+    a(2,2) = -sph*stm+cph*ctm*cth;
+    a(3,2) = -sth*cph;
+    a(1,3) =  sth*stm;
+    a(2,3) =  ctm*sth;
+    a(3,3) =  cth;
+  } // end if (iopt == 1)
+
+}
diff --git a/src/LS-EVPFFT/src/euler.h b/src/LS-EVPFFT/src/euler.h
new file mode 100644
index 000000000..51019f345
--- /dev/null
+++ b/src/LS-EVPFFT/src/euler.h
@@ -0,0 +1,8 @@
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+KOKKOS_FUNCTION
+void euler(int iopt, real_t &ph, real_t &th, real_t &tm, real_t *a_);
diff --git a/src/LS-EVPFFT/src/evpal.cpp b/src/LS-EVPFFT/src/evpal.cpp
new file mode 100644
index 000000000..964b766fe
--- /dev/null
+++ b/src/LS-EVPFFT/src/evpal.cpp
@@ -0,0 +1,395 @@
+#include "evpfft.h"
+#include "math_functions.h"
+#include "utilities.h"
+#include "reduction_data_structures.h"
+#include "Profiler.h"
+
+void EVPFFT::evpal(int imicro)
+{
+  Profiler profiler(__FUNCTION__);
+
+#if BUILD_LS_EVPFFT_FIERRO
+  // create space to perform reduction for dsde and cg66.
+  const size_t n = 3+36+36+9; // 3 for errs, erre, iter. 36 for dsde_avg. 36 for cg66_avg. 9 for edotp_avg
+  ArrayType <real_t, n> all_reduce;
+#else
+  const size_t n = 3; // 3 for errs, erre, iter
+  ArrayType <real_t, n> all_reduce;
+#endif
+
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t,n> & loc_reduce) {
+
+    int jph;
+    int itmaxal;
+    int iter1;
+    int isign;
+    int icut;
+
+    real_t sgnorm;
+    real_t dgnorm;
+    real_t erroral;
+    real_t erral;
+    real_t dsgnorm1;
+    real_t dsgnorm2;
+    real_t ddgnorm;
+    real_t errald;
+    real_t resn;
+    real_t resoldn;
+
+    // thread private arrays
+    real_t xlambda_[3*3];
+    real_t xlambda6_[6];
+    real_t sgaux_[3*3];
+    real_t sgtaux_[3*3];
+    real_t sg6_[6];
+    real_t sgt6_[6];
+    real_t sg6old_[6];
+    real_t eptaux_[3*3];
+    real_t ept6_[6];
+    real_t edotpaux_[3*3];
+    real_t edotp6_[6];
+    real_t edote6_[6];
+    real_t dedotp66_[6*6];
+    real_t strainaux_[3*3];
+    real_t strain6_[6];
+    real_t sg66_[6*6];
+    real_t strainceq_[3*3];
+    real_t strainceq6_[6];
+    real_t res_[6];
+    real_t xjacobinv_[6*6];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal xlambda(xlambda_,3,3);
+    ViewMatrixTypeReal xlambda6(xlambda6_,6);
+    ViewMatrixTypeReal sgaux(sgaux_,3,3);
+    ViewMatrixTypeReal sgtaux(sgtaux_,3,3);
+    ViewMatrixTypeReal sg6(sg6_,6);
+    ViewMatrixTypeReal sgt6(sgt6_,6);
+    ViewMatrixTypeReal sg6old(sg6old_,6);
+    ViewMatrixTypeReal eptaux(eptaux_,3,3);
+    ViewMatrixTypeReal ept6(ept6_,6);
+    ViewMatrixTypeReal edotpaux(edotpaux_,3,3);
+    ViewMatrixTypeReal edotp6(edotp6_,6);
+    ViewMatrixTypeReal edote6(edote6_,6);
+    ViewMatrixTypeReal dedotp66(dedotp66_,6,6);
+    ViewMatrixTypeReal strainaux(strainaux_,3,3);
+    ViewMatrixTypeReal strain6(strain6_,6);
+    ViewMatrixTypeReal sg66(sg66_,6,6);
+    ViewMatrixTypeReal strainceq(strainceq_,3,3);
+    ViewMatrixTypeReal strainceq6(strainceq6_,6);
+    ViewMatrixTypeReal res(res_,6);
+    ViewMatrixTypeReal xjacobinv(xjacobinv_,6,6);
+
+    jph = jphase(i,j,k);
+
+    if (igas(jph) == 0) {
+
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 6; ii++) {
+        for (int jj = 1; jj <= 6; jj++) {
+          sg66(ii,jj) = cg66(ii,jj,i,j,k);
+        }
+      }
+
+      invert_matrix <6> (sg66.pointer());
+
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          xlambda(ii,jj)   = sg(ii,jj,i,j,k);
+          sgaux(ii,jj)     = sg(ii,jj,i,j,k);
+          sgtaux(ii,jj)    = sgt(ii,jj,i,j,k);
+          eptaux(ii,jj)    = ept(ii,jj,i,j,k);
+          strainaux(ii,jj) = (velgrad(ii,jj,i,j,k) + velgrad(jj,ii,i,j,k)) / 2.0;
+        }
+      }
+
+      cb.chg_basis_2(xlambda6.pointer(), xlambda.pointer(), 2, 6, cb.B_basis_device_pointer());
+      cb.chg_basis_2(sg6.pointer(), sgaux.pointer(), 2, 6, cb.B_basis_device_pointer());
+      cb.chg_basis_2(sgt6.pointer(), sgtaux.pointer(), 2, 6, cb.B_basis_device_pointer());
+      cb.chg_basis_2(ept6.pointer(), eptaux.pointer(), 2, 6, cb.B_basis_device_pointer());
+      cb.chg_basis_2(strain6.pointer(), strainaux.pointer(), 2, 6, cb.B_basis_device_pointer());
+
+      sgnorm = 0.0;
+      dgnorm = 0.0;
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          sgnorm += xlambda(ii,jj)*xlambda(ii,jj);
+          dgnorm += strainaux(ii,jj)*strainaux(ii,jj);
+        }
+      }
+      sgnorm = SQRT(sgnorm);
+      dgnorm = SQRT(dgnorm);
+
+      erroral = 1.0e-10;
+      erral   = 2.0*erroral;
+      itmaxal = 100;
+      iter1 = 0;
+      resoldn = 1.0;
+
+      while (iter1 < itmaxal && erral > erroral) {
+          iter1 += 1;
+
+          icut = 0;
+          resn = 2.0*resoldn;
+          while (resn > resoldn && icut < 10) {
+
+            //     GET RESOLVED SHEAR STRESSES 'rss' AND SHEAR RATES 'gamdot'.
+            //     SIGN(GAMDOT)=SIGN(RSS).
+            //     NRS CAN BE EVEN OR ODD.
+            //     RSS1 IS ALWAYS > 0 AND IS USED TO CALCULATE VISCOUS COMPLIANCE.
+  
+            for (int is = 1; is <= nsyst(jph); is++) {
+#ifdef NON_SCHMID_EFFECTS
+              rss(is,i,j,k) = schnon(1,is,,i,j,k)*sg6(1) + 
+                        schnon(2,is,i,j,k)*sg6(2) + 
+                        schnon(3,is,i,j,k)*sg6(3) + 
+                        schnon(4,is,i,j,k)*sg6(4) + 
+                        schnon(5,is,i,j,k)*sg6(5);
+#else
+              rss(is,i,j,k) = sch(1,is,i,j,k)*sg6(1) +
+                        sch(2,is,i,j,k)*sg6(2) + 
+                        sch(3,is,i,j,k)*sg6(3) + 
+                        sch(4,is,i,j,k)*sg6(4) + 
+                        sch(5,is,i,j,k)*sg6(5);
+#endif
+              isign = 1;
+              if ( (rss(is,i,j,k)-xkin(is,i,j,k)) < 0.0 ) {
+                isign = 2;
+              }
+     
+              rss(is,i,j,k) = (rss(is,i,j,k)-xkin(is,i,j,k))/crss(is,isign,i,j,k);
+  
+#ifdef TWO_SIGN_SLIP_SYSTEMS
+              if ( rss(is,i,j,k) < 0.0 ) {
+                rss(is,i,j,k) = 0.0;
+              }
+#endif
+  
+#ifdef TWO_SIGN_SLIP_SYSTEMS
+              rss1(is,i,j,k) = gamd0(is,jph) * nrs(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),(nrs(is,jph)-1))) / crss(is,isign,i,j,k);
+              rss2(is,i,j,k) = gamd0(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),nrs(is,jph))) * COPYSIGN(1.0,rss(is,i,j,k));
+#else
+              rss1(is,i,j,k) = gamd0(is,jph) * nrs(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),(nrs(is,jph)-1))) / crss(is,isign,i,j,k);
+              rss2(is,i,j,k) = gamd0(is,jph) * ABS(PowIntExpo(rss(is,i,j,k),nrs(is,jph))) * COPYSIGN(1.0,rss(is,i,j,k));
+#endif
+  
+              gamdot(is,i,j,k) = rss2(is,i,j,k);
+            } // end for is
+  
+            for (int ii = 1; ii <= 5; ii++) {
+              edotp6(ii) = 0.0;
+              for (int k1 = 1; k1 <= nsyst(jph); k1++) {
+                edotp6(ii) += sch(ii,k1,i,j,k) * rss2(k1,i,j,k);
+              }
+            }
+            edotp6(6) = 0.0;
+  
+            for (int jj = 1; jj <= 5; jj++) {
+              for (int ii = 1; ii <= 5; ii++) {
+  
+                dedotp66(ii,jj) = 0.0;
+  
+                for (int k1 = 1; k1 <= nsyst(jph); k1++) {         
+#ifdef NON_SCHMID_EFFECTS   
+                  dedotp66(ii,jj) += sch(ii,k1,i,j,k)*schnon(jj,k1,i,j,k)*rss1(k1,i,j,k);
+#else
+                  dedotp66(ii,jj) += sch(ii,k1,i,j,k)*sch(jj,k1,i,j,k)*rss1(k1,i,j,k);
+#endif
+                } // end for k1
+  
+              } // end for ii
+            } // end for jj
+  
+            for (int ii = 1; ii <= 6; ii++) {
+              dedotp66(ii,6) = 0.0;
+              dedotp66(6,ii) = 0.0;
+            }
+  
+            // elastic strain rate
+            for (int ii = 1; ii <= 6; ii++) {
+              edote6(ii) = 0.0;
+              for (int jj = 1; jj <= 6; jj++) {
+                edote6(ii) += sg66(ii,jj)*(sg6(jj) - sgt6(jj))/tdot;
+              } // end for jj
+            } // end for ii
+     
+            for (int ii = 1; ii <= 6; ii++) {
+                strainceq6(ii) = edotp6(ii) + edote6(ii);
+            } // end for ii
+  
+            // cb.chg_basis_1(strainceq6.pointer(), strainceq.pointer(), 1, 6, cb.B_basis_device_pointer()); // strainceq is not used anywhere
+  
+            for (int ii = 1; ii <= 6; ii++) {
+              res(ii) = sg6(ii) - xlambda6(ii);
+              for (int jj = 1; jj <= 6; jj++) {
+                res(ii) += c066mod(ii,jj,i,j,k) * (strainceq6(jj) - strain6(jj));
+              }
+            }
+
+            resn = 0.0;
+
+            for (int ii = 1; ii <= 6; ii++) {     
+              resn += POW2(res(ii));
+            } // end for ii
+
+            resn = sqrt(resn);
+
+            if (iter1 == 1) {
+              resoldn = resn;
+            }
+
+            if (resn > resoldn) { 
+              for (int ii = 1; ii <= 6; ii++) {     
+                sg6(ii) = sg6old(ii) + (sg6(ii) - sg6old(ii))*0.5;
+              } // end for ii
+            }
+
+            icut += 1;
+
+          } // end while (resn > resoldn && icut < 10)
+
+          resoldn = resn;
+          for (int iii = 1; iii <= 6; iii++) {
+            sg6old(iii) = sg6(iii);
+          } // end for iii
+
+          for (int ii = 1; ii <= 6; ii++) {
+            for (int jj = 1; jj <= 6; jj++) {
+
+              xjacobinv(ii,jj) = (ii/jj)*(jj/ii);
+
+              for (int kk = 1; kk <= 6; kk++) {
+                xjacobinv(ii,jj) += c066mod(ii,kk,i,j,k)*(sg66(kk,jj)/tdot+dedotp66(kk,jj));
+              } // end for kk
+
+            } // end for jj
+          } // end for ii
+
+ 
+          int error_flag = invert_matrix <6> (xjacobinv.pointer());
+
+
+          // TODO: optimize indexing of this loop
+          for (int ii = 1; ii <= 6; ii++) {
+            for (int jj = 1; jj <= 6; jj++) {
+              sg6(ii) += -xjacobinv(ii,jj)*res(jj);
+            } // end for jj
+          } // end for ii
+
+          // solve_linear_system results in more NR iterations than inversion
+#if 0 
+          // Calculate new stress by solving the system -[J][delt_sg] = [R]
+          int error_flag = solve_linear_system(xjacobinv.pointer(), res.pointer(), 6);
+          //if (error_flag != 0) {
+          //  printf("Failed solve linear system\n");
+          //  exit(1);
+          //}
+        
+          //sg6 = sg6 - res
+          for (int ii = 1; ii <= 6; ii++) {
+            sg6(ii) -= res(ii);
+          } // end for ii
+#endif
+
+          dsgnorm1 = 0.0;
+          dsgnorm2 = 0.0;
+          ddgnorm  = 0.0;
+
+          for (int ii = 1; ii <= 6; ii++) {     
+            dsgnorm1 += POW2(sg6(ii)-sg6old(ii));
+            dsgnorm2 += POW2(sg6(ii)-xlambda6(ii));
+            ddgnorm  += POW2(strainceq6(ii)-strain6(ii));
+          } // end for ii
+
+          erral  = sqrt(dsgnorm1) / sgnorm;
+          errald = sqrt(ddgnorm) / dgnorm;
+
+      } // end while (iter1 < itmaxal && erral > erroral)
+
+      cb.chg_basis_1(sg6.pointer(), sgaux.pointer(), 1, 6, cb.B_basis_device_pointer());
+      cb.chg_basis_1(edotp6.pointer(), edotpaux.pointer(), 1, 6, cb.B_basis_device_pointer());
+
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          sg(ii,jj,i,j,k)    = sgaux(ii,jj);
+          edotp(ii,jj,i,j,k) = edotpaux(ii,jj);
+        } // end for jj
+      } // end for ii
+
+      loc_reduce.array[0] += dsgnorm2*wgtc(i,j,k);
+      loc_reduce.array[1] += ddgnorm*wgtc(i,j,k);
+      loc_reduce.array[2] += iter1*wgtc(i,j,k);
+
+    } else { // else for if (igas(jph) == 0)
+
+      // TODO: optimize indexing of this loop
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          sg(ii,jj,i,j,k) = 0.0;  
+          edotp(ii,jj,i,j,k) = 0.0;    //  ????
+        } // end for jj
+      } // end for ii
+
+    } // end if (igas(jph) == 0)
+
+#if BUILD_LS_EVPFFT_FIERRO
+    ViewMatrixTypeReal cg66_avg(&loc_reduce.array[3],6,6);
+    ViewMatrixTypeReal dedotp66_avg(&loc_reduce.array[39],6,6);
+    for (int ii = 1; ii <= 6; ii++) {
+      for (int jj = 1; jj <= 6; jj++) {
+        cg66_avg(ii,jj) += cg66(ii,jj,i,j,k) * wgtc(i,j,k);
+        dedotp66_avg(ii,jj) += dedotp66(ii,jj) * wgtc(i,j,k);
+      }
+    }
+
+    ViewMatrixTypeReal edotp_avg_loc(&loc_reduce.array[75],3,3);
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        edotp_avg_loc(ii,jj) += edotp(ii,jj,i,j,k) * wgtc(i,j,k);
+      }
+    }
+#endif
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  //update host
+  gamdot.update_host();
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  errs = all_reduce.array[0];
+  erre = all_reduce.array[1];
+  avg_nr_iter = all_reduce.array[2];
+
+#if BUILD_LS_EVPFFT_FIERRO
+  ViewMatrixTypeReal cg66_avg_view (&all_reduce.array[3],6,6);
+  ViewMatrixTypeReal dedotp66_avg_view (&all_reduce.array[39],6,6);
+  ViewMatrixTypeReal edotp_avg_view (&all_reduce.array[75],3,3); 
+
+  for (int ii = 1; ii <= 6; ii++) {
+    for (int jj = 1; jj <= 6; jj++) {
+      cg66_avg(ii,jj) = cg66_avg_view(ii,jj);
+      sg66_avg(ii,jj) = cg66_avg_view(ii,jj);
+      dedotp66_avg(ii,jj) = dedotp66_avg_view(ii,jj);
+    }
+  }
+
+  invert_matrix <6> (sg66_avg.pointer());
+
+  // copy edotp_avg_view into edotp_avg
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      edotp_avg(ii,jj) = edotp_avg_view(ii,jj);
+    }
+  }
+
+// endif for if BUILD_LS_EVPFFT_FIERRO
+#endif
+
+}
diff --git a/src/LS-EVPFFT/src/evpfft.cpp b/src/LS-EVPFFT/src/evpfft.cpp
new file mode 100644
index 000000000..a74f3962d
--- /dev/null
+++ b/src/LS-EVPFFT/src/evpfft.cpp
@@ -0,0 +1,607 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
+
+#include "evpfft.h"
+#include "utilities.h"
+#include <math.h>
+#include <algorithm>
+#include "Profiler.h"
+
+EVPFFT::EVPFFT(const MPI_Comm mpi_comm_, const CommandLineArgs cmd_, const real_t stress_scale_, const real_t time_scale_)
+//-------------------------------------------------
+// Data Members needed for EVPFFT Calculations
+//-------------------------------------------------
+
+  : cmd (cmd_)
+  , mpi_comm (mpi_comm_)
+  , root (0)
+  , my_rank (get_mpi_comm_rank(mpi_comm))
+  , num_ranks (get_mpi_comm_size(mpi_comm))
+  , cb()
+
+  , nsteps (0)
+  , pi (4.0*atan(1.0))
+  , imicro(0)
+
+  , active(false)
+  , stress_scale(stress_scale_)
+  , time_scale(time_scale_)
+  , dtAcc(0.0)
+
+  , ofile_mgr ()
+  , hdf5_filename ("micro_state_evpfft.h5")
+//-----------------------------------------------
+// End Of EVPFFT Data Members
+//-----------------------------------------------
+{
+  Profiler profiler(__FUNCTION__);
+
+  // Reading and initialization of arrays and parameters
+  vpsc_input();
+  //set_some_voxels_arrays_to_zero();
+  init_after_reading_input_data();
+
+  //... For file management
+  if (0 == my_rank) {
+    ofile_mgr.open_files();
+  }
+
+  //...
+}
+
+EVPFFT::~EVPFFT(){}
+
+void EVPFFT::set_some_voxels_arrays_to_zero()
+{
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    for (int jj = 1; jj <= 6; jj++) {
+      for (int ii = 1; ii <= 6; ii++) {
+        cg66(ii,jj,i,j,k) = 0.0;
+      }
+    }
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        ag(ii,jj,i,j,k) = 0.0;
+        velgrad(ii,jj,i,j,k) = 0.0;
+      }
+    }
+
+    for (int ii = 1; ii <= NSYSMX; ii++) {     
+      xkin(ii,i,j,k) = 0.0;
+      gamdot(ii,i,j,k) = 0.0;
+    }
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        disgrad(ii,jj,i,j,k) = 0.0;
+        edotp(ii,jj,i,j,k) = 0.0;
+        sg(ii,jj,i,j,k) = 0.0;
+      }
+    }
+
+    gacumgr(i,j,k) = 0.0;
+
+  }); // end FOR_ALL_CLASS
+  Kokkos::fence();
+
+  // update host
+  cg66.update_host();
+  ag.update_host();
+  velgrad.update_host();
+  gamdot.update_host();
+  disgrad.update_host();
+  edotp.update_host();
+  sg.update_host();
+  defgradp.update_host();
+  gacumgr.update_host();
+
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      scauav(ii,jj) = 0.0;  // macroscopic stress
+      udotAcc(ii,jj) = 0.0;
+      if (ii == jj) {
+        defgradinvavgc_inv(ii,jj) = 1.0;
+      } else {
+        defgradinvavgc_inv(ii,jj) = 0.0;
+      }
+    }
+  }
+}
+
+void EVPFFT::init_after_reading_input_data()
+{
+    init_xk_gb();
+    init_disgradmacro_velgradmacro();
+    init_ept();
+    init_disgrad();
+    init_evm();
+    init_defgrad();
+
+// the variables initialized in the funcitons below are reduced into
+// and should be done once, hence the need for #if guard since the variables
+// needs to be initialized after udot and dt are know from fierro
+#ifndef BUILD_LS_EVPFFT_FIERRO
+    init_sg();
+    // init_c0_s0();
+#endif
+}
+
+void EVPFFT::init_xk_gb()
+{
+
+  FOR_ALL_CLASS(kxx, 0, npts1, {
+        int kx;
+        kx = kxx + local_start1_cmplx;
+        if (kx >  npts1_g/2) kx = kx-npts1_g;
+        xk_gb(kxx+1) = kx/float(npts1_g);
+  });
+
+  FOR_ALL_CLASS(kyy, 0, npts2, {
+        int ky;
+        ky = kyy + local_start2_cmplx;
+        if (ky >  npts2_g/2) ky = ky-npts2_g;
+        yk_gb(kyy+1) = ky/float(npts2_g);
+  });
+
+  FOR_ALL_CLASS(kzz, 0, npts3, {
+        int kz;
+        kz = kzz + local_start3_cmplx;
+        if (kz >  npts3_g/2) kz = kz-npts3_g;
+        zk_gb(kzz+1) = kz/float(npts3_g);
+  });
+  Kokkos::fence();
+
+  // update host
+  xk_gb.update_host();
+  yk_gb.update_host();
+  zk_gb.update_host();
+}
+
+void EVPFFT::init_disgradmacro_velgradmacro()
+{
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      disgradmacrot(ii,jj) = 0.0;
+      //disgradmacro.host(ii,jj)  = udot.host(ii,jj) * tdot;
+      disgradmacro.host(ii,jj)  = 0.0;
+      velgradmacro.host(ii,jj)  = udot.host(ii,jj);
+    }
+  }
+
+  // update_device
+  disgradmacro.update_device();
+  velgradmacro.update_device();
+  Kokkos::fence();
+}
+
+void EVPFFT::init_ept()
+{
+
+  for (int k = 1; k <= npts3; k++) {
+  for (int j = 1; j <= npts2; j++) {
+  for (int i = 1; i <= npts1; i++) {
+
+    int jgr;
+    jgr = jgrain(i,j,k);
+    if (ithermo == 1 and jgr > 0) {
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          ept.host(ii,jj,i,j,k) = eth(ii,jj,jgr);
+        }
+      }
+    } else {
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          ept.host(ii,jj,i,j,k) = 0.0;
+        }
+      }
+    }
+
+  } // end for i
+  } // end for j
+  } // end for k
+
+  // update device
+  ept.update_device();
+  Kokkos::fence();
+
+}
+
+void EVPFFT::init_disgrad()
+{
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        disgrad(ii,jj,i,j,k) = 0.0;
+      }
+    }
+  }); // end FOR_ALL_CLASS
+  Kokkos::fence();
+
+  // update device
+  disgrad.update_host();
+}
+
+void EVPFFT::init_sg()
+{
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        sg(ii,jj,i,j,k) = 0.0;
+        sgt(ii,jj,i,j,k) = 0.0;
+      }
+    }
+  }); // end FOR_ALL_CLASS
+
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    int jph;
+    real_t cg66aux_[6*6];
+    real_t cg_[3*3*3*3];
+    ViewMatrixTypeReal cg66aux(cg66aux_,6,6);
+    ViewMatrixTypeReal cg(cg_,3,3,3,3);
+
+    jph = jphase(i,j,k);
+
+    if (igas(jph) == 0) {
+      for (int ii = 1; ii <= 6; ii++) {
+        for (int jj = 1; jj <= 6; jj++) {
+          cg66aux(ii,jj) = cg66(ii,jj,i,j,k);
+        }
+      }
+
+      cb.chg_basis_3(cg66aux.pointer(), cg.pointer(), 3, 6, cb.B_basis_device_pointer());
+
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+
+          sg(ii,jj,i,j,k) = 0.0;
+
+          for (int kk = 1; kk <= 3; kk++) {
+            for (int ll = 1; ll <= 3; ll++) {
+              // sg(ii,jj,i,j,k) += cg(ii,jj,kk,ll) * disgradmacro(kk,ll);
+              sg(ii,jj,i,j,k) += cg(ii,jj,kk,ll) * udot(kk,ll) * tdot;
+            }
+          }
+        }
+      }
+
+    } else {
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          sg(ii,jj,i,j,k) = 0.0;
+        }
+      }
+    }
+  }); // end FOR_ALL_CLASS
+  Kokkos::fence();
+
+  // update host
+  sg.update_host();
+  sgt.update_host();
+}
+
+void EVPFFT::init_evm()
+{
+  evm = dvm*tdot;
+}
+
+void EVPFFT::init_c0_s0()
+{
+  for (int ii = 1; ii <= 6; ii++) {
+    for (int jj = 1; jj <= 6; jj++) {
+      c066.host(ii,jj) *= tdot;
+    }
+  }
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int kk = 1; kk <= 3; kk ++) {
+        for (int ll = 1; ll <= 3; ll ++) {
+          c0.host(ii,jj,kk,ll)  *= tdot;
+          s0.host(ii,jj,kk,ll)  /= tdot;
+        }
+      }
+    }
+  }
+
+  // update_device
+  c066.update_device();
+  c0.update_device();
+  s0.update_device();
+  Kokkos::fence();
+}
+
+void EVPFFT::deinit_c0_s0()
+{
+  for (int ii = 1; ii <= 6; ii++) {
+    for (int jj = 1; jj <= 6; jj++) {
+      c066.host(ii,jj) /= tdot;
+    }
+  }
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int kk = 1; kk <= 3; kk ++) {
+        for (int ll = 1; ll <= 3; ll ++) {
+          c0.host(ii,jj,kk,ll)  /= tdot;
+          s0.host(ii,jj,kk,ll)  *= tdot;
+        }
+      }
+    }
+  }
+
+  // update_device
+  c066.update_device();
+  c0.update_device();
+  s0.update_device();
+  Kokkos::fence();
+}
+
+void EVPFFT::init_defgrad() {
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        defgradp (ii,jj,i,j,k) = 0.0;
+        defgrade (ii,jj,i,j,k) = 0.0;
+        defgrad (ii,jj,i,j,k) = 0.0;
+        defgradinv (ii,jj,i,j,k) = 0.0;
+        defgradini (ii,jj,i,j,k) = 0.0;
+      }
+      defgradp (jj,jj,i,j,k) = 1.0;
+      defgrade (jj,jj,i,j,k) = 1.0;
+      defgrad (jj,jj,i,j,k) = 1.0;
+      defgradinv (jj,jj,i,j,k) = 1.0;
+      defgradini (jj,jj,i,j,k) = 1.0;
+    }
+    detF(i,j,k) = 1.0;
+
+    wgtc(i,j,k) = wgt;
+  }); // end FOR_ALL_CLASS
+
+  FOR_ALL_CLASS(k, 1, npts3+2,
+                j, 1, npts2+2,
+                i, 1, npts1+2, {
+
+    xnode(1,i,j,k) = float(i) - 0.5;
+    xnode(2,i,j,k) = float(j) - 0.5;
+    xnode(3,i,j,k) = float(k) - 0.5;
+
+  }); // end FOR_ALL_CLASS
+
+}
+
+void EVPFFT::evolve()
+{
+
+    //for (imicro = 1; imicro <= nsteps; imicro++) {
+#ifndef ABSOLUTE_NO_OUTPUT
+      if (0 == my_rank) {
+        printf(" ***************************************************\n");
+        printf("       Current  Time  STEP = %d\n", imicro);
+        fprintf(ofile_mgr.conv_file.get(), "STEP = %d\n", imicro);
+      }
+#endif
+      init_c0_s0();
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          velgradmacro.host(ii,jj)  = udot.host(ii,jj);
+        }
+      }
+      // if (imicro == 1) {
+      //   step_update_velgrad();
+      // }
+      step_update_velgrad();
+
+      if (imicro == 1 || iupdate == 1) {
+        update_schmid();
+      }
+
+      calc_c066mod();
+
+      step_set_dvelgradmacro_and_dvelgradmacroacum_to_zero();
+
+      iter = 0;
+      erre = 2.0*error;
+      errs = 2.0*error;
+
+      while (iter < itmax && (errs > error || erre > error)) {
+
+        iter += 1;
+
+#ifndef ABSOLUTE_NO_OUTPUT
+        if (0 == my_rank) {
+          printf(" --------------------\n");
+          printf(" STEP = %d\n", imicro);
+          printf(" ITER = %d\n", iter);
+          printf(" DIRECT FFT OF STRESS FIELD\n");
+          fprintf(ofile_mgr.conv_file.get(), "ITER = %d\n", iter);
+        }
+#endif
+
+        // transform to PK1
+        Cauchy_to_PK1();
+
+        // forward fft of sgPK1
+        forward_fft();
+
+#ifndef ABSOLUTE_NO_OUTPUT
+        if (0 == my_rank) {
+          printf(" CALCULATING G^pq,ij : SG^ij ...\n");
+        }
+#endif
+
+        // nr_lu_minval takes local part of work,
+        // updates local work, then allgathers
+        inverse_the_greens();
+
+#ifndef ABSOLUTE_NO_OUTPUT
+        if (0 == my_rank) {
+          printf(" INVERSE FFT TO GET STRAIN FIELD\n");
+        }
+#endif
+
+        // backward fft of work/workim
+        backward_fft();
+
+        initialize_velgrad();
+
+#ifndef ABSOLUTE_NO_OUTPUT
+        if (0 == my_rank) {
+          printf(" UPDATE STRESS FIELD\n\n");
+        }
+#endif
+
+        // Newton-Raphson iteration
+        // evpal takes local disgrad, trialtau, xkin, crss, gacumgr
+        // updates local sg, edotp, trialtau, gamdot
+        evpal(imicro);
+
+        get_smacro();
+
+        if (dvm > 0.0) erre = sqrt(erre) / dvm;
+        if (svm > 0.0) errs = sqrt(errs) / svm;
+
+#ifndef ABSOLUTE_NO_OUTPUT
+        if (0 == my_rank) {
+          printf(" STRAIN FIELD ERROR = %24.14E\n", erre);
+          printf(" STRESS FIELD ERROR = %24.14E\n", errs);
+          printf(" AVG NUM OF NR ITER = %24.14E\n", avg_nr_iter);
+          fprintf(ofile_mgr.err_file.get(), "%d,%d,%10.4E,%10.4E,%10.4E,%10.4E\n", 
+                  imicro, iter, erre, errs, svm, avg_nr_iter);
+        }
+#endif
+      } // end while loop
+
+      // kinematic hardening (ithermo == 1)
+      if (ithermo == 1 && imicro == 1) {
+        kinhard_param();
+      }
+
+      step_update();
+
+      step_vm_calc();
+
+      // if (iupdate == 1 && (ithermo != 1 || imicro > 1)) {
+      if (iupdate == 1 ) { // do not skip first increment
+        // step_texture_rve_update(); // ag now stores inital crystallographic orientation
+        update_defgradp();
+        update_grid_velgrad();
+        update_defgrad();
+        update_defgrade();
+        update_el_stiff();
+      }
+
+      if (iuphard == 1 && (ithermo != 1 || imicro > 2)) {
+        harden(imicro);
+      }
+
+      // calc_c0();
+      deinit_c0_s0();
+
+#ifndef ABSOLUTE_NO_OUTPUT
+      write_macro_state();
+      if (iwfields == 1 and imicro % iwstep == 0) {
+        write_micro_state();
+      }
+      if (iwtex == 1 and imicro == nsteps) {
+        write_texture();
+      }
+#endif
+    //} // end for imicro
+
+  return;
+}
+
+void EVPFFT::solve() {
+  for (imicro = 1; imicro <= nsteps; imicro++) {
+    evolve();
+  }
+  return;
+}
+
+void EVPFFT::check_macrostress()
+{
+  bool nan_stress = false;
+
+  for (int j = 1; j <= 3; j++) {
+    for (int i = 1; i <= 3; i++) {
+      if (isnan(scauav(i,j))) {
+        nan_stress = true;
+      }
+    } // end for i
+  } // end for j
+
+  if (nan_stress == true) {
+    printf("NaN stress at elem %d, cycle %d, dt %24.14E \n", elem_id, fierro_cycle, tdot);
+    print_vel_grad();
+    exit(1);
+  }
+
+}
+
+void EVPFFT::print_vel_grad()
+{
+    printf("vel_grad at elem_gid %d, cycle %d, dt %24.14E is :\n", elem_id, fierro_cycle, tdot);
+
+    for (int j = 1; j <= 3; j++) {
+      for (int i = 1; i <= 3; i++) {
+        printf("  velgrad_%d%d : %24.14E", i, j, udot.host(i,j));
+      } // end for i
+      printf("\n");
+    } // end for j
+
+}
diff --git a/src/LS-EVPFFT/src/evpfft.h b/src/LS-EVPFFT/src/evpfft.h
new file mode 100644
index 000000000..d46c2c621
--- /dev/null
+++ b/src/LS-EVPFFT/src/evpfft.h
@@ -0,0 +1,332 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
+
+#pragma once
+
+#include "mpi.h"
+#include "command_line_args.h"
+#include "chg_basis.h"
+#include "heffte_fft.h"
+#include "output_file_manager.h"
+#include "micro_output_writer.h"
+#include "manager_mpi_io.h"
+#include <memory>
+
+class EVPFFT
+{
+public:
+//-----------------------------------------------
+// EVPFFT Data Members
+//-----------------------------------------------
+  
+  const MPI_Comm mpi_comm;
+  const int root;
+  const int my_rank;
+  const int num_ranks;
+  CommandLineArgs cmd;
+  ChgBasis cb;
+  std::shared_ptr<FFT3D_R2C<heffte_backend,real_t>> fft;
+
+  int npts1_g;   // global array size in x dim.
+  int npts2_g;   // global array size in y dim.
+  int npts3_g;   // global array size in z dim.
+  int npts1;     // local per rank array size in x dim.
+  int npts2;     // local per rank array size in y dim.
+  int npts3;     // local per rank array size in z dim.
+  int local_start1; // start index of local array in x dim.
+  int local_start2; // start index of local array in y dim.
+  int local_start3; // start index of local array in z dim.
+  int local_end1;   // end index of local array in x dim.
+  int local_end2;   // end index of local array in y dim.
+  int local_end3;   // end index of local array in z dim.
+  real_t wgt;
+
+  int npts1_g_cmplx;   // complex global array size in x dim.
+  int npts2_g_cmplx;   // complex global array size in y dim.
+  int npts3_g_cmplx;   // complex global array size in z dim.
+  int npts1_cmplx;     // complex local per rank array size in x dim.
+  int npts2_cmplx;     // complex local per rank array size in y dim.
+  int npts3_cmplx;     // complex local per rank array size in z dim.
+  int local_start1_cmplx; // complex start index of local array in x dim.
+  int local_start2_cmplx; // complex start index of local array in y dim.
+  int local_start3_cmplx; // complex start index of local array in z dim.
+  int local_end1_cmplx;   // complex end index of local array in x dim.
+  int local_end2_cmplx;   // complex end index of local array in y dim.
+  int local_end3_cmplx;   // complex end index of local array in z dim.
+
+  int NPHMX;    //Maximum number of phases
+  int NMODMX;   //Maximum number of active SL+TW modes in any phase
+  int NTWMMX;   //Maximum number of active twin modes in any phase
+  int NSYSMX;   //Maximum number of active SL+TW systems in any phase
+
+  MatrixTypeRealDual udot;
+  MatrixTypeRealHost dsim;
+  MatrixTypeRealHost scauchy;
+  MatrixTypeRealHost sdeviat;
+  MatrixTypeRealDual tomtot;
+  //MatrixTypeRealHost dbar5;
+  //MatrixTypeRealHost dbar6;
+  MatrixTypeRealHost delt;
+  real_t deltvol3;
+  real_t tdot;
+
+  MatrixTypeRealDual disgradmacro;
+  MatrixTypeRealHost dvelgradmacro;
+  MatrixTypeRealHost scauav;
+  MatrixTypeRealHost dvelgradmacroacum;
+  MatrixTypeRealHost velmax;
+  MatrixTypeIntHost iudot;
+  MatrixTypeIntHost idsim;
+  MatrixTypeIntHost iscau;
+  MatrixTypeRealHost defgradinvavgc_inv;
+  int ictrl;
+  int nsteps;
+
+  MatrixTypeRealDual dnca;
+  MatrixTypeRealDual dbca;
+  MatrixTypeRealDual schca;
+  MatrixTypeRealDual tau;
+  MatrixTypeRealDual hard;
+  MatrixTypeRealDual thet;
+  MatrixTypeIntDual  nrs;
+  MatrixTypeRealDual gamd0;
+#ifdef NON_SCHMID_EFFECTS
+  MatrixTypeRealHost dtca;
+  MatrixTypeRealHost cns;
+  MatrixTypeRealDual schcnon;
+#endif
+
+  MatrixTypeRealHost twsh;
+  MatrixTypeIntHost nsm;
+  MatrixTypeIntHost nmodes;
+  MatrixTypeIntDual nsyst;
+  MatrixTypeIntHost ntwmod;
+  MatrixTypeIntHost ntwsys;
+  MatrixTypeIntHost isectw;
+  MatrixTypeStringHost icryst;
+
+  int nph;
+  int nelem;
+  int iphbot;
+  int iphtop;
+  
+  const real_t pi;
+
+  real_t error;
+  real_t fact2;
+  int itmax;
+  int irecover;
+  int isave;
+  int iwtex;
+  int iwdeq;
+
+  real_t svm;
+  real_t dvm;
+  real_t evm;
+  real_t errs;
+  real_t erre;
+  real_t erre2;
+  real_t avg_nr_iter;
+  int iupdate;
+  int iuphard;
+  MatrixTypeIntDual igas;
+  int iwfields;
+  int iwstep;
+
+  MatrixTypeRealHost cc;
+  MatrixTypeRealDual c0;
+  MatrixTypeRealDual s0;
+  MatrixTypeRealDual c066;
+
+  real_t wph1;
+  real_t svm1;
+  MatrixTypeRealHost scauav1;
+
+  MatrixTypeRealHost eth;
+  int ithermo;
+
+  MatrixTypeRealDual xk_gb;
+  MatrixTypeRealDual yk_gb;
+  MatrixTypeRealDual zk_gb;
+
+  MatrixTypeRealDual sg;
+  MatrixTypeRealDual disgrad;
+  MatrixTypeRealDual velgrad;
+  MatrixTypeRealDual edotp;
+  MatrixTypeRealDual cg66;
+  MatrixTypeRealDual ept;
+  MatrixTypeRealDual ag;
+  MatrixTypeRealDual crss;
+  MatrixTypeRealDual sch;
+  MatrixTypeRealDual sgt;
+  MatrixTypeRealDual defgradp;
+  MatrixTypeRealDual defgrade;
+  MatrixTypeRealDual defgrad;
+  MatrixTypeRealDual defgradinv;
+  MatrixTypeRealDual defgradini;
+  MatrixTypeRealDual wgtc;
+  MatrixTypeRealDual detF;
+  MatrixTypeRealDual sgPK1;
+  MatrixTypeRealDual c066mod;
+  MatrixTypeRealDual velgradref;
+  MatrixTypeRealDual xnode;
+#ifdef NON_SCHMID_EFFECTS
+  MatrixTypeRealDual schnon;
+#endif
+
+  MatrixTypeRealDual gamdot;
+  MatrixTypeRealDual gacumgr;
+  //MatrixTypeRealDual trialtau;
+  MatrixTypeRealDual xkin;
+
+  MatrixTypeRealHost ph_array;
+  MatrixTypeRealHost th_array;
+  MatrixTypeRealHost om_array;
+  MatrixTypeIntDual  jphase;
+  MatrixTypeIntHost  jgrain;
+
+  MatrixTypeRealDual work;
+  MatrixTypeRealDual workim;
+  MatrixTypeRealDual data;
+  MatrixTypeRealDual data_cmplx;
+
+  int elem_id;
+  int imicro;
+  int iter;
+  real_t evmp;
+  real_t dvmp;
+  MatrixTypeRealHost epav;
+  MatrixTypeRealHost edotpav;
+  MatrixTypeRealHost velgradmacroactual;
+  MatrixTypeRealHost disgradmacrot;
+  MatrixTypeRealDual velgradmacro;
+
+  bool active;
+  size_t fierro_cycle;
+  const real_t stress_scale;
+  const real_t time_scale;
+  MatrixTypeRealHost M66;
+  MatrixTypeRealHost edotp_avg;
+  MatrixTypeRealHost dedotp66_avg;
+  MatrixTypeRealHost cg66_avg;
+  MatrixTypeRealHost sg66_avg;
+  MatrixTypeRealHost udotAcc; 
+  double dtAcc;
+
+  // For file management
+  std::string hdf5_filename;
+  OutputFileManager ofile_mgr;
+  std::shared_ptr<Manager_MPI_IO<real_t,MPI_ORDER_FORTRAN>> mpi_io_real_t;
+  std::shared_ptr<Manager_MPI_IO<int,MPI_ORDER_FORTRAN>> mpi_io_int;
+  std::shared_ptr<MicroOutputWriter<MPI_ORDER_FORTRAN>> micro_writer;
+
+  // Device memory space
+  MatrixTypeRealDevice  rss;
+  MatrixTypeRealDevice rss1;
+  MatrixTypeRealDevice rss2;
+
+//-----------------------------------------------
+// End Of EVPFFT Data Members
+//-----------------------------------------------
+
+
+//-----------------------------------------------
+// EVPFFT Functions
+//-----------------------------------------------
+  EVPFFT(const MPI_Comm mpi_comm_, const CommandLineArgs cmd_, const real_t stress_scale_=1.0, const real_t time_scale_=1.0);
+  ~EVPFFT();
+  void vpsc_input();
+  void check_iudot();
+  void decompose_vel_grad(real_t* vel_grad);
+  void init_dvm();
+  void check_mixed_bc();
+  void init_after_reading_input_data();
+  void solve();
+  void solve(real_t* vel_grad, real_t* stress, real_t dt, size_t cycle, size_t elem_gid, real_t udotAccThIn);
+  void evolve();
+  void check_macrostress();
+  void print_vel_grad();
+  void allocate_memory();
+
+  void set_some_voxels_arrays_to_zero();
+  void init_ept();
+  void init_disgrad();
+  void init_sg();
+  void init_xk_gb();
+  void init_disgradmacro_velgradmacro();
+  void init_evm();
+  void init_c0_s0();
+  void deinit_c0_s0();
+  void init_defgrad();
+  void data_crystal(int iph, const std::string & filecryspl);
+  void data_crystal_elast(int iph, const std::string & filecrysel);
+  void data_grain(const std::string & filetext);
+  void step_update_velgrad();
+  void update_schmid();
+  void step_set_dvelgradmacro_and_dvelgradmacroacum_to_zero();
+  void forward_fft();
+  void backward_fft();
+  void inverse_the_greens();
+  void initialize_velgrad();
+  void evpal(int imicro);
+  void get_smacro();
+  void kinhard_param();
+  void step_update();
+  void step_vm_calc();
+  void step_texture_rve_update();
+  void update_orient();
+  void update_defgradp();
+  void update_defgrad();
+  void update_defgrade();
+  void update_el_stiff();
+  void calc_c0();
+  void Cauchy_to_PK1();
+  void calc_c066mod();
+  void update_grid_velgrad();
+  void harden(int imicro);
+  void read_classic_los_alamos_texture_file(const std::string & filetext);
+  void read_hdf5_texture_file(const std::string & filetext);
+  void read_vtk_lattice_structure(const std::string & filetext);
+  void calculate_eel(MatrixTypeRealDual &eel);
+  void write_macro_state();
+  void write_micro_state();
+  void write_texture();
+
+  void init_crss_voce();
+  void update_crss_voce();
+};
diff --git a/src/LS-EVPFFT/src/forward_backward_fft.cpp b/src/LS-EVPFFT/src/forward_backward_fft.cpp
new file mode 100644
index 000000000..c16b7dfb8
--- /dev/null
+++ b/src/LS-EVPFFT/src/forward_backward_fft.cpp
@@ -0,0 +1,84 @@
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+
+void EVPFFT::forward_fft()
+{
+  Profiler profiler(__FUNCTION__);
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+
+      // prep for forward FFT
+      FOR_ALL_CLASS(k, 1, npts3+1,
+                    j, 1, npts2+1,
+                    i, 1, npts1+1, {
+        data(i,j,k) = sgPK1(ii,jj,i,j,k);
+      }); // end FOR_ALL_CLASS
+      Kokkos::fence();
+
+#if defined USE_FFTW || USE_MKL
+      data.update_host();
+
+      // perform forward FFT
+      fft->forward(data.host_pointer(), (std::complex<double>*) data_cmplx.host_pointer());
+      data_cmplx.update_device();
+#else
+      // perform forward FFT
+      fft->forward(data.device_pointer(), (std::complex<double>*) data_cmplx.device_pointer());
+#endif
+
+      // write result to ouput
+      FOR_ALL_CLASS(k, 1, npts3_cmplx+1,
+                    j, 1, npts2_cmplx+1,
+                    i, 1, npts1_cmplx+1, {
+        work(ii,jj,i,j,k)   = data_cmplx(1,i,j,k);
+        workim(ii,jj,i,j,k) = data_cmplx(2,i,j,k);
+      }); // end FOR_ALL_CLASS
+      Kokkos::fence();
+
+    } // end for jj
+  } // end for ii
+
+}
+
+
+void EVPFFT::backward_fft()
+{
+  Profiler profiler(__FUNCTION__);
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+
+      // prep for backward FFT
+      FOR_ALL_CLASS(k, 1, npts3_cmplx+1,
+                    j, 1, npts2_cmplx+1,
+                    i, 1, npts1_cmplx+1, {
+        data_cmplx(1,i,j,k) = work(ii,jj,i,j,k);
+        data_cmplx(2,i,j,k) = workim(ii,jj,i,j,k);
+      }); // end FOR_ALL_CLASS
+      Kokkos::fence();
+
+#if defined USE_FFTW || USE_MKL
+      data_cmplx.update_host();
+
+      // perform backward FFT
+      fft->backward((std::complex<double>*) data_cmplx.host_pointer(), data.host_pointer());
+      data.update_device();
+#else
+      // perform backward FFT
+      fft->backward((std::complex<double>*) data_cmplx.device_pointer(), data.device_pointer());
+#endif
+
+      // write result to ouput
+      FOR_ALL_CLASS(k, 1, npts3+1,
+                    j, 1, npts2+1,
+                    i, 1, npts1+1, {
+        work(ii,jj,i,j,k) = data(i,j,k);
+      }); // end FOR_ALL_CLASS
+      Kokkos::fence();
+
+    } // end for jj
+  } // end for ii
+
+}
diff --git a/src/LS-EVPFFT/src/get_smacro.cpp b/src/LS-EVPFFT/src/get_smacro.cpp
new file mode 100644
index 000000000..1fb61d860
--- /dev/null
+++ b/src/LS-EVPFFT/src/get_smacro.cpp
@@ -0,0 +1,119 @@
+#include "evpfft.h"
+#include "utilities.h"
+#include "reduction_data_structures.h"
+#include "Profiler.h"
+
+void EVPFFT::get_smacro()
+{
+  Profiler profiler(__FUNCTION__);
+  
+  // Note: reduction is performed on scauav(3,3) and scauav1(3,3)
+  //       all_reduce[0] = scauav
+  //       all_reduce[1] = scauav1
+  ArrayOfArrayType <2, real_t, 3*3> all_reduce;
+
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i,
+                  ArrayOfArrayType <2, real_t, 3*3> & loc_reduce) {
+
+      ViewMatrixTypeReal scauav_loc  (&loc_reduce.array[0].array[0], 3, 3);
+      ViewMatrixTypeReal scauav1_loc (&loc_reduce.array[1].array[0], 3, 3);
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+
+          scauav_loc(ii,jj) += sg(ii,jj,i,j,k) * wgtc(i,j,k);
+
+          if (jphase(i,j,k) == 1) {
+            scauav1_loc(ii,jj) += sg(ii,jj,i,j,k) * wgtc(i,j,k)/wph1;
+          }
+
+        } // end for ii
+      } // end for jj
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition on scauav and scauav1
+
+  ViewMatrixTypeReal scauav_loc  (&all_reduce.array[0].array[0], 3, 3);
+  ViewMatrixTypeReal scauav1_loc (&all_reduce.array[1].array[0], 3, 3);
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      scauav(ii,jj) = scauav_loc(ii,jj);
+      scauav1(ii,jj) = scauav1_loc(ii,jj);
+    }
+  }
+  MPI_Allreduce(MPI_IN_PLACE, scauav.pointer(), scauav.size(), MPI_REAL_T, MPI_SUM, mpi_comm);
+  MPI_Allreduce(MPI_IN_PLACE, scauav1.pointer(), scauav1.size(), MPI_REAL_T, MPI_SUM, mpi_comm);
+
+//
+//    MIXED BC
+//
+
+  int ii;
+  int jj;
+  int kk;
+  int ll;
+  int ijv_[6*2] = {1,2,3,2,1,1,1,2,3,3,3,2};
+  ViewMatrixTypeInt  ijv(ijv_,6,2);
+  MatrixTypeRealHost sav6(6);
+  MatrixTypeRealHost sav5(5);
+
+  for (int i = 1; i <= 6; i++) {
+
+    ii = ijv(i,1);
+    jj = ijv(i,2);
+
+    dvelgradmacro(ii,jj) = 0.0;
+
+    if (idsim(i) == 0) {
+      for (int k = 1; k <= 6; k++) {
+        kk = ijv(k,1);
+        ll = ijv(k,2);
+
+        dvelgradmacro(ii,jj) += s0.host(ii,jj,kk,ll) * iscau(k) *
+                                     (scauchy(kk,ll) - scauav(kk,ll));
+      } // end for k
+    } // end if (idsim(i) == 0)
+
+  } // end for i
+
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      dvelgradmacroacum(ii,jj) += dvelgradmacro(ii,jj);
+    }
+  }
+
+  cb.chg_basis_2(sav6.pointer(), scauav.pointer(), 2, 6, cb.B_basis_host_pointer());
+
+  for (int ii = 1; ii <= 5; ii++) {
+    sav5(ii) = sav6(ii);
+  }
+
+  cb.chg_basis_1(sav5.pointer(), sdeviat.pointer(), 1, 5, cb.B_basis_host_pointer());
+
+  svm = 0.0;
+  for (int j = 1; j <= 3; j++) {
+    for (int i = 1; i <= 3; i++) {
+      svm += sdeviat(i,j) * sdeviat(i,j);
+    }
+  }
+  svm = sqrt(3.0/2.0*svm);
+
+  cb.chg_basis_2(sav6.pointer(), scauav1.pointer(), 2, 6, cb.B_basis_host_pointer());
+
+  for (int ii = 1; ii <= 5; ii++) {
+    sav5(ii) = sav6(ii);
+  }
+
+  cb.chg_basis_1(sav5.pointer(), sdeviat.pointer(), 1, 5, cb.B_basis_host_pointer());
+
+  svm1 = 0.0;
+  for (int j = 1; j <= 3; j++) {
+    for (int i = 1; i <= 3; i++) {
+      svm1 += sdeviat(i,j) * sdeviat(i,j);
+    }
+  }
+  svm1 = sqrt(3.0/2.0*svm1);
+
+}
diff --git a/src/LS-EVPFFT/src/harden.cpp b/src/LS-EVPFFT/src/harden.cpp
new file mode 100644
index 000000000..ebb0ed6e1
--- /dev/null
+++ b/src/LS-EVPFFT/src/harden.cpp
@@ -0,0 +1,93 @@
+// *****************************************************************************
+//     Harden_Routine    Harden_Routine  Harden_Routine  Harden_Routine    
+//     SUBROUTINE UPDATE_CRSS_VOCE     --->      VERSION OF 23/APR/00
+//
+//     A VOCE LAW FUNCTION OF THE ACCUMULATED SHEAR IN EACH GRAIN IS ADDED
+//     AS A MULTIPLICATIVE FACTOR THAT MODULATES THE ORIGINAL LINEAR HARDENING.
+//     THE UNITS (AND THE STRENGTH) OF THE HARDENING ARE CARRIED BY THE
+//     MULTIPLICATIVE FACTOR 'VOCE' (THE SAME FOR EVERY MODE).
+//     THE SELF & LATENT COUPLING COEFFICIENTS 'HARD' ARE DIMENSIONLESS
+//     CONSTANTS RELATIVE TO THE FACTOR 'VOCE'.
+//*******************************************************************************
+
+
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+
+void EVPFFT::harden(int imicro)
+{
+  Profiler profiler(__FUNCTION__);
+  update_crss_voce();
+}
+
+void EVPFFT::update_crss_voce()
+{
+  Kokkos::parallel_for(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int kk, const int jj, const int ii) {
+    
+    int iph;
+    real_t gamtotx;
+    real_t deltgam;
+    real_t dtau;
+    real_t tau0;
+    real_t tau1;
+    real_t thet0;
+    real_t thet1;
+    real_t tiny;
+    real_t voce;
+    real_t fact;
+    real_t expini;
+    real_t expdel;   
+
+    iph = jphase(ii,jj,kk);
+
+    if (igas(iph) == 0) {
+
+      gamtotx = gacumgr(ii,jj,kk);
+      deltgam = 0.0;
+      for (int is = 1; is <= nsyst(iph); is++) {
+        deltgam += ABS(gamdot(is,ii,jj,kk)) * tdot;
+      }
+
+      for (int is = 1; is <= nsyst(iph); is++) {
+        dtau = 0.0;
+        for (int js = 1; js <= nsyst(iph); js++) {
+          dtau += hard(is,js,iph) * ABS(gamdot(js,ii,jj,kk)) * tdot;
+        }
+        tau0  = tau(is,1,iph);
+        tau1  = tau(is,3,iph);
+        thet0 = thet(is,1,iph);
+        thet1 = thet(is,2,iph);        
+        tiny = 1.0e-4*tau0;
+
+        voce = 0.0;
+        if (ABS(thet0) > tiny) {
+          voce = thet1 * deltgam;
+          if (ABS(tau1) > tiny) {
+            fact = ABS(thet0/tau1);
+            expini = EXP(-gamtotx*fact);
+            expdel = EXP(-deltgam*fact);
+            voce += -(fact*tau1-thet1)/fact*expini*(expdel-1.0) - 
+                      thet1/fact*expini*(expdel*((gamtotx+deltgam)*fact+1.0)-(gamtotx*fact+1.0));
+          } // end if (ABS(tau1) > tiny)
+        } // end if (ABS(thet0) > tiny)
+
+        if (ABS(deltgam) > 0.0) {
+          crss(is,1,ii,jj,kk) += dtau*voce/deltgam;
+          crss(is,2,ii,jj,kk) += dtau*voce/deltgam;
+        }
+
+        //trialtau(is,1,ii,jj,kk) = crss(is,1,ii,jj,kk);
+        //trialtau(is,2,ii,jj,kk) = crss(is,2,ii,jj,kk);
+      } // end for is
+
+      gacumgr(ii,jj,kk) = gamtotx + deltgam;
+
+    } // end if (igas(iph) == 0)
+
+  });
+
+}
+
diff --git a/src/LS-EVPFFT/src/hdf5_io_functions.cpp b/src/LS-EVPFFT/src/hdf5_io_functions.cpp
new file mode 100644
index 000000000..dac843b39
--- /dev/null
+++ b/src/LS-EVPFFT/src/hdf5_io_functions.cpp
@@ -0,0 +1,149 @@
+#include "hdf5_io_functions.h"
+
+hid_t create_hdf5_filespace(int num_dims_, 
+    const int *global_array_dims_, const int *local_array_dims_,
+    const int *start_coordinates_, const int *stride_)
+{
+    hsize_t *global_array_dims = new hsize_t[num_dims_];
+    hsize_t *local_array_dims = new hsize_t[num_dims_];
+    hsize_t *start_coordinates = new hsize_t[num_dims_];
+    hsize_t *stride = new hsize_t[num_dims_];
+
+    for (int i = 0; i < num_dims_; i++)
+    {
+        global_array_dims[i] = global_array_dims_[i];
+        local_array_dims[i] = local_array_dims_[i];
+        start_coordinates[i] = start_coordinates_[i];
+        stride[i] = stride_[i];
+    }
+
+    hid_t filespace = H5Screate_simple(num_dims_, global_array_dims, NULL);
+    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, 
+                        start_coordinates, stride, local_array_dims, NULL);
+
+    delete [] global_array_dims;
+    delete [] local_array_dims;
+    delete [] start_coordinates;
+    delete [] stride;
+
+    return filespace;
+}
+
+hid_t create_hdf5_memspace(int num_dims_, const int *local_array_dims_,
+    const int *stride_, int num_ghost_nodes_)
+{
+    /*
+        local_array_dims is should not include the ghost_nodes
+    */
+
+    hsize_t *global_array_dims = new hsize_t[num_dims_];
+    hsize_t *local_array_dims = new hsize_t[num_dims_];
+    hsize_t *start_coordinates = new hsize_t[num_dims_];
+    hsize_t *stride = new hsize_t[num_dims_];
+
+    for (int i = 0; i < num_dims_; i++)
+    {
+        global_array_dims[i] = local_array_dims_[i]+2*num_ghost_nodes_;
+        local_array_dims[i] = local_array_dims_[i];
+        start_coordinates[i] = num_ghost_nodes_;
+        stride[i] = stride_[i];
+    }
+
+    hid_t memspace = H5Screate_simple(num_dims_, global_array_dims, NULL);
+    H5Sselect_hyperslab(memspace, H5S_SELECT_SET,
+                        start_coordinates, stride, local_array_dims, NULL);
+
+    delete [] global_array_dims;
+    delete [] local_array_dims;
+    delete [] start_coordinates;
+    delete [] stride;
+
+    return memspace;
+}
+
+hid_t create_hdf5_file(const char *filename, MPI_Comm mpi_hdf5_comm)
+{
+    hid_t file_creation_plist = H5P_DEFAULT;   // File creation property list
+    // set the file access template for parallel IO access
+    hid_t file_access_plist   = H5P_DEFAULT;   // File access property list
+    file_access_plist = H5Pcreate(H5P_FILE_ACCESS);
+
+    // set collective mode for metadata writes
+    H5Pset_coll_metadata_write(file_access_plist, true);
+
+    MPI_Info mpi_info = MPI_INFO_NULL; // For MPI IO hints
+    MPI_Info_create(&mpi_info);
+    MPI_Info_set(mpi_info, "striping_factor", "8");
+    MPI_Info_set(mpi_info, "striping_unit", "4194304");
+
+    // tell the HDF5 library that we want to use MPI-IO to do the writing
+    H5Pset_fapl_mpio(file_access_plist, mpi_hdf5_comm, mpi_info);
+
+    // Open the file collectively
+    // H5F_ACC_TRUNC - overwrite existing file. H5F_ACC_EXCL - no overwrite
+    // 3rd argument is file creation property list. Using default here
+    // 4th argument is the file access property list identifier
+    hid_t file_identifier = H5Fcreate(filename, H5F_ACC_TRUNC, file_creation_plist,
+                    file_access_plist);
+
+    // release the file access template
+    H5Pclose(file_access_plist);
+    MPI_Info_free(&mpi_info);
+
+    return file_identifier;
+}
+
+hid_t create_hdf5_dataset(const char *dataset_name, hid_t hdf5_datatype, 
+  hid_t file_identifier, hid_t filespace)
+{
+    // create the dataset
+    hid_t link_creation_plist    = H5P_DEFAULT; // Link creation property list
+    hid_t dataset_creation_plist = H5P_DEFAULT; // Dataset creation property list
+    hid_t dataset_access_plist   = H5P_DEFAULT; // Dataset access property list
+    hid_t dataset = H5Dcreate2(
+      file_identifier,          // Arg 1: file identifier
+      dataset_name,             // Arg 2: dataset name
+      hdf5_datatype,            // Arg 3: datatype identifier
+      filespace,                // Arg 4: filespace identifier
+      link_creation_plist,      // Arg 5: link creation property list
+      dataset_creation_plist,   // Arg 6: dataset creation property list
+      dataset_access_plist);    // Arg 7: dataset access property list
+
+    return dataset;
+}
+
+hid_t open_hdf5_file(const char *filename, MPI_Comm mpi_hdf5_comm)
+{
+    // set the file access template for parallel IO access
+    hid_t file_access_plist = H5P_DEFAULT;   // File access property list
+    file_access_plist = H5Pcreate(H5P_FILE_ACCESS);
+
+    // set collective mode for metadata reads (ops)
+    H5Pset_all_coll_metadata_ops(file_access_plist, true);
+
+    // tell the HDF5 library that we want to use MPI-IO to do the reading
+    H5Pset_fapl_mpio(file_access_plist, mpi_hdf5_comm, MPI_INFO_NULL);
+
+    // Open the file collectively
+    // H5F_ACC_RDONLY - sets access to read or write on open of an existing file.
+    // 3rd argument is the file access property list identifier
+    hid_t file_identifier = H5Fopen(filename, H5F_ACC_RDONLY, file_access_plist);
+
+    // release the file access template
+    H5Pclose(file_access_plist);
+
+    return file_identifier;
+}
+
+hid_t open_hdf5_dataset(const char *dataset_name, hid_t file_identifier)
+{
+    
+    hid_t dataset_access_plist = H5P_DEFAULT; // Dataset access property list
+
+    hid_t dataset = H5Dopen2(
+      file_identifier,        // Arg 1: file identifier
+      dataset_name,           // Arg 2: dataset name to match for read
+      dataset_access_plist);  // Arg 3: dataset access property list
+
+    return dataset;
+}
diff --git a/src/LS-EVPFFT/src/hdf5_io_functions.h b/src/LS-EVPFFT/src/hdf5_io_functions.h
new file mode 100644
index 000000000..ba646cbd5
--- /dev/null
+++ b/src/LS-EVPFFT/src/hdf5_io_functions.h
@@ -0,0 +1,98 @@
+#pragma once
+
+#include "hdf5.h"
+#include "mpi.h"
+#include <iostream>
+
+hid_t create_hdf5_filespace(int num_dims_, 
+    const int *global_array_dims_, const int *local_array_dims_,
+    const int *start_coordinates_, const int *stride_);
+
+hid_t create_hdf5_memspace(int num_dims_, const int *local_array_dims_,
+const int *stride_, int num_ghost_nodes_);
+
+hid_t create_hdf5_file(const char *filename, MPI_Comm mpi_hdf5_comm);
+
+hid_t create_hdf5_dataset(const char *dataset_name, hid_t hdf5_datatype, 
+  hid_t file_identifier, hid_t filespace);
+
+hid_t open_hdf5_file(const char *filename, MPI_Comm mpi_hdf5_comm);
+
+hid_t open_hdf5_dataset(const char *dataset_name, hid_t file_identifier);
+
+template <typename R>
+hid_t deduce_hdf5_datatype(R *data_ptr);
+
+template <typename R>
+void write_hdf5_dataset(const char *dataset_name, R *data_ptr,
+  hid_t file_identifier, hid_t memspace, hid_t filespace, MPI_Comm mpi_hdf5_comm);
+
+template <typename R>
+void read_hdf5_dataset(const char *dataset_name, R *data_ptr,
+  hid_t file_identifier, hid_t memspace, hid_t filespace, MPI_Comm mpi_hdf5_comm);
+
+
+template <typename R>
+hid_t deduce_hdf5_datatype(R *data_ptr)
+{
+    hid_t hdf5_datatype;
+
+    if constexpr (std::is_same<R, int>::value) {
+        hdf5_datatype = H5T_NATIVE_INT;
+    }
+    else if constexpr (std::is_same<R, float>::value) {
+        hdf5_datatype = H5T_NATIVE_FLOAT;
+    }
+    else if constexpr (std::is_same<R, double>::value) {
+        hdf5_datatype = H5T_NATIVE_DOUBLE;
+    }
+    else if constexpr (std::is_same<R, long double>::value) {
+        hdf5_datatype = H5T_NATIVE_LDOUBLE;
+    }
+    else {
+        throw std::runtime_error("HDF5 datatype cannot be deduced.\n");
+    }
+
+    return hdf5_datatype;
+}
+
+template <typename R>
+void write_hdf5_dataset(const char *dataset_name, R *data_ptr,
+  hid_t file_identifier, hid_t memspace, hid_t filespace, MPI_Comm mpi_hdf5_comm)
+{
+    hid_t hdf5_datatype = deduce_hdf5_datatype <R> (data_ptr);
+
+    hid_t dataset = create_hdf5_dataset(dataset_name, hdf5_datatype, file_identifier, filespace);
+
+    // Create property list for collective dataset write.
+    hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
+    H5Pset_dxpl_mpio(xfer_plist, H5FD_MPIO_COLLECTIVE);
+
+    // write the data to disk using both the memory space and the data space.
+    H5Dwrite(dataset, hdf5_datatype, memspace, filespace, xfer_plist,
+             data_ptr);
+
+    H5Dclose(dataset);
+
+    H5Pclose(xfer_plist);
+}
+
+template <typename R>
+void read_hdf5_dataset(const char *dataset_name, R *data_ptr,
+  hid_t file_identifier, hid_t memspace, hid_t filespace, MPI_Comm mpi_hdf5_comm)
+{
+    hid_t hdf5_datatype = deduce_hdf5_datatype <R> (data_ptr);
+
+    hid_t dataset = open_hdf5_dataset(dataset_name, file_identifier);
+
+    // Create property list for collective dataset write.
+    hid_t xfer_plist = H5Pcreate(H5P_DATASET_XFER);
+    H5Pset_dxpl_mpio(xfer_plist, H5FD_MPIO_COLLECTIVE);
+
+    // read the data from disk using both the memory space and the data space.
+    H5Dread(dataset, hdf5_datatype, memspace, filespace, H5P_DEFAULT, data_ptr);
+
+    H5Dclose(dataset);
+
+    H5Pclose(xfer_plist);
+}
diff --git a/src/LS-EVPFFT/src/heffte_fft.h b/src/LS-EVPFFT/src/heffte_fft.h
new file mode 100644
index 000000000..d3d9fe876
--- /dev/null
+++ b/src/LS-EVPFFT/src/heffte_fft.h
@@ -0,0 +1,196 @@
+#pragma once
+
+#include <mpi.h>
+#include "heffte.h"
+#include <array>
+#include <memory>
+
+#ifdef USE_CUFFT
+    using heffte_backend = heffte::backend::cufft;
+#elif USE_ROCFFT
+    using heffte_backend = heffte::backend::rocfft;
+#elif USE_FFTW
+    using heffte_backend = heffte::backend::fftw;
+#elif USE_MKL
+    using heffte_backend = heffte::backend::mkl;
+#endif
+
+/**************************************************
+    FFTBase
+***************************************************
+*/
+
+template <typename HEFFTE_BACKEND, typename R>
+class FFTBase
+{
+public:
+    const MPI_Comm comm;
+    const int root;
+    const int my_rank;
+    const int num_ranks;
+    std::array<int,3> globalRealBoxSize;
+    std::array<int,3> globalComplexBoxSize;
+    heffte::box3d<> globalRealBox;
+    heffte::box3d<> globalComplexBox;
+    std::array<int, 3> procGrid;
+    std::vector<heffte::box3d<>> localRealBoxes;
+    std::vector<heffte::box3d<>> localComplexBoxes;
+    heffte::plan_options options;
+    std::vector<std::array<int,3>> localRealBoxSizes;
+    std::vector<std::array<int,3>> localComplexBoxSizes;
+
+    FFTBase(MPI_Comm comm_, const std::array<int,3> & globalRealBoxSize_, const std::array<int,3> & globalComplexBoxSize_);
+    virtual ~FFTBase();
+    virtual void forward(const R *input, std::complex<R> *output) = 0;
+    virtual void backward(const std::complex<R> *input, R *output) = 0;
+};
+
+template <typename HEFFTE_BACKEND, typename R>
+FFTBase<HEFFTE_BACKEND,R>::FFTBase(MPI_Comm comm_, const std::array<int,3> & globalRealBoxSize_, const std::array<int,3> & globalComplexBoxSize_)
+    : comm(comm_)
+    , root(0)
+    , my_rank(heffte::mpi::comm_rank(comm))
+    , num_ranks(heffte::mpi::comm_size(comm))
+    , globalRealBoxSize(globalRealBoxSize_)
+    , globalComplexBoxSize(globalComplexBoxSize_)
+    , globalRealBox({0, 0, 0}, {globalRealBoxSize[0]-1, globalRealBoxSize[1]-1, globalRealBoxSize[2]-1})
+    , globalComplexBox({0, 0, 0}, {globalComplexBoxSize[0]-1, globalComplexBoxSize[1]-1, globalComplexBoxSize[2]-1})
+    , procGrid(heffte::proc_setup_min_surface(globalRealBox, num_ranks))
+    , localRealBoxes(heffte::split_world(globalRealBox, procGrid))
+    , localComplexBoxes(heffte::split_world(globalComplexBox, procGrid))
+    , options(heffte::default_options<HEFFTE_BACKEND>())
+    , localRealBoxSizes(num_ranks)
+    , localComplexBoxSizes(num_ranks)
+{
+
+    // calculate sizes of real and complex domains(boxes) for each rank
+    for (int i = 0; i < num_ranks; i++)
+    {
+        for (int j = 0; j < 3; j++)
+        {
+            localRealBoxSizes[i][j]  = localRealBoxes[i].high[j] - localRealBoxes[i].low[j] + 1;
+            localComplexBoxSizes[i][j]  = localComplexBoxes[i].high[j] - localComplexBoxes[i].low[j] + 1;
+        }
+    }
+
+    // use strided 1-D FFT operations
+    // some backends work just as well when the entries of the data are not contiguous
+    // then there is no need to reorder the data in the intermediate stages which saves time
+    options.use_reorder = true; // default is true
+
+    // use point-to-point communications
+    // collaborative all-to-all and individual point-to-point communications are two alternatives
+    // one may be better than the other depending on
+    // the version of MPI, the hardware interconnect, and the problem size
+    options.algorithm = heffte::reshape_algorithm::alltoallv;
+
+    // in the intermediate steps, the data can be shapes as either 2-D slabs or 1-D pencils
+    // for sufficiently large problem, it is expected that the pencil decomposition is better
+    // but for smaller problems, the slabs may perform better (depending on hardware and backend)
+    options.use_pencils = true;
+
+#if defined(USE_CUFFT) || defined(USE_ROCFFT)
+    // on a multi-gpu system, distribute the devices across the mpi ranks
+    if (heffte::gpu::device_count() > 1) {
+        heffte::gpu::device_set(heffte::mpi::comm_rank(comm) % heffte::gpu::device_count());
+    }
+#endif
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+FFTBase<HEFFTE_BACKEND,R>::~FFTBase()
+{
+}
+
+/**************************************************
+    FFT3D_R2C
+***************************************************
+*/
+template <typename HEFFTE_BACKEND, typename R>
+class FFT3D_R2C : public FFTBase<HEFFTE_BACKEND,R>
+{
+public:
+    int r2c_direction;
+    heffte::fft3d_r2c<HEFFTE_BACKEND> fft; // heffte class for performing the fft
+    typename heffte::fft3d<HEFFTE_BACKEND>::template buffer_container<std::complex<R>> workspace;
+
+    FFT3D_R2C(MPI_Comm comm, const std::array<int,3> & globalRealBoxSize);
+    void forward(const R *input, std::complex<R> *output) override;
+    void backward(const std::complex<R> *input, R *output) override;
+};
+
+template <typename HEFFTE_BACKEND, typename R>
+FFT3D_R2C<HEFFTE_BACKEND,R>::FFT3D_R2C(MPI_Comm comm, const std::array<int,3> & globalRealBoxSize)
+    : FFTBase<HEFFTE_BACKEND,R>(comm, globalRealBoxSize, {globalRealBoxSize[0]/2+1, globalRealBoxSize[1], globalRealBoxSize[2]})
+    , r2c_direction(0)
+    , fft(this->localRealBoxes[this->my_rank], this->localComplexBoxes[this->my_rank], r2c_direction, this->comm, this->options)
+    , workspace(fft.size_workspace())
+{
+    // check if the complex indexes have correct dimension
+    if (this->globalRealBox.r2c(r2c_direction) != this->globalComplexBox){
+      throw std::runtime_error("Error with complex indexes dimension.\n");
+    }
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+void FFT3D_R2C<HEFFTE_BACKEND,R>::forward(const R *input, std::complex<R> *output)
+{
+    fft.forward(input, output, workspace.data());
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+void FFT3D_R2C<HEFFTE_BACKEND,R>::backward(const std::complex<R> *input, R *output)
+{
+    fft.backward(input, output, workspace.data(), heffte::scale::full);
+}
+
+
+/**************************************************
+    FFT3D
+***************************************************
+*/
+template <typename HEFFTE_BACKEND, typename R>
+class FFT3D : public FFTBase<HEFFTE_BACKEND,R>
+{
+public:
+    heffte::fft3d<HEFFTE_BACKEND> fft; // heffte class for performing the fft
+    typename heffte::fft3d<HEFFTE_BACKEND>::template buffer_container<std::complex<R>> workspace;
+
+    FFT3D(MPI_Comm comm, const std::array<int,3> & globalRealBoxSize);
+    void forward(const R *input, std::complex<R> *output) override;
+    void backward(const std::complex<R> *input, R *output) override;
+    void forward(const std::complex<R> *input, std::complex<R> *output);
+    void backward(const std::complex<R> *input, std::complex<R> *output);
+};
+
+template <typename HEFFTE_BACKEND, typename R>
+FFT3D<HEFFTE_BACKEND,R>::FFT3D(MPI_Comm comm, const std::array<int,3> & globalRealBoxSize)
+    : FFTBase<HEFFTE_BACKEND,R>(comm, globalRealBoxSize, globalRealBoxSize)
+    , fft(this->localRealBoxes[this->my_rank], this->localComplexBoxes[this->my_rank], this->comm, this->options)
+    , workspace(fft.size_workspace())
+{
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+void FFT3D<HEFFTE_BACKEND,R>::forward(const R *input, std::complex<R> *output)
+{
+    fft.forward(input, output, workspace.data());
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+void FFT3D<HEFFTE_BACKEND,R>::forward(const std::complex<R> *input, std::complex<R> *output)
+{
+    fft.forward(input, output, workspace.data());
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+void FFT3D<HEFFTE_BACKEND,R>::backward(const std::complex<R> *input, R *output)
+{
+    fft.backward(input, output, workspace.data(), heffte::scale::full);
+}
+
+template <typename HEFFTE_BACKEND, typename R>
+void FFT3D<HEFFTE_BACKEND,R>::backward(const std::complex<R> *input, std::complex<R> *output)
+{
+    fft.backward(input, output, workspace.data(), heffte::scale::full);
+}
diff --git a/src/LS-EVPFFT/src/initialize_velgrad.cpp b/src/LS-EVPFFT/src/initialize_velgrad.cpp
new file mode 100644
index 000000000..13172bcae
--- /dev/null
+++ b/src/LS-EVPFFT/src/initialize_velgrad.cpp
@@ -0,0 +1,87 @@
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "reduction_data_structures.h"
+
+void EVPFFT::initialize_velgrad()
+{
+  Profiler profiler(__FUNCTION__);
+
+  const size_t n = 9; // for dvelgradavg (3,3)
+  ArrayType <real_t, n> all_reduce;
+
+  real_t dvelgradavg_[3*3];
+
+  // create views of thread private arrays
+  ViewMatrixTypeReal dvelgradavg(dvelgradavg_,3,3);
+ 
+  // average of correction in current
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t,n> & loc_reduce) {
+
+    // averages packed in array
+    int ic;
+    real_t dum;
+
+    ic = -1;
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        dum = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          dum += work(ii,kk,i,j,k)*defgradinv(kk,jj,i,j,k);
+        }
+        ic = ic + 1;
+        loc_reduce.array[ic] += dum * wgtc(i,j,k);
+      }
+    }
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  // averages unpacked from array
+  int ic;
+  ic = -1;
+  
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      ic = ic + 1;
+      dvelgradavg(ii,jj) = all_reduce.array[ic];
+    }
+  }
+
+  MatrixTypeRealDual dudot_dX_avg(3,3);
+  
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      dudot_dX_avg.host(ii,jj) = 0.0;
+      for (int kk = 1; kk <= 3; kk++) {
+        dudot_dX_avg.host(ii,jj) += (dvelgradmacro(ii,kk) - dvelgradavg(ii,kk))*defgradinvavgc_inv(kk,jj);
+      }
+    }
+  }
+
+  // update device
+  dudot_dX_avg.update_device();
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    real_t dum;
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        dum = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          dum += (work(ii,kk,i,j,k) + dudot_dX_avg(ii,kk))*defgradinv(kk,jj,i,j,k);
+        }
+        velgrad(ii,jj,i,j,k) += dum;
+      }
+    }
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/inverse_the_greens.cpp b/src/LS-EVPFFT/src/inverse_the_greens.cpp
new file mode 100644
index 000000000..895d08a45
--- /dev/null
+++ b/src/LS-EVPFFT/src/inverse_the_greens.cpp
@@ -0,0 +1,118 @@
+#include "evpfft.h"
+#include "math_functions.h"
+#include "utilities.h"
+#include "Profiler.h"
+
+void EVPFFT::inverse_the_greens()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(kzz, 1, npts3_cmplx+1,
+                kyy, 1, npts2_cmplx+1,
+                kxx, 1, npts1_cmplx+1, {
+
+    real_t xknorm;
+
+    // thread private arrays
+    real_t xk_[3];
+    real_t ddisgrad_[3*3];
+    real_t ddisgradim_[3*3];
+    real_t a_[3*3];
+    real_t g1_[3*3*3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal xk(xk_,3);
+    ViewMatrixTypeReal ddisgrad(ddisgrad_,3,3);
+    ViewMatrixTypeReal ddisgradim(ddisgradim_,3,3);
+    ViewMatrixTypeReal a(a_,3,3);
+    ViewMatrixTypeReal g1(g1_,3,3,3,3);
+
+    xk(1) = xk_gb(kxx);
+    xk(2) = yk_gb(kyy);
+    xk(3) = zk_gb(kzz);
+
+    xknorm = sqrt( xk(1)*xk(1) + 
+                   xk(2)*xk(2) + 
+                   xk(3)*xk(3) );
+
+    if (xknorm != 0.0) {
+      for (int i = 1; i <= 3; i++) {
+        xk(i) = xk(i) / xknorm;
+      } // end for i
+    } // end if (xknorm != 0.0)
+
+    if ( kxx + local_start1_cmplx == npts1_g/2+1 || 
+         kyy + local_start2_cmplx == npts2_g/2+1 || 
+         (npts3_g > 1 && kzz + local_start3_cmplx == npts3_g/2+1) ) {
+      for (int l = 1; l <= 3; l++) {
+        for (int k = 1; k <= 3; k++) {
+          for (int j = 1; j <= 3; j++) {
+            for (int i = 1; i <= 3; i++) {
+              g1(i,j,k,l) = -s0(i,j,k,l);
+            }
+          }
+        }
+      }
+    } else {
+      // TODO: optimize indexing of this loop
+      for (int i = 1; i <= 3; i++) {
+        for (int k = 1; k <= 3; k++) {
+          a(i,k) = 0.0;
+          for (int j = 1; j <= 3; j++) {
+            for (int l = 1; l <= 3; l++) {
+              a(i,k) += c0(i,j,k,l)*xk(j)*xk(l);
+            }
+          }
+        }
+      }
+
+      invert_matrix <3> (a.pointer());
+
+      // TODO: optimize indexing of this loop 
+      for (int p = 1; p <= 3; p++) {
+        for (int qq = 1; qq <= 3; qq++) {
+          for (int i = 1; i <= 3; i++) {
+            for (int j = 1; j <= 3; j++) {
+              g1(p,qq,i,j) = -a(p,i)*xk(qq)*xk(j);
+            }
+          }
+        }
+      }
+
+    } // end if ( kxx == npts1/2 || kyy == npts2/2 || (npts3 > 1 && kzz == npts3/2) )
+
+    if ( kxx + local_start1_cmplx == 1 &&
+         kyy + local_start2_cmplx == 1 && 
+         kzz + local_start3_cmplx == 1 ) {
+      for (int j = 1; j <= 3; j++) {
+        for (int i = 1; i <= 3; i++) {
+          ddisgrad(i,j)   = 0.0;
+          ddisgradim(i,j) = 0.0;
+        }
+      }
+    } else {
+      // TODO: optimize indexing of this loop
+      for (int i = 1; i <= 3; i++) {
+        for (int j = 1; j <= 3; j++) {
+          ddisgrad(i,j)   = 0.0;
+          ddisgradim(i,j) = 0.0;
+          for (int k = 1; k <= 3; k++) {
+            for (int l = 1; l <= 3; l++) {
+              ddisgrad(i,j)   += g1(i,j,k,l) * work(k,l,kxx,kyy,kzz);
+              ddisgradim(i,j) += g1(i,j,k,l) * workim(k,l,kxx,kyy,kzz);
+            }
+          }
+        }
+      }
+    } // end if ( kxx == 1 && kyy == 1 && kzz == 1 )
+
+    for (int j = 1; j <= 3; j++) {
+      for (int i = 1; i <= 3; i++) { 
+        work(i,j,kxx,kyy,kzz)   = ddisgrad(i,j);
+        workim(i,j,kxx,kyy,kzz) = ddisgradim(i,j);
+      }
+    }
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/kinhard_param.cpp b/src/LS-EVPFFT/src/kinhard_param.cpp
new file mode 100644
index 000000000..fe38a9594
--- /dev/null
+++ b/src/LS-EVPFFT/src/kinhard_param.cpp
@@ -0,0 +1,59 @@
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+
+void EVPFFT::kinhard_param()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    int jph;
+
+    // thread private arrays
+    real_t sc5_[5];
+    real_t sg5_[5];
+    real_t sgx_[3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal sc5(sc5_,5);
+    ViewMatrixTypeReal sg5(sg5_,5);
+    ViewMatrixTypeReal sgx(sgx_,3,3);
+
+    jph = jphase(i,j,k);
+
+    if (igas(jph) == 0) {
+
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          sgx(ii,jj) = sg(ii,jj,i,j,k);
+        }
+      }
+
+      cb.chg_basis_2(sg5.pointer(), sgx.pointer(), 2, 5, cb.B_basis_device_pointer());
+
+      for (int is = 1; is <= nsyst(jph); is++) {
+        for (int jj = 1; jj <= 5; jj++) {
+           sc5(jj) = sch(jj,is,i,j,k);
+        }
+
+#ifdef NON_SCHMID_EFFECTS       
+        for (int jj = 1; jj <= 5; jj++) {
+          sc5(jj) = schnon(jj,is,i,j,k);
+        }
+#endif
+
+        xkin(is,i,j,k) = sc5(1)*sg5(1) +
+                            sc5(2)*sg5(2) +
+                            sc5(3)*sg5(3) + 
+                            sc5(4)*sg5(4) + 
+                            sc5(5)*sg5(5);
+      } // end for is
+
+    } //end if (igas(jph) == 0)
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/main.cpp b/src/LS-EVPFFT/src/main.cpp
new file mode 100644
index 000000000..719aff449
--- /dev/null
+++ b/src/LS-EVPFFT/src/main.cpp
@@ -0,0 +1,235 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
+
+#include "evpfft.h"
+#include "utilities.h"
+#include "vm.h"
+#include "math_functions.h"
+#include "Profiler.h"
+#ifndef NDEBUG
+  #include <cfenv>
+#endif
+
+
+#ifndef BUILD_LS_EVPFFT_FIERRO
+int main(int argc, char *argv[])
+{
+#ifndef NDEBUG
+    // These are used for Floating point exception debuging.
+    // However, comment out hdf5 output (`micro_writer`) or compile 
+    // EVPFFT with ABSOLUTE_NO_OUTPUT before using these
+    // because `H5Screate_simple` function has a Floating point exception bug
+    // when run with these on. Although, it does not affect anything in the EVPFFT code.
+
+    //feenableexcept (FE_DIVBYZERO); 
+    //feenableexcept (FE_INVALID);
+    //feenableexcept (FE_OVERFLOW);
+#endif 
+
+  MPI_Init(&argc, &argv);
+  Kokkos::initialize(argc, argv);
+  { // kokkos scope
+
+    int my_rank = get_mpi_comm_rank(MPI_COMM_WORLD);
+    int num_ranks = get_mpi_comm_size(MPI_COMM_WORLD);
+
+    { Profiler profiler("Total");
+
+    // Command line arguments
+    CommandLineArgs cmd;
+    cmd.parse_command_line(argc, argv);
+
+    // EVPFFT
+    EVPFFT evpfft(MPI_COMM_WORLD, cmd);
+    evpfft.solve();
+
+    } // end profiler("Total") scope
+
+    Profiler::print(MPI_COMM_WORLD);
+
+  } // end of kokkos scope
+  Kokkos::finalize();
+  MPI_Finalize();
+
+  return 0;
+}
+#endif
+
+
+
+
+#if BUILD_LS_EVPFFT_FIERRO
+void EVPFFT::solve(real_t* vel_grad, real_t* stress, real_t dt, size_t cycle, size_t elem_gid, real_t udotAccThIn)
+{
+#ifndef NDEBUG
+    feenableexcept (FE_DIVBYZERO); 
+    feenableexcept (FE_INVALID);
+    feenableexcept (FE_OVERFLOW);
+#endif 
+
+  /* All tensors must come in in a F-layout */
+
+  ViewFMatrix vel_grad_view (vel_grad,3,3);
+  ViewFMatrix stress_view (stress,3,3);
+  double udotAccTh = udotAccThIn;
+
+  // calculate L2 norm of vel_grad
+  real_t L2norm = 0.0;
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      L2norm += vel_grad_view(i,j) * vel_grad_view(i,j);
+    }
+  }
+  L2norm = sqrt(L2norm);
+
+  // return if L2norm is too small
+  if (L2norm < 1.0e-16) {
+    return;
+  }
+
+  // Accumulate udot and dt
+  dtAcc += dt;
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      udotAcc(i,j) += vel_grad_view(i,j) * dt;
+    }
+  }
+  // Symmetrize udotAcc
+  MatrixTypeRealHost udotAccSymm(3,3);
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      udotAccSymm(i,j) = 0.5*(udotAcc(i,j) + udotAcc(j,i));
+    }
+  }
+  // Calculate VonMises of udotAccSymm
+  double udotAccVm = vm(udotAccSymm.pointer());
+
+  // calculate strain increament
+  MatrixTypeRealHost dstran(3,3);
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      dstran(i,j) = 0.5*(vel_grad_view(i,j) + vel_grad_view(j,i)) * dt;
+    }
+  }
+
+  // Linear extrapolation
+  if (active == true and udotAccVm < udotAccTh) {
+
+    // calculate M66
+    for (int ii = 1; ii <= 6; ii++) {
+      for (int jj = 1; jj <= 6; jj++) {
+        M66(ii,jj) = sg66_avg(ii,jj) + dedotp66_avg(ii,jj) * dt;
+      }
+    }
+    invert_matrix <6> (M66.pointer());
+
+    MatrixTypeRealHost M3333(3,3,3,3);
+    cb.chg_basis_3(M66.pointer(), M3333.pointer(), 3, 6, cb.B_basis_host_pointer());
+    MatrixTypeRealHost dstress(3,3);
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        dstress(ii,jj) = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          for (int ll = 1; ll <= 3; ll++) {
+            dstress(ii,jj) += M3333(ii,jj,kk,ll) * (dstran(kk,ll) - edotp_avg(kk,ll)*dt);
+          }
+        }
+        stress_view(ii,jj) += dstress(ii,jj);
+      }
+    }
+    return;
+  }
+
+  // set dt
+  dt = dtAcc;
+
+  real_t dtAcc_inv = 1.0 / dtAcc;
+  // copy udotAcc into udot
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      udot.host(i,j) = udotAcc(i,j) * dtAcc_inv;
+    }
+  }
+  // update device
+  udot.update_device();
+
+  // Reset Acculated variables
+  dtAcc = 0.0;
+  for (int i = 0; i < 9; i++) udotAcc.pointer()[i] = 0.0;
+
+  // nsteps should be set to 1 either in the input file or here
+  nsteps = 1;
+
+  // calculate strain-rate and rotation-rate
+  decompose_vel_grad(udot.host_pointer());
+
+  // assign recieved variables to evpfft global variables
+  tdot = dt;
+  imicro += 1; // imicro starts at 1 while cycle starts at 0 in fierro.
+  elem_id = elem_gid;
+  fierro_cycle = cycle;
+
+  // do some initialization if first load step
+  if (active == false) {
+    init_dvm();
+    init_evm();
+    init_disgradmacro_velgradmacro();
+    init_disgrad();
+    init_sg();
+    // init_c0_s0();
+    init_defgrad();
+  }
+  active = true;
+
+  //--------------------------------
+  // EVPFFT evolve
+  evolve();
+
+  // check macrostress for NaN
+  check_macrostress();
+
+  // copy scauav into stress
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      stress_view(i,j) = scauav(i,j);
+    }
+  }
+
+  return;
+}
+#endif
diff --git a/src/LS-EVPFFT/src/manager_mpi_io.h b/src/LS-EVPFFT/src/manager_mpi_io.h
new file mode 100644
index 000000000..773a1be5b
--- /dev/null
+++ b/src/LS-EVPFFT/src/manager_mpi_io.h
@@ -0,0 +1,299 @@
+#pragma once
+
+#include <mpi.h>
+#include <array>
+#include <string>
+#include <cstring>
+#include <iostream>
+
+template <typename R, int ARRAY_LAYOUT>
+class Manager_MPI_IO
+{
+private:
+    MPI_Comm mpi_io_comm_;
+    int root;
+    int my_rank_;
+    int num_ranks_;
+    std::array<int,3> dimensions_full_array_;
+    std::array<int,3> dimensions_subarray_;
+    std::array<int,3> start_coordinates_;
+    std::array<int,3> local_dimensions_full_array_;  // with ghost nodes
+    std::array<int,3> local_dimensions_subarray_;    // without ghost nodes
+    std::array<int,3> local_start_coordinates_;      // without ghost nodes
+    const int num_dims_;
+    const int num_ghost_nodes_;
+    MPI_Datatype file_space_type_;
+    MPI_Datatype memory_space_type_;
+    MPI_Datatype mpi_native_type_;
+
+    // for writing ASCII
+    std::string format_;
+    int chars_per_num_;
+    MPI_Datatype chars_per_num_type_;
+    MPI_Datatype ascii_file_space_type_;
+    MPI_Datatype ascii_memory_space_type_;
+
+    MPI_File create_mpi_io_file_(const char *filename) const;
+    MPI_File open_mpi_io_file_(const char *filename) const;
+
+public:
+    Manager_MPI_IO(MPI_Comm mpi_io_comm,
+                   const std::array<int,3> & dimensions_full_array,
+                   const std::array<int,3> & dimensions_subarray,
+                   const std::array<int,3> & start_coordinates,
+                   int num_dims,
+                   int num_ghost_nodes);
+    Manager_MPI_IO(const Manager_MPI_IO &) = delete;
+    ~Manager_MPI_IO();
+    void write_binary(const char *filename, R *data, int file_offset) const;
+    void read_binary(const char *filename, R *data, int file_offset) const;
+    void write_ascii(const char *filename, R *data, const char* header_text) const;
+    void write_ascii(const char *filename, const char *data_as_chars, const char* header_text, int chars_per_line) const;
+    
+};
+
+template <typename R, int ARRAY_LAYOUT>
+Manager_MPI_IO<R,ARRAY_LAYOUT>::Manager_MPI_IO(MPI_Comm mpi_io_comm,
+                   const std::array<int,3> & dimensions_full_array,
+                   const std::array<int,3> & dimensions_subarray,
+                   const std::array<int,3> & start_coordinates,
+                   int num_dims,
+                   int num_ghost_nodes)
+    : mpi_io_comm_ (mpi_io_comm)
+    , root (0)
+    , dimensions_full_array_ (dimensions_full_array)
+    , dimensions_subarray_ (dimensions_subarray)
+    , start_coordinates_ (start_coordinates)
+    , num_dims_(num_dims)
+    , num_ghost_nodes_ (num_ghost_nodes)
+    , mpi_native_type_ (MPI_DATATYPE_NULL)
+    , format_ ("%013.6E\n")
+{
+    /*
+    Note that dimensions_full_array, etc. should be in the order the original data was
+    allocated. For example if C layout, {nz,ny,nx} and if FORTRAN layout, {nx,ny,nz}.
+    More example: if the array to be used with mpi_io was allocated as {nz,ny,nx}
+                  then dimensions_full_array, etc. should be in the form {z,y,x}.
+    */
+
+    MPI_Comm_rank(mpi_io_comm_, &my_rank_);
+    MPI_Comm_size(mpi_io_comm_, &num_ranks_);
+
+    local_dimensions_full_array_ = {dimensions_subarray_[0]+2*num_ghost_nodes_,
+                                    dimensions_subarray_[1]+2*num_ghost_nodes_,
+                                    dimensions_subarray_[2]+2*num_ghost_nodes_};
+    local_dimensions_subarray_ = dimensions_subarray_;
+    local_start_coordinates_ = {num_ghost_nodes_, num_ghost_nodes_, num_ghost_nodes_};
+
+    // set correct mpiType dependent on typename R
+    if constexpr (std::is_same<R, int>::value)
+    {
+        mpi_native_type_ = MPI_INT;
+    }
+    else if constexpr (std::is_same<R, float>::value)
+    {
+        mpi_native_type_ = MPI_FLOAT;
+    }
+    else if constexpr (std::is_same<R, double>::value)
+    {
+        mpi_native_type_ = MPI_DOUBLE;
+    }
+    else if constexpr (std::is_same<R, long double>::value)
+    {
+        mpi_native_type_ = MPI_LONG_DOUBLE;
+    }
+    else
+    {
+        throw std::runtime_error("mpi_native_type_ can not be deduced in Manager_MPI_IO. Please add the mpi type.\n");
+    }
+
+    // create file_space_type_
+    MPI_Type_create_subarray(num_dims_, dimensions_full_array_.data(), dimensions_subarray_.data(),
+                             start_coordinates_.data(), ARRAY_LAYOUT, mpi_native_type_,
+                             &file_space_type_);
+    MPI_Type_commit(&file_space_type_);
+
+    // create memory_space_type_
+    MPI_Type_create_subarray(num_dims_, local_dimensions_full_array_.data(), local_dimensions_subarray_.data(),
+                             local_start_coordinates_.data(), ARRAY_LAYOUT, mpi_native_type_, &memory_space_type_);
+    MPI_Type_commit(&memory_space_type_);
+
+
+    // for writing ASCII
+    // calculating chars_per_num based on format specified
+    char s[100];
+    sprintf(s, format_.c_str(), R(0));
+    chars_per_num_ = strlen(s);
+
+    // create chars_per_num_type_
+    MPI_Type_contiguous(chars_per_num_, MPI_CHAR, &chars_per_num_type_);
+    MPI_Type_commit(&chars_per_num_type_);
+
+    // create ascii_file_space_type_
+    MPI_Type_create_subarray(num_dims_, dimensions_full_array_.data(), dimensions_subarray_.data(),
+                             start_coordinates_.data(), ARRAY_LAYOUT, chars_per_num_type_,
+                             &ascii_file_space_type_);
+    MPI_Type_commit(&ascii_file_space_type_);
+
+    // create ascii_memory_space_type_
+    MPI_Type_create_subarray(num_dims_, local_dimensions_full_array_.data(), local_dimensions_subarray_.data(),
+                             local_start_coordinates_.data(), ARRAY_LAYOUT, chars_per_num_type_, &ascii_memory_space_type_);
+    MPI_Type_commit(&ascii_memory_space_type_);
+}
+
+template <typename R, int ARRAY_LAYOUT>
+Manager_MPI_IO<R,ARRAY_LAYOUT>::~Manager_MPI_IO()
+{
+    MPI_Type_free(&file_space_type_);
+    MPI_Type_free(&memory_space_type_);
+    MPI_Type_free(&chars_per_num_type_);
+    MPI_Type_free(&ascii_file_space_type_);
+    MPI_Type_free(&ascii_memory_space_type_);
+}
+
+template <typename R, int ARRAY_LAYOUT>
+MPI_File Manager_MPI_IO<R,ARRAY_LAYOUT>::create_mpi_io_file_(const char *filename) const
+{
+    int err;
+    int file_mode = MPI_MODE_UNIQUE_OPEN | MPI_MODE_WRONLY | MPI_MODE_CREATE;
+
+    MPI_Info mpi_info = MPI_INFO_NULL; // For MPI IO hints
+//#if 0
+    MPI_Info_create(&mpi_info);
+    MPI_Info_set(mpi_info, "collective_buffering", "true");
+    MPI_Info_set(mpi_info, "striping_factor", "8");
+    MPI_Info_set(mpi_info, "striping_unit", "4194304");
+//#endif
+
+    MPI_File file_handle = NULL;
+    err = MPI_File_open(mpi_io_comm_, filename, file_mode, mpi_info, &file_handle);
+    if (err)
+    {
+        std::string error_msg = std::string("MPI_File_open failed to open ") + std::string(filename);
+        throw std::runtime_error(error_msg);
+    }
+    return file_handle;
+}
+
+template <typename R, int ARRAY_LAYOUT>
+MPI_File Manager_MPI_IO<R,ARRAY_LAYOUT>::open_mpi_io_file_(const char *filename) const
+{
+    int err;
+    int file_mode = MPI_MODE_RDONLY | MPI_MODE_UNIQUE_OPEN;
+
+    MPI_Info mpi_info = MPI_INFO_NULL; // For MPI IO hints
+    MPI_Info_create(&mpi_info);
+    MPI_Info_set(mpi_info, "collective_buffering", "true");
+
+    MPI_File file_handle = NULL;
+    err = MPI_File_open(mpi_io_comm_, filename, file_mode, mpi_info, &file_handle);
+    if (err)
+    {
+        std::string error_msg = std::string("MPI_File_open failed to open ") + std::string(filename);
+        throw std::runtime_error(error_msg);
+    }
+    return file_handle;
+}
+
+template <typename R, int ARRAY_LAYOUT>
+void Manager_MPI_IO<R,ARRAY_LAYOUT>::write_binary(const char *filename, R *data, int file_offset) const
+{
+    MPI_File file_handle = create_mpi_io_file_(filename);
+
+    MPI_File_set_view(file_handle, file_offset, mpi_native_type_, file_space_type_, "native", MPI_INFO_NULL);
+    MPI_File_write_all(file_handle, data, 1, memory_space_type_, MPI_STATUS_IGNORE);
+  
+    MPI_File_close(&file_handle);
+}
+
+template <typename R, int ARRAY_LAYOUT>
+void Manager_MPI_IO<R,ARRAY_LAYOUT>::read_binary(const char *filename, R *data, int file_offset) const
+{
+    // open file
+    MPI_File file_handle = open_mpi_io_file_(filename);
+
+    // set view and read file
+    MPI_File_set_view(file_handle, file_offset, mpi_native_type_, file_space_type_, "native", MPI_INFO_NULL);
+    MPI_File_read_all(file_handle, data, 1, memory_space_type_, MPI_STATUS_IGNORE);
+    MPI_File_close(&file_handle);
+}
+
+template <typename R, int ARRAY_LAYOUT>
+void Manager_MPI_IO<R,ARRAY_LAYOUT>::write_ascii(const char *filename, R *data, const char* header_text) const
+{
+    // for holding data converted to chars
+    const int size = local_dimensions_full_array_[0]*local_dimensions_full_array_[1]*local_dimensions_full_array_[2];
+    char *data_as_chars = new char[size*chars_per_num_];
+
+    // write data into data_as_chars
+    for (int i = 0; i < size; i++)
+    {
+        sprintf(&data_as_chars[i*chars_per_num_], format_.c_str(), (double)data[i]);
+    }
+
+    int header_text_size = strlen(header_text);
+    int data_as_chars_size = strlen(data_as_chars);
+
+    // open file
+    MPI_File file_handle = create_mpi_io_file_(filename);
+
+    // my_rank_ == 0 writes header of file
+    if (my_rank_ == root && header_text_size > 0)
+    {
+        MPI_File_write(file_handle, header_text, header_text_size, MPI_CHAR, MPI_STATUS_IGNORE);
+    }
+    MPI_Barrier(mpi_io_comm_);
+
+    // set view and write data
+    MPI_File_set_view(file_handle, header_text_size, MPI_CHAR, ascii_file_space_type_, "native", MPI_INFO_NULL);
+    MPI_File_write_all(file_handle, data_as_chars, 1, ascii_memory_space_type_, MPI_STATUS_IGNORE);
+
+    MPI_File_close(&file_handle);
+
+    delete[] data_as_chars;
+}
+
+template <typename R, int ARRAY_LAYOUT>
+void Manager_MPI_IO<R,ARRAY_LAYOUT>::write_ascii(const char *filename, const char *data_as_chars, const char* header_text, int chars_per_line) const
+{
+    /*
+      - header_text can be set to "" if no header_text is needed.
+      - All lines in data_as_chars must have the same chars_per_line.
+      - Do not forget to count the \n char when counting chars_per_line.
+    */
+
+    // create chars_per_line_type
+    MPI_Datatype chars_per_line_type;
+    MPI_Type_contiguous(chars_per_line, MPI_CHAR, &chars_per_line_type);
+    MPI_Type_commit(&chars_per_line_type);
+
+    // create custom_file_space_type
+    MPI_Datatype custom_file_space_type;
+    MPI_Type_create_subarray(num_dims_, dimensions_full_array_.data(), dimensions_subarray_.data(),
+                             start_coordinates_.data(), ARRAY_LAYOUT, chars_per_line_type,
+                             &custom_file_space_type);
+    MPI_Type_commit(&custom_file_space_type);
+
+    int header_text_size = strlen(header_text);
+    int data_as_chars_size = strlen(data_as_chars);
+
+    // open file
+    MPI_File file_handle = create_mpi_io_file_(filename);
+
+    // my_rank_ == 0 writes header of file
+    if (my_rank_ == root && header_text_size > 0)
+    {
+        MPI_File_write(file_handle, header_text, header_text_size, MPI_CHAR, MPI_STATUS_IGNORE);
+    }
+    MPI_Barrier(mpi_io_comm_);
+
+    // set view and write data
+    MPI_File_set_view(file_handle, header_text_size, MPI_CHAR, custom_file_space_type, "native", MPI_INFO_NULL);
+    MPI_File_write_all(file_handle, data_as_chars, data_as_chars_size, MPI_CHAR, MPI_STATUS_IGNORE);
+
+    MPI_File_close(&file_handle);
+
+    // free created MPI_Datatype
+    MPI_Type_free(&chars_per_line_type);
+    MPI_Type_free(&custom_file_space_type);
+}
diff --git a/src/LS-EVPFFT/src/math_functions.cpp b/src/LS-EVPFFT/src/math_functions.cpp
new file mode 100644
index 000000000..bbd62161d
--- /dev/null
+++ b/src/LS-EVPFFT/src/math_functions.cpp
@@ -0,0 +1,81 @@
+#include <stdio.h>
+#include "math_functions.h"
+
+
+// New Gauss-Jordan elimination matrix inverse functions 
+KOKKOS_FUNCTION
+void swap_rows(double *matrix, int row1, int row2, int n) {
+    for (int i = 0; i < n; i++) {
+        double temp = *(matrix + row1 * n + i);
+        *(matrix + row1 * n + i) = *(matrix + row2 * n + i);
+        *(matrix + row2 * n + i) = temp;
+    }
+}
+
+KOKKOS_FUNCTION
+void scale_row(double *matrix, int row, double factor, int n) {
+    for (int i = 0; i < n; i++) {
+        *(matrix + row * n + i) *= factor;
+    }
+}
+
+KOKKOS_FUNCTION
+void add_rows(double *matrix, int src_row, int dest_row, double factor, int n) {
+    for (int i = 0; i < n; i++) {
+        *(matrix + dest_row * n + i) += *(matrix + src_row * n + i) * factor;
+    }
+}
+
+
+KOKKOS_FUNCTION
+int solve_linear_system(double* A, double* b, int n) {
+    // Function to solve linear system using Gaussian elimination
+    
+    int error_flag=1;
+    
+    for (int i = 0; i < n; i++) {
+        // Find the pivot row
+        int pivotRow = i;
+        for (int j = i + 1; j < n; j++) {
+            if (fabs(A[j * n + i]) > fabs(A[pivotRow * n + i])) {
+                pivotRow = j;
+            }   
+        }   
+        
+        // Swap the current row with the pivot row using the provided function
+        if (pivotRow != i) {
+            swap_rows(A, i, pivotRow, n);
+            
+            // C-style swap for the b vector elements
+            double temp = b[i];
+            b[i] = b[pivotRow];
+            b[pivotRow] = temp;
+        }   
+        
+        // Check for singular or nearly singular matrix
+        if (A[i * n + i] == 0.0) {
+            return error_flag; // Matrix is singular or nearly singular
+        }   
+        
+        // Make the diagonal element 1
+        double pivot = A[i * n + i];
+        for (int j = i; j < n; j++) {
+            A[i * n + j] /= pivot;
+        }   
+        b[i] /= pivot;
+        
+        // Eliminate other rows
+        for (int j = 0; j < n; j++) {
+            if (j != i) { 
+                double factor = A[j * n + i];
+                for (int k = i; k < n; k++) {
+                    A[j * n + k] -= factor * A[i * n + k];
+                }   
+                b[j] -= factor * b[i];
+            }   
+        }   
+    }   
+    
+    error_flag = 0;
+    return error_flag; // Matrix is non-singular
+}
\ No newline at end of file
diff --git a/src/LS-EVPFFT/src/math_functions.h b/src/LS-EVPFFT/src/math_functions.h
new file mode 100644
index 000000000..3eb51cc8b
--- /dev/null
+++ b/src/LS-EVPFFT/src/math_functions.h
@@ -0,0 +1,101 @@
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+template<typename T>
+KOKKOS_FUNCTION
+T PowIntExpo(T base, int exponent);
+
+KOKKOS_FUNCTION
+void swap_rows(double *matrix, int row1, int row2, int n);
+ 
+KOKKOS_FUNCTION
+void scale_row(double *matrix, int row, double factor, int n);
+
+KOKKOS_FUNCTION
+void add_rows(double *matrix, int src_row, int dest_row, double factor, int n);
+
+template<int n>
+KOKKOS_FUNCTION
+int invert_matrix(double *matrix);
+
+KOKKOS_FUNCTION
+int solve_linear_system(double* A, double* b, int n);
+
+
+
+template<int n>
+KOKKOS_FUNCTION
+int invert_matrix(double *matrix)
+{
+    int error_flag=0;
+    double identity[n*n];
+    if (identity == NULL) {
+        printf("Error: Failed to allocate memory for identity matrix.\n");
+        return 1;
+    }
+
+    // Initialize identity matrix
+    for (int i = 0; i < n; i++) {
+        for (int j = 0; j < n; j++) {
+            *(identity + i * n + j) = (i == j) ? 1.0 : 0.0;
+        }
+    }
+
+    // Perform Gauss-Jordan elimination
+    for (int i = 0; i < n; i++) {
+        if (*(matrix + i * n + i) == 0) {
+            //printf("Error: Matrix is not invertible.\n");
+            return error_flag;
+        }
+
+        double pivot = *(matrix + i * n + i);
+        scale_row(matrix, i, 1.0 / pivot, n);
+        scale_row(identity, i, 1.0 / pivot, n);
+
+        for (int j = 0; j < n; j++) {
+            if (j != i) {
+                double factor = -*(matrix + j * n + i);
+                add_rows(matrix, i, j, factor, n);
+                add_rows(identity, i, j, factor, n);
+            }
+        }
+    }
+
+    // Copy the inverted matrix to the input matrix pointer
+    for (int i = 0; i < n; i++) {
+        for (int j = 0; j < n; j++) {
+            *(matrix + i * n + j) = *(identity + i * n + j);
+        }
+    }
+
+    error_flag = 0;
+    return error_flag;
+}
+// End New Gauss-Jordan elimination matrix inverse functions
+
+
+// Optimized pow for integer exponents only
+template<typename T>
+KOKKOS_FUNCTION
+T PowIntExpo(T base, int exponent) {
+    if (exponent == 0)
+        return 1;
+    else if (exponent == 1)
+        return base;
+
+    T result = 1;
+    bool negativeExponent = (exponent < 0);
+    exponent = abs(exponent);
+
+    while (exponent > 0) {
+        if (exponent & 1)
+            result *= base;
+        base *= base;
+        exponent >>= 1;
+    }
+
+    return negativeExponent ? 1.0 / result : result;
+}
\ No newline at end of file
diff --git a/src/LS-EVPFFT/src/matrix_exp.cpp b/src/LS-EVPFFT/src/matrix_exp.cpp
new file mode 100644
index 000000000..f43cf252d
--- /dev/null
+++ b/src/LS-EVPFFT/src/matrix_exp.cpp
@@ -0,0 +1,40 @@
+#include "matrix_exp.h"
+
+KOKKOS_FUNCTION
+void matrix_exp(real_t *mat_, real_t *mat_exp_)
+{
+
+  ViewMatrixTypeReal mat(mat_, 3,3);
+  ViewMatrixTypeReal mat_exp(mat_exp_, 3,3);
+
+  real_t mat2_[3*3];
+  real_t mat3_[3*3];
+  ViewMatrixTypeReal mat2(mat2_, 3,3);
+  ViewMatrixTypeReal mat3(mat3_, 3,3);
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      mat2(i,j) = 0.0;
+      for (int k = 1; k <=3; k++){
+        mat2(i,j) += mat(i,k)*mat(k,j);
+      }
+    }
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      mat3(i,j) = 0.0;
+      for (int k = 1; k <=3; k++){
+        mat3(i,j) += mat2(i,k)*mat(k,j);
+      }
+    }
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      mat_exp(i,j) = mat(i,j) + mat2(i,j)/2.0 + mat3(i,j)/6.0;
+    }
+    mat_exp(i,i) = mat_exp(i,i) + 1.0;
+  }
+  
+}
diff --git a/src/LS-EVPFFT/src/matrix_exp.h b/src/LS-EVPFFT/src/matrix_exp.h
new file mode 100644
index 000000000..a8ce4d59b
--- /dev/null
+++ b/src/LS-EVPFFT/src/matrix_exp.h
@@ -0,0 +1,8 @@
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+KOKKOS_FUNCTION
+void matrix_exp(real_t *mat_, real_t *mat_exp_);
diff --git a/src/LS-EVPFFT/src/micro_output_writer.h b/src/LS-EVPFFT/src/micro_output_writer.h
new file mode 100644
index 000000000..ec24f585b
--- /dev/null
+++ b/src/LS-EVPFFT/src/micro_output_writer.h
@@ -0,0 +1,141 @@
+#pragma once
+
+#include "hdf5_io_functions.h"
+
+template <int ARRAY_LAYOUT>
+struct MicroOutputWriter
+{
+    MicroOutputWriter(MPI_Comm mpi_hdf5_comm, const char *filename,
+        const int num_dims, const int *global_array_dims,
+        const int *local_array_dims, const int *start_coordinates,
+        const int *stride=nullptr);
+    MicroOutputWriter(const MicroOutputWriter &) = delete;
+    ~MicroOutputWriter();
+
+    template <typename T>
+    void reverse_array(T *arr, int start, int end);
+
+    template<typename T>
+    void write(const char *dataset_name, T *data_ptr);
+
+    template<typename T>
+    void write(const char *dataset_name, T *data_ptr, hid_t group_id);
+
+//private:
+    MPI_Comm mpi_hdf5_comm_;
+    int num_dims_;
+    int *global_array_dims_;
+    int *local_array_dims_;
+    int *start_coordinates_;
+    int *stride_;
+    hid_t filespace_;
+    hid_t memspace_;
+    hid_t file_id_;
+    const char *filename_;
+};
+
+
+template <int ARRAY_LAYOUT>
+MicroOutputWriter<ARRAY_LAYOUT>::MicroOutputWriter(MPI_Comm mpi_hdf5_comm, const char *filename,
+        const int num_dims, const int *global_array_dims, const int *local_array_dims,
+        const int *start_coordinates, const int *stride)
+    : mpi_hdf5_comm_ (mpi_hdf5_comm)
+    , filename_ (filename)
+    , num_dims_ (num_dims)
+    , filespace_ (H5S_NULL)
+    , memspace_ (H5S_NULL)
+{
+#ifndef ABSOLUTE_NO_OUTPUT
+    global_array_dims_ = new int[num_dims_];
+    local_array_dims_ = new int[num_dims_];
+    start_coordinates_ = new int[num_dims_];
+    stride_ = new int[num_dims_];
+
+    for (int i = 0; i < num_dims_; i++)
+    {
+        global_array_dims_[i] = global_array_dims[i];
+        local_array_dims_[i] = local_array_dims[i];
+        start_coordinates_[i] = start_coordinates[i];
+        if (stride == nullptr)
+        {
+          stride_[i] = 1;
+        }
+        else
+        {
+          stride_[i] = stride[i];
+        }
+    }
+
+    if constexpr (ARRAY_LAYOUT == MPI_ORDER_FORTRAN)
+    {
+        // reverse arrays because hdf5 called from c code writes in c order
+        reverse_array(global_array_dims_, 0, num_dims_-1);
+        reverse_array(local_array_dims_, 0, num_dims_-1);
+        reverse_array(start_coordinates_, 0, num_dims_-1);
+        reverse_array(stride_, 0, num_dims_-1);
+    }
+
+    filespace_ = create_hdf5_filespace(3, global_array_dims_, local_array_dims_,
+                                       start_coordinates_, stride_);
+    memspace_ = create_hdf5_memspace(3, local_array_dims_, stride_, 0);
+    file_id_ = create_hdf5_file(filename, mpi_hdf5_comm_);
+#endif
+}
+
+
+template <int ARRAY_LAYOUT>
+MicroOutputWriter<ARRAY_LAYOUT>::~MicroOutputWriter()
+{
+#ifndef ABSOLUTE_NO_OUTPUT
+    delete [] global_array_dims_;
+    delete [] local_array_dims_;
+    delete [] start_coordinates_;
+    delete [] stride_;
+
+    H5Sclose(filespace_);
+    filespace_ = H5S_NULL;
+
+    H5Sclose(memspace_);
+    memspace_ = H5S_NULL;
+
+    H5Fclose(file_id_);
+#endif
+};
+
+
+template <int ARRAY_LAYOUT>
+template<typename T>
+void MicroOutputWriter<ARRAY_LAYOUT>::reverse_array(T *arr, int start, int end)
+{
+    /* Function to reverse T *arr from start to end*/
+
+    while (start < end)
+    {
+        T temp = arr[start];
+        arr[start] = arr[end];
+        arr[end] = temp;
+        start++;
+        end--;
+    }
+}
+
+
+template <int ARRAY_LAYOUT>
+template<typename T>
+void MicroOutputWriter<ARRAY_LAYOUT>::write(const char *dataset_name, T *data_ptr)
+{
+    /* dataset_name: can be absolute link path or a relative link path */
+
+    write_hdf5_dataset(dataset_name, data_ptr, file_id_,
+                       memspace_, filespace_, mpi_hdf5_comm_);
+}
+
+template <int ARRAY_LAYOUT>
+template<typename T>
+void MicroOutputWriter<ARRAY_LAYOUT>::write(const char *dataset_name, T *data_ptr, hid_t group_id)
+{
+    /* dataset_name: relative link path to group_id */
+
+    write_hdf5_dataset(dataset_name, data_ptr, group_id,
+                       memspace_, filespace_, mpi_hdf5_comm_);
+}
diff --git a/src/LS-EVPFFT/src/orient.cpp b/src/LS-EVPFFT/src/orient.cpp
new file mode 100644
index 000000000..8d4637714
--- /dev/null
+++ b/src/LS-EVPFFT/src/orient.cpp
@@ -0,0 +1,87 @@
+#include "orient.h"
+
+KOKKOS_FUNCTION
+void orient(real_t *a_, real_t *c_)
+{
+  ViewMatrixTypeReal a(a_,3,3);
+  ViewMatrixTypeReal c(c_,3,3);
+
+  real_t snorm;
+  real_t snorm1;
+
+  // thread private arrays
+  real_t th2_[3*3];
+  real_t v_[3];
+  real_t vbar_[3];
+  real_t th_[3*3];
+  real_t rot_[3*3];
+  real_t anew_[3*3];
+
+  // create views of thread private arrays
+  ViewMatrixTypeReal th2(th2_,3,3);
+  ViewMatrixTypeReal v(v_,3);
+  ViewMatrixTypeReal vbar(vbar_,3);
+  ViewMatrixTypeReal th(th_,3,3);
+  ViewMatrixTypeReal rot(rot_,3,3);
+  ViewMatrixTypeReal anew(anew_,3,3);
+  
+
+//     BUILD ROTATION TENSOR BASED ON RODRIGUES FORMULA
+
+  v(1) = c(3,2);
+  v(2) = c(1,3);
+  v(3) = c(2,1);
+  snorm = SQRT(v(1)*v(1)+v(2)*v(2)+v(3)*v(3));
+  snorm1 = TAN(snorm/2.0);
+  if (snorm > 1.0E-06) goto label_97;
+  snorm = 1.0;
+
+label_97 : {
+  for (int i = 1; i <= 3; i++) {
+    vbar(i) = snorm1*v(i)/snorm;
+  }
+} // end label_97
+
+  snorm = vbar(1)*vbar(1)+vbar(2)*vbar(2)+vbar(3)*vbar(3);
+  th(3,2) =  vbar(1);
+  th(1,3) =  vbar(2);
+  th(2,1) =  vbar(3);
+  th(2,3) = -vbar(1);
+  th(3,1) = -vbar(2);
+  th(1,2) = -vbar(3);
+
+  for (int i = 1; i <= 3; i++) {
+    th(i,i) = 0.0;
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      th2(i,j) = 0.0;
+      for (int k = 1; k <= 3; k++) {
+        th2(i,j) += th(i,k)*th(k,j);
+      }
+    }
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      rot(i,j) = (i/j)*(j/i)+2.0*(th(i,j)+th2(i,j))/(1.0+snorm);
+    }
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      anew(i,j) = 0.0;
+      for (int k = 1; k <= 3; k++) {
+        anew(i,j) += rot(i,k)*a(k,j);
+      }
+    }
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      a(i,j) = anew(i,j);
+    }
+  }
+
+}
diff --git a/src/LS-EVPFFT/src/orient.h b/src/LS-EVPFFT/src/orient.h
new file mode 100644
index 000000000..3924f3e2d
--- /dev/null
+++ b/src/LS-EVPFFT/src/orient.h
@@ -0,0 +1,6 @@
+#pragma once
+
+#include "definitions.h"
+
+KOKKOS_FUNCTION
+void orient(real_t *a_, real_t *c_);
diff --git a/src/LS-EVPFFT/src/output_file_manager.cpp b/src/LS-EVPFFT/src/output_file_manager.cpp
new file mode 100644
index 000000000..64f7e1342
--- /dev/null
+++ b/src/LS-EVPFFT/src/output_file_manager.cpp
@@ -0,0 +1,44 @@
+#include "output_file_manager.h"
+#include "utilities.h"
+#include "definitions.h"
+
+OutputFileManager::OutputFileManager()
+  : vm_file (nullptr)
+  , str_str_file (nullptr)
+  , err_file (nullptr)
+  , conv_file (nullptr)
+{
+  // open_files if my_rank == 0 form EVPFFT constructor
+  //open_files();
+}
+
+void OutputFileManager::open_files()
+{
+#ifndef ABSOLUTE_NO_OUTPUT
+    vm_file = std::shared_ptr <FILE> (fopen("vm.out", "w"), &fclose);
+    check_that_file_is_open(vm_file.get(), "vm.out");
+
+    str_str_file = std::shared_ptr <FILE> (fopen("str_str.out", "w"), &fclose);
+    check_that_file_is_open(str_str_file.get(), "str_str.out");
+
+    err_file = std::shared_ptr <FILE> (fopen("err.out", "w"), &fclose);
+    check_that_file_is_open(err_file.get(), "err.out");
+
+    conv_file = std::shared_ptr <FILE> (fopen("conv.out", "w"), &fclose);
+    check_that_file_is_open(conv_file.get(), "conv.out");
+
+
+    // write heading text
+    fprintf(vm_file.get(), "EVM,EVMP,DVM,DVMP,SVM,SVM1\n");
+
+    fprintf(str_str_file.get(),
+            "E11,E22,E33,E23,E13,E12,"
+            "S11,S22,S33,S23,S13,S12,"
+            "EP11,EP22,EP33,EP23,EP13,EP12,"
+            "E_dot11,E_dot22,E_dot33,E_dot23,E_dot13,E_dot12,"
+            "E_dotP11,E_dotP22,E_dotP33,E_dotP23,E_dotP13,E_dotP12,"
+            "EVM,EPVM,DVM,DPVM,SVM\n");
+
+    fprintf(err_file.get(), "STEP,IT,ERRE,ERRS,SVM,AVG_NR_IT\n");
+#endif
+}
diff --git a/src/LS-EVPFFT/src/output_file_manager.h b/src/LS-EVPFFT/src/output_file_manager.h
new file mode 100644
index 000000000..27e09d6be
--- /dev/null
+++ b/src/LS-EVPFFT/src/output_file_manager.h
@@ -0,0 +1,19 @@
+#pragma once
+
+#include <stdio.h>
+#include <string>
+#include <memory>
+
+
+
+struct OutputFileManager
+{
+
+  std::shared_ptr <FILE> vm_file;
+  std::shared_ptr <FILE> str_str_file;
+  std::shared_ptr <FILE> err_file;
+  std::shared_ptr <FILE> conv_file;
+
+  OutputFileManager();
+  void open_files();
+};
diff --git a/src/LS-EVPFFT/src/reduction_data_structures.h b/src/LS-EVPFFT/src/reduction_data_structures.h
new file mode 100644
index 000000000..963e3721a
--- /dev/null
+++ b/src/LS-EVPFFT/src/reduction_data_structures.h
@@ -0,0 +1,144 @@
+#pragma once
+
+#include "matar.h"
+
+
+
+//========================================================================================
+// ArrayType : used for performing reduction into an array in parallel region.
+template <typename T, int N>
+struct ArrayType {
+  int size=N;
+  T array[N];
+
+  KOKKOS_INLINE_FUNCTION    // Default constructor
+  ArrayType();
+
+  KOKKOS_INLINE_FUNCTION    // Copy constructor
+  ArrayType(const ArrayType & rhs);
+
+  KOKKOS_INLINE_FUNCTION    // add operator
+  ArrayType& operator+=(const ArrayType & rhs);
+
+  KOKKOS_INLINE_FUNCTION    // volatile add operator
+  void operator+=(const volatile ArrayType & rhs) volatile;
+
+  //KOKKOS_INLINE_FUNCTION    // accessor
+  //T& operator[](int i) const;
+
+};
+
+template <typename T, int N>
+KOKKOS_INLINE_FUNCTION
+ArrayType<T,N>::ArrayType()
+{
+  for (int i = 0; i < N; i++) {
+    this->array[i] = T(0);
+  }
+}
+
+template <typename T, int N>
+KOKKOS_INLINE_FUNCTION
+ArrayType<T,N>::ArrayType(const ArrayType & rhs)
+{
+  for (int i = 0; i < N; i++) {
+    this->array[i] = rhs.array[i];
+  }
+}
+
+template <typename T, int N>
+KOKKOS_INLINE_FUNCTION
+ArrayType<T,N>& ArrayType<T,N>::operator+=(const ArrayType & rhs)
+{
+  for (int i = 0; i < N; i++) {
+    this->array[i] += rhs.array[i];
+  }
+  return *this;
+}
+
+template <typename T, int N>
+KOKKOS_INLINE_FUNCTION
+void ArrayType<T,N>::operator+=(const volatile ArrayType & rhs) volatile
+{
+  for (int i = 0; i < N; i++) {
+    this->array[i] += rhs.array[i];
+  }
+}
+
+//template <typename T, int N>
+//KOKKOS_INLINE_FUNCTION
+//T& ArrayType<T,N>::operator[](int i) const
+//{
+//  return this->array[i];
+//}
+
+
+//========================================================================================
+// ArrayOfArrayType : used for performing reduction into an array of ArrayType in parallel region.
+template <int M, typename T, int N>
+struct ArrayOfArrayType {
+  int size=M;
+  ArrayType <T,N> array[M];
+
+  KOKKOS_INLINE_FUNCTION    // Default constructor
+  ArrayOfArrayType();
+
+  KOKKOS_INLINE_FUNCTION    // Copy constructor
+  ArrayOfArrayType(const ArrayOfArrayType & rhs);
+
+  KOKKOS_INLINE_FUNCTION    // add operator
+  ArrayOfArrayType& operator+=(const ArrayOfArrayType & rhs);
+
+  KOKKOS_INLINE_FUNCTION    // volatile add operator
+  void operator+=(const volatile ArrayOfArrayType & rhs) volatile;
+
+  //KOKKOS_INLINE_FUNCTION    // accessor
+  //ArrayType<T,N>& operator[](int i) const;
+
+};
+
+template <int M, typename T, int N>
+KOKKOS_INLINE_FUNCTION
+ArrayOfArrayType<M,T,N>::ArrayOfArrayType()
+{}
+
+template <int M, typename T, int N>
+KOKKOS_INLINE_FUNCTION
+ArrayOfArrayType<M,T,N>::ArrayOfArrayType(const ArrayOfArrayType & rhs)
+{
+  for (int j = 0; j < M; j++) {
+    for (int i = 0; i < N; i++) {
+      this->array[j].array[i] = rhs.array[j].array[i];
+    }
+  }
+}
+
+template <int M, typename T, int N>
+KOKKOS_INLINE_FUNCTION
+ArrayOfArrayType<M,T,N>& ArrayOfArrayType<M,T,N>::operator+=(const ArrayOfArrayType & rhs)
+{
+  for (int j = 0; j < M; j++) {
+    this->array[j] += rhs.array[j];
+  }
+  return *this;
+}
+
+template <int M, typename T, int N>
+KOKKOS_INLINE_FUNCTION
+void ArrayOfArrayType<M,T,N>::operator+=(const volatile ArrayOfArrayType & rhs) volatile
+{
+  for (int j = 0; j < M; j++) {
+    this->array[j] += rhs.array[j];
+  }
+}
+
+//template <int M, typename T, int N>
+//KOKKOS_INLINE_FUNCTION
+//ArrayType<T,N>& ArrayOfArrayType<M,T,N>::operator[](int i) const
+//{
+//  return this->array[i];
+//}
+
+
+
+
diff --git a/src/LS-EVPFFT/src/step.cpp b/src/LS-EVPFFT/src/step.cpp
new file mode 100644
index 000000000..33f1e53c9
--- /dev/null
+++ b/src/LS-EVPFFT/src/step.cpp
@@ -0,0 +1,168 @@
+#include "evpfft.h"
+#include "vm.h"
+#include "reduction_data_structures.h"
+#include "utilities.h"
+#include "Profiler.h"
+
+void EVPFFT::step_update_velgrad()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+    
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        velgrad(ii,jj,i,j,k) = velgradmacro(ii,jj);
+      } // end for ii
+    } // end for jj
+ 
+  }); // end FOR_ALL_CLASS
+
+}
+
+void EVPFFT::step_set_dvelgradmacro_and_dvelgradmacroacum_to_zero()
+{
+  for (int j = 1; j <= 3; j++) {
+    for (int i = 1; i <= 3; i++) {
+      dvelgradmacro(i,j) = 0.0;
+      dvelgradmacroacum(i,j)  = 0.0;
+    }
+  }
+}
+
+
+void EVPFFT::step_update()
+{
+  Profiler profiler(__FUNCTION__);
+
+  //  velgrad, which contained disgrad at t, is updated
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        sgt(ii,jj,i,j,k) = sg(ii,jj,i,j,k);
+      } // end for ii
+    } // end for jj
+
+  }); // end FOR_ALL_CLASS
+
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      velgradmacroactual(ii,jj) = velgradmacro.host(ii,jj) + dvelgradmacroacum(ii,jj);
+      disgradmacro.host(ii,jj) = disgradmacrot(ii,jj) + velgradmacroactual(ii,jj) * tdot;
+    } // end for ii
+  } // end for jj
+
+
+  // TOTAL (EL+PL) VM
+  evm = vm(disgradmacro.host.pointer());
+  dvm = vm(velgradmacroactual.pointer());
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      disgradmacrot(ii,jj) = disgradmacro.host(ii,jj);
+    } // end for ii
+  } // end for jj
+
+}
+
+
+void EVPFFT::step_vm_calc()
+{
+  Profiler profiler(__FUNCTION__);
+
+#if 0
+  //     INITIAL GUESS OF DISGRADMACRO AT t+dt ALWAYS ELASTIC   
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      disgradmacro.host(ii,jj) = disgradmacrot(ii,jj) + udot.host(ii,jj) * tdot;
+    } // end for ii
+  } // end for jj
+  // update device
+  disgradmacro.update_device();
+#endif
+
+  // Note: reduction is performed on epav(3,3) and edotpav(3,3) 
+  //       epav = all_reduce[0]
+  //       edotpav = all_reduce[1]
+  ArrayOfArrayType <2, real_t, 3*3> all_reduce;
+
+  Kokkos::parallel_reduce(
+  Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+  KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, 
+                ArrayOfArrayType <2, real_t, 3*3> & loc_reduce) {
+
+    ViewMatrixTypeReal epav_loc (&loc_reduce.array[0].array[0], 3, 3);
+    ViewMatrixTypeReal edotpav_loc (&loc_reduce.array[1].array[0], 3, 3);
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        ept(ii,jj,i,j,k) += edotp(ii,jj,i,j,k) * tdot;
+      } // end for ii
+    } // end for jj
+
+//
+//     PLASTIC VM
+//
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        epav_loc(ii,jj) += ept(ii,jj,i,j,k) * wgtc(i,j,k);
+        edotpav_loc(ii,jj) += edotp(ii,jj,i,j,k) * wgtc(i,j,k);
+      }
+    }
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  ViewMatrixTypeReal epav_loc (&all_reduce.array[0].array[0], 3, 3);
+  ViewMatrixTypeReal edotpav_loc (&all_reduce.array[1].array[0], 3, 3);
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      epav(ii,jj) = epav_loc(ii,jj);
+      edotpav(ii,jj) = edotpav_loc(ii,jj);  
+    }
+  }
+  MPI_Allreduce(MPI_IN_PLACE, epav.pointer(), epav.size(), MPI_REAL_T, MPI_SUM, mpi_comm);
+  MPI_Allreduce(MPI_IN_PLACE, edotpav.pointer(), edotpav.size(), MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  evmp = 0.0;
+  dvmp = 0.0;
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      evmp += POW2(epav(ii,jj));
+      dvmp += POW2(edotpav(ii,jj));
+    }
+  }
+  evmp = sqrt(2.0/3.0*evmp);
+  dvmp = sqrt(2.0/3.0*dvmp);
+//printf("\t%26.16E, \t%26.16E", evmp, dvmp);
+//PAUSE;
+
+}
+
+
+void EVPFFT::step_texture_rve_update()
+{
+  Profiler profiler(__FUNCTION__);
+
+  //   VELMAX
+  velmax(1) = dsim(1,1)*delt(1)*(npts1-1);
+  velmax(2) = dsim(2,2)*delt(2)*(npts2-1);
+  velmax(3) = dsim(3,3)*delt(3)*(npts3-1);
+
+  //   UPDATE ORIENTATIONS
+  update_orient();
+
+  //   UPDATE DELT
+  delt(1) = (delt(1)*(npts1-1)+velmax(1)*tdot)/(npts1-1);
+  delt(2) = (delt(2)*(npts2-1)+velmax(2)*tdot)/(npts2-1);
+  if (npts3 > 1) {
+     delt(3)=(delt(3)*(npts3-1)+velmax(3)*tdot)/(npts3-1);
+  }
+
+}
+
+
diff --git a/src/LS-EVPFFT/src/update_defgrad.cpp b/src/LS-EVPFFT/src/update_defgrad.cpp
new file mode 100644
index 000000000..fdd2f576d
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_defgrad.cpp
@@ -0,0 +1,214 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "matrix_exp.h"
+#include "determinant33.h"
+#include "math_functions.h"
+#include "reduction_data_structures.h"
+
+void EVPFFT::update_defgrad()
+{
+  Profiler profiler(__FUNCTION__);
+
+  const size_t n = 1 + 9 + 9; // for detFavg (1), defgradinvavgc (9), and velgradavg (9)
+  ArrayType <real_t, n> all_reduce;
+
+  real_t detFavg;
+  real_t velgradavg_[3*3];
+  real_t defgradinvavgc_[3*3];
+
+  // create views of thread private arrays
+  ViewMatrixTypeReal velgradavg(velgradavg_,3,3);
+  ViewMatrixTypeReal defgradinvavgc(defgradinvavgc_,3,3);
+
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t,n> & loc_reduce) {
+
+    real_t detFtmp;
+
+    // thread private arrays
+    real_t defgradold_[3*3];
+    real_t defgradnew_[3*3];
+    real_t defgradinvold_[3*3];
+    real_t disgradinc_[3*3];
+    real_t expdisgradinc_[3*3];
+    real_t defgradinvtmp_[3*3];
+    real_t velgradold_[3*3];
+    real_t defgradincinv_[3*3];
+    
+    // create views of thread private arrays
+    ViewMatrixTypeReal defgradold(defgradold_,3,3);
+    ViewMatrixTypeReal defgradnew(defgradnew_,3,3);
+    ViewMatrixTypeReal defgradinvold(defgradinvold_,3,3);
+    ViewMatrixTypeReal disgradinc(disgradinc_,3,3);
+    ViewMatrixTypeReal expdisgradinc(expdisgradinc_,3,3);
+    ViewMatrixTypeReal defgradinvtmp(defgradinvtmp_,3,3);
+    ViewMatrixTypeReal velgradold(velgradold_,3,3);
+    ViewMatrixTypeReal defgradincinv(defgradincinv_,3,3);
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        defgradold(ii,jj) = defgrad(ii,jj,i,j,k);
+        defgradinvold(ii,jj) = defgradinv(ii,jj,i,j,k);
+        disgradinc(ii,jj) = velgrad(ii,jj,i,j,k)*tdot;
+      }
+    }
+
+    // matrix exponential of increment
+    matrix_exp(disgradinc.pointer(), expdisgradinc.pointer());
+
+    // update 
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        defgradnew(ii,jj) = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          defgradnew(ii,jj) += expdisgradinc(ii,kk)*defgrad(kk,jj,i,j,k);
+        }
+      }
+    }
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        defgrad(ii,jj,i,j,k) = defgradnew(ii,jj);
+      }
+    }
+
+    determinant33(defgradnew.pointer(), detFtmp);
+
+    if (detFtmp < 0.0) {
+      printf(" -> WARNING: detF = %E24.14E in voxel %d %d %d", detFtmp, i, j, k);
+    }
+
+    detF(i,j,k) = detFtmp;
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        defgradinvtmp(ii,jj) = defgradnew(ii,jj);
+      }
+    }
+
+    invert_matrix <3> (defgradinvtmp.pointer());
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        defgradinv(ii,jj,i,j,k) = defgradinvtmp(ii,jj);
+      }
+    }
+
+    wgtc(i,j,k) = wgt*detFtmp;
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        disgrad(ii,jj,i,j,k) = - defgradinvtmp(ii,jj);
+      }
+      disgrad(jj,jj,i,j,k) = disgrad(jj,jj,i,j,k) + 1.0;
+    }
+
+    // update velgrad configuration
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        velgradold(ii,jj) = velgrad(ii,jj,i,j,k);
+        defgradincinv(ii,jj) = expdisgradinc(ii,jj);
+      }
+    }
+
+    invert_matrix <3> (defgradincinv.pointer());
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        velgrad(ii,jj,i,j,k) = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          velgrad(ii,jj,i,j,k) += velgradold(ii,kk)*defgradincinv(kk,jj);
+        }
+      }
+    }
+
+    // averages packed in array
+    int ic;
+    ic = -1;
+    
+    ic = ic + 1;
+    loc_reduce.array[ic] += detFtmp*wgt;
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        ic = ic + 1;
+        loc_reduce.array[ic] += defgradinvtmp(ii,jj) * wgt * detFtmp;
+      }
+    }
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        ic = ic + 1;
+        loc_reduce.array[ic] += velgrad(ii,jj,i,j,k) * wgt * detFtmp;
+      }
+    }
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  // averages unpacked from array
+  int ic;
+  ic = -1;
+  
+  ic = ic + 1;
+  detFavg = all_reduce.array[ic];
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      ic = ic + 1;
+      defgradinvavgc(ii,jj) = all_reduce.array[ic]/detFavg;
+    }
+  }
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      ic = ic + 1;
+      velgradavg(ii,jj) = all_reduce.array[ic]/detFavg;
+    }
+  }
+
+  invert_matrix <3> (defgradinvavgc.pointer());
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      ic = ic + 1;
+      defgradinvavgc_inv(ii,jj) = defgradinvavgc(ii,jj);
+    }
+  }
+
+  // weight normalization and correction to velgrad
+  MatrixTypeRealDual dudot_dXini_avg(3,3);
+
+  for (int jj = 1; jj <= 3; jj++) {
+    for (int ii = 1; ii <= 3; ii++) {
+      dudot_dXini_avg.host(ii,jj) = 0.0;
+      for (int kk = 1; kk <= 3; kk++) {
+        dudot_dXini_avg.host(ii,jj) += (velgradmacroactual(ii,kk) - velgradavg(ii,kk))*defgradinvavgc_inv(kk,jj);
+      }
+    }
+  }
+  dudot_dXini_avg.update_device();
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    wgtc(i,j,k) = wgtc(i,j,k)/detFavg;
+
+
+    for (int jj = 1; jj <= 3; jj++) {
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int kk = 1; kk <= 3; kk++) {
+          velgrad(ii,jj,i,j,k) += dudot_dXini_avg(ii,kk)*defgradinv(kk,jj,i,j,k);
+        }
+      }
+    }
+  
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/update_defgrade.cpp b/src/LS-EVPFFT/src/update_defgrade.cpp
new file mode 100644
index 000000000..37bdb2ee6
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_defgrade.cpp
@@ -0,0 +1,101 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "math_functions.h"
+#include "determinant33.h"
+#include "defgrad_dcmp.h"
+
+void EVPFFT::update_defgrade()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    real_t detdefgradeinc;
+
+    // thread private arrays
+    real_t defgradeinvold_[3*3];
+    real_t FpFiniinv_[3*3];
+    real_t defgradeinc_[3*3];
+    real_t Finc_[3*3];
+    real_t Vinc_[3*3];
+    real_t Rinc_[3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal defgradeinvold(defgradeinvold_,3,3);
+    ViewMatrixTypeReal FpFiniinv(FpFiniinv_,3,3);
+    ViewMatrixTypeReal defgradeinc(defgradeinc_,3,3);
+    ViewMatrixTypeReal Finc(Finc_,3,3);
+    ViewMatrixTypeReal Vinc(Vinc_,3,3);
+    ViewMatrixTypeReal Rinc(Rinc_,3,3);
+
+    int iph;
+
+    iph = jphase(i,j,k);
+
+    if (igas(iph) == 0) {
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          defgradeinvold(ii,jj) = defgrade(ii,jj,i,j,k);
+        }
+      }
+
+      invert_matrix <3> (defgradeinvold.pointer());
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          FpFiniinv(ii,jj) = 0.0;
+          for (int kk = 1; kk <= 3; kk++) {
+            FpFiniinv(ii,jj) += defgradp(ii,kk,i,j,k)*defgradini(kk,jj,i,j,k);
+          }
+        }
+      }
+
+      invert_matrix <3> (FpFiniinv.pointer());
+
+      // update 
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          defgrade(ii,jj,i,j,k) = 0.0;
+          for (int kk = 1; kk <= 3; kk++) {
+            defgrade(ii,jj,i,j,k)  += defgrad(ii,kk,i,j,k)*FpFiniinv(kk,jj);
+          }
+        }
+      }
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          defgradeinc(ii,jj) = 0.0;
+          for (int kk = 1; kk <= 3; kk++) {
+            defgradeinc(ii,jj)  += defgrade(ii,kk,i,j,k)*defgradeinvold(kk,jj);
+          }
+          Finc(ii,jj) = defgradeinc(ii,jj);
+        }
+      }
+
+      defgrad_dcmp(Finc.pointer(), Vinc.pointer(), Rinc.pointer());
+      
+      // determinant33(defgradeinc.pointer(), detdefgradeinc);
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          sgt(ii,jj,i,j,k) = 0.0;
+          for (int kk = 1; kk <= 3; kk++) {
+            for (int ll = 1; ll <= 3; ll++) {
+              // sgt(ii,jj,i,j,k) += defgradeinc(ii,kk)*defgradeinc(jj,ll)*sg(kk,ll,i,j,k)/detdefgradeinc;
+              sgt(ii,jj,i,j,k) += Rinc(ii,kk)*Rinc(jj,ll)*sg(kk,ll,i,j,k);
+            }
+          }
+        }
+      }
+
+    } // end if (igas(iph) == 0)
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/update_defgradp.cpp b/src/LS-EVPFFT/src/update_defgradp.cpp
new file mode 100644
index 000000000..4aa713a65
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_defgradp.cpp
@@ -0,0 +1,96 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "matrix_exp.h"
+
+void EVPFFT::update_defgradp()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    // thread private arrays
+    real_t aa_[3*3];
+    real_t velgradp_[3*3];
+    real_t disgradincp_[3*3];
+    real_t expdisgradincp_[3*3];
+    real_t defgradpnew_[3*3];
+    real_t dnsa_[3];
+    real_t dbsa_[3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal aa(aa_,3,3);
+    ViewMatrixTypeReal velgradp(velgradp_,3,3);
+    ViewMatrixTypeReal disgradincp(disgradincp_,3,3);
+    ViewMatrixTypeReal expdisgradincp(expdisgradincp_,3,3);
+    ViewMatrixTypeReal defgradpnew(defgradpnew_,3,3);
+    ViewMatrixTypeReal dnsa(dnsa_,3);
+    ViewMatrixTypeReal dbsa(dbsa_,3);
+
+    int iph;
+
+    iph = jphase(i,j,k);
+
+    if (igas(iph) == 0) {
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          aa(ii,jj) = ag(ii,jj,i,j,k);
+          velgradp(ii,jj) = 0.0;
+        }
+      }
+
+      // plastic velocity gradient in intermediate configuration
+      for (int is = 1; is <= nsyst(iph); is++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          dnsa(ii) = 0.0;
+          dbsa(ii) = 0.0;
+          for (int jj = 1; jj <= 3; jj++) {
+            dnsa(ii) += aa(ii,jj) * dnca(jj,is,iph);
+            dbsa(ii) += aa(ii,jj) * dbca(jj,is,iph);
+          }
+        }
+
+        for (int ii = 1; ii <= 3; ii++) {
+          for (int jj = 1; jj <= 3; jj++) {
+            velgradp(ii,jj) += dbsa(ii) * dnsa(jj) * gamdot(is,i,j,k);
+          }
+        }
+      } // end for is
+
+      // increment
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          disgradincp(ii,jj) = velgradp(ii,jj)*tdot;
+        }
+      }
+
+      // matrix exponential of increment
+      matrix_exp(disgradincp.pointer(), expdisgradincp.pointer());
+
+      // update 
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          defgradpnew(ii,jj) = 0.0;
+          for (int kk = 1; kk <= 3; kk++) {
+            defgradpnew(ii,jj) += expdisgradincp(ii,kk)*defgradp(kk,jj,i,j,k);
+          }
+        }
+      }
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          defgradp(ii,jj,i,j,k) = defgradpnew(ii,jj);
+        }
+      }
+
+
+    } // end if (igas(iph) == 0) 
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/update_el_stiff.cpp b/src/LS-EVPFFT/src/update_el_stiff.cpp
new file mode 100644
index 000000000..d35f83cfb
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_el_stiff.cpp
@@ -0,0 +1,116 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "math_functions.h"
+#include "determinant33.h"
+#include "defgrad_dcmp.h"
+
+void EVPFFT::update_el_stiff()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(k, 1, npts3+1,
+                j, 1, npts2+1,
+                i, 1, npts1+1, {
+
+    real_t detFe;
+    real_t dum;
+
+    // thread private arrays
+    real_t aa_[3*3];
+    real_t aat_[3*3];
+    real_t defgradetmp_[3*3];
+    real_t caux3333_[3*3*3*3];
+    real_t caux66_[6*6];
+    real_t F_[3*3];
+    real_t V_[3*3];
+    real_t R_[3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal aa(aa_,3,3);
+    ViewMatrixTypeReal aat(aat_,3,3);
+    ViewMatrixTypeReal defgradetmp(defgradetmp_,3,3);
+    ViewMatrixTypeReal caux3333(caux3333_,3,3,3,3);
+    ViewMatrixTypeReal caux66(caux66_,6,6);
+    ViewMatrixTypeReal F(F_,3,3);
+    ViewMatrixTypeReal V(V_,3,3);
+    ViewMatrixTypeReal R(R_,3,3);
+
+    int iph;
+
+    iph = jphase(i,j,k);
+
+    if (igas(iph) == 0) {
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          F(ii,jj) = defgrade(ii,jj,i,j,k);
+        }
+      }
+      defgrad_dcmp(F.pointer(), V.pointer(), R.pointer());
+      
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          aa(ii,jj) = 0.0;
+          for (int kk = 1; kk <= 3; kk++) {
+            // aa(ii,jj) += defgrade(ii,kk,i,j,k)*ag(kk,jj,i,j,k);
+            aa(ii,jj) += R(ii,kk)*ag(kk,jj,i,j,k);
+          }
+        }
+      }
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          aat(ii,jj) = aa(jj,ii);
+          defgradetmp(ii,jj) = defgrade(ii,jj,i,j,k);
+        }
+      }
+
+      // determinant33(defgradetmp.pointer(), detFe);
+
+      for (int i1 = 1; i1 <= 3; i1++) {
+        for (int j1 = 1; j1 <= 3; j1++) {
+          for (int k1 = 1; k1 <= 3; k1++) {
+            for (int l1 = 1; l1 <= 3; l1++) {
+            
+              dum = 0.0;
+              
+              for (int i2 = 1; i2 <= 3; i2++) {
+                for (int j2 = 1; j2 <= 3; j2++) {
+                  for (int k2 = 1; k2 <= 3; k2++) {
+                    for (int l2 = 1; l2 <= 3; l2++) {
+                    
+                      dum += aa(i1,i2) *
+                             aa(j1,j2) *
+                             cc(i2,j2,k2,l2,iph) * 
+                             aat(k2,k1) * 
+                             aat(l2,l1);
+                             
+                    }
+                  }
+                }
+              }
+                            
+              // caux3333(i1,j1,k1,l1) = dum/detFe;
+              caux3333(i1,j1,k1,l1) = dum;
+             
+            }
+          }
+        }
+      }
+            
+      cb.chg_basis_4(caux66.pointer(), caux3333.pointer(), 4, 6, cb.B_basis_device_pointer());
+
+      for (int ii = 1; ii <= 6; ii++) {
+        for (int jj = 1; jj <= 6; jj++) {
+          cg66(ii,jj,i,j,k) = caux66(ii,jj);
+        }
+      }
+
+    } // end if (igas(iph) == 0) 
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/update_grid_velgrad.cpp b/src/LS-EVPFFT/src/update_grid_velgrad.cpp
new file mode 100644
index 000000000..eda96228f
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_grid_velgrad.cpp
@@ -0,0 +1,251 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "math_functions.h"
+#include "reduction_data_structures.h"
+
+void EVPFFT::update_grid_velgrad()
+{
+  Profiler profiler(__FUNCTION__);
+
+  const size_t n = 9; // for velgradrefavg (9)
+  ArrayType <real_t, n> all_reduce;
+
+  MatrixTypeRealDual velgradrefavg(3,3);
+  
+  const real_t twopi = 8.*ATAN(1.0);
+
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t,n> & loc_reduce) {
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        velgradref(ii,jj,i,j,k) = 0.0;
+        for (int kk = 1; kk <= 3; kk++) {
+          velgradref(ii,jj,i,j,k) += (velgrad(ii,kk,i,j,k))*defgrad(kk,jj,i,j,k);
+        }
+      }
+    }
+
+    // averages packed in array
+    int ic;
+    ic = -1;
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        ic = ic + 1;
+        loc_reduce.array[ic] += velgradref(ii,jj,i,j,k) * wgtc(i,j,k);
+      }
+    }
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  // averages unpacked from array
+  int ic;
+  ic = -1;
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+      ic = ic + 1;
+      velgradrefavg.host(ii,jj) = all_reduce.array[ic];
+    }
+  }
+  velgradrefavg.update_device();
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+
+      // prep for forward FFT
+      FOR_ALL_CLASS(k, 1, npts3+1,
+                    j, 1, npts2+1,
+                    i, 1, npts1+1, {
+        data(i,j,k) = velgradref(ii,jj,i,j,k);
+      }); // end FOR_ALL_CLASS
+      Kokkos::fence();
+
+#if defined USE_FFTW || USE_MKL
+      data.update_host();
+
+      // perform forward FFT
+      fft->forward(data.host_pointer(), (std::complex<double>*) data_cmplx.host_pointer());
+      data_cmplx.update_device();
+#else
+      // perform forward FFT
+      fft->forward(data.device_pointer(), (std::complex<double>*) data_cmplx.device_pointer());
+#endif
+
+      // write result to ouput
+      FOR_ALL_CLASS(k, 1, npts3_cmplx+1,
+                    j, 1, npts2_cmplx+1,
+                    i, 1, npts1_cmplx+1, {
+        work(ii,jj,i,j,k)   = data_cmplx(1,i,j,k);
+        workim(ii,jj,i,j,k) = data_cmplx(2,i,j,k);
+      }); // end FOR_ALL_CLASS
+      Kokkos::fence();
+
+    } // end for jj
+  } // end for ii
+
+  FOR_ALL_CLASS(k, 1, npts3_cmplx+1,
+                j, 1, npts2_cmplx+1,
+                i, 1, npts1_cmplx+1, {
+
+    real_t xkxk;
+
+    // thread private arrays
+    real_t xk_[3];
+    real_t velhatr_[3];
+    real_t velhatim_[3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal xk(xk_,3);
+    ViewMatrixTypeReal velhatr(velhatr_,3);
+    ViewMatrixTypeReal velhatim(velhatim_,3);
+
+    xk(1) = twopi*xk_gb(i);
+    xk(2) = twopi*yk_gb(j);
+    xk(3) = twopi*zk_gb(k);
+
+    xkxk = xk(1)*xk(1) + xk(2)*xk(2) + xk(3)*xk(3);
+
+    if (xkxk > 0.0) {
+
+      for (int ii = 1; ii <= 3; ii++) {
+        velhatr(ii) = 0.0;
+        velhatim(ii) = 0.0;
+        for (int jj = 1; jj <= 3; jj++) {
+          velhatr(ii) += workim(ii,jj,i,j,k)*xk(jj)/xkxk;
+          velhatim(ii) += - work(ii,jj,i,j,k)*xk(jj)/xkxk;
+        }
+      }
+
+    } else {
+
+      for (int ii = 1; ii <= 3; ii++) {
+        velhatr(ii) = 0.0;
+        velhatim(ii) = 0.0;
+      }
+    }
+
+    for (int ii = 1; ii <= 3; ii++) {
+      work(ii,1,i,j,k) = velhatr(ii);
+      workim(ii,1,i,j,k) = velhatim(ii);
+    }
+
+  }); // end FOR_ALL_CLASS
+
+  int jj;
+  jj = 1;
+  for (int ii = 1; ii <= 3; ii++) {
+
+    // prep for backward FFT
+    FOR_ALL_CLASS(k, 1, npts3_cmplx+1,
+                  j, 1, npts2_cmplx+1,
+                  i, 1, npts1_cmplx+1, {
+      data_cmplx(1,i,j,k) = work(ii,jj,i,j,k);
+      data_cmplx(2,i,j,k) = workim(ii,jj,i,j,k);
+    }); // end FOR_ALL_CLASS
+    Kokkos::fence();
+
+#if defined USE_FFTW || USE_MKL
+    data_cmplx.update_host();
+
+    // perform backward FFT
+    fft->backward((std::complex<double>*) data_cmplx.host_pointer(), data.host_pointer());
+    data.update_device();
+#else
+    // perform backward FFT
+    fft->backward((std::complex<double>*) data_cmplx.device_pointer(), data.device_pointer());
+#endif
+
+    // write result to ouput
+    FOR_ALL_CLASS(k, 1, npts3+1,
+                  j, 1, npts2+1,
+                  i, 1, npts1+1, {
+      work(ii,jj,i,j,k) = data(i,j,k);
+    }); // end FOR_ALL_CLASS
+    Kokkos::fence();
+
+  } // end for ii
+
+
+  FOR_ALL_CLASS(k, 1, npts3+2,
+                j, 1, npts2+2,
+                i, 1, npts1+2, {
+
+    int iv1;
+    int iv2;
+    int jv1;
+    int jv2;
+    int kv1;
+    int kv2;
+
+    // thread private arrays
+    real_t dvnode_[3];
+    real_t Xnode_ref_[3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal dvnode(dvnode_,3);
+    ViewMatrixTypeReal Xnode_ref(Xnode_ref_,3);
+
+    if (i == 1) {
+      iv1 = npts1;
+      iv2 = 1;
+    } else if (i == npts1 + 1) {
+      iv1 = npts1;
+      iv2 = 1;
+    } else {
+      iv1 = i - 1;
+      iv2 = i;
+    }
+
+    if (j == 1) {
+      jv1 = npts2;
+      jv2 = 1;
+    } else if (j == npts2 + 1) {
+      jv1 = npts2;
+      jv2 = 1;
+    } else {
+      jv1 = j - 1;
+      jv2 = j;
+    }
+
+    if (k == 1) {
+      kv1 = npts3;
+      kv2 = 1;
+    } else if (k == npts3 + 1) {
+      kv1 = npts3;
+      kv2 = 1;
+    } else {
+      kv1 = k - 1;
+      kv2 = k;
+    }
+
+    Xnode_ref(1) = float(i) - 0.5;
+    Xnode_ref(2) = float(j) - 0.5;
+    Xnode_ref(3) = float(k) - 0.5;
+
+    for (int ii = 1; ii <= 3; ii++) {
+      dvnode(ii) = 0.125*(work(ii,jj,iv1,jv1,kv1) + work(ii,jj,iv2,jv1,kv1) + 
+       work(ii,jj,iv1,jv2,kv1) + work(ii,jj,iv2,jv2,kv1) +
+       work(ii,jj,iv1,jv1,kv2) + work(ii,jj,iv2,jv1,kv2) + 
+       work(ii,jj,iv1,jv2,kv2) + work(ii,jj,iv2,jv2,kv2));
+    }
+
+    for (int ii = 1; ii <= 3; ii++) {
+      xnode(ii,i,j,k) += dvnode(ii)*tdot;
+      for (int jj = 1; jj <= 3; jj++) {
+        xnode(ii,i,j,k) += velgradrefavg(ii,jj)*Xnode_ref(jj)*tdot;
+      }
+    }
+
+  }); // end FOR_ALL_CLASS
+  Kokkos::fence();
+  
+}
diff --git a/src/LS-EVPFFT/src/update_orient.cpp b/src/LS-EVPFFT/src/update_orient.cpp
new file mode 100644
index 000000000..644935fc5
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_orient.cpp
@@ -0,0 +1,136 @@
+#include "evpfft.h"
+#include "orient.h"
+#include "reduction_data_structures.h"
+
+#include "utilities.h"
+
+
+
+void EVPFFT::update_orient()
+{
+
+  // reduction is performed on rslbar and rlcbar
+  // Note: rslbar = all_reduce[0]
+  //       rlcbar = all_reduce[1]
+  ArrayType <real_t, 2> all_reduce;
+
+
+  Kokkos::parallel_reduce(
+    Kokkos::MDRangePolicy<Kokkos::Rank<3,LOOP_ORDER,LOOP_ORDER>>({1,1,1}, {npts3+1,npts2+1,npts1+1}),
+    KOKKOS_CLASS_LAMBDA(const int k, const int j, const int i, ArrayType <real_t, 2> & loc_reduce) {
+
+    int iph;
+
+    // thread private arrays
+    real_t aa_[3*3];
+    real_t distor_[3*3];
+    real_t dnsa_[3];
+    real_t dbsa_[3];
+    real_t rotslip_[3*3];
+    real_t rotloc_[3*3];
+    real_t rot_[3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal aa(aa_,3,3);
+    ViewMatrixTypeReal distor(distor_,3,3);
+    ViewMatrixTypeReal dnsa(dnsa_,3);
+    ViewMatrixTypeReal dbsa(dbsa_,3);
+    ViewMatrixTypeReal rotslip(rotslip_,3,3);
+    ViewMatrixTypeReal rotloc(rotloc_,3,3);
+    ViewMatrixTypeReal rot(rot_,3,3);
+
+
+    iph = jphase(i,j,k);
+
+    if (igas(iph) == 0) {
+//
+//      LOCAL ROTATION RATE: ANTISYM(VELGRAD)
+//
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          rotloc(ii,jj) = (velgrad(ii,jj,i,j,k) - velgrad(jj,ii,i,j,k)) / 2.0;
+        }
+      }
+//
+//      SLIP ROTATION RATE
+//
+
+      for (int jj = 1; jj <= 3; jj++) {
+        for (int ii = 1; ii <= 3; ii++) {
+          aa(ii,jj) = ag(ii,jj,i,j,k);
+          distor(ii,jj) = 0.0;
+        }
+      }
+
+      for (int is = 1; is <= nsyst(iph); is++) {
+        for (int ii = 1; ii <= 3; ii++) {
+            dnsa(ii) = 0.0;
+            dbsa(ii) = 0.0;
+          for (int jj = 1; jj <= 3; jj++) {
+            dnsa(ii) += aa(ii,jj) * dnca(jj,is,iph);
+            dbsa(ii) += aa(ii,jj) * dbca(jj,is,iph);
+          }
+        }
+
+        for (int ii = 1; ii <= 3; ii++) {
+          for (int jj = 1; jj <= 3; jj++) {
+            distor(ii,jj) += dbsa(ii) * dnsa(jj) * gamdot(is,i,j,k);
+          }
+        }
+      } // end for is
+
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          rotslip(ii,jj) = (distor(ii,jj)-distor(jj,ii)) / 2.0;
+        }
+      }
+
+//
+//      AVERAGE ROTATION RATE
+//
+      loc_reduce.array[0] += SQRT(POW2(rotslip(3,2)) + POW2(rotslip(1,3)) + POW2(rotslip(2,1))) * wgtc(i,j,k);
+      loc_reduce.array[1] += SQRT(POW2(rotloc(3,2)) + POW2(rotloc(1,3)) + POW2(rotloc(2,1))) * wgtc(i,j,k);
+
+//
+//      TOTAL ROTATION
+//
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          rot(ii,jj) = (tomtot(ii,jj)+rotloc(ii,jj)-rotslip(ii,jj)) * tdot;
+        }
+      }
+
+//
+//      REORIENTATION
+//
+      orient(aa.pointer(), rot.pointer());
+
+//
+//      UPDATE ORIENTATION MATRIX
+//
+      for (int ii = 1; ii <= 3; ii++) {
+        for (int jj = 1; jj <= 3; jj++) {
+          ag(ii,jj,i,j,k) = aa(ii,jj);
+        }
+      }
+
+    } // end if (igas(iph) == 0)
+
+  }, all_reduce);
+  Kokkos::fence(); // needed to prevent race condition
+
+  MPI_Allreduce(MPI_IN_PLACE, all_reduce.array, all_reduce.size, MPI_REAL_T, MPI_SUM, mpi_comm);
+
+  real_t rslbar = all_reduce.array[0];
+  real_t rlcbar = all_reduce.array[1];
+
+#ifndef ABSOLUTE_NO_OUTPUT
+  if (0 == my_rank) {
+    printf("\n");
+    printf(" AVERAGE PLASTIC ROTATION = %24.14E\n", rslbar);
+    printf(" AVERAGE LOCAL ROTATION = %24.14E\n", rlcbar);
+  }
+#endif
+
+}
diff --git a/src/LS-EVPFFT/src/update_schmid.cpp b/src/LS-EVPFFT/src/update_schmid.cpp
new file mode 100644
index 000000000..67a8226ad
--- /dev/null
+++ b/src/LS-EVPFFT/src/update_schmid.cpp
@@ -0,0 +1,140 @@
+#include <iostream>
+#include <stdio.h>
+#include "evpfft.h"
+#include "utilities.h"
+#include "Profiler.h"
+#include "math_functions.h"
+
+void EVPFFT::update_schmid()
+{
+  Profiler profiler(__FUNCTION__);
+
+  FOR_ALL_CLASS(kk, 1, npts3+1,
+                jj, 1, npts2+1,
+                ii, 1, npts1+1, {
+
+    for (int is = 1; is <= NSYSMX; is++) {
+      for (int j = 1; j <= 5; j++) {
+        sch(j,is,ii,jj,kk)    = 0.0;
+#ifdef NON_SCHMID_EFFECTS
+        schnon(j,is,ii,jj,kk) = 0.0;
+#endif
+      } // end for j
+    } // end for is
+  });
+
+
+  FOR_ALL_CLASS(kk, 1, npts3+1,
+                jj, 1, npts2+1,
+                ii, 1, npts1+1, {
+
+    // thread private arrays
+    real_t aux5_[5];
+    real_t aux33_[3*3]; 
+    real_t aux33r_[3*3];
+    real_t aux33rsym_[3*3];
+    real_t aa_[3*3];
+    real_t aainv_[3*3];
+
+    // create views of thread private arrays
+    ViewMatrixTypeReal aux5    (aux5_,5);
+    ViewMatrixTypeReal aux33   (aux33_,3,3);
+    ViewMatrixTypeReal aux33r  (aux33r_,3,3);
+    ViewMatrixTypeReal aux33rsym  (aux33rsym_,3,3);
+    ViewMatrixTypeReal aa  (aa_,3,3);
+    ViewMatrixTypeReal aainv  (aainv_,3,3);
+    
+
+    int jph;
+    jph = jphase(ii,jj,kk);
+
+    if (igas(jph) == 0) {
+
+      for (int i = 1; i <= 3; i++) {
+        for (int j = 1; j <= 3; j++) {
+          aa(i,j) = 0.0;
+          for (int k = 1; k <= 3; k++) {
+            aa(i,j) += defgrade(i,k,ii,jj,kk)*ag(k,j,ii,jj,kk);
+          } // end for k
+          aainv(i,j) = aa(i,j);
+        } // end for j
+      } // end for i
+
+      invert_matrix <3> (aainv.pointer());
+
+      for (int is = 1; is <= nsyst(jph); is++) {
+
+//        for (int j = 1; j <= 5; j++) {
+//          aux5(j) = schca(j,is,jph);
+//        } // end for j
+//
+//        cb.chg_basis_1(aux5.pointer(), aux33.pointer(), 1, 5, cb.B_basis_device_pointer());
+
+        for (int i = 1; i <= 3; i++) {
+          for (int j = 1; j <= 3; j++) {
+            aux33(i,j) = dbca(i,is,jph)*dnca(j,is,jph);
+          } // end for j
+        } // end for i
+        
+        for (int i = 1; i <= 3; i++) {
+          for (int j = 1; j <= 3; j++) {
+            aux33r(i,j) = 0.0;
+            for (int i1 = 1; i1 <= 3; i1++) {
+              for (int j1 = 1; j1 <= 3; j1++) {
+                aux33r(i,j) += aa(i,i1)*
+                               aux33(i1,j1)*aainv(j1,j);
+              } // end for j1
+            } // end for i1
+          } // end for j
+        } // end for i
+
+        for (int i = 1; i <= 3; i++) {
+          for (int j = 1; j <= 3; j++) {
+            aux33rsym(i,j) = 0.5*(aux33r(i,j) + aux33r(j,i));
+          } // end for j
+        } // end for i
+
+        cb.chg_basis_2(aux5.pointer(), aux33rsym.pointer(), 2, 5, cb.B_basis_device_pointer());
+
+        for (int j = 1; j <= 5; j++) {
+          sch(j,is,ii,jj,kk) = aux5(j);
+        } // end for j
+
+#ifdef NON_SCHMID_EFFECTS
+          for (int j = 1; j <= 5; j++) {
+            aux5(j) = schcnon(j,is,jph);
+          } // end for j
+
+          cb.chg_basis_1(aux5.pointer(), aux33.pointer(), 1, 5, cb.B_basis_device_pointer());
+
+          for (int i = 1; i <= 3; i++) {
+            for (int j = 1; j <= 3; j++) {
+              aux33r(i,j) = 0.0;
+              for (int i1 = 1; i1 <= 3; i1++) {
+                for (int j1 = 1; j1 <= 3; j1++) {
+                  aux33r(i,j) += aa(i,i1)*
+                                 aux33(i1,j1)*aainv(j1,j);
+                } // end for j1
+              } // end for i1
+            } // end for j
+          } // end for i
+
+          for (int i = 1; i <= 3; i++) {
+            for (int j = 1; j <= 3; j++) {
+              aux33rsym(i,j) = 0.5*(aux33r(i,j) + aux33r(j,i));
+            } // end for j
+          } // end for i
+
+          cb.chg_basis_2(aux5.pointer(), aux33rsym.pointer(), 2, 5, cb.B_basis_device_pointer());
+          
+          for (int j = 1; j <= 5; j++) {
+            schnon(j,is,ii,jj,kk) = aux5(j);
+          } // end for j
+#endif
+
+      } // end for is 
+    }  // end if (igas(jph) == 0)
+
+  }); // end FOR_ALL_CLASS
+
+}
diff --git a/src/LS-EVPFFT/src/utilities.cpp b/src/LS-EVPFFT/src/utilities.cpp
new file mode 100644
index 000000000..209cd3b11
--- /dev/null
+++ b/src/LS-EVPFFT/src/utilities.cpp
@@ -0,0 +1,71 @@
+#include <stdio.h>
+#include <iostream>
+#include "utilities.h"
+
+int get_mpi_comm_rank(const MPI_Comm mpi_comm)
+{
+    int rank;
+    MPI_Comm_rank(mpi_comm, &rank);
+    return rank;
+}
+
+int get_mpi_comm_size(const MPI_Comm mpi_comm)
+{
+    int size;
+    MPI_Comm_size(mpi_comm, &size);
+    return size;
+}
+
+// FOR DEBUGING **************************
+void check_that_file_is_open(const std::ifstream & filestream, const char* filename)
+{
+  if (!filestream)
+  {
+    throw std::runtime_error(std::string("Can't open ") + filename);
+  }
+}
+
+void check_that_file_is_open(FILE *file_ptr, const char* filename)
+{
+  if (filename == NULL)
+  {
+    throw std::runtime_error(std::string("Can't open ") + filename);
+  }
+}
+
+
+void print_array_to_file(real_t *ptr, int N, int my_rank, const char* filename)
+{
+  std::string filename_per_rank = "rank";
+  filename_per_rank += std::to_string(my_rank);
+  filename_per_rank += "_";
+  filename_per_rank += filename;
+
+  FILE *my_file = NULL;
+  my_file = fopen (filename_per_rank.c_str(), "w");
+
+  for (int i = 0; i < N; i++) {
+    fprintf(my_file, "%24.14E\n", ptr[i]);
+  }
+
+  fclose(my_file);
+}
+
+
+
+void print_array_to_file(int *ptr, int N, int my_rank, const char* filename)
+{
+  std::string filename_per_rank = "rank";
+  filename_per_rank += std::to_string(my_rank);
+  filename_per_rank += "_";
+  filename_per_rank += filename;
+
+  FILE *my_file = NULL;
+  my_file = fopen (filename_per_rank.c_str(), "w");
+
+  for (int i = 0; i < N; i++) {
+    fprintf(my_file, "%13d\n", ptr[i]);
+  }
+
+  fclose(my_file);
+}
diff --git a/src/LS-EVPFFT/src/utilities.h b/src/LS-EVPFFT/src/utilities.h
new file mode 100644
index 000000000..34956b873
--- /dev/null
+++ b/src/LS-EVPFFT/src/utilities.h
@@ -0,0 +1,108 @@
+#pragma once
+#include "definitions.h"
+#include <fstream>
+#include <iomanip>
+#include "mpi.h"
+
+using namespace utils;
+
+// MPI utility functions
+int get_mpi_comm_rank(const MPI_Comm mpi_comm);
+int get_mpi_comm_size(const MPI_Comm mpi_comm);
+
+// FOR DEBUGING **************************
+void check_that_file_is_open(const std::ifstream & filestream, const char* filename);
+void check_that_file_is_open(FILE *file_ptr, const char* filename);
+
+// minval
+template <typename T>
+T minval(T *ptr, int N);
+
+// maxval
+template <typename T>
+T maxval(T *ptr, int N);
+
+// sum_array
+template <typename T>
+T sum_array(T *ptr, int N);
+
+template <typename T>
+void print_array(T* ptr, size_t dim0, size_t dim1);
+
+template <typename T>
+void print_array_to_file(T *ptr, size_t dim0, size_t dim1, size_t my_rank, const char* filename);
+
+// minval
+template <typename T>
+T minval(T *ptr, int N)
+{
+  T minval_ = ptr[0];
+  for (int i = 0; i < N; i++) {
+    if (ptr[i] < minval_) {
+      minval_ = ptr[i];
+    }
+  }
+  return minval_;
+}
+
+// maxval
+template <typename T>
+T maxval(T *ptr, int N)
+{
+  T maxval_ = ptr[0];
+  for (int i = 0; i < N; i++) {
+    if (ptr[i] > maxval_) {
+      maxval_ = ptr[i];
+    }
+  }
+  return maxval_;
+}
+
+// sum_array
+template <typename T>
+T sum_array(T *ptr, int N)
+{
+  T sum_ = 0.0;
+  for (int i = 0; i < N; i++) {
+    sum_ += ptr[i];
+  }
+  return sum_;
+}
+
+
+template <typename T>
+void print_array(T* ptr, size_t dim0, size_t dim1)
+{
+  int p = 4;
+  std::cout << std::setprecision(p) << std::scientific;
+  for (size_t i = 0; i < dim0; i++) {
+    for (size_t j = 0; j < dim1; j++) {
+      std::cout << "  " << ptr[j+(i*dim1)];
+    }
+    std::cout << std::endl;
+  }
+}
+
+template <typename T>
+void print_array_to_file(T *ptr, size_t dim0, size_t dim1, size_t my_rank, const char* filename)
+{
+  std::string filename_per_rank = "rank";
+  filename_per_rank += std::to_string(my_rank);
+  filename_per_rank += "_";
+  filename_per_rank += filename;
+
+  std::ofstream myfile;
+  myfile.open(filename_per_rank);
+
+  int p = 4;
+  myfile << std::setprecision(p) << std::scientific;
+  for (size_t i = 0; i < dim0; i++) {
+    for (size_t j = 0; j < dim1; j++) {
+      myfile << "  " << ptr[j+(i*dim1)];
+    }
+    myfile << std::endl;
+  }
+
+  myfile.close();
+}
+
diff --git a/src/LS-EVPFFT/src/vm.cpp b/src/LS-EVPFFT/src/vm.cpp
new file mode 100644
index 000000000..b56afdc70
--- /dev/null
+++ b/src/LS-EVPFFT/src/vm.cpp
@@ -0,0 +1,43 @@
+#include "vm.h"
+
+
+real_t vm(real_t *dtensor_)
+{
+  /* Returens VonMises strain */
+
+  ViewMatrixTypeReal dtensor(dtensor_,3,3);
+  MatrixTypeRealHost dt(3,3);
+
+  real_t trace;
+  real_t result;
+
+  trace = dtensor(1,1) + dtensor(2,2) + dtensor(3,3);
+
+  for (int j = 1; j <= 3; j++) {
+    for (int i = 1; i <= 3; i++) {
+      dt(i,j) = (dtensor(i,j)+dtensor(j,i))/2.0 - 
+                (i/j)*(j/i)*trace/3.0;
+    }
+  }
+
+  result = 0.0;
+  for (int j = 1; j <= 3; j++) {
+    for (int i = 1; i <= 3; i++) {
+      result += POW2(dt(i,j));
+    }
+  }
+  result = sqrt(2.0/3.0*result);
+
+  return result;
+}
+
+real_t vm_stress(real_t *stress) {
+    /* Returns VonMises stress */
+
+    real_t von_mises = sqrtf((((stress[0] - stress[4])*(stress[0] - stress[4])) 
+                       +        ((stress[4] - stress[8])*(stress[4] - stress[8]))
+                       +        ((stress[8] - stress[0])*(stress[8] - stress[0]))
+                       +   6*(pow(stress[1], 2.0)+pow(stress[5],2.0)+pow(stress[2],2.0)))/2.0);
+    return von_mises;
+}
+
diff --git a/src/LS-EVPFFT/src/vm.h b/src/LS-EVPFFT/src/vm.h
new file mode 100644
index 000000000..ee7dc7fd0
--- /dev/null
+++ b/src/LS-EVPFFT/src/vm.h
@@ -0,0 +1,6 @@
+#pragma once
+
+#include "definitions.h"
+
+real_t vm(real_t *dtensor_);
+real_t vm_stress(real_t *stress);
diff --git a/src/LS-EVPFFT/src/voigt.cpp b/src/LS-EVPFFT/src/voigt.cpp
new file mode 100644
index 000000000..027dd5369
--- /dev/null
+++ b/src/LS-EVPFFT/src/voigt.cpp
@@ -0,0 +1,94 @@
+#include "voigt.h"
+
+void voigt(real_t *c2_voigt_, real_t *c4_voigt_, int iopt)
+{
+
+  ViewMatrixTypeReal c2_voigt(c2_voigt_, 6,6);
+  ViewMatrixTypeReal c4_voigt(c4_voigt_, 3,3,3,3);
+
+  int ijv_[6*2] = {1,2,3,2,1,1,1,2,3,3,3,2};
+  real_t f_[6*6];
+  ViewMatrixTypeInt  ijv(ijv_,6,2);
+  ViewMatrixTypeReal f(f_, 6,6);
+  int i1;
+  int i2;
+  int j1;
+  int j2;
+  
+  //
+  if(iopt == 1) {
+    for (int i = 1; i <= 6; i++) {
+      i1 = ijv(i,1);
+      i2 = ijv(i,2);
+      for (int j = 1; j <= 6; j++) {
+        j1 = ijv(j,1);
+        j2 = ijv(j,2);
+        c4_voigt(i1,i2,j1,j2) = c2_voigt(i,j);
+        c4_voigt(i2,i1,j1,j2) = c2_voigt(i,j);
+        c4_voigt(i1,i2,j2,j1) = c2_voigt(i,j);
+        c4_voigt(i2,i1,j2,j1) = c2_voigt(i,j);
+      }
+    }
+  }  // end if (iopt == 1)
+  
+  //
+  if (iopt == 2) {
+    for (int i = 1; i <= 6; i++) {
+      i1 = ijv(i,1);
+      i2 = ijv(i,2);
+      for (int j = 1; j <= 6; j++) {
+        j1 = ijv(j,1);
+        j2 = ijv(j,2);
+        c2_voigt(i,j) = c4_voigt(i1,i2,j1,j2);
+      }
+    }
+  } // end if (iopt == 2)
+  
+  //
+  if (iopt == 3) {
+    for (int i = 1; i <= 6; i++) {
+      for (int j = 1; j <= 6; j++) {
+        f(i,j) = 1.0;
+        if (i > 3) f(i,j) = 0.5;
+        if (j > 3) f(i,j) = 0.5*f(i,j);
+      }
+    }
+    
+    for (int i = 1; i <= 6; i++) {
+      i1 = ijv(i,1);
+      i2 = ijv(i,2);
+      for (int j = 1; j <= 6; j++) {
+        j1 = ijv(j,1);
+        j2 = ijv(j,2);
+        c4_voigt(i1,i2,j1,j2) = f(i,j)*c2_voigt(i,j);
+        c4_voigt(i2,i1,j1,j2) = f(i,j)*c2_voigt(i,j);
+        c4_voigt(i1,i2,j2,j1) = f(i,j)*c2_voigt(i,j);
+        c4_voigt(i2,i1,j2,j1) = f(i,j)*c2_voigt(i,j);
+      }
+    }
+  } // end if (iopt == 3)
+  
+  //
+  if (iopt == 4) {
+    for (int i = 1; i <= 6; i++) {
+      i1 = ijv(i,1);
+      i2 = ijv(i,2);
+      for (int j = 1; j <= 6; j++) {
+        j1 = ijv(j,1);
+        j2 = ijv(j,2);
+        if (i < 3) {
+          c4_voigt(i1,i2,j1,j2) = c2_voigt(i,j);
+          c4_voigt(i2,i1,j1,j2) = c2_voigt(i,j);
+          c4_voigt(i1,i2,j2,j1) = c2_voigt(i,j);
+          c4_voigt(i2,i1,j2,j1) = c2_voigt(i,j);
+        } else {
+          c4_voigt(i1,i2,j1,j2) =  c2_voigt(i,j);
+          c4_voigt(i2,i1,j1,j2) = -c2_voigt(i,j);
+          c4_voigt(i1,i2,j2,j1) =  c2_voigt(i,j);
+          c4_voigt(i2,i1,j2,j1) = -c2_voigt(i,j);
+        } // end if (i < 3)
+      } // end for j
+    } // end for i
+  } // end if (iopt == 4)
+  
+}
diff --git a/src/LS-EVPFFT/src/voigt.h b/src/LS-EVPFFT/src/voigt.h
new file mode 100644
index 000000000..000fc5ff4
--- /dev/null
+++ b/src/LS-EVPFFT/src/voigt.h
@@ -0,0 +1,7 @@
+#pragma once
+
+#include "definitions.h"
+
+using namespace utils;
+
+void voigt(real_t *c2_voigt_, real_t *c4_voigt_, int iopt);
diff --git a/src/LS-EVPFFT/src/vpsc_input.cpp b/src/LS-EVPFFT/src/vpsc_input.cpp
new file mode 100644
index 000000000..9328efdd9
--- /dev/null
+++ b/src/LS-EVPFFT/src/vpsc_input.cpp
@@ -0,0 +1,341 @@
+// ********************************************************************
+//     SUBROUTINE VPSC_INPUT      --->      VERSION 31/jan/99
+//
+//     READS CHARACTERISTICS OF THE RUN: # OF PHASES, NAMES OF INPUT FILES,
+//     DEFORMATION TO BE IMPOSED, CONVERGENCE PARAMETERS, ETC.
+//     READS SINGLE CRYSTAL PROPERTIES: DEFORMATION MODES, CRSS, HARDENING
+//     READS CRYSTAL AND MORPHOLOGIC TEXTURES.
+//     INITIALIZES ARRAYS REQUIRED TO RUN VPSC.
+//     OPENS AND CLOSES INPUT FILES.   OPENS OUTPUT FILES.
+//
+//     MODIFIED 21/07/98 by CNT:
+//     INITIALIZATION RELATED TO 'ELEMENTS' IS DONE INSIDE A SINGLE BLOCK.
+// *****************************************************************************
+
+
+
+#include <iostream>
+#include <stdio.h>
+#include <assert.h>
+#include <fstream>
+#include <cmath>
+#include <string>
+#include <fstream>
+
+#include "evpfft.h"
+#include "definitions.h"
+#include "utilities.h"
+
+#include "math_functions.h"
+
+using namespace utils;
+
+
+void EVPFFT::vpsc_input()
+{
+  real_t dum;
+  real_t dummy;
+  real_t xmi;
+  real_t sigma;
+  int iph;
+
+  std::ifstream ur0;
+  std::string filetext;
+  std::string filecryspl;
+  std::string filecrysel;
+  std::string prosa;
+  std::string filethermo;
+
+// *********   INITIALIZATION BLOCK   ***************************
+  
+  ur0.open(cmd.input_filename);
+  check_that_file_is_open(ur0, cmd.input_filename.c_str());
+
+  ur0 >> NPHMX >> NMODMX >> NTWMMX >> NSYSMX; CLEAR_LINE(ur0);
+  ur0 >> npts1_g >> npts2_g >> npts3_g; CLEAR_LINE(ur0);
+
+  allocate_memory();
+
+  ur0 >> nph; CLEAR_LINE(ur0);
+
+  // THE FOLLOWING REQUIRED FOR SEVERAL ROUTINES WITH 'do iph=iphbot,iphtop'
+  iphbot = 1;
+  iphtop = nph;
+
+  // RVE DIMENSIONS
+  ur0 >> delt(1) >> delt(2) >> delt(3); CLEAR_LINE(ur0);
+  deltvol3 = pow(delt(1) * delt(2) * delt(3), 1.0/3.0);
+
+  ur0 >> prosa; CLEAR_LINE(ur0);
+  ur0 >> filetext; CLEAR_LINE(ur0);
+
+// ***************************************************************************
+//     LOOP OVER PHASES
+// ***************************************************************************
+  for (int iph = 1; iph <= nph; iph++) {
+
+    ur0 >> prosa; CLEAR_LINE(ur0);
+    ur0 >> igas.host(iph); CLEAR_LINE(ur0);
+    ur0 >> prosa; CLEAR_LINE(ur0);
+    ur0 >> filecryspl; CLEAR_LINE(ur0);
+    ur0 >> filecrysel; CLEAR_LINE(ur0);
+    
+    // READS SLIP AND TWINNING MODES FOR THE PHASE
+
+    if (igas.host(iph) == 0) {
+      data_crystal(iph, filecryspl);
+      data_crystal_elast(iph, filecrysel);
+    } // end if (igas.host(iph) == 0)
+
+  } // end for (int iph = 1; iph <= nph; iph++)
+ 
+  // update device (these variables were modified in data_crystal)
+  igas.update_device();
+  nsyst.update_device();
+  gamd0.update_device();
+  nrs.update_device();
+  tau.update_device();
+  thet.update_device();
+  dnca.update_device();
+  dbca.update_device();
+  schca.update_device();
+#ifdef NON_SCHMID_EFFECTS
+  schcnon.update_device();
+#endif
+  hard.update_device();
+
+  // READS INITIAL TEXTURE FROM FILETEXT
+  // and calculates the local elastic stiffness
+  data_grain(filetext);
+ 
+
+// ****************************************************************************
+//     READ BOUNDARY CONDITIONS ON OVERALL STRESS AND STRAIN-RATE
+// ****************************************************************************
+
+  ur0 >> prosa; CLEAR_LINE(ur0);
+  ur0 >> prosa; CLEAR_LINE(ur0);
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      ur0 >> iudot(i,j);
+    }
+    CLEAR_LINE(ur0);
+  }
+ 
+  // check that the components of iudot are correct
+  check_iudot();
+
+  CLEAR_LINE(ur0);
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      ur0 >> udot.host(i,j);
+    }
+    CLEAR_LINE(ur0);
+  }
+  //update device
+  udot.update_device();
+  Kokkos::fence();
+
+  // calculate strain-rate, rotation-rate, and dvm from udot
+  decompose_vel_grad(udot.host_pointer());
+
+  // initialize dvm used to calculate evm which is the denominatior for TOL_strain_field
+  init_dvm();
+
+  CLEAR_LINE(ur0);
+  ur0 >> iscau(1) >> iscau(6) >> iscau(5); CLEAR_LINE(ur0);
+  ur0 >> iscau(2) >> iscau(4); CLEAR_LINE(ur0);
+  ur0 >> iscau(3); CLEAR_LINE(ur0);
+
+  // check that mixed BC are correct
+  check_mixed_bc();
+
+  CLEAR_LINE(ur0);
+  ur0 >> scauchy(1,1) >> scauchy(1,2) >> scauchy(1,3); CLEAR_LINE(ur0);
+  ur0 >> scauchy(2,2) >> scauchy(2,3); CLEAR_LINE(ur0);
+  ur0 >> scauchy(3,3); CLEAR_LINE(ur0);
+
+  scauchy(3,2) = scauchy(2,3);
+  scauchy(3,1) = scauchy(1,3);
+  scauchy(2,1) = scauchy(1,2);
+
+  CLEAR_LINE(ur0);
+  ur0 >> dummy; CLEAR_LINE(ur0);
+  ur0 >> ictrl; CLEAR_LINE(ur0);
+
+  if (ictrl == -1) tdot=dummy;
+  if (ictrl == 0)  tdot=dummy/dvm;
+  if (ictrl > 0) {
+    printf("ICTRL>0 NOT IMPLEMENTED\n");
+    exit(1);
+  }
+
+  CLEAR_LINE(ur0);
+  ur0 >> nsteps; CLEAR_LINE(ur0);
+  ur0 >> error;  CLEAR_LINE(ur0);
+  ur0 >> itmax;  CLEAR_LINE(ur0);
+  ur0 >> irecover; CLEAR_LINE(ur0);
+
+  if (irecover == 1) {
+    std::ifstream f50;
+    f50.open("stress.in");
+    check_that_file_is_open(f50, "stress.in");
+  }
+
+  ur0 >> isave;   CLEAR_LINE(ur0);
+  ur0 >> iupdate; CLEAR_LINE(ur0);
+  ur0 >> iuphard; CLEAR_LINE(ur0);
+  ur0 >> iwtex;   CLEAR_LINE(ur0);
+  ur0 >> iwfields >> iwstep; CLEAR_LINE(ur0);
+
+  init_crss_voce();
+
+  ur0 >> ithermo; CLEAR_LINE(ur0);
+  if (ithermo == 1) {
+    int num_crystals = 200000; // maximum number of crystals/grains
+    eth = MatrixTypeRealHost (3,3,num_crystals);
+
+    ur0 >> filethermo; CLEAR_LINE(ur0);
+    std::ifstream f49;
+    f49.open(filethermo);
+    check_that_file_is_open(f49, filethermo.c_str());
+
+    for (int ig = 1; ig <= num_crystals; ig++) {
+      f49 >> eth(1,1,ig) >> eth(2,2,ig)
+          >> eth(3,3,ig) >> eth(2,3,ig)
+          >> eth(3,1,ig) >> eth(1,2,ig); CLEAR_LINE(ur0);
+
+      eth(3,2,ig) = eth(2,3,ig);
+      eth(1,3,ig) = eth(3,1,ig);
+      eth(2,1,ig) = eth(1,2,ig);
+    }
+
+    f49.close();
+
+  //} 
+  //else {
+  //  for (int ig = 1; ig <= num_crystals; ig++) {
+  //    for (int j = 1; j <= 3; j++) {
+  //      for (int i = 1; i <= 3; i++) {
+  //        eth(i,j,ig) = 0.0;
+  //      }
+  //    }
+  //  }
+  } // end if (ithermo == 1)
+
+  ur0.close();
+}
+
+void EVPFFT::check_iudot()
+{
+  if (iudot(1,1) + iudot(2,2) + iudot(3,3) == 2) {
+    printf("\nCHECK DIAGONAL BOUNDARY CONDITIONS IUDOT\n");
+    printf("CANNOT ENFORCE ONLY TWO DEVIATORIC COMPONENTS\n");
+    exit(1);
+  }
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      if ((i != j) && (iudot(i,j) + iudot(j,i) == 0)) {
+        printf("CHECK OFF-DIAGONAL BOUNDARY CONDITIONS IUDOT\n");
+        exit(1);
+      }
+    }
+  }
+}
+
+void EVPFFT::decompose_vel_grad(real_t* vel_grad_ptr)
+{
+//     SYMMETRIC STRAIN-RATE, ANTISYMMETRIC ROTATION-RATE TENSORS
+//     AND INDICES OF IMPOSED COMPONENTS
+
+  ViewMatrixTypeReal vel_grad(vel_grad_ptr,3,3);
+
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      dsim(i,j) = (vel_grad(i,j) + vel_grad(j,i)) / 2.0;
+      tomtot.host(i,j) = (vel_grad(i,j) - vel_grad(j,i)) / 2.0;
+    }
+  }
+  // update device
+  tomtot.update_device();
+  Kokkos::fence();
+}
+
+void EVPFFT::init_dvm()
+{
+//     WRITES STRAIN RATE DSIM(I,J) IN b-BASIS AS A 5-DIM VECTOR DBAR(K)
+  MatrixTypeRealHost dbar5 (5);
+  MatrixTypeRealHost dbar6 (6);
+  MatrixTypeRealHost dsimdev (3,3);
+  cb.chg_basis_2(dbar6.pointer(), dsim.pointer(), 2, 6, cb.B_basis_host_pointer());
+
+  for (int i = 1; i <= 5; i++) {
+    dbar5(i) = dbar6(i);
+  }
+
+  cb.chg_basis_1(dbar5.pointer(), dsimdev.pointer(), 1, 5, cb.B_basis_host_pointer());
+
+  dvm = 0.0;
+  for (int i = 1; i <= 3; i++) {
+    for (int j = 1; j <= 3; j++) {
+      dvm += POW2(dsimdev(i,j));
+    }
+  }
+  dvm = sqrt(2.0/3.0 * dvm);
+}
+
+void EVPFFT::check_mixed_bc()
+{
+  for (int i = 1; i <= 3; i++) {
+    idsim(i) = iudot(i,i);
+  }
+
+  idsim(4) = 0;
+  if ((iudot(2,3) == 1) && (iudot(3,2) == 1)) idsim(4) = 1;
+  idsim(5) = 0;
+  if ((iudot(1,3) == 1) && (iudot(3,1) == 1)) idsim(5) = 1;
+  idsim(6) = 0;
+  if ((iudot(1,2) == 1) && (iudot(2,1) == 1)) idsim(6) = 1;
+
+  for (int i = 1; i <= 6; i++) {
+    if ((iscau(i) * idsim(i) != 0) || (iscau(i) + idsim(i) != 1)) {
+      printf("CHECK BOUNDARY CONDITS ON STRAIN-RATE AND STRESS\n");
+      printf("IDSIM = ");
+      for (int j = 1; j <= 6; j++) printf("%d ", idsim(j));
+      printf("\n");
+
+      printf("ISCAU = ");
+      for (int j = 1; j <= 6; j++) printf("%d ", iscau(j));
+      printf("\n");
+
+      exit(1);
+    }
+  }
+}
+
+void EVPFFT::init_crss_voce()
+{
+  for (int kk = 1; kk <= npts3; kk++) {
+    for (int jj = 1; jj <= npts2; jj++) {
+      for (int ii = 1; ii <= npts1; ii++) {
+
+        int iph = jphase.host(ii,jj,kk);
+
+        if (igas.host(iph) == 0) {
+          for (int i = 1; i <= nsyst.host(iph); i++) {
+            crss.host(i,1,ii,jj,kk) = tau.host(i,1,iph);
+            crss.host(i,2,ii,jj,kk) = tau.host(i,2,iph);
+            //trialtau.host(i,1,ii,jj,kk)  = tau.host(i,1,iph);
+            //trialtau.host(i,2,ii,jj,kk)  = tau.host(i,2,iph);
+          }
+        } // end if (igas.host(iph) == 0)
+
+      }  // end for ii
+    }  // end for jj
+  } // end for kk
+
+  crss.update_device();
+}
+
diff --git a/src/LS-EVPFFT/src/write_macro_state.cpp b/src/LS-EVPFFT/src/write_macro_state.cpp
new file mode 100644
index 000000000..ee13dc2d4
--- /dev/null
+++ b/src/LS-EVPFFT/src/write_macro_state.cpp
@@ -0,0 +1,60 @@
+#include "evpfft.h"
+#include "Profiler.h"
+
+void EVPFFT::write_macro_state()
+{
+  Profiler profiler(__FUNCTION__);
+
+  if (my_rank == 0) {
+    fprintf(ofile_mgr.vm_file.get(), "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e\n",
+            evm,evmp,dvm,dvmp,svm,svm1);
+
+#if 0
+    fprintf(str_str_file, 
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e\n",
+            disgradmacro.host(1,1),disgradmacro.host(2,2),
+            disgradmacro.host(3,3),disgradmacro.host(2,3),
+            disgradmacro.host(1,3),disgradmacro.host(1,2),
+            scauav(1,1),scauav(2,2),scauav(3,3),
+            scauav(2,3),scauav(1,3),scauav(1,2),
+            epav(1,1),epav(2,2),epav(3,3),
+            epav(2,3),epav(1,3),epav(1,2),
+            velgradmacro.host(1,1),velgradmacro.host(2,2),velgradmacro.host(3,3),
+            velgradmacro.host(2,3),velgradmacro.host(1,3),velgradmacro.host(1,2),
+            edotpav(1,1),edotpav(2,2),edotpav(3,3),
+            edotpav(2,3),edotpav(1,3),edotpav(1,2),
+            evm,evmp,dvm,dvmp,svm);
+#endif
+    // writing a line in str_str_file was split into multiple writes to avoid
+    // some strange error I cant figure out.
+    fprintf(ofile_mgr.str_str_file.get(),
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,",
+            disgradmacro.host(1,1),disgradmacro.host(2,2),
+            disgradmacro.host(3,3),disgradmacro.host(2,3),
+            disgradmacro.host(1,3),disgradmacro.host(1,2),
+            scauav(1,1),scauav(2,2),scauav(3,3),
+            scauav(2,3),scauav(1,3),scauav(1,2) );
+
+    fprintf(ofile_mgr.str_str_file.get(),
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,",
+            epav(1,1),epav(2,2),epav(3,3),
+            epav(2,3),epav(1,3),epav(1,2),
+            velgradmacro.host(1,1),velgradmacro.host(2,2),velgradmacro.host(3,3),
+            velgradmacro.host(2,3),velgradmacro.host(1,3),velgradmacro.host(1,2) );
+
+    fprintf(ofile_mgr.str_str_file.get(),
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,%011.4e,"
+            "%011.4e,%011.4e,%011.4e,%011.4e,%011.4e\n",
+            edotpav(1,1),edotpav(2,2),edotpav(3,3),
+            edotpav(2,3),edotpav(1,3),edotpav(1,2),
+            evm,evmp,dvm,dvmp,svm );
+  } // end if (my_rank == 0)
+  return;
+}
diff --git a/src/LS-EVPFFT/src/write_micro_state.cpp b/src/LS-EVPFFT/src/write_micro_state.cpp
new file mode 100644
index 000000000..26dac7640
--- /dev/null
+++ b/src/LS-EVPFFT/src/write_micro_state.cpp
@@ -0,0 +1,306 @@
+#include "evpfft.h"
+#include "XdmfUniformGridWriter.h"
+#include "math_functions.h"
+#include <stdio.h>
+#include <array>
+#include "Profiler.h"
+#include "vm.h"
+
+template <typename T>
+void write_distinct_components_of_3x3xN1xN2xN3(T *data_ptr, int npts1, int npts2, int npts3,
+  const char *data_name, const char* group_name, std::shared_ptr<MicroOutputWriter<MPI_ORDER_FORTRAN>> micro_writer,
+  XdmfUniformGridWriter &xdmf_writer, const char* hdf5_filename, int my_rank);
+
+
+template <typename T>
+void write_distinct_components_of_3x3xN1xN2xN3(T *data_ptr, int npts1, int npts2, int npts3,
+  const char *data_name, const char* group_name, std::shared_ptr<MicroOutputWriter<MPI_ORDER_FORTRAN>> micro_writer,
+  XdmfUniformGridWriter &xdmf_writer, const char* hdf5_filename, int my_rank)
+{
+  /* This function assumes data_ptr has a Fortran Array Order */
+
+  ViewFMatrix <T> data_view (data_ptr,3,3,npts1,npts2,npts3);
+  FMatrix <T> buffer (npts1,npts2,npts3);
+  char dataset_name [50];
+  char dataset_full_path [200];
+
+  // for xdmf_writer
+  char aLocation [200];
+  std::string aNumberType;
+  std::string aPrecision;
+  xdmf_writer.deduce_attribute_number_type <T> (data_ptr, aNumberType, aPrecision);
+
+  for (int ii = 1; ii <= 3; ii++) {
+    for (int jj = 1; jj <= 3; jj++) {
+
+      if (ii <= jj) {
+        
+        for (int k = 1; k <= npts3; k++) {
+          for (int j = 1; j <= npts2; j++) {
+            for (int i = 1; i <= npts1; i++) {         
+              buffer(i,j,k) = data_view(ii,jj,i,j,k);
+            } // end for k
+          } // end for j
+        } // end for i
+
+        sprintf (dataset_name, "%s_%d%d", data_name, ii, jj);        
+        sprintf (dataset_full_path, "%s/%s", group_name, dataset_name);
+        micro_writer->write <T> (dataset_full_path, buffer.pointer());
+
+        if (0 == my_rank) {
+          sprintf (aLocation, "%s:%s", hdf5_filename, dataset_full_path);
+          xdmf_writer.write_attribute(dataset_name, "Scalar", aNumberType.c_str(), aPrecision.c_str(), aLocation);
+        }
+
+      } // end if (ii <= jj)
+
+    } // end for jj
+  } // end for ii
+}
+
+template <typename T>
+void write_N1xN2xN3(T *data_ptr, int npts1, int npts2, int npts3,
+  const char *data_name, const char* group_name, std::shared_ptr<MicroOutputWriter<MPI_ORDER_FORTRAN>> micro_writer,
+  XdmfUniformGridWriter &xdmf_writer, const char* hdf5_filename, int my_rank)
+{
+  /* This function assumes data_ptr has a Fortran Array Order */
+
+  ViewFMatrix <T> data_view (data_ptr,npts1,npts2,npts3);
+  char dataset_full_path [200];
+  std::string dataset_name = data_name;
+
+  // for xdmf_writer
+  char aLocation [200];
+  std::string aNumberType;
+  std::string aPrecision;
+  xdmf_writer.deduce_attribute_number_type <T> (data_ptr, aNumberType, aPrecision);
+
+  sprintf (dataset_full_path, "%s/%s", group_name, dataset_name.c_str());
+  micro_writer->write <T> (dataset_full_path, data_ptr);
+
+  if (0 == my_rank) {
+    sprintf (aLocation, "%s:%s", hdf5_filename, dataset_full_path);
+    xdmf_writer.write_attribute(dataset_name.c_str(), "Scalar", aNumberType.c_str(), aPrecision.c_str(), aLocation);
+  }
+
+}
+
+void calculate_vm_field(MatrixTypeRealDual field, MatrixTypeRealDual vm_field, int vm_type)
+{
+ /* 
+ * vm_type 0: stress 
+ * vm_type 1: strain
+ * */
+
+  if (vm_type != 0 && vm_type != 1) {
+    throw std::invalid_argument("vm_type must be 0 or 1");
+  }
+
+  int npts1 = field.dims(3);
+  int npts2 = field.dims(4);
+  int npts3 = field.dims(5);
+
+  FOR_ALL(
+          k, 1, npts3+1,
+          j, 1, npts2+1,
+          i, 1, npts1+1, {
+
+    real_t aux_[3*3];
+    ViewMatrixTypeReal aux(aux_,3,3);
+
+    for (int ii = 1; ii <= 3; ii++) {
+      for (int jj = 1; jj <= 3; jj++) {
+        aux(ii,jj) = field(ii,jj,i,j,k);
+      }
+    }
+    
+    if (vm_type == 0) {
+      vm_field(i,j,k) = vm_stress(aux.pointer());
+    }
+    else if (vm_type == 1) {
+      vm_field(i,j,k) = vm(aux.pointer());
+    }
+
+  }); // end FOR_ALL
+  Kokkos::fence();
+
+  vm_field.update_host();
+}
+
+void EVPFFT::calculate_eel(MatrixTypeRealDual &eel)
+{
+  FOR_ALL_CLASS(k, 1, npts3+1,
+          j, 1, npts2+1,
+          i, 1, npts1+1, {
+
+      int jph;
+      real_t cg66aux_ [6*6];
+      real_t cinvg_ [3*3*3*3];
+      ViewMatrixTypeReal cg66aux(cg66aux_,6,6);
+      ViewMatrixTypeReal cinvg(cinvg_,3,3,3,3);
+
+      jph = jphase(i,j,k);
+
+      if (igas(jph)  == 0) {
+
+        for (int jj = 1; jj <= 6; jj++) {
+          for (int ii = 1; ii <= 6; ii++) {
+            cg66aux(ii,jj) = cg66(ii,jj,i,j,k);
+          }
+        }
+
+        invert_matrix <6> (cg66aux.pointer());
+          
+        cb.chg_basis_3(cg66aux.pointer(), cinvg.pointer(), 3, 6, cb.B_basis_device_pointer());
+
+        for (int jj = 1; jj <= 3; jj++) {
+          for (int ii = 1; ii <= 3; ii++) {
+            eel(ii,jj,i,j,k) = 0.0;
+          }
+        }
+
+        for (int kk = 1; kk <= 3; kk++) {
+          for (int ll = 1; ll <= 3; ll++) {
+            for (int jj = 1; jj <= 3; jj++) {
+              for (int ii = 1; ii <= 3; ii++) {
+                eel(ii,jj,i,j,k) += cinvg(ii,jj,kk,ll) * sg(kk,ll,i,j,k);
+              }
+            }
+          }
+        }
+
+      } else { // igas else
+
+        for (int jj = 1; jj <= 3; jj++) {
+          for (int ii = 1; ii <= 3; ii++) {
+            eel(ii,jj,i,j,k) = -100.0;
+          }
+        }
+
+      } // end if (igas(jph)  == 0)
+
+  }); // end FOR_ALL_CLASS
+  Kokkos::fence();
+
+  eel.update_host();
+}
+
+void symmetrize(
+  const MatrixTypeRealDual& unsymmetrized_field,
+  MatrixTypeRealDual& symmetrized_field)
+{
+  size_t npts1 = unsymmetrized_field.dims(3);
+  size_t npts2 = unsymmetrized_field.dims(4);
+  size_t npts3 = unsymmetrized_field.dims(5);
+
+  FOR_ALL(
+          k, 1, npts3+1,
+          j, 1, npts2+1,
+          i, 1, npts1+1, {
+
+       for (int ii = 1; ii <= 3; ii++) {
+          for (int jj = 1; jj <= 3; jj++) {
+            symmetrized_field(ii,jj,i,j,k) = 
+              0.5*(unsymmetrized_field(ii,jj,i,j,k) + unsymmetrized_field(jj,ii,i,j,k));
+          }   
+        }
+
+  }); // end FOR_ALL
+  Kokkos::fence();
+
+  symmetrized_field.update_host(); 
+  return; 
+}
+
+void EVPFFT::write_micro_state()
+{
+  Profiler profiler(__FUNCTION__);
+
+  // xdmf file stuff
+  XdmfUniformGridWriter xdmf_writer(npts3_g, npts2_g, npts1_g);
+  if (0 == my_rank) {
+    char buffer [100];
+
+    sprintf (buffer, "micro_state_timestep_%d.xdmf", imicro); 
+    xdmf_writer.open_file(buffer);
+
+    sprintf (buffer, "TimeStep_%d", imicro);
+    xdmf_writer.write_header(buffer);
+  }
+ 
+  // create group
+  char group_name [50];
+  sprintf (group_name, "/TimeStep_%d", imicro);  
+  hid_t group_id = H5Gcreate(micro_writer->file_id_, group_name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
+
+  // calculate elastic-strain
+  MatrixTypeRealDual eel (3,3,npts1,npts2,npts3);
+  calculate_eel(eel);
+
+  // Some arrays for results
+  MatrixTypeRealDual sym_disgrad (3,3,npts1,npts2,npts3);
+  MatrixTypeRealDual sym_edotp (3,3,npts1,npts2,npts3);
+  MatrixTypeRealDual sym_eel (3,3,npts1,npts2,npts3);
+
+  // Symmetrize some fields
+  symmetrize(disgrad, sym_disgrad);
+  symmetrize(edotp, sym_edotp);
+  symmetrize(eel, sym_eel);
+  
+  // write strain field
+  disgrad.update_host();
+  write_distinct_components_of_3x3xN1xN2xN3 <real_t> (sym_disgrad.host_pointer(), npts1, npts2, npts3, "strain",
+                               group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // write stress field
+  sg.update_host();
+  write_distinct_components_of_3x3xN1xN2xN3 <real_t> (sg.host_pointer(), npts1, npts2, npts3, "stress",
+                               group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // write plastic-strain-rate field
+  edotp.update_host();
+  write_distinct_components_of_3x3xN1xN2xN3 <real_t> (sym_edotp.host_pointer(), npts1, npts2, npts3, "pl-strain-rate",
+                               group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // write elastic-strain
+  write_distinct_components_of_3x3xN1xN2xN3 <real_t> (sym_eel.host_pointer(), npts1, npts2, npts3, "el-strain",
+                               group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // write jphase
+  write_N1xN2xN3 <int> (jphase.host_pointer(), npts1, npts2, npts3, "phase_id",
+                        group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // for writing vm fields 
+  MatrixTypeRealDual vm_field (npts1,npts2,npts3);
+
+  // calculate and write strain vm field
+  calculate_vm_field(sym_disgrad, vm_field, 1);
+  write_N1xN2xN3 <real_t> (vm_field.host_pointer(), npts1, npts2, npts3, "vm-strain",
+                        group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // calculate and write stress vm field
+  calculate_vm_field(sg, vm_field, 0);
+  write_N1xN2xN3 <real_t> (vm_field.host_pointer(), npts1, npts2, npts3, "vm-stress",
+                        group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // calculate and write plastic-strain-rate vm field
+  calculate_vm_field(sym_edotp, vm_field, 1);
+  write_N1xN2xN3 <real_t> (vm_field.host_pointer(), npts1, npts2, npts3, "vm-pl-strain-rate",
+                        group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+  // calculate and write elastic-strain vm field
+  calculate_vm_field(sym_eel, vm_field, 1);
+  write_N1xN2xN3 <real_t> (vm_field.host_pointer(), npts1, npts2, npts3, "vm-el-strain",
+                        group_name, micro_writer, xdmf_writer, micro_writer->filename_, my_rank);
+
+
+
+  // close xdmf file
+  if (0 == my_rank) {
+    xdmf_writer.write_footer();
+    xdmf_writer.close_file();
+  }
+    
+  // close group
+  herr_t status = H5Gclose(group_id);
+}
diff --git a/src/LS-EVPFFT/src/write_texture.cpp b/src/LS-EVPFFT/src/write_texture.cpp
new file mode 100644
index 000000000..4a741710f
--- /dev/null
+++ b/src/LS-EVPFFT/src/write_texture.cpp
@@ -0,0 +1,72 @@
+#include "evpfft.h"
+#include "euler.h"
+#include <memory>
+#include "Profiler.h"
+
+void EVPFFT::write_texture()
+{
+  Profiler profiler(__FUNCTION__);
+
+  if(my_rank == 0) {
+
+    ag.update_host();
+
+    MatrixTypeRealHost aux33(3,3);
+    MatrixTypeRealHost fbar(3,3);
+    int ig;
+    real_t ph, th, om, one;
+
+    std::unique_ptr <FILE, decltype(&fclose)> file_24 (fopen("tex.out", "w"), &fclose);
+
+    one = 1.0;
+
+    for (int j = 1; j <= 3; j++) {
+      for (int i = 1; i <= 3; i++) {
+        fbar(i,j) = 0.0;
+      }
+    }
+
+    for (int i = 1; i <= 3; i++) {
+      fbar(i,i) = delt(i);
+    }
+
+    fprintf(file_24.get(), "TEX.OUT from FFT\n");
+    fprintf(file_24.get(), "Formatted to plot with POLE\n");
+    for (int j = 1; j <= 3; j++) {
+      for (int i = 1; i <= 3; i++) {
+        fprintf(file_24.get(), "%7.3f", fbar(i,j));
+      }
+    }
+    fprintf(file_24.get(), "\n");
+    fprintf(file_24.get(), "B %10d\n", npts1*npts2*npts3);
+
+
+    ig = 0;
+    for (int k = 1; k <= npts3; k++) {
+      for (int j = 1; j <= npts2; j++) {
+        for (int i = 1; i <= npts1; i++) {
+
+          ig += 1;
+
+          for (int jj = 1; jj <= 3; jj++) {
+            for (int ii = 1; ii <= 3; ii++) {
+              aux33(ii,jj) = ag.host(jj,ii,i,j,k);
+            }
+          }
+
+          euler(1,ph,th,om,aux33.pointer());
+
+          fprintf(file_24.get(), 
+                  "%9.3f%9.3f%9.3f%9.3f"
+                  "%5d%5d%5d%5d%5d\n",
+                  ph,th,om,one,
+                  i,j,k,jgrain(i,j,k),jphase.host(i,j,k)
+          );
+
+        } // end for ii
+      } // end for jj
+    } // end for kk
+
+  } // end if(iwtex .eq. 1)
+
+}
diff --git a/src/Parallel-Solvers/FEA_Module.h b/src/Parallel-Solvers/FEA_Module.h
index 19fc0f555..4a0a8f45f 100644
--- a/src/Parallel-Solvers/FEA_Module.h
+++ b/src/Parallel-Solvers/FEA_Module.h
@@ -114,6 +114,8 @@ class FEA_Module{
   typedef MCONN::dual_view_type::t_dev elem_conn_array;
   typedef Kokkos::View<const GO**, array_layout, HostSpace, memory_traits> const_host_elem_conn_array;
   typedef Kokkos::View<const GO**, array_layout, device_type, memory_traits> const_elem_conn_array;
+  typedef Kokkos::View<const int**, array_layout, HostSpace, memory_traits> const_host_int_array;
+  typedef Kokkos::View<const bool**, array_layout, HostSpace, memory_traits> const_host_bool_array;
   
   //initializes memory for arrays used in the global stiffness matrix assembly
   //initialize data for boundaries of the model and storage for boundary conditions and applied loads
diff --git a/src/Parallel-Solvers/FEA_Module_Inertial.cpp b/src/Parallel-Solvers/FEA_Module_Inertial.cpp
index 2d4399573..ee7e29549 100644
--- a/src/Parallel-Solvers/FEA_Module_Inertial.cpp
+++ b/src/Parallel-Solvers/FEA_Module_Inertial.cpp
@@ -908,7 +908,7 @@ void FEA_Module_Inertial::compute_moment_gradients(const_host_vec_array design_v
 
   //initialize design gradients to 0
   for(int init = 0; init < nlocal_nodes; init++)
-    design_gradients(init,moment_component) = 0;
+    design_gradients(init,0) = 0;
 
   //loop over elements and use quadrature rule to compute volume from Jacobian determinant
   for(int ielem = 0; ielem < rnum_elem; ielem++){
@@ -1047,7 +1047,7 @@ void FEA_Module_Inertial::compute_moment_gradients(const_host_vec_array design_v
       for(int node_loop=0; node_loop < elem->num_basis(); node_loop++){
         if(map->isNodeGlobalElement(nodes_in_elem(ielem, node_loop))){
           local_node_id = map->getLocalElement(nodes_in_elem(ielem, node_loop));
-            design_gradients(local_node_id,moment_component)+=weight_multiply*basis_values(node_loop)*current_position(moment_component)*Jacobian;
+            design_gradients(local_node_id,0)+=weight_multiply*basis_values(node_loop)*current_position(moment_component)*Jacobian;
         }
       }
     }
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/CMakeLists.txt b/src/Parallel-Solvers/Implicit-Lagrange/CMakeLists.txt
index d80b60da8..31b6b6dd5 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/CMakeLists.txt
+++ b/src/Parallel-Solvers/Implicit-Lagrange/CMakeLists.txt
@@ -1,5 +1,5 @@
 set(Implicit_Solver_SRC Implicit_Solver.cpp)
-set(FEA_Module_SRC FEA_Physics_Modules/FEA_Module_Elasticity.cpp FEA_Physics_Modules/FEA_Module_Heat_Conduction.cpp FEA_Physics_Modules/FEA_Module_Thermo_Elasticity.cpp )
+set(FEA_Module_SRC FEA_Physics_Modules/FEA_Module_Elasticity.cpp FEA_Physics_Modules/Elasticity_Optimization_Functions.cpp FEA_Physics_Modules/FEA_Module_Heat_Conduction.cpp FEA_Physics_Modules/FEA_Module_Thermo_Elasticity.cpp )
 
 set(CMAKE_CXX_EXTENSIONS OFF)
 
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/Elasticity_Optimization_Functions.cpp b/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/Elasticity_Optimization_Functions.cpp
new file mode 100644
index 000000000..5b15371ee
--- /dev/null
+++ b/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/Elasticity_Optimization_Functions.cpp
@@ -0,0 +1,2071 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+
+#include <iostream>
+#include <fstream>
+#include <string>
+#include <sstream>
+#include <stdio.h>
+#include <stdlib.h> 
+#include <math.h>  // fmin, fmax, abs note: fminl is long
+#include <sys/stat.h>
+#include <mpi.h>
+#include <Teuchos_ScalarTraits.hpp>
+#include <Teuchos_RCP.hpp>
+#include <Teuchos_oblackholestream.hpp>
+#include <Teuchos_Tuple.hpp>
+#include <Teuchos_VerboseObject.hpp>
+#include <Teuchos_SerialDenseMatrix.hpp>
+#include <Teuchos_SerialDenseVector.hpp>
+#include <Teuchos_SerialDenseSolver.hpp>
+
+#include <Tpetra_Core.hpp>
+#include <Tpetra_Map.hpp>
+#include <Tpetra_MultiVector.hpp>
+#include <Tpetra_CrsMatrix.hpp>
+#include "Tpetra_Details_makeColMap.hpp"
+#include "Tpetra_Details_DefaultTypes.hpp"
+#include "Tpetra_Details_FixedHashTable.hpp"
+#include "Tpetra_Import.hpp"
+#include "Tpetra_Import_Util2.hpp"
+#include "MatrixMarket_Tpetra.hpp"
+
+#include "elements.h"
+#include "swage.h"
+#include "matar.h"
+#include "utilities.h"
+#include "node_combination.h"
+#include "Simulation_Parameters/FEA_Module/Elasticity_Parameters.h"
+#include "Simulation_Parameters/Simulation_Parameters.h"
+#include "Amesos2_Version.hpp"
+#include "Amesos2.hpp"
+#include "FEA_Module_Elasticity.h"
+#include "Implicit_Solver.h"
+
+//Multigrid Solver
+#include <Xpetra_Operator.hpp>
+#include <MueLu_TpetraOperator.hpp>
+#include <Xpetra_Map.hpp>
+#include <Xpetra_MultiVector.hpp>
+#include <Xpetra_IO.hpp>
+#include <MueLu.hpp>
+#include <MueLu_BaseClass.hpp>
+#ifdef HAVE_MUELU_EXPLICIT_INSTANTIATION
+#include <MueLu_ExplicitInstantiation.hpp>
+#endif
+#include <MueLu_Level.hpp>
+#include <MueLu_MutuallyExclusiveTime.hpp>
+#include <MueLu_ParameterListInterpreter.hpp>
+#include <MueLu_Utilities.hpp>
+#include <DriverCore.hpp>
+
+#define MAX_ELEM_NODES 8
+#define STRAIN_EPSILON 0.000000001
+#define BC_EPSILON 1.0e-6
+
+using namespace utils;
+
+
+/* ----------------------------------------------------------------------
+   Compute the gradient of the displacement constraint with respect to nodal densities
+------------------------------------------------------------------------- */
+
+void FEA_Module_Elasticity::compute_displacement_constraint_gradients(const_host_vec_array design_variables, const_host_vec_array target_displacements, const_host_int_array active_dofs, host_vec_array design_gradients){
+  //local variable for host view in the dual view
+  const_host_vec_array all_node_coords = all_node_coords_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  const_host_vec_array all_node_displacements = all_node_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  const_host_elem_conn_array nodes_in_elem = global_nodes_in_elem_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  
+  const_host_vec_array Element_Densities;
+  //local variable for host view of densities from the dual view
+  //bool nodal_density_flag = simparam->nodal_density_flag;
+  const_host_vec_array all_node_densities;
+  if(nodal_density_flag)
+  all_node_densities = all_node_densities_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  else
+  Element_Densities = Global_Element_Densities->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
+  int num_dim = simparam->num_dims;
+  int nodes_per_elem = elem->num_basis();
+  int num_gauss_points = simparam->num_gauss_points;
+  int z_quad,y_quad,x_quad, direct_product_count;
+  LO local_node_id, jlocal_node_id, temp_id, local_dof_id, local_dof_idx, local_dof_idy, local_dof_idz, access_index;
+  GO current_global_index, global_dof_id;
+  size_t local_nrows = nlocal_nodes*num_dim;
+  bool anisotropic_lattice = module_params->anisotropic_lattice;
+
+  direct_product_count = std::pow(num_gauss_points,num_dim);
+  real_t Element_Modulus_Gradient, Poisson_Ratio, gradient_force_density[3], Elastic_Moduli[3], Shear_Moduli[3], Poisson_Ratios[3], Lower_Poisson_Ratios[3];
+  real_t Elastic_Constant, Shear_Term, Pressure_Term;
+  real_t inner_product, matrix_term, Jacobian, invJacobian, weight_multiply;
+  //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_nodes_1D(num_gauss_points);
+  //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_weights_1D(num_gauss_points);
+  CArray<real_t> legendre_nodes_1D(num_gauss_points);
+  CArray<real_t> legendre_weights_1D(num_gauss_points);
+  real_t pointer_quad_coordinate[num_dim];
+  real_t pointer_quad_coordinate_weight[num_dim];
+  real_t pointer_interpolated_point[num_dim];
+  real_t pointer_JT_row1[num_dim];
+  real_t pointer_JT_row2[num_dim];
+  real_t pointer_JT_row3[num_dim];
+  ViewCArray<real_t> quad_coordinate(pointer_quad_coordinate,num_dim);
+  ViewCArray<real_t> quad_coordinate_weight(pointer_quad_coordinate_weight,num_dim);
+  ViewCArray<real_t> interpolated_point(pointer_interpolated_point,num_dim);
+  ViewCArray<real_t> JT_row1(pointer_JT_row1,num_dim);
+  ViewCArray<real_t> JT_row2(pointer_JT_row2,num_dim);
+  ViewCArray<real_t> JT_row3(pointer_JT_row3,num_dim);
+
+  real_t pointer_basis_values[elem->num_basis()];
+  real_t pointer_basis_derivative_s1[elem->num_basis()];
+  real_t pointer_basis_derivative_s2[elem->num_basis()];
+  real_t pointer_basis_derivative_s3[elem->num_basis()];
+  ViewCArray<real_t> basis_values(pointer_basis_values,elem->num_basis());
+  ViewCArray<real_t> basis_derivative_s1(pointer_basis_derivative_s1,elem->num_basis());
+  ViewCArray<real_t> basis_derivative_s2(pointer_basis_derivative_s2,elem->num_basis());
+  ViewCArray<real_t> basis_derivative_s3(pointer_basis_derivative_s3,elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> nodal_positions(elem->num_basis(),num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> current_nodal_displacements(elem->num_basis()*num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> current_nodal_adjoint(elem->num_basis()*num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> nodal_density(elem->num_basis());
+
+  size_t Brows;
+  if(num_dim==2) Brows = 3;
+  if(num_dim==3) Brows = 6;
+  FArrayKokkos<real_t, array_layout, device_type, memory_traits> B_matrix_contribution(Brows,num_dim*elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> B_matrix(Brows,num_dim*elem->num_basis());
+  FArrayKokkos<real_t, array_layout, device_type, memory_traits> CB_matrix_contribution(Brows,num_dim*elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> CB_matrix(Brows,num_dim*elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> C_matrix(Brows,Brows);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> Local_Matrix_Contribution(num_dim*nodes_per_elem,num_dim*nodes_per_elem);
+
+  //initialize weights
+  elements::legendre_nodes_1D(legendre_nodes_1D,num_gauss_points);
+  elements::legendre_weights_1D(legendre_weights_1D,num_gauss_points);
+  
+  real_t current_density = 1;
+
+  //initialize gradient value to zero
+  for(size_t inode = 0; inode < nlocal_nodes; inode++)
+    design_gradients(inode,0) = 0;
+
+  //solve for the adjoint vector first
+  if(!adjoints_allocated){
+    all_adjoint_displacements_distributed = Teuchos::rcp(new MV(all_dof_map, 1));
+    adjoint_displacements_distributed = Teuchos::rcp(new MV(*all_adjoint_displacements_distributed, local_dof_map));
+    adjoint_equation_RHS_distributed = Teuchos::rcp(new MV(local_dof_map, 1));
+    adjoints_allocated = true;
+  }
+  Teuchos::RCP<MV> lambda = adjoint_displacements_distributed;
+  Teuchos::RCP<Xpetra::MultiVector<real_t,LO,GO,node_type>> xlambda = Teuchos::rcp(new Xpetra::TpetraMultiVector<real_t,LO,GO,node_type>(lambda));
+  Teuchos::RCP<Xpetra::MultiVector<real_t,LO,GO,node_type>> xB = Teuchos::rcp(new Xpetra::TpetraMultiVector<real_t,LO,GO,node_type>(adjoint_equation_RHS_distributed));
+  
+  host_vec_array adjoint_equation_RHS_view = adjoint_equation_RHS_distributed->getLocalView<HostSpace>(Tpetra::Access::ReadWrite);
+  
+  const_host_vec_array lambda_view = lambda->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
+
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(active_dofs(i,0))
+      adjoint_equation_RHS_view(i,0) = -2*(all_node_displacements(i,0)-target_displacements(i,0))/
+                                          (target_displacements(i,0)*target_displacements(i,0));
+    else
+      adjoint_equation_RHS_view(i,0) = 0;
+  }
+
+  //set adjoint equation RHS terms to 0 if they correspond to a boundary constraint DOF index
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(Node_DOF_Boundary_Condition_Type(i)==DISPLACEMENT_CONDITION)
+      adjoint_equation_RHS_view(i,0) = 0;
+  }
+  //*fos << "Elastic Modulus Gradient" << Element_Modulus_Gradient <<std::endl;
+  //*fos << "DISPLACEMENT" << std::endl;
+  //all_node_displacements_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //*fos << "RHS vector" << std::endl;
+  //Global_Nodal_RHS->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //assign reduced stiffness matrix entries for linear solver
+  if(!matrix_bc_reduced){
+  LO stride_index;
+  for(LO i=0; i < local_nrows; i++){
+    if((Node_DOF_Boundary_Condition_Type(i)==DISPLACEMENT_CONDITION)){
+      for(LO j = 0; j < Stiffness_Matrix_Strides(i); j++){
+        global_dof_id = DOF_Graph_Matrix(i,j);
+        local_dof_id = all_dof_map->getLocalElement(global_dof_id);
+        Original_Stiffness_Entries(i,j) = Stiffness_Matrix(i,j);
+        Original_Stiffness_Entry_Indices(i,j) = j;
+        if(local_dof_id == i){
+          Stiffness_Matrix(i,j) = 1;
+        }
+        else{     
+          Stiffness_Matrix(i,j) = 0;
+        }
+      }//stride for
+    }
+    else{
+      stride_index = 0;
+      for(LO j = 0; j < Stiffness_Matrix_Strides(i); j++){
+        global_dof_id = DOF_Graph_Matrix(i,j);
+        local_dof_id = all_dof_map->getLocalElement(global_dof_id);
+        if((Node_DOF_Boundary_Condition_Type(local_dof_id)==DISPLACEMENT_CONDITION)){
+          Original_Stiffness_Entries(i,stride_index) = Stiffness_Matrix(i,j);
+          Original_Stiffness_Entry_Indices(i,stride_index) = j;   
+          Stiffness_Matrix(i,j) = 0;
+          stride_index++;
+        }
+      }
+    }
+  }//row for
+  
+  matrix_bc_reduced = true;
+  }
+  
+  //solve for adjoint vector
+  int num_iter = 2000;
+  double solve_tol = 1e-05;
+  int cacheSize = 0;
+  std::string solveType         = "belos";
+  std::string belosType         = "cg";
+  // =========================================================================
+  // Preconditioner construction
+  // =========================================================================
+  //bool useML   = Linear_Solve_Params->isParameter("use external multigrid package") && (Linear_Solve_Params->get<std::string>("use external multigrid package") == "ml");
+  //out<<"*********** MueLu ParameterList ***********"<<std::endl;
+  //out<<*Linear_Solve_Params;
+  //out<<"*******************************************"<<std::endl;
+  
+  //H->Write(-1, -1);
+  //H->describe(*fos,Teuchos::VERB_EXTREME);
+  
+  // =========================================================================
+  // System solution (Ax = b)
+  // =========================================================================
+  //since matrix graph and A are the same from the last update solve, the Hierarchy H need not be rebuilt
+  //xA->describe(*fos,Teuchos::VERB_EXTREME);
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->preScaleRightHandSides(*adjoint_equation_RHS_distributed,"diag");
+  //   Implicit_Solver_Pointer_->preScaleInitialGuesses(*lambda,"diag");
+  // }
+  real_t current_cpu_time2 = Implicit_Solver_Pointer_->CPU_Time();
+  comm->barrier();
+  SystemSolve(xA,xlambda,xB,H,Prec,*fos,solveType,belosType,false,false,false,cacheSize,0,true,true,num_iter,solve_tol);
+  comm->barrier();
+  hessvec_linear_time += Implicit_Solver_Pointer_->CPU_Time() - current_cpu_time2;
+
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->postScaleSolutionVectors(*lambda,"diag");
+  // }
+  
+  //import for displacement of ghosts
+  //Tpetra::Import<LO, GO> ghost_displacement_importer(local_dof_map, all_dof_map);
+
+  //comms to get displacements on all node map
+  all_adjoint_displacements_distributed->doImport(*adjoint_displacements_distributed, *dof_importer, Tpetra::INSERT);
+  host_vec_array all_adjoint = all_adjoint_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+
+
+  //END LINEAR SOLVE
+
+  //loop through each element and assign the contribution to compliance gradient for each of its local nodes
+  for(size_t ielem = 0; ielem < rnum_elem; ielem++){
+    nodes_per_elem = elem->num_basis();
+
+    //initialize C matrix
+    for(int irow = 0; irow < Brows; irow++)
+      for(int icol = 0; icol < Brows; icol++)
+        C_matrix(irow,icol) = 0;
+
+    //B matrix initialization
+    for(int irow=0; irow < Brows; irow++)
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix(irow,icol) = 0;
+      }
+
+    //acquire set of nodes and nodal displacements for this local element
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, node_loop));
+      local_dof_idx = all_dof_map->getLocalElement(nodes_in_elem(ielem, node_loop)*num_dim);
+      local_dof_idy = local_dof_idx + 1;
+      local_dof_idz = local_dof_idx + 2;
+      nodal_positions(node_loop,0) = all_node_coords(local_node_id,0);
+      nodal_positions(node_loop,1) = all_node_coords(local_node_id,1);
+      nodal_positions(node_loop,2) = all_node_coords(local_node_id,2);
+      current_nodal_displacements(node_loop*num_dim) = all_node_displacements(local_dof_idx,0);
+      current_nodal_displacements(node_loop*num_dim+1) = all_node_displacements(local_dof_idy,0);
+      current_nodal_displacements(node_loop*num_dim+2) = all_node_displacements(local_dof_idz,0);
+      current_nodal_adjoint(node_loop*num_dim) = all_adjoint(local_dof_idx,0);
+      current_nodal_adjoint(node_loop*num_dim+1) = all_adjoint(local_dof_idy,0);
+      current_nodal_adjoint(node_loop*num_dim+2) = all_adjoint(local_dof_idz,0);
+      
+      if(nodal_density_flag) nodal_density(node_loop) = all_node_densities(local_node_id,0);
+      //debug print
+      /*
+      std::cout << "node index access x "<< local_node_id << std::endl;
+      std::cout << "local index access x "<< local_dof_idx << " displacement x " << current_nodal_displacements(node_loop*num_dim) <<std::endl;
+      std::cout << "local index access y "<< local_dof_idy << " displacement y " << current_nodal_displacements(node_loop*num_dim + 1) << std::endl;
+      std::cout << "local index access z "<< local_dof_idz << " displacement z " << current_nodal_displacements(node_loop*num_dim + 2) << std::endl; 
+      */
+    }
+
+    //debug print of current_nodal_displacements
+    /*
+    std::cout << " ------------nodal displacements for Element "<< ielem + 1 <<"--------------"<<std::endl;
+    std::cout << " { ";
+    for (int idof = 0; idof < num_dim*nodes_per_elem; idof++){
+      std::cout << idof + 1 << " = " << current_nodal_displacements(idof) << " , " ;
+    }
+    std::cout << " }"<< std::endl;
+    */
+
+    //loop over quadrature points
+    for(int iquad=0; iquad < direct_product_count; iquad++){
+
+    //set current quadrature point
+    if(num_dim==3) z_quad = iquad/(num_gauss_points*num_gauss_points);
+    y_quad = (iquad % (num_gauss_points*num_gauss_points))/num_gauss_points;
+    x_quad = iquad % num_gauss_points;
+    quad_coordinate(0) = legendre_nodes_1D(x_quad);
+    quad_coordinate(1) = legendre_nodes_1D(y_quad);
+    if(num_dim==3)
+    quad_coordinate(2) = legendre_nodes_1D(z_quad);
+
+    //set current quadrature weight
+    quad_coordinate_weight(0) = legendre_weights_1D(x_quad);
+    quad_coordinate_weight(1) = legendre_weights_1D(y_quad);
+    if(num_dim==3)
+    quad_coordinate_weight(2) = legendre_weights_1D(z_quad);
+    else
+    quad_coordinate_weight(2) = 1;
+    weight_multiply = quad_coordinate_weight(0)*quad_coordinate_weight(1)*quad_coordinate_weight(2);
+
+    //compute shape functions at this point for the element type
+    elem->basis(basis_values,quad_coordinate);
+
+    //compute all the necessary coordinates and derivatives at this point
+    //compute shape function derivatives
+    elem->partial_xi_basis(basis_derivative_s1,quad_coordinate);
+    elem->partial_eta_basis(basis_derivative_s2,quad_coordinate);
+    elem->partial_mu_basis(basis_derivative_s3,quad_coordinate);
+
+    //compute derivatives of x,y,z w.r.t the s,t,w isoparametric space needed by JT (Transpose of the Jacobian)
+    //derivative of x,y,z w.r.t s
+    JT_row1(0) = 0;
+    JT_row1(1) = 0;
+    JT_row1(2) = 0;
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      JT_row1(0) += nodal_positions(node_loop,0)*basis_derivative_s1(node_loop);
+      JT_row1(1) += nodal_positions(node_loop,1)*basis_derivative_s1(node_loop);
+      JT_row1(2) += nodal_positions(node_loop,2)*basis_derivative_s1(node_loop);
+    }
+
+    //derivative of x,y,z w.r.t t
+    JT_row2(0) = 0;
+    JT_row2(1) = 0;
+    JT_row2(2) = 0;
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      JT_row2(0) += nodal_positions(node_loop,0)*basis_derivative_s2(node_loop);
+      JT_row2(1) += nodal_positions(node_loop,1)*basis_derivative_s2(node_loop);
+      JT_row2(2) += nodal_positions(node_loop,2)*basis_derivative_s2(node_loop);
+    }
+
+    //derivative of x,y,z w.r.t w
+    JT_row3(0) = 0;
+    JT_row3(1) = 0;
+    JT_row3(2) = 0;
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      JT_row3(0) += nodal_positions(node_loop,0)*basis_derivative_s3(node_loop);
+      JT_row3(1) += nodal_positions(node_loop,1)*basis_derivative_s3(node_loop);
+      JT_row3(2) += nodal_positions(node_loop,2)*basis_derivative_s3(node_loop);
+    }
+    
+    //compute the determinant of the Jacobian
+    Jacobian = JT_row1(0)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+               JT_row1(1)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+               JT_row1(2)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1));
+    if(Jacobian<0) Jacobian = -Jacobian;
+    invJacobian = 1/Jacobian;
+
+    //compute density
+    current_density = 0;
+    if(nodal_density_flag)
+    for(int node_loop=0; node_loop < elem->num_basis(); node_loop++){
+      current_density += nodal_density(node_loop)*basis_values(node_loop);
+    }
+    //default constant element density
+    else{
+      current_density = Element_Densities(ielem,0);
+    }
+
+    //debug print
+    //std::cout << "Current Density " << current_density << std::endl;
+
+    //compute the contributions of this quadrature point to the B matrix
+    if(num_dim==2)
+    for(int ishape=0; ishape < nodes_per_elem; ishape++){
+      B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(1,ishape*num_dim) = 0;
+      B_matrix_contribution(2,ishape*num_dim) = 0;
+      B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(5,ishape*num_dim) = 0;
+      B_matrix_contribution(0,ishape*num_dim+1) = 0;
+      B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(2,ishape*num_dim+1) = 0;
+      B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim+1) = 0;
+      B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(0,ishape*num_dim+2) = 0;
+      B_matrix_contribution(1,ishape*num_dim+2) = 0;
+      B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(3,ishape*num_dim+2) = 0;
+      B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    if(num_dim==3)
+    for(int ishape=0; ishape < nodes_per_elem; ishape++){
+      B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(1,ishape*num_dim) = 0;
+      B_matrix_contribution(2,ishape*num_dim) = 0;
+      B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(5,ishape*num_dim) = 0;
+      B_matrix_contribution(0,ishape*num_dim+1) = 0;
+      B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(2,ishape*num_dim+1) = 0;
+      B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim+1) = 0;
+      B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(0,ishape*num_dim+2) = 0;
+      B_matrix_contribution(1,ishape*num_dim+2) = 0;
+      B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(3,ishape*num_dim+2) = 0;
+      B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    
+    //look up element material properties at this point as a function of density
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
+    //debug print
+    //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
+
+    //compute Elastic (C) matrix
+    if(num_dim==2){
+      C_matrix(0,0) = Pressure_Term;
+      C_matrix(1,1) = Pressure_Term;
+      C_matrix(0,1) = Poisson_Ratio;
+      C_matrix(1,0) = Poisson_Ratio;
+      C_matrix(2,2) = Shear_Term;
+    }
+    if(num_dim==3){
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
+    }
+
+    //compute the previous multiplied by the Elastic (C) Matrix
+    for(int irow=0; irow < Brows; irow++){
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix_contribution(irow,icol) = 0;
+        for(int span=0; span < Brows; span++){
+          CB_matrix_contribution(irow,icol) += C_matrix(irow,span)*B_matrix_contribution(span,icol);
+        }
+      }
+    }
+    
+    //compute the contributions of this quadrature point to all the local stiffness matrix elements
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+        matrix_term = 0;
+        for(int span = 0; span < Brows; span++){
+          matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
+        }
+        Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+        if(ifill!=jfill)
+          Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+      }
+    }
+
+    //compute inner product for this quadrature point contribution
+    inner_product = 0;
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+        if(ifill==jfill)
+          inner_product += Local_Matrix_Contribution(ifill, jfill)*current_nodal_adjoint(ifill)*current_nodal_displacements(jfill);
+        else{
+          inner_product += Local_Matrix_Contribution(ifill, jfill)*(current_nodal_adjoint(ifill)*current_nodal_displacements(jfill)+
+                            current_nodal_displacements(ifill)*current_nodal_adjoint(jfill));
+        }
+        //debug
+        //if(Local_Matrix_Contribution(ifill, jfill)<0) Local_Matrix_Contribution(ifill, jfill) = - Local_Matrix_Contribution(ifill, jfill);
+        //inner_product += Local_Matrix_Contribution(ifill, jfill);
+      }
+    }
+
+    //evaluate local stiffness matrix gradient with respect to igradient
+    for(int igradient=0; igradient < nodes_per_elem; igradient++){
+      if(!map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))) continue;
+      local_node_id = map->getLocalElement(nodes_in_elem(ielem, igradient));
+      
+      //debug print
+      //std::cout << "contribution for " << igradient + 1 << " is " << inner_product << std::endl;
+      design_gradients(local_node_id,0) += inner_product*Elastic_Constant*basis_values(igradient)*weight_multiply*invJacobian;
+    }
+
+      //evaluate gradient of body force (such as gravity which depends on density) with respect to igradient
+    if(body_term_flag){
+      //look up element material properties at this point as a function of density
+      Gradient_Body_Term(ielem, current_density, gradient_force_density);
+      for(int igradient=0; igradient < nodes_per_elem; igradient++){
+      if(!map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))) continue;
+      local_node_id = map->getLocalElement(nodes_in_elem(ielem, igradient));
+      
+      //compute inner product for this quadrature point contribution
+      inner_product = 0;
+      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+        inner_product += gradient_force_density[ifill%num_dim]*current_nodal_adjoint(ifill)*basis_values(ifill/num_dim);
+      }
+      
+      //debug print
+      //std::cout << "contribution for " << igradient + 1 << " is " << inner_product << std::endl;
+      design_gradients(local_node_id,0) -= inner_product*basis_values(igradient)*weight_multiply*Jacobian;
+      }
+    }
+    }
+  }
+  
+  //restore values of K for any scalar or matrix vector products
+  if(matrix_bc_reduced){
+    for(LO i = 0; i < local_nrows; i++){
+      for(LO j = 0; j < Original_Stiffness_Entries_Strides(i); j++){
+        access_index = Original_Stiffness_Entry_Indices(i,j);
+        Stiffness_Matrix(i,access_index) = Original_Stiffness_Entries(i,j);
+      }
+    }//row for
+    matrix_bc_reduced = false;
+  }
+
+}
+
+/* ------------------------------------------------------------------------------------
+   Compute the hessian*vector product of the displacement with respect to nodal densities
+---------------------------------------------------------------------------------------*/
+
+void FEA_Module_Elasticity::compute_displacement_constraint_hessian_vec(const_host_vec_array design_densities, const_host_vec_array target_displacements, const_host_int_array active_dofs, host_vec_array hessvec, Teuchos::RCP<const MV> direction_vec_distributed){
+  //local variable for host view in the dual view
+  real_t current_cpu_time = Implicit_Solver_Pointer_->CPU_Time();
+  const_host_vec_array all_node_coords = all_node_coords_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  const_host_vec_array all_node_displacements = all_node_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  const_host_elem_conn_array nodes_in_elem = global_nodes_in_elem_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  
+  const_host_vec_array Element_Densities;
+  //local variable for host view of densities from the dual view
+  //bool nodal_density_flag = simparam->nodal_density_flag;
+  const_host_vec_array all_node_densities;
+  if(nodal_density_flag)
+  all_node_densities = all_node_densities_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  else
+  Element_Densities = Global_Element_Densities->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
+  const_host_vec_array direction_vec = direction_vec_distributed->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
+
+  if(!adjoints_allocated){
+    all_adjoint_displacements_distributed = Teuchos::rcp(new MV(all_dof_map, 1));
+    adjoint_displacements_distributed = Teuchos::rcp(new MV(*all_adjoint_displacements_distributed, local_dof_map));
+    adjoint_equation_RHS_distributed = Teuchos::rcp(new MV(local_dof_map, 1));
+    adjoints_allocated = true;
+  }
+
+  if(!constraint_adjoints_allocated){
+    all_psi_adjoint_vector_distributed = Teuchos::rcp(new MV(all_dof_map, 1));
+    all_phi_adjoint_vector_distributed = Teuchos::rcp(new MV(all_dof_map, 1));
+    psi_adjoint_vector_distributed = Teuchos::rcp(new MV(*all_psi_adjoint_vector_distributed, local_dof_map));
+    phi_adjoint_vector_distributed = Teuchos::rcp(new MV(*all_phi_adjoint_vector_distributed, local_dof_map));
+    constraint_adjoints_allocated = true;
+  }
+
+  Teuchos::RCP<MV> lambda = adjoint_displacements_distributed;
+  Teuchos::RCP<Xpetra::MultiVector<real_t,LO,GO,node_type>> xlambda = Teuchos::rcp(new Xpetra::TpetraMultiVector<real_t,LO,GO,node_type>(lambda));
+  Teuchos::RCP<Xpetra::MultiVector<real_t,LO,GO,node_type>> xpsi_lambda = Teuchos::rcp(new Xpetra::TpetraMultiVector<real_t,LO,GO,node_type>(psi_adjoint_vector_distributed));
+  Teuchos::RCP<Xpetra::MultiVector<real_t,LO,GO,node_type>> xphi_lambda = Teuchos::rcp(new Xpetra::TpetraMultiVector<real_t,LO,GO,node_type>(phi_adjoint_vector_distributed));
+  Teuchos::RCP<Xpetra::MultiVector<real_t,LO,GO,node_type>> xB = Teuchos::rcp(new Xpetra::TpetraMultiVector<real_t,LO,GO,node_type>(adjoint_equation_RHS_distributed));
+  
+  host_vec_array adjoint_equation_RHS_view = adjoint_equation_RHS_distributed->getLocalView<HostSpace>(Tpetra::Access::ReadWrite);
+  
+  const_host_vec_array lambda_view = lambda->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
+
+  int num_dim = simparam->num_dims;
+  int nodes_per_elem = elem->num_basis();
+  int num_gauss_points = simparam->num_gauss_points;
+  int z_quad,y_quad,x_quad, direct_product_count;
+  LO local_node_id, jlocal_node_id, temp_id, local_dof_id, local_dof_idx, local_dof_idy, local_dof_idz, access_index;
+  GO current_global_index, global_dof_id;
+  size_t local_nrows = nlocal_nodes*num_dim;
+  bool anisotropic_lattice = module_params->anisotropic_lattice;
+
+  direct_product_count = std::pow(num_gauss_points,num_dim);
+  real_t Element_Modulus_Gradient, Element_Modulus_Concavity, Poisson_Ratio, gradient_force_density[3];
+  real_t Elastic_Moduli[3], Shear_Moduli[3], Poisson_Ratios[3], Lower_Poisson_Ratios[3];
+  real_t Elastic_Constant, Gradient_Elastic_Constant, Concavity_Elastic_Constant, Shear_Term, Pressure_Term;
+  real_t inner_product, matrix_term, Jacobian, invJacobian, weight_multiply;
+  real_t direction_vec_reduce, local_direction_vec_reduce;
+  //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_nodes_1D(num_gauss_points);
+  //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_weights_1D(num_gauss_points);
+  CArray<real_t> legendre_nodes_1D(num_gauss_points);
+  CArray<real_t> legendre_weights_1D(num_gauss_points);
+  real_t pointer_quad_coordinate[num_dim];
+  real_t pointer_quad_coordinate_weight[num_dim];
+  real_t pointer_interpolated_point[num_dim];
+  real_t pointer_JT_row1[num_dim];
+  real_t pointer_JT_row2[num_dim];
+  real_t pointer_JT_row3[num_dim];
+  ViewCArray<real_t> quad_coordinate(pointer_quad_coordinate,num_dim);
+  ViewCArray<real_t> quad_coordinate_weight(pointer_quad_coordinate_weight,num_dim);
+  ViewCArray<real_t> interpolated_point(pointer_interpolated_point,num_dim);
+  ViewCArray<real_t> JT_row1(pointer_JT_row1,num_dim);
+  ViewCArray<real_t> JT_row2(pointer_JT_row2,num_dim);
+  ViewCArray<real_t> JT_row3(pointer_JT_row3,num_dim);
+
+  real_t pointer_basis_values[elem->num_basis()];
+  real_t pointer_basis_derivative_s1[elem->num_basis()];
+  real_t pointer_basis_derivative_s2[elem->num_basis()];
+  real_t pointer_basis_derivative_s3[elem->num_basis()];
+  ViewCArray<real_t> basis_values(pointer_basis_values,elem->num_basis());
+  ViewCArray<real_t> basis_derivative_s1(pointer_basis_derivative_s1,elem->num_basis());
+  ViewCArray<real_t> basis_derivative_s2(pointer_basis_derivative_s2,elem->num_basis());
+  ViewCArray<real_t> basis_derivative_s3(pointer_basis_derivative_s3,elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> nodal_positions(elem->num_basis(),num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> current_nodal_displacements(elem->num_basis()*num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> current_adjoint_displacements(elem->num_basis()*num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> current_psi_adjoint(elem->num_basis()*num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> current_phi_adjoint(elem->num_basis()*num_dim);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> nodal_density(elem->num_basis());
+
+  size_t Brows;
+  if(num_dim==2) Brows = 3;
+  if(num_dim==3) Brows = 6;
+  FArrayKokkos<real_t, array_layout, device_type, memory_traits> B_matrix_contribution(Brows,num_dim*elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> B_matrix(Brows,num_dim*elem->num_basis());
+  FArrayKokkos<real_t, array_layout, device_type, memory_traits> CB_matrix_contribution(Brows,num_dim*elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> CB_matrix(Brows,num_dim*elem->num_basis());
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> C_matrix(Brows,Brows);
+  CArrayKokkos<real_t, array_layout, device_type, memory_traits> Local_Matrix_Contribution(num_dim*nodes_per_elem,num_dim*nodes_per_elem);
+
+  //initialize weights
+  elements::legendre_nodes_1D(legendre_nodes_1D,num_gauss_points);
+  elements::legendre_weights_1D(legendre_weights_1D,num_gauss_points);
+  
+  real_t current_density = 1;
+  
+  //direction_vec_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //initialize hessian vector product value to zero
+  for(size_t inode = 0; inode < nlocal_nodes; inode++)
+    hessvec(inode,0) = 0;
+
+  //compute adjoint from the gradient problem (redundant with gradient function for now to make sure it works; optimize later)
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(active_dofs(i,0))
+      adjoint_equation_RHS_view(i,0) = -2*(all_node_displacements(i,0)-target_displacements(i,0))/
+                                          (target_displacements(i,0)*target_displacements(i,0));
+    else
+      adjoint_equation_RHS_view(i,0) = 0;
+  }
+
+  //set adjoint equation RHS terms to 0 if they correspond to a boundary constraint DOF index
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(Node_DOF_Boundary_Condition_Type(i)==DISPLACEMENT_CONDITION)
+      adjoint_equation_RHS_view(i,0) = 0;
+  }
+  //*fos << "Elastic Modulus Gradient" << Element_Modulus_Gradient <<std::endl;
+  //*fos << "DISPLACEMENT" << std::endl;
+  //all_node_displacements_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //*fos << "RHS vector" << std::endl;
+  //Global_Nodal_RHS->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //assign reduced stiffness matrix entries for linear solver
+  if(!matrix_bc_reduced){
+  LO stride_index;
+  for(LO i=0; i < local_nrows; i++){
+    if((Node_DOF_Boundary_Condition_Type(i)==DISPLACEMENT_CONDITION)){
+      for(LO j = 0; j < Stiffness_Matrix_Strides(i); j++){
+        global_dof_id = DOF_Graph_Matrix(i,j);
+        local_dof_id = all_dof_map->getLocalElement(global_dof_id);
+        Original_Stiffness_Entries(i,j) = Stiffness_Matrix(i,j);
+        Original_Stiffness_Entry_Indices(i,j) = j;
+        if(local_dof_id == i){
+          Stiffness_Matrix(i,j) = 1;
+        }
+        else{     
+          Stiffness_Matrix(i,j) = 0;
+        }
+      }//stride for
+    }
+    else{
+      stride_index = 0;
+      for(LO j = 0; j < Stiffness_Matrix_Strides(i); j++){
+        global_dof_id = DOF_Graph_Matrix(i,j);
+        local_dof_id = all_dof_map->getLocalElement(global_dof_id);
+        if((Node_DOF_Boundary_Condition_Type(local_dof_id)==DISPLACEMENT_CONDITION)){
+          Original_Stiffness_Entries(i,stride_index) = Stiffness_Matrix(i,j);
+          Original_Stiffness_Entry_Indices(i,stride_index) = j;   
+          Stiffness_Matrix(i,j) = 0;
+          stride_index++;
+        }
+      }
+    }
+  }//row for
+  
+  matrix_bc_reduced = true;
+  }
+  
+  //solve for adjoint vector
+  int num_iter = 2000;
+  double solve_tol = 1e-05;
+  int cacheSize = 0;
+  std::string solveType         = "belos";
+  std::string belosType         = "cg";
+  // =========================================================================
+  // Preconditioner construction
+  // =========================================================================
+  //bool useML   = Linear_Solve_Params->isParameter("use external multigrid package") && (Linear_Solve_Params->get<std::string>("use external multigrid package") == "ml");
+  //out<<"*********** MueLu ParameterList ***********"<<std::endl;
+  //out<<*Linear_Solve_Params;
+  //out<<"*******************************************"<<std::endl;
+  
+  //H->Write(-1, -1);
+  //H->describe(*fos,Teuchos::VERB_EXTREME);
+  
+  // =========================================================================
+  // System solution (Ax = b)
+  // =========================================================================
+  //since matrix graph and A are the same from the last update solve, the Hierarchy H need not be rebuilt
+  //xA->describe(*fos,Teuchos::VERB_EXTREME);
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->preScaleRightHandSides(*adjoint_equation_RHS_distributed,"diag");
+  //   Implicit_Solver_Pointer_->preScaleInitialGuesses(*lambda,"diag");
+  // }
+  real_t current_cpu_time2 = Implicit_Solver_Pointer_->CPU_Time();
+  comm->barrier();
+  SystemSolve(xA,xlambda,xB,H,Prec,*fos,solveType,belosType,false,false,false,cacheSize,0,true,true,num_iter,solve_tol);
+  comm->barrier();
+  hessvec_linear_time += Implicit_Solver_Pointer_->CPU_Time() - current_cpu_time2;
+
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->postScaleSolutionVectors(*lambda,"diag");
+  // }
+  
+  //import for displacement of ghosts
+  //Tpetra::Import<LO, GO> ghost_displacement_importer(local_dof_map, all_dof_map);
+
+  //comms to get displacements on all node map
+  all_adjoint_displacements_distributed->doImport(*adjoint_displacements_distributed, *dof_importer, Tpetra::INSERT);
+  host_vec_array all_adjoint = all_adjoint_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+  
+  //Solve for the psi adjoint vector
+  //initialize RHS vector for next solve
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++)
+    adjoint_equation_RHS_view(i,0) = 0;
+  
+  //sum components of direction vector
+  direction_vec_reduce = local_direction_vec_reduce = 0;
+  for(int i = 0; i < nlocal_nodes; i++)
+    local_direction_vec_reduce += direction_vec(i,0);
+  
+  MPI_Allreduce(&local_direction_vec_reduce,&direction_vec_reduce,1,MPI_DOUBLE,MPI_SUM,world);
+
+  //comms to get ghost components of direction vector needed for matrix inner products
+  //Tpetra::Import<LO, GO> node_importer(map, all_node_map);
+  
+  Teuchos::RCP<MV> all_direction_vec_distributed = Teuchos::rcp(new MV(all_node_map, 1));
+  //comms to get ghosts
+  all_direction_vec_distributed->doImport(*direction_vec_distributed, *importer, Tpetra::INSERT);
+  
+  const_host_vec_array all_direction_vec = all_direction_vec_distributed->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
+
+  //loop through each element to contribute to the RHS of the hessvec adjoint equation
+  for(size_t ielem = 0; ielem < rnum_elem; ielem++){
+    nodes_per_elem = elem->num_basis();
+
+    //initialize C matrix
+    for(int irow = 0; irow < Brows; irow++)
+      for(int icol = 0; icol < Brows; icol++)
+        C_matrix(irow,icol) = 0;
+
+    //B matrix initialization
+    for(int irow=0; irow < Brows; irow++)
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix(irow,icol) = 0;
+      }
+
+    //acquire set of nodes and nodal displacements for this local element
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, node_loop));
+      local_dof_idx = all_dof_map->getLocalElement(nodes_in_elem(ielem, node_loop)*num_dim);
+      local_dof_idy = local_dof_idx + 1;
+      local_dof_idz = local_dof_idx + 2;
+      nodal_positions(node_loop,0) = all_node_coords(local_node_id,0);
+      nodal_positions(node_loop,1) = all_node_coords(local_node_id,1);
+      nodal_positions(node_loop,2) = all_node_coords(local_node_id,2);
+      current_nodal_displacements(node_loop*num_dim) = all_node_displacements(local_dof_idx,0);
+      current_nodal_displacements(node_loop*num_dim+1) = all_node_displacements(local_dof_idy,0);
+      current_nodal_displacements(node_loop*num_dim+2) = all_node_displacements(local_dof_idz,0);
+      
+      if(nodal_density_flag) nodal_density(node_loop) = all_node_densities(local_node_id,0);
+    }
+
+    //loop over quadrature points
+    for(int iquad=0; iquad < direct_product_count; iquad++){
+
+      //set current quadrature point
+      if(num_dim==3) z_quad = iquad/(num_gauss_points*num_gauss_points);
+      y_quad = (iquad % (num_gauss_points*num_gauss_points))/num_gauss_points;
+      x_quad = iquad % num_gauss_points;
+      quad_coordinate(0) = legendre_nodes_1D(x_quad);
+      quad_coordinate(1) = legendre_nodes_1D(y_quad);
+      if(num_dim==3)
+      quad_coordinate(2) = legendre_nodes_1D(z_quad);
+
+      //set current quadrature weight
+      quad_coordinate_weight(0) = legendre_weights_1D(x_quad);
+      quad_coordinate_weight(1) = legendre_weights_1D(y_quad);
+      if(num_dim==3)
+      quad_coordinate_weight(2) = legendre_weights_1D(z_quad);
+      else
+      quad_coordinate_weight(2) = 1;
+      weight_multiply = quad_coordinate_weight(0)*quad_coordinate_weight(1)*quad_coordinate_weight(2);
+
+      //compute shape functions at this point for the element type
+      elem->basis(basis_values,quad_coordinate);
+
+      //compute all the necessary coordinates and derivatives at this point
+      //compute shape function derivatives
+      elem->partial_xi_basis(basis_derivative_s1,quad_coordinate);
+      elem->partial_eta_basis(basis_derivative_s2,quad_coordinate);
+      elem->partial_mu_basis(basis_derivative_s3,quad_coordinate);
+
+      //compute derivatives of x,y,z w.r.t the s,t,w isoparametric space needed by JT (Transpose of the Jacobian)
+      //derivative of x,y,z w.r.t s
+      JT_row1(0) = 0;
+      JT_row1(1) = 0;
+      JT_row1(2) = 0;
+      for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+        JT_row1(0) += nodal_positions(node_loop,0)*basis_derivative_s1(node_loop);
+        JT_row1(1) += nodal_positions(node_loop,1)*basis_derivative_s1(node_loop);
+        JT_row1(2) += nodal_positions(node_loop,2)*basis_derivative_s1(node_loop);
+      }
+
+      //derivative of x,y,z w.r.t t
+      JT_row2(0) = 0;
+      JT_row2(1) = 0;
+      JT_row2(2) = 0;
+      for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+        JT_row2(0) += nodal_positions(node_loop,0)*basis_derivative_s2(node_loop);
+        JT_row2(1) += nodal_positions(node_loop,1)*basis_derivative_s2(node_loop);
+        JT_row2(2) += nodal_positions(node_loop,2)*basis_derivative_s2(node_loop);
+      }
+
+      //derivative of x,y,z w.r.t w
+      JT_row3(0) = 0;
+      JT_row3(1) = 0;
+      JT_row3(2) = 0;
+      for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+        JT_row3(0) += nodal_positions(node_loop,0)*basis_derivative_s3(node_loop);
+        JT_row3(1) += nodal_positions(node_loop,1)*basis_derivative_s3(node_loop);
+        JT_row3(2) += nodal_positions(node_loop,2)*basis_derivative_s3(node_loop);
+      }
+    
+      //compute the determinant of the Jacobian
+      Jacobian = JT_row1(0)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+                 JT_row1(1)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+                 JT_row1(2)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1));
+      if(Jacobian<0) Jacobian = -Jacobian;
+      invJacobian = 1/Jacobian;
+
+      //compute density
+      current_density = 0;
+      if(nodal_density_flag)
+      for(int node_loop=0; node_loop < elem->num_basis(); node_loop++){
+        current_density += nodal_density(node_loop)*basis_values(node_loop);
+      }
+      //default constant element density
+      else{
+        current_density = Element_Densities(ielem,0);
+      }
+
+      //debug print
+      //std::cout << "Current Density " << current_density << std::endl;
+
+      //compute the contributions of this quadrature point to the B matrix
+      if(num_dim==2)
+      for(int ishape=0; ishape < nodes_per_elem; ishape++){
+        B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(1,ishape*num_dim) = 0;
+        B_matrix_contribution(2,ishape*num_dim) = 0;
+        B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(5,ishape*num_dim) = 0;
+        B_matrix_contribution(0,ishape*num_dim+1) = 0;
+        B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(2,ishape*num_dim+1) = 0;
+        B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim+1) = 0;
+        B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(0,ishape*num_dim+2) = 0;
+        B_matrix_contribution(1,ishape*num_dim+2) = 0;
+        B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(3,ishape*num_dim+2) = 0;
+        B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    if(num_dim==3)
+    for(int ishape=0; ishape < nodes_per_elem; ishape++){
+        B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(1,ishape*num_dim) = 0;
+        B_matrix_contribution(2,ishape*num_dim) = 0;
+        B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(5,ishape*num_dim) = 0;
+        B_matrix_contribution(0,ishape*num_dim+1) = 0;
+        B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(2,ishape*num_dim+1) = 0;
+        B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim+1) = 0;
+        B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(0,ishape*num_dim+2) = 0;
+        B_matrix_contribution(1,ishape*num_dim+2) = 0;
+        B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(3,ishape*num_dim+2) = 0;
+        B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    
+    //look up element material properties at this point as a function of density
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Gradient_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
+    //debug print
+    //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
+
+    //compute Elastic (C) matrix
+    if(num_dim==2){
+      C_matrix(0,0) = Pressure_Term;
+      C_matrix(1,1) = Pressure_Term;
+      C_matrix(0,1) = Poisson_Ratio;
+      C_matrix(1,0) = Poisson_Ratio;
+      C_matrix(2,2) = Shear_Term;
+    }
+    if(num_dim==3){
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Gradient_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Gradient_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Gradient_Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
+    }
+
+    //compute the previous multiplied by the Elastic (C) Matrix
+    for(int irow=0; irow < Brows; irow++){
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix_contribution(irow,icol) = 0;
+        for(int span=0; span < Brows; span++){
+          CB_matrix_contribution(irow,icol) += C_matrix(irow,span)*B_matrix_contribution(span,icol);
+        }
+      }
+    }
+    
+    //compute the contributions of this quadrature point to all the local stiffness matrix elements
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+        matrix_term = 0;
+        for(int span = 0; span < Brows; span++){
+          matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
+        }
+        Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+        if(ifill!=jfill)
+          Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+      }
+    }
+
+    //evaluate local stiffness matrix gradient with respect to igradient
+    for(int igradient=0; igradient < nodes_per_elem; igradient++){
+      //if(!map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))) continue;
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, igradient));
+      
+      //compute rhs product for this quadrature point contribution  
+      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+        local_dof_id = all_dof_map->getLocalElement(nodes_in_elem(ielem, ifill/num_dim)*num_dim);
+        local_dof_id += ifill%num_dim;
+        global_dof_id = all_dof_map->getGlobalElement(local_dof_id);
+        if(Node_DOF_Boundary_Condition_Type(local_dof_id)!=DISPLACEMENT_CONDITION&&local_dof_map->isNodeGlobalElement(global_dof_id)){
+          inner_product = 0;
+          for(int jfill=0; jfill < num_dim*nodes_per_elem; jfill++){
+            inner_product += Local_Matrix_Contribution(ifill, jfill)*current_nodal_displacements(jfill);
+            //debug
+            //if(Local_Matrix_Contribution(ifill, jfill)<0) Local_Matrix_Contribution(ifill, jfill) = - Local_Matrix_Contribution(ifill, jfill);
+            //inner_product += Local_Matrix_Contribution(ifill, jfill);
+          }
+          adjoint_equation_RHS_view(local_dof_id,0) -= inner_product*Gradient_Elastic_Constant*basis_values(igradient)*weight_multiply*all_direction_vec(local_node_id,0)*invJacobian;
+        }
+      }
+      } //density gradient loop
+    }//quadrature loop
+  }//element index loop
+
+  //set adjoint equation RHS terms to 0 if they correspond to a boundary constraint DOF index
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(Node_DOF_Boundary_Condition_Type(i)==DISPLACEMENT_CONDITION)
+      adjoint_equation_RHS_view(i,0) = 0;
+  }
+  //*fos << "Elastic Modulus Gradient" << Element_Modulus_Gradient <<std::endl;
+  //*fos << "DISPLACEMENT" << std::endl;
+  //all_node_displacements_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //*fos << "RHS vector" << std::endl;
+  //Global_Nodal_RHS->describe(*fos,Teuchos::VERB_EXTREME);
+  
+  //solve for adjoint vector
+  // =========================================================================
+  // Preconditioner construction
+  // =========================================================================
+  //bool useML   = Linear_Solve_Params->isParameter("use external multigrid package") && (Linear_Solve_Params->get<std::string>("use external multigrid package") == "ml");
+  //out<<"*********** MueLu ParameterList ***********"<<std::endl;
+  //out<<*Linear_Solve_Params;
+  //out<<"*******************************************"<<std::endl;
+  
+  //H->Write(-1, -1);
+  //H->describe(*fos,Teuchos::VERB_EXTREME);
+  
+  // =========================================================================
+  // System solution (Ax = b)
+  // =========================================================================
+  //since matrix graph and A are the same from the last update solve, the Hierarchy H need not be rebuilt
+  //xA->describe(*fos,Teuchos::VERB_EXTREME);
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->preScaleRightHandSides(*adjoint_equation_RHS_distributed,"diag");
+  //   Implicit_Solver_Pointer_->preScaleInitialGuesses(*lambda,"diag");
+  // }
+  real_t current_cpu_time3 = Implicit_Solver_Pointer_->CPU_Time();
+  comm->barrier();
+  SystemSolve(xA,xpsi_lambda,xB,H,Prec,*fos,solveType,belosType,false,false,false,cacheSize,0,true,true,num_iter,solve_tol);
+  comm->barrier();
+  hessvec_linear_time += Implicit_Solver_Pointer_->CPU_Time() - current_cpu_time3;
+
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->postScaleSolutionVectors(*lambda,"diag");
+  // }
+  //scale by reciprocal ofdirection vector sum
+  psi_adjoint_vector_distributed->scale(1/direction_vec_reduce);
+  
+  //import for displacement of ghosts
+  //Tpetra::Import<LO, GO> ghost_displacement_importer(local_dof_map, all_dof_map);
+
+  //comms to get displacements on all node map
+  all_psi_adjoint_vector_distributed->doImport(*psi_adjoint_vector_distributed, *dof_importer, Tpetra::INSERT);
+  host_vec_array all_psi_adjoint = all_psi_adjoint_vector_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+  //*fos << "ALL ADJOINT" << std::endl;
+  //all_adjoint_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //compute phi adjoint vector
+
+  //initialize RHS vector for next solve
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++)
+    adjoint_equation_RHS_view(i,0) = 0;
+
+  //loop through each element to contribute to the RHS of the hessvec adjoint equation
+  for(size_t ielem = 0; ielem < rnum_elem; ielem++){
+    nodes_per_elem = elem->num_basis();
+
+    //initialize C matrix
+    for(int irow = 0; irow < Brows; irow++)
+      for(int icol = 0; icol < Brows; icol++)
+        C_matrix(irow,icol) = 0;
+
+    //B matrix initialization
+    for(int irow=0; irow < Brows; irow++)
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix(irow,icol) = 0;
+      }
+
+    //acquire set of nodes and nodal displacements for this local element
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, node_loop));
+      local_dof_idx = all_dof_map->getLocalElement(nodes_in_elem(ielem, node_loop)*num_dim);
+      local_dof_idy = local_dof_idx + 1;
+      local_dof_idz = local_dof_idx + 2;
+      nodal_positions(node_loop,0) = all_node_coords(local_node_id,0);
+      nodal_positions(node_loop,1) = all_node_coords(local_node_id,1);
+      nodal_positions(node_loop,2) = all_node_coords(local_node_id,2);
+      current_adjoint_displacements(node_loop*num_dim) = all_adjoint(local_dof_idx,0);
+      current_adjoint_displacements(node_loop*num_dim+1) = all_adjoint(local_dof_idy,0);
+      current_adjoint_displacements(node_loop*num_dim+2) = all_adjoint(local_dof_idz,0);
+      
+      if(nodal_density_flag) nodal_density(node_loop) = all_node_densities(local_node_id,0);
+    }
+
+    //loop over quadrature points
+    for(int iquad=0; iquad < direct_product_count; iquad++){
+
+      //set current quadrature point
+      if(num_dim==3) z_quad = iquad/(num_gauss_points*num_gauss_points);
+      y_quad = (iquad % (num_gauss_points*num_gauss_points))/num_gauss_points;
+      x_quad = iquad % num_gauss_points;
+      quad_coordinate(0) = legendre_nodes_1D(x_quad);
+      quad_coordinate(1) = legendre_nodes_1D(y_quad);
+      if(num_dim==3)
+      quad_coordinate(2) = legendre_nodes_1D(z_quad);
+
+      //set current quadrature weight
+      quad_coordinate_weight(0) = legendre_weights_1D(x_quad);
+      quad_coordinate_weight(1) = legendre_weights_1D(y_quad);
+      if(num_dim==3)
+      quad_coordinate_weight(2) = legendre_weights_1D(z_quad);
+      else
+      quad_coordinate_weight(2) = 1;
+      weight_multiply = quad_coordinate_weight(0)*quad_coordinate_weight(1)*quad_coordinate_weight(2);
+
+      //compute shape functions at this point for the element type
+      elem->basis(basis_values,quad_coordinate);
+
+      //compute all the necessary coordinates and derivatives at this point
+      //compute shape function derivatives
+      elem->partial_xi_basis(basis_derivative_s1,quad_coordinate);
+      elem->partial_eta_basis(basis_derivative_s2,quad_coordinate);
+      elem->partial_mu_basis(basis_derivative_s3,quad_coordinate);
+
+      //compute derivatives of x,y,z w.r.t the s,t,w isoparametric space needed by JT (Transpose of the Jacobian)
+      //derivative of x,y,z w.r.t s
+      JT_row1(0) = 0;
+      JT_row1(1) = 0;
+      JT_row1(2) = 0;
+      for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+        JT_row1(0) += nodal_positions(node_loop,0)*basis_derivative_s1(node_loop);
+        JT_row1(1) += nodal_positions(node_loop,1)*basis_derivative_s1(node_loop);
+        JT_row1(2) += nodal_positions(node_loop,2)*basis_derivative_s1(node_loop);
+      }
+
+      //derivative of x,y,z w.r.t t
+      JT_row2(0) = 0;
+      JT_row2(1) = 0;
+      JT_row2(2) = 0;
+      for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+        JT_row2(0) += nodal_positions(node_loop,0)*basis_derivative_s2(node_loop);
+        JT_row2(1) += nodal_positions(node_loop,1)*basis_derivative_s2(node_loop);
+        JT_row2(2) += nodal_positions(node_loop,2)*basis_derivative_s2(node_loop);
+      }
+
+      //derivative of x,y,z w.r.t w
+      JT_row3(0) = 0;
+      JT_row3(1) = 0;
+      JT_row3(2) = 0;
+      for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+        JT_row3(0) += nodal_positions(node_loop,0)*basis_derivative_s3(node_loop);
+        JT_row3(1) += nodal_positions(node_loop,1)*basis_derivative_s3(node_loop);
+        JT_row3(2) += nodal_positions(node_loop,2)*basis_derivative_s3(node_loop);
+      }
+    
+      //compute the determinant of the Jacobian
+      Jacobian = JT_row1(0)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+                 JT_row1(1)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+                 JT_row1(2)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1));
+      if(Jacobian<0) Jacobian = -Jacobian;
+      invJacobian = 1/Jacobian;
+
+      //compute density
+      current_density = 0;
+      if(nodal_density_flag)
+      for(int node_loop=0; node_loop < elem->num_basis(); node_loop++){
+        current_density += nodal_density(node_loop)*basis_values(node_loop);
+      }
+      //default constant element density
+      else{
+        current_density = Element_Densities(ielem,0);
+      }
+
+      //debug print
+      //std::cout << "Current Density " << current_density << std::endl;
+
+      //compute the contributions of this quadrature point to the B matrix
+      if(num_dim==2)
+      for(int ishape=0; ishape < nodes_per_elem; ishape++){
+        B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(1,ishape*num_dim) = 0;
+        B_matrix_contribution(2,ishape*num_dim) = 0;
+        B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(5,ishape*num_dim) = 0;
+        B_matrix_contribution(0,ishape*num_dim+1) = 0;
+        B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(2,ishape*num_dim+1) = 0;
+        B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim+1) = 0;
+        B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(0,ishape*num_dim+2) = 0;
+        B_matrix_contribution(1,ishape*num_dim+2) = 0;
+        B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(3,ishape*num_dim+2) = 0;
+        B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    if(num_dim==3)
+    for(int ishape=0; ishape < nodes_per_elem; ishape++){
+        B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(1,ishape*num_dim) = 0;
+        B_matrix_contribution(2,ishape*num_dim) = 0;
+        B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(5,ishape*num_dim) = 0;
+        B_matrix_contribution(0,ishape*num_dim+1) = 0;
+        B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+        B_matrix_contribution(2,ishape*num_dim+1) = 0;
+        B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(4,ishape*num_dim+1) = 0;
+        B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(0,ishape*num_dim+2) = 0;
+        B_matrix_contribution(1,ishape*num_dim+2) = 0;
+        B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+        B_matrix_contribution(3,ishape*num_dim+2) = 0;
+        B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+        B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    
+    //look up element material properties at this point as a function of density
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Gradient_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
+    //debug print
+    //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
+
+    //compute Elastic (C) matrix
+    if(num_dim==2){
+      C_matrix(0,0) = Pressure_Term;
+      C_matrix(1,1) = Pressure_Term;
+      C_matrix(0,1) = Poisson_Ratio;
+      C_matrix(1,0) = Poisson_Ratio;
+      C_matrix(2,2) = Shear_Term;
+    }
+    if(num_dim==3){
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Gradient_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Gradient_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Gradient_Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
+    }
+
+    //compute the previous multiplied by the Elastic (C) Matrix
+    for(int irow=0; irow < Brows; irow++){
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix_contribution(irow,icol) = 0;
+        for(int span=0; span < Brows; span++){
+          CB_matrix_contribution(irow,icol) += C_matrix(irow,span)*B_matrix_contribution(span,icol);
+        }
+      }
+    }
+    
+    //compute the contributions of this quadrature point to all the local stiffness matrix elements
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+        matrix_term = 0;
+        for(int span = 0; span < Brows; span++){
+          matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
+        }
+        Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+        if(ifill!=jfill)
+          Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+      }
+    }
+
+    //evaluate local stiffness matrix gradient with respect to igradient
+    for(int igradient=0; igradient < nodes_per_elem; igradient++){
+      //if(!map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))) continue;
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, igradient));
+      
+      //compute rhs product for this quadrature point contribution  
+      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+        local_dof_id = all_dof_map->getLocalElement(nodes_in_elem(ielem, ifill/num_dim)*num_dim);
+        local_dof_id += ifill%num_dim;
+        global_dof_id = all_dof_map->getGlobalElement(local_dof_id);
+        if(Node_DOF_Boundary_Condition_Type(local_dof_id)!=DISPLACEMENT_CONDITION&&local_dof_map->isNodeGlobalElement(global_dof_id)){
+          inner_product = 0;
+          for(int jfill=0; jfill < num_dim*nodes_per_elem; jfill++){
+            inner_product += Local_Matrix_Contribution(ifill, jfill)*current_adjoint_displacements(jfill);
+            //debug
+            //if(Local_Matrix_Contribution(ifill, jfill)<0) Local_Matrix_Contribution(ifill, jfill) = - Local_Matrix_Contribution(ifill, jfill);
+            //inner_product += Local_Matrix_Contribution(ifill, jfill);
+          }
+          adjoint_equation_RHS_view(local_dof_id,0) -= inner_product*Gradient_Elastic_Constant*basis_values(igradient)*weight_multiply*all_direction_vec(local_node_id,0)*invJacobian;
+        }
+      }
+      } //density gradient loop
+    }//quadrature loop
+  }//element index loop
+
+  //set adjoint equation RHS contribution from constraint definition
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(active_dofs(i,0))
+      adjoint_equation_RHS_view(i,0) -= 2*(all_psi_adjoint(i,0))*direction_vec_reduce/
+                                          (target_displacements(i,0)*target_displacements(i,0));
+  }
+
+  //set adjoint equation RHS terms to 0 if they correspond to a boundary constraint DOF index
+  for(int i=0; i < local_dof_map->getLocalNumElements(); i++){
+    if(Node_DOF_Boundary_Condition_Type(i)==DISPLACEMENT_CONDITION)
+      adjoint_equation_RHS_view(i,0) = 0;
+  }
+  //*fos << "Elastic Modulus Gradient" << Element_Modulus_Gradient <<std::endl;
+  //*fos << "DISPLACEMENT" << std::endl;
+  //all_node_displacements_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //*fos << "RHS vector" << std::endl;
+  //Global_Nodal_RHS->describe(*fos,Teuchos::VERB_EXTREME);
+  
+  //solve for adjoint vector
+  // =========================================================================
+  // Preconditioner construction
+  // =========================================================================
+  //bool useML   = Linear_Solve_Params->isParameter("use external multigrid package") && (Linear_Solve_Params->get<std::string>("use external multigrid package") == "ml");
+  //out<<"*********** MueLu ParameterList ***********"<<std::endl;
+  //out<<*Linear_Solve_Params;
+  //out<<"*******************************************"<<std::endl;
+  
+  //H->Write(-1, -1);
+  //H->describe(*fos,Teuchos::VERB_EXTREME);
+  
+  // =========================================================================
+  // System solution (Ax = b)
+  // =========================================================================
+  //since matrix graph and A are the same from the last update solve, the Hierarchy H need not be rebuilt
+  //xA->describe(*fos,Teuchos::VERB_EXTREME);
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->preScaleRightHandSides(*adjoint_equation_RHS_distributed,"diag");
+  //   Implicit_Solver_Pointer_->preScaleInitialGuesses(*lambda,"diag");
+  // }
+  real_t current_cpu_time4 = Implicit_Solver_Pointer_->CPU_Time();
+  comm->barrier();
+  SystemSolve(xA,xphi_lambda,xB,H,Prec,*fos,solveType,belosType,false,false,false,cacheSize,0,true,true,num_iter,solve_tol);
+  comm->barrier();
+  hessvec_linear_time += Implicit_Solver_Pointer_->CPU_Time() - current_cpu_time4;
+
+  // if(module_params->equilibrate_matrix_flag){
+  //   Implicit_Solver_Pointer_->postScaleSolutionVectors(*lambda,"diag");
+  // }
+  //scale by reciprocal ofdirection vector sum
+  phi_adjoint_vector_distributed->scale(1/direction_vec_reduce);
+  
+  //import for displacement of ghosts
+  //Tpetra::Import<LO, GO> ghost_displacement_importer(local_dof_map, all_dof_map);
+
+  //comms to get displacements on all node map
+  all_phi_adjoint_vector_distributed->doImport(*phi_adjoint_vector_distributed, *dof_importer, Tpetra::INSERT);
+  host_vec_array all_phi_adjoint = all_phi_adjoint_vector_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+  //*fos << "ALL ADJOINT" << std::endl;
+  //all_adjoint_distributed->describe(*fos,Teuchos::VERB_EXTREME);
+
+  //now that adjoint is computed, calculate the hessian vector product
+  //loop through each element and assign the contribution to Hessian vector product for each of its local nodes
+
+  for(size_t ielem = 0; ielem < rnum_elem; ielem++){
+    nodes_per_elem = elem->num_basis();
+
+    //initialize C matrix
+    for(int irow = 0; irow < Brows; irow++)
+      for(int icol = 0; icol < Brows; icol++)
+        C_matrix(irow,icol) = 0;
+
+    //B matrix initialization
+    for(int irow=0; irow < Brows; irow++)
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix(irow,icol) = 0;
+      }
+
+    //acquire set of nodes and nodal displacements for this local element
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, node_loop));
+      local_dof_idx = all_dof_map->getLocalElement(nodes_in_elem(ielem, node_loop)*num_dim);
+      local_dof_idy = local_dof_idx + 1;
+      local_dof_idz = local_dof_idx + 2;
+      nodal_positions(node_loop,0) = all_node_coords(local_node_id,0);
+      nodal_positions(node_loop,1) = all_node_coords(local_node_id,1);
+      nodal_positions(node_loop,2) = all_node_coords(local_node_id,2);
+      current_nodal_displacements(node_loop*num_dim) = all_node_displacements(local_dof_idx,0);
+      current_nodal_displacements(node_loop*num_dim+1) = all_node_displacements(local_dof_idy,0);
+      current_nodal_displacements(node_loop*num_dim+2) = all_node_displacements(local_dof_idz,0);
+      current_adjoint_displacements(node_loop*num_dim) = all_adjoint(local_dof_idx,0);
+      current_adjoint_displacements(node_loop*num_dim+1) = all_adjoint(local_dof_idy,0);
+      current_adjoint_displacements(node_loop*num_dim+2) = all_adjoint(local_dof_idz,0);
+      current_psi_adjoint(node_loop*num_dim) = all_psi_adjoint(local_dof_idx,0);
+      current_psi_adjoint(node_loop*num_dim+1) = all_psi_adjoint(local_dof_idy,0);
+      current_psi_adjoint(node_loop*num_dim+2) = all_psi_adjoint(local_dof_idz,0);
+      current_phi_adjoint(node_loop*num_dim) = all_phi_adjoint(local_dof_idx,0);
+      current_phi_adjoint(node_loop*num_dim+1) = all_phi_adjoint(local_dof_idy,0);
+      current_phi_adjoint(node_loop*num_dim+2) = all_phi_adjoint(local_dof_idz,0);
+      
+      if(nodal_density_flag) nodal_density(node_loop) = all_node_densities(local_node_id,0);
+    }
+
+    //loop over quadrature points
+    for(int iquad=0; iquad < direct_product_count; iquad++){
+
+    //set current quadrature point
+    if(num_dim==3) z_quad = iquad/(num_gauss_points*num_gauss_points);
+    y_quad = (iquad % (num_gauss_points*num_gauss_points))/num_gauss_points;
+    x_quad = iquad % num_gauss_points;
+    quad_coordinate(0) = legendre_nodes_1D(x_quad);
+    quad_coordinate(1) = legendre_nodes_1D(y_quad);
+    if(num_dim==3)
+    quad_coordinate(2) = legendre_nodes_1D(z_quad);
+
+    //set current quadrature weight
+    quad_coordinate_weight(0) = legendre_weights_1D(x_quad);
+    quad_coordinate_weight(1) = legendre_weights_1D(y_quad);
+    if(num_dim==3)
+    quad_coordinate_weight(2) = legendre_weights_1D(z_quad);
+    else
+    quad_coordinate_weight(2) = 1;
+    weight_multiply = quad_coordinate_weight(0)*quad_coordinate_weight(1)*quad_coordinate_weight(2);
+
+    //compute shape functions at this point for the element type
+    elem->basis(basis_values,quad_coordinate);
+
+    //compute all the necessary coordinates and derivatives at this point
+    //compute shape function derivatives
+    elem->partial_xi_basis(basis_derivative_s1,quad_coordinate);
+    elem->partial_eta_basis(basis_derivative_s2,quad_coordinate);
+    elem->partial_mu_basis(basis_derivative_s3,quad_coordinate);
+
+    //compute derivatives of x,y,z w.r.t the s,t,w isoparametric space needed by JT (Transpose of the Jacobian)
+    //derivative of x,y,z w.r.t s
+    JT_row1(0) = 0;
+    JT_row1(1) = 0;
+    JT_row1(2) = 0;
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      JT_row1(0) += nodal_positions(node_loop,0)*basis_derivative_s1(node_loop);
+      JT_row1(1) += nodal_positions(node_loop,1)*basis_derivative_s1(node_loop);
+      JT_row1(2) += nodal_positions(node_loop,2)*basis_derivative_s1(node_loop);
+    }
+
+    //derivative of x,y,z w.r.t t
+    JT_row2(0) = 0;
+    JT_row2(1) = 0;
+    JT_row2(2) = 0;
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      JT_row2(0) += nodal_positions(node_loop,0)*basis_derivative_s2(node_loop);
+      JT_row2(1) += nodal_positions(node_loop,1)*basis_derivative_s2(node_loop);
+      JT_row2(2) += nodal_positions(node_loop,2)*basis_derivative_s2(node_loop);
+    }
+
+    //derivative of x,y,z w.r.t w
+    JT_row3(0) = 0;
+    JT_row3(1) = 0;
+    JT_row3(2) = 0;
+    for(int node_loop=0; node_loop < nodes_per_elem; node_loop++){
+      JT_row3(0) += nodal_positions(node_loop,0)*basis_derivative_s3(node_loop);
+      JT_row3(1) += nodal_positions(node_loop,1)*basis_derivative_s3(node_loop);
+      JT_row3(2) += nodal_positions(node_loop,2)*basis_derivative_s3(node_loop);
+    }
+    
+    //compute the determinant of the Jacobian
+    Jacobian = JT_row1(0)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+               JT_row1(1)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+               JT_row1(2)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1));
+    if(Jacobian<0) Jacobian = -Jacobian;
+    invJacobian = 1/Jacobian;
+
+    //compute density
+    current_density = 0;
+    if(nodal_density_flag){
+      for(int node_loop=0; node_loop < elem->num_basis(); node_loop++){
+        current_density += nodal_density(node_loop)*basis_values(node_loop);
+      }
+    }
+    //default constant element density
+    else{
+      current_density = Element_Densities(ielem,0);
+    }
+
+    //debug print
+    //std::cout << "Current Density " << current_density << std::endl;
+
+    //compute the contributions of this quadrature point to the B matrix
+    if(num_dim==2)
+    for(int ishape=0; ishape < nodes_per_elem; ishape++){
+      B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(1,ishape*num_dim) = 0;
+      B_matrix_contribution(2,ishape*num_dim) = 0;
+      B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(5,ishape*num_dim) = 0;
+      B_matrix_contribution(0,ishape*num_dim+1) = 0;
+      B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(2,ishape*num_dim+1) = 0;
+      B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim+1) = 0;
+      B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(0,ishape*num_dim+2) = 0;
+      B_matrix_contribution(1,ishape*num_dim+2) = 0;
+      B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(3,ishape*num_dim+2) = 0;
+      B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+    if(num_dim==3)
+    for(int ishape=0; ishape < nodes_per_elem; ishape++){
+      B_matrix_contribution(0,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(1,ishape*num_dim) = 0;
+      B_matrix_contribution(2,ishape*num_dim) = 0;
+      B_matrix_contribution(3,ishape*num_dim) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(5,ishape*num_dim) = 0;
+      B_matrix_contribution(0,ishape*num_dim+1) = 0;
+      B_matrix_contribution(1,ishape*num_dim+1) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+      B_matrix_contribution(2,ishape*num_dim+1) = 0;
+      B_matrix_contribution(3,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(4,ishape*num_dim+1) = 0;
+      B_matrix_contribution(5,ishape*num_dim+1) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(0,ishape*num_dim+2) = 0;
+      B_matrix_contribution(1,ishape*num_dim+2) = 0;
+      B_matrix_contribution(2,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(1)-JT_row3(0)*JT_row2(1))-
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(1)-JT_row3(0)*JT_row1(1))+
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(1)-JT_row2(0)*JT_row1(1)));
+      B_matrix_contribution(3,ishape*num_dim+2) = 0;
+      B_matrix_contribution(4,ishape*num_dim+2) = (basis_derivative_s1(ishape)*(JT_row2(1)*JT_row3(2)-JT_row3(1)*JT_row2(2))-
+          basis_derivative_s2(ishape)*(JT_row1(1)*JT_row3(2)-JT_row3(1)*JT_row1(2))+
+          basis_derivative_s3(ishape)*(JT_row1(1)*JT_row2(2)-JT_row2(1)*JT_row1(2)));
+      B_matrix_contribution(5,ishape*num_dim+2) = (-basis_derivative_s1(ishape)*(JT_row2(0)*JT_row3(2)-JT_row3(0)*JT_row2(2))+
+          basis_derivative_s2(ishape)*(JT_row1(0)*JT_row3(2)-JT_row3(0)*JT_row1(2))-
+          basis_derivative_s3(ishape)*(JT_row1(0)*JT_row2(2)-JT_row2(0)*JT_row1(2)));
+    }
+
+    //look up element material properties at this point as a function of density
+    //look up element material properties at this point as a function of density
+    if(anisotropic_lattice){
+      Concavity_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Concavity_Element_Material_Properties(ielem, Element_Modulus_Concavity, Poisson_Ratio, current_density);
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Concavity_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Concavity_Elastic_Constant = Element_Modulus_Concavity/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
+    //debug print
+    //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
+
+    //compute Elastic (C) matrix
+    if(num_dim==2){
+      C_matrix(0,0) = Pressure_Term;
+      C_matrix(1,1) = Pressure_Term;
+      C_matrix(0,1) = Poisson_Ratio;
+      C_matrix(1,0) = Poisson_Ratio;
+      C_matrix(2,2) = Shear_Term;
+    }
+    if(num_dim==3){
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Concavity_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Concavity_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Concavity_Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
+    }
+
+    //compute the previous multiplied by the Elastic (C) Matrix
+    for(int irow=0; irow < Brows; irow++){
+      for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+        CB_matrix_contribution(irow,icol) = 0;
+        for(int span=0; span < Brows; span++){
+          CB_matrix_contribution(irow,icol) += C_matrix(irow,span)*B_matrix_contribution(span,icol);
+        }
+      }
+    }
+    
+    //compute the contributions of this quadrature point to all the local stiffness matrix elements
+      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+        for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+          matrix_term = 0;
+          for(int span = 0; span < Brows; span++){
+            matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
+          }
+          Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+          if(jfill!=ifill)
+            Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+        }
+      }
+      
+    //compute inner product for this quadrature point contribution
+    inner_product = 0;
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+        if(ifill==jfill)
+          inner_product += Local_Matrix_Contribution(ifill, jfill)*current_adjoint_displacements(ifill)*current_nodal_displacements(jfill);
+        else
+          inner_product += Local_Matrix_Contribution(ifill, jfill)*(current_adjoint_displacements(ifill)*current_nodal_displacements(jfill) +
+                          current_adjoint_displacements(jfill)*current_nodal_displacements(ifill));
+      }
+    }
+
+    //evaluate local stiffness matrix concavity with respect to igradient and jgradient
+    for(int igradient=0; igradient < nodes_per_elem; igradient++){
+      local_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, igradient));
+      for(int jgradient=igradient; jgradient < nodes_per_elem; jgradient++){
+        jlocal_node_id = all_node_map->getLocalElement(nodes_in_elem(ielem, jgradient));
+        //debug print
+        //std::cout << "contribution for " << igradient + 1 << " is " << inner_product << std::endl;
+        if(map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))){
+        temp_id = map->getLocalElement(nodes_in_elem(ielem, igradient));
+          hessvec(temp_id,0) += inner_product*Concavity_Elastic_Constant*basis_values(igradient)*all_direction_vec(jlocal_node_id,0)*
+                                  basis_values(jgradient)*weight_multiply*invJacobian;
+        }
+        if(igradient!=jgradient&&map->isNodeGlobalElement(nodes_in_elem(ielem, jgradient))){
+          //temp_id = map->getLocalElement(nodes_in_elem(ielem, jgradient));
+          hessvec(jlocal_node_id,0) += inner_product*Concavity_Elastic_Constant*basis_values(igradient)*all_direction_vec(local_node_id,0)*
+                                      basis_values(jgradient)*weight_multiply*invJacobian;
+
+        }
+      }
+    }
+
+    //look up element material properties at this point as a function of density
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Gradient_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    //debug print
+    //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
+
+    //compute Elastic (C) matrix
+    if(num_dim==3){
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Gradient_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Gradient_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Gradient_Elastic_Constant;
+
+      }
+    }
+
+    
+    if(anisotropic_lattice){
+      //compute the previous multiplied by the Elastic (C) Matrix
+      for(int irow=0; irow < Brows; irow++){
+        for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+          CB_matrix_contribution(irow,icol) = 0;
+          for(int span=0; span < Brows; span++){
+            CB_matrix_contribution(irow,icol) += C_matrix(irow,span)*B_matrix_contribution(span,icol);
+          }
+        }
+      }
+      
+      //compute the contributions of this quadrature point to all the local stiffness matrix elements
+      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+        for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+          matrix_term = 0;
+          for(int span = 0; span < Brows; span++){
+            matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
+          }
+          Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+          if(jfill!=ifill)
+            Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+        }
+      }
+    }
+    
+    //compute inner product for this quadrature point contribution
+    inner_product = 0;
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=0; jfill < num_dim*nodes_per_elem; jfill++){
+        inner_product += Local_Matrix_Contribution(ifill, jfill)*current_phi_adjoint(ifill)*current_nodal_displacements(jfill);
+        
+        inner_product += Local_Matrix_Contribution(ifill, jfill)*current_adjoint_displacements(ifill)*current_psi_adjoint(jfill);
+        //debug
+        //if(Local_Matrix_Contribution(ifill, jfill)<0) Local_Matrix_Contribution(ifill, jfill) = - Local_Matrix_Contribution(ifill, jfill);
+        //inner_product += Local_Matrix_Contribution(ifill, jfill);
+      }
+    }
+
+    //evaluate local stiffness matrix gradient with respect to igradient (augmented term with adjoint vector)
+    for(int igradient=0; igradient < nodes_per_elem; igradient++){
+      if(!map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))) continue;
+      local_node_id = map->getLocalElement(nodes_in_elem(ielem, igradient));
+      
+      //debug print
+      //std::cout << "contribution for " << igradient + 1 << " is " << inner_product << std::endl;
+      hessvec(local_node_id,0) += inner_product*direction_vec_reduce*Gradient_Elastic_Constant*basis_values(igradient)*weight_multiply*invJacobian;
+    }
+
+      //evaluate gradient of body force (such as gravity which depends on density) with respect to igradient
+      if(body_term_flag){
+        for(int igradient=0; igradient < nodes_per_elem; igradient++){
+        if(!map->isNodeGlobalElement(nodes_in_elem(ielem, igradient))) continue;
+        local_node_id = map->getLocalElement(nodes_in_elem(ielem, igradient));
+        //look up element material properties at this point as a function of density
+        Gradient_Body_Term(ielem, current_density, gradient_force_density);
+      
+        //compute inner product for this quadrature point contribution
+        inner_product = 0;
+        for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+          inner_product -= gradient_force_density[ifill%num_dim]*
+                           current_adjoint_displacements(ifill)*basis_values(ifill/num_dim);
+        }
+      
+        //debug print
+        //std::cout << "contribution for " << igradient + 1 << " is " << inner_product << std::endl;
+        hessvec(local_node_id,0) += inner_product*direction_vec_reduce*basis_values(igradient)*weight_multiply*Jacobian;
+        }
+      }
+    }
+  }//end element loop for hessian vector product
+
+  //restore values of K for any scalar or matrix vector products
+  if(matrix_bc_reduced){
+    for(LO i = 0; i < local_nrows; i++){
+      for(LO j = 0; j < Original_Stiffness_Entries_Strides(i); j++){
+        access_index = Original_Stiffness_Entry_Indices(i,j);
+        Stiffness_Matrix(i,access_index) = Original_Stiffness_Entries(i,j);
+      }
+    }//row for
+    matrix_bc_reduced = false;
+  }
+
+  hessvec_time += Implicit_Solver_Pointer_->CPU_Time() - current_cpu_time;
+}
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.cpp b/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.cpp
index 7e2dbf251..0ef9315b0 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.cpp
+++ b/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.cpp
@@ -162,7 +162,7 @@ FEA_Module_Elasticity::FEA_Module_Elasticity(
   all_node_strains_distributed = Teuchos::rcp(new MV(all_node_map, strain_count));
   Global_Nodal_Forces = Teuchos::rcp(new MV(local_dof_map, 1));
   Global_Nodal_RHS = Teuchos::rcp(new MV(local_dof_map, 1));
-  adjoints_allocated = false;
+  adjoints_allocated = constraint_adjoints_allocated = false;
 
   //initialize displacements to 0
   //local variable for host view in the dual view
@@ -974,7 +974,7 @@ void FEA_Module_Elasticity::init_assembly(){
 
   //allocate sparse graph with node repeats
   RaggedRightArrayKokkos<size_t, array_layout, device_type, memory_traits> Repeat_Graph_Matrix(Graph_Matrix_Strides_initial);
-  RaggedRightArrayofVectorsKokkos<size_t, array_layout, device_type, memory_traits> Element_local_indices(Graph_Matrix_Strides_initial,num_dim);
+  RaggedRightArrayofVectorsKokkos<size_t, array_layout, device_type, memory_traits> Element_local_indices(Graph_Matrix_Strides_initial,3);
   
   //Fill the initial Graph with repeats
   if(num_dim == 2)
@@ -2117,6 +2117,105 @@ void FEA_Module_Elasticity::Concavity_Element_Material_Properties(size_t ielem,
   
 }
 
+/* ----------------------------------------------------------------------
+   Retrieve material properties associated with a finite element
+------------------------------------------------------------------------- */
+
+void FEA_Module_Elasticity::Element_Anisotropic_Material_Properties(size_t ielem, real_t Element_Moduli[3],
+                                                                    real_t Poisson_Ratios[3], 
+                                                                    real_t Shear_Moduli[3], real_t density){
+  real_t unit_scaling = simparam->get_unit_scaling();
+  real_t penalty_product = 1;
+  real_t density_epsilon = simparam->optimization_options.density_epsilon;
+  if(density < 0) density = 0;
+  if(module_params->material.SIMP_modulus){
+    for(int i = 0; i < penalty_power; i++)
+    penalty_product *= density;
+    //relationship between density and stiffness
+    for(int idim = 0; idim < num_dim; idim++){
+      Element_Moduli[idim] = (density_epsilon + (1 - density_epsilon)*penalty_product)*module_params->material.elastic_moduli[idim]/unit_scaling/unit_scaling;
+      Shear_Moduli[idim] = (density_epsilon + (1 - density_epsilon)*penalty_product)*module_params->material.shear_moduli[idim]/unit_scaling/unit_scaling;
+      //Element_Modulus = density*simparam->Elastic_Modulus/unit_scaling/unit_scaling;
+      Poisson_Ratios[idim] = module_params->material.poisson_ratios[idim];
+    }
+  }
+  else if(module_params->material.linear_cell_modulus){
+    for(int idim = 0; idim < num_dim; idim++){
+      Element_Moduli[idim] = module_params->material.modulus_initial + module_params->material.modulus_density_slope*density;
+      Poisson_Ratios[idim] = module_params->material.poisson_ratio;
+      Shear_Moduli[idim] = module_params->material.shear_modulus_initial + module_params->material.shear_modulus_density_slope*density;
+    }
+  }
+}
+
+/* ----------------------------------------------------------------------
+   Retrieve derivative of material properties with respect to local density
+------------------------------------------------------------------------- */
+
+void FEA_Module_Elasticity::Gradient_Element_Anisotropic_Material_Properties(size_t ielem, real_t Element_Moduli_Derivatives[3],
+                                                                            real_t Poisson_Ratios_Derivatives[3], 
+                                                                            real_t Shear_Moduli_Derivatives[3], real_t density){
+  real_t unit_scaling = simparam->get_unit_scaling();
+  real_t penalty_product = 1;
+  real_t density_epsilon = simparam->optimization_options.density_epsilon;
+  if(density < 0) density = 0;
+  
+  if(module_params->material.SIMP_modulus){
+    for(int i = 0; i < penalty_power - 1; i++)
+      penalty_product *= density;
+    //relationship between density and stiffness
+    
+    for(int idim = 0; idim < num_dim; idim++){
+      Element_Moduli_Derivatives[idim] = penalty_power*(1 - density_epsilon)*penalty_product*module_params->material.elastic_moduli[idim]/unit_scaling/unit_scaling;
+      Shear_Moduli_Derivatives[idim] = penalty_power*(1 - density_epsilon)*penalty_product*module_params->material.shear_moduli[idim]/unit_scaling/unit_scaling;
+      //Element_Modulus_Derivative = simparam->Elastic_Modulus/unit_scaling/unit_scaling;
+      Poisson_Ratios_Derivatives[idim] = module_params->material.poisson_ratios[idim];
+    }
+  }
+  else if(module_params->material.linear_cell_modulus){
+    for(int idim = 0; idim < num_dim; idim++){
+      Element_Moduli_Derivatives[idim] = module_params->material.modulus_density_slope;
+      Poisson_Ratios_Derivatives[idim] = module_params->material.poisson_ratios[idim];
+      Shear_Moduli_Derivatives[idim] = module_params->material.shear_modulus_density_slope;
+    }
+  }
+}
+
+/* --------------------------------------------------------------------------------
+   Retrieve second derivative of material properties with respect to local density
+----------------------------------------------------------------------------------- */
+
+void FEA_Module_Elasticity::Concavity_Element_Anisotropic_Material_Properties(size_t ielem, real_t Element_Moduli_Derivatives[3], real_t Poisson_Ratios_Derivatives[3], real_t Shear_Moduli_Derivatives[3], real_t density){
+  real_t unit_scaling = simparam->get_unit_scaling();
+  real_t penalty_product = 1;
+  real_t density_epsilon = simparam->optimization_options.density_epsilon;
+  if(density < 0) density = 0;
+  
+  if(module_params->material.SIMP_modulus){
+    if(penalty_power>=2){
+      for(int i = 0; i < penalty_power - 2; i++)
+        penalty_product *= density;
+      //relationship between density and stiffness
+      
+      for(int idim = 0; idim < num_dim; idim++){
+        Element_Moduli_Derivatives[idim] = penalty_power*(penalty_power-1)*(1 - density_epsilon)*penalty_product*module_params->material.elastic_modulus/unit_scaling/unit_scaling;
+        Shear_Moduli_Derivatives[idim] = penalty_power*(penalty_power-1)*(1 - density_epsilon)*penalty_product*module_params->material.elastic_modulus/unit_scaling/unit_scaling;
+        Poisson_Ratios_Derivatives[idim] = module_params->material.poisson_ratios[idim];
+      }
+    }
+  }
+  else if(module_params->material.linear_cell_modulus){
+    
+    for(int idim = 0; idim < num_dim; idim++){
+      Element_Moduli_Derivatives[idim] = 0;
+      Shear_Moduli_Derivatives[idim] = 0;
+      Poisson_Ratios_Derivatives[idim] = module_params->material.poisson_ratios[idim];
+    }
+  }
+  //Element_Modulus_Derivative = simparam->Elastic_Modulus/unit_scaling/unit_scaling;
+  
+}
+
 /* ----------------------------------------------------------------------
    Construct the local stiffness matrix
 ------------------------------------------------------------------------- */
@@ -2139,8 +2238,9 @@ void FEA_Module_Elasticity::local_matrix(int ielem, CArrayKokkos<real_t, array_l
   int z_quad,y_quad,x_quad, direct_product_count;
   size_t local_node_id;
 
+  bool anisotropic_lattice = module_params->anisotropic_lattice;
   direct_product_count = std::pow(num_gauss_points,num_dim);
-  real_t Elastic_Constant, Shear_Term, Pressure_Term, matrix_term;
+  real_t Elastic_Constant, Shear_Term, Pressure_Term, matrix_term, Elastic_Moduli[3], Shear_Moduli[3], Poisson_Ratios[3];
   real_t matrix_subterm1, matrix_subterm2, matrix_subterm3, Jacobian, invJacobian, weight_multiply;
   real_t Element_Modulus, Poisson_Ratio;
   //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_nodes_1D(num_gauss_points);
@@ -2227,11 +2327,17 @@ void FEA_Module_Elasticity::local_matrix(int ielem, CArrayKokkos<real_t, array_l
     }
 
     //look up element material properties as a function of density at the point
-    Element_Material_Properties((size_t) ielem,Element_Modulus,Poisson_Ratio, current_density);
-    Elastic_Constant = Element_Modulus/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
-    Shear_Term = 0.5-Poisson_Ratio;
-    Pressure_Term = 1 - Poisson_Ratio;
-
+    if(anisotropic_lattice){
+      Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios ,Shear_Moduli, current_density);
+      Elastic_Constant = (1-Poisson_Ratios[0]*Poisson_Ratios[0]-Poisson_Ratios[1]*Poisson_Ratios[1]-Poisson_Ratios[2]*Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2])/(Elastic_Moduli[0]*Elastic_Moduli[1]*Elastic_Moduli[2]);
+    }
+    else{
+      Element_Material_Properties((size_t) ielem,Element_Modulus,Poisson_Ratio, current_density);
+      Elastic_Constant = Element_Modulus/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
     //compute all the necessary coordinates and derivatives at this point
 
     //compute shape function derivatives
@@ -2536,7 +2642,7 @@ void FEA_Module_Elasticity::local_matrix_multiply(int ielem, CArrayKokkos<real_t
   real_t unit_scaling = simparam->get_unit_scaling();
   bool topology_optimization_on = simparam->topology_optimization_on;
   direct_product_count = std::pow(num_gauss_points,num_dim);
-  real_t Elastic_Constant, Shear_Term, Pressure_Term, matrix_term;
+  real_t Elastic_Constant, Shear_Term, Pressure_Term, matrix_term, Elastic_Moduli[3], Shear_Moduli[3], Poisson_Ratios[3], Lower_Poisson_Ratios[3];
   real_t matrix_subterm1, matrix_subterm2, matrix_subterm3, invJacobian, Jacobian, weight_multiply;
   real_t Element_Modulus, Poisson_Ratio;
   //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_nodes_1D(num_gauss_points);
@@ -2579,6 +2685,7 @@ void FEA_Module_Elasticity::local_matrix_multiply(int ielem, CArrayKokkos<real_t
   //initialize weights
   elements::legendre_nodes_1D(legendre_nodes_1D,num_gauss_points);
   elements::legendre_weights_1D(legendre_weights_1D,num_gauss_points);
+  bool anisotropic_lattice = module_params->anisotropic_lattice;
 
   real_t current_density = 1;
 
@@ -2655,17 +2762,37 @@ void FEA_Module_Elasticity::local_matrix_multiply(int ielem, CArrayKokkos<real_t
 
     //look up element material properties at this point as a function of density
     if(topology_optimization_on){
-      Element_Material_Properties((size_t) ielem,Element_Modulus,Poisson_Ratio, current_density);
+      if(anisotropic_lattice){
+      Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios ,Shear_Moduli, current_density);
+      }
+      else{
+        Element_Material_Properties((size_t) ielem,Element_Modulus,Poisson_Ratio, current_density);
+      }
     }
     else{
       Element_Modulus = module_params->material.elastic_modulus/unit_scaling/unit_scaling;
       Poisson_Ratio = module_params->material.poisson_ratio;
+      if(anisotropic_lattice){
+        for(int idim=0 ; idim < num_dim; idim++){
+          Elastic_Moduli[idim] = module_params->material.elastic_moduli[idim];
+          Shear_Moduli[idim] = module_params->material.shear_moduli[idim];
+          Poisson_Ratios[idim] = module_params->material.poisson_ratios[idim];
+        }
+      }
     }
     
-    Elastic_Constant = Element_Modulus/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
-    Shear_Term = 0.5-Poisson_Ratio;
-    Pressure_Term = 1 - Poisson_Ratio;
-
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Elastic_Constant = Element_Modulus/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
     //debug print
     //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
 
@@ -2678,18 +2805,35 @@ void FEA_Module_Elasticity::local_matrix_multiply(int ielem, CArrayKokkos<real_t
       C_matrix(2,2) = Shear_Term;
     }
     if(num_dim==3){
-      C_matrix(0,0) = Pressure_Term;
-      C_matrix(1,1) = Pressure_Term;
-      C_matrix(2,2) = Pressure_Term;
-      C_matrix(0,1) = Poisson_Ratio;
-      C_matrix(0,2) = Poisson_Ratio;
-      C_matrix(1,0) = Poisson_Ratio;
-      C_matrix(1,2) = Poisson_Ratio;
-      C_matrix(2,0) = Poisson_Ratio;
-      C_matrix(2,1) = Poisson_Ratio;
-      C_matrix(3,3) = Shear_Term;
-      C_matrix(4,4) = Shear_Term;
-      C_matrix(5,5) = Shear_Term;
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
     }
   
   /*
@@ -3266,10 +3410,11 @@ void FEA_Module_Elasticity::compute_adjoint_gradients(const_host_vec_array desig
   int z_quad,y_quad,x_quad, direct_product_count;
   size_t local_node_id, local_dof_idx, local_dof_idy, local_dof_idz;
   GO current_global_index;
+  bool anisotropic_lattice = module_params->anisotropic_lattice;
 
   direct_product_count = std::pow(num_gauss_points,num_dim);
   real_t Element_Modulus_Gradient, Poisson_Ratio, gradient_force_density[3];
-  real_t Elastic_Constant, Shear_Term, Pressure_Term;
+  real_t Elastic_Constant, Shear_Term, Pressure_Term, Elastic_Moduli[3], Shear_Moduli[3], Poisson_Ratios[3], Lower_Poisson_Ratios[3];
   real_t inner_product, matrix_term, Jacobian, invJacobian, weight_multiply;
   //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_nodes_1D(num_gauss_points);
   //CArrayKokkos<real_t, array_layout, device_type, memory_traits> legendre_weights_1D(num_gauss_points);
@@ -3531,11 +3676,25 @@ void FEA_Module_Elasticity::compute_adjoint_gradients(const_host_vec_array desig
     }
     
     //look up element material properties at this point as a function of density
-    Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
-    Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
-    Shear_Term = 0.5 - Poisson_Ratio;
-    Pressure_Term = 1 - Poisson_Ratio;
-
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
     //debug print
     //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
 
@@ -3548,18 +3707,35 @@ void FEA_Module_Elasticity::compute_adjoint_gradients(const_host_vec_array desig
       C_matrix(2,2) = Shear_Term;
     }
     if(num_dim==3){
-      C_matrix(0,0) = Pressure_Term;
-      C_matrix(1,1) = Pressure_Term;
-      C_matrix(2,2) = Pressure_Term;
-      C_matrix(0,1) = Poisson_Ratio;
-      C_matrix(0,2) = Poisson_Ratio;
-      C_matrix(1,0) = Poisson_Ratio;
-      C_matrix(1,2) = Poisson_Ratio;
-      C_matrix(2,0) = Poisson_Ratio;
-      C_matrix(2,1) = Poisson_Ratio;
-      C_matrix(3,3) = Shear_Term;
-      C_matrix(4,4) = Shear_Term;
-      C_matrix(5,5) = Shear_Term;
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
     }
 
     //compute the previous multiplied by the Elastic (C) Matrix
@@ -3656,9 +3832,9 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
   const_host_vec_array direction_vec = direction_vec_distributed->getLocalView<HostSpace>(Tpetra::Access::ReadOnly);
 
   if(!adjoints_allocated){
-    adjoint_displacements_distributed = Teuchos::rcp(new MV(local_dof_map, 1));
-    adjoint_equation_RHS_distributed = Teuchos::rcp(new MV(local_dof_map, 1));
     all_adjoint_displacements_distributed = Teuchos::rcp(new MV(all_dof_map, 1));
+    adjoint_displacements_distributed = Teuchos::rcp(new MV(*all_adjoint_displacements_distributed, local_dof_map));
+    adjoint_equation_RHS_distributed = Teuchos::rcp(new MV(local_dof_map, 1));
     adjoints_allocated = true;
   }
   Teuchos::RCP<MV> lambda = adjoint_displacements_distributed;
@@ -3673,12 +3849,14 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
   int nodes_per_elem = elem->num_basis();
   int num_gauss_points = simparam->num_gauss_points;
   int z_quad,y_quad,x_quad, direct_product_count;
-  LO local_node_id, jlocal_node_id, temp_id, local_dof_id, local_dof_idx, local_dof_idy, local_dof_idz;
+  LO local_node_id, jlocal_node_id, temp_id, local_dof_id, local_dof_idx, local_dof_idy, local_dof_idz, access_index;
   GO current_global_index, global_dof_id;
   size_t local_nrows = nlocal_nodes*num_dim;
+  bool anisotropic_lattice = module_params->anisotropic_lattice;
 
   direct_product_count = std::pow(num_gauss_points,num_dim);
   real_t Element_Modulus_Gradient, Element_Modulus_Concavity, Poisson_Ratio, gradient_force_density[3];
+  real_t Elastic_Moduli[3], Shear_Moduli[3], Poisson_Ratios[3], Lower_Poisson_Ratios[3];
   real_t Elastic_Constant, Gradient_Elastic_Constant, Concavity_Elastic_Constant, Shear_Term, Pressure_Term;
   real_t inner_product, matrix_term, Jacobian, invJacobian, weight_multiply;
   real_t direction_vec_reduce, local_direction_vec_reduce;
@@ -3948,12 +4126,25 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
     }
     
     //look up element material properties at this point as a function of density
-    Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
     
-    Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
-    Shear_Term = 0.5 - Poisson_Ratio;
-    Pressure_Term = 1 - Poisson_Ratio;
-
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Gradient_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
     //debug print
     //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
 
@@ -3966,18 +4157,35 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
       C_matrix(2,2) = Shear_Term;
     }
     if(num_dim==3){
-      C_matrix(0,0) = Pressure_Term;
-      C_matrix(1,1) = Pressure_Term;
-      C_matrix(2,2) = Pressure_Term;
-      C_matrix(0,1) = Poisson_Ratio;
-      C_matrix(0,2) = Poisson_Ratio;
-      C_matrix(1,0) = Poisson_Ratio;
-      C_matrix(1,2) = Poisson_Ratio;
-      C_matrix(2,0) = Poisson_Ratio;
-      C_matrix(2,1) = Poisson_Ratio;
-      C_matrix(3,3) = Shear_Term;
-      C_matrix(4,4) = Shear_Term;
-      C_matrix(5,5) = Shear_Term;
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Gradient_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Gradient_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Gradient_Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
     }
 
     //compute the previous multiplied by the Elastic (C) Matrix
@@ -4322,14 +4530,27 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
     }
 
     //look up element material properties at this point as a function of density
-    Concavity_Element_Material_Properties(ielem, Element_Modulus_Concavity, Poisson_Ratio, current_density);
-    Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    if(anisotropic_lattice){
+      Concavity_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    else{
+      Concavity_Element_Material_Properties(ielem, Element_Modulus_Concavity, Poisson_Ratio, current_density);
+      Gradient_Element_Material_Properties(ielem, Element_Modulus_Gradient, Poisson_Ratio, current_density);
+    }
     
-    Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
-    Concavity_Elastic_Constant = Element_Modulus_Concavity/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
-    Shear_Term = 0.5 - Poisson_Ratio;
-    Pressure_Term = 1 - Poisson_Ratio;
-    //*fos << "Elastic Modulus Concavity" << Element_Modulus_Concavity << " " << Element_Modulus_Gradient << std::endl;
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Concavity_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    else{
+      Gradient_Elastic_Constant = Element_Modulus_Gradient/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Concavity_Elastic_Constant = Element_Modulus_Concavity/((1 + Poisson_Ratio)*(1 - 2*Poisson_Ratio));
+      Shear_Term = 0.5-Poisson_Ratio;
+      Pressure_Term = 1 - Poisson_Ratio;
+    }
     //debug print
     //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
 
@@ -4342,18 +4563,35 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
       C_matrix(2,2) = Shear_Term;
     }
     if(num_dim==3){
-      C_matrix(0,0) = Pressure_Term;
-      C_matrix(1,1) = Pressure_Term;
-      C_matrix(2,2) = Pressure_Term;
-      C_matrix(0,1) = Poisson_Ratio;
-      C_matrix(0,2) = Poisson_Ratio;
-      C_matrix(1,0) = Poisson_Ratio;
-      C_matrix(1,2) = Poisson_Ratio;
-      C_matrix(2,0) = Poisson_Ratio;
-      C_matrix(2,1) = Poisson_Ratio;
-      C_matrix(3,3) = Shear_Term;
-      C_matrix(4,4) = Shear_Term;
-      C_matrix(5,5) = Shear_Term;
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Concavity_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Concavity_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Concavity_Elastic_Constant;
+
+      }
+      else{
+        C_matrix(0,0) = Pressure_Term;
+        C_matrix(1,1) = Pressure_Term;
+        C_matrix(2,2) = Pressure_Term;
+        C_matrix(0,1) = Poisson_Ratio;
+        C_matrix(0,2) = Poisson_Ratio;
+        C_matrix(1,0) = Poisson_Ratio;
+        C_matrix(1,2) = Poisson_Ratio;
+        C_matrix(2,0) = Poisson_Ratio;
+        C_matrix(2,1) = Poisson_Ratio;
+        C_matrix(3,3) = Shear_Term;
+        C_matrix(4,4) = Shear_Term;
+        C_matrix(5,5) = Shear_Term;
+      }
     }
 
     //compute the previous multiplied by the Elastic (C) Matrix
@@ -4367,17 +4605,17 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
     }
     
     //compute the contributions of this quadrature point to all the local stiffness matrix elements
-      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
-        for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
-          matrix_term = 0;
-          for(int span = 0; span < Brows; span++){
-            matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
-          }
-          Local_Matrix_Contribution(ifill,jfill) = matrix_term;
-          if(jfill!=ifill)
-            Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+    for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+      for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+        matrix_term = 0;
+        for(int span = 0; span < Brows; span++){
+          matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
         }
+        Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+        if(jfill!=ifill)
+          Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
       }
+    }
       
     //compute inner product for this quadrature point contribution
     inner_product = 0;
@@ -4410,6 +4648,66 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
         }
       }
     }
+
+    //look up element material properties at this point as a function of density
+    if(anisotropic_lattice){
+      Gradient_Element_Anisotropic_Material_Properties((size_t) ielem,Elastic_Moduli, Poisson_Ratios, Shear_Moduli, current_density);
+    }
+    
+    if(anisotropic_lattice){
+      Lower_Poisson_Ratios[0] = Poisson_Ratios[0]*Elastic_Moduli[1]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[1] = Poisson_Ratios[1]*Elastic_Moduli[2]/Elastic_Moduli[0];
+      Lower_Poisson_Ratios[2] = Poisson_Ratios[2]*Elastic_Moduli[2]/Elastic_Moduli[1];
+      Gradient_Elastic_Constant = 1/(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0] - Poisson_Ratios[1]*Lower_Poisson_Ratios[1] - Poisson_Ratios[2]*Lower_Poisson_Ratios[2]
+                          -2*Poisson_Ratios[0]*Poisson_Ratios[1]*Poisson_Ratios[2]);
+    }
+    //debug print
+    //std::cout << "Element Material Params " << Elastic_Constant << std::endl;
+
+    //compute Elastic (C) matrix
+    if(num_dim==3){
+      if(anisotropic_lattice){
+        C_matrix(0,0) = Elastic_Moduli[0]*(1-Poisson_Ratios[2]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,1) = Elastic_Moduli[1]*(1-Poisson_Ratios[1]*Lower_Poisson_Ratios[1]);
+        C_matrix(2,2) = Elastic_Moduli[2]*(1-Poisson_Ratios[0]*Lower_Poisson_Ratios[0]);
+        C_matrix(0,1) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[0]+Lower_Poisson_Ratios[1]*Poisson_Ratios[2]);
+        C_matrix(0,2) = Elastic_Moduli[0]*(Lower_Poisson_Ratios[1]+Lower_Poisson_Ratios[0]*Lower_Poisson_Ratios[2]);
+        C_matrix(1,0) = C_matrix(0,1);
+        C_matrix(1,2) = Elastic_Moduli[1]*(Lower_Poisson_Ratios[2]+Lower_Poisson_Ratios[1]*Poisson_Ratios[0]);
+        C_matrix(2,0) = C_matrix(0,2);
+        C_matrix(2,1) = C_matrix(1,2);
+        C_matrix(3,3) = 2*Shear_Moduli[0]/Gradient_Elastic_Constant;
+        C_matrix(4,4) = 2*Shear_Moduli[1]/Gradient_Elastic_Constant;
+        C_matrix(5,5) = 2*Shear_Moduli[2]/Gradient_Elastic_Constant;
+
+      }
+    }
+
+    
+    if(anisotropic_lattice){
+      //compute the previous multiplied by the Elastic (C) Matrix
+      for(int irow=0; irow < Brows; irow++){
+        for(int icol=0; icol < num_dim*nodes_per_elem; icol++){
+          CB_matrix_contribution(irow,icol) = 0;
+          for(int span=0; span < Brows; span++){
+            CB_matrix_contribution(irow,icol) += C_matrix(irow,span)*B_matrix_contribution(span,icol);
+          }
+        }
+      }
+      
+      //compute the contributions of this quadrature point to all the local stiffness matrix elements
+      for(int ifill=0; ifill < num_dim*nodes_per_elem; ifill++){
+        for(int jfill=ifill; jfill < num_dim*nodes_per_elem; jfill++){
+          matrix_term = 0;
+          for(int span = 0; span < Brows; span++){
+            matrix_term += B_matrix_contribution(span,ifill)*CB_matrix_contribution(span,jfill);
+          }
+          Local_Matrix_Contribution(ifill,jfill) = matrix_term;
+          if(jfill!=ifill)
+            Local_Matrix_Contribution(jfill,ifill) = Local_Matrix_Contribution(ifill,jfill);
+        }
+      }
+    }
     
     //compute inner product for this quadrature point contribution
     inner_product = 0;
@@ -4454,6 +4752,18 @@ void FEA_Module_Elasticity::compute_adjoint_hessian_vec(const_host_vec_array des
       }
     }
   }//end element loop for hessian vector product
+
+  //restore values of K for any scalar or matrix vector products
+  if(matrix_bc_reduced){
+    for(LO i = 0; i < local_nrows; i++){
+      for(LO j = 0; j < Original_Stiffness_Entries_Strides(i); j++){
+        access_index = Original_Stiffness_Entry_Indices(i,j);
+        Stiffness_Matrix(i,access_index) = Original_Stiffness_Entries(i,j);
+      }
+    }//row for
+    matrix_bc_reduced = false;
+  }
+
   hessvec_time += Implicit_Solver_Pointer_->CPU_Time() - current_cpu_time;
 }
 
@@ -5180,9 +5490,9 @@ int FEA_Module_Elasticity::solve(){
   int nodes_per_elem = max_nodes_per_element;
   int local_node_index, current_row, current_column;
   int max_stride = 0;
-  size_t access_index, row_access_index, row_counter;
+  size_t row_access_index, row_counter;
   GO global_index, global_dof_index;
-  LO local_dof_index;
+  LO local_dof_index, access_index;
 
   //*fos << Amesos2::version() << std::endl << std::endl;
 
@@ -5456,7 +5766,7 @@ int FEA_Module_Elasticity::solve(){
   //*fos << std::endl;
   //std::fflush(stdout);
     
-  int num_iter = 2000;
+  int num_iter = 3000;
   double solve_tol = 1e-06;
   int cacheSize = 0;
   std::string solveType         = "belos";
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.h b/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.h
index 933d6c6a7..cc6f6fcb7 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.h
+++ b/src/Parallel-Solvers/Implicit-Lagrange/FEA_Physics_Modules/FEA_Module_Elasticity.h
@@ -125,6 +125,12 @@ class FEA_Module_Elasticity: public FEA_Module{
 
   void Concavity_Element_Material_Properties(size_t ielem, real_t &Element_Modulus, real_t &Poisson_Ratio, real_t density);
 
+  void Element_Anisotropic_Material_Properties(size_t ielem, real_t Element_Moduli[3], real_t Poisson_Ratios[3], real_t Shear_Moduli[3], real_t density);
+
+  void Gradient_Element_Anisotropic_Material_Properties(size_t ielem, real_t Element_Moduli[3], real_t Poisson_Ratios[3], real_t Shear_Moduli[3], real_t density);
+
+  void Concavity_Element_Anisotropic_Material_Properties(size_t ielem, real_t Element_Moduli[3], real_t Poisson_Ratios[3], real_t Shear_Moduli[3], real_t density);
+
   void Body_Term(size_t ielem, real_t density, real_t *forces);
 
   void Gradient_Body_Term(size_t ielem, real_t density, real_t *forces);
@@ -148,6 +154,11 @@ class FEA_Module_Elasticity: public FEA_Module{
   void collect_output(Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > global_reduce_map);
 
   void node_density_constraints(host_vec_array node_densities_lower_bound);
+
+  //for the displacement constraint adjoint solves
+  void compute_displacement_constraint_gradients(const_host_vec_array design_densities, const_host_vec_array target_displacements, const_host_int_array active_dofs, host_vec_array gradients);
+
+  void compute_displacement_constraint_hessian_vec(const_host_vec_array design_densities, const_host_vec_array target_displacements, const_host_int_array active_dofs, host_vec_array hessvec, Teuchos::RCP<const MV> direction_vec_distributed);
   
   Elasticity_Parameters *module_params;
   Implicit_Solver *Implicit_Solver_Pointer_;
@@ -174,10 +185,14 @@ class FEA_Module_Elasticity: public FEA_Module{
   Teuchos::RCP<MV> all_node_displacements_distributed;
   Teuchos::RCP<MV> all_cached_node_displacements_distributed;
   Teuchos::RCP<MV> all_node_strains_distributed;
-  bool adjoints_allocated;
+  bool adjoints_allocated, constraint_adjoints_allocated;
   Teuchos::RCP<MV> adjoint_displacements_distributed;
+  Teuchos::RCP<MV> psi_adjoint_vector_distributed;
+  Teuchos::RCP<MV> phi_adjoint_vector_distributed;
   Teuchos::RCP<MV> adjoint_equation_RHS_distributed;
   Teuchos::RCP<MV> all_adjoint_displacements_distributed;
+  Teuchos::RCP<MV> all_psi_adjoint_vector_distributed;
+  Teuchos::RCP<MV> all_phi_adjoint_vector_distributed;
   Teuchos::RCP<MAT> Global_Stiffness_Matrix;
   Teuchos::RCP<MAT> Global_Mass_Matrix;
   Teuchos::RCP<MV> Global_Nodal_RHS;
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.cpp b/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.cpp
index 160f770ce..4bf472a58 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.cpp
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.cpp
@@ -190,6 +190,10 @@ void Implicit_Solver::run(){
     real_t linear_solve_time = 0;
     real_t hessvec_time = 0;
     real_t hessvec_linear_time = 0;
+
+    if(simparam.output_options.convert_to_tecplot){
+      output_design(0,simparam.output_options.convert_to_tecplot);
+    }
     
     // ---- Find Boundaries on mesh ---- //
     init_boundaries();
@@ -331,8 +335,10 @@ void Implicit_Solver::read_mesh_ansys_dat(const char *MESH){
   GO node_gid;
   real_t dof_value;
   host_vec_array node_densities;
+  bool zero_index_base = input_options.zero_index_base;
+  int negative_index_found = 0;
+  int global_negative_index_found = 0;
 
-  
   //Nodes_Per_Element_Type =  elements::elem_types::Nodes_Per_Element_Type;
 
   //read the mesh
@@ -771,11 +777,25 @@ void Implicit_Solver::read_mesh_ansys_dat(const char *MESH){
         //as we loop through the nodes belonging to this element we store them
         //if any of these nodes belongs to this rank this list is used to store the element locally
         node_gid = atoi(&read_buffer(scan_loop,inode,0));
-        node_store(inode-elem_words_per_line_no_nodes) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(zero_index_base)
+          node_store(inode-elem_words_per_line_no_nodes) = node_gid; //subtract 1 since file index start is 1 but code expects 0
+        else
+          node_store(inode-elem_words_per_line_no_nodes) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(node_store(inode-elem_words_per_line_no_nodes) < 0){
+          negative_index_found = 1;
+        }
         //first we add the elements to a dynamically allocated list
-        if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
-          assign_flag = 1;
-          rnum_elem++;
+        if(zero_index_base){
+          if(map->isNodeGlobalElement(node_gid)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
+        }
+        else{
+          if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
         }
       }
 
@@ -862,16 +882,29 @@ void Implicit_Solver::read_mesh_ansys_dat(const char *MESH){
   dual_nodes_in_elem.modify_host();
 
   for(int ielem = 0; ielem < rnum_elem; ielem++)
-    for(int inode = 0; inode < nodes_per_element; inode++)
+    for(int inode = 0; inode < nodes_per_element; inode++){
       nodes_in_elem(ielem, inode) = element_temp[ielem*nodes_per_element + inode];
+    }
 
   //view storage for all local elements connected to local nodes on this rank
-  CArrayKokkos<GO, array_layout, device_type, memory_traits> All_Element_Global_Indices(rnum_elem);
+  Kokkos::DualView <GO*, array_layout, device_type, memory_traits> All_Element_Global_Indices("All_Element_Global_Indices",rnum_elem);
 
   //copy temporary global indices storage to view storage
-  for(int ielem = 0; ielem < rnum_elem; ielem++)
-    All_Element_Global_Indices(ielem) = global_indices_temp[ielem];
-
+  for(int ielem = 0; ielem < rnum_elem; ielem++){
+    All_Element_Global_Indices.h_view(ielem) = global_indices_temp[ielem];
+    if(global_indices_temp[ielem]<0){
+      negative_index_found = 1;
+    }
+  }
+  
+  MPI_Allreduce(&negative_index_found,&global_negative_index_found,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
+  if(global_negative_index_found){
+    if(myrank==0){
+    std::cout << "Node index less than or equal to zero detected; set \"zero_index_base: true\" under \"input_options\" in your yaml file if indices start at 0" << std::endl;
+    }
+    exit_solver(0);
+  }
+  
   //debug print element edof
   /*
   std::cout << " ------------ELEMENT EDOF ON TASK " << myrank << " --------------"<<std::endl;
@@ -891,9 +924,12 @@ void Implicit_Solver::read_mesh_ansys_dat(const char *MESH){
   //delete temporary element connectivity and index storage
   std::vector<size_t>().swap(element_temp);
   std::vector<size_t>().swap(global_indices_temp);
+
+  All_Element_Global_Indices.modify_host();
+  All_Element_Global_Indices.sync_device();
   
   //construct overlapping element map (since different ranks can own the same elements due to the local node map)
-  all_element_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(Teuchos::OrdinalTraits<GO>::invalid(),All_Element_Global_Indices.get_kokkos_view(),0,comm));
+  all_element_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(Teuchos::OrdinalTraits<GO>::invalid(),All_Element_Global_Indices.d_view,0,comm));
 
 
   //element type selection (subject to change)
@@ -1317,10 +1353,64 @@ void Implicit_Solver::setup_optimization_problem(){
         *fos << " MASS CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
         eq_constraint = ROL::makePtr<MassConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][0], false);
       }
+      else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Displacement_Constraint){
+        //simple test of assignment to the vector for constraint dofs
+        Teuchos::RCP<MV> target_displacements = Teuchos::rcp(new MV(local_dof_map, 1));
+        Teuchos::RCP<Tpetra::MultiVector<int,LO,GO>> active_dofs = Teuchos::rcp(new Tpetra::MultiVector<int,LO,GO>(local_dof_map, 1));
+        active_dofs->putScalar(0);
+        host_vec_array target_displacements_view = target_displacements->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+        host_ivec_array active_dofs_view = active_dofs->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+        std::string constraint_filename = simparam.optimization_options.constraints[imodule-nmulti_objective_modules-1].argument_file_name;
+        //std::cout << "DISPLACEMENT CONSTRAINT SETTING FILE NAME " << constraint_filename << std::endl;
+        std::ifstream *disp_in = NULL;
+        disp_in = new std::ifstream();
+        disp_in->open(constraint_filename);
+        if (!(*disp_in)) throw std::runtime_error(std::string("Can't open ") + constraint_filename);
+        if (disp_in->is_open()) {
+        std::string line;
+          while (std::getline(*disp_in, line)) { // Read each line until the end of the file
+            std::istringstream ss(line); // Create a string stream from the line
+            GO tag_id;
+            real_t target_displacement_value;
+              // Process each number (replace this with your own logic)
+              ss >> tag_id;
+              ss >> target_displacement_value;
+              //std::cout << "Read number: " << tag_id << " " << target_displacement_value << std::endl;
+              if(map->isNodeGlobalElement(tag_id)){
+                LO local_node_id = map->getLocalElement(tag_id);
+                active_dofs_view(local_node_id*num_dim,0) = 1;
+                target_displacements_view(local_node_id*num_dim,0) = target_displacement_value;
+              }
+
+          }
+          disp_in->close(); // Close the file
+        }
+        *fos << " DISPLACEMENT CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
+        eq_constraint = ROL::makePtr<DisplacementConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, target_displacements, active_dofs, 0, false);
+
+        //ROL::Ptr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>> rol_x =
+        //ROL::makePtr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>>(design_node_densities_distributed);
+        //construct direction vector for check
+        //Teuchos::RCP<MV> directions_distributed = Teuchos::rcp(new MV(map, 1));
+        //directions_distributed->putScalar(1);
+        //directions_distributed->randomize(-0.8,1);
+        // ROL::Ptr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>> rol_d =
+        // ROL::makePtr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>>(directions_distributed);
+        
+        // ROL::Ptr<std::vector<real_t> > c_ptr = ROL::makePtr<std::vector<real_t>>(1,1.0);
+        // ROL::Ptr<ROL::Vector<real_t> > constraint_buf = ROL::makePtr<ROL::StdVector<real_t>>(c_ptr);
+        //std::cout << " VALUE TEST " << obj_value << std::endl;
+        //eq_constraint->checkApplyJacobian(*rol_x, *rol_d, *constraint_buf);
+        //eq_constraint->checkApplyAdjointHessian(*rol_x, *constraint_buf, *rol_d, *rol_x);
+      }
       else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Moment_of_Inertia_Constraint){
         *fos << " MOMENT OF INERTIA CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
         eq_constraint = ROL::makePtr<MomentOfInertiaConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][1], Function_Arguments[imodule][0], false);
       }
+      else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Center_of_Mass_Constraint){
+        *fos << " CENTER OF MASS CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
+        eq_constraint = ROL::makePtr<CenterOfMassConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][1], Function_Arguments[imodule][0], false);
+      }
       else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Strain_Energy_Constraint){    
         *fos << " STRAIN ENERGY CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
         eq_constraint = ROL::makePtr<StrainEnergyConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][0], false);
@@ -1349,10 +1439,24 @@ void Implicit_Solver::setup_optimization_problem(){
         *fos << " MASS CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
         ineq_constraint = ROL::makePtr<MassConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag);
       }
+      else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Displacement_Constraint){
+        //simple test of assignment to the vector for constraint dofs
+        Teuchos::RCP<MV> target_displacements = Teuchos::rcp(new MV(local_dof_map, 1));
+        Teuchos::RCP<Tpetra::MultiVector<int,LO,GO>> active_dofs = Teuchos::rcp(new Tpetra::MultiVector<int,LO,GO>(local_dof_map, 1));
+        active_dofs->putScalar(0);
+        host_vec_array target_displacements_view = target_displacements->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+        host_ivec_array active_dofs_view = active_dofs->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+        *fos << " DISPLACEMENT CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
+        ineq_constraint = ROL::makePtr<DisplacementConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, target_displacements, active_dofs);
+      }
       else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Moment_of_Inertia_Constraint){
         *fos << " MOMENT OF INERTIA CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
         ineq_constraint = ROL::makePtr<MomentOfInertiaConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][1], Function_Arguments[imodule][0]);
       }
+      else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Center_of_Mass_Constraint){
+        *fos << " CENTER OF MASS CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
+        ineq_constraint = ROL::makePtr<CenterOfMassConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][1], Function_Arguments[imodule][0]);
+      }
       else if(TO_Module_List[imodule] == TO_MODULE_TYPE::Strain_Energy_Constraint){
         *fos << " STRAIN ENERGY CONSTRAINT EXPECTS FEA MODULE INDEX " <<TO_Module_My_FEA_Module[imodule] << std::endl;
         ineq_constraint = ROL::makePtr<StrainEnergyConstraint_TopOpt>(fea_modules[TO_Module_My_FEA_Module[imodule]], nodal_density_flag, Function_Arguments[imodule][2]);
@@ -1686,7 +1790,7 @@ void Implicit_Solver::collect_information(){
    Sort Information for parallel file output
 ------------------------------------------------------------------------- */
 
-void Implicit_Solver::sort_information(){
+void Implicit_Solver::sort_information(bool mesh_conversion_flag){
   GO nreduce_nodes = 0;
   GO nreduce_elem = 0;
   int num_dim = simparam.num_dims;
@@ -1702,19 +1806,20 @@ void Implicit_Solver::sort_information(){
   //comms to collect
   sorted_node_coords_distributed->doImport(*node_coords_distributed, node_sorting_importer, Tpetra::INSERT);
 
-  //comms to collect FEA module related vector data
-  for (int imodule = 0; imodule < nfea_modules; imodule++){
-    fea_modules[imodule]->sort_output(sorted_map);
-    //collected_node_displacements_distributed->doImport(*(fea_elasticity->node_displacements_distributed), dof_collection_importer, Tpetra::INSERT);
-  }
-  
-
-  //collected nodal density information
-  sorted_node_densities_distributed = Teuchos::rcp(new MV(sorted_map, 1));
+  if(!mesh_conversion_flag){
+    //comms to collect FEA module related vector data
+    for (int imodule = 0; imodule < nfea_modules; imodule++){
+      fea_modules[imodule]->sort_output(sorted_map);
+      //collected_node_displacements_distributed->doImport(*(fea_elasticity->node_displacements_distributed), dof_collection_importer, Tpetra::INSERT);
+    }
+    
 
-  //comms to collect
-  sorted_node_densities_distributed->doImport(*design_node_densities_distributed, node_sorting_importer, Tpetra::INSERT);
+    //collected nodal density information
+    sorted_node_densities_distributed = Teuchos::rcp(new MV(sorted_map, 1));
 
+    //comms to collect
+    sorted_node_densities_distributed->doImport(*design_node_densities_distributed, node_sorting_importer, Tpetra::INSERT);
+  }
   //sort element connectivity
   sorted_element_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(num_elem,0,comm));
   Tpetra::Import<LO, GO> element_sorting_importer(all_element_map, sorted_element_map);
@@ -1730,10 +1835,10 @@ void Implicit_Solver::sort_information(){
    Output Model Information in tecplot format
 ------------------------------------------------------------------------- */
 
-void Implicit_Solver::output_design(int current_step){
+void Implicit_Solver::output_design(int current_step, bool mesh_conversion_flag){
   if(current_step!=last_print_step){
     last_print_step = current_step;
-    parallel_tecplot_writer();
+    parallel_tecplot_writer(mesh_conversion_flag);
   }
 
 }
@@ -1742,7 +1847,7 @@ void Implicit_Solver::output_design(int current_step){
    Output Model Information in tecplot format
 ------------------------------------------------------------------------- */
 
-void Implicit_Solver::parallel_tecplot_writer(){
+void Implicit_Solver::parallel_tecplot_writer(bool mesh_conversion_flag){
   
   int num_dim = simparam.num_dims;
 	std::string current_file_name;
@@ -1766,9 +1871,11 @@ void Implicit_Solver::parallel_tecplot_writer(){
     fea_modules[imodule]->compute_output();
   }
   
-  sort_information();
+  sort_information(mesh_conversion_flag);
   const_host_vec_array sorted_node_coords = sorted_node_coords_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
-  const_host_vec_array sorted_node_densities = sorted_node_densities_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+  const_host_vec_array sorted_node_densities;
+  if(!mesh_conversion_flag)
+    sorted_node_densities = sorted_node_densities_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
   const_host_elem_conn_array sorted_nodes_in_elem = sorted_nodes_in_elem_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
   // Convert ijk index system to the finite element numbering convention
   // for vertices in cell
@@ -1796,6 +1903,8 @@ void Implicit_Solver::parallel_tecplot_writer(){
     current_file_name = base_file_name_undeformed + file_count + file_extension;
   else
     current_file_name = base_file_name + file_count + file_extension;
+  
+  if(mesh_conversion_flag) current_file_name = "Converted_Tecplot_Mesh.dat";
 
   MPI_File_open(MPI_COMM_WORLD, current_file_name.c_str(), 
                 MPI_MODE_CREATE|MPI_MODE_WRONLY, 
@@ -1821,18 +1930,28 @@ void Implicit_Solver::parallel_tecplot_writer(){
   //myfile << "VARIABLES = \"x\", \"y\", \"z\", \"density\", \"sigmaxx\", \"sigmayy\", \"sigmazz\", \"sigmaxy\", \"sigmaxz\", \"sigmayz\"" "\n";
   //else
   current_line_stream.str("");
-  if(num_dim == 2)
-	  current_line_stream << "VARIABLES = \"x\", \"y\", \"density\"";
-  else if(num_dim == 3)
-	  current_line_stream << "VARIABLES = \"x\", \"y\", \"z\", \"density\"";
-  for (int imodule = 0; imodule < nfea_modules; imodule++){
-    for(int ioutput = 0; ioutput < fea_modules[imodule]->noutput; ioutput++){
-      nvector = fea_modules[imodule]->output_vector_sizes[ioutput];
-      for(int ivector = 0; ivector < nvector; ivector++){
-        current_line_stream << ", \"" << fea_modules[imodule]->output_dof_names[ioutput][ivector] << "\"";
+  if(mesh_conversion_flag){
+    if(num_dim == 2)
+      current_line_stream << "VARIABLES = \"x\", \"y\"";
+    else if(num_dim == 3)
+      current_line_stream << "VARIABLES = \"x\", \"y\", \"z\"";
+  }
+  else{
+    if(num_dim == 2)
+      current_line_stream << "VARIABLES = \"x\", \"y\", \"density\"";
+    else if(num_dim == 3)
+      current_line_stream << "VARIABLES = \"x\", \"y\", \"z\", \"density\"";
+
+    for (int imodule = 0; imodule < nfea_modules; imodule++){
+      for(int ioutput = 0; ioutput < fea_modules[imodule]->noutput; ioutput++){
+        nvector = fea_modules[imodule]->output_vector_sizes[ioutput];
+        for(int ivector = 0; ivector < nvector; ivector++){
+          current_line_stream << ", \"" << fea_modules[imodule]->output_dof_names[ioutput][ivector] << "\"";
+        }
       }
     }
   }
+  
   current_line = current_line_stream.str();
   if(myrank == 0)
     MPI_File_write(myfile_parallel,current_line.c_str(),current_line.length(), MPI_CHAR, MPI_STATUS_IGNORE);
@@ -1871,13 +1990,15 @@ void Implicit_Solver::parallel_tecplot_writer(){
   //output nodal data
   //compute buffer output size and file stream offset for this MPI rank
   int default_dof_count = num_dim;
-  default_dof_count++;
+  if(!mesh_conversion_flag) default_dof_count++;
   int buffer_size_per_node_line = 26*default_dof_count + 1; //25 width + 1 space per number plus line terminator
-  for (int imodule = 0; imodule < nfea_modules; imodule++){
-    noutput = fea_modules[imodule]->noutput;
-    for(int ioutput = 0; ioutput < noutput; ioutput++){
-      nvector = fea_modules[imodule]->output_vector_sizes[ioutput];  
-      buffer_size_per_node_line += 26*nvector;
+  if(!mesh_conversion_flag){
+    for (int imodule = 0; imodule < nfea_modules; imodule++){
+      noutput = fea_modules[imodule]->noutput;
+      for(int ioutput = 0; ioutput < noutput; ioutput++){
+        nvector = fea_modules[imodule]->output_vector_sizes[ioutput];  
+        buffer_size_per_node_line += 26*nvector;
+      }
     }
   }
   int nlocal_sorted_nodes = sorted_map->getLocalNumElements();
@@ -1896,21 +2017,23 @@ void Implicit_Solver::parallel_tecplot_writer(){
 		current_line_stream << std::setw(25) << sorted_node_coords(nodeline,2) << " ";
 
     //velocity print
-    current_line_stream << std::setw(25) << sorted_node_densities(nodeline,0) << " ";
-    
-    for (int imodule = 0; imodule < nfea_modules; imodule++){
-      noutput = fea_modules[imodule]->noutput;
-      for(int ioutput = 0; ioutput < noutput; ioutput++){
-        current_sorted_output = fea_modules[imodule]->module_outputs[ioutput];
-        nvector = fea_modules[imodule]->output_vector_sizes[ioutput];
-        if(fea_modules[imodule]->vector_style[ioutput] == FEA_Module::DOF){
-          for(int ivector = 0; ivector < nvector; ivector++){
-           current_line_stream << std::setw(25) << current_sorted_output(nodeline*nvector + ivector,0) << " ";
+    if(!mesh_conversion_flag){
+      current_line_stream << std::setw(25) << sorted_node_densities(nodeline,0) << " ";
+
+      for (int imodule = 0; imodule < nfea_modules; imodule++){
+        noutput = fea_modules[imodule]->noutput;
+        for(int ioutput = 0; ioutput < noutput; ioutput++){
+          current_sorted_output = fea_modules[imodule]->module_outputs[ioutput];
+          nvector = fea_modules[imodule]->output_vector_sizes[ioutput];
+          if(fea_modules[imodule]->vector_style[ioutput] == FEA_Module::DOF){
+            for(int ivector = 0; ivector < nvector; ivector++){
+            current_line_stream << std::setw(25) << current_sorted_output(nodeline*nvector + ivector,0) << " ";
+            }
           }
-        }
-        if(fea_modules[imodule]->vector_style[ioutput] == FEA_Module::NODAL){
-          for(int ivector = 0; ivector < nvector; ivector++){
-            current_line_stream << std::setw(25) << current_sorted_output(nodeline,ivector) << " ";
+          if(fea_modules[imodule]->vector_style[ioutput] == FEA_Module::NODAL){
+            for(int ivector = 0; ivector < nvector; ivector++){
+              current_line_stream << std::setw(25) << current_sorted_output(nodeline,ivector) << " ";
+            }
           }
         }
       }
@@ -1976,7 +2099,7 @@ void Implicit_Solver::parallel_tecplot_writer(){
   MPI_Barrier(world);
 
   //Displaced Geometry File option
-  if(displacement_module>=0&&displace_geometry){
+  if(displacement_module>=0&&(displace_geometry&&!mesh_conversion_flag)){
     current_line_stream.str("");
     header_stream_offset = 0;
     //deformed geometry
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.h b/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.h
index 70d69a290..217c6abd1 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.h
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Implicit_Solver.h
@@ -84,7 +84,7 @@ class Implicit_Solver: public Solver{
 
   void collect_information();
 
-  void sort_information();
+  void sort_information(bool mesh_conversion_flag = false);
 
   //process input to decide TO problem and FEA modules
   void FEA_module_setup();
@@ -106,11 +106,11 @@ class Implicit_Solver: public Solver{
 
   void init_topology_conditions (int num_sets);
 
-  void output_design(int current_step);
+  void output_design(int current_step, bool mesh_conversion_flag = false);
 
   void tecplot_writer();
 
-  void parallel_tecplot_writer();
+  void parallel_tecplot_writer(bool mesh_conversion_flag = false);
 
   //void init_boundary_sets(int num_boundary_sets);
 
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Center_of_Mass_Constraint.h b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Center_of_Mass_Constraint.h
index 5fd97db75..bfc96c04c 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Center_of_Mass_Constraint.h
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Center_of_Mass_Constraint.h
@@ -90,12 +90,12 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
   FEA_Module_Inertial *FEM_;
   ROL::Ptr<ROL_MV> ROL_Element_Masses;
   ROL::Ptr<ROL_MV> ROL_Element_Moments;
-  ROL::Ptr<ROL_MV> ROL_Gradients;
+  ROL::Ptr<ROL_MV> ROL_Gradients, ROL_Mass_Gradients;
   Teuchos::RCP<MV> constraint_gradients_distributed;
   Teuchos::RCP<MV> mass_gradients_distributed;
   real_t initial_center_of_mass;
   bool inequality_flag_;
-  real_t constraint_value_;
+  real_t constraint_value_, normalization_value;
   int constraint_component_;
 
   ROL::Ptr<const MV> getVector( const V& x ) {
@@ -111,7 +111,7 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
   int last_comm_step, current_step, last_solve_step;
   std::string my_fea_module = "Inertial";
 
-  CenterOfMassConstraint_TopOpt(FEA_Module *FEM, bool nodal_density_flag, int constraint_component, bool inequality_flag=true, real_t constraint_value=0) 
+  CenterOfMassConstraint_TopOpt(FEA_Module *FEM, bool nodal_density_flag, int constraint_component, real_t constraint_value=0, bool inequality_flag=true) 
   {
     FEM_ = dynamic_cast<FEA_Module_Inertial*>(FEM);
     nodal_density_flag_ = nodal_density_flag;
@@ -120,6 +120,8 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
     inequality_flag_ = inequality_flag;
     constraint_value_ = constraint_value;
     constraint_component_ = constraint_component;
+    normalization_value = constraint_value_;
+    if(constraint_value_==0) normalization_value = 1;
     int num_dim = FEM_->simparam->num_dims;
     ROL_Element_Masses = ROL::makePtr<ROL_MV>(FEM_->Global_Element_Masses);
     if(constraint_component_ == 0)
@@ -131,11 +133,9 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
 
     const_host_vec_array design_densities = FEM_->design_node_densities_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
     
-    FEM_->compute_element_moments(design_densities,true, constraint_component_);
-    
     //sum per element results across all MPI ranks
     ROL::Elementwise::ReductionSum<real_t> sumreduc;
-    real_t initial_moment = ROL_Element_Moments->reduce(sumreduc);
+    real_t initial_moment;
     real_t initial_mass;
 
     if(FEM_->mass_init) { initial_mass = FEM_->mass; }
@@ -147,8 +147,16 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
       FEM_->mass_init = true;
     }
     
-    FEM_->com_init[constraint_component_] = true;
-    FEM_->center_of_mass[constraint_component_] = initial_center_of_mass = initial_moment/initial_mass;
+    if(FEM_->com_init[constraint_component_]) { initial_center_of_mass = FEM_->center_of_mass[constraint_component_]; }
+    else{
+      FEM_->compute_element_moments(design_densities,true, constraint_component_, false);
+
+      //sum per element results across all MPI ranks
+      ROL::Elementwise::ReductionSum<real_t> sumreduc;
+      initial_moment = ROL_Element_Moments->reduce(sumreduc);
+      FEM_->com_init[constraint_component_] = true;
+      FEM_->center_of_mass[constraint_component_] = initial_center_of_mass = initial_moment/initial_mass;
+    }
 
     //debug print
     if(FEM_->myrank==0){
@@ -165,7 +173,7 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
     mass_gradients_distributed = FEM_->mass_gradients_distributed; 
 
     if(FEM_->center_of_mass_gradients_distributed.is_null()) 
-      FEM_->center_of_mass_gradients_distributed = Teuchos::rcp(new MV(FEM_->map, num_dim));
+      FEM_->center_of_mass_gradients_distributed = Teuchos::rcp(new MV(FEM_->map, 1));
     constraint_gradients_distributed = FEM_->center_of_mass_gradients_distributed;
   }
 
@@ -216,10 +224,10 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
     }
     
     if(inequality_flag_){
-      (*cp)[0] = current_com;
+      (*cp)[0] = current_com/normalization_value;
     }
     else{
-      (*cp)[0] = current_com - constraint_value_;
+      (*cp)[0] = (current_com - constraint_value_)/normalization_value;
     }
 
     //std::cout << "Ended constraint value on task " <<FEM_->myrank <<std::endl;
@@ -285,9 +293,9 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
       //*fos << std::endl;
       //std::fflush(stdout);
     for(int i = 0; i < FEM_->nlocal_nodes; i++){
-      constraint_gradients(i,constraint_component_) /= current_mass;
-      constraint_gradients(i,constraint_component_) -= mass_gradients(i,0)*current_com/current_mass;
-      constraint_gradients(i,constraint_component_) *= (*vp)[0];
+      constraint_gradients(i,0) /= current_mass;
+      constraint_gradients(i,0) -= mass_gradients(i,0)*current_com/current_mass;
+      constraint_gradients(i,0) *= (*vp)[0]/normalization_value;
     }
     
     
@@ -298,7 +306,7 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
   
   void applyJacobian(ROL::Vector<real_t> &jv, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &x, real_t &tol) override {
     //std::cout << "Started constraint grad on task " <<FEM_->myrank  << std::endl;
-     //get Tpetra multivector pointer from the ROL vector
+    //get Tpetra multivector pointer from the ROL vector
     ROL::Ptr<const MV> zp = getVector(x);
     ROL::Ptr<std::vector<real_t>> jvp = dynamic_cast<ROL::StdVector<real_t>&>(jv).getVector();
     
@@ -346,8 +354,8 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
     FEM_->compute_nodal_gradients(design_densities, mass_gradients);
 
     for(int i = 0; i < FEM_->nlocal_nodes; i++){
-      constraint_gradients(i,constraint_component_) /= current_mass;
-      constraint_gradients(i,constraint_component_) -= mass_gradients(i,0)*current_com/current_mass;
+      constraint_gradients(i,0) /= current_mass;
+      constraint_gradients(i,0) -= mass_gradients(i,0)*current_com/current_mass;
     }
 
     ROL_Gradients = ROL::makePtr<ROL_MV>(constraint_gradients_distributed);
@@ -355,95 +363,78 @@ class CenterOfMassConstraint_TopOpt : public ROL::Constraint<real_t> {
     //debug print
     //std::cout << "Constraint Gradient value " << gradient_dot_v << std::endl;
 
-    (*jvp)[0] = gradient_dot_v;
+    (*jvp)[0] = gradient_dot_v/normalization_value;
     //std::cout << "Ended constraint grad on task " <<FEM_->myrank  << std::endl;
   }
   
-  /*
-  void hessVec_12( ROL::Vector<real_t> &hv, const ROL::Vector<real_t> &v, 
-                   const ROL::Vector<real_t> &u, const ROL::Vector<real_t> &z, real_t &tol ) {
+  void applyAdjointHessian(ROL::Vector<real_t> &ahuv, const ROL::Vector<real_t> &u, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &z, real_t &tol) {
     
-    // Unwrap hv
-    ROL::Ptr<MV> hvp = getVector(hv);
-
-    // Unwrap v
-    ROL::Ptr<const MV> vp = getVector(v);
-
-    // Unwrap x
-    ROL::Ptr<const MV> up = getVector(u);
+    //get Tpetra multivector pointer from the ROL vector
     ROL::Ptr<const MV> zp = getVector(z);
-
-    // Apply Jacobian
-    hv.zero();
-    if ( !useLC_ ) {
-      MV KU(up->size(),0.0);
-      MV U;
-      U.assign(up->begin(),up->end());
-      FEM_->set_boundary_conditions(U);
-      FEM_->apply_jacobian(KU,U,*zp,*vp);
-      for (size_t i=0; i<up->size(); i++) {
-        (*hvp)[i] = 2.0*KU[i];
-      }
-    }
+    ROL::Ptr<const std::vector<real_t>> up = dynamic_cast<const ROL::StdVector<real_t>&>(u).getVector();
+    ROL::Ptr<const MV> vp = getVector(v);
     
-  }
-
-  void hessVec_21( ROL::Vector<real_t> &hv, const ROL::Vector<real_t> &v, 
-                   const ROL::Vector<real_t> &u, const ROL::Vector<real_t> &z, real_t &tol ) {
-                     
-    // Unwrap g
-    ROL::Ptr<MV> hvp = getVector(hv);
+    //ROL::Ptr<ROL_MV> ROL_Element_Volumes;
 
-    // Unwrap v
-    ROL::Ptr<const MV> vp = getVector(v);
+    //get local view of the data
+    const_host_vec_array design_densities = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array v_view = vp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    host_vec_array moment_gradients = constraint_gradients_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    host_vec_array mass_gradients = mass_gradients_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
 
-    // Unwrap x
-    ROL::Ptr<const MV> up = getVector(u);
-    ROL::Ptr<const MV> zp = getVector(z);
- 
-    // Apply Jacobian
-    hv.zero();
-    if ( !useLC_ ) {
-      std::MV<real_t> U;
-      U.assign(up->begin(),up->end());
-      FEM_->set_boundary_conditions(U);
-      std::MV<real_t> V;
-      V.assign(vp->begin(),vp->end());
-      FEM_->set_boundary_conditions(V);
-      FEM_->apply_adjoint_jacobian(*hvp,U,*zp,V);
-      for (size_t i=0; i<hvp->size(); i++) {
-        (*hvp)[i] *= 2.0;
-      }
+    //communicate ghosts and solve for nodal degrees of freedom as a function of the current design variables
+    //communicate ghosts
+    if(last_comm_step!=current_step){
+      FEM_->comm_variables(zp);
+      last_comm_step = current_step;
     }
     
-  }
+    int rnum_elem = FEM_->rnum_elem;
 
-  void hessVec_22( ROL::Vector<real_t> &hv, const ROL::Vector<real_t> &v, 
-                   const ROL::Vector<real_t> &u, const ROL::Vector<real_t> &z, real_t &tol ) {
-                     
-    ROL::Ptr<MV> hvp = getVector(hv);
+    //compute mass
+    real_t current_mass;
+    if(FEM_->mass_update == current_step&&0) { current_mass = FEM_->mass; }
+    else{
+      FEM_->compute_element_masses(design_densities,false);
+      //sum per element results across all MPI ranks
+      ROL::Elementwise::ReductionSum<real_t> sumreduc;
+      FEM_->mass = current_mass = ROL_Element_Masses->reduce(sumreduc);
+      FEM_->mass_update = current_step;
+    }
 
-    // Unwrap v
-    ROL::Ptr<const MV> vp = getVector(v);
+    //compute center of mass
+    real_t current_com;
+    if(FEM_->com_update[constraint_component_] == current_step) current_com = FEM_->center_of_mass[constraint_component_];
+    else{
+    FEM_->compute_element_moments(design_densities,false,constraint_component_);
+    FEM_->com_update[constraint_component_] = current_step;
+    
+    //sum per element results across all MPI ranks
+    ROL::Elementwise::ReductionSum<real_t> sumreduc;
+    real_t current_moment = ROL_Element_Moments->reduce(sumreduc);
+    FEM_->center_of_mass[constraint_component_] = current_com = current_moment/current_mass;
+    }
 
-    // Unwrap x
-    ROL::Ptr<const MV> up = getVector(u);
-    ROL::Ptr<const MV> zp = getVector(z);
+    // Unwrap hv
+    ROL::Ptr<MV> ahuvp = getVector(ahuv);
+  
+    host_vec_array constraint_adjoint_hess = ahuvp->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    real_t matrix_product;
+    FEM_->compute_moment_gradients(design_densities, moment_gradients, constraint_component_);
+    FEM_->compute_nodal_gradients(design_densities, mass_gradients);
     
-    // Apply Jacobian
-    hv.zero();
-    if ( !useLC_ ) {
-      MV U;
-      U.assign(up->begin(),up->end());
-      FEM_->set_boundary_conditions(U);
-      MV V;
-      V.assign(vp->begin(),vp->end());
-      FEM_->set_boundary_conditions(V);
-      FEM_->apply_adjoint_jacobian(*hvp,U,*zp,*vp,U);
+    ROL_Gradients = ROL::makePtr<ROL_MV>(constraint_gradients_distributed);
+    ROL_Mass_Gradients = ROL::makePtr<ROL_MV>(mass_gradients_distributed);
+    real_t dot_product_moment = ROL_Gradients->dot(v);
+    real_t dot_product_mass = ROL_Mass_Gradients->dot(v);
+    for(int i = 0; i < FEM_->nlocal_nodes; i++){
+        constraint_adjoint_hess(i,0) = -dot_product_moment*mass_gradients(i,0)*(*up)[0]/current_mass/current_mass-
+                      moment_gradients(i,0)*(*up)[0]*dot_product_mass/current_mass/current_mass+
+                      2*current_com*dot_product_mass*mass_gradients(i,0)*(*up)[0]/current_mass/current_mass;
+        constraint_adjoint_hess(i,0) /= normalization_value;
     }
     
   }
-  */
 };
 
 #endif // end header guard
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Displacement_Constraint.h b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Displacement_Constraint.h
new file mode 100644
index 000000000..572737daf
--- /dev/null
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Displacement_Constraint.h
@@ -0,0 +1,347 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+ 
+#ifndef DISPLACEMENT_CONSTRAINT_TOPOPT_H
+#define DISPLACEMENT_CONSTRAINT_TOPOPT_H
+
+#include "matar.h"
+#include "elements.h"
+#include <Teuchos_ScalarTraits.hpp>
+#include <Teuchos_RCP.hpp>
+#include <Teuchos_oblackholestream.hpp>
+#include <Teuchos_Tuple.hpp>
+#include <Teuchos_VerboseObject.hpp>
+
+#include <Tpetra_Core.hpp>
+#include <Tpetra_Map.hpp>
+#include <Tpetra_MultiVector.hpp>
+#include "Tpetra_Details_makeColMap.hpp"
+#include "Tpetra_Details_DefaultTypes.hpp"
+
+#include "ROL_Types.hpp"
+#include <ROL_TpetraMultiVector.hpp>
+#include "ROL_Constraint.hpp"
+#include "ROL_Elementwise_Reduce.hpp"
+#include "FEA_Module_Elasticity.h"
+
+//designed for many displacement constraints; uses a quadratic error penalty sum to allow solving one adjoint system rather than one per displacement constraint
+class DisplacementConstraint_TopOpt : public ROL::Constraint<real_t> {
+  
+  typedef Tpetra::Map<>::local_ordinal_type LO;
+  typedef Tpetra::Map<>::global_ordinal_type GO;
+  typedef Tpetra::Map<>::node_type Node;
+  typedef Tpetra::Map<LO, GO, Node> Map;
+  typedef Tpetra::MultiVector<real_t, LO, GO, Node> MV;
+  typedef Tpetra::MultiVector<int, LO, GO, Node> IMV;
+  typedef ROL::Vector<real_t> V;
+  typedef ROL::TpetraMultiVector<real_t,LO,GO,Node> ROL_MV;
+  typedef ROL::TpetraMultiVector<int,LO,GO,Node> ROL_IMV;
+  
+  using traits = Kokkos::ViewTraits<LO*, Kokkos::LayoutLeft, void, void>;
+  using array_layout    = typename traits::array_layout;
+  using execution_space = typename traits::execution_space;
+  using device_type     = typename traits::device_type;
+  using memory_traits   = typename traits::memory_traits;
+  using global_size_t = Tpetra::global_size_t;
+
+  typedef Kokkos::View<real_t*, Kokkos::LayoutRight, device_type, memory_traits> values_array;
+  typedef Kokkos::View<GO*, array_layout, device_type, memory_traits> global_indices_array;
+  typedef Kokkos::View<LO*, array_layout, device_type, memory_traits> indices_array;
+  
+  //typedef Kokkos::DualView<real_t**, Kokkos::LayoutLeft, device_type>::t_dev vec_array;
+  typedef MV::dual_view_type::t_dev vec_array;
+  typedef MV::dual_view_type::t_host host_vec_array;
+  typedef Kokkos::View<const real_t**, array_layout, HostSpace, memory_traits> const_host_vec_array;
+  typedef Kokkos::View<const int**, array_layout, HostSpace, memory_traits> const_host_int_array;
+  typedef MV::dual_view_type dual_vec_array;
+
+private:
+
+  FEA_Module_Elasticity *FEM_;
+  ROL::Ptr<ROL_MV> ROL_Displacements;
+  ROL::Ptr<MV> Target_Displacements_;
+  ROL::Ptr<IMV> Active_DOFs_;
+  ROL::Ptr<ROL_MV> ROL_Gradients;
+  Teuchos::RCP<MV> constraint_gradients_distributed;
+  Teuchos::RCP<MV> all_node_displacements_distributed_temp;
+  real_t initial_strain_energy_;
+  bool inequality_flag_;
+  real_t constraint_value_;
+  real_t scaling;
+
+  ROL::Ptr<const MV> getVector( const V& x ) {
+    return dynamic_cast<const ROL_MV&>(x).getVector();
+  }
+
+  ROL::Ptr<MV> getVector( V& x ) {
+    return dynamic_cast<ROL_MV&>(x).getVector();
+  }
+
+public:
+  bool nodal_density_flag_;
+  int last_comm_step, current_step, last_solve_step;
+  std::string my_fea_module = "Elasticity";
+
+  DisplacementConstraint_TopOpt(FEA_Module *FEM, bool nodal_density_flag, ROL::Ptr<MV> Target_Displacements, ROL::Ptr<IMV> Active_DOFs, real_t constraint_value=0, bool inequality_flag=true){
+      FEM_ = dynamic_cast<FEA_Module_Elasticity*>(FEM);
+      nodal_density_flag_ = nodal_density_flag;
+      last_comm_step = last_solve_step = -1;
+      current_step = 0;
+      inequality_flag_ = inequality_flag;
+      constraint_value_ = constraint_value;
+      constraint_gradients_distributed = Teuchos::rcp(new MV(FEM_->map, 1));
+      Target_Displacements_ = Target_Displacements;
+      Active_DOFs_ = Active_DOFs;
+
+      //deep copy solve data into the cache variable
+      FEM_->all_cached_node_displacements_distributed = Teuchos::rcp(new MV(*(FEM_->all_node_displacements_distributed), Teuchos::Copy));
+      all_node_displacements_distributed_temp = FEM_->all_node_displacements_distributed;
+
+      ROL_Displacements = ROL::makePtr<ROL_MV>(FEM_->node_displacements_distributed);
+      scaling = 1;
+
+      std::cout.precision(10);
+  }
+
+  void update(const ROL::Vector<real_t> &z, ROL::UpdateType type, int iter = -1 ) {
+    // //debug
+    // std::ostream &out = std::cout;
+    // Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
+
+    current_step++;
+    ROL::Ptr<const MV> zp = getVector(z);
+    const_host_vec_array design_densities = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+
+    if (type == ROL::UpdateType::Initial)  {
+      // This is the first call to update
+      //first linear solve was done in FEA class run function already
+
+      //initial design density data was already communicated for ghost nodes in init_design()
+    }
+    else if (type == ROL::UpdateType::Accept) {
+      // u_ was set to u=S(x) during a trial update
+      // and has been accepted as the new iterate
+      /*assign temp pointer to the cache multivector (not the cache pointer) storage for a swap of the multivectors;
+        this just avoids deep copy */
+      all_node_displacements_distributed_temp = FEM_->all_cached_node_displacements_distributed;
+      // Cache the accepted value
+      FEM_->all_cached_node_displacements_distributed = FEM_->all_node_displacements_distributed;
+    }
+    else if (type == ROL::UpdateType::Revert) {
+      // u_ was set to u=S(x) during a trial update
+      // and has been rejected as the new iterate
+      // Revert to cached value
+      FEM_->comm_variables(zp);
+      FEM_->all_node_displacements_distributed = FEM_->all_cached_node_displacements_distributed;
+    }
+    else if (type == ROL::UpdateType::Trial) {
+      // This is a new value of x
+      FEM_->all_node_displacements_distributed = all_node_displacements_distributed_temp;
+      //communicate density variables for ghosts
+      FEM_->comm_variables(zp);
+      //update deformation variables
+      FEM_->update_linear_solve(zp, current_step);
+      if(FEM_->myrank==0)
+      std::cout << "called Trial" << std::endl;
+    }
+    else { // ROL::UpdateType::Temp
+      // This is a new value of x used for,
+      // e.g., finite-difference checks
+      if(FEM_->myrank==0)
+        std::cout << "called Temp" << std::endl;
+      FEM_->all_node_displacements_distributed = all_node_displacements_distributed_temp;
+      FEM_->comm_variables(zp);
+      FEM_->update_linear_solve(zp, current_step);
+    }
+  }
+
+  void value(ROL::Vector<real_t> &c, const ROL::Vector<real_t> &z, real_t &tol ) {
+    ROL::Ptr<const MV> zp = getVector(z);
+    ROL::Ptr<std::vector<real_t>> cp = dynamic_cast<ROL::StdVector<real_t>&>(c).getVector();
+    ROL::Ptr<const IMV> active_dofsp = Active_DOFs_;
+    ROL::Ptr<const MV> target_displacementsp = Target_Displacements_;
+    
+    const_host_int_array active_dofs_view = active_dofsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array target_displacements_view = target_displacementsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array displacements_view = FEM_->node_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+
+    int nlocal_nodes = FEM_->nlocal_nodes;
+    int num_dim = FEM_->num_dim;
+    real_t current_quadsum_value, current_local_quadsum_value;
+
+    current_local_quadsum_value = current_quadsum_value = 0;
+    for(int idof = 0; idof < nlocal_nodes*num_dim; idof++){
+      if(active_dofs_view(idof,0)){
+          current_local_quadsum_value += (displacements_view(idof,0)-target_displacements_view(idof,0))*
+                                        (displacements_view(idof,0)-target_displacements_view(idof,0))/(target_displacements_view(idof,0)*target_displacements_view(idof,0));
+          // if(std::isnan((displacements_view(idof,0)-target_displacements_view(idof,0))*
+          //                               (displacements_view(idof,0)-target_displacements_view(idof,0))/(target_displacements_view(idof,0)*target_displacements_view(idof,0)))){
+          //                               std::cout << "NAN QUADSUM VALUES " << target_displacements_view(idof,0) << " " << active_dofs_view(idof,0) << std::endl;
+          //                               }
+      }
+    }
+
+    MPI_Allreduce(&current_local_quadsum_value,&current_quadsum_value,1,MPI_DOUBLE,MPI_SUM,MPI_COMM_WORLD);
+
+    if(FEM_->myrank==0)
+      std::cout << "CURRENT DISPLACEMENT CONSTRAINT VALUE " << current_quadsum_value << std::endl;
+    if(inequality_flag_)
+      (*cp)[0] = current_quadsum_value/scaling;
+    else
+      (*cp)[0] = current_quadsum_value/scaling - constraint_value_;
+  }
+
+  
+  void applyAdjointJacobian(ROL::Vector<real_t> &ajv, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &x, real_t &tol) override {
+    //std::cout << "Started constraint adjoint grad on task " <<FEM_->myrank << std::endl;
+     //get Tpetra multivector pointer from the ROL vector
+    ROL::Ptr<const MV> zp = getVector(x);
+    ROL::Ptr<const std::vector<real_t>> vp = dynamic_cast<const ROL::StdVector<real_t>&>(v).getVector();
+    ROL::Ptr<MV> ajvp = getVector(ajv);
+    ROL::Ptr<const IMV> active_dofsp = Active_DOFs_;
+    ROL::Ptr<const MV> target_displacementsp = Target_Displacements_;
+    
+    //ROL::Ptr<ROL_MV> ROL_Element_Volumes;
+
+    //get local view of the data
+    const_host_vec_array design_densities = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    //host_vec_array constraint_gradients = constraint_gradients_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    host_vec_array constraint_gradients = ajvp->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    //host_vec_array dual_constraint_vector = vp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_int_array active_dofs_view = active_dofsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array target_displacements_view = target_displacementsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array displacements_view = FEM_->node_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+
+    //communicate ghosts
+    /*
+    if(last_comm_step!=current_step){
+      FEM_->comm_variables(zp);
+      last_comm_step = current_step;
+    }
+    */
+    int rnum_elem = FEM_->rnum_elem;
+    int nlocal_nodes = FEM_->nlocal_nodes;
+    
+    FEM_->compute_displacement_constraint_gradients(design_densities, target_displacements_view, active_dofs_view, constraint_gradients);
+
+    if(nodal_density_flag_){
+      for(int i = 0; i < nlocal_nodes; i++){
+        constraint_gradients(i,0) *= (*vp)[0]/scaling;
+      }
+    }
+    else{
+      for(int i = 0; i < rnum_elem; i++){
+        constraint_gradients(i,0) *= (*vp)[0]/scaling;
+      }
+    }
+    //std::cout << "Ended constraint adjoint grad on task " <<FEM_->myrank  << std::endl;
+    //debug print
+    //std::cout << "Constraint Gradient value " << std::endl;
+  }
+  
+  void applyJacobian(ROL::Vector<real_t> &jv, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &x, real_t &tol) {
+    //get Tpetra multivector pointer from the ROL vector
+    ROL::Ptr<const MV> zp = getVector(x);
+    ROL::Ptr<std::vector<real_t>> jvp = dynamic_cast<ROL::StdVector<real_t>&>(jv).getVector();
+    ROL::Ptr<const IMV> active_dofsp = Active_DOFs_;
+    ROL::Ptr<const MV> target_displacementsp = Target_Displacements_;
+
+    //get local view of the data
+    host_vec_array constraint_gradients = constraint_gradients_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    const_host_vec_array design_densities = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_int_array active_dofs_view = active_dofsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array target_displacements_view = target_displacementsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array displacements_view = FEM_->node_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+
+    int rnum_elem = FEM_->rnum_elem;
+    int nlocal_nodes = FEM_->nlocal_nodes;
+
+    FEM_->compute_displacement_constraint_gradients(design_densities, target_displacements_view, active_dofs_view, constraint_gradients);
+    if(nodal_density_flag_){
+      for(int i = 0; i < nlocal_nodes; i++){
+        constraint_gradients(i,0) /= scaling;
+      }
+    }
+    else{
+      for(int i = 0; i < rnum_elem; i++){
+        constraint_gradients(i,0) /= scaling;
+      }
+    }
+    ROL_Gradients = ROL::makePtr<ROL_MV>(constraint_gradients_distributed);
+    real_t gradient_dot_v = ROL_Gradients->dot(v);
+    //debug print
+    //std::cout << "Constraint Gradient value " << gradient_dot_v << std::endl;
+
+    (*jvp)[0] = gradient_dot_v;
+  }
+
+  void applyAdjointHessian(ROL::Vector<real_t> &ahuv, const ROL::Vector<real_t> &u, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &z, Real &tol) {
+    // Unwrap hv
+    ROL::Ptr<MV> ahuvp = getVector(ahuv);
+
+    // Unwrap v
+    ROL::Ptr<const MV> vp = getVector(v);
+    ROL::Ptr<const std::vector<real_t>> up = dynamic_cast<const ROL::StdVector<real_t>&>(u).getVector();
+    ROL::Ptr<const MV> zp = getVector(z);
+    ROL::Ptr<const IMV> active_dofsp = Active_DOFs_;
+    ROL::Ptr<const MV> target_displacementsp = Target_Displacements_;
+    
+    host_vec_array constraint_adjoint_hessvec = ahuvp->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    const_host_vec_array design_densities = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array direction_vector = vp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_int_array active_dofs_view = active_dofsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array target_displacements_view = target_displacementsp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    const_host_vec_array displacements_view = FEM_->node_displacements_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    
+    int rnum_elem = FEM_->rnum_elem;
+    int nlocal_nodes = FEM_->nlocal_nodes;
+
+    FEM_->compute_displacement_constraint_hessian_vec(design_densities, target_displacements_view, active_dofs_view, constraint_adjoint_hessvec, vp);
+    for(int i = 0; i < nlocal_nodes; i++){
+      constraint_adjoint_hessvec(i,0) *= (*up)[0]/scaling;
+    }
+    //if(FEM_->myrank==0)
+    //std::cout << "hessvec" << std::endl;
+    //vp->describe(*fos,Teuchos::VERB_EXTREME);
+    //hvp->describe(*fos,Teuchos::VERB_EXTREME);
+    if(FEM_->myrank==0)
+    std::cout << "Called Displacement Constraint Hessianvec" << std::endl;
+    FEM_->hessvec_count++;
+  }
+};
+
+#endif // end header guard
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Heat_Capacity_Potential_Minimize.h b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Heat_Capacity_Potential_Minimize.h
index b8733f72f..7156284a8 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Heat_Capacity_Potential_Minimize.h
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Heat_Capacity_Potential_Minimize.h
@@ -129,9 +129,9 @@ class HeatCapacityPotentialMinimize_TopOpt : public ROL::Objective<real_t> {
   }
 
   void update(const ROL::Vector<real_t> &z, ROL::UpdateType type, int iter = -1 ) {
-    //debug
-    std::ostream &out = std::cout;
-    Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
+    // //debug
+    // std::ostream &out = std::cout;
+    // Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
 
     current_step++;
     ROL::Ptr<const MV> zp = getVector(z);
@@ -167,13 +167,13 @@ class HeatCapacityPotentialMinimize_TopOpt : public ROL::Objective<real_t> {
       //update deformation variables
       FEM_->update_linear_solve(zp, current_step);
       if(FEM_->myrank==0)
-      *fos << "called Trial" << std::endl;
+      std::cout << "called Trial" << std::endl;
     }
     else { // ROL::UpdateType::Temp
       // This is a new value of x used for,
       // e.g., finite-difference checks
       if(FEM_->myrank==0)
-      *fos << "called Temp" << std::endl;
+      std::cout << "called Temp" << std::endl;
       FEM_->all_node_temperatures_distributed = all_node_temperatures_distributed_temp;
       FEM_->comm_variables(zp);
       FEM_->update_linear_solve(zp, current_step);
@@ -301,9 +301,9 @@ class HeatCapacityPotentialMinimize_TopOpt : public ROL::Objective<real_t> {
   
   
   void hessVec( ROL::Vector<real_t> &hv, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &z, real_t &tol ) {
-    //debug
-    std::ostream &out = std::cout;
-    Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
+    // //debug
+    // std::ostream &out = std::cout;
+    // Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
     // Unwrap hv
     ROL::Ptr<MV> hvp = getVector(hv);
 
@@ -322,7 +322,7 @@ class HeatCapacityPotentialMinimize_TopOpt : public ROL::Objective<real_t> {
     //vp->describe(*fos,Teuchos::VERB_EXTREME);
     //hvp->describe(*fos,Teuchos::VERB_EXTREME);
     if(FEM_->myrank==0)
-    *fos << "Called Heat Capacity Potential Hessianvec" << std::endl;
+    std::cout << "Called Heat Capacity Potential Hessianvec" << std::endl;
     FEM_->hessvec_count++;
   }
 /*
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Strain_Energy_Minimize.h b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Strain_Energy_Minimize.h
index aff9f97a3..191df2c37 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Strain_Energy_Minimize.h
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization/Strain_Energy_Minimize.h
@@ -130,9 +130,9 @@ class StrainEnergyMinimize_TopOpt : public ROL::Objective<real_t> {
   }
 
   void update(const ROL::Vector<real_t> &z, ROL::UpdateType type, int iter = -1 ) {
-    //debug
-    std::ostream &out = std::cout;
-    Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
+    // //debug
+    // std::ostream &out = std::cout;
+    // Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
 
     current_step++;
     ROL::Ptr<const MV> zp = getVector(z);
@@ -168,13 +168,13 @@ class StrainEnergyMinimize_TopOpt : public ROL::Objective<real_t> {
       //update deformation variables
       FEM_->update_linear_solve(zp, current_step);
       if(FEM_->myrank==0)
-      *fos << "called Trial" << std::endl;
+      std::cout << "called Trial" << std::endl;
     }
     else { // ROL::UpdateType::Temp
       // This is a new value of x used for,
       // e.g., finite-difference checks
       if(FEM_->myrank==0)
-      *fos << "called Temp" << std::endl;
+      std::cout << "called Temp" << std::endl;
       FEM_->all_node_displacements_distributed = all_node_displacements_distributed_temp;
       FEM_->comm_variables(zp);
       FEM_->update_linear_solve(zp, current_step);
@@ -290,9 +290,9 @@ class StrainEnergyMinimize_TopOpt : public ROL::Objective<real_t> {
   
   
   void hessVec( ROL::Vector<real_t> &hv, const ROL::Vector<real_t> &v, const ROL::Vector<real_t> &z, real_t &tol ) {
-    //debug
-    std::ostream &out = std::cout;
-    Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
+    // //debug
+    // std::ostream &out = std::cout;
+    // Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
     // Unwrap hv
     ROL::Ptr<MV> hvp = getVector(hv);
 
@@ -310,7 +310,7 @@ class StrainEnergyMinimize_TopOpt : public ROL::Objective<real_t> {
     //vp->describe(*fos,Teuchos::VERB_EXTREME);
     //hvp->describe(*fos,Teuchos::VERB_EXTREME);
     if(FEM_->myrank==0)
-    *fos << "Called Strain Energy Hessianvec" << std::endl;
+    std::cout << "Called Strain Energy Hessianvec" << std::endl;
     FEM_->hessvec_count++;
   }
 /*
diff --git a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization_Function_Headers.h b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization_Function_Headers.h
index 301e0cf54..828628c1b 100644
--- a/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization_Function_Headers.h
+++ b/src/Parallel-Solvers/Implicit-Lagrange/Topology_Optimization_Function_Headers.h
@@ -44,6 +44,7 @@
 #include "Moment_of_Inertia_Constraint.h"
 #include "Bounded_Strain_Constraint.h"
 #include "Strain_Energy_Constraint.h"
+#include "Displacement_Constraint.h"
 #include "Strain_Energy_Minimize.h"
 #include "Strain_Energy_Objective.h"
 #include "Heat_Capacity_Potential_Minimize.h"
diff --git a/src/Parallel-Solvers/Material-Models/CMakeLists.txt b/src/Parallel-Solvers/Material-Models/CMakeLists.txt
index fad537dfc..f0b233891 100644
--- a/src/Parallel-Solvers/Material-Models/CMakeLists.txt
+++ b/src/Parallel-Solvers/Material-Models/CMakeLists.txt
@@ -6,6 +6,8 @@ add_subdirectory(Ideal-Gas-Models)
 
 if (BUILD_EVPFFT_FIERRO)
   add_subdirectory(../../EVPFFT/src EVPFFT)
+elseif(BUILD_LS_EVPFFT_FIERRO)
+  add_subdirectory(../../LS-EVPFFT/src LS-EVPFFT)
 elseif(BUILD_EVP_FIERRO)
   add_subdirectory(../../EVP/src EVP)
 else()
diff --git a/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/IdealGasEOSModel.cpp b/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/IdealGasEOSModel.cpp
index 55aa1a1ca..b7457a2d8 100644
--- a/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/IdealGasEOSModel.cpp
+++ b/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/IdealGasEOSModel.cpp
@@ -56,7 +56,11 @@ double IdealGasEOSModel::calc_sound_speed_gradient_internal_energy(
   const double den,
   const double sie){
   // sound speed gradient
-  return 0.5*gamma_*(gamma_ - 1.0)/sqrt(gamma_*(gamma_ - 1.0)*sie);
+  //compute sound speed to decide on avoiding sinfularity near sie=0
+  if(sqrt(gamma_*(gamma_ - 1.0)*sie) < csmin_)
+    return 0;
+  else
+    return 0.5*gamma_*(gamma_ - 1.0)/sqrt(gamma_*(gamma_ - 1.0)*sie);
 }
 
 KOKKOS_FUNCTION
diff --git a/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/init_state_vars.cpp b/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/init_state_vars.cpp
index 19d493a3c..f7cde4d41 100644
--- a/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/init_state_vars.cpp
+++ b/src/Parallel-Solvers/Material-Models/Ideal-Gas-Models/init_state_vars.cpp
@@ -6,7 +6,7 @@ using namespace mtr;
 void init_state_vars(
   const DCArrayKokkos <material_t> &material,
   const DViewCArrayKokkos <size_t> &elem_mat_id,
-  const DCArrayKokkos <double> &state_vars,
+  DCArrayKokkos <double> &state_vars,
   const DCArrayKokkos <double> &global_vars,
   const DCArrayKokkos <double> &elem_user_output_vars,
   const size_t num_elems)
@@ -20,6 +20,7 @@ void init_state_vars(
             state_vars.host(elem_gid,var) = 0.0;
         }
     }
+    state_vars.update_device();
     
 	  return;
 }
diff --git a/src/Parallel-Solvers/Material-Models/User-Material-Models/init_state_vars.cpp b/src/Parallel-Solvers/Material-Models/User-Material-Models/init_state_vars.cpp
index 19d493a3c..f7cde4d41 100644
--- a/src/Parallel-Solvers/Material-Models/User-Material-Models/init_state_vars.cpp
+++ b/src/Parallel-Solvers/Material-Models/User-Material-Models/init_state_vars.cpp
@@ -6,7 +6,7 @@ using namespace mtr;
 void init_state_vars(
   const DCArrayKokkos <material_t> &material,
   const DViewCArrayKokkos <size_t> &elem_mat_id,
-  const DCArrayKokkos <double> &state_vars,
+  DCArrayKokkos <double> &state_vars,
   const DCArrayKokkos <double> &global_vars,
   const DCArrayKokkos <double> &elem_user_output_vars,
   const size_t num_elems)
@@ -20,6 +20,7 @@ void init_state_vars(
             state_vars.host(elem_gid,var) = 0.0;
         }
     }
+    state_vars.update_device();
     
 	  return;
 }
diff --git a/src/Parallel-Solvers/Material-Models/material_models.h b/src/Parallel-Solvers/Material-Models/material_models.h
index 37c1e36e3..36bb996fb 100644
--- a/src/Parallel-Solvers/Material-Models/material_models.h
+++ b/src/Parallel-Solvers/Material-Models/material_models.h
@@ -143,7 +143,7 @@ struct strength_t {
 void init_state_vars(
   const DCArrayKokkos <material_t> &material,
   const DViewCArrayKokkos <size_t> &elem_mat_id,
-  const DCArrayKokkos <double> &state_vars,
+  DCArrayKokkos <double> &state_vars,
   const DCArrayKokkos <double> &global_vars,
   const DCArrayKokkos <double> &elem_user_output_vars,
   const size_t num_elems);
diff --git a/src/Parallel-Solvers/Parallel-Explicit/Dynamic_Elastic_Solver/FEA_Module_Dynamic_Elasticity.cpp b/src/Parallel-Solvers/Parallel-Explicit/Dynamic_Elastic_Solver/FEA_Module_Dynamic_Elasticity.cpp
index aab8dd9ce..e09c67741 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/Dynamic_Elastic_Solver/FEA_Module_Dynamic_Elasticity.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/Dynamic_Elastic_Solver/FEA_Module_Dynamic_Elasticity.cpp
@@ -1054,7 +1054,6 @@ void FEA_Module_Dynamic_Elasticity::setup(){
                     global_vars,
                     elem_user_output_vars,
                     rnum_elem);
-    state_vars.update_device();
  
     // initialize strength model
     init_strength_model(elem_strength,
diff --git a/src/Parallel-Solvers/Parallel-Explicit/Explicit_Solver.cpp b/src/Parallel-Solvers/Parallel-Explicit/Explicit_Solver.cpp
index 544bc95d4..056c13c34 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/Explicit_Solver.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/Explicit_Solver.cpp
@@ -90,6 +90,7 @@
 #include "Mass_Constraint.h"
 #include "Moment_of_Inertia_Constraint.h"
 #include "Kinetic_Energy_Minimize.h"
+#include "Area_Normals.h"
 
 #define BUFFER_LINES 20000
 #define MAX_WORD 30
@@ -400,6 +401,9 @@ void Explicit_Solver::read_mesh_ansys_dat(const char *MESH){
   GO node_gid;
   real_t dof_value;
   host_vec_array node_densities;
+  bool zero_index_base = input_options.zero_index_base;
+  int negative_index_found = 0;
+  int global_negative_index_found = 0;
   //Nodes_Per_Element_Type =  elements::elem_types::Nodes_Per_Element_Type;
 
   //read the mesh
@@ -838,11 +842,25 @@ void Explicit_Solver::read_mesh_ansys_dat(const char *MESH){
         //as we loop through the nodes belonging to this element we store them
         //if any of these nodes belongs to this rank this list is used to store the element locally
         node_gid = atoi(&read_buffer(scan_loop,inode,0));
-        node_store(inode-elem_words_per_line_no_nodes) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(zero_index_base)
+          node_store(inode-elem_words_per_line_no_nodes) = node_gid; //subtract 1 since file index start is 1 but code expects 0
+        else
+          node_store(inode-elem_words_per_line_no_nodes) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(node_store(inode-elem_words_per_line_no_nodes) < 0){
+          negative_index_found = 1;
+        }
         //first we add the elements to a dynamically allocated list
-        if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
-          assign_flag = 1;
-          rnum_elem++;
+        if(zero_index_base){
+          if(map->isNodeGlobalElement(node_gid)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
+        }
+        else{
+          if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
         }
       }
 
@@ -929,14 +947,27 @@ void Explicit_Solver::read_mesh_ansys_dat(const char *MESH){
   dual_nodes_in_elem.modify_host();
 
   for(int ielem = 0; ielem < rnum_elem; ielem++)
-    for(int inode = 0; inode < nodes_per_element; inode++)
+    for(int inode = 0; inode < nodes_per_element; inode++){
       nodes_in_elem(ielem, inode) = element_temp[ielem*nodes_per_element + inode];
+    }
 
   //view storage for all local elements connected to local nodes on this rank
   Kokkos::DualView <GO*, array_layout, device_type, memory_traits> All_Element_Global_Indices("All_Element_Global_Indices",rnum_elem);
   //copy temporary global indices storage to view storage
-  for(int ielem = 0; ielem < rnum_elem; ielem++)
+  for(int ielem = 0; ielem < rnum_elem; ielem++){
     All_Element_Global_Indices.h_view(ielem) = global_indices_temp[ielem];
+    if(global_indices_temp[ielem]<0){
+      negative_index_found = 1;
+    }
+  }
+  
+  MPI_Allreduce(&negative_index_found,&global_negative_index_found,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
+  if(global_negative_index_found){
+    if(myrank==0){
+    std::cout << "Node index less than or equal to zero detected; set \"zero_index_base: true\" under \"input_options\" in your yaml file if indices start at 0" << std::endl;
+    }
+    exit_solver(0);
+  }
 
   //delete temporary element connectivity and index storage
   std::vector<size_t>().swap(element_temp);
@@ -1372,13 +1403,37 @@ void Explicit_Solver::setup_optimization_problem(){
   problem->finalize(false,true,*fos);
   //problem->check(true,std::cout);
 
+  // //check area normal gradients
+  // ROL::Ptr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>> rol_debug_coords =
+  // ROL::makePtr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>>(all_node_coords_distributed);
+  // Teuchos::RCP<MV> debug_directions_distributed = Teuchos::rcp(new MV(all_node_map, num_dim));
+  // ROL::Ptr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>> debug_rol_d =
+  // ROL::makePtr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>>(debug_directions_distributed);
+  // debug_directions_distributed->randomize(-0.00375,0.00375);
+  // if(myrank==0){
+  // for(int inode = 0; inode < 8; inode++){
+  //   for(int idim = 0; idim < num_dim; idim++){
+  //     std::cout<< "AREA NORMALS GRADIENT FOR NODE " << inode << " AND DIM " << idim << std::endl;
+  //     ROL::Ptr<ROL::Objective<real_t>> debug_obj = ROL::makePtr<AreaNormals>(this, inode, idim, nodal_density_flag);
+  //     //debug_directions_distributed->putScalar(-0.1);
+  //     //debug_directions_distributed->norm2(debug_direction_norm);
+  //     //debug_directions_distributed->scale(1/debug_direction_norm(0));
+  //     //set all but first component to 0 for debug
+  //     host_vec_array debug_directions = debug_directions_distributed->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+  //     //for(int init = 1; init < nlocal_nodes; init++)
+  //     //directions(4,0) = -0.3;
+  //     debug_obj->checkGradient(*rol_debug_coords, *debug_rol_d);
+  //   }
+  // }
+  // }
+
   //debug checks
   ROL::Ptr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>> rol_x =
    ROL::makePtr<ROL::TpetraMultiVector<real_t,LO,GO,node_type>>(design_node_densities_distributed);
   //construct direction vector for check
   Teuchos::RCP<MV> directions_distributed = Teuchos::rcp(new MV(map, 1));
   directions_distributed->putScalar(-0.1);
-  //directions_distributed->randomize(-0.8,1);
+  directions_distributed->randomize(-0.8,1);
   Kokkos::View <real_t*, array_layout, HostSpace, memory_traits> direction_norm("gradient norm",1);
   directions_distributed->norm2(direction_norm);
   directions_distributed->scale(1/direction_norm(0));
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.cpp b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.cpp
index 04b01fe53..648cfb3d1 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.cpp
@@ -1094,7 +1094,6 @@ void FEA_Module_SGH::setup(){
                     global_vars,
                     elem_user_output_vars,
                     rnum_elem);
-    state_vars.update_device();
  
     // initialize strength model
     init_strength_model(elem_strength,
@@ -1420,7 +1419,7 @@ void FEA_Module_SGH::setup(){
     //initialize if topology optimization is used
     if(simparam->topology_optimization_on || simparam->shape_optimization_on){
       init_assembly();
-      assemble_matrix();
+      //assemble_matrix();
     }
 
     // update host copies of arrays modified in this function
@@ -2018,14 +2017,12 @@ void FEA_Module_SGH::sgh_solve(){
             // ---- Calculate velocity diveregence for the element ----
             if(num_dim==2){
                 get_divergence2D(elem_div,
-                                 *mesh,
                                  node_coords,
                                  node_vel,
                                  elem_vol);
             }
             else {
                 get_divergence(elem_div,
-                               *mesh,
                                node_coords,
                                node_vel,
                                elem_vol);
@@ -2104,7 +2101,6 @@ void FEA_Module_SGH::sgh_solve(){
 
             // ---- Update nodal velocities ---- //
             update_velocity_sgh(rk_alpha,
-                              *mesh,
                               node_vel,
                               node_mass,
                               corner_force);
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.h b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.h
index 1a42b5155..50ea65e52 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.h
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/FEA_Module_SGH.h
@@ -256,14 +256,12 @@ class FEA_Module_SGH: public FEA_Module{
   double average_element_density(const int nodes_per_elem, const CArray<double> current_element_densities) const;
 
   void get_divergence(DViewCArrayKokkos <double> &elem_div,
-                      const mesh_t mesh,
                       const DViewCArrayKokkos <double> &node_coords,
                       const DViewCArrayKokkos <double> &node_vel,
                       const DViewCArrayKokkos <double> &elem_vol);
 
 
   void get_divergence2D(DViewCArrayKokkos <double> &elem_div,
-                        const mesh_t mesh,
                         const DViewCArrayKokkos <double> &node_coords,
                         const DViewCArrayKokkos <double> &node_vel,
                         const DViewCArrayKokkos <double> &elem_vol);
@@ -301,7 +299,6 @@ class FEA_Module_SGH: public FEA_Module{
 
 
   void update_velocity_sgh(double rk_alpha,
-                           const mesh_t &mesh,
                            DViewCArrayKokkos <double> &node_vel,
                            const DViewCArrayKokkos <double> &node_mass,
                            const DViewCArrayKokkos <double> &corner_force);
@@ -432,7 +429,9 @@ class FEA_Module_SGH: public FEA_Module{
 
   void comm_node_masses();
 
-  void comm_adjoint_vectors(int cycle);
+  void comm_adjoint_vector(int cycle);
+
+  void comm_phi_adjoint_vector(int cycle);
 
   void comm_variables(Teuchos::RCP<const MV> zp);
 
@@ -605,8 +604,8 @@ class FEA_Module_SGH: public FEA_Module{
   //Local FEA data
   DCArrayKokkos<size_t, array_layout, device_type, memory_traits> Global_Gradient_Matrix_Assembly_Map; //Maps element local nodes to columns on ragged right node connectivity graph
   DCArrayKokkos<size_t, array_layout, device_type, memory_traits> Element_Gradient_Matrix_Assembly_Map; //Maps element-node pair to columns on ragged right node to element connectivity
-  RaggedRightArrayKokkos<GO, array_layout, device_type, memory_traits> Graph_Matrix; //stores global indices
-  RaggedRightArrayKokkos<GO, array_layout, device_type, memory_traits> DOF_Graph_Matrix; //stores global indices
+  RaggedRightArrayKokkos<LO, array_layout, device_type, memory_traits> Graph_Matrix; //stores local indices
+  RaggedRightArrayKokkos<LO, array_layout, device_type, memory_traits> DOF_Graph_Matrix; //stores local indices
   RaggedRightArrayKokkos<real_t, Kokkos::LayoutRight, device_type, memory_traits, array_layout> Force_Gradient_Positions;
   RaggedRightArrayKokkos<real_t, Kokkos::LayoutRight, device_type, memory_traits, array_layout> Force_Gradient_Velocities;
   CArrayKokkos<real_t, array_layout, device_type, memory_traits> Force_Gradient_Energies; //transposed such that elem ids correspond to rows
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/force_gradients_sgh.cpp b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/force_gradients_sgh.cpp
index f2cf4ac09..599b134a7 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/force_gradients_sgh.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/force_gradients_sgh.cpp
@@ -45,6 +45,7 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
 
     const size_t rk_level = simparam->dynamic_options.rk_num_bins - 1;
     const size_t num_dims = simparam->num_dims;
+    size_t num_corners = rnum_elem*num_nodes_in_elem;
 
     //initialize gradient matrix
     FOR_ALL_CLASS(dof_gid, 0, nlocal_nodes*num_dims, {
@@ -54,6 +55,19 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
     }); // end parallel for loop over nodes
     Kokkos::fence();
 
+    //initialize buffer storage; not all components are explicitly set for this routine
+    FOR_ALL_CLASS(corner_gid, 0, num_corners, {
+      for (int dim = 0; dim < num_dims; dim++){
+            //assign gradient of corner contribution of force to relevant matrix entries with non-zero node velocity gradient
+            for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){
+                for(int jdim = 0; jdim < num_dims; jdim++){
+                    corner_gradient_storage(corner_gid,dim,igradient,jdim) = 0;
+                }
+            }
+      }
+    }); // end parallel for loop over nodes
+    Kokkos::fence();
+
     // --- calculate the forces acting on the nodes from the element ---
     for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
     //FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
@@ -79,7 +93,7 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
         
         // Riemann velocity
         double vel_star_array[3];
-        double vel_star_gradient_array[3];
+        double vel_star_gradient_array[3*num_nodes_in_elem];
         
         // velocity gradient
         double vel_grad_array[9];
@@ -94,7 +108,7 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
         ViewCArrayKokkos <double> muc(muc_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> muc_gradient(muc_gradient_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> vel_star(vel_star_array, num_dims);
-        ViewCArrayKokkos <double> vel_star_gradient(vel_star_gradient_array, num_dims);
+        ViewCArrayKokkos <double> vel_star_gradient(vel_star_gradient_array, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
 
         EOSParent* eos_model = elem_eos(elem_gid).model;
@@ -177,7 +191,7 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
             
         // initialize to Riemann velocity to zero
         for (size_t dim = 0; dim < num_dims; dim++){
-            vel_star(dim) = vel_star_gradient(dim) = 0.0;
+            vel_star(dim) = 0.0;
         }
 
         // loop over nodes and calculate an average velocity, which is
@@ -255,7 +269,6 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
                 muc(node_lid) = elem_den(elem_gid) *
                                (material(mat_id).q1ex*elem_sspd(elem_gid) + material(mat_id).q2ex*mag_vel);
             } // end if on divergence sign
-           
 
             size_t use_shock_dir = 0;
             double mu_term;
@@ -295,16 +308,29 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
         if (sum(3) > fuzz) {
             for (size_t i = 0; i < num_dims; i++) {
                 vel_star(i) = sum(i)/sum(3);
-                vel_star_gradient(i) = 1/(double)num_nodes_in_elem;
             }
         }
         else {
             for (int i = 0; i < num_dims; i++){
                 vel_star(i) = 0.0;
-                vel_star_gradient(i) = 1/(double)num_nodes_in_elem;
             }
         } // end if
 
+        
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++) {
+            // The Riemann velocity, called vel_star
+            if (sum(3) > fuzz) {
+                for (size_t i = 0; i < num_dims; i++) {
+                    vel_star_gradient(node_lid,i) = muc(node_lid)/sum(3);
+                }
+            }
+            else {
+                for (int i = 0; i < num_dims; i++){
+                    vel_star_gradient(node_lid,i) = 0;
+                }
+            } // end if
+        }
+
             
             
         // ---- Calculate the shock detector for the Riemann-solver ----
@@ -387,15 +413,19 @@ void FEA_Module_SGH::get_force_vgradient_sgh(const DCArrayKokkos <material_t> &m
                 //assign gradient of corner contribution of force to relevant matrix entries with non-zero node velocity gradient
                 for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){
                     size_t gradient_node_gid = nodes_in_elem(elem_gid, igradient);
-                    if(!map->isNodeLocalElement(gradient_node_gid)) continue;
+                    //if(!map->isNodeLocalElement(gradient_node_gid)) continue;
                     column_index = num_dims*Global_Gradient_Matrix_Assembly_Map(elem_gid, igradient, node_lid);
                     if(node_lid==igradient){
-                        Force_Gradient_Velocities(gradient_node_gid*num_dims+dim, column_index+dim) += phi*muc(node_lid)*(vel_star_gradient(dim) - 1);
-                        corner_gradient_storage(corner_gid,dim,igradient,dim) = phi*muc(node_lid)*(vel_star_gradient(dim) - 1);
+                        if(map->isNodeLocalElement(gradient_node_gid)){
+                            Force_Gradient_Velocities(gradient_node_gid*num_dims+dim, column_index+dim) += phi*muc(node_lid)*(vel_star_gradient(igradient,dim) - 1);
+                        }
+                        corner_gradient_storage(corner_gid,dim,igradient,dim) = phi*muc(node_lid)*(vel_star_gradient(igradient,dim) - 1);
                     }
                     else{
-                        Force_Gradient_Velocities(gradient_node_gid*num_dims+dim, column_index+dim) += phi*muc(node_lid)*(vel_star_gradient(dim));
-                        corner_gradient_storage(corner_gid,dim,igradient,dim) = phi*muc(node_lid)*(vel_star_gradient(dim));
+                        if(map->isNodeLocalElement(gradient_node_gid)){
+                            Force_Gradient_Velocities(gradient_node_gid*num_dims+dim, column_index+dim) += phi*muc(node_lid)*(vel_star_gradient(igradient,dim));
+                        }
+                        corner_gradient_storage(corner_gid,dim,igradient,dim) = phi*muc(node_lid)*(vel_star_gradient(igradient,dim));
                     }
                 }
             } // end loop over dimension
@@ -464,7 +494,8 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
     Kokkos::fence();
 
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
+    for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
+    //FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
         
         const size_t num_nodes_in_elem = 8;
         real_t gradient_result[num_dims];
@@ -480,6 +511,7 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         
         // the sums in the Riemann solver
         double sum_array[4];
+        double sum_gradient_array[4];
         
         // corner shock impeadance x |corner area normal dot shock_dir|
         double muc_array[8];
@@ -487,6 +519,7 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         
         // Riemann velocity
         double vel_star_array[3];
+        double vel_star_gradient_array[3];
         
         // velocity gradient
         double vel_grad_array[9];
@@ -498,9 +531,11 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         ViewCArrayKokkos <double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> shock_dir(shock_dir_array, num_dims);
         ViewCArrayKokkos <double> sum(sum_array, 4);
+        ViewCArrayKokkos <double> sum_gradient(sum_gradient_array, 4);
         ViewCArrayKokkos <double> muc(muc_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> muc_gradient(muc_gradient_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> vel_star(vel_star_array, num_dims);
+        ViewCArrayKokkos <double> vel_star_gradient(vel_star_gradient_array, num_dims);
         ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
 
         EOSParent* eos_model = elem_eos(elem_gid).model;
@@ -562,9 +597,10 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         
         
         // --- Calculate the Cauchy stress ---
-        for (size_t i = 0; i < 3; i++){
-            for (size_t j = 0; j < 3; j++){
+        for (size_t i = 0; i < num_dims; i++){
+            for (size_t j = 0; j < num_dims; j++){
                 tau(i, j) = stress(i,j);
+                tau_gradient(i,j) = 0;
                 // artificial viscosity can be added here to tau
             } // end for
         } //end for
@@ -572,7 +608,7 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         // add the pressure
         for (int i = 0; i < num_dims; i++){
             tau(i, i) -= elem_pres(elem_gid);
-            tau_gradient(i, i) = eos_model->calc_pressure_gradient_internal_energy(elem_pres,
+            tau_gradient(i, i) = -eos_model->calc_pressure_gradient_internal_energy(elem_pres,
                                  elem_stress,
                                  elem_gid,
                                  elem_mat_id(elem_gid),
@@ -616,7 +652,7 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         
         // initialize sum term in MARS to zero
         for (int i = 0; i < 4; i++){
-            sum(i) = 0.0;
+            sum(i) = sum_gradient(i) = 0.0;
         }
 
         double mag;       // magnitude of the area normal
@@ -724,6 +760,11 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
             sum(1) += mu_term*vel(1);
             sum(2) += mu_term*vel(2);
             sum(3) += mu_term;
+            
+            sum_gradient(0) += mu_term_gradient*vel(0);
+            sum_gradient(1) += mu_term_gradient*vel(1);
+            sum_gradient(2) += mu_term_gradient*vel(2);
+            sum_gradient(3) += mu_term_gradient;
 
             muc(node_lid) = mu_term; // the impeadance time surface area is stored here
             muc_gradient(node_lid) = mu_term_gradient;
@@ -737,11 +778,13 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         if (sum(3) > fuzz) {
             for (size_t i = 0; i < num_dims; i++) {
                 vel_star(i) = sum(i)/sum(3);
+                vel_star_gradient(i) = sum_gradient(i)/sum(3) - sum_gradient(3)*sum(i)/sum(3)/sum(3);
             }
         }
         else {
             for (int i = 0; i < num_dims; i++){
                 vel_star(i) = 0.0;
+                vel_star_gradient(i) = 0;
             }
         } // end if
 
@@ -828,7 +871,8 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
                           area_normal(node_lid, 0)*tau_gradient(0, dim)
                         + area_normal(node_lid, 1)*tau_gradient(1, dim)
                         + area_normal(node_lid, 2)*tau_gradient(2, dim)
-                        + phi*muc_gradient(node_lid)*(vel_star(dim) - node_vel(rk_level, node_gid, dim));
+                        + phi*muc_gradient(node_lid)*(vel_star(dim) - node_vel(rk_level, node_gid, dim))
+                        + phi*muc(node_lid)*(vel_star_gradient(dim));
 
             } // end loop over dimension
 
@@ -842,8 +886,8 @@ void FEA_Module_SGH::get_force_egradient_sgh(const DCArrayKokkos <material_t> &m
         size_t mat_id = elem_mat_id(elem_gid);
         
         
-
-    }); // end parallel for loop over elements
+    }
+    //}); // end parallel for loop over elements
     
     /*
     //accumulate node values from corner storage
@@ -895,9 +939,9 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
       }
     }); // end parallel for loop over nodes
     Kokkos::fence();
-
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
+    for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
+    //FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
         
         const size_t num_nodes_in_elem = 8;
         real_t gradient_result[num_dims];
@@ -918,6 +962,7 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
         
         // the sums in the Riemann solver
         double sum_array[4];
+        double sum_gradient_array[4*max_nodes_per_element*num_dims];
         
         // corner shock impeadance x |corner area normal dot shock_dir|
         double muc_array[8];
@@ -925,6 +970,7 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
         
         // Riemann velocity
         double vel_star_array[3];
+        double vel_star_gradient_array[num_dims*max_nodes_per_element*num_dims];
         
         // velocity gradient
         double vel_grad_array[9];
@@ -932,15 +978,17 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
         // --- Create views of arrays to aid the force calculation ---
     
         ViewCArrayKokkos <double> tau(tau_array, num_dims, num_dims);
-        ViewCArrayKokkos <double> tau_gradient(tau_gradient_array, num_dims, num_dims,max_nodes_per_element, num_dims);
+        ViewCArrayKokkos <double> tau_gradient(tau_gradient_array, num_dims, num_dims, max_nodes_per_element, num_dims);
         ViewCArrayKokkos <double> volume_gradients(volume_gradients_array, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> area_normal_gradients(area_normal_gradients_array, num_nodes_in_elem, num_dims, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> shock_dir(shock_dir_array, num_dims);
         ViewCArrayKokkos <double> sum(sum_array, 4);
+        ViewCArrayKokkos <double> sum_gradient(sum_gradient_array, 4, max_nodes_per_element, num_dims);
         ViewCArrayKokkos <double> muc(muc_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> muc_gradient(muc_gradient_array, num_nodes_in_elem, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> vel_star(vel_star_array, num_dims);
+        ViewCArrayKokkos <double> vel_star_gradient(vel_star_gradient_array, num_dims, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
 
         EOSParent* eos_model = elem_eos(elem_gid).model;
@@ -953,12 +1001,21 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
         // create a view of the stress_matrix
         ViewCArrayKokkos <double> stress(&elem_stress(rk_level, elem_gid, 0,0), 3, 3);
         
-        
         // cut out the node_gids for this element
         ViewCArrayKokkos <size_t> elem_node_gids(&nodes_in_elem(elem_gid, 0), 8);
         
         //gradients of the element volume
         get_vol_hex_ugradient(volume_gradients, elem_gid, node_coords, elem_node_gids, rk_level);
+
+        
+        // //debug
+        // for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
+        //     for (size_t dim = 0; dim < num_dims; dim++){
+        //         if(volume_gradients(node_lid, dim)>1 || volume_gradients(node_lid, dim) < -1)
+        //         std::cout << volume_gradients(node_lid, dim) << " ";
+        //     } // end for
+        // } // end for
+        // std::cout << std::endl;
         
         // get the B matrix which are the OUTWARD corner area normals
         get_bmatrix(area_normal,
@@ -990,6 +1047,17 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
                 area_normal(node_lid, dim) = (-1.0)*area_normal(node_lid,dim);
             } // end for
         } // end for
+
+        // the -1 is for the inward surface area normal,
+        for (size_t node_lid = 0; node_lid < num_nodes_in_elem; node_lid++){
+            for (size_t dim = 0; dim < num_dims; dim++){
+                for (size_t igradient = 0; igradient < num_nodes_in_elem; igradient++){
+                    for (size_t jdim = 0; jdim < num_dims; jdim++){
+                        area_normal_gradients(node_lid, dim, igradient, jdim) = (-1.0)*area_normal_gradients(node_lid,dim, igradient, jdim);
+                    }
+                }
+            } // end for
+        } // end for
         
     
         
@@ -1015,6 +1083,18 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
                 // artificial viscosity can be added here to tau
             } // end for
         } //end for
+        
+        //initialize gradient array
+        for (size_t i = 0; i < 3; i++){
+            for (size_t j = 0; j < 3; j++){
+                for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){
+                    tau_gradient(i, j, igradient, 0) = 0;
+                    tau_gradient(i, j, igradient, 1) = 0;
+                    if(num_dims==3)
+                    tau_gradient(i, j, igradient, 2) = 0;
+                }
+            } // end for
+        } //end for
 
         // add the pressure
         for (int i = 0; i < num_dims; i++){
@@ -1061,6 +1141,13 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
         for (int i = 0; i < 4; i++){
             sum(i) = 0.0;
         }
+        for (int i = 0; i < 4; i++){
+            for (size_t igradient = 0; igradient < num_nodes_in_elem; igradient++) {
+                for (size_t jdim = 0; jdim < num_dims; jdim++) {
+                    sum_gradient(i, igradient, jdim) = 0.0;
+                }
+            }
+        }
 
         double mag;       // magnitude of the area normal
         double mag_vel;   // magnitude of velocity
@@ -1146,14 +1233,14 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
                            fabs( shock_dir(0)*area_normal(node_lid,0)
                                + shock_dir(1)*area_normal(node_lid,1)
                                + shock_dir(2)*area_normal(node_lid,2) );
-                muc_gradient(node_lid) = muc_gradient(node_lid)*
-                           fabs( shock_dir(0)*area_normal(node_lid,0)
-                               + shock_dir(1)*area_normal(node_lid,1)
-                               + shock_dir(2)*area_normal(node_lid,2) );
-                mu_term_gradient = muc(node_lid)* //amend if shock dir has dependence for gradient not captured here
-                           fabs( shock_dir(0)*area_normal_gradients(node_lid,0)
-                               + shock_dir(1)*area_normal_gradients(node_lid,1)
-                               + shock_dir(2)*area_normal_gradients(node_lid,2) );
+                // muc_gradient(node_lid) = muc_gradient(node_lid)*
+                //            fabs( shock_dir(0)*area_normal(node_lid,0)
+                //                + shock_dir(1)*area_normal(node_lid,1)
+                //                + shock_dir(2)*area_normal(node_lid,2) );
+                // mu_term_gradient = muc(node_lid)* //amend if shock dir has dependence for gradient not captured here
+                //            fabs( shock_dir(0)*area_normal_gradients(node_lid,0)
+                //                + shock_dir(1)*area_normal_gradients(node_lid,1)
+                //                + shock_dir(2)*area_normal_gradients(node_lid,2) );
             }
             else {
                // Using a full tensoral Riemann jump relation
@@ -1162,20 +1249,20 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
                                + area_normal(node_lid, 1)*area_normal(node_lid, 1)
                                + area_normal(node_lid, 2)*area_normal(node_lid, 2) );
                 for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){     
-                    for( int jdim = 0; jdim < num_dims; jdim++){          
-                        muc_gradient(node_lid,igradient,jdim) = muc_gradient(node_lid,igradient,jdim)
+                    for( int jdim = 0; jdim < num_dims; jdim++){        
+                        mu_term_gradient = muc_gradient(node_lid,igradient,jdim)
                                 * sqrt( area_normal(node_lid, 0)*area_normal(node_lid, 0)
                                     + area_normal(node_lid, 1)*area_normal(node_lid, 1)
                                     + area_normal(node_lid, 2)*area_normal(node_lid, 2) );
                         
-                        mu_term_gradient = muc(node_lid)*(area_normal(node_lid, 0)*area_normal_gradients(node_lid, 0,igradient,jdim)
+                        mu_term_gradient += muc(node_lid)*(area_normal(node_lid, 0)*area_normal_gradients(node_lid, 0,igradient,jdim)
                                     + area_normal(node_lid, 1)*area_normal_gradients(node_lid, 1,igradient,jdim)
                                     + area_normal(node_lid, 2)*area_normal_gradients(node_lid, 2,igradient,jdim))/
                                 sqrt( area_normal(node_lid, 0)*area_normal(node_lid, 0)
                                     + area_normal(node_lid, 1)*area_normal(node_lid, 1)
                                     + area_normal(node_lid, 2)*area_normal(node_lid, 2) );
 
-                        muc_gradient(node_lid, igradient,jdim) += mu_term_gradient;
+                        muc_gradient(node_lid, igradient,jdim) = mu_term_gradient;
                     }
                 }
             }
@@ -1185,13 +1272,22 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
             sum(2) += mu_term*vel(2);
             sum(3) += mu_term;
 
+            //sum gradients
+            
+            for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){     
+                for( int jdim = 0; jdim < num_dims; jdim++){    
+                    sum_gradient(0,igradient,jdim) += muc_gradient(node_lid, igradient,jdim)*vel(0);
+                    sum_gradient(1,igradient,jdim) += muc_gradient(node_lid, igradient,jdim)*vel(1);
+                    sum_gradient(2,igradient,jdim) += muc_gradient(node_lid, igradient,jdim)*vel(2);
+                    sum_gradient(3,igradient,jdim) += muc_gradient(node_lid, igradient,jdim);
+                }
+            }
+
             muc(node_lid) = mu_term; // the impeadance time surface area is stored here
 
         } // end for node_lid loop over nodes of the elem
 
 
-
-
         // The Riemann velocity, called vel_star
         if (sum(3) > fuzz) {
             for (size_t i = 0; i < num_dims; i++) {
@@ -1204,6 +1300,25 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
             }
         } // end if
 
+        //vel star gradients
+        if (sum(3) > fuzz) {
+            for (size_t i = 0; i < num_dims; i++) {
+                for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){     
+                    for( int jdim = 0; jdim < num_dims; jdim++){    
+                        vel_star_gradient(i,igradient,jdim) = sum_gradient(i,igradient,jdim)/sum(3)-sum(i)/(sum(3)*sum(3))*sum_gradient(3,igradient,jdim);
+                    }
+                }
+            }
+        }
+        else {
+            for (int i = 0; i < num_dims; i++){
+                for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){     
+                    for( int jdim = 0; jdim < num_dims; jdim++){    
+                        vel_star_gradient(i,igradient,jdim) = 0;
+                    }
+                }
+            }
+        } // end if
             
             
         // ---- Calculate the shock detector for the Riemann-solver ----
@@ -1287,22 +1402,36 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
                 for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){
                     for(int jdim = 0; jdim < num_dims; jdim++){
                         size_t gradient_node_gid = nodes_in_elem(elem_gid, igradient);
-                        if(!map->isNodeLocalElement(gradient_node_gid)) continue;
+                        //if(!map->isNodeLocalElement(gradient_node_gid)) continue;
                         column_index = num_dims*Global_Gradient_Matrix_Assembly_Map(elem_gid, igradient, node_lid);
-                        Force_Gradient_Positions(gradient_node_gid*num_dims+jdim, column_index+dim) += area_normal(node_lid, 0)*tau_gradient(0, dim, igradient, jdim)
+                        if(map->isNodeLocalElement(gradient_node_gid)){
+                            Force_Gradient_Positions(gradient_node_gid*num_dims+jdim, column_index+dim) += area_normal(node_lid, 0)*tau_gradient(0, dim, igradient, jdim)
+                                    + area_normal(node_lid, 1)*tau_gradient(1, dim, igradient, jdim)
+                                    + area_normal(node_lid, 2)*tau_gradient(2, dim, igradient, jdim)
+                                    + area_normal_gradients(node_lid, 0, igradient, jdim)*tau(0, dim)
+                                    + area_normal_gradients(node_lid, 1, igradient, jdim)*tau(1, dim)
+                                    + area_normal_gradients(node_lid, 2, igradient, jdim)*tau(2, dim)
+                                    + phi*muc_gradient(node_lid, igradient, jdim)*(vel_star(dim) - node_vel(rk_level, node_gid, dim))
+                                    + phi*muc(node_lid)*(vel_star_gradient(dim, igradient, jdim));
+                        }
+                        corner_gradient_storage(corner_gid,dim,igradient,jdim) = area_normal(node_lid, 0)*tau_gradient(0, dim, igradient, jdim)
                                 + area_normal(node_lid, 1)*tau_gradient(1, dim, igradient, jdim)
                                 + area_normal(node_lid, 2)*tau_gradient(2, dim, igradient, jdim)
                                 + area_normal_gradients(node_lid, 0, igradient, jdim)*tau(0, dim)
                                 + area_normal_gradients(node_lid, 1, igradient, jdim)*tau(1, dim)
                                 + area_normal_gradients(node_lid, 2, igradient, jdim)*tau(2, dim)
-                                + phi*muc_gradient(node_lid, igradient, jdim)*(vel_star(dim) - node_vel(rk_level, node_gid, dim));
-                        corner_gradient_storage(corner_gid,dim,igradient,jdim) = area_normal(node_lid, 0)*tau_gradient(0, dim, igradient, jdim)
-                                + area_normal(node_lid, 1)*tau_gradient(1, dim, igradient, jdim)
-                                + area_normal(node_lid, 2)*tau_gradient(2, dim, igradient, jdim)
-                                + area_normal_gradients(node_lid, 0)*tau(0, dim, igradient, jdim)
-                                + area_normal_gradients(node_lid, 1)*tau(1, dim, igradient, jdim)
-                                + area_normal_gradients(node_lid, 2)*tau(2, dim, igradient, jdim)
-                                + phi*muc_gradient(node_lid, igradient, jdim)*(vel_star(dim) - node_vel(rk_level, node_gid, dim));
+                                + phi*muc_gradient(node_lid, igradient, jdim)*(vel_star(dim) - node_vel(rk_level, node_gid, dim))
+                                + phi*muc(node_lid)*(vel_star_gradient(dim, igradient, jdim));
+                        // if(map->isNodeLocalElement(gradient_node_gid)){
+                        //     Force_Gradient_Positions(gradient_node_gid*num_dims+jdim, column_index+dim) +=
+                        //             + area_normal_gradients(node_lid, 0, igradient, jdim)*tau(0, dim)
+                        //             + area_normal_gradients(node_lid, 1, igradient, jdim)*tau(1, dim)
+                        //             + area_normal_gradients(node_lid, 2, igradient, jdim)*tau(2, dim);
+                        // }
+                        // corner_gradient_storage(corner_gid,dim,igradient,jdim) = 
+                        //         + area_normal_gradients(node_lid, 0, igradient, jdim)*tau(0, dim)
+                        //         + area_normal_gradients(node_lid, 1, igradient, jdim)*tau(1, dim)
+                        //         + area_normal_gradients(node_lid, 2, igradient, jdim)*tau(2, dim);
                     }
                 }
             } // end loop over dimension
@@ -1317,8 +1446,8 @@ void FEA_Module_SGH::get_force_ugradient_sgh(const DCArrayKokkos <material_t> &m
         size_t mat_id = elem_mat_id(elem_gid);
         
         
-
-    }); // end parallel for loop over elements
+    }
+    //}); // end parallel for loop over elements
     
     /*
     //accumulate node values from corner storage
@@ -1351,7 +1480,7 @@ void FEA_Module_SGH::force_design_gradient_term(const_vec_array design_variables
   const DCArrayKokkos <boundary_t> boundary = module_params->boundary;
   const DCArrayKokkos <material_t> material = simparam->material;
   const int num_dim = simparam->num_dims;
-  int num_corners = rnum_elem*num_nodes_in_elem;
+  size_t num_corners = rnum_elem*num_nodes_in_elem;
   real_t global_dt;
   bool element_constant_density = true;
   size_t current_data_index, next_data_index;
@@ -1413,14 +1542,12 @@ void FEA_Module_SGH::force_design_gradient_term(const_vec_array design_variables
         // ---- Calculate velocity diveregence for the element ----
         if(num_dim==2){
             get_divergence2D(elem_div,
-                            *mesh,
                             node_coords,
                             node_vel,
                             elem_vol);
         }
         else {
             get_divergence(elem_div,
-                          *mesh,
                           node_coords,
                           node_vel,
                           elem_vol);
@@ -1537,7 +1664,8 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
     const size_t rk_level = simparam->dynamic_options.rk_num_bins - 1;
     const size_t num_dims = simparam->num_dims;
     // --- calculate the forces acting on the nodes from the element ---
-    FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
+    for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
+    //FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
         
         const size_t num_nodes_in_elem = 8;
         real_t gradient_result[num_dims];
@@ -1553,6 +1681,7 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         
         // the sums in the Riemann solver
         double sum_array[4];
+        double sum_gradient_array[4];
         
         // corner shock impeadance x |corner area normal dot shock_dir|
         double muc_array[8];
@@ -1560,6 +1689,7 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         
         // Riemann velocity
         double vel_star_array[3];
+        double vel_star_gradient_array[3];
         
         // velocity gradient
         double vel_grad_array[9];
@@ -1571,9 +1701,11 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         ViewCArrayKokkos <double> area_normal(area_normal_array, num_nodes_in_elem, num_dims);
         ViewCArrayKokkos <double> shock_dir(shock_dir_array, num_dims);
         ViewCArrayKokkos <double> sum(sum_array, 4);
+        ViewCArrayKokkos <double> sum_gradient(sum_gradient_array, 4);
         ViewCArrayKokkos <double> muc(muc_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> muc_gradient(muc_gradient_array, num_nodes_in_elem);
         ViewCArrayKokkos <double> vel_star(vel_star_array, num_dims);
+        ViewCArrayKokkos <double> vel_star_gradient(vel_star_gradient_array, num_dims);
         ViewCArrayKokkos <double> vel_grad(vel_grad_array, num_dims, num_dims);
 
         EOSParent* eos_model = elem_eos(elem_gid).model;
@@ -1638,7 +1770,7 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         for (size_t i = 0; i < 3; i++){
             for (size_t j = 0; j < 3; j++){
                 tau(i, j) = stress(i,j);
-                // artificial viscosity can be added here to tau
+                tau_gradient(i,j) = 0;
             } // end for
         } //end for
 
@@ -1680,7 +1812,7 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         
         // initialize sum term in MARS to zero
         for (int i = 0; i < 4; i++){
-            sum(i) = 0.0;
+            sum(i) = sum_gradient(i) = 0.0;
         }
 
         double mag;       // magnitude of the area normal
@@ -1777,6 +1909,11 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
             sum(2) += mu_term*vel(2);
             sum(3) += mu_term;
 
+            sum_gradient(0) += mu_term_gradient*vel(0);
+            sum_gradient(1) += mu_term_gradient*vel(1);
+            sum_gradient(2) += mu_term_gradient*vel(2);
+            sum_gradient(3) += mu_term_gradient;
+
             muc(node_lid) = mu_term; // the impeadance time surface area is stored here
             muc_gradient(node_lid) = mu_term_gradient;
 
@@ -1789,11 +1926,13 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         if (sum(3) > fuzz) {
             for (size_t i = 0; i < num_dims; i++) {
                 vel_star(i) = sum(i)/sum(3);
+                vel_star_gradient(i) = sum_gradient(i)/sum(3) - sum_gradient(3)*sum(i)/sum(3)/sum(3);
             }
         }
         else {
             for (int i = 0; i < num_dims; i++){
                 vel_star(i) = 0.0;
+                vel_star_gradient(i) = 0.0;
             }
         } // end if
 
@@ -1880,7 +2019,8 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
                           area_normal(node_lid, 0)*tau_gradient(0, dim)
                         + area_normal(node_lid, 1)*tau_gradient(1, dim)
                         + area_normal(node_lid, 2)*tau_gradient(2, dim)
-                        + phi*muc_gradient(node_lid)*(vel_star(dim) - node_vel(rk_level, node_gid, dim));
+                        + phi*muc_gradient(node_lid)*(vel_star(dim) - node_vel(rk_level, node_gid, dim))
+                        + phi*muc(node_lid)*(vel_star_gradient(dim));
 
             } // end loop over dimension
 
@@ -1894,8 +2034,8 @@ void FEA_Module_SGH::get_force_dgradient_sgh(const DCArrayKokkos <material_t> &m
         size_t mat_id = elem_mat_id(elem_gid);
         
         
-
-    }); // end parallel for loop over elements
+    }
+    //}); // end parallel for loop over elements
     
     /*
     //accumulate node values from corner storage
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/geometry.cpp b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/geometry.cpp
index 1aa309b5b..a741de817 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/geometry.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/geometry.cpp
@@ -666,6 +666,7 @@ void FEA_Module_SGH::get_vol_hex_ugradient(const ViewCArrayKokkos <double> &elem
                 case 4:
                     gradient_result =
                     (x(3)*(( z(0) - z(2))) +
+                    x(5)*((-z(0) + z(2) - z(4) + z(6)))+
                     x(6)*(( z(2) - z(5))) +
                     x(0)*((-z(2) - z(3) + z(4) + z(5))) +
                     x(2)*((z(0) + z(3) - z(5) - z(6))) +
@@ -749,14 +750,14 @@ void FEA_Module_SGH::get_vol_hex_ugradient(const ViewCArrayKokkos <double> &elem
                     x(7)*(y(6) - y(3)) +
                     x(3)*(-y(1)+ y(7) + y(6) - y(0)) +
                     x(5)*(y(1) - y(6)) +
-                    x(6)*(-y(7) + y(5) - y(3)) +
+                    x(6)*(y(1) - y(7) + y(5) - y(3)) +
                     x(0)*(-y(1) + y(3)))*twelth;
                 break;
                 case 11:
                     gradient_result =
                     (x(1)*(y(0) - y(2)) +
                     x(7)*(-y(0) + y(6) + y(2) - y(4)) +
-                    x(6)*(y(7) + y(2)) +
+                    x(6)*(-y(7) + y(2)) +
                     x(0)*(-y(2) + y(7) - y(1) + y(4)) +
                     x(2)*(y(0) + y(1) - y(7) - y(6)) +
                     x(4)*(y(7) - y(0)))*twelth;
@@ -1513,7 +1514,7 @@ void FEA_Module_SGH::get_bmatrix_gradients(const ViewCArrayKokkos <double> &B_ma
     gradient_terms(0,7) = y(0) + y(3) - y(5) - y(6);
     
     //y derivative
-    gradient_terms(1,0) = -x(1) + x(3) - x(4) + x(7);
+    gradient_terms(1,0) = -x(1) + x(3) - x(5) + x(7);
     gradient_terms(1,1) = x(0) - x(5);
     gradient_terms(1,2) = 0;
     gradient_terms(1,3) = -x(0) + x(7);
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/momentum.cpp b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/momentum.cpp
index 06fadd33b..4f4ecd9fe 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/momentum.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/momentum.cpp
@@ -1,5 +1,4 @@
 
-#include "mesh.h"
 #include "state.h"
 #include "FEA_Module_SGH.h"
 
@@ -7,7 +6,6 @@
 // This function evolves the velocity at the nodes of the mesh
 //------------------------------------------------------------------------------
 void FEA_Module_SGH::update_velocity_sgh(double rk_alpha,
-                         const mesh_t &mesh,
                          DViewCArrayKokkos <double> &node_vel,
                          const DViewCArrayKokkos <double> &node_mass,
                          const DViewCArrayKokkos <double> &corner_force
@@ -216,7 +214,6 @@ void FEA_Module_SGH::get_velgrad2D(ViewCArrayKokkos <double> &vel_grad,
 // This subroutine to calculate the velocity divergence in all elements
 //------------------------------------------------------------------------------
 void FEA_Module_SGH::get_divergence(DViewCArrayKokkos <double> &elem_div,
-                    const mesh_t mesh,
                     const DViewCArrayKokkos <double> &node_coords,
                     const DViewCArrayKokkos <double> &node_vel,
                     const DViewCArrayKokkos <double> &elem_vol
@@ -296,7 +293,6 @@ void FEA_Module_SGH::get_divergence(DViewCArrayKokkos <double> &elem_div,
 // This subroutine to calculate the velocity divergence in all elements
 //------------------------------------------------------------------------------
 void FEA_Module_SGH::get_divergence2D(DViewCArrayKokkos <double> &elem_div,
-                      const mesh_t mesh,
                       const DViewCArrayKokkos <double> &node_coords,
                       const DViewCArrayKokkos <double> &node_vel,
                       const DViewCArrayKokkos <double> &elem_vol
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/power_gradients_sgh.cpp b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/power_gradients_sgh.cpp
index b9f2ae18f..394c8582f 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/power_gradients_sgh.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/power_gradients_sgh.cpp
@@ -77,14 +77,12 @@ void FEA_Module_SGH::power_design_gradient_term(const_vec_array design_variables
         // ---- Calculate velocity diveregence for the element ----
         if(num_dim==2){
             get_divergence2D(elem_div,
-                            *mesh,
                             node_coords,
                             node_vel,
                             elem_vol);
         }
         else {
             get_divergence(elem_div,
-                          *mesh,
                           node_coords,
                           node_vel,
                           elem_vol);
@@ -237,6 +235,7 @@ void FEA_Module_SGH::get_power_dgradient_sgh(double rk_alpha,
     FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
 
         double elem_power = 0.0;
+        elem_power_dgradients(elem_gid) = 0;
 
         // --- tally the contribution from each corner to the element ---
 
@@ -260,7 +259,7 @@ void FEA_Module_SGH::get_power_dgradient_sgh(double rk_alpha,
             for (size_t dim=0; dim<num_dims; dim++){
                 
                 double half_vel = (node_vel(rk_level, node_gid, dim) + node_vel(0, node_gid, dim))*0.5;
-                elem_power_dgradients(elem_gid) += corner_vector_storage(corner_gid, dim)*node_radius*half_vel;
+                elem_power_dgradients(elem_gid) -= corner_vector_storage(corner_gid, dim)*node_radius*node_vel(rk_level, node_gid, dim);
                 
             } // end for dim
             
@@ -295,7 +294,8 @@ void FEA_Module_SGH::get_power_ugradient_sgh(double rk_alpha,
     Kokkos::fence();
 
     // loop over all the elements in the mesh
-    FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
+    for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
+    //FOR_ALL_CLASS (elem_gid, 0, rnum_elem, {
 
         double elem_power = 0.0;
 
@@ -322,17 +322,18 @@ void FEA_Module_SGH::get_power_ugradient_sgh(double rk_alpha,
             for (size_t dim=0; dim<num_dims; dim++){
                 for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){
                     for (size_t jdim=0; jdim<num_dims; jdim++){
-                        column_id = Element_Gradient_Matrix_Assembly_Map(elem_gid,node_lid);
+                        column_id = Element_Gradient_Matrix_Assembly_Map(elem_gid,igradient);
                         gradient_node_id = nodes_in_elem(elem_gid,igradient);
                         if(!map->isNodeLocalElement(gradient_node_id)) continue;
-                        Power_Gradient_Velocities(gradient_node_id*num_dims+jdim, column_id) += corner_gradient_storage(corner_gid,dim,igradient,jdim)*node_vel(rk_level, node_gid, dim)*node_radius;
+                        Power_Gradient_Positions(gradient_node_id*num_dims+jdim, column_id) -= corner_gradient_storage(corner_gid,dim,igradient,jdim)*node_vel(rk_level, node_gid, dim)*node_radius;
                     }
                 }
             } // end for dim
             
         } // end for node_lid
 
-    }); // end parallel loop over the elements
+    //}); // end parallel loop over the elements
+    }
     
     return;
 } // end subroutine
@@ -388,15 +389,15 @@ void FEA_Module_SGH::get_power_vgradient_sgh(double rk_alpha,
             // calculate the Power=F dot V for this corner
             for (size_t dim=0; dim<num_dims; dim++){
                 for(int igradient = 0; igradient < num_nodes_in_elem; igradient++){
-                    column_id = Element_Gradient_Matrix_Assembly_Map(elem_gid,node_lid);
+                    column_id = Element_Gradient_Matrix_Assembly_Map(elem_gid,igradient);
                     gradient_node_id = nodes_in_elem(elem_gid,igradient);
                     if(!map->isNodeLocalElement(gradient_node_id)) continue;
                     if(node_lid==igradient){
-                        Power_Gradient_Velocities(gradient_node_id*num_dims+dim, column_id) += corner_gradient_storage(corner_gid,dim,igradient,dim)*node_vel(rk_level, node_gid, dim)*node_radius+
+                        Power_Gradient_Velocities(gradient_node_id*num_dims+dim, column_id) -= corner_gradient_storage(corner_gid,dim,igradient,dim)*node_vel(rk_level, node_gid, dim)*node_radius+
                                                                                                             corner_force(corner_gid, dim)*node_radius;
                     }
                     else{
-                        Power_Gradient_Velocities(gradient_node_id*num_dims+dim, column_id) += corner_gradient_storage(corner_gid,dim,igradient,dim)*node_vel(rk_level, node_gid, dim)*node_radius;
+                        Power_Gradient_Velocities(gradient_node_id*num_dims+dim, column_id) -= corner_gradient_storage(corner_gid,dim,igradient,dim)*node_vel(rk_level, node_gid, dim)*node_radius;
                     }
                 }
             } // end for dim
@@ -456,7 +457,7 @@ void FEA_Module_SGH::get_power_egradient_sgh(double rk_alpha,
             for (size_t dim=0; dim<num_dims; dim++){
                 
                 double half_vel = (node_vel(rk_level, node_gid, dim) + node_vel(0, node_gid, dim))*0.5;
-                Power_Gradient_Energies(elem_gid) += Force_Gradient_Energies(elem_gid, node_lid*num_dims+dim)*node_radius*half_vel;
+                Power_Gradient_Energies(elem_gid) -= Force_Gradient_Energies(elem_gid, node_lid*num_dims+dim)*node_radius*node_vel(rk_level, node_gid, dim);
                 
             } // end for dim
             
diff --git a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/sgh_optimization.cpp b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/sgh_optimization.cpp
index c73b57dc3..960341b18 100644
--- a/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/sgh_optimization.cpp
+++ b/src/Parallel-Solvers/Parallel-Explicit/SGH_Solver/sgh_optimization.cpp
@@ -488,6 +488,14 @@ void FEA_Module_SGH::update_forward_solve(Teuchos::RCP<const MV> zp){
     if(simparam->topology_optimization_on||simparam->shape_optimization_on){
       //assemble_matrix();
     }
+
+    // update host copies of arrays modified in this function
+    elem_den.update_host();
+    elem_mass.update_host();
+    elem_sie.update_host();
+    elem_stress.update_host();
+    elem_pres.update_host();
+    elem_sspd.update_host(); 
     
     //execute solve
     sgh_solve();
@@ -526,6 +534,7 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
   //initialize first adjoint vector at last_time_step to 0 as the terminal value
   (*adjoint_vector_data)[last_time_step+1]->putScalar(0);
   (*phi_adjoint_vector_data)[last_time_step+1]->putScalar(0);
+  (*psi_adjoint_vector_data)[last_time_step+1]->putScalar(0);
 
   //solve terminal value problem, proceeds in time backward. For simplicity, we use the same timestep data from the forward solve.
   //A linear interpolant is assumed between velocity data points; velocity midpoint is used to update the adjoint.
@@ -541,8 +550,8 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
         if(myrank==0)
           printf("cycle = %d, time = %f, time step = %f \n", cycle, time_data[cycle], global_dt);
       }
-          // print time step every 10 cycles
-      else if (cycle%1==0){
+          // print time step every 20 cycles
+      else if (cycle%20==0){
         if(myrank==0)
           printf("cycle = %d, time = %f, time step = %f \n", cycle, time_data[cycle], global_dt);
       } // end if
@@ -588,14 +597,12 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
       // ---- Calculate velocity diveregence for the element ----
       if(num_dim==2){
           get_divergence2D(elem_div,
-                          *mesh,
                           node_coords,
                           node_vel,
                           elem_vol);
       }
       else {
           get_divergence(elem_div,
-                        *mesh,
                         node_coords,
                         node_vel,
                         elem_vol);
@@ -635,6 +642,41 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
                         cycle);
       }
 
+      if(num_dim==2){
+            get_force_sgh2D(material,
+                            *mesh,
+                            node_coords,
+                            node_vel,
+                            elem_den,
+                            elem_sie,
+                            elem_pres,
+                            elem_stress,
+                            elem_sspd,
+                            elem_vol,
+                            elem_div,
+                            elem_mat_id,
+                            corner_force,
+                            1.0,
+                            cycle);
+        }
+        else {
+            get_force_sgh(material,
+                        *mesh,
+                        node_coords,
+                        node_vel,
+                        elem_den,
+                        elem_sie,
+                        elem_pres,
+                        elem_stress,
+                        elem_sspd,
+                        elem_vol,
+                        elem_div,
+                        elem_mat_id,
+                        corner_force,
+                        1.0,
+                        cycle);
+        }
+
       //compute gradient matrices
       get_force_egradient_sgh(material,
                               *mesh,
@@ -720,14 +762,13 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
       vec_array phi_midpoint_adjoint_vector =  phi_adjoint_vector_distributed->getLocalView<device_type> (Tpetra::Access::ReadWrite);
       vec_array psi_midpoint_adjoint_vector =  psi_adjoint_vector_distributed->getLocalView<device_type> (Tpetra::Access::ReadWrite);
       
-      //half step update for RK2 scheme
+      //half step update for RK2 scheme; EQUATION 1
       FOR_ALL_CLASS(node_gid, 0, nlocal_nodes, {
         real_t rate_of_change;
         real_t matrix_contribution;
         size_t dof_id;
         size_t elem_id;
         for (int idim = 0; idim < num_dim; idim++){
-
           //EQUATION 1
           matrix_contribution = 0;
           //compute resulting row of force velocity gradient matrix transpose right multiplied by adjoint vector
@@ -741,12 +782,23 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
             elem_id = elems_in_node(node_gid,ielem);
             matrix_contribution += psi_previous_adjoint_vector(elem_id,0)*Power_Gradient_Velocities(node_gid*num_dim+idim,ielem);
           }
-          
           rate_of_change = previous_velocity_vector(node_gid,idim)- 
                             matrix_contribution/node_mass(node_gid)-
                             phi_previous_adjoint_vector(node_gid,idim)/node_mass(node_gid);
           midpoint_adjoint_vector(node_gid,idim) = -rate_of_change*global_dt/2 + previous_adjoint_vector(node_gid,idim);
 
+        } 
+      }); // end parallel for
+      Kokkos::fence();
+      
+      //half step update for RK2 scheme; EQUATION 2
+      FOR_ALL_CLASS(node_gid, 0, nlocal_nodes, {
+        real_t rate_of_change;
+        real_t matrix_contribution;
+        size_t dof_id;
+        size_t elem_id;
+        for (int idim = 0; idim < num_dim; idim++){
+
           //EQUATION 2
           matrix_contribution = 0;
           //compute resulting row of force displacement gradient matrix transpose right multiplied by adjoint vector
@@ -768,8 +820,10 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
         } 
       }); // end parallel for
       Kokkos::fence();
+      
+      //phi_adjoint_vector_distributed->describe(*fos,Teuchos::VERB_EXTREME);
 
-      //half step update for RK2 scheme
+      //half step update for RK2 scheme; EQUATION 3
       FOR_ALL_CLASS(elem_gid, 0, rnum_elem, {
         real_t rate_of_change;
         real_t matrix_contribution;
@@ -787,9 +841,15 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
         psi_midpoint_adjoint_vector(elem_gid,0) = -rate_of_change*global_dt/2 + psi_previous_adjoint_vector(elem_gid,0);
       }); // end parallel for
       Kokkos::fence();
-
+      
+      //apply BCs to adjoint vector, only matters for the momentum adjoint if using strictly velocity boundary conditions
       boundary_adjoint(*mesh, boundary,midpoint_adjoint_vector, phi_midpoint_adjoint_vector, psi_midpoint_adjoint_vector);
-      comm_adjoint_vectors(cycle);
+      comm_adjoint_vector(cycle);
+      comm_phi_adjoint_vector(cycle);
+      
+      //save for second half of RK
+      (*psi_adjoint_vector_data)[cycle]->assign(*psi_adjoint_vector_distributed);
+
       //swap names to get ghost nodes for the midpoint vectors
       vec_array current_adjoint_vector = adjoint_vector_distributed->getLocalView<device_type> (Tpetra::Access::ReadWrite);
       vec_array phi_current_adjoint_vector = phi_adjoint_vector_distributed->getLocalView<device_type> (Tpetra::Access::ReadWrite);
@@ -798,8 +858,179 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
       phi_midpoint_adjoint_vector =  (*phi_adjoint_vector_data)[cycle]->getLocalView<device_type> (Tpetra::Access::ReadWrite);
       psi_midpoint_adjoint_vector =  (*psi_adjoint_vector_data)[cycle]->getLocalView<device_type> (Tpetra::Access::ReadWrite);
 
+      //compute gradients at midpoint
+      FOR_ALL_CLASS(node_gid, 0, nlocal_nodes+nghost_nodes, {
+        for (int idim = 0; idim < num_dim; idim++){
+          node_vel(rk_level,node_gid,idim) = 0.5*(previous_velocity_vector(node_gid,idim)+current_velocity_vector(node_gid,idim));
+          node_coords(rk_level,node_gid,idim) = 0.5*(previous_coordinate_vector(node_gid,idim)+current_coordinate_vector(node_gid,idim));
+        }
+      });
+      Kokkos::fence();
+
+      FOR_ALL_CLASS(elem_gid, 0, rnum_elem, {
+        elem_sie(rk_level,elem_gid) = 0.5*(previous_element_internal_energy(elem_gid,0)+current_element_internal_energy(elem_gid,0));
+      });
+      Kokkos::fence();
+
+      //set state according to phase data at this timestep
+
+      get_vol();
+
+      // ---- Calculate velocity diveregence for the element ----
+      if(num_dim==2){
+          get_divergence2D(elem_div,
+                          node_coords,
+                          node_vel,
+                          elem_vol);
+      }
+      else {
+          get_divergence(elem_div,
+                        node_coords,
+                        node_vel,
+                        elem_vol);
+      } // end if 2D
+
+      // ---- Calculate elem state (den, pres, sound speed, stress) for next time step ----
+      if(num_dim==2){
+          update_state2D(material,
+                          *mesh,
+                          node_coords,
+                          node_vel,
+                          elem_den,
+                          elem_pres,
+                          elem_stress,
+                          elem_sspd,
+                          elem_sie,
+                          elem_vol,
+                          elem_mass,
+                          elem_mat_id,
+                          1.0,
+                          cycle);
+      }
+      else{
+          update_state(material,
+                        *mesh,
+                        node_coords,
+                        node_vel,
+                        elem_den,
+                        elem_pres,
+                        elem_stress,
+                        elem_sspd,
+                        elem_sie,
+                        elem_vol,
+                        elem_mass,
+                        elem_mat_id,
+                        1.0,
+                        cycle);
+      }
+
+      if(num_dim==2){
+            get_force_sgh2D(material,
+                            *mesh,
+                            node_coords,
+                            node_vel,
+                            elem_den,
+                            elem_sie,
+                            elem_pres,
+                            elem_stress,
+                            elem_sspd,
+                            elem_vol,
+                            elem_div,
+                            elem_mat_id,
+                            corner_force,
+                            1.0,
+                            cycle);
+        }
+        else {
+            get_force_sgh(material,
+                        *mesh,
+                        node_coords,
+                        node_vel,
+                        elem_den,
+                        elem_sie,
+                        elem_pres,
+                        elem_stress,
+                        elem_sspd,
+                        elem_vol,
+                        elem_div,
+                        elem_mat_id,
+                        corner_force,
+                        1.0,
+                        cycle);
+        }
+
+      //compute gradient matrices
+      get_force_egradient_sgh(material,
+                              *mesh,
+                              node_coords,
+                              node_vel,
+                              elem_den,
+                              elem_sie,
+                              elem_pres,
+                              elem_stress,
+                              elem_sspd,
+                              elem_vol,
+                              elem_div,
+                              elem_mat_id,
+                              1.0,
+                              cycle);
+
+      get_power_egradient_sgh(1.0,
+                              *mesh,
+                              node_vel,
+                              node_coords,
+                              elem_sie,
+                              elem_mass,
+                              corner_force);
+
+      get_force_vgradient_sgh(material,
+                              *mesh,
+                              node_coords,
+                              node_vel,
+                              elem_den,
+                              elem_sie,
+                              elem_pres,
+                              elem_stress,
+                              elem_sspd,
+                              elem_vol,
+                              elem_div,
+                              elem_mat_id,
+                              1.0,
+                              cycle);
+      
+      get_power_vgradient_sgh(1.0,
+                              *mesh,
+                              node_vel,
+                              node_coords,
+                              elem_sie,
+                              elem_mass,
+                              corner_force);
+
+      get_force_ugradient_sgh(material,
+                              *mesh,
+                              node_coords,
+                              node_vel,
+                              elem_den,
+                              elem_sie,
+                              elem_pres,
+                              elem_stress,
+                              elem_sspd,
+                              elem_vol,
+                              elem_div,
+                              elem_mat_id,
+                              1.0,
+                              cycle);
+
+      get_power_ugradient_sgh(1.0,
+                              *mesh,
+                              node_vel,
+                              node_coords,
+                              elem_sie,
+                              elem_mass,
+                              corner_force);
+
       
-      //full step update with midpoint gradient for RK2 scheme
+      //full step update with midpoint gradient for RK2 scheme; EQUATION 1
       FOR_ALL_CLASS(node_gid, 0, nlocal_nodes, {
         real_t rate_of_change;
         real_t matrix_contribution;
@@ -825,6 +1056,17 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
                             matrix_contribution/node_mass(node_gid)-
                             phi_midpoint_adjoint_vector(node_gid,idim)/node_mass(node_gid);
           current_adjoint_vector(node_gid,idim) = -rate_of_change*global_dt + previous_adjoint_vector(node_gid,idim);
+        }
+      }); // end parallel for
+      Kokkos::fence();
+
+      //full step update with midpoint gradient for RK2 scheme; EQUATION 2
+      FOR_ALL_CLASS(node_gid, 0, nlocal_nodes, {
+        real_t rate_of_change;
+        real_t matrix_contribution;
+        size_t dof_id;
+        size_t elem_id;
+        for (int idim = 0; idim < num_dim; idim++){
 
           //EQUATION 2
           matrix_contribution = 0;
@@ -846,7 +1088,8 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
         }
       }); // end parallel for
       Kokkos::fence();
-      //half step update for RK2 scheme
+
+      //full step update for RK2 scheme; EQUATION 3
       FOR_ALL_CLASS(elem_gid, 0, rnum_elem, {
         real_t rate_of_change;
         real_t matrix_contribution;
@@ -860,23 +1103,22 @@ void FEA_Module_SGH::compute_topology_optimization_adjoint_full(){
           matrix_contribution += midpoint_adjoint_vector(dof_id/num_dim,dof_id%num_dim)*Force_Gradient_Energies(elem_gid,idof);
         }
         rate_of_change = -(matrix_contribution + psi_midpoint_adjoint_vector(elem_gid,0)*Power_Gradient_Energies(elem_gid))/elem_mass(elem_gid);
-        //rate_of_change = -0.0000001*previous_adjoint_vector(node_gid,idim);
-        psi_current_adjoint_vector(elem_gid,0) = -rate_of_change*global_dt/2 + psi_midpoint_adjoint_vector(elem_gid,0);
+        //debug
+        //std::cout << "PSI RATE OF CHANGE " << rate_of_change << std::endl;
+        psi_current_adjoint_vector(elem_gid,0) = -rate_of_change*global_dt + psi_previous_adjoint_vector(elem_gid,0);
       }); // end parallel for
       Kokkos::fence();
 
-      
-      boundary_adjoint(*mesh, boundary,current_adjoint_vector, phi_current_adjoint_vector, psi_midpoint_adjoint_vector);
+      boundary_adjoint(*mesh, boundary,current_adjoint_vector, phi_current_adjoint_vector, psi_current_adjoint_vector);
+      comm_adjoint_vector(cycle);
+      comm_phi_adjoint_vector(cycle);
+      //save data from time-step completion
+      (*psi_adjoint_vector_data)[cycle]->assign(*psi_adjoint_vector_distributed);
 
     } //end view scope
     
-
-    comm_adjoint_vectors(cycle);
     //phi_adjoint_vector_distributed->describe(*fos,Teuchos::VERB_EXTREME);
     
-      //debug
-      if(cycle==0)
-      std::cout << "REACHED THIS STEP OF ADJOINT" << std::endl;
   }
 }
 
@@ -907,10 +1149,93 @@ void FEA_Module_SGH::compute_topology_optimization_gradient_full(Teuchos::RCP<co
   { //view scope
     vec_array design_gradients = design_gradients_distributed->getLocalView<device_type> (Tpetra::Access::ReadWrite);
     const_vec_array design_densities = design_densities_distributed->getLocalView<device_type> (Tpetra::Access::ReadOnly);
-    //compute design gradients
+    //initialize design gradients
+    FOR_ALL_CLASS(node_id, 0, nlocal_nodes, {
+      design_gradients(node_id,0) = 0;
+    }); // end parallel for
+    Kokkos::fence();
+
+    //gradient contribution from kinetic energy v(dM/drho)v product.
+    if(simparam->dynamic_options.output_time_sequence_level==TIME_OUTPUT_LEVEL::extreme){
+        if(myrank==0){
+          std::cout << "v*dM/drho*v term" << std::endl;
+        }
+    }
+
+    for (unsigned long cycle = 0; cycle < last_time_step+1; cycle++) {
+      //compute timestep from time data
+      global_dt = time_data[cycle+1] - time_data[cycle];
+      
+      //print
+      if(simparam->dynamic_options.output_time_sequence_level==TIME_OUTPUT_LEVEL::extreme){
+
+        if (cycle==0){
+          if(myrank==0)
+            printf("cycle = %lu, time = %f, time step = %f \n", cycle, time_data[cycle], global_dt);
+        }
+            // print time step every 10 cycles
+        else if (cycle%20==0){
+          if(myrank==0)
+            printf("cycle = %lu, time = %f, time step = %f \n", cycle, time_data[cycle], global_dt);
+        } // end if
+      }
+        //view scope
+        {
+          const_vec_array current_velocity_vector = (*forward_solve_velocity_data)[cycle]->getLocalView<device_type> (Tpetra::Access::ReadOnly);
+          const_vec_array next_velocity_vector = (*forward_solve_velocity_data)[cycle+1]->getLocalView<device_type> (Tpetra::Access::ReadOnly);
+          
+          FOR_ALL_CLASS(elem_id, 0, rnum_elem, {
+            size_t node_id;
+            size_t corner_id;
+            real_t inner_product;
+            //std::cout << elem_mass(elem_id) <<std::endl;
+            //current_nodal_velocities
+            for (int inode = 0; inode < num_nodes_in_elem; inode++){
+              node_id = nodes_in_elem(elem_id, inode);
+              //midpoint rule for integration being used; add velocities and divide by 2
+              current_element_velocities(inode,0) = (current_velocity_vector(node_id,0) + next_velocity_vector(node_id,0))/2;
+              current_element_velocities(inode,1) = (current_velocity_vector(node_id,1) + next_velocity_vector(node_id,1))/2;
+              if(num_dim==3)
+              current_element_velocities(inode,2) = (current_velocity_vector(node_id,2) + next_velocity_vector(node_id,2))/2;
+            }
+
+            inner_product = 0;
+            for(int ifill=0; ifill < num_nodes_in_elem; ifill++){
+              node_id = nodes_in_elem(elem_id, ifill);
+              for(int idim=0; idim < num_dim; idim++){
+                inner_product += elem_mass(elem_id)*current_element_velocities(ifill,idim)*current_element_velocities(ifill,idim);
+              }
+            }
+
+            for (int inode = 0; inode < num_nodes_in_elem; inode++){
+              //compute gradient of local element contribution to v^t*M*v product
+              corner_id = elem_id*num_nodes_in_elem + inode;
+              //division by design ratio recovers nominal element mass used in the gradient operator
+              corner_value_storage(corner_id) = inner_product*global_dt/relative_element_densities(elem_id);
+            }
+            
+          }); // end parallel for
+          Kokkos::fence();
+          
+          //accumulate node values from corner storage
+          //multiply
+          FOR_ALL_CLASS(node_id, 0, nlocal_nodes, {
+            size_t corner_id;
+            for(int icorner=0; icorner < num_corners_in_node(node_id); icorner++){
+              corner_id = corners_in_node(node_id,icorner);
+              design_gradients(node_id,0) += corner_value_storage(corner_id);
+            }
+          }); // end parallel for
+          Kokkos::fence();
+
+        } //end view scope
+
+    }
+  
+
+    //multiply by Hex8 constants (the diagonlization here only works for Hex8 anyway)
     FOR_ALL_CLASS(node_id, 0, nlocal_nodes, {
       design_gradients(node_id,0) *=0.5/(double)num_nodes_in_elem/(double)num_nodes_in_elem;
-      //design_gradients(node_id,0) =0.00001;
     }); // end parallel for
     Kokkos::fence();
 
@@ -1035,7 +1360,6 @@ void FEA_Module_SGH::compute_topology_optimization_gradient_full(Teuchos::RCP<co
             size_t node_id;
             size_t corner_id;
             real_t inner_product;
-            //std::cout << elem_mass(elem_id) <<std::endl;
 
             psi_dot_current = (next_psi_adjoint_vector(elem_id,0) - current_psi_adjoint_vector(elem_id,0))/global_dt;
             inner_product = elem_mass(elem_id)*(psi_dot_current+psi_dot_current)*current_element_internal_energy(elem_id,0)/2;
@@ -1121,6 +1445,7 @@ void FEA_Module_SGH::compute_topology_optimization_gradient_full(Teuchos::RCP<co
       const_vec_array current_element_internal_energy = (*forward_solve_internal_energy_data)[0]->getLocalView<device_type> (Tpetra::Access::ReadOnly);
       const_vec_array current_psi_adjoint_vector = (*psi_adjoint_vector_data)[0]->getLocalView<device_type> (Tpetra::Access::ReadOnly);
       
+      //(*psi_adjoint_vector_data)[100]->describe(*fos,Teuchos::VERB_EXTREME);
       FOR_ALL_CLASS(elem_id, 0, rnum_elem, {
         real_t lambda_dot;
         size_t node_id;
@@ -1267,8 +1592,8 @@ void FEA_Module_SGH::init_assembly(){
   current_row_nodes_scanned = CArrayKokkos<size_t, array_layout, device_type, memory_traits>(max_stride, "current_row_nodes_scanned");
 
   //allocate sparse graph with node repeats
-  RaggedRightArrayKokkos<size_t, array_layout, device_type, memory_traits> Repeat_Graph_Matrix(Graph_Matrix_Strides_initial);
-  RaggedRightArrayofVectorsKokkos<size_t, array_layout, device_type, memory_traits> Element_local_indices(Graph_Matrix_Strides_initial,num_dim);
+  RaggedRightArrayKokkos<LO, array_layout, device_type, memory_traits> Repeat_Graph_Matrix(Graph_Matrix_Strides_initial);
+  RaggedRightArrayofVectorsKokkos<LO, array_layout, device_type, memory_traits> Element_local_indices(Graph_Matrix_Strides_initial,3);
   
   //Fill the initial Graph with repeats
   if(num_dim == 2){
@@ -1395,7 +1720,7 @@ void FEA_Module_SGH::init_assembly(){
     
   Graph_Matrix_Strides.update_host();
   //copy reduced content to non_repeat storage
-  Graph_Matrix = RaggedRightArrayKokkos<GO, array_layout, device_type, memory_traits>(Graph_Matrix_Strides);
+  Graph_Matrix = RaggedRightArrayKokkos<LO, array_layout, device_type, memory_traits>(Graph_Matrix_Strides);
 
   FOR_ALL_CLASS(inode, 0, nlocal_nodes, {
     for(int istride = 0; istride < Graph_Matrix_Strides(inode); istride++){
@@ -1415,8 +1740,26 @@ void FEA_Module_SGH::init_assembly(){
   }); // end parallel for
 
   Gradient_Matrix_Strides.update_host();
+
+  //build inverse map for element gradient assembly
+  for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
+      FOR_ALL_CLASS(node_lid, 0, num_nodes_in_elem, {
+          
+          // get the global_id of the node
+          size_t node_gid = nodes_in_elem(elem_gid, node_lid);
+          
+          // the column index is the num corners saved
+          size_t j = count_saved_corners_in_node(node_gid);
+          Element_Gradient_Matrix_Assembly_Map(elem_gid, node_lid) = j;
+
+          // increment the number of corners saved to this node_gid
+          count_saved_corners_in_node(node_gid)++;
+
+      });  // end FOR_ALL over nodes in element
+      Kokkos::fence();  
+  } // end for elem_gid
   
-  DOF_Graph_Matrix = RaggedRightArrayKokkos<GO, array_layout, device_type, memory_traits>(Gradient_Matrix_Strides);
+  DOF_Graph_Matrix = RaggedRightArrayKokkos<LO, array_layout, device_type, memory_traits>(Gradient_Matrix_Strides);
   Force_Gradient_Positions = RaggedRightArrayKokkos<real_t, Kokkos::LayoutRight, device_type, memory_traits, array_layout>(Gradient_Matrix_Strides);
   Force_Gradient_Velocities = RaggedRightArrayKokkos<real_t, Kokkos::LayoutRight, device_type, memory_traits, array_layout>(Gradient_Matrix_Strides);
   //needs different graph of node to elem rather than node to node
@@ -1425,7 +1768,7 @@ void FEA_Module_SGH::init_assembly(){
   DOF_to_Elem_Matrix_Strides = DCArrayKokkos<size_t, array_layout, device_type, memory_traits>(nlocal_nodes*num_dim);
   FOR_ALL_CLASS(inode, 0, nlocal_nodes, {
     for(int idim = 0; idim < num_dim; idim++){
-      DOF_to_Elem_Matrix_Strides(inode*num_dim + idim) = node_to_elem_strides(inode);
+      DOF_to_Elem_Matrix_Strides(inode*num_dim + idim) = count_saved_corners_in_node(inode);
     }
   }); // end parallel for
   DOF_to_Elem_Matrix_Strides.update_host();
@@ -1443,25 +1786,6 @@ void FEA_Module_SGH::init_assembly(){
     }
   }); // end parallel for
 
-  
-  //build inverse map for element gradient assembly
-  for (size_t elem_gid = 0; elem_gid < rnum_elem; elem_gid++){
-      FOR_ALL_CLASS(node_lid, 0, num_nodes_in_elem, {
-          
-          // get the global_id of the node
-          size_t node_gid = nodes_in_elem(elem_gid, node_lid);
-          
-          // the column index is the num corners saved
-          size_t j = count_saved_corners_in_node(node_gid);
-          
-          Element_Gradient_Matrix_Assembly_Map(elem_gid, node_lid) = j;
-
-          // increment the number of corners saved to this node_gid
-          count_saved_corners_in_node(node_gid)++;
-
-      });  // end FOR_ALL over nodes in element
-  } // end for elem_gid
-  Kokkos::fence();  
 
   /*
   //construct distributed gradient matrix from local kokkos data
@@ -1545,8 +1869,10 @@ void FEA_Module_SGH::boundary_adjoint(const mesh_t &mesh,
                 size_t bdy_node_gid = bdy_nodes_in_set(bdy_set, bdy_node_lid);
                     
                 // Set velocity to zero in that directdion
-                if(bdy_node_gid<nlocal_nodes)
+                if(bdy_node_gid<nlocal_nodes){
                   node_adjoint(bdy_node_gid, direction) = 0.0;
+                  node_phi_adjoint(bdy_node_gid, direction) = 0.0;
+                }
                 //node_phi_adjoint(bdy_node_gid, direction) = 0.0;
                         
             }
@@ -1559,8 +1885,10 @@ void FEA_Module_SGH::boundary_adjoint(const mesh_t &mesh,
 
                 for(size_t dim=0; dim < num_dims; dim++){
                     // Set velocity to zero
-                  if(bdy_node_gid<nlocal_nodes)
+                  if(bdy_node_gid<nlocal_nodes){
                     node_adjoint(bdy_node_gid, dim) = 0.0;
+                    node_phi_adjoint(bdy_node_gid, dim) = 0.0;
+                  }
                   //node_phi_adjoint(bdy_node_gid, dim) = 0.0;
                 }
                 
@@ -1582,30 +1910,16 @@ void FEA_Module_SGH::boundary_adjoint(const mesh_t &mesh,
    Communicate updated nodal adjoint vectors to ghost nodes
 ------------------------------------------------------------------------- */
 
-void FEA_Module_SGH::comm_adjoint_vectors(int cycle){
-  
-  //debug print of design vector
-      //std::ostream &out = std::cout;
-      //Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
-      //if(myrank==0)
-      //*fos << "Density data :" << std::endl;
-      //node_densities_distributed->describe(*fos,Teuchos::VERB_EXTREME);
-      //*fos << std::endl;
-      //std::fflush(stdout);
-
-  //communicate design densities
-  //create import object using local node indices map and all indices map
-  //Tpetra::Import<LO, GO> importer(map, all_node_map);
-  
+void FEA_Module_SGH::comm_adjoint_vector(int cycle){
   //comms to get ghosts
   (*adjoint_vector_data)[cycle]->doImport(*adjoint_vector_distributed, *importer, Tpetra::INSERT);
+}
+
+/* ----------------------------------------------------------------------
+   Communicate updated nodal adjoint vectors to ghost nodes
+------------------------------------------------------------------------- */
+
+void FEA_Module_SGH::comm_phi_adjoint_vector(int cycle){
+  //comms to get ghosts
   (*phi_adjoint_vector_data)[cycle]->doImport(*phi_adjoint_vector_distributed, *importer, Tpetra::INSERT);
-  //all_node_map->describe(*fos,Teuchos::VERB_EXTREME);
-  //all_node_velocities_distributed->describe(*fos,Teuchos::VERB_EXTREME);
-  
-  //update_count++;
-  //if(update_count==1){
-      //MPI_Barrier(world);
-      //MPI_Abort(world,4);
-  //}
 }
diff --git a/src/Parallel-Solvers/Parallel-Explicit/Topology_Optimization/Area_Normals.h b/src/Parallel-Solvers/Parallel-Explicit/Topology_Optimization/Area_Normals.h
new file mode 100644
index 000000000..f4ad28896
--- /dev/null
+++ b/src/Parallel-Solvers/Parallel-Explicit/Topology_Optimization/Area_Normals.h
@@ -0,0 +1,242 @@
+/**********************************************************************************************
+ © 2020. Triad National Security, LLC. All rights reserved.
+ This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos
+ National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
+ Department of Energy/National Nuclear Security Administration. All rights in the program are
+ reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear
+ Security Administration. The Government is granted for itself and others acting on its behalf a
+ nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare
+ derivative works, distribute copies to the public, perform publicly and display publicly, and
+ to permit others to do so.
+ This program is open source under the BSD-3 License.
+ Redistribution and use in source and binary forms, with or without modification, are permitted
+ provided that the following conditions are met:
+ 
+ 1.  Redistributions of source code must retain the above copyright notice, this list of
+ conditions and the following disclaimer.
+ 
+ 2.  Redistributions in binary form must reproduce the above copyright notice, this list of
+ conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ 
+ 3.  Neither the name of the copyright holder nor the names of its contributors may be used
+ to endorse or promote products derived from this software without specific prior
+ written permission.
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+ IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ **********************************************************************************************/
+ 
+#ifndef AREA_NORMALS_SHAPEOPT_H
+#define AREA_NORMALS_SHAPEOPT_H
+
+#include "matar.h"
+#include "elements.h"
+#include <string>
+#include <Teuchos_ScalarTraits.hpp>
+#include <Teuchos_RCP.hpp>
+#include <Teuchos_oblackholestream.hpp>
+#include <Teuchos_Tuple.hpp>
+#include <Teuchos_VerboseObject.hpp>
+
+#include <Tpetra_Core.hpp>
+#include <Tpetra_Map.hpp>
+#include <Tpetra_MultiVector.hpp>
+#include "Tpetra_Details_makeColMap.hpp"
+#include "Tpetra_Details_DefaultTypes.hpp"
+
+#include "ROL_Types.hpp"
+#include <ROL_TpetraMultiVector.hpp>
+#include "ROL_Objective.hpp"
+#include "ROL_Elementwise_Reduce.hpp"
+#include "FEA_Module_SGH.h"
+#include "FEA_Module_Dynamic_Elasticity.h"
+#include "Explicit_Solver.h"
+
+class AreaNormals : public ROL::Objective<real_t> {
+  
+  typedef Tpetra::Map<>::local_ordinal_type LO;
+  typedef Tpetra::Map<>::global_ordinal_type GO;
+  typedef Tpetra::Map<>::node_type Node;
+  typedef Tpetra::Map<LO, GO, Node> Map;
+  typedef Tpetra::MultiVector<real_t, LO, GO, Node> MV;
+  typedef ROL::Vector<real_t> V;
+  typedef ROL::TpetraMultiVector<real_t,LO,GO,Node> ROL_MV;
+  
+  using traits = Kokkos::ViewTraits<LO*, Kokkos::LayoutLeft, void, void>;
+  using array_layout    = typename traits::array_layout;
+  using execution_space = typename traits::execution_space;
+  using device_type     = typename traits::device_type;
+  using memory_traits   = typename traits::memory_traits;
+  using global_size_t = Tpetra::global_size_t;
+
+  typedef Kokkos::View<real_t*, Kokkos::LayoutRight, device_type, memory_traits> values_array;
+  typedef Kokkos::View<GO*, array_layout, device_type, memory_traits> global_indices_array;
+  typedef Kokkos::View<LO*, array_layout, device_type, memory_traits> indices_array;
+  
+  //typedef Kokkos::DualView<real_t**, Kokkos::LayoutLeft, device_type>::t_dev vec_array;
+  typedef MV::dual_view_type::t_dev vec_array;
+  typedef MV::dual_view_type::t_host host_vec_array;
+  typedef Kokkos::View<const real_t**, array_layout, HostSpace, memory_traits> const_host_vec_array;
+  typedef Kokkos::View<const real_t**, array_layout, device_type, memory_traits> const_vec_array;
+  typedef MV::dual_view_type dual_vec_array;
+
+private:
+
+  Explicit_Solver *Explicit_Solver_Pointer_;
+  FEA_Module_SGH *FEM_SGH_;
+  FEA_Module_Dynamic_Elasticity *FEM_Dynamic_Elasticity_;
+  ROL::Ptr<ROL_MV> ROL_Force;
+  ROL::Ptr<ROL_MV> ROL_Velocities;
+  ROL::Ptr<ROL_MV> ROL_Gradients;
+
+  ROL::Ptr<const MV> getVector( const V& x ) {
+    return dynamic_cast<const ROL_MV&>(x).getVector();
+  }
+
+  ROL::Ptr<MV> getVector( V& x ) {
+    return dynamic_cast<ROL_MV&>(x).getVector();
+  }
+
+public:
+  bool nodal_density_flag_;
+  int last_comm_step, last_solve_step, current_step;
+  size_t nvalid_modules;
+  std::vector<FEA_MODULE_TYPE> valid_fea_modules; //modules that may interface with this objective function
+  FEA_MODULE_TYPE set_module_type;
+  int set_node_, set_dim_;
+  //std::string my_fea_module = "SGH";
+
+  AreaNormals(Explicit_Solver *Explicit_Solver_Pointer, int set_node, int set_dim, bool nodal_density_flag) 
+   {
+      Explicit_Solver_Pointer_ = Explicit_Solver_Pointer;
+      
+      valid_fea_modules.push_back(FEA_MODULE_TYPE::SGH);
+      valid_fea_modules.push_back(FEA_MODULE_TYPE::Dynamic_Elasticity);
+      nvalid_modules = valid_fea_modules.size();
+
+      const Simulation_Parameters& simparam = Explicit_Solver_Pointer_->simparam;
+      for (const auto& fea_module : Explicit_Solver_Pointer_->fea_modules) {
+        for(int ivalid = 0; ivalid < nvalid_modules; ivalid++){
+          if(fea_module->Module_Type==FEA_MODULE_TYPE::SGH){
+            FEM_SGH_ = dynamic_cast<FEA_Module_SGH*>(fea_module);
+            set_module_type = FEA_MODULE_TYPE::SGH;
+          }
+          if(fea_module->Module_Type==FEA_MODULE_TYPE::Dynamic_Elasticity){
+            FEM_Dynamic_Elasticity_ = dynamic_cast<FEA_Module_Dynamic_Elasticity*>(fea_module);
+            set_module_type = FEA_MODULE_TYPE::Dynamic_Elasticity;
+          }
+        }
+      }
+      nodal_density_flag_ = nodal_density_flag;
+      last_comm_step = last_solve_step = -1;
+      current_step = 0;
+      set_node_ = set_node;
+      set_dim_ = set_dim;
+
+      //real_t current_kinetic_energy = ROL_Velocities->dot(*ROL_Force)/2;
+      //std::cout.precision(10);
+      //if(FEM_->myrank==0)
+      //std::cout << "INITIAL KINETIC ENERGY " << current_kinetic_energy << std::endl;
+  }
+
+  void update(const ROL::Vector<real_t> &z, ROL::UpdateType type, int iter = -1 ) {
+  }
+
+  real_t value(const ROL::Vector<real_t> &z, real_t &tol) {
+    
+    double area_normal_array[24];
+    size_t nodes_in_elem_array[8];
+    double node_coords_array[2*8*3];
+    ViewCArrayKokkos <size_t> elem_node_gids(nodes_in_elem_array, 8);
+    for(int init = 0; init < 8; init++){
+      elem_node_gids(init) = init;
+    }
+    ViewCArrayKokkos <double> area_normal(area_normal_array, 8, 3);
+    const size_t rk_level = FEM_SGH_->simparam->dynamic_options.rk_num_bins - 1;
+    //std::cout << "Started obj value on task " <<FEM_->myrank  << std::endl;
+    ROL::Ptr<const MV> zp = getVector(z);
+
+    const_host_vec_array design_coordinates = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    DViewCArrayKokkos<double> node_coords(node_coords_array,2,8,3);
+    for(int inode = 0; inode < 8; inode++){
+      size_t local_node_id = FEM_SGH_->nodes_in_elem(0,inode);
+      node_coords(1,inode,0) = design_coordinates(local_node_id,0);
+      node_coords(1,inode,1) = design_coordinates(local_node_id,1);
+      node_coords(1,inode,2) = design_coordinates(local_node_id,2);
+    }
+
+    // std::cout.precision(10);
+    // if(Explicit_Solver_Pointer_->myrank==0)
+    // std::cout << "CURRENT TIME INTEGRAL OF KINETIC ENERGY " << objective_accumulation << std::endl;
+    FEM_SGH_->get_bmatrix(area_normal,
+                      0,
+                      node_coords,
+                      elem_node_gids,
+                      1);
+
+    real_t objective_value = -area_normal(set_node_, set_dim_);
+    //std::cout << "Ended obj value on task " <<FEM_->myrank  << std::endl;
+    return objective_value;
+  }
+
+  //void gradient_1( ROL::Vector<real_t> &g, const ROL::Vector<real_t> &u, const ROL::Vector<real_t> &z, real_t &tol ) {
+    //g.zero();
+  //}
+  
+  void gradient( ROL::Vector<real_t> &g, const ROL::Vector<real_t> &z, real_t &tol ) {
+    //std::cout << "Started obj gradient on task " <<FEM_->myrank  << std::endl;
+    //get Tpetra multivector pointer from the ROL vector
+    double area_normal_gradients_array[8*8*3*3];
+    size_t nodes_in_elem[8];
+    double node_coords_array[2*8*3];
+    ViewCArrayKokkos <size_t> elem_node_gids(nodes_in_elem, 8);
+    for(int init = 0; init < 8; init++){
+      elem_node_gids(init) = init;
+    }
+    
+    ViewCArrayKokkos <double> area_normal_gradients(area_normal_gradients_array, 8, 3, 8, 3);
+    const size_t rk_level = FEM_SGH_->simparam->dynamic_options.rk_num_bins - 1;
+    ROL::Ptr<const MV> zp = getVector(z);
+    ROL::Ptr<MV> gp = getVector(g);
+    const_host_vec_array design_coordinates = zp->getLocalView<HostSpace> (Tpetra::Access::ReadOnly);
+    host_vec_array design_gradients = gp->getLocalView<HostSpace> (Tpetra::Access::ReadWrite);
+    DViewCArrayKokkos<double> node_coords(node_coords_array,2,8,3);
+    for(int inode = 0; inode < 8; inode++){
+      size_t local_node_id = FEM_SGH_->nodes_in_elem(0,inode);
+      node_coords(1,inode,0) = design_coordinates(local_node_id,0);
+      node_coords(1,inode,1) = design_coordinates(local_node_id,1);
+      node_coords(1,inode,2) = design_coordinates(local_node_id,2);
+    }
+
+    for (int igradient = 0; igradient < FEM_SGH_->nlocal_nodes+FEM_SGH_->nghost_nodes; igradient++){
+      design_gradients(igradient,0) = 0;
+      design_gradients(igradient,1) = 0;
+      design_gradients(igradient,2) = 0;
+    }
+
+
+    FEM_SGH_->get_bmatrix_gradients(area_normal_gradients,
+                0,
+                node_coords,
+                elem_node_gids,
+                1);
+
+    for(int inode = 0; inode < 8; inode++){
+      size_t local_node_id = FEM_SGH_->nodes_in_elem(0,inode);
+      for(int idim = 0; idim < 3; idim++)
+        design_gradients(local_node_id,idim) = -area_normal_gradients(set_node_,set_dim_,inode,idim);
+    }
+  }
+  
+};
+
+#endif // end header guard
diff --git a/src/Parallel-Solvers/Simulation_Parameters/FEA_Module/Elasticity_Parameters.h b/src/Parallel-Solvers/Simulation_Parameters/FEA_Module/Elasticity_Parameters.h
index d76d1fc9c..4b9d84105 100644
--- a/src/Parallel-Solvers/Simulation_Parameters/FEA_Module/Elasticity_Parameters.h
+++ b/src/Parallel-Solvers/Simulation_Parameters/FEA_Module/Elasticity_Parameters.h
@@ -8,6 +8,7 @@ struct Elasticity_Parameters
     : virtual ImplicitModule, FEA_Module_Parameters::Register<Elasticity_Parameters, FEA_MODULE_TYPE::Elasticity> {
     bool strain_max_flag    = false;
     bool modal_analysis     = false;
+    bool anisotropic_lattice = false;
 
     Elasticity_Parameters() : FEA_Module_Parameters({
         FIELD::displacement,
@@ -16,5 +17,5 @@ struct Elasticity_Parameters
     }) { }
 };
 IMPL_YAML_SERIALIZABLE_WITH_BASE(Elasticity_Parameters, ImplicitModule,
-    strain_max_flag, modal_analysis
+    strain_max_flag, modal_analysis, anisotropic_lattice
 )
diff --git a/src/Parallel-Solvers/Simulation_Parameters/Material.h b/src/Parallel-Solvers/Simulation_Parameters/Material.h
index 0fe57cddf..3175f6ab7 100644
--- a/src/Parallel-Solvers/Simulation_Parameters/Material.h
+++ b/src/Parallel-Solvers/Simulation_Parameters/Material.h
@@ -14,7 +14,10 @@ struct material_t {
     RUN_LOCATION eos_run_location      = RUN_LOCATION::device; 
 
     double elastic_modulus  = 200000000000;
+    double elastic_moduli[3];
     double poisson_ratio    = 0.3;
+    double poisson_ratios[3];
+    double shear_moduli[3];
     double density          = 7850;
     double initial_temperature  = 293;
     double thermal_conductivity = 10;
@@ -25,6 +28,8 @@ struct material_t {
     bool linear_cell_modulus = false;
     double modulus_density_slope = 200000000000;
     double modulus_initial = 0;
+    double shear_modulus_density_slope = 200000000000;
+    double shear_modulus_initial = 0;
 
     double q1 = 1.0;
     double q2 = 0.0;
@@ -57,5 +62,6 @@ IMPL_YAML_SERIALIZABLE_FOR(Material,
     elastic_modulus, poisson_ratio,
     density, initial_temperature, thermal_conductivity,
     specific_internal_energy_rate, expansion_coefficients,
-    SIMP_modulus, linear_cell_modulus, modulus_density_slope, modulus_initial
+    SIMP_modulus, linear_cell_modulus, modulus_density_slope, modulus_initial,
+    elastic_moduli, poisson_ratios, shear_moduli
 )
diff --git a/src/Parallel-Solvers/Simulation_Parameters/Optimization_Options.h b/src/Parallel-Solvers/Simulation_Parameters/Optimization_Options.h
index 7349209da..a033efcbe 100644
--- a/src/Parallel-Solvers/Simulation_Parameters/Optimization_Options.h
+++ b/src/Parallel-Solvers/Simulation_Parameters/Optimization_Options.h
@@ -18,22 +18,30 @@ SERIALIZABLE_ENUM(TO_MODULE_TYPE,
   Strain_Energy_Minimize,
   Mass_Constraint,
   Moment_of_Inertia_Constraint,
+  Center_of_Mass_Constraint,
   Heat_Capacity_Potential_Constraint,
   MULTI_OBJECTIVE_TERM,
   Thermo_Elastic_Strain_Energy_Minimize,
-  Strain_Energy_Constraint
+  Strain_Energy_Constraint,
+  Displacement_Constraint
 )
 
 
 SERIALIZABLE_ENUM(OPTIMIZATION_PROCESS, none, topology_optimization, shape_optimization)
 SERIALIZABLE_ENUM(OPTIMIZATION_OBJECTIVE, none, minimize_kinetic_energy, multi_objective, minimize_compliance, minimize_thermal_resistance)
-SERIALIZABLE_ENUM(CONSTRAINT_TYPE, mass, moment_of_inertia)
+SERIALIZABLE_ENUM(CONSTRAINT_TYPE, mass, moment_of_inertia, center_of_mass, displacement)
 SERIALIZABLE_ENUM(RELATION, equality)
 SERIALIZABLE_ENUM(DENSITY_FILTER, none, helmholtz_filter)
 SERIALIZABLE_ENUM(MULTI_OBJECTIVE_STRUCTURE, linear)
-SERIALIZABLE_ENUM(CONSTRAINT_COMPONENT, xx, yy, zz, xy, xz, yz)
+SERIALIZABLE_ENUM(CONSTRAINT_COMPONENT, x, y, z, xx, yy, zz, xy, xz, yz)
 inline int component_to_int(CONSTRAINT_COMPONENT c) {
   switch(c) {
+    case CONSTRAINT_COMPONENT::x:
+      return 0;
+    case CONSTRAINT_COMPONENT::y:
+      return 1;
+    case CONSTRAINT_COMPONENT::z:
+      return 2;
     case CONSTRAINT_COMPONENT::xx:
       return 0;
     case CONSTRAINT_COMPONENT::yy:
@@ -56,6 +64,7 @@ struct Optimization_Constraint {
   double value         = 0;
   RELATION relation    = RELATION::equality;
   std::optional<CONSTRAINT_COMPONENT> component;
+  std::string argument_file_name;
 
   // Non-serialized fields
   std::vector<std::optional<double>> inertia_centers;
@@ -65,11 +74,15 @@ struct Optimization_Constraint {
       if (!component.has_value())
         throw Yaml::ConfigurationException("`component` field required for constraint type " + to_string(type));
     }
+    if (type == CONSTRAINT_TYPE::center_of_mass) {
+      if (!component.has_value())
+        throw Yaml::ConfigurationException("`component` field required for constraint type " + to_string(type));
+    }
   }
 };
 YAML_ADD_REQUIRED_FIELDS_FOR(Optimization_Constraint, value, type, relation)
 IMPL_YAML_SERIALIZABLE_FOR(Optimization_Constraint, 
-  value, type, relation, component
+  value, type, relation, component, argument_file_name
 )
 
 struct MultiObjectiveModule {
diff --git a/src/Parallel-Solvers/Simulation_Parameters/Output_Options.h b/src/Parallel-Solvers/Simulation_Parameters/Output_Options.h
index 8c325c0f9..bb444f18a 100644
--- a/src/Parallel-Solvers/Simulation_Parameters/Output_Options.h
+++ b/src/Parallel-Solvers/Simulation_Parameters/Output_Options.h
@@ -14,9 +14,12 @@ struct Output_Options : Yaml::DerivedFields, Yaml::ValidatedYaml {
   size_t max_num_user_output_vars=0;
   bool write_initial = true;
   bool write_final   = true;
+  bool convert_to_vtk = false;
+  bool convert_to_tecplot = false;
 };
 
 IMPL_YAML_SERIALIZABLE_FOR(Output_Options, 
   timer_output_level, output_fields, include_default_output_fields,
-  output_file_format, write_initial, write_final, max_num_user_output_vars
+  output_file_format, write_initial, write_final, max_num_user_output_vars,
+  convert_to_vtk, convert_to_tecplot
 )
diff --git a/src/Parallel-Solvers/Simulation_Parameters/Simulation_Parameters.h b/src/Parallel-Solvers/Simulation_Parameters/Simulation_Parameters.h
index 4c67a888e..fc7b5f6b3 100644
--- a/src/Parallel-Solvers/Simulation_Parameters/Simulation_Parameters.h
+++ b/src/Parallel-Solvers/Simulation_Parameters/Simulation_Parameters.h
@@ -146,6 +146,20 @@ struct Simulation_Parameters
                     {constraint.value, (double)component_to_int(constraint.component.value())}
                 );
                 break;
+            case CONSTRAINT_TYPE::center_of_mass:
+                add_TO_module(
+                    TO_MODULE_TYPE::Center_of_Mass_Constraint, 
+                    f_type, 
+                    {constraint.value, (double)component_to_int(constraint.component.value())}
+                );
+                break;
+            case CONSTRAINT_TYPE::displacement:
+                add_TO_module(
+                    TO_MODULE_TYPE::Displacement_Constraint, 
+                    f_type, 
+                    {constraint.value}
+                );
+                break;
             default:
                 throw Yaml::ConfigurationException("Unsupported constraint type " + to_string(constraint.type));
             }
@@ -276,8 +290,10 @@ struct Simulation_Parameters
             {TO_MODULE_TYPE::Heat_Capacity_Potential_Constraint,    {FEA_MODULE_TYPE::Heat_Conduction}},
             {TO_MODULE_TYPE::Thermo_Elastic_Strain_Energy_Minimize, {FEA_MODULE_TYPE::Heat_Conduction}},
             {TO_MODULE_TYPE::Mass_Constraint,                       {FEA_MODULE_TYPE::Inertial       }},
+            {TO_MODULE_TYPE::Center_of_Mass_Constraint,             {FEA_MODULE_TYPE::Inertial       }},
             {TO_MODULE_TYPE::Moment_of_Inertia_Constraint,          {FEA_MODULE_TYPE::Inertial       }},
             {TO_MODULE_TYPE::Strain_Energy_Minimize,                {FEA_MODULE_TYPE::Elasticity     }},
+            {TO_MODULE_TYPE::Displacement_Constraint,               {FEA_MODULE_TYPE::Elasticity     }},
             {TO_MODULE_TYPE::Strain_Energy_Constraint,              {FEA_MODULE_TYPE::Elasticity     }},
             {TO_MODULE_TYPE::Kinetic_Energy_Minimize,               {FEA_MODULE_TYPE::SGH, FEA_MODULE_TYPE::Dynamic_Elasticity}},
         };
diff --git a/src/Parallel-Solvers/Solver.cpp b/src/Parallel-Solvers/Solver.cpp
index 31bd2ad61..4631e22f8 100644
--- a/src/Parallel-Solvers/Solver.cpp
+++ b/src/Parallel-Solvers/Solver.cpp
@@ -249,6 +249,9 @@ void Solver::read_mesh_ensight(const char *MESH){
   size_t read_index_start, node_rid, elem_gid;
   GO node_gid;
   real_t dof_value;
+  bool zero_index_base = input_options.zero_index_base;
+  int negative_index_found = 0;
+  int global_negative_index_found = 0;
   //Nodes_Per_Element_Type =  elements::elem_types::Nodes_Per_Element_Type;
 
   //read the mesh
@@ -631,11 +634,25 @@ void Solver::read_mesh_ensight(const char *MESH){
         //as we loop through the nodes belonging to this element we store them
         //if any of these nodes belongs to this rank this list is used to store the element locally
         node_gid = atoi(&read_buffer(scan_loop,inode,0));
-        node_store(inode) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(zero_index_base)
+          node_store(inode) = node_gid; //subtract 1 since file index start is 1 but code expects 0
+        else
+          node_store(inode) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(node_store(inode) < 0){
+          negative_index_found = 1;
+        }
         //first we add the elements to a dynamically allocated list
-        if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
-          assign_flag = 1;
-          rnum_elem++;
+        if(zero_index_base){
+          if(map->isNodeGlobalElement(node_gid)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
+        }
+        else{
+          if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
         }
       }
 
@@ -724,21 +741,32 @@ void Solver::read_mesh_ensight(const char *MESH){
   dual_nodes_in_elem.modify_host();
 
   for(int ielem = 0; ielem < rnum_elem; ielem++)
-    for(int inode = 0; inode < elem_words_per_line; inode++)
+    for(int inode = 0; inode < elem_words_per_line; inode++){
       nodes_in_elem(ielem, inode) = element_temp[ielem*elem_words_per_line + inode];
+    }
 
   //view storage for all local elements connected to local nodes on this rank
   //DCArrayKokkos<GO, array_layout, device_type, memory_traits> All_Element_Global_Indices(rnum_elem);
   Kokkos::DualView <GO*, array_layout, device_type, memory_traits> All_Element_Global_Indices("All_Element_Global_Indices",rnum_elem);
   //copy temporary global indices storage to view storage
-  for(int ielem = 0; ielem < rnum_elem; ielem++)
+  for(int ielem = 0; ielem < rnum_elem; ielem++){
     All_Element_Global_Indices.h_view(ielem) = global_indices_temp[ielem];
-
+    if(global_indices_temp[ielem]<0){
+      negative_index_found = 1;
+    }
+  }
+  
+  MPI_Allreduce(&negative_index_found,&global_negative_index_found,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
+  if(global_negative_index_found){
+    if(myrank==0){
+    std::cout << "Node index less than or equal to zero detected; set \"zero_index_base: true\" under \"input_options\" in your yaml file if indices start at 0" << std::endl;
+    }
+    exit_solver(0);
+  }
   //delete temporary element connectivity and index storage
   std::vector<size_t>().swap(element_temp);
   std::vector<size_t>().swap(global_indices_temp);
   
-  //construct overlapping element map (since different ranks can own the same elements due to the local node map)
   All_Element_Global_Indices.modify_host();
   All_Element_Global_Indices.sync_device();
 
@@ -753,6 +781,7 @@ void Solver::read_mesh_ensight(const char *MESH){
   }
   */
 
+  //construct overlapping element map (since different ranks can own the same elements due to the local node map)
   all_element_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(Teuchos::OrdinalTraits<GO>::invalid(),All_Element_Global_Indices.d_view,0,comm));
 
   //element type selection (subject to change)
@@ -846,6 +875,8 @@ void Solver::read_mesh_vtk(const char *MESH){
   GO node_gid;
   real_t dof_value;
   bool zero_index_base = input_options.zero_index_base;
+  int negative_index_found = 0;
+  int global_negative_index_found = 0;
   //Nodes_Per_Element_Type =  elements::elem_types::Nodes_Per_Element_Type;
 
   //read the mesh
@@ -953,7 +984,7 @@ void Solver::read_mesh_vtk(const char *MESH){
         //read portions of the line into the substring variable
         line_parse >> substring;
         //debug print
-        std::cout<<" "<< substring <<std::endl;
+        //std::cout<<" "<< substring <<std::endl;
         //assign the substring variable as a word of the read buffer
         strcpy(&read_buffer(buffer_loop,iword,0),substring.c_str());
         }
@@ -1162,6 +1193,9 @@ void Solver::read_mesh_vtk(const char *MESH){
           node_store(inode) = node_gid; //subtract 1 since file index start is 1 but code expects 0
         else
           node_store(inode) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(node_store(inode) < 0){
+          negative_index_found = 1;
+        }
         //first we add the elements to a dynamically allocated list
         if(zero_index_base){
           if(map->isNodeGlobalElement(node_gid)&&!assign_flag){
@@ -1263,21 +1297,33 @@ void Solver::read_mesh_vtk(const char *MESH){
   dual_nodes_in_elem.modify_host();
 
   for(int ielem = 0; ielem < rnum_elem; ielem++)
-    for(int inode = 0; inode < elem_words_per_line; inode++)
+    for(int inode = 0; inode < elem_words_per_line; inode++){
       nodes_in_elem(ielem, inode) = element_temp[ielem*elem_words_per_line + inode];
+    }
 
   //view storage for all local elements connected to local nodes on this rank
   //DCArrayKokkos<GO, array_layout, device_type, memory_traits> All_Element_Global_Indices(rnum_elem);
   Kokkos::DualView <GO*, array_layout, device_type, memory_traits> All_Element_Global_Indices("All_Element_Global_Indices",rnum_elem);
   //copy temporary global indices storage to view storage
-  for(int ielem = 0; ielem < rnum_elem; ielem++)
+  for(int ielem = 0; ielem < rnum_elem; ielem++){
     All_Element_Global_Indices.h_view(ielem) = global_indices_temp[ielem];
+    if(global_indices_temp[ielem]<0){
+      negative_index_found = 1;
+    }
+  }
+  
+  MPI_Allreduce(&negative_index_found,&global_negative_index_found,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
+  if(global_negative_index_found){
+    if(myrank==0){
+    std::cout << "Node index less than or equal to zero detected; set \"zero_index_base: true\" under \"input_options\" in your yaml file if indices start at 0" << std::endl;
+    }
+    exit_solver(0);
+  }
 
   //delete temporary element connectivity and index storage
   std::vector<size_t>().swap(element_temp);
   std::vector<size_t>().swap(global_indices_temp);
   
-  //construct overlapping element map (since different ranks can own the same elements due to the local node map)
   All_Element_Global_Indices.modify_host();
   All_Element_Global_Indices.sync_device();
 
@@ -1292,6 +1338,7 @@ void Solver::read_mesh_vtk(const char *MESH){
   }
   */
 
+  //construct overlapping element map (since different ranks can own the same elements due to the local node map)
   all_element_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(Teuchos::OrdinalTraits<GO>::invalid(),All_Element_Global_Indices.d_view,0,comm));
 
   //element type selection (subject to change)
@@ -1384,6 +1431,9 @@ void Solver::read_mesh_tecplot(const char *MESH){
   GO node_gid;
   real_t dof_value;
   host_vec_array node_densities;
+  bool zero_index_base = input_options.zero_index_base;
+  int negative_index_found = 0;
+  int global_negative_index_found = 0;
   //Nodes_Per_Element_Type =  elements::elem_types::Nodes_Per_Element_Type;
 
   //read the mesh
@@ -1663,11 +1713,25 @@ void Solver::read_mesh_tecplot(const char *MESH){
         //as we loop through the nodes belonging to this element we store them
         //if any of these nodes belongs to this rank this list is used to store the element locally
         node_gid = atoi(&read_buffer(scan_loop,inode,0));
-        node_store(inode) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(zero_index_base)
+          node_store(inode) = node_gid; //subtract 1 since file index start is 1 but code expects 0
+        else
+          node_store(inode) = node_gid - 1; //subtract 1 since file index start is 1 but code expects 0
+        if(node_store(inode) < 0){
+          negative_index_found = 1;
+        }
         //first we add the elements to a dynamically allocated list
-        if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
-          assign_flag = 1;
-          rnum_elem++;
+        if(zero_index_base){
+          if(map->isNodeGlobalElement(node_gid)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
+        }
+        else{
+          if(map->isNodeGlobalElement(node_gid-1)&&!assign_flag){
+            assign_flag = 1;
+            rnum_elem++;
+          }
         }
       }
 
@@ -1747,28 +1811,45 @@ void Solver::read_mesh_tecplot(const char *MESH){
     max_nodes_per_element = elem->num_nodes();
   }
 
+  //1 type per mesh for now
+  for(int ielem = 0; ielem < rnum_elem; ielem++)
+    Element_Types(ielem) = mesh_element_type;
+
   dual_nodes_in_elem = dual_elem_conn_array("dual_nodes_in_elem", rnum_elem, max_nodes_per_element);
   host_elem_conn_array nodes_in_elem = dual_nodes_in_elem.view_host();
   dual_nodes_in_elem.modify_host();
 
   for(int ielem = 0; ielem < rnum_elem; ielem++)
-    for(int inode = 0; inode < elem_words_per_line; inode++)
+    for(int inode = 0; inode < elem_words_per_line; inode++){
       nodes_in_elem(ielem, inode) = element_temp[ielem*elem_words_per_line + inode];
+    }
 
   //view storage for all local elements connected to local nodes on this rank
   Kokkos::DualView <GO*, array_layout, device_type, memory_traits> All_Element_Global_Indices("All_Element_Global_Indices",rnum_elem);
   //copy temporary global indices storage to view storage
-  for(int ielem = 0; ielem < rnum_elem; ielem++)
+  for(int ielem = 0; ielem < rnum_elem; ielem++){
     All_Element_Global_Indices.h_view(ielem) = global_indices_temp[ielem];
+    if(global_indices_temp[ielem]<0){
+      negative_index_found = 1;
+    }
+  }
+  
+  MPI_Allreduce(&negative_index_found,&global_negative_index_found,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
+  if(global_negative_index_found){
+    if(myrank==0){
+    std::cout << "Node index less than or equal to zero detected; set \"zero_index_base: true\" under \"input_options\" in your yaml file if indices start at 0" << std::endl;
+    }
+    exit_solver(0);
+  }
 
   //delete temporary element connectivity and index storage
   std::vector<size_t>().swap(element_temp);
   std::vector<size_t>().swap(global_indices_temp);
   
-  //construct overlapping element map (since different ranks can own the same elements due to the local node map)
   All_Element_Global_Indices.modify_host();
   All_Element_Global_Indices.sync_device();
 
+  //construct overlapping element map (since different ranks can own the same elements due to the local node map)
   all_element_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(Teuchos::OrdinalTraits<GO>::invalid(),All_Element_Global_Indices.d_view,0,comm));
 
 
@@ -2225,6 +2306,53 @@ void Solver::init_maps(){
   //construct distributed element connectivity multivector
   global_nodes_in_elem_distributed = Teuchos::rcp(new MCONN(all_element_map, dual_nodes_in_elem));
 
+  if(nlocal_elem_non_overlapping >= 1) {
+  //construct map of nodes that belong to the non-overlapping element set (contained by ghost + local node set but not all of them)
+  std::set<GO> nonoverlap_elem_node_set;
+
+    //search through local elements for global node indices not owned by this MPI rank
+    if(num_dim==2)
+    for (int cell_rid = 0; cell_rid < nlocal_elem_non_overlapping; cell_rid++) {
+      //set nodes per element
+      element_select->choose_2Delem_type(Element_Types(cell_rid), elem2D);
+      nodes_per_element = elem2D->num_nodes();  
+      for (int node_lid = 0; node_lid < nodes_per_element; node_lid++){
+        node_gid = nodes_in_elem(cell_rid, node_lid);
+        nonoverlap_elem_node_set.insert(node_gid);
+      }
+    }
+
+    if(num_dim==3)
+    for (int cell_rid = 0; cell_rid < nlocal_elem_non_overlapping; cell_rid++) {
+      //set nodes per element
+      element_select->choose_3Delem_type(Element_Types(cell_rid), elem);
+      nodes_per_element = elem->num_nodes();  
+      for (int node_lid = 0; node_lid < nodes_per_element; node_lid++){
+        node_gid = nodes_in_elem(cell_rid, node_lid);
+        nonoverlap_elem_node_set.insert(node_gid);
+      }
+    }
+
+    //by now the set contains, with no repeats, all the global node indices belonging to the non overlapping element list on this MPI rank
+    //now pass the contents of the set over to a CArrayKokkos, then create a map to find local ghost indices from global ghost indices
+    nnonoverlap_elem_nodes = nonoverlap_elem_node_set.size();
+    nonoverlap_elem_nodes = Kokkos::DualView <GO*, Kokkos::LayoutLeft, device_type, memory_traits>("nonoverlap_elem_nodes", nnonoverlap_elem_nodes);
+    int inonoverlap_elem_node = 0;
+    auto it = nonoverlap_elem_node_set.begin();
+    while(it!=nonoverlap_elem_node_set.end()){
+      nonoverlap_elem_nodes.h_view(inonoverlap_elem_node++) = *it;
+      it++;
+    }
+
+    
+    nonoverlap_elem_nodes.modify_host();
+    nonoverlap_elem_nodes.sync_device();
+    // create a Map for ghost node indices
+    nonoverlap_element_node_map = Teuchos::rcp( new Tpetra::Map<LO,GO,node_type>(Teuchos::OrdinalTraits<GO>::invalid(),nonoverlap_elem_nodes.d_view,0,comm));
+
+  }
+
+
   //debug print
   //std::ostream &out = std::cout;
   //Teuchos::RCP<Teuchos::FancyOStream> fos = Teuchos::fancyOStream(Teuchos::rcpFromRef(out));
diff --git a/src/Parallel-Solvers/Solver.h b/src/Parallel-Solvers/Solver.h
index d78706dfd..7e4088c6a 100644
--- a/src/Parallel-Solvers/Solver.h
+++ b/src/Parallel-Solvers/Solver.h
@@ -108,6 +108,8 @@ class Solver{
   typedef MV::dual_view_type::t_host host_vec_array;
   typedef Kokkos::View<const real_t**, array_layout, HostSpace, memory_traits> const_host_vec_array;
   typedef Kokkos::View<const real_t**, array_layout, device_type, memory_traits> const_vec_array;
+  typedef Kokkos::View<const int**, array_layout, HostSpace, memory_traits> const_host_ivec_array;
+  typedef Kokkos::View<int**, array_layout, HostSpace, memory_traits> host_ivec_array;
   typedef MV::dual_view_type dual_vec_array;
   typedef MCONN::dual_view_type dual_elem_conn_array;
   typedef MCONN::dual_view_type::t_host host_elem_conn_array;
@@ -204,6 +206,10 @@ class Solver{
   Kokkos::DualView <GO*, Kokkos::LayoutLeft, device_type, memory_traits> ghost_nodes;
   Kokkos::DualView <int*, array_layout, device_type, memory_traits> ghost_node_ranks;
 
+  //Node set corresponding to uniquely assigned list of elements on this MPI rank
+  size_t nnonoverlap_elem_nodes;
+  Kokkos::DualView <GO*, Kokkos::LayoutLeft, device_type, memory_traits> nonoverlap_elem_nodes;
+
   //Local FEA data including ghosts
   size_t nall_nodes;
   size_t rnum_elem;
@@ -215,6 +221,7 @@ class Solver{
   Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > sorted_map; //sorted contiguous map of node indices
   Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > ghost_node_map; //map of node indices with ghosts on each rank
   Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > all_node_map; //map of node indices with ghosts on each rank
+  Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > nonoverlap_element_node_map; //map of node indices with ghosts on each rank
   Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > element_map; //non overlapping map of elements owned by each rank used in reduction ops
   Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > all_element_map; //overlapping map of elements connected to the local nodes in each rank
   Teuchos::RCP<Tpetra::Map<LO,GO,node_type> > sorted_element_map; //sorted contiguous map of element indices owned by each rank used in parallel IO
diff --git a/src/Yaml-Serializable/README.md b/src/Yaml-Serializable/README.md
index 98559141c..5a215aaf8 100644
--- a/src/Yaml-Serializable/README.md
+++ b/src/Yaml-Serializable/README.md
@@ -264,22 +264,21 @@ IMPL_YAML_SERIALIZABLE_FOR(MyStruct, a, b, c, d)
 
 
 ### Derived Data
-YAML has much less expressive semantics than you might want for your C++ struct. It is often useful to have a post-read hook to put the data into a more useful view for consumers of the struct. For this reason, there is a `Yaml::DerivedFields` class to inherit from and a `void derive()` hook that you can implement. This function will get run after the data is loaded, but before the custom `void validate()` hook is run. The following example uses the `void derive()` hook to initialize difficult to unserializable members of a base class[^2].
+YAML has much less expressive semantics than you might want for your C++ struct. It is often useful to have a post-read hook to put the data into a more useful view for consumers of the struct. For this reason, there is a `Yaml::DerivedFields` class to inherit from and a `void derive()` hook that you can implement. This function will get run after the data is loaded, but before the custom `void validate()` hook is run. The following example uses the `void derive()` hook to initialize difficult to unserializable members of a base class.
 
 ```c++
 
 struct my_struct {
-    double array[5];
+    double* array;
 };
 
 struct MyStruct : my_struct, Yaml::DerivedFields {
     std::vector<double> array;
 
     void derive() {
-        if (array.size() != 5)
-            throw Yaml::ConfigurationException("Expected list of 5 elements", "array");
+        my_struct::array = new double[array.size()];
 
-        for (size_t i = 0; i < 5; i++)
+        for (size_t i = 0; i < array.size(); i++)
             my_struct::array[i] = array[i];
     }
 };
@@ -385,5 +384,4 @@ The arguments to `apply` and `try_apply` can be any unary function (or unary coe
 The difference between `apply` and `try_apply` is that `apply` will throw a runtime exception if the type is unhandled, while `try_apply` will simply return false.
 
 
-[^1]: There is a slight addition to the YAML 1.0 spec to include simple lists with bracket syntax: "[1, 2, 3, ... ]".
-[^2]: Fixed size arrays (`std::array<T, N>` or `T[N]`) and raw pointers (`T*`) are not currently supported as serialization targets.
\ No newline at end of file
+[^1]: There is a slight addition to the YAML 1.0 spec to include simple lists with bracket syntax: "[1, 2, 3, ... ]".
\ No newline at end of file
diff --git a/src/Yaml-Serializable/include/yaml-serializable-base.h b/src/Yaml-Serializable/include/yaml-serializable-base.h
index 8ab76cb37..d9d1492a4 100644
--- a/src/Yaml-Serializable/include/yaml-serializable-base.h
+++ b/src/Yaml-Serializable/include/yaml-serializable-base.h
@@ -41,10 +41,15 @@
 #define YAML_BASE_H
 
 #include "Yaml.hpp"
+#include <array>
 
 namespace Yaml {
     template<typename T> void serialize(const T& v, Yaml::Node& node);
     template<typename T> void deserialize(T& v, Yaml::Node& node, bool raw=false);
+    template<typename T, size_t N> void serialize(const T(&v)[N], Yaml::Node& node);
+    template<typename T, size_t N> void deserialize(T(&v)[N], Yaml::Node& node, bool raw=false);
+    template<typename T, size_t N> void serialize(const std::array<T, N>& v, Yaml::Node& node);
+    template<typename T, size_t N> void deserialize(std::array<T, N>& v, Yaml::Node& node, bool raw=false);
     
     template<>
     inline void deserialize<bool>(bool& v, Yaml::Node& node, bool raw) {
diff --git a/src/Yaml-Serializable/include/yaml-serializable-containers.h b/src/Yaml-Serializable/include/yaml-serializable-containers.h
index 4b074bfbe..3f0cd3f61 100644
--- a/src/Yaml-Serializable/include/yaml-serializable-containers.h
+++ b/src/Yaml-Serializable/include/yaml-serializable-containers.h
@@ -47,6 +47,7 @@
 #include <vector>
 #include <sstream>
 #include <optional>
+#include <regex>
 
 namespace Yaml {
     namespace {
@@ -89,20 +90,47 @@ namespace Yaml {
             return result;
         }
 
-        template<typename T, typename F>
-        void deserialize_to_iterable(Node& node, bool raw, F inserter) {
+        /**
+         * Checks if this string looks like a Yaml list or not.
+        */
+        bool looks_like_a_list(std::string s) {
+            return std::regex_match(s, std::regex("\\[[^\\[\\]\n]*\\]"));
+        }
+        
+        /**
+         * This implementation of Yaml serialization doesn't have support for the bracket notation: [1, 2, 3, ...].
+         * So we have a little wrapper here for simple bracketing. 
+         * 
+         * If we are trying to parse the node into a sequence, we will use this function 
+         * to try to conver [...] style strings to Yaml::Sequence nodes.
+        */
+        Node to_sequence_node(Node& node) {
             Yaml::Node sequence_node;
             if (node.IsScalar()) {
+                auto s = node.As<std::string>();
+                if (!looks_like_a_list(s))
+                    throw Yaml::ConfigurationException("Failed to parse string: `" + s + "`. Expected a list.");
                 for (auto token : tokenizer(node.As<std::string>(), ','))
                     sequence_node.PushBack() = token;
             }
             if (node.IsSequence()) {
                 sequence_node = node;
             }
+            
+            return sequence_node;
+        }
 
-            for(size_t i = 0; i < sequence_node.Size(); i++) {
+        /**
+         * Deserializes the node into an iterable object with variable inserter F.
+         * `inserter` is called once for each element.
+         * 
+         * Treats deserialization errors in the i'th item as being under a node named "<i>".
+        */
+        template<typename T, typename F>
+        void deserialize_to_iterable(Node& node, bool raw, F inserter) {
+            for(size_t i = 0; i < node.Size(); i++) {
                 T item;
-                deserialize_from_indexed_item(item, sequence_node, raw, i);
+                deserialize_from_indexed_item(item, node, raw, i);
                 inserter(item);
             }
         }
@@ -133,7 +161,8 @@ namespace Yaml {
             static void deserialize(std::vector<T>& v, Yaml::Node& node, bool raw) {
                 if (node.IsNone()) return;
                 v.clear();
-                deserialize_to_iterable<T>(node, raw, [&](const T& item) {v.push_back(item);});
+                auto sequence_node = to_sequence_node(node);
+                deserialize_to_iterable<T>(sequence_node, raw, [&](const T& item) {v.push_back(item);});
             }
             static void serialize(const std::vector<T>& v, Yaml::Node& node) {
                 set_node_to_empty_sequence(node);
@@ -176,7 +205,8 @@ namespace Yaml {
             static void deserialize(std::set<T>& v, Yaml::Node& node, bool raw) {
                 if (node.IsNone()) return;
                 v.clear();
-                deserialize_to_iterable<T>(node, raw, [&](const T& item) {v.insert(item);});
+                auto sequence_node = to_sequence_node(node);
+                deserialize_to_iterable<T>(sequence_node, raw, [&](const T& item) {v.insert(item);});
             }
             static void serialize(const std::set<T>& v, Yaml::Node& node) {
                 set_node_to_empty_sequence(node);
@@ -237,6 +267,37 @@ namespace Yaml {
                 }
             }
         };
+
+
+        /**
+         * Separate special handling for T[N] and std::array<T, N>. 
+         * Exactly what is accepted here is implemented with different templates in yaml-serializable-base.h and 
+         * implemented in yaml-serailizable.h
+        */
+        template<typename T, size_t N>
+        struct FixedLengthImpl {
+            static void validate_length(Node& node) {
+                if (node.Size() != N)
+                    throw Yaml::ConfigurationException("Expected a list of length: " + std::to_string(N) 
+                                                     + ". Found list of length: " + std::to_string(node.Size()));
+            }
+
+            template<typename K>
+            static void deserialize(K& v, Yaml::Node& node, bool raw) {
+                if (node.IsNone()) return;
+                auto sequence_node = to_sequence_node(node);
+                validate_length(sequence_node);
+                size_t i = 0;
+                deserialize_to_iterable<T>(sequence_node, raw, [&](const T& item) { v[i++] = item; });
+            }
+
+            template<typename K>
+            static void serialize(const K& v, Yaml::Node& node) {
+                set_node_to_empty_sequence(node);
+                for (size_t i = 0; i < N; i++)
+                    Yaml::serialize(v[i], node.PushBack());
+            }
+        };
     }
 }
 
diff --git a/src/Yaml-Serializable/include/yaml-serializable.h b/src/Yaml-Serializable/include/yaml-serializable.h
index 3402500d4..ae8cb36a7 100644
--- a/src/Yaml-Serializable/include/yaml-serializable.h
+++ b/src/Yaml-Serializable/include/yaml-serializable.h
@@ -101,6 +101,25 @@ namespace Yaml {
         Containers::Impl<T>::deserialize(v, node, raw);
     }
 
+    template<typename T, size_t N>
+    void serialize(const T(&v)[N], Yaml::Node& node) {
+        Containers::FixedLengthImpl<T, N>::serialize(v, node);
+    }
+
+    template<typename T, size_t N>
+    void deserialize(T(&v)[N], Yaml::Node& node, bool raw) {
+        Containers::FixedLengthImpl<T, N>::deserialize(v, node, raw);
+    }
+
+    template<typename T, size_t N>
+    void serialize(const std::array<T, N>& v, Yaml::Node& node) {
+        Containers::FixedLengthImpl<T, N>::serialize(v, node);
+    }
+
+    template<typename T, size_t N>
+    void deserialize(std::array<T, N>& v, Yaml::Node& node, bool raw) {
+        Containers::FixedLengthImpl<T, N>::deserialize(v, node, raw);
+    }
 
     /**
      * Implements direct string serialization for Yaml Serializable objects.
diff --git a/src/Yaml-Serializable/test/test.cpp b/src/Yaml-Serializable/test/test.cpp
index 93eb848fb..c2366cfa4 100644
--- a/src/Yaml-Serializable/test/test.cpp
+++ b/src/Yaml-Serializable/test/test.cpp
@@ -5,6 +5,7 @@
 #include <cmath>
 #include <memory>
 #include <optional>
+#include <array>
 
 template<typename T>
 bool compare_vec(std::vector<T> a, std::vector<T> b) {
@@ -632,4 +633,101 @@ TEST(YamlSerializable, TypeDiscriminatedTryApplyNoThrow) {
 
 TEST(YamlSerializable, StaticTypeName) {
     EXPECT_STREQ(std::string(Yaml::static_type_name<DerivedA>()).c_str(), "DerivedA");
+}
+
+TEST(YamlSerializable, ArrayReading) {
+    std::string input = R"(
+        - 1
+        - 2
+        - 3
+    )";
+
+    int v[3];
+    Yaml::from_string(input, v);
+
+    for (size_t i = 0; i < 3; i++)
+        EXPECT_EQ(i+1, v[i]);
+}
+
+TEST(YamlSerializable, ArrayReadWrite) {
+    int a[3];
+    std::string str = Yaml::to_string(a);
+
+    int b[3];
+    Yaml::from_string(str, b);
+
+    for (size_t i = 0; i < 3; i++)
+        EXPECT_EQ(a[i], b[i]);
+}
+
+TEST(YamlSerializable, StdArrayReading) {
+    std::string input = R"(
+        - 1
+        - 2
+        - 3
+    )";
+
+    std::array<int, 3> v;
+    Yaml::from_string(input, v);
+
+    for (size_t i = 0; i < 3; i++)
+        EXPECT_EQ(i+1, v[i]);
+}
+
+TEST(YamlSerializable, StdArrayReadWrite) {
+    std::array<int, 3> a;
+    std::string str = Yaml::to_string(a);
+
+    std::array<int, 3> b;
+    Yaml::from_string(str, b);
+
+    for (size_t i = 0; i < 3; i++)
+        EXPECT_EQ(a[i], b[i]);
+}
+
+
+TEST(YamlSerializable, ArrayReadingInvalidLength) {
+    std::string input = R"(
+        - 1
+        - 2
+    )";
+
+    int v[3];
+
+    EXPECT_THROW({
+        Yaml::from_string(input, v);
+    }, Yaml::ConfigurationException);
+}
+
+TEST(YamlSerializable, StdArrayReadingInvalidLength) {
+    std::string input = R"(
+        - 1
+        - 2
+    )";
+
+    std::array<int, 3> v;
+
+    EXPECT_THROW({
+        Yaml::from_string(input, v);
+    }, Yaml::ConfigurationException);
+}
+
+
+TEST(YamlSerializable, InvalidScalarList) {
+    std::string input = "1 2 3";
+
+    int v[3];
+    EXPECT_THROW({
+        Yaml::from_string(input, v);
+    }, Yaml::ConfigurationException);
+}
+
+TEST(YamlSerializable, ValidScalarList) {
+    std::string input = "[1, 2, 3]";
+
+    int v[3];
+    Yaml::from_string(input, v);
+
+    for (size_t i = 0; i < 3; i++)
+        EXPECT_EQ(i+1, v[i]);
 }
\ No newline at end of file