diff --git a/README.md b/README.md index 110697c..e90e948 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,255 @@ CUDA Path Tracer **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Eric Micallef + * https://www.linkedin.com/in/eric-micallef-99291714b/ + +* Tested on: Windows 10, i5, Nvidia GTX1660 (Personal) + +## Intro to path tracing + +Path tracing is a computer graphics method for rendering three-dimensional images. + +Path tracing simulates many effects, such as soft shadows, depth of field, motion blur, sub-surface scattering, textures and indirect lighting. + +In order to get high-quality images from path tracing, a very large number of rays must be traced to avoid visible noise artifacts. Because each ray is independent of the other rays, path tracing is a great fit for a GPU. + +Below is a typical render of a scene showing reflective and diffuse spheres as well as shadows and lighting. + +![](img/nice_render.PNG) + +## Performance Analysis + +![](img/dof_none.PNG) + +The charts below are gathered from the scene shown above. In this scene we have 4 small spheres that have some reflection but are mostly diffuse. + +![](img/1_loop.png) + +The above chart shows the speedup or slowdown associated with enabling each algorithm/technique in debug mode. We see that, in general, material sorting adds too much overhead to realize any benefit. Caching helps performance because we make one less call to our "compute intersections" function, and stream compaction improves performance by quite a bit because we remove dead rays from the scene and do less work at each depth. The rest of the features are essentially free, as they add few instructions and little complexity to the system. 
+ +![](img/1_loop_r.png) + +![](img/raw_1_loop.png) + + +The above chart shows the speedup or slowdown associated with enabling each algorithm/technique in release mode. We see a slightly different story here, as caching, stream compaction and material sorting all add overhead to the system. This was interesting to me; I am not sure what changes in the optimized release build to tell this part of the picture. + +More information on each algorithm is below in the algorithm analysis. + +## Algorithm Analysis + +### Material Sorting + +![](img/Material_Sorting.png) + +![](img/Material_Sorting_r.png) + +The idea behind material sorting is that we sort our rays by material, so that all rays with a given material ID are contiguous with each other in memory. This can help reduce thread divergence: if all threads in a warp are computing based on the same material, we would expect them to mostly behave the same way. For example, all threads with a specular material are all going to reflect. + +In my runs there was no speedup from this. I tried several other runs with more materials and objects in the scene, to check whether my scene was simply not complex enough, but again saw no real advantage. + +I suspect that if I added texture mapping or more complex materials, the branch divergence would be more severe, and that is when I would see a benefit. + +The chart above shows how long it takes to actually sort our materials. In debug mode each run adds about 100ms to our computation time but does not win back more than 100ms. So the material sorting overhead outweighs any benefit we may see from creating contiguous material memory and reducing thread divergence. This is seen in both debug and release mode. + +### First Bounce Caching + +![](img/Caching.png) + +![](img/caching_r.png) + +The idea behind first bounce caching is to reduce repetitive computations. 
Our compute intersections call takes around 250ms, and for larger, more complex scenes it could take longer. Caching allows us to skip the first one. + +Every sample, rays are generated from the camera and enter the scene. On the first bounce, the rays go to the same spots every sample. Instead of computing this every time, we can cache it on our CPU and just reuse it for the next iterations, thereby getting a free intersections computation. + +As we allow more bounces or depth in the scene, the benefit of this technique diminishes. + +The chart above shows that with a depth of 1 we do not spend any time computing intersections, since we just reuse the cache. The time to copy this information from CPU to GPU was about 0.25ms, compared to around 250ms to compute the intersections. + +In release mode we see no benefit from caching on the CPU side. This suggests that the time spent copying data from the CPU to the GPU is roughly equivalent to the time it takes to execute the function "compute_intersections". + + +### Stream Compaction + +![](img/Stream_Compaction.png) + +![](img/Stream_Compaction_r.png) + +The idea behind stream compaction is to remove dead rays (rays that are no longer bouncing) from the scene. Dead rays are rays that have hit a light object, have exited the scene or make no contribution to the scene. + +In our naive path tracer our kernel launches a thread per ray. After the first bounce some rays exit, and after more bounces more rays exit, yet we still launch a thread for each of these rays. + +To remedy this we use stream compaction: after every bounce we remove the rays that are no longer bouncing. Thus we launch fewer threads, and there is less work to be done. + +This technique sees a significant speedup in performance in debug mode but created too much overhead in release mode. + +The above chart shows the time spent doing the compaction; we can see an almost exponential decay in time spent as depth increases. 
+ +We see this exponential decay in both release and debug mode. + +### Anti-aliasing + +Code can be found in pathtrace.cu in "generate ray from camera". + +Anti-aliasing is best done on the GPU because every ray from the camera must be slightly manipulated. Since we have many rays and they are all independent, it makes sense to compute this on the GPU. + +The idea behind anti-aliasing is to add some jitter or randomness to the camera rays. Instead of having the rays shoot out in the same direction every time, we add a slight random offset. This has the input rays start out in different, possibly more interesting, directions to accumulate color. What we see with anti-aliasing is that edges become a bit smoother. This technique is essentially free because all we are doing is adding some randomness to the first ray's direction. + +Unfortunately, if we use anti-aliasing we cannot use the first bounce caching technique. This is because we are manipulating our first rays just a bit, so the first-bounce computation will not always be the same. + +We could possibly use both techniques if we made the randomness of anti-aliasing more deterministic. For example, we could adjust the camera jitter based on our sample number. If we did this, though, we would have to cache more elements, meaning more memory usage. + + +![](img/combo_alias.jpg) + +The sphere on the left is rendered with anti-aliasing. You will notice that the edges are much smoother. + +On the right is a render without anti-aliasing. You will notice the edges are more abrupt and jagged. + +Both of these images are at the same sample count. + +Below is the full image render; even from afar the jagged edges are noticeable! (if your monitor is big enough, that is...) 
+ + +![](img/cornell_no_alias.png) + +No anti-aliasing + +![](img/cornell_alias.png) + +Anti-aliasing + +![](img/zoom_nice_no_aa.png) + +No anti-aliasing + +![](img/zoom_nice_aa.PNG) + +Anti-aliasing + +## Depth of Field + +Code for the implementation can be found in pathtrace.cu, function "ConcentricSampleDisk". + +Depth of field is best done on the GPU because every ray from the camera must be slightly manipulated based on our lens and focal point. Since we have many rays and they are all independent, it makes sense to compute this on the GPU. + +From our chart we see that depth of field did not add much, if any, overhead to the system. +Similar to anti-aliasing, this technique is done once per iteration and adjusts the input ray's direction and origin. The compute time for this is minimal, so it is masked by the heavier functions. + +Similar to anti-aliasing, we cannot employ the caching speedup when doing depth of field. This is because we manipulate the first ray differently each iteration to get the blur effect. + + +Below is a render of what you would see if you had perfect vision (no depth of field). + +![](img/no_blur.PNG) + +Below is what you would see if you had decent eyesight or were over the age of 60. + +![](img/blur.PNG) + +And finally, this is what I see without my contacts. + +![](img/full_blur.png) + +## Motion Blur + +Code for the implementation can be found in pathtrace.cu, function "add_blur". + +Similar to depth of field and anti-aliasing, motion blur is essentially free. The drawback is that, again, we cannot use the caching technique. This is because we move the object's position based on its speed every iteration. + +This algorithm has us update each object's position by a little bit every iteration. So we just need to loop through however many objects are in the scene and update each one based on its speed. Unless there are thousands of objects in a scene, this can and should be done easily and quickly on the CPU. 
+ +Below is an image with a lot of smaller spheres flying around the scene, simulating something like electrons around a nucleus. + + +![](img/motion_blur_sun.PNG) + + +## Refraction (Extra?) + +Code for the implementation can be found in interactions.h, function "compute_refraction". + +Refraction is best done on the GPU because every ray is independent and the number of rays to compute per picture is large. + +From our results above we see that refraction added a bit of overhead, but not too much. This is expected, as refraction adds a bit more branch divergence, but the time spent computing refraction is not significantly more than computing a reflective or diffuse object, so we only see a slight slowdown. + +Below is a scene with objects that have some refractivity and a slight bit of reflectivity. We see the sphere in the back refract blue light onto the white wall. We see the yellow sphere obscured by the refractive sphere in front, and on the right we see a neat-looking cube in which we actually see a small reflection due to the phenomenon of total internal reflection. + + +![](img/good_refraction.PNG) + + +Below is a render with my attempt to simulate refractive water. To do this I just added a thin blue refractive material. It is not the prettiest render, but it shows the use of the Fresnel term when combining refraction and reflection. + + +![](img/water_refraction.PNG) + + +From the image above we can see the distortion on the walls and slight distortion of the tower. + +In real life, water and other refractive materials have some reflection to them. When you look into a lake, if it is calm enough, you can see a reflection. This is where the Fresnel term can create a more realistic render. + +In the below image we add some reflective and refractive properties to the "water" and use the Fresnel term to calculate how much reflectivity each ray has. We can see how the middle area has a little gleam to it. 
The "water" is also a different color depending on where it is in the frame. If we could add textures, this could become even more realistic! + +![](img/water_refraction_p2.PNG) + + +### Importance Sampling (Extra) + +Code for the implementation can be found in interactions.h, wrapped in an #ifdef. + +Importance sampling is best done on the GPU because we are weighting our ray during diffusion, reflection or refraction. Since we have thousands of independent rays per image, the computation is best done on the GPU side. + +From our chart we can see that importance sampling adds hardly any overhead to the system (about 20ms). That is because it is just a few extra multiplies and divides per thread in the shader to compute a weight. + +The theory behind importance sampling is that it helps the image converge much quicker. I did not notice a difference in when the image converges with importance sampling versus without it. At iterations 50, 250 and 550 the images looked roughly the same: no worse, no better. + +During a regular render we accumulate color per depth. With importance sampling we accumulate color per depth as well, but also give some weight to the color of our ray bounces based on where they hit. + +For example, on a diffuse bounce we can bounce anywhere in our cosine-weighted hemisphere, but some directions in this hemisphere contribute more than others. With importance sampling we give more weight to these and less weight to directions that are not as interesting. This helps the image converge quicker. + +### Subsurface Scattering (Extra) + +Currently a work in progress. 
+ + +### References + +Physically Based Rendering, second and third editions + - Used this book's algorithms to implement depth of field + - Used book for refraction + - Used book for Fresnel + - Used book for importance sampling + +https://www.scratchapixel.com/lessons/3d-basic-rendering/introduction-to-shading/reflection-refraction-fresnel + - Help with refraction + +https://en.wikipedia.org/wiki/Snell%27s_law + - More help with refraction and Fresnel + +https://computergraphics.stackexchange.com/questions/2482/choosing-reflection-or-refraction-in-path-tracing + - Help with Fresnel + +https://github.com/ssloy/tinyraytracer/wiki/Part-1:-understandable-raytracing + - Helped in general with the basics of Fresnel, reflection and refraction + +Ziad and Hannah, our TAs, for help with motion blur + +https://learning.oreilly.com/library/view/physically-based-rendering/9780128007099/B9780128006450500130_2.xhtml + - Help with importance sampling + +https://computergraphics.stackexchange.com/questions/4979/what-is-importance-sampling + - Help with importance sampling + +https://www.scratchapixel.com/lessons/3d-basic-rendering/global-illumination-path-tracing/global-illumination-path-tracing-practical-implementation + - Help with importance sampling + + +## CMakeLists + +Modified the default for use with tinygfl but did not get it working, so the regular CMake file can be used. + -### (TODO: Your README) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. diff --git a/external/include/json.hpp b/external/include/json.hpp new file mode 100644 index 0000000..c9af0be --- /dev/null +++ b/external/include/json.hpp @@ -0,0 +1,20406 @@ +/* + __ _____ _____ _____ + __| | __| | | | JSON for Modern C++ +| | |__ | | | | | | version 3.5.0 +|_____|_____|_____|_|___| https://github.com/nlohmann/json + +Licensed under the MIT License . +SPDX-License-Identifier: MIT +Copyright (c) 2013-2018 Niels Lohmann . 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. +*/ + +#ifndef NLOHMANN_JSON_HPP +#define NLOHMANN_JSON_HPP + +#define NLOHMANN_JSON_VERSION_MAJOR 3 +#define NLOHMANN_JSON_VERSION_MINOR 5 +#define NLOHMANN_JSON_VERSION_PATCH 0 + +#include // all_of, find, for_each +#include // assert +#include // and, not, or +#include // nullptr_t, ptrdiff_t, size_t +#include // hash, less +#include // initializer_list +#include // istream, ostream +#include // random_access_iterator_tag +#include // accumulate +#include // string, stoi, to_string +#include // declval, forward, move, pair, swap + +// #include +#ifndef NLOHMANN_JSON_FWD_HPP +#define NLOHMANN_JSON_FWD_HPP + +#include // int64_t, uint64_t +#include // map +#include // allocator +#include // string +#include // vector + +/*! +@brief namespace for Niels Lohmann +@see https://github.com/nlohmann +@since version 1.0.0 +*/ +namespace nlohmann +{ +/*! 
+@brief default JSONSerializer template argument + +This serializer ignores the template arguments and uses ADL +([argument-dependent lookup](https://en.cppreference.com/w/cpp/language/adl)) +for serialization. +*/ +template +struct adl_serializer; + +template class ObjectType = + std::map, + template class ArrayType = std::vector, + class StringType = std::string, class BooleanType = bool, + class NumberIntegerType = std::int64_t, + class NumberUnsignedType = std::uint64_t, + class NumberFloatType = double, + template class AllocatorType = std::allocator, + template class JSONSerializer = + adl_serializer> +class basic_json; + +/*! +@brief JSON Pointer + +A JSON pointer defines a string syntax for identifying a specific value +within a JSON document. It can be used with functions `at` and +`operator[]`. Furthermore, JSON pointers are the base for JSON patches. + +@sa [RFC 6901](https://tools.ietf.org/html/rfc6901) + +@since version 2.0.0 +*/ +template +class json_pointer; + +/*! +@brief default JSON class + +This type is the default specialization of the @ref basic_json class which +uses the standard template types. 
+ +@since version 1.0.0 +*/ +using json = basic_json<>; +} // namespace nlohmann + +#endif + +// #include + + +// This file contains all internal macro definitions +// You MUST include macro_unscope.hpp at the end of json.hpp to undef all of them + +// exclude unsupported compilers +#if !defined(JSON_SKIP_UNSUPPORTED_COMPILER_CHECK) + #if defined(__clang__) + #if (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__) < 30400 + #error "unsupported Clang version - see https://github.com/nlohmann/json#supported-compilers" + #endif + #elif defined(__GNUC__) && !(defined(__ICC) || defined(__INTEL_COMPILER)) + #if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) < 40800 + #error "unsupported GCC version - see https://github.com/nlohmann/json#supported-compilers" + #endif + #endif +#endif + +// disable float-equal warnings on GCC/clang +#if defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) + #pragma GCC diagnostic push + #pragma GCC diagnostic ignored "-Wfloat-equal" +#endif + +// disable documentation warnings on clang +#if defined(__clang__) + #pragma GCC diagnostic push + #pragma GCC diagnostic ignored "-Wdocumentation" +#endif + +// allow for portable deprecation warnings +#if defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) + #define JSON_DEPRECATED __attribute__((deprecated)) +#elif defined(_MSC_VER) + #define JSON_DEPRECATED __declspec(deprecated) +#else + #define JSON_DEPRECATED +#endif + +// allow to disable exceptions +#if (defined(__cpp_exceptions) || defined(__EXCEPTIONS) || defined(_CPPUNWIND)) && !defined(JSON_NOEXCEPTION) + #define JSON_THROW(exception) throw exception + #define JSON_TRY try + #define JSON_CATCH(exception) catch(exception) + #define JSON_INTERNAL_CATCH(exception) catch(exception) +#else + #define JSON_THROW(exception) std::abort() + #define JSON_TRY if(true) + #define JSON_CATCH(exception) if(false) + #define JSON_INTERNAL_CATCH(exception) if(false) +#endif + +// override exception 
macros +#if defined(JSON_THROW_USER) + #undef JSON_THROW + #define JSON_THROW JSON_THROW_USER +#endif +#if defined(JSON_TRY_USER) + #undef JSON_TRY + #define JSON_TRY JSON_TRY_USER +#endif +#if defined(JSON_CATCH_USER) + #undef JSON_CATCH + #define JSON_CATCH JSON_CATCH_USER + #undef JSON_INTERNAL_CATCH + #define JSON_INTERNAL_CATCH JSON_CATCH_USER +#endif +#if defined(JSON_INTERNAL_CATCH_USER) + #undef JSON_INTERNAL_CATCH + #define JSON_INTERNAL_CATCH JSON_INTERNAL_CATCH_USER +#endif + +// manual branch prediction +#if defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) + #define JSON_LIKELY(x) __builtin_expect(!!(x), 1) + #define JSON_UNLIKELY(x) __builtin_expect(!!(x), 0) +#else + #define JSON_LIKELY(x) x + #define JSON_UNLIKELY(x) x +#endif + +// C++ language standard detection +#if (defined(__cplusplus) && __cplusplus >= 201703L) || (defined(_HAS_CXX17) && _HAS_CXX17 == 1) // fix for issue #464 + #define JSON_HAS_CPP_17 + #define JSON_HAS_CPP_14 +#elif (defined(__cplusplus) && __cplusplus >= 201402L) || (defined(_HAS_CXX14) && _HAS_CXX14 == 1) + #define JSON_HAS_CPP_14 +#endif + +/*! +@brief macro to briefly define a mapping between an enum and JSON +@def NLOHMANN_JSON_SERIALIZE_ENUM +@since version 3.4.0 +*/ +#define NLOHMANN_JSON_SERIALIZE_ENUM(ENUM_TYPE, ...) \ + template \ + inline void to_json(BasicJsonType& j, const ENUM_TYPE& e) \ + { \ + static_assert(std::is_enum::value, #ENUM_TYPE " must be an enum!"); \ + static const std::pair m[] = __VA_ARGS__; \ + auto it = std::find_if(std::begin(m), std::end(m), \ + [e](const std::pair& ej_pair) -> bool \ + { \ + return ej_pair.first == e; \ + }); \ + j = ((it != std::end(m)) ? 
it : std::begin(m))->second; \ + } \ + template \ + inline void from_json(const BasicJsonType& j, ENUM_TYPE& e) \ + { \ + static_assert(std::is_enum::value, #ENUM_TYPE " must be an enum!"); \ + static const std::pair m[] = __VA_ARGS__; \ + auto it = std::find_if(std::begin(m), std::end(m), \ + [j](const std::pair& ej_pair) -> bool \ + { \ + return ej_pair.second == j; \ + }); \ + e = ((it != std::end(m)) ? it : std::begin(m))->first; \ + } + +// Ugly macros to avoid uglier copy-paste when specializing basic_json. They +// may be removed in the future once the class is split. + +#define NLOHMANN_BASIC_JSON_TPL_DECLARATION \ + template class ObjectType, \ + template class ArrayType, \ + class StringType, class BooleanType, class NumberIntegerType, \ + class NumberUnsignedType, class NumberFloatType, \ + template class AllocatorType, \ + template class JSONSerializer> + +#define NLOHMANN_BASIC_JSON_TPL \ + basic_json + +// #include + + +#include // not +#include // size_t +#include // conditional, enable_if, false_type, integral_constant, is_constructible, is_integral, is_same, remove_cv, remove_reference, true_type + +namespace nlohmann +{ +namespace detail +{ +// alias templates to reduce boilerplate +template +using enable_if_t = typename std::enable_if::type; + +template +using uncvref_t = typename std::remove_cv::type>::type; + +// implementation of C++14 index_sequence and affiliates +// source: https://stackoverflow.com/a/32223343 +template +struct index_sequence +{ + using type = index_sequence; + using value_type = std::size_t; + static constexpr std::size_t size() noexcept + { + return sizeof...(Ints); + } +}; + +template +struct merge_and_renumber; + +template +struct merge_and_renumber, index_sequence> + : index_sequence < I1..., (sizeof...(I1) + I2)... 
> {}; + +template +struct make_index_sequence + : merge_and_renumber < typename make_index_sequence < N / 2 >::type, + typename make_index_sequence < N - N / 2 >::type > {}; + +template<> struct make_index_sequence<0> : index_sequence<> {}; +template<> struct make_index_sequence<1> : index_sequence<0> {}; + +template +using index_sequence_for = make_index_sequence; + +// dispatch utility (taken from ranges-v3) +template struct priority_tag : priority_tag < N - 1 > {}; +template<> struct priority_tag<0> {}; + +// taken from ranges-v3 +template +struct static_const +{ + static constexpr T value{}; +}; + +template +constexpr T static_const::value; +} // namespace detail +} // namespace nlohmann + +// #include + + +#include // not +#include // numeric_limits +#include // false_type, is_constructible, is_integral, is_same, true_type +#include // declval + +// #include + +// #include + + +#include // random_access_iterator_tag + +// #include + + +namespace nlohmann +{ +namespace detail +{ +template struct make_void +{ + using type = void; +}; +template using void_t = typename make_void::type; +} // namespace detail +} // namespace nlohmann + +// #include + + +namespace nlohmann +{ +namespace detail +{ +template +struct iterator_types {}; + +template +struct iterator_types < + It, + void_t> +{ + using difference_type = typename It::difference_type; + using value_type = typename It::value_type; + using pointer = typename It::pointer; + using reference = typename It::reference; + using iterator_category = typename It::iterator_category; +}; + +// This is required as some compilers implement std::iterator_traits in a way that +// doesn't work with SFINAE. See https://github.com/nlohmann/json/issues/1341. 
+template +struct iterator_traits +{ +}; + +template +struct iterator_traits < T, enable_if_t < !std::is_pointer::value >> + : iterator_types +{ +}; + +template +struct iterator_traits::value>> +{ + using iterator_category = std::random_access_iterator_tag; + using value_type = T; + using difference_type = ptrdiff_t; + using pointer = T*; + using reference = T&; +}; +} +} + +// #include + +// #include + + +#include + +// #include + + +// http://en.cppreference.com/w/cpp/experimental/is_detected +namespace nlohmann +{ +namespace detail +{ +struct nonesuch +{ + nonesuch() = delete; + ~nonesuch() = delete; + nonesuch(nonesuch const&) = delete; + void operator=(nonesuch const&) = delete; +}; + +template class Op, + class... Args> +struct detector +{ + using value_t = std::false_type; + using type = Default; +}; + +template class Op, class... Args> +struct detector>, Op, Args...> +{ + using value_t = std::true_type; + using type = Op; +}; + +template