This repository focuses on rendering a real estate room from a set of 2D images into a 3D scene using NeRF (Neural Radiance Fields), a technique for synthesizing novel views of a scene from a set of images. It is mainly inspired by NeRF and references the papers listed in the Citation section.
- REAL
- Contents
- Requirements
- LLFF data - Format
- Camera pose
- Execution
- Training Sequence
- Improvement
- Citation
- License
To render images with this project, you currently need your data in the LLFF format. At first I planned to support input without any special data format, but converting it turned out not to be easy. Therefore, rendering is not possible unless you have the LLFF data format, which describes the camera poses, the 3D points of the room, and the camera parameters.
I plan to automate the conversion to the LLFF data format with a single command option (instruction) or something similar in the future.
A brief description of the LLFF data is below.
LLFF can correct certain types of distortion in the input images, such as lens distortion and chromatic aberration, by estimating the intrinsic camera parameters of each image.
By estimating and correcting the intrinsic camera parameters, LLFF can enhance the quality of the images by reducing or eliminating the effects of distortion. This correction process can result in improved image sharpness, reduced color fringing, and a more accurate representation of the captured scene.
- images/ ---- .jpg (input images)
- sparse/ ---- .bin (COLMAP output)
- poses_bounds.npy or transforms.json, {yourdata}.txt
- check the LLFF Repository first!!
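If you already have the poses file, it follows the LLFF convention: one row per image containing a flattened 3x5 matrix (a 3x4 camera-to-world pose plus a [height, width, focal] column) followed by the near/far depth bounds. Below is a minimal sketch of reading it, assuming the file sits in ./data and uses the standard poses_bounds.npy name:

```python
import numpy as np

# Minimal sketch of reading the LLFF poses file. Assumes the standard
# LLFF layout: each of the N rows holds a flattened 3x5 matrix
# (3x4 camera-to-world pose plus a [height, width, focal] column)
# followed by the near/far depth bounds. The ./data path is an assumption.
poses_bounds = np.load("./data/poses_bounds.npy")   # shape (N, 17)

poses = poses_bounds[:, :15].reshape(-1, 3, 5)      # (N, 3, 5)
bounds = poses_bounds[:, 15:]                       # (N, 2) near/far depth bounds

c2w = poses[:, :, :4]                               # 3x4 camera-to-world matrices
hwf = poses[:, :, 4]                                # image height, width, focal length
print(c2w.shape, hwf[0], bounds[0])
```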
"camera_angle_x": 1.6316266081598993,
"camera_angle_y": 1.0768185778803099,
"fl_x": 903.3096914819945,
"fl_y": 904.1146220896455,
"k1": -0.0006951346026472416,
"k2": -0.0022074727073696896,
"k3": 0,
"k4": 0,
"p1": -0.00018190630274219532,
"p2": -0.00015686925639183075,
"is_fisheye": false,
"cx": 959.5738541657016,
"cy": 544.0907729519863,
"w": 1920.0,
"h": 1080.0,
"aabb_scale": 32,
"frames": [
{
"file_path": "./images/0017.jpg",
...
The example transforms.json above demonstrates the camera parameters: camera angles, focal lengths, distortion coefficients, principal point, image dimensions, and more.
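For reference, the intrinsics in this file can be read directly with a few lines of Python; a minimal sketch, assuming the file lives at ./data/transforms.json:

```python
import json
import numpy as np

# Minimal sketch of reading the intrinsics out of transforms.json.
# The ./data/transforms.json path is an assumption.
with open("./data/transforms.json") as f:
    meta = json.load(f)

# Pinhole intrinsic matrix built from the focal lengths and principal point.
K = np.array([
    [meta["fl_x"], 0.0,          meta["cx"]],
    [0.0,          meta["fl_y"], meta["cy"]],
    [0.0,          0.0,          1.0],
])

frames = meta["frames"]                  # one entry per input image
print(K)
print(len(frames), frames[0]["file_path"])
```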
The COLMAP text export (cameras.txt, images.txt, points3D.txt) lives under ./{data}/text:
These parameters define how the camera lens captures and distorts the incoming light rays. In the provided example, the focal length, principal point, and distortion coefficients are specified for a single camera: focal lengths (fx, fy), principal point (cx, cy), radial distortion coefficients (k1~k4), and tangential distortion coefficients (p1, p2). The OPENCV camera model in cameras.txt stores fx, fy, cx, cy, k1, k2, p1, p2, in that order.
```
# Camera list with one line of data per camera:
# CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
# Number of cameras: 1
1 OPENCV 1920 1080 903.30969148199449 904.11462208964554 959.57385416570162 544.09077295198631 -0.0006951346026472416 -0.0022074727073696896 -0.00018190630274219532 -0.00015686925639183075
```
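As a concrete example, the OPENCV camera line above can be parsed and used to undistort an input image with OpenCV. This is only a sketch: the image path is an assumption and the parameter values are truncated copies of the line above.

```python
import numpy as np
import cv2

# Parse one OPENCV camera line from cameras.txt (values truncated).
line = ("1 OPENCV 1920 1080 903.309691 904.114622 959.573854 544.090773 "
        "-0.000695 -0.002207 -0.000182 -0.000157")
_, model, w, h, fx, fy, cx, cy, k1, k2, p1, p2 = line.split()
fx, fy, cx, cy, k1, k2, p1, p2 = map(float, (fx, fy, cx, cy, k1, k2, p1, p2))

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
dist = np.array([k1, k2, p1, p2])        # OpenCV order: k1, k2, p1, p2

# Undistort one of the input images (the path is an assumption).
img = cv2.imread("./data/images/0017.jpg")
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("0017_undistorted.jpg", undistorted)
```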
These parameters encode the orientation and position of each camera relative to the world coordinate system. In addition, each image can have many observations (POINTS2D) of 3D points in the scene, given by their X and Y pixel coordinates and the corresponding POINT3D_ID.
Q: quaternion (QW, QX, QY, QZ) that rotates a point from the world coordinate system into the camera coordinate system.
T: translation (TX, TY, TZ) of the world-to-camera transform; the camera center in world coordinates is C = -RᵀT.
```
# IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
# POINTS2D[] as (X, Y, POINT3D_ID)
# Number of images: 35, mean observations per image: 1618.6571428571428
1 0.98166594374421712 0.11018315231591673 0.14857255792029417 0.046020026852707313 -0.68943682766847558 0.8318357390927269 -2.5659713605463765 1 0017.jpg
...
```
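Because Q and T map world coordinates into the camera frame, the camera center in world coordinates is C = -RᵀT. A minimal sketch using the first image above (quaternion and translation values truncated):

```python
import numpy as np

def qvec2rotmat(qw, qx, qy, qz):
    """Convert a COLMAP quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    return np.array([
        [1 - 2*qy*qy - 2*qz*qz, 2*qx*qy - 2*qz*qw,     2*qx*qz + 2*qy*qw],
        [2*qx*qy + 2*qz*qw,     1 - 2*qx*qx - 2*qz*qz, 2*qy*qz - 2*qx*qw],
        [2*qx*qz - 2*qy*qw,     2*qy*qz + 2*qx*qw,     1 - 2*qx*qx - 2*qy*qy],
    ])

# Values truncated from image 1 (0017.jpg) above.
qw, qx, qy, qz = 0.98166594, 0.11018315, 0.14857256, 0.04602003
t = np.array([-0.68943683, 0.83183574, -2.56597136])

R = qvec2rotmat(qw, qx, qy, qz)   # world -> camera rotation
C = -R.T @ t                      # camera center in world coordinates
print(C)
```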
```
# 3D point list with one line of data per point:
# POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK[] as (IMAGE_ID, POINT2D_IDX)
# Number of points: 5740, mean track length: 9.869860627177701
944 5.1531701493470319 6.0687577564030635 5.0653561052108316 112 100 78 0.78653193433340196 32 2166 18 888 33 1394 27 1595
...
```
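The sparse point cloud can be pulled out of this file with plain text parsing; a minimal sketch, assuming the text export lives at ./data/text/points3D.txt:

```python
import numpy as np

# Minimal sketch of reading the sparse 3D points (XYZ and RGB) out of
# points3D.txt. The file path is an assumption.
xyz, rgb = [], []
with open("./data/text/points3D.txt") as f:
    for line in f:
        if line.startswith("#") or not line.strip():
            continue
        elems = line.split()
        xyz.append([float(v) for v in elems[1:4]])   # X, Y, Z
        rgb.append([int(v) for v in elems[4:7]])     # R, G, B

xyz = np.asarray(xyz)
rgb = np.asarray(rgb, dtype=np.uint8)
print(xyz.shape, rgb.shape)                          # e.g. (5740, 3) (5740, 3)
```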
You can find the sample data in the ./data directory. It contains some images that I took in my room, along with the camera poses and camera parameters of the room, organized following the LLFF structure. You can also find the whole training code in the ./train directory, which is based on NeRF. The code was modified slightly to train only with the LLFF data format and to render the images with CUDA using OpenGL.
- a few images, camera poses, and camera parameters in the LLFF data format
- an interactive OpenGL viewer
OpenGL is a cross-platform graphics API that specifies a standard software interface for 3D graphics processing hardware and makes it possible to control the camera position and viewing direction interactively. I used OpenGL to render the images with CUDA because it is fast and easy to use; furthermore, there is reference code for rendering with OpenGL in the LLFF repository.
Render with CUDA (reference):

```
./cuda_renderer mpidir <your_posefile> <your_videofile> height crop crf
```
trained_example.mp4
- MLP
  - Mainly predicts the density at each sampled point
  - 5-dimensional input: (x, y, z, θ, φ) → ρ (density)
- Volume Rendering
- Stratified Sampling approach
- Hierarchical Volume Sampling: coarse network → fine network
- Positional Encoding (see the sketch after this list)
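To make the positional encoding step concrete, here is a minimal PyTorch sketch of the frequency encoding that NeRF applies to the 5D input before it enters the MLP; the frequency counts (10 for positions, 4 for view directions) are taken from the original NeRF paper and are an assumption about this repository's settings.

```python
import math
import torch

def positional_encoding(x: torch.Tensor, num_freqs: int) -> torch.Tensor:
    """Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_freqs-1."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs * math.pi           # (..., dim, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                  # (..., dim * 2 * num_freqs)

xyz = torch.rand(1024, 3)      # sampled 3D positions
dirs = torch.rand(1024, 3)     # viewing directions (θ, φ as a unit vector)
print(positional_encoding(xyz, 10).shape)    # torch.Size([1024, 60])
print(positional_encoding(dirs, 4).shape)    # torch.Size([1024, 24])
```

The higher-frequency features let the MLP fit fine geometric and texture detail that a raw (x, y, z, θ, φ) input cannot represent well.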
- Fix the positional encoding issue (possibly the reason why the result is not good)
- Add more data
- Automatically generate the LLFF data
- Train more to make the result more realistic
- Build configuration
- Modify the whole pipeline so that it can be used with any data via a single command
```bibtex
@misc{lin2020nerfpytorch,
  title={NeRF-pytorch},
  author={Yen-Chen, Lin},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/yenchenlin/nerf-pytorch/}},
  year={2020}
}

@article{mildenhall2019llff,
  title={Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines},
  author={Ben Mildenhall and Pratul P. Srinivasan and Rodrigo Ortiz-Cayon and Nima Khademi Kalantari and Ravi Ramamoorthi and Ren Ng and Abhishek Kar},
  journal={ACM Transactions on Graphics (TOG)},
  year={2019}
}
```
MIT License - see LICENSE for more details.