CS180 Proj4 · Neural Radiance Field!

Nov 2025 · Weiyi Zhang

Part 0 · Camera Calibration and 3D Scanning

We calibrate the camera and visualize all training views in viser as frustums to verify intrinsic/extrinsic parameters and scene coverage.

Camera Frustums in Viser

Interactive camera frustums view · drag or scroll to change the viewpoint.
Each frustum shows a training camera's position and orientation.
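For reference, here is a minimal sketch of how these frustums can be added to a viser scene. The variable names (Ks, c2ws, images) and the exact add_camera_frustum arguments are my assumptions about viser's API, not the code used for the figure above.

```python
import numpy as np
import viser
import viser.transforms as vtf

# Hypothetical inputs: intrinsics Ks (N, 3, 3), camera-to-world poses c2ws (N, 4, 4),
# and the corresponding training images (N, H, W, 3) with values in [0, 1].
def show_frustums(Ks, c2ws, images):
    server = viser.ViserServer()
    for i, (K, c2w, img) in enumerate(zip(Ks, c2ws, images)):
        H, W = img.shape[:2]
        fov = 2.0 * np.arctan2(H / 2.0, K[1, 1])           # vertical field of view from fy
        server.scene.add_camera_frustum(
            f"/cameras/frustum_{i}",
            fov=fov,
            aspect=W / H,
            scale=0.02,                                     # frustum size in world units
            wxyz=vtf.SO3.from_matrix(c2w[:3, :3]).wxyz,     # rotation as a wxyz quaternion
            position=c2w[:3, 3],
            image=img,                                      # paint the training image on the frustum
        )
    return server
```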

Part 1 · Fit a Neural Field to a 2D Image

Fit an MLP-based coordinate network to reconstruct 2D images from continuous pixel coordinates using positional encoding.

Model Architecture

Backbone: Multi-Layer Perceptron (MLP)
Layers: 4
Width: 128
Activation: ReLU (hidden layers) & Sigmoid (output layer)
Positional Encoding: L = 10
Learning Rate: 1e-2 with Adam
Loss Function: Mean Squared Error (MSE)
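The architecture above corresponds roughly to the following PyTorch sketch; the class names and the exact layer layout are illustrative rather than copied from my code.

```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Map (x, y) in [0, 1]^2 to [x, y, sin(2^k * pi * x), cos(2^k * pi * x), ...] for k < L."""
    def __init__(self, num_freqs: int = 10):
        super().__init__()
        self.register_buffer("freqs", (2.0 ** torch.arange(num_freqs)) * torch.pi)

    def forward(self, xy):                                  # xy: (N, 2)
        scaled = xy[..., None] * self.freqs                 # (N, 2, L)
        enc = torch.cat([torch.sin(scaled), torch.cos(scaled)], dim=-1)
        return torch.cat([xy, enc.flatten(-2)], dim=-1)     # (N, 2 + 4L)

class NeuralField2D(nn.Module):
    """4 linear layers, width 128, ReLU for hidden layers, sigmoid for the RGB output."""
    def __init__(self, num_freqs: int = 10, width: int = 128):
        super().__init__()
        self.pe = PositionalEncoding(num_freqs)
        in_dim = 2 + 4 * num_freqs
        self.net = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 3), nn.Sigmoid(),
        )

    def forward(self, xy):
        return self.net(self.pe(xy))

# Training loop (sketch): sample random pixel coordinates, normalize them to [0, 1],
# and regress the RGB values with MSE loss and Adam at lr = 1e-2.
```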

Training Progression · Provided Test Image

Interactive viewer · Run A: Fox (provided test image), Run B: Bridge (my self-chosen image).
For each run, the reconstruction and its PSNR at the selected step are shown.

Drag the slider to see the reconstructions at different training steps.

Positional Encoding & Width

Comparison across two max positional encoding frequencies and two network widths.

Low PE, narrow MLP
L=2, Width=64
Low PE, wide MLP
L=2, Width=256
High PE, narrow MLP
L=10, Width=64
High PE, wide MLP
L=10, Width=256

PSNR Curve

PSNR curve for training on one image
PSNR curve for training on my self-chosen image 'bridge', with width = 128 and L = 10.
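For reference, the PSNR is computed from the MSE loss, assuming pixel values normalized to [0, 1]:

```python
import torch

def psnr(mse: torch.Tensor) -> torch.Tensor:
    # For pixel values in [0, 1]: PSNR = 10 * log10(1 / MSE) = -10 * log10(MSE), in dB.
    return -10.0 * torch.log10(mse)
```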

Part 2 · Fit a Neural Radiance Field from Multi-view Images (Lego)

Implement a NeRF-style volumetric renderer for the Lego scene, including ray sampling, MLP-based density/color prediction, and hierarchical rendering from calibrated multi-view images.

Implementation Overview
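The renderer follows the usual NeRF pipeline: cast a ray per pixel, sample points along it, query the MLP for density and color at each sample, and composite the samples with the discrete volume rendering equation. Below is a minimal sketch of that compositing step; the tensor names and the small numerical stabilizer are mine, not necessarily what my code uses.

```python
import torch

def volume_render(sigmas, rgbs, deltas):
    """Discrete volume rendering along each ray.

    sigmas: (R, S)    densities at the S samples of R rays
    rgbs:   (R, S, 3) colors at the samples
    deltas: (R, S)    distances between consecutive samples
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)                        # opacity of each segment
    # Accumulated transmittance T_i = prod_{j < i} (1 - alpha_j), shifted so that T_0 = 1.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:, :1]), 1.0 - alphas + 1e-10], dim=-1), dim=-1
    )[:, :-1]
    weights = alphas * trans                                          # w_i = T_i * alpha_i
    rgb = (weights[..., None] * rgbs).sum(dim=-2)                     # composited color per ray
    return rgb, weights
```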

Rays & Samples Visualization

Sampled rays with camera frustums (interactive) · drag or scroll to change the view.
Sample points along the rays.
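These figures are sanity checks for the ray generation and sampling code. Here is a minimal sketch of both steps, assuming OpenCV-style intrinsics with the camera looking down +z; the function names, the near/far defaults, and the direction normalization are assumptions, not the exact implementation.

```python
import torch

def pixel_to_ray(K, c2w, uv):
    """Turn pixel coordinates into world-space rays.

    K:   (3, 3) camera intrinsics
    c2w: (4, 4) camera-to-world pose
    uv:  (N, 2) pixel coordinates, ideally sampled at pixel centers (u + 0.5, v + 0.5)
    """
    # Back-project pixels to camera-space directions at depth 1.
    x = (uv[:, 0] - K[0, 2]) / K[0, 0]
    y = (uv[:, 1] - K[1, 2]) / K[1, 1]
    dirs_cam = torch.stack([x, y, torch.ones_like(x)], dim=-1)        # (N, 3)
    # Rotate into world space; the ray origin is the camera center.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    origins = c2w[:3, 3].expand_as(dirs_world)
    return origins, torch.nn.functional.normalize(dirs_world, dim=-1)

def sample_along_rays(origins, dirs, near=2.0, far=6.0, n_samples=64, perturb=True):
    """Sample 3D points between near and far along each ray (stratified when perturb=True)."""
    t = torch.linspace(near, far, n_samples, device=origins.device)   # (S,)
    t = t.expand(origins.shape[0], n_samples).clone()
    if perturb:
        t += torch.rand_like(t) * (far - near) / n_samples
    points = origins[:, None, :] + t[..., None] * dirs[:, None, :]    # (R, S, 3)
    return points, t
```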

Training Progression (Lego)

Ground truth
Lego NeRF prediction (interactive)
The train PSNR, validation PSNR, and loss at the selected iteration are shown below the prediction.

Drag the slider to browse NeRF predictions at different iterations.

Validation PSNR Curve

PSNR curve on validation set
PSNR · evaluated on held-out Lego views.

Spherical Rendering Video

Spherical Camera Path
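The video is produced by sweeping the camera along an orbit around the scene and rendering one frame per pose. Below is a sketch of how such poses can be generated, assuming a z-up world and the same OpenCV-style convention as above (+z forward, +y down); the radius, elevation, and frame count are illustrative, not the values actually used.

```python
import numpy as np

def orbit_pose(theta_deg, phi_deg=-30.0, radius=4.0):
    """Camera-to-world pose on a sphere around the origin, looking at the origin.

    theta_deg: azimuth, phi_deg: elevation (avoid exactly +/-90), radius: orbit radius.
    """
    t, p = np.radians(theta_deg), np.radians(phi_deg)
    eye = radius * np.array([np.cos(p) * np.sin(t), -np.cos(p) * np.cos(t), np.sin(p)])
    z = -eye / np.linalg.norm(eye)                  # forward: from the camera toward the origin
    x = np.cross(z, np.array([0.0, 0.0, 1.0]))      # right, for a z-up world
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                              # image-down direction
    c2w = np.eye(4)
    c2w[:3, :3] = np.stack([x, y, z], axis=-1)
    c2w[:3, 3] = eye
    return c2w

# One pose every 6 degrees; render a frame per pose and stitch the frames into the video above.
poses = [orbit_pose(a) for a in np.linspace(0.0, 360.0, 60, endpoint=False)]
```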

Part 2.6 · Training NeRF with My Own Data

Novel View GIF

Novel views of my object
Camera Orbit GIF · synthesized views circling the object.

Code & Hyperparameter Adjustments

Hyperparameter Tuning

I experimented with several key NeRF hyperparameters to make the custom scene both stable and efficient to train. First, I tightened the depth range to near = 0.001 and far = 0.5, which better matches the small physical scale of my capture and avoids wasting samples on empty space. I then swept the number of points sampled along each ray (n_samples) over 32, 64, and 128. With 32 samples the reconstruction was noticeably noisy, while 64 samples produced sharp geometry and clean colors; increasing to 128 slightly improved details but made training significantly slower. Finally, I set the total number of optimization steps to 5000, which was enough for the training PSNR to saturate and for the rendered novel views to look visually consistent without overfitting.
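For concreteness, these are the settings I ended up with; only the values come from the sweep above, and the dictionary structure itself is just illustrative.

```python
# Final configuration for the custom capture (values from the sweep described above).
nerf_config = {
    "near": 0.001,     # tight depth range matching the small physical scale of the scene
    "far": 0.5,
    "n_samples": 64,   # 32 was noticeably noisy; 128 was slightly sharper but much slower
    "n_iters": 5000,   # enough for training PSNR to saturate without overfitting
}
```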

Training Loss Curve

Training loss over iterations
Loss vs Iterations · convergence of my-scene NeRF.

Intermediate Renders

Panda GT view
Ground Truth
Panda NeRF prediction
Prediction · at the selected training step

Drag the slider to browse NeRF reconstructions at different training steps.

Summary

When I first started working on this, I really thought I could achieve something close to the Lego example. Later I realized this is far more difficult than I imagined. I feel like I’ve tried almost every possible combination at every stage, yet every step turned out to be way more sensitive and error-prone than expected.

During image capture, I tested all kinds of setups:
• 6 tags as calib_images + 1 tag (same size) for object_images
• 6 tags as calib_images + 6 tags for object_images
• using the 6-tag object_images directly for calibration
• trying a single large tag for calibration
My conclusion: larger tags work much better. They’re more stable, easier to detect, and much less sensitive to lighting or background noise. Small tags get messed up by shadows, reflections, or noisy textures extremely easily.

I also hit a ton of issues while shooting photos. Later I realized the object’s distance doesn’t need to change at all; it’s the tag’s viewpoint that needs diversity, otherwise the pose solve becomes unstable. Downsampling is a must as well, because high-res noise makes detection much worse.

If possible, find a place with uniform lighting and a clean background. My desk has complicated wood grain patterns, so the detector kept hallucinating tag IDs that didn’t exist. On top of that, the desk sits between a lamp and a window, so I had to constantly avoid shadows and reflections. I basically spent the whole shooting process tiptoeing around these problems just to prevent PnP from exploding.

As for implementation details, the part that consumed the most time was visualization in Viser. My camera poses were always flipped or chaotic—sometimes all flipped in one direction (which can be fixed with a scale), sometimes half of them flipped and the other half not (which needs manual axis correction in code), and sometimes just totally inconsistent. In those cases, my final conclusion is simple: the images are the problem. Blurry shots, misdetected tags, or extreme viewing angles all lead to unstable pose estimation. That’s something I only understood after wrestling with it for days.

In the final rendering stage, I also ran into another big issue: because the captured viewpoints were too limited, NeRF didn’t have enough angular coverage. As a result, the rendered output had obvious artifacts—especially those “floating blurry layers” that look like ghost surfaces. Later I finally understood that this isn’t the model’s fault. If the training views don’t constrain the space enough, NeRF simply starts hallucinating. The less information you give it, the more it invents.

I kept trying until the last day before the deadline, but I still don't think the result is good enough. I plan to try again when I have time. If anyone sees this webpage and is willing to visit my GitHub, look through my code, and point out areas I can improve, I would be very grateful.