Gradient Domain Path Tracing

Dario Seyb

January 22, 2017

Abstract

In recent years the popularity of the path tracing rendering algorithm has risen dramatically. Especially

the movie industry appreciates the accuracy and relative simplicity of physically based rendering. Un-

fortunately the drawbacks of path tracing are still holding back artistic expression. For complex scenes

rendering times can be prohibitive. Hence extensions of path tracing which address these issues have to be

found. In this report we will present ”Gradient Domain Path Tracing”, one of the more recent extensions.

We will describe the algorithm, its advantages over classical path tracing and recent work in gradient

domain rendering. Additionally we will provide an implementation of the algorithm in the rendering

framework ”zaphod” [Sey17].

Keywords Path Tracing, Physically Based Rendering, Gradient Domain Light Transport

Contents

1 Introduction 2

2 A Brief Introduction to Path Tracing 2

3 Introduction to Gradient Domain Light Transport 3

3.1 Symmetric Gradient Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4 Shift Mappings 6

4.1 Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4.1.1 Shift via Manifold Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4.1.2 Improved Shift via Manifold Exploration . . . . . . . . . . . . . . . . . . . . . . . . 7

4.1.3 Half Vector Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4.1.4 Discussion of Shift Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4.2 Computing the Jacobian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4.3 Multiple Importance Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 Solving Screened Poisson Equations 10

6 A Theoretical Analysis of Gradient Domain Light Transport 11

6.1 A Math Primer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

6.1.1 The Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

6.1.2 Power Spectra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

6.1.3 Dirac Delta Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

6.2 Analyzing the Components of Gradient Domain Methods . . . . . . . . . . . . . . . . . . 13

6.2.1 The Gradient Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

6.2.2 Stochastic Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6.2.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

6.2.4 Poisson Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7 Recent Work 16

7.1 Gradient Domain Bidirectional Path Tracing . . . . . . . . . . . . . . . . . . . . . . . . . 16

7.1.1 Shift Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

7.1.2 Connection Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

7.2 Temporal Gradient-Domain Path Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

7.2.1 Shift Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

7.2.2 Spatio-temporal Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

8 Discussion 18

8.1 Equal Time Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

8.2 Analysing Shift Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

9 Acknowledgements 20

1 Introduction

Gradient domain methods are a rather recent research area in light transport simulation. Especially

in the last two years there have been multiple extensions to the original algorithm, applying gradient

rendering to existing light transport strategies. This report intends to be a survey of the state of the

art in gradient domain rendering. We will start by describing the basic algorithm and the components

needed to implement it. Next we will present a theoretical analysis of gradient rendering which provides

a mathematical explanation for the empirical results. Finally we will compare the various gradient

domain adaptions. We are especially interested in the diﬀerence between Gradient Domain Metropolis

Light Transport (MLT) and Gradient Domain Bidirectional Path Tracing (BDPT) [Vea98]. We focus

on these algorithms because their non-gradient domain versions are popular in research and production.

Additionally both MLT and BDPT are unbiased, sampling-based algorithms that handle a large variety

of scenes. Even traditionally hard to render scenes are handled robustly. As such, the scenes chosen

for analysis should be handled similarly well between algorithms, and the comparison can focus on the

inﬂuence of gradient rendering on their results.

2 A Brief Introduction to Path Tracing

Path tracing is a technique used for light transport simulation which uses Monte Carlo sampling to

evaluate the integral in the rendering equation by [Kaj86]

I(x, x

) =

Geometry Term

z }| {

g(x, x

)[(x, x

)

| {z }

Emission

BRDF

z }| {

ρ(x, x

, x

)

Incoming light

z }| {

I(x

, x

) dx

| {z }

Light reﬂected at x

in direction

−→

] (1)

I(x, x

) is the radiance arriving from point x

at x. It depends on the light that is being reﬂected at

and it is deﬁned recursively. In modern literature the prevalent mathematical framework to describe

light transport problems is the path integral formulation introduced by [Vea98].

Ω

(¯x)dµ(¯x) (2)

is the amount of radiance arriving at a pixel j on the sensor and f

is the measurement contribution

function which assigns an amount of transported energy to each path ¯x. The main diﬀerence between

Equations (1) and (2) is that instead of describing a measurement I by a local recursive integral we

describe it by globally integrating over Path Space Ω, that is, the set of all paths ¯x = x

. . . x

that are

sampled by a path tracer.

In this context Monte Carlo simulation corresponds to drawing random samples in path space. To

compute a pixel value we randomly generate paths that connect a point on the sensor to a point on a

light source. This is usually done by recursively tracing a ray through the scene and calculating a new

outgoing direction at each intersection point. Once a path is sampled we can calculate its contribution

to the image. Figure 1 illustrates this process.

Figure 1: Sampling a ray that contributes to a certain pixel on the sensor.

While the sampling process is identical whether we are using the path integral or the recursive scat-

tering formulation, working in path space allows us to develop algorithms that work on complete paths

and are decoupled from the path generation strategy.

The main issue with the Monte Carlo simulation approach is that naively sampling the high di-

mensional path space results in high variance between samples. This is visible in the ﬁnal image as a

characteristic high frequency noise. Because of the high number of samples required for a convergent

image, path tracing was prohibitively expensive for many applications. The high variance is in part due

to the fact that many paths never hit a light source. Figure 2 a) shows the distribution of light paths in

a scene with diﬀuse materials. Blue paths never hit a light; yellow paths do. As you can see, since the

light source is small and out of the way, very few paths hit it randomly. The result is a noisy image as

shown in Figure 2 b).

(a) Path Density (b) Resulting Noisy Image

Figure 2: A scene illuminated by a small light source rendered with standard

path tracing. A high sample count is necessary because very few of the paths

contribute to the image.

3 Introduction to Gradient Domain Light Transport

The goal of almost any new light transport method is to reduce the variance in the evaluated samples

since this leads to faster converging images which need less time to render. While there are many ways

to reduce the variance of a given sampler, for example via importance sampling [Vea98], these are often

The 2D ﬁgures in this report are created with a tool we created speciﬁcally for this purpose. You can ﬁnd it

at https://darioseyb.com/pathgraph. It is inspired by the paper ”Theory, Analysis and Applications of 2D Global

Illumination” [Jar+12].

quickly exhausted or provide only marginal improvements. Gradient Domain Path Tracing (G-PT) does

not change the base sampling distribution at all, instead it relies on the spectral properties of gradients

and screened Poisson reconstruction to reduce variance.

In natural images and many images we want to render with a path tracer, the energy is concentrated

in the low frequencies [Rud94]. In a traditional path tracer we are sampling the low frequencies directly.

Since the energy is the high, variance is high, too. This results in a noisy image. The idea of gradient

domain methods is to instead sample the high frequencies, i.e. gradients, directly. These are a lot lower

energy and thus can be sampled with less variance. The ﬁnal image can be reconstructed from the

gradients and a (noisy) base image using Screened Poisson Reconstruction as described in Section 5. A

detailed analysis of these claims will be provided in Section 6.

The idea of direct gradient sampling was ﬁrst introduced in the context of MLT [Leh+13]. Since

MLT is notoriously diﬃcult to implement correctly, [Ket+15] applied the same concepts to classical path

tracing with great success. The resulting algorithm ”Gradient Domain Path Tracing” is easy to implement

and provides greatly improved results compared to regular path tracing.

(a) Path Tracing (b) Gradient-Domain Path Tracing

Figure 3: Comparing the results of standard path tracing (left) and G-PT

(right) at the same number of samples (10).

Instead of sampling the value I

in Equation (2) directly we want to sample ∆

, the ﬁnite diﬀerence

between two pixels i and j.

∆

= I

− I

(3)

Naively computing ﬁnite diﬀerences is not enough to ensure low variance since Var(I

−I

) = Var(I

Var(I

) if I

and I

are uncorrelated. Hence we need to ﬁnd a way to sample I

and I

in a strongly

correlated fashion. Recall that a single measurement of I

only depends on the value of the measurement

contribution function f

(¯x). Thus, if we manage to generate a path ¯x contributing to pixel i and a path

¯y contributing a similar amount of radiance to pixel j, the variance of ∆

will be low.

In all the gradient domain methods proposed so far this is achieved by deterministically shifting a

base path ¯x through pixel i to an oﬀset path ¯y through pixel j. We call this shift mapping T

and write

(¯x) = ¯y. With T

deﬁned we can rewrite Equation (3) in terms of base and oﬀset paths.

∆

= I

− I

Ω

(¯x)dµ(¯x) −

Ω

(¯x)dµ(¯x)

Ω

(¯x) −f

(¯x))|T

|dµ(¯x)

(4)

The transformation is straightforward except for the introduction of the Jacobian determinant |T

This is necessary because we are changing the integration variable of f

from ¯x to T

(¯x). The integration

domain is not adjusted because we assume T to be bijective on Ω and thus T (Ω) = Ω. Thanks to the

Naive Gradient Sampling

Highly-Correlated Gradient Sampling

Figure 4: Comparing naive gradients with carefully sampled gradients. Both

images are rendered with 100 SPP. The images on the right are the results of

Poisson reconstruction based on the gradient images on the left.

formulation of the shift as a change in integration variables we have a lot of freedom in constructing a

shift mapping. As long as we can compute the corresponding Jacobian the results will be unbiased.

3.1 Symmetric Gradient Formulation

Unfortunately it is impractical to require a shift mapping to be a bijection on the complete path space.

[Man+14] introduce a Symmetric Gradient formulation which does not pose this requirement on T . Let

us ﬁrst investigate reasons why a shift may not be bijective. The ﬁrst case is that there might be paths ¯x

for which T (¯x) is not deﬁned. This can happen when the shift procedure fails, e.g. for numerical reasons.

We call these shifts non-invertible. The second case is that T (¯x) might map ¯x to a path ¯y /∈ Ω, which

means the oﬀset path ¯y would never be generated during regular path sampling. These shifts are called

non-symmetric. Thirdly, the ﬁrst two cases might apply to the inverse shift T

−1

. Bijectivity requires

−1

(T (¯x)) = ¯x, but if the inverse shift is not well deﬁned this requirement is violated. In the ideal

case all shifts would be symmetric (a non-invertible shift can not be symmetric, hence shifts which are

non-invertible as well as symmetric do not exist). If any of these cases apply, the integral in Equation (4)

will not return the correct value and as a result the ﬁnal image will be biased.

[Man+14] formulate a diﬀerent way to express ∆

which can handle cases of bijectivity violation

robustly. This formulation is extended by [Ket+15] to allow for multiple importance sampling. Instead

of just sampling the forward ﬁnite diﬀerence between pixels i and j we also sample the backwards ﬁnite

diﬀerence and combine them using weights.

∆

Forward Diﬀerence

z }| {

Ω

(¯x)(f

(¯x) −f

(¯x))|T

|)dµ(¯x) +

Backward Diﬀerence

z }| {

Ω

(¯x)(f

(¯x) −f

(¯x))|T

|)dµ(¯x) (5)

Non-symmetric and non-invertible shifts are handled via the weights w

and w

. In both cases

the diﬀerence in the other direction will not be sampled and we set the weight of the current sampling

strategy to 1. If the shift is symmetric the diﬀerence will be sampled twice, once via forward and once

via backward diﬀerences, so we weight it by 1/2.

Since the deﬁnition of bijectivity violations is fairly abstract we want to give practical examples for

each type of shift. Figure 5 shows the three ways shift computations can proceed.

a) Non-Symmetric b) Non-Invertible c) Non-Invertible

Figure 5: Shown are three examples for how a shift can fail. The shift in

a) is non-symmetric because the connection to vertex x

is blocked and a

blocked path will never be sampled by the path tracer. In Figure b) the oﬀset

path hits a material with a diﬀerent classiﬁcation, and hence the shift is non-

invertible. Figure c) shows another case of non-invertibility. The primary ray

gets refracted, but the oﬀset path encounters total reﬂection.

4 Shift Mappings

The shift mapping, that is the way we compute oﬀset paths, is at the heart of G-PT. A good shift mapping

produces oﬀset paths which have a similar throughput to the corresponding base path. This is not always

easy. Even small changes in the position of a path vertex can completely change the throughput of the

path if specular materials are involved. The ﬁrst two vertices of an oﬀset path do not depend on the

mapping. The ﬁrst vertex is ﬁxed on the point on the sensor that corresponds to the pixel shift from i

to j. The second vertex is the primary scene intersection of the ray deﬁned by the camera position and

the ﬁrst vertex. In this section well will take a look how to continue the oﬀset path after the primary

intersection. In all so far proposed shift mappings the aim is to connect the oﬀset path back to the

base path as soon as possible. This technique has proved successful, since if two paths share segments

their throughputs are likely to be similar. The points at which we connect the paths have to be chosen

carefully.

4.1 Construction

To construct an eﬃcient shift mapping we need to classify vertices into diﬀuse and specular. Diﬀuse

vertices are easier to handle and allow more ﬂexibility in how to connect paths. This is because their

BRDF is smooth and the directions to the next and previous path vertices don’t change the throughput

much. Specular vertices need to be treated with more care. Assume a vertex lies on the surface of a

perfect mirror. Moving either of the adjacent vertices on the path will completely nullify any throughput

through the whole path since the BRDF is only non zero for a very speciﬁc direction. To describe the

various shift mappings we introduce a general description of an oﬀset path.

(¯x) = ¯y = (s, h, o

, . . . , o

, x

, . . . , x

)

s = shift

)

h = R(s, ω

)

(6)

where shift

) denotes shifting the vertex x

in primary sample space from pixel i to pixel j, R

is the ray casting function and R(s, ω

) the primary hit of the ray starting at position s in direction

which is determined by the orientation of the sensor. x

is the connection vertex. After connection,

the vertices of ¯x and ¯y coincide. In this framework the shift mapping decides the length m of the oﬀset

vertex chain ¯o by picking a connection vertex x

and chooses the vertices in ¯o such that the change in

throughput is minimal.

4.1.1 Shift via Manifold Exploration The ﬁrst shift mapping, introduced in [Leh+13], uses tech-

niques from Manifold Exploration [JM12] to construct parts of ¯o (Figure 6). This is a natural choice since

Manifold Exploration was developed in the context of MLT and solves the problem of connecting two

diﬀuse vertices via a chain of specular vertices. Here ¯o is split into three parts. x

is chosen as the second

diﬀuse vertex after h. A specular chain is traced from h (like in standard path tracing) until a diﬀuse

vertex o

is hit. A second specular chain connecting o

and x

is then constructed using the optimization

technique from [JM12]. Note that this second chain can be empty if x

follows directly after o

. In this

case they are immediately connected and no Manifold Exploration is performed.

Base Path ¯x

Oﬀset Path T (¯x)

Diﬀuse Surface

Specular Surface

Figure 6: Shifting a path using manifold exploration. Since the primary hit h

on the oﬀset path lies on a diﬀuse surface we continue the path like in standard

path tracing. The next hit o

lies on a diﬀuse surface. We connect the oﬀset

path to the base path at the next diﬀuse base path vertex x

via manifold

exploration.

4.1.2 Improved Shift via Manifold Exploration [Man+14] contribute a new shift mapping which

reduces variance in the case where the second specular chain of the shift by [Leh+13] is empty (Figure

7). This can be an issue because the direct connection can result in a low geometry term. If the second

chain is empty the improved shift treats x

as specular and ﬁnds the next diﬀuse vertex in ¯x. It then

uses the found vertex as the new connection vertex and performs Manifold exploration for connection as

before. This completely eliminates the case of empty second specular chains.

Base Path ¯x

Oﬀset Path T (¯x)

Diﬀuse Surface

Specular Surface

Figure 7: Shifting a path using the improved manifold exploration shift by

[Man+14]. The ﬁrst hit h is on a specular surface and we copy the half vector

of the corresponding base path vertex. The second vertex o

is on a diﬀuse

surface and we could connect to the next diﬀuse vertex on the base path. Since

this would leave the second specular chain empty we use manifold exploration

to construct a path connecting to x

4.1.3 Half Vector Replication Finally, [Ket+15] introduce a shift not based on Manifold Explo-

ration. x

is chosen as the ﬁrst vertex in ¯x where both x

as well as x

c+1

are classiﬁed as diﬀuse (Figure

8). This eliminates the need for connecting diﬀuse vertices through specular chains. For the vertices in

between h and x

a path is traced by replicating the projected half vectors of the base path vertices at

each step. Replicating half vectors of specular vertices proves to result in similar values for the BRDF

at each vertex, since specular BRDFs are usually well deﬁned by the half vector between incoming and

outgoing directions.

Base Path ¯x

Oﬀset Path T (¯x)

Diﬀuse Surface

Specular Surface

Figure 8: Shifting a path using Half Vector Replication. At the ﬁrst hit h we

need to replicate the half vector of the corresponding base path vertex because

the surface is specular. As a result we hit o

which is on a diﬀuse surface. The

next base path vertex x

is also on a diﬀuse surface and we can connect the

oﬀset path back to the base path.

4.1.4 Discussion of Shift Mappings The shift mappings based on Manifold Exploration produce

excellent results, but are hard to implement and require a costly optimization process. This is why

[Ket+15] prefer their novel shift mapping. It results in a slightly higher amount of artifacts caused by

singularities that [Man+14] avoids, but those are canceled by using the multiple importance sampling

techniques also introduced in [Ket+15]. Overall [Ket+15] is the most promising. It requires about 30

lines of C++ code to implement correctly (including Jacobian calculation) and produces great results at

little cost.

4.2 Computing the Jacobian

To account for the change in integration variables in Equation (4) we need to compute the determinant of

the Jacobian |T

(¯x)| = |dT

(¯x)/d¯x| for the used shift mapping. This is important to make sure that the

results of the integration are mathematically correct, otherwise we would introduce bias. In this section

we aim to given an intuition for what this value denotes, but we will not derive the Jacobian for any

of the presented shift mappings. The value of |T

(¯x)| gives us a measure for how much T stretches or

squishes the path space around ¯x. Figure 9 shows how path density can be a lot lower after shifting,

resulting in high Jacobian values.

Computing this value can be challenging. In [Ket+15] the derivations are made easier by two insights.

First, the authors show that the calculation can be split into three steps. Namely, reparametrizing the

path in a way so the shift itself has a trivial to compute Jacobian, computing the Jacobian of the shifted

path in the reparameterized space and transforming the path back into the original space. Mathematically

this is expressed as multiplying the Jacobian determinants of the three steps.

(¯x)| = |

d¯y

d¯x

| =

Reparametrization

z}|{

d¯y

dˆy

| |

dˆy

dˆx

|{z}

Trivial Shift

Back Transform

z}|{

dˆx

d¯x

| (7)

The second insight is that computing the reparametrization at each path vertex separately and mul-

tiplying them together results in the ﬁnal Jacobian for the whole shift. This allows us to combine ideas

from diﬀerent shift mappings and reuse the derived Jacobians.

Base Path ¯x

Oﬀset Path T (¯x)

Figure 9: Visualizing the path density changes caused by T

. Blue paths

are sampled via regular path tracing, red paths via the strategy by [Ket+15].

In the bottom right corner the density of red paths is much lower than the

density of blue paths, leading to high Jacobian values.

4.3 Multiple Importance Sampling

As shown in the previous section the Jacobian values can get very large. These high values can be an

issue, because they cause outliers in the gradients which manifest themselves after reconstruction as very

obvious artifacts. As shown in Figure 10 they mostly appear in corners and are rather unpleasant as far as

rendering artifacts go. The typical way to deal with outliers like these is to employ multiple importance

sampling ([Vea98]). Due to the Symmetric Gradient Formulation we already weight the forward and

backward gradient sampling strategies. If we extend these weights to be continuous we can weight the

strategies by their importance in the symmetric shift case, where they would otherwise be 1/2 for both

strategies.

Figure 10: Even with multiple importance sampling, artifacts appear in

corners and on edges at low sample counts.

To ﬁt the diﬀerence sampling into the framework needed for importance sampling, we interpret the

base path ¯x and the inverse shift T

(¯y) = ¯x as two strategies to sample the same distribution. The

weights are then computed as follows when using the balance heuristic.

(¯x) =

p(¯x)

p(¯x) + p(T

(¯x))|T

(¯x)|

(8)

The intuition here is that if the Jacobian |T

(¯x)| is large the contribution of the sample will be

weighted by approximately 1/|T

(¯x)|. Hence outliers will be suppressed.

5 Solving Screened Poisson Equations

The outputs of a gradient domain path tracer are a primal image I

and a set of gradient images I

d∗

With G-PT the gradient images are the spatial diﬀerences I

and I

. We want to reconstruct the ﬁnal

image I from these values. That is, we want to estimate the image function I from its gradients. It is

easy to see that I is optimal if it minimizes the constraint in Equation (9), because that means that the

gradient ∆I is as close as possible to our computed gradient images. [Bha+08]

Z Z

||∆I −





dxdy (9)

Reconstructing an image based on this constraint introduces issues if there is error in the gradients.

Especially low frequencies are not reconstructed well, because small gradient error accumulates over longer

distances. To improve the reconstruction we use the primal image I

. The primal image is noisy in the

high frequencies, but represents low frequencies fairly well, even at low sample counts. [Bha+08] add an

additional term to the integrand in Equation (9).

Z Z

α(I − I

)

+ ||∆I −





dxdy (10)

Here α is a factor that controls the inﬂuence of the primal image on the solution. Note that setting

α to 0 reduces Equation (10) to Equation (9). Expressing the ﬁnal image as in Equation (11) allows us

to use standard numerical solvers.

I = argmin

(||α(

I − I

)||

+ ||∆

I −





) (11)

The variable p allows us to change the norm used for minimization. Only the L2 norm results in

bias-free images, but [Ket+15] illustrate that using the L1 norm, while introducing bias, results in images

with less artifacts and converges to the same results when the sample count is high. Artifacts in L2

reconstructions stem from the outliers in the gradient sampling result discussed in the previous section.

The reconstruction spreads out the error to minimize it. L1 reconstruction does not spread out error as

much and as such outliers generally only inﬂuence one pixel and manifest as ﬁreﬂies which are a common

artifact in path tracing and hence do not stand out as much. While multiple importance sampling reduces

the gradient outliers and makes L2 reconstruction more practical, L1 reconstruction still produces cleaner

images.

L1-Reconstrction L2-Reconstrction

Figure 11: Reconstruction using diﬀerent norms. The base gradient and

primal images are exactly the same.

6 A Theoretical Analysis of Gradient Domain Light Transport

Early papers on gradient domain methods provided little in the way of rigorous mathematical analysis.

[Leh+13] discuss the frequency characteristics of the gradient operator and Poisson reconstruction, but do

not go into any further depth. Particularly the factor α in Equation (11) is chosen empirically. The goal

of analysing the frequency characteristics of light transport is to gain a deeper insight into where error is

coming from and why gradient methods are successful. In the following we will introduce commonly used

techniques and motivate gradient based light transport by analyzing the frequency content of natural

images and their gradients.

6.1 A Math Primer

Many of the techniques used in signal processing and image analysis are not commonly taught in un-

dergraduate courses. For the beneﬁt of the reader we will give a short overview over the mathematical

concepts and notation used in this section.

6.1.1 The Discrete Fourier Transform Our aim is to analyze the frequency content of a discretely

sampled signal f (e.g an image). For now we can assume f to be a continuous function deﬁned over the

real numbers. The frequency content of a continuous function, also called its Fourier spectrum can be

found by projecting it onto a basis of complex exponential functions E

(x) = e

i2πωx

. This makes sense

if you recall that a complex exponential function can be decomposed into sine and cosine as follows

i2πωx

= cos(2πωx) + i sin(2πωx). Hence, once we can express a function f in terms of a set of periodic

parts E

and thus know the factor for each frequency ω we know how much that frequency inﬂuences

the function f. The forward and backwards Fourier transforms are given by [SSJ16] as follows.

F (ω) =

∞

−∞

f(x)e

−i2πωx

dx (12)

f(x) =

∞

−∞

F (ω)e

i2πωx

dω (13)

F (ω) is the factor for the frequency ω we are looking for. Equation (13) shows how to reconstruct

a function from its periodic parts. This deﬁnition only works with continuous functions, but not with a

set of discrete samples which we are typically trying to analyze. To deal with this we need two further

facts about the Fourier transform. First, if f is periodic, its Fourier spectrum is non-zero only at discrete

points. Secondly, if f is discrete, its Fourier spectrum will be periodic [SSJ16]. Generally we sample f

over a ﬁnite range and can assume that the sampled values repeat outside that range. Hence we can

assume f to be periodic and thus f has a discrete Fourier spectrum. Also, as already said, f is sampled

at discrete points and hence its Fourier spectrum is periodic. Combining these two properties we can see

that we can represent the Fourier spectrum of a discrete, periodic function with a ﬁnite amount of values

(since it itself is discrete and periodic).

In practice we need exactly as many samples of the Fourier spectrum as we have samples of the original

function. From those spectrum samples the original samples can be reconstructed perfectly.

Another important concept is that a multiplication f × g in the spatial domain corresponds to a

a convolution F ∗ G in the frequency domain. Correspondingly, a convolution in the spatial domain

corresponds to a multiplication in the frequency domain. This is easy to prove using the deﬁnitions of

the Fourier transform and its inverse.

Furthermore, evaluating F (ω) at 0 equals the integral of f over its complete domain as shown in

Equation (14).

F (0) =

∞

−∞

f(x)e

−i2πw×0

∞

−∞

f(x)dx

(14)

Photograph

High-Gradient Image

Figure 12: Left: Images with diﬀerent frequency content, Right: Their power

spectra. Note the vertical spike in the power spectrum of the photograph,

indicating the existence of higher frequencies in the y direction of the image.

This corresponds to the clear separation between sky, mountains, trees and

the lake in the image.

6.1.2 Power Spectra A power spectrum is the square of a Fourier transform |F (ω)|

. It is often used

to represent the results of analysis since it presents the frequency content f in a very visual way.

Figure 12 shows a natural image and an image constructed to have high gradients on the left and

their power spectra on the right. In a power spectrum the values in the centre of the image correspond

to low frequencies. As you can see, the power of the natural image is focused in the low frequencies while

the high-gradient image has a spread out power spectrum.

6.1.3 Dirac Delta Functions A common problem in light transport is representing and analyzing

functions which are 0 everywhere except for one speciﬁc point. This comes up for example in the BRDF

of a perfectly specular material. It reﬂects light into only one very speciﬁc direction. Unfortunately it

is hard to work with such a functions since they are not integrable via usual methods. To solve this we

introduce the Dirac Delta Function δ(x). ”Function” is in this case a bit of a misnomer, because it is

technically not a function, but a limit of a series of functions deﬁned as follows. [Bra00]

g(x, t) =

√

2πt

−x

/2t

δ(x) = lim

t→0

g(x, t)

(15)

g(x, t) is a Gaussian centered around the origin, which gets sharper as t decreases. Figure 13 shows

how g approaches to have one of the properties we want, namely, vanishing everywhere except for a

shrinking footprint around 0. We call δ a function, because we can work with it like it is one in the

context of integration. Speciﬁcally, multiplying it with a function f inside of an integral, like in Equation

(1), results in the following:

hδ(x), f (x)i = lim

t→0

∞

−∞

g(x, t)f (x)dx = f (0)

(16)

From here on out a Fourier transform will always be denoted by a capital letter.

An important property for the next section is its Fourier transform, given by ∆(ω) = 1. This follows

trivially from the deﬁnition of the Fourier transform in Equation (12) and the identity in Equation (16).

−3 −2 −1 1 2 3

g(x, t)

Figure 13: Plotting g for multiple values of t. Note how it approaches a spike

at 0.

6.2 Analyzing the Components of Gradient Domain Methods

This section is a summary of the analysis performed in [Ket+15]. At the heart of the analysis is the result

that the eﬃciency of gradient methods depends on the power spectrum of the image function. Images

with sparse gradients (which means power is concentrated in the low frequencies) beneﬁt the most from

gradient rendering. Luckily this is a property that most natural images have [Rud94]. Since we mostly

want to depict natural scenes, this is also the case for the subjects of light transport simulation. Note

that natural in this context doesn’t refer to ”scenes in nature”, but to scenes that depict reality in some

(even stylized) way.

The analysis is done in parts and makes assumptions and simpliﬁcations to reduce its verbosity. The

main simpliﬁcation is that we are only investigating two dimensional scenes and hence one dimensional

sensors. We also limit ourselves to one bounce. That way path space can be represented in two dimensions.

The primary sample space dimension x and the secondary sample space at the ﬁrst bounds p. Figure 14

shows the 2D scene we are using for the analysis.

Figure 14: A two dimensional scene with a camera and a path starting at

the camera, bouncing once and then being connected to the light.

Figure 15 is meant as a visual guide through this analysis. Sections advance from left to right. The

top row shows the analysis performed on the primal image, the bottom row on the gradient image. We

advice to come back to the ﬁgure after each section.

6.2.1 The Gradient Operator First we investigate the power spectrum of the gradient image G.

We use a simpliﬁed shift mapping T (x, ¯p) = (x + 1, ¯p) which only changes the primary sample space

location of the path (x, ¯p). The gradient function is then deﬁned as g(x, ¯p) = f(x, ¯p) − f(x + 1, ¯p).

For convenience we can express g as a convolution g(x, ¯p) = (d ∗ f)(x, ¯p) between the ﬁnite diﬀerence

operator d(x) = δ(x) −δ(x + 1) and f where δ is the Dirac delta function. As noted in section 6.1.1 this

0 50 100 150 200 250 300 350 400 450 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

a) Image contribution f b) Power Spectrum |F |

c) PSD |S|

∗ |F |

d) Integrated Image

0 50 100 150 200 250 300 350 400 450 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

e) Gradient g = d ∗f f) Power Spectrum |G|

g) PSD |S|

∗ |G|

h) Integrated Gradients

Path Dimension ¯p

Image Dimension x

Path Freq. Dimension ω

¯p

Image Freq. Dimension ω

Gradient Operator

Fourier Transform

Sampling

Integration

Reconstruction

0 50 100 150 200 250 300 350 400 450 500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

i) Result

Figure 15: This ﬁgure shows the steps of the analysis. The top and the

bottom row show the same transformations applied to the primal image f and

the gradient image g respectively. On the left (a) and e)) we show the reference

images with no sampling error. The next column shows the respective Fourier

transforms. Note that that b) contains a lot more power than f). This is the

result of Equation (17) and the fact that all of the power of F is concentrated

in the low frequencies. Next we show the power spectral densities of f and g

after sampling. The result from Equation (18) is visually represented by the

lighter constant background of c). This is because sampling adds noise to all

frequencies equally and the power of the added noise is proportional to the

power of the sampled function. Next we integrated of the path dimension,

which corresponds to slicing along the red line in c) and g). The results of the

integration are shown in d) and h). Not the considerably higher base noise

in d). Poisson reconstruction corresponds to combining the results in d) and

h) while low-pass ﬁltering d) and high-pass ﬁltering h). Figure i) shows the

result of the reconstruction in the frequency domain. It retains the low level

of noise of h) while representing low frequencies accurately.

convolution corresponds to a multiplication in the Fourier domain and hence the Fourier transform of g

is G(ω

, ω

¯p

) = D(ω

, ω

¯p

)F (ω

, ω

¯p

). We can not give F directly since it depends on the scene, but we can

investigate d. Its Fourier transform is given by D(ω

, ω

¯p

) = δ(ω

, ω

¯p

) − δ(ω

x+1

, ω

¯p

) = 1 − e

i2πω

. This

follows trivially from the identities given in section 6.1.3. An important property is the power spectrum

of D, |D|

= |1 −e

i2πω

= 2 −2 cos(2πω

)

. In Equation (17) we summarize the spectral properties of

the gradient function.

G(ω

, ω

¯p

) = (1 − e

i2πω

)F (ω

, ω

¯p

)

|G(ω

, ω

¯p

= (2 − 2 cos(2πω

))|F (ω

, ω

¯p

(17)

Especially the second result is interesting. Figure 16 shows how the ﬁnite diﬀerence operator attenu-

ates low frequencies ω

6.2.2 Stochastic Sampling The complete analysis of stochastic sampling is fairly lengthy and would

be out of scope for this report. We limit ourselves to presenting the assumptions and results. For a more

detailed description we refer to the supplemental materials of [Ket+15]. The sampling process is modeled

as a multiplication s × f where s is the sampling function. Its power spectral density ([MC12]) is given

by Equation (18). It can be interpreted as the expected value of a power spectrum.

This could also be simpliﬁed to 4 sin

(πω

). This is not done in the original paper and we keep it to avoid

confusion.

0.1 0.2 0.3 0.4 0.5

|G|

Figure 16: The power spectrum of G assuming F is constant. Areas shaded

green correspond to a attenuation of power in the corresponding frequencies.

While the power of frequencies approaching the Nyquist limit is ampliﬁed

signiﬁcantly, most images do not contain these frequencies as discussed in

section 6.1.2.

(|S|

∗ |F |

)(ω

, ω

¯p

) = n

|F |

(ω

, ω

¯p

) +

Frequency independent term

z }| {

n||F ||

(18)

where n is the number samples drawn. The important result is that sampling adds power to all

frequencies equally and the added power is proportional to the energy of the signal.

6.2.3 Integration Finally we need to compute the power spectrum of the Monte Carlo integration.

We integrate f over all paths ¯p ∈ Ω to obtain a pixel value I as in Equation (2). The Fourier transform

of that integral corresponds to slicing the Fourier transform of f at w

¯p

= 0 (see Equation (14)). Since we

are taking n discrete samples and simply adding them up we need to multiply the output of the sampling

process by 1/n (and the power spectrum by 1/n

(|F |

∗ |S|

)(ω

, 0)) = |F (ω

, 0)|

||F ||

(19)

Note that this is just the power spectrum of the converged image with an additional error term

|

(ω

= ||F ||

/n which gets smaller when the number of samples increases. It is proportional to the

energy of the image and constant over all frequencies.

Having derived the error term |

(ω

for the image, we can similarly give the error term for gradient

sampling as |

(ω

= ||G||

/n. We see in Equation (20) that the relative error between gradient and

direct sampling only depends on the relative power of the gradient image versus the primal image.

|

(ω

|

(ω

||G||

||F ||

(20)

Recalling the result from section 6.2.1 this means that, since the energy of the gradient image will be

lower than the energy of the primal image for most natural images, the error introduced by sampling will

be lower as well. This is the ﬁrst important result of the analysis.

6.2.4 Poisson Reconstruction As a last step the analysis turns to the screened Poisson reconstruc-

tion. Again, we will only present results since the derivations are fairly lengthy. First we ﬁnd the error

term (analogous to Equation (20)) for the image error after reconstruction. This is done by transforming

the Poisson reconstruction problem into the Fourier domain [Bha+08] and plugging in the deﬁnitions of

F and G.

|

+ |D|

|

(α

+ |D|

)

(21)

Here α is the same values as in Equation (11) and sets the weight of the primal image during the

reconstruction. Next we derive an optimal value α

∗

for the factor α in Equation (21).

We omit the pixel ﬁlter here since it is non essential to the result of the analysis.

∗

||G||

||F ||

(22)

In practice the optimal value is unknown since we need the converged images to calculate it, but a

value of 0.2 seems to be good for many diﬀerent scenes and is used for all the reconstructions in this

report. Substituting α

∗

for α in Equation (21) gives us an expression returning the minimal error after

reconstruction for a given image.

|

∗

|

|D|

+ |

(23)

For comparison purposes we’d like to know the factor by which gradient sampling reduces the error

versus direct sampling. For this we rearrange Equation (23) and use the result in Equation (20) to

simplify, giving us:

r =

|

(ω

))|

|

∗

(ω

))|

||G||

/||F ||

+ 2 − 2 cos(2πω

)

||G||

/||F ||

(24)

This is the central result of the analysis. If r has a high value the error is reduced by a large margin.

Reconstruction eﬀectively combines the high frequencies of the gradient images with the low frequencies of

the primal images. In the process it ﬁlters the low frequency error from gradients and the high frequency

error in the primal image and thus reduces the overall error.

7 Recent Work

Because of the promising results provided by G-PT there has been additional research conducted in this

area of light transport simulation. A focus has been extending G-PT to more robust algorithms than

path tracing and exploiting temporal coherence in animated image sequences.

7.1 Gradient Domain Bidirectional Path Tracing

While G-PT does considerably better than traditional path tracing in most scenes it is still subject to the

same limitations path tracing suﬀers from. Rendering scenes with diﬃcult light transport characteristics

like the one in Figure 17 still results in high variance, even at high sample counts. To address these issues

[Man+15] introduces Gradient Domain Bidirectional Path Tracing (G-BDPT). G-BDPT is the natural

extension of the Bidirectional Path Tracing Algorithm by [Vea98] into the gradient domain.

The main contributions are an adaption of the shift mapping by [Man+14] to the bidirectional path

construction algorithm, an eﬃcient BDPT connection strategy and a multiple importance sampling tech-

nique which reduces gradient sampling artifacts drastically.

7.1.1 Shift Mapping The shift mapping used in [Man+15] is identical to the one proposed by

[Man+14]. It is similar to the G-PT shift described earlier in this paper. The main diﬀerence is that

instead of requiring two consecutive diﬀuse vertices to connect to the base path this shift always connects

at the third diﬀuse vertex. The number of specular vertices between the second and the third diﬀuse

vertex is not considered. Hence the variance of the throughput of the resulting shift paths is potentially

higher, but the shift mapping retains properties that can be exploited to make gradient sampling drasti-

cally more eﬃcient in the context of BDPT. We are basically trading higher variance for more samples

which in this case proves to result in lower error at equal render time.

7.1.2 Connection Strategy The cost of Bidirectional Path Tracing over traditional Path Tracing is

not negligible. Hence a practical shift mapping most be eﬃcient. Naively shifting every path generated

by the many connection strategies would be prohibitively expensive.

[Man+15] propose to drop connection strategies where the connecting path vertices are classiﬁed as

specular. They reason that these strategies don’t contribute much energy anyway since the BRDF at the

connecting vertices is likely to have small values when evaluated into essentially random directions.

Ignoring these connection strategies allows us to shift the light and eye paths once and reuse large

parts of the original shifts (including their Jacobian values).

Figure 17: A scene which is diﬃcult to render with standard path tracing,

because the light sources are small and covered. Even after rendering 20 000

SPP it did not converge to a clear image.

7.2 Temporal Gradient-Domain Path Tracing

Temporal methods have become popular in computer graphics in recent years. Most of them focus on

exploiting temporal coherence through ﬁltering rendered images in a post processing step. Especially

”Temporal Anti Aliasing” is widely used in computer games and there’s been a lot of research on how

to deal with the artifacts introduced by temporal ﬁltering. There have been attempts to apply temporal

ﬁltering to path tracing, but the common post processing based methods have trouble dealing with very

noisy images and view dependent lighting eﬀects [Sey16].

[Man+16] introduce Temporal Gradient Domain Path Tracing (T-GPT), a technique which extends

gradient domain path tracing into the spatial domain. Instead of just computing gradients between pixels

in space, they also compute diﬀerences over time. Additionally, instead of applying a post processing ﬁlter

they take advantage of temporal coherence at a sample level. Similarly to how G-PT computes oﬀset paths

(¯x) between pixels in space, T-GPT adds oﬀset paths between pixels in time as well. Additionally,

second order diﬀerentials (diﬀerences in time and space) are computed. The resulting images I

, I

dxdt

and I

dydt

are combined in a 3D Poisson reconstruction step.

7.2.1 Shift Mapping The shift mapping for spatial derivatives is the same as the one used for G-PT.

For time derivatives two mappings are proposed. The ﬁrst one is straight forward. The primary sample

space (pixel) position of a path is kept constant between frames and paths are reconnected as usual. This

can result in large path space diﬀerences if objects or the camera move between frames since the chance

of an object moving away from under a ray is quite large. The second proposed shift mapping takes

motion vectors, that is object and camera movement projected into primary sample space, into account.

Using the motion vectors pixels positions are reprojected between frames. Instead of naively using the

same primary sample space position for the temporal shift the reprojected one is used.

In practice the shifts with a temporal mapping are not computed explicitly. Instead we render two

consecutive frames independently from each other. We ensure coherence between frames by starting

oﬀ the random number generator with the same seed both times. In multi-threaded or multi-machine

environments additional care has to be taken to ensure that rendering process is deterministic.

7.2.2 Spatio-temporal Reconstruction To ensure temporal coherence in the output image sequence

(resulting in more pleasing, ﬂicker-less animations) we need to solve for image sequences instead of single

images. This is achieved by solving the following minimization problem. Note that all the symbols in the

equation denote image sequences.

I = argmin

(||α(

I − I

)||

+ ||







dxdt

dytdt







−







dxdt

dydt







) (25)

Where H

∗

denotes the ﬁnite diﬀerence operator and I

is the sequence of primal images. This

is basically identical to the reconstruction in Equation (11) except for the fact that we reconstruct a

sequence of images. To achieve arbitrary length reconstruction [Man+16] use a sliding window of 20

frames.

8 Discussion

In this report we presented the state of the art in gradient domain rendering. We have provided an

analysis that shows, under certain simpliﬁcations, how the process of sampling gradients followed by

Poisson reconstruction suppresses high frequency noise and reduces the overall error. The papers which

describe the individual techniques already have detailed analysis comparing their work to the non-gradient

domain counterparts. Also, a direct comparison between gradient domain methods is not very insightful.

For example, it is obvious that gradient domain path tracing is going to perform worse than gradient

domain bidirectional path tracing in scenes with diﬃcult light transport. Gradient domain rendering does

not change the characteristics of the underlying technique, but provides a constant error reduction factor

(see Equation (20)). Even with a very bad shift mapping, which always produces non-symmetric paths,

we will not do worse than the base algorithm at equal sample counts as long as we handle non-symmetric

shifts correctly. Something that is worth investigating is the run time overhead of gradient sampling,

which is dominated by the time taken to construct oﬀset paths. This is why equal time comparisons tend

to be more meaningful than equal sample count comparisons. Additionally, it would be interesting to get

statistics about how scene geometry inﬂuences the eﬀectiveness of shift mappings. Getting insights into

where the problem areas lie is helpful in the construction of new shift mappings.

8.1 Equal Time Comparisons

We want to compare the eﬀectiveness of gradient sampling in various scenarios. For this we measure

the relative mean squared error (relMSE) of the generated images after letting them render for ﬁxed

amount of time. The relMSE of a sampled image I is deﬁned as 

rel

= ||I

ref

− I||

/||I

ref

where

ref

is a converged reference image, usually obtained via path tracing a large number of samples. For

each technique we render two images, one using the classical version of the techniques and one using

gradient sampling. We measure the relative improvement by dividing their relMSEs. We also compute

the predicted improvement by computing the factor given by Equation (20) for the reference image.

Figure 18 shows the result of this investigation. We use three scenes with diﬀerent light transport

characteristics. The scene a) has little power in the gradients but highly specular materials. We predict

an improvement do to gradient sampling by 28x over classical sampling, but neither G-PT nor G-BDPT

achieve factors close to this. Scene b) is the classic ”Sponza” scene. It contains only diﬀuse materials

and is illuminated by a large light, ideal conditions for both path tracing as well as the shift mapping

employed for G-PT. In fact, we see that G-PT almost achieves the ideal improvement over PT. On the

other hand, G-BDPT only reduces relMSE by 8 times, half of the theoretically predicted value. At last,

the scene c) presents a diﬃcult light transport scenario for PT. This is the only scene were G-BDPT

outperforms PT in terms of relative error reduction.

8.2 Analysing Shift Mappings

To evaluate the performance of shift mappings we compute the percentage of failed paths per pixel. For

this we modify the implementation by [Ket+15] and add an additional output. In Figure 19 we display

the ﬁnal image on the left and a heat map of failed samples on the right. Generally, we can see that shift

failures concentrate on edges, though the specularity of the interacting materials also seems to have an

inﬂuence.

To further investigate this we study a scene with materials of varying reﬂectivity and otherwise fairly

simple light transport in Figure 20. Of interest here is that the shift does not fail frequently when hitting

PT BDPT MLT

|F |

0.3526 Classical 0.0090 0.0123 -

|G|

0.0122 Gradient 0.0012 0.0021 -

Predicted 28.86 Improvement 7.6683 5.8167 -

PT BDPT MLT

|F |

0.2814 Classical 0.0027 0.0068 -

|G|

0.0184 Gradient 0.0002 0.0008 -

Predicted 15.27 Improvement 14.063 8.6862 -

PT BDPT MLT

|F |

0.3003 Classical 0.0092 0.0086 -

|G|

0.0175 Gradient 0.0043 0.0016 -

Predicted 17.17 Improvement 2.1458 5.4328 -

Figure 18

Reference Image Failed Shift Mappings

Figure 19

perfectly specular materials. It does have an issue with materials that are glossy but classiﬁed as diﬀuse

(the second bar from the top).

Figure 21 serves as an example of a scene with diﬃcult light transport. Again, we can see that most

shifts fail on edges of objects. The complex light transport does not seem to reduce the eﬀectiveness of

the shift mapping much.

Concluding we want to note that, since many of the failing shifts lie on edges, edges will have higher

variance in the gradients, because we discard failing shifts and revert back to the naive gradient sampling

illustrated in Figure 4. This is unfortunate since these edges also tend to be the parts of the image where

the gradients are high. Hence the eﬀectiveness of gradient sampling is reduced and we conjecture that a

shift mapping which can deal with shifts across object edges would improve results signiﬁcantly. Even

just sampling a gradient multiple times in case a shift fails might reduce overall error eﬀectively. This

would be a variant of adaptive sampling.

Reference Image Failed Shift Mappings

Figure 20

Reference Image Failed Shift Mappings

Figure 21

9 Acknowledgements

The scene ﬁles in this report have been mostly provided by Benedikt Bitterli ([Bit16]). ”staircase” is

made by the Blend Swap user ”Wig42”. ”three spheres” and ”glass egg” are modeled after scenes from

Eric Veach. The photograph in Figure 12 is in the public domain. The renders used in Figure 18 are

supplemental images provided as part of [Man+15]. The Matlab code used for analysis is ours and

available on request.

References

[Bha+08] Pravin Bhat et al. “Fourier Analysis of the 2D Screened Poisson Equation for Gradient Do-

main Problems”. In: Proceedings of the 10th European Conference on Computer Vision: Part

II. ECCV ’08. Marseille, France: Springer-Verlag, 2008, pp. 114–128. isbn: 978-3-540-88685-3.

doi: 10.1007/978-3-540- 88688- 4_9. url: http://dx.doi.org/10.1007/978- 3-540-

88688-4_9.

[Bit16] Benedikt Bitterli. Rendering resources. https://benedikt-bitterli.me/resources/. 2016.

[Bra00] R.N. Bracewell. The Fourier Transform and Its Applications. Electrical engineering series.

McGraw Hill, 2000. isbn: 9780073039381. url: https://books . google.de/ books ?id=

ZNQQAQAAIAAJ.

[JM12] Wenzel Jakob and Steve Marschner. “Manifold Exploration: A Markov Chain Monte Carlo

Technique for Rendering Scenes with Diﬃcult Specular Transport”. In: ACM Trans. Graph.

31.4 (July 2012), 58:1–58:13. issn: 0730-0301. doi: 10.1145/2185520.2185554. url: http:

//doi.acm.org/10.1145/2185520.2185554.

[Jar+12] Wojciech Jarosz et al. “Theory, Analysis and Applications of 2D Global Illumination”. In:

ACM Transactions on Graphics (Presented at SIGGRAPH) 31.5 (Sept. 2012), 125:1–125:21.

doi: 10.1145/2231816.2231823.

[Kaj86] James T. Kajiya. “The Rendering Equation”. In: SIGGRAPH Comput. Graph. 20.4 (Aug.

1986), pp. 143–150. issn: 0097-8930. doi: 10.1145/15886.15902. url: http://doi.acm.

org/10.1145/15886.15902.

[Ket+15] Markus Kettunen et al.“Gradient-Domain Path Tracing”. In: ACM Trans. Graph. 34.4 (2015).

[Leh+13] Jaakko Lehtinen et al. “Gradient-domain Metropolis Light Transport”. In: ACM Trans.

Graph. 32.4 (July 2013), 95:1–95:12. issn: 0730-0301. doi: 10 . 1145 / 2461912 . 2461943.

url: http://doi.acm.org/10.1145/2461912.2461943.

[Man+15] Marco Manzi et al. “Gradient-Domain Bidirectional Path Tracing”. In: Eurographics Sympo-

sium on Rendering - Experimental Ideas and Implementations. Ed. by Jaakko Lehtinen and

Derek Nowrouzezahrai. The Eurographics Association, 2015. doi: 10.2312/sre.20151168.

[Man+14] Marco Manzi et al. “Improved Sampling for Gradient-domain Metropolis Light Transport”.

In: ACM Trans. Graph. 33.6 (Nov. 2014), 178:1–178:12. issn: 0730-0301. doi: 10 . 1145 /

2661229.2661291. url: http://doi.acm.org/10.1145/2661229.2661291.

[Man+16] Marco Manzi et al. “Temporal Gradient-domain Path Tracing”. In: ACM Trans. Graph. 35.6

(Nov. 2016), 246:1–246:9. issn: 0730-0301. doi: 10 .1145/2980179.2980256. url: http:

//doi.acm.org/10.1145/2980179.2980256.

[MC12] S.L. Miller and D.G. Childers. Probability and Random Processes: With Applications to Signal

Processing and Communications. Academic Press, 2012. isbn: 9780123869814. url: https:

//books.google.de/books?id=1rTcR3nIkJcC.

[Rud94] Daniel L Ruderman. “The statistics of natural images”. In: Network: computation in neural

systems 5.4 (1994), pp. 517–548.

[Sey16] Dario Seyb. “Eﬃcient Ray Tracing Techniques”. Pro Seminar Report ”Selected Topics in

Computer Graphics” at RWTH Aachen. 2016.

[Sey17] Dario Seyb. Zaphod Raytracer Code. 2017. url: http://github.com/daseyb/zaphod (visited

on 01/20/2017).

[SSJ16] Kartic Subr, Gurprit Singh, and Wojciech Jarosz. “Fourier Analysis of Numerical Integration

in Monte Carlo Rendering: Theory and Practice: Understanding Estimation Error in Monte

Carlo Image Synthesis”. In: ACM SIGGRAPH 2016 Courses. SIGGRAPH ’16. Anaheim, Cal-

ifornia: ACM, 2016, 10:1–10:40. isbn: 978-1-4503-4289-6. doi: 10.1145/2897826.2927356.

url: http://doi.acm.org/10.1145/2897826.2927356.

[Vea98] Eric Veach. “Robust Monte Carlo Methods for Light Transport Simulation”. AAI9837162.

PhD thesis. Stanford, CA, USA, 1998. isbn: 0-591-90780-1.