Introduction to VIDA

Paul Tiede
today

Installation

VIDA is written in the Julia programming language for a few reasons:

  1. It is fast! Julia can achieve C and FORTRAN speeds while not having to deal with memory.

  2. It's syntax is very simple. Unlike Themis which is written in C++ the development time is orders of magnitude lower in Julia.

So to install VIDA you first have to install the Julia binary on your system. The best bet is to go to the Julia site and install the official binary

To install the VIDA package you can run the following

using Pkg;Pkg.add("VIDA")

Idea behind VIDA

VIDA is based on the idea of interpreting the image as a probability distribution. Namely since any image is integrable, the space of images is in one-to-one correspondence with a probability distribution, especially since the total flux of the image is already known a priori.

Therefore, our idea is very close to variational inference, hence the name Variational Image Domain Analysis. Namely, where we view the image as a distribution and we aim to find a approximation of the distribution given some parameteric family $f_\theta(x,y)$, which for our purproses we will typically call a filter.

The choice of filter, depends on the problem of interest, namely what features we are interested in. Typically for the EHT where our images tend to be rings, we are interested in

  • Radius r₀

  • Width or half width σ

  • Structural asymmetry τ

  • Flux asymmetry s

  • Position angle ξ

VIDA then defines a series of filters parameterize these features.

Filters

Currently we have 5 filters defined, although they all belong to the same family. For an example on how to see the process for defining your own filter please see the readme.

The filters implemented are:

  • GaussianRing which is a symmetric and circular Gaussian ring.

  • SlashedGaussianRing which is a circular Gaussian ring with a flux gradient across its emission.

  • EllipticalGaussianRing symmetric Gaussian elliptical ring, where the emission is constant across the ring, unlike with the SlashedGaussianRing.

  • GeneralGaussianRing A combination of the two above where the ring is allowed to be elliptical and have a intensity gradient.

  • TIDAGaussianRing The GeneralGaussianRing, but where the asymmetry and flux orienation are fixed relative to one another.

  • AsymGaussian A asymmetric Gaussian blob. This can be useful if you image has a strong non-ring component in it.

  • Constant Adds a constant flux floor to the image. This is very helpful for image reconstructions that tend to add small scale flux everywhere in the image.

  • CosineRing{N,M} A ring filter where the azimuthal brightness and thickness is expressed using a cosine expansion.

Divergences

In order to extract features we first need a cost function that penalized our parameterized distributions $f_\theta(x,y)$. Since we are considering the image as a probability distribution, one cost function would be the distance or divergence between two distributions. A probability divergence is just a functional that takes in two probability distributions p,q and is minimized iff $p\equiv q$.

Currently we have two divergences implemented in VIDA

  • Bhattacharyya divergence (Bh)

\[ Bh(f_\theta|I) = \int \sqrt{f_\theta(x,y)I(x,y)} dxdy. \]

  • KL divergence

\[ KL(f_\theta|I) = \int f_\theta(x,y)\log\left(\frac{f_\theta(x,y)}{I(x,y)}\right)dxdy. \]

Both divergences give very similar answers, although we found the BH to be easier to maximize.

Using VIDA

Using VIDA is based on constructing three items:

  1. Data, i.e. an image that you want to extract features from.

  2. Cost function, i.e. pick if you want to use the KL or BH divergence

  3. Filter, i.e. construct the family of distributions or filters that you will use to approximate the image.

Then all you need to do is minimize the divergence and you will have extracted you image features.

Now lets runs through how that works

Getting started

To load VIDA we follow the typical Julia flow. Note that to include plotting functionality you need to include Plots as well

using Plots
using VIDA
using InteractiveUtils

Step 1 Read in Data

VIDA currently only works with fits images. THe fits header is based off of what eht-imaging outputs. So as long as you stick to that standard you should be fine.

To read in an image we just use the load_fits function which should work with any fits image from ehtim and clean

#Load the image
img = load_fits("data/elliptical_gaussian_rot-45.00m87Scale_seed_23_simobs_netcal_scanavg-z0.6-s100-t0-v0-l0-p50-e0.000.fits")

#To see what this img is lets print the type
println(typeof(img))

#To plot the image we can just call plot. This uses recipes and the Plots.jl framework
plot(img)
VIDA.EHTImage{Array{Float64,2}}

So from the output we see that img is a EHTImage type. The information in the curly brackets defines the parametric type information. What this means is that the image that is constrained in the EHTImage type is a Matrix whose elements are Float64's.

Julia isn't a traditional OOP language. Namely, methods/functions are first class and aren't members of a class. Instead how a function behaves is dependent on the type of argument inputs. This is known as something called multimethods or multiple dispatch where at run-time the type of functions called is determined by the arguments.

In some sense OOP is just a multiple dispatch type language where the our type of dispatch only depends on the class, i.e. the self argument in Python classes.

Now because of the lack of classes sometimes it can be difficult to figure out which functions will act of our datatypes, e.g. the EHTImage type. Fortunately, Julia has some convenience functions that let you see which functions/methods can act of the EHTImage type

#To see what functions can act on an EHTImage object just call
methodswith(EHTImage)
8-element Array{Method,1}:

From this list we see there are several methods that can act on EHTImage types. To see what a certain function does you can type ?inertia in the terminal to see the help for the inertia method.

Creating a divergence

In order to find the optimal filter you need to first decide on your objective or cost function. In VIDA we use probaility divergences to measure differences between the filter and image. A divergence is defined as an abstract type AbstractDivergence. In VIDA a divergence is a functor. A functor is a type that has an anonymous function attached to it. That means it is both a type and a function. For instance we create a divergence by

bh = Bhattacharyya(img);
 kl = KullbackLeibler(img);

Now to evaluate the divergence we need to pass it a filter. This can be any filter your choose. The great thing about julia is that bh will use multiple dispatch to figure out which filter is being passed to the divergence.

For instance lets create a few different filters

gr = GaussianRing(r0=20.0, σ=5.0, x0=0.0, y0=0.0)
ggr = GeneralGaussianRing(r0=20.0, 
                          σ = 5.0,
                          τ = 0.2,
                          ξτ = 0.78,
                          s = 0.5,
                          ξs = 0.78,
                          x0=0.0,
                          y0=0.0
                        )
# We can also plot both filters
a = plot(gr, title="GaussianRing")
b = plot(ggr, title="GeneralGaussianRing")
plot(a, b, layout=(1,2), size=(600,300))

VIDA has a number of filters defined. These are all subtypes of the AbstractFilter type. To see which filters are implemented you can use the subtype method:

subtypes(VIDA.AbstractFilter)
11-element Array{Any,1}:
 VIDA.AddFilter
 VIDA.AsymGaussian
 VIDA.Constant
 VIDA.CosineRing
 VIDA.Disk
 VIDA.EllipticalGaussianRing
 VIDA.GaussianRing
 VIDA.GeneralGaussianRing
 VIDA.MulFilter
 VIDA.SlashedGaussianRing
 VIDA.TIDAGaussianRing

Note that the AddFilter and MulFilter are internal filters that allow the user to easily combine two filters, for example:

add = gr + 1.0*ggr
VIDA.AddFilter{VIDA.GaussianRing,VIDA.MulFilter{VIDA.GeneralGaussianRing,Fl
oat64}}(VIDA.GaussianRing
  r0: Float64 20.0
  σ: Float64 5.0
  x0: Float64 0.0
  y0: Float64 0.0
, VIDA.MulFilter{VIDA.GeneralGaussianRing,Float64}
θ: VIDA.GeneralGaussianRing
  r0: Float64 20.0
  σ: Float64 5.0
  τ: Float64 0.2
  ξτ: Float64 0.78
  s: Float64 0.5
  ξs: Float64 0.78
  x0: Float64 0.0
  y0: Float64 0.0
Irel: Float64 1.0
)

To evaluate the divergence between our filter and image we then just evaluate the divergence on the filter

@show bh(gr);
@show bh(ggr);
@show bh(add);
bh(gr) = 0.2850517886474983
bh(ggr) = 0.2922585743257235
bh(add) = 0.28198849078985727

Now neither filter is really a great approximation to the true image. For instance visually they look quite different, which can be checked with the triptic function

a = triptic(img, gr)
b = triptic(img, ggr)
c = triptic(img, add)
plot(a,b,c, layout=(3,1), size=(800,800))

Extracting the Optimal Filter

To extract the optimal filter the first thing you need to do is define your ExtractProblem. This requires your divergence, initial filter, and bounds.

lower = GaussianRing(r0=0.1, σ=0.01, x0=-60.0, y0=-60.0);
upper = GaussianRing(r0=60.0, σ=20.0, x0=60.0, y0=60.0);
initial = GaussianRing(r0=20.0, σ=5.0, x0=0.0, y0=0.0);

prob = ExtractProblem(bh, initial, lower, upper);

Now to run the optimizers you just need to select which optimizer to use. Currently VIDA has three families of optimizers installed. Each one is a subtype of the VIDA.Optimizer abstract type

subtypes(VIDA.Optimizer)
3-element Array{Any,1}:
 VIDA.BBO
 VIDA.CMAES
 VIDA.Opt

Of the three implemented optimizers my suggestion would be to try the BBO one first. This uses the BlackBoxOptim.jl package. BBO has a number of options that can be changed. To see them please use the julia ? mode, or see the documentation.

To optimize all you need to do is run the extractor function.

optfilt, divmin = extractor(prob, BBO())
triptic(img, optfilt)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim
.FitPopulation{Float64},BlackBoxOptim.RadiusLimitedSelector,BlackBoxOptim.A
daptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.Continuous
RectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 2756 evals, 2607 steps, improv/step: 0.264 (last = 0.2635), fitn
ess=0.170067861
1.00 secs, 5684 evals, 5537 steps, improv/step: 0.240 (last = 0.2191), fitn
ess=0.165721076
1.50 secs, 8752 evals, 8605 steps, improv/step: 0.259 (last = 0.2924), fitn
ess=0.165688939
2.00 secs, 11679 evals, 11532 steps, improv/step: 0.266 (last = 0.2880), fi
tness=0.165688930
2.50 secs, 14750 evals, 14604 steps, improv/step: 0.270 (last = 0.2852), fi
tness=0.165688930
3.00 secs, 17814 evals, 17669 steps, improv/step: 0.244 (last = 0.1181), fi
tness=0.165688930
3.50 secs, 20890 evals, 20746 steps, improv/step: 0.211 (last = 0.0227), fi
tness=0.165688930
4.00 secs, 23962 evals, 23818 steps, improv/step: 0.185 (last = 0.0104), fi
tness=0.165688930

Optimization stopped after 24857 steps and 4.17 seconds
Termination reason: Max number of function evaluations (25000) reached
Steps per second = 5960.51
Function evals per second = 5995.04
Improvements/step = Inf
Total function evaluations = 25001


Best candidate found: [18.7786, 4.23264, -5.85353, -2.86036]

Fitness: 0.165688930

Well that seemed to do a terrible job. The reason is that a lot of these images tend to have some low level flux throughout the image. To account for this the filter tends to get very big to absorb some of this flux. To combat this you can add a constant background filter to the problem.

lower = GaussianRing(r0=0.1, σ=0.01, x0=-60.0, y0=-60.0) + 1e-10*Constant();
upper = GaussianRing(r0=60.0, σ=20.0, x0=60.0, y0=60.0) + 1.0*Constant();
initial = GaussianRing(r0=20.0, σ=5.0, x0=0.0, y0=0.0) + 0.1*Constant();

prob = ExtractProblem(bh, initial, lower, upper);
optfilt, divmin = extractor(prob, BBO())
triptic(img, optfilt)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim
.FitPopulation{Float64},BlackBoxOptim.RadiusLimitedSelector,BlackBoxOptim.A
daptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.Continuous
RectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 3177 evals, 3062 steps, improv/step: 0.256 (last = 0.2557), fitn
ess=0.061815832
1.00 secs, 6327 evals, 6212 steps, improv/step: 0.261 (last = 0.2670), fitn
ess=0.055471319
1.50 secs, 9484 evals, 9370 steps, improv/step: 0.272 (last = 0.2939), fitn
ess=0.055460468
2.00 secs, 12579 evals, 12465 steps, improv/step: 0.276 (last = 0.2856), fi
tness=0.055460461
2.50 secs, 15712 evals, 15600 steps, improv/step: 0.274 (last = 0.2667), fi
tness=0.055460461
3.00 secs, 18836 evals, 18724 steps, improv/step: 0.263 (last = 0.2093), fi
tness=0.055460461
3.50 secs, 21992 evals, 21880 steps, improv/step: 0.239 (last = 0.0979), fi
tness=0.055460461

Optimization stopped after 24891 steps and 3.98 seconds
Termination reason: Max number of function evaluations (25000) reached
Steps per second = 6253.66
Function evals per second = 6281.30
Improvements/step = Inf
Total function evaluations = 25001


Best candidate found: [18.7085, 3.50879, -5.88309, -2.76303, 0.0212716]

Fitness: 0.055460461

Well that's much better! Now if you wanted to capture the asymmetry in the ring you can use other filters, for example the CosineRing filter. Note that this filter tends to be a little harder to fit.

lower = CosineRing{1,4}(r0=0.1, 
                        σ=[0.1, -1.0], ξσ = [-π],
                        τ = 0.01, ξτ = -π,
                        s = [0.01, -1.0, -1.0, -1.0],
                        ξs = [-π,-π,-π,-π],
                        x0=-60.0, y0=-60.0
                       ) + 1e-10*Constant();
upper = CosineRing{1,4}(r0=40.0, 
                        σ=[20.0, 1.0], ξσ = [π],
                        τ = 0.999, ξτ = π,
                        s = [0.999, 1.0, 1.0, 1.0],
                        ξs = [π,π,π,π],
                        x0=60.0, y0=60.0
                       ) + 1.0*Constant();
initial = CosineRing{1,4}(r0=20.0, 
                        σ=[5.0, 0.1], ξσ = [0.0],
                        τ = 0.1, ξτ = 0.0,
                        s = [0.1, 0.0, 0.0, 0.0],
                        ξs = [0.0,0.0,0.0,0.0],
                        x0=0.0, y0=0.0
                       ) + 1e-2*Constant();

prob = ExtractProblem(bh, initial, lower, upper);
optfilt, divmin = extractor(prob, BBO(tracemode=:silent));
triptic(img, optfilt)

Well that looks pretty great!