# Essence of the sea_cucumber

Experiments with CNN features and circuits.
In short, why does this (maximized node `4`

in the `block5_conv4`

layer of `VGG19`

)

look like `sea_cucumber`

to all ImageNet-trained CNNs?

## Context

Using my feature extraction script I analyzed
node `4`

in the `block5_conv4`

layer of `VGG19`

:

```
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from PIL import Image
```

```
base_model = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
target_layer="block5_conv4"
target_index=4
steps=100
step_size=0.1
# Take the network and cut it off at the layer we want to analyze,
# i.e. we only need the part from the input to the target_layer.
target = [base_model.get_layer(target_layer).output]
part_model = tf.keras.Model(inputs=base_model.input, outputs=target)
```

```
# The next part is the function to maximize the target layer/node by
# adjusting the input, equivalent to the usual gradient descent but
# gradient ascent. Run an optimization loop:
activation = None
@tf.function(
# Decorator to increase the speed of the gradient_ascent function
input_signature=(
tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
tf.TensorSpec(shape=[], dtype=tf.int32),
tf.TensorSpec(shape=[], dtype=tf.float32),)
)
def gradient_ascent(img, steps, step_size):
loss = tf.constant(0.0)
for n in tf.range(steps):
# As in normal NN training, you want to record the computation
# of the forward-pass (the part_model call below) to compute the
# gradient afterwards. This is what tf.GradientTape does.
with tf.GradientTape() as tape:
tape.watch(img)
# Forward-pass (compute the activation given our image)
activation = part_model(tf.expand_dims(img, axis=0))
print(activation)
print(np.shape(activation))
# The activation will be of shape (1,N,N,L) where N is related to
# the resolution of the input image (assuming our target layer is
# a convolutional filter), and L is the size of the layer. E.g. for a
# 256x256 image in "block4_conv1" of VGG19, this will be
# (1,32,32,512) -- we select one of the 512 nodes (index) and
# average over the rest (you can average selectively to affect
# only part of the image but there's not really a point):
loss = tf.math.reduce_mean(activation[:,:,:,target_index])
# Get the gradient, i.e. derivative of "loss" with respect to input
# and normalize.
gradients = tape.gradient(loss, img)
gradients /= tf.math.reduce_std(gradients)
# In the final step move the image in the direction of the gradient to
# increate the "loss" (our targeted activation). Note that the sign here
# is opposite to the typical gradient descent (our "loss" is the target
# activation which we maximize, not something we minimize).
img = img + gradients*step_size
img = tf.clip_by_value(img, -1, 1)
return loss, img
```

```
# Preprocessing of the image (converts from [0..255] to [-1..1]
starting_img = np.random.randint(low=0,high=255,size=(224,224,3), dtype=np.uint8)
img = tf.keras.applications.vgg19.preprocess_input(starting_img)
img = tf.convert_to_tensor(img)
# Run the gradient ascent loop
loss, img = gradient_ascent(img, tf.constant(steps), tf.constant(step_size))
# Convert back to [0..255] and return the new image
img = tf.cast(255*(img + 1.0)/2.0, tf.uint8)
plt.imshow(np.array(img))
im = Image.fromarray(np.array(img))
im.save("node4.png")
```

## The confusing part

Judging my the OpenAI Microscope it looks like the node mostly gets activated by furry animals -- *in the training set*. Of course our image in artificial and this far outside the usual distribution, and we can expect such different behaviour. But why do we get the `sea_cucumber`

prediction, rather than predictions of `dog`

, `bison`

or `lion`

?

Feeding this image into the network, it seems insanely sure that the right label is `sea_cucumber`

. Also other imagenet-trained networks such as Inception or VGG16 give the same result. Note: This was not indended and not optimized for.

```
model_vgg19 = tf.keras.applications.VGG19(weights='imagenet', include_top=True)
x = tf.keras.applications.vgg19.preprocess_input(np.expand_dims(img, axis=0))
predictions = model_vgg19.predict(x)
print('Predicted:', tf.keras.applications.vgg19.decode_predictions(predictions, top=3)[0])
```

```
Predicted: [('n02321529', 'sea_cucumber', 1.0), ('n01924916', 'flatworm', 1.2730256e-33), ('n01981276', 'king_crab', 2.537045e-37)]
```

```
model_vgg16 = tf.keras.applications.VGG16(weights='imagenet', include_top=True)
x = tf.keras.applications.vgg16.preprocess_input(np.expand_dims(img, axis=0))
predictions = model_vgg16.predict(x)
print('Predicted:', tf.keras.applications.vgg16.decode_predictions(predictions, top=3)[0])
```

```
Predicted: [('n02321529', 'sea_cucumber', 1.0), ('n01950731', 'sea_slug', 4.6657154e-15), ('n01924916', 'flatworm', 1.810621e-15)]
```

```
model_resnet = tf.keras.applications.ResNet50(weights='imagenet', include_top=True)
x = tf.keras.applications.resnet.preprocess_input(np.expand_dims(img, axis=0))
predictions = model_resnet.predict(x)
print('Predicted:', tf.keras.applications.resnet.decode_predictions(predictions, top=3)[0])
```

```
Predicted: [('n02321529', 'sea_cucumber', 0.9790509), ('n12144580', 'corn', 0.00899157), ('n13133613', 'ear', 0.005869923)]
```

Even this online service (snaplogic using Inception) mistakes a picture of my phone screen showing the image:

## Investigation

Let's look at the activations, after feeding the image into the VGG19 network I have been using:

```
target = [model_vgg19.get_layer("block5_conv4").output]
model_vgg19_cutoff = tf.keras.Model(inputs=model_vgg19.input, outputs=target)
x = tf.keras.applications.vgg19.preprocess_input(np.expand_dims(img, axis=0))
activations = model_vgg19_cutoff.predict(x)
plt.plot(np.mean(np.mean(np.mean(activations, axis=0), axis=0), axis=0))
```

So the question we're asking, is this the typical pattern for a dog or bison? Or maybe closer to the `sea_cucumber`

pattern, in this 512-dimensional space?

Let's have a look at the `groenendael`

(1st image in Microscope) and `sea_cucumber`

classes, as well as a few randomly selected ones. I downloaded the imagenet data and used this list to find the right files.
Hmm I don't really see a pattern by eye here, nor a similarity to above / excitation in index 4. In hindsight this makes sense, we wouldn't expect the category to be simply 1-hot encoded in activation space, because a) there is not enough room, and b) there are more layers following so I would rather think of some clusters in the high dimensional activation space. Let's maybe look some summary statistic, like the absolute distance in this 512-dim vector space.

So I take the training images, feed them into the network and read of the activations of the 512 nodes in the layer we are looking at (averaged over the 14x14 locations). Then I compute the distance as absolute distance between the vectors, 512-dimenisonal L2 norm.
The image below shows the distance between the optimized "sea_cucumber essence" image and the activations of `sea_cucumber`

training data (green), `groenendael`

(blue), and a mix of 10 random classes (100 random images each). The blue curve shows the average activation-distance between randomly selected images of different classes. The code for all the following plots can be found in `code_distances.py`

.

For context, here is the average distance between randomly selected images (grey), images from the same class (red) and images from different classes (blue): We learn three main things here:

- Generally images of the same class seem to be nearer to each other in this 512-dim space than random / different classes, but the effect is not very strong. Of course we wouldn't expect that the distance is the best measure of "closeness" between activations.
- These numbers are all waaaay smaller than the ~7k and 36k we get from the "sea_cucumber essence" image. This tells us (somewhat unsurprisingly) that that optimized image is far outside the training distribution in at least this measure.
- The
`sea_cucumber`

training data seems to give activations*slightly*closer to the "sea_cucumber essence" image -- so maybe it's just far outside the distribution but into the`sea_cucumber`

direction?

Naturally the L2-distance isn't the ideal way to reduce the 512-d space into something plot-able. One method I found is t-SNE which projects the 512-dimensions into two parameters which we can plot:
Looks like we get a nice separation (t-SNE does not know the labels) of different categories, and the "sea_cucumber essence" activations tend to lie within the `sea_cucumber`

training data!

This doesn't definitely answer the question, but I think it's clear that this node4-maximized image ends up in a corner of parameter space which, even though it is "far away" (L2 distance), lies in a region that is clearly near the region that `sea_cucumber`

training images lie in. Presented with this out-of-distribution image, and tasked with choosing between only the existing categories, the network decides for `sea_cucumber`

.

# Feature vizualization

[*My project for AGI Safety Fundamentals programme (~Oct 2021). Code & pictures below. Cross-posted on LessWrong.*]

To me, reading about Feature Visualization felt like one of the most revealing insights about CNNs in the last years. Seeing the idea "this node finds eyes, this node finds mouths, the combination detects faces" (oversimplified) actually implemented by the CNN was a pleasant surprise, as in, it suggests we might actually understand how NNs work. There's more reading on the programme website here, I can highly recommend the articles by Chris Olah's group on distill.

Seeing this I think many of us immediately want to try this, and play around with it. There is of course the OpenAI Microscope to look at results, and the Lucid library, but I wanted to actually reproduce the idea myself without relying on a somewhat black box (big library / OpenAI Microscope).

Almost all tutorials I found however used Lucid, and this really cool write-up "How to visualize convolutional features in 40 lines of code" unfortunately starts with `from fastai.conv_learner import *`

. In retrospective I think I could understand this now, but I didn't, and finding out which parts were fastai functions and what they do was rather tricky. I also didn't manage to install the required (older) version of fastai.

So I decided to have a go myself, and, luckily, I found that "DeepDream" is based a very similar idea and I could adopt most code from this notebook from Google AI. This isn't actually too complicated, especially broken down to the bare minimum:

- A trained network whose features we want to visualize
- A loop to maximize the activation of a targeted node
- A few lines to make and show an image.

The whole code runs in about a minute on my laptop (no GPU).

1) The first part is easy, we get the pre-trained network from tensorflow.

```
import tensorflow as tf
base_model = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
```

2) The next part in the code (it's mostly comments really), see the comments marked with `#`

for explanations:

```
def maximize_activation(starting_img,\
target_layer="mixed0", target_index=0,\
steps=10, step_size=0.1):
# Take the network and cut it off at the layer we want to analyze,
# i.e. we only need the part from the input to the target_layer.
target = [base_model.get_layer(target_layer).output]
part_model = tf.keras.Model(inputs=base_model.input, outputs=target)
# The next part is the function to maximize the target layer/node by
# adjusting the input, equivalent to the usual gradient descent but
# gradient ascent. Run an optimization loop:
def gradient_ascent(img, steps, step_size):
loss = tf.constant(0.0)
for n in tf.range(steps):
# As in normal NN training, you want to record the computation
# of the forward-pass (the part_model call below) to compute the
# gradient afterwards. This is what tf.GradientTape does.
with tf.GradientTape() as tape:
tape.watch(img)
# Forward-pass (compute the activation given our image)
activation = part_model(tf.expand_dims(img, axis=0))
# The activation will be of shape (1,N,N,L) where N is related to
# the resolution of the input image (assuming our target layer is
# a convolutional filter), and L is the size of the layer. E.g. for a
# 256x256 image in "block4_conv1" of VGG19, this will be
# (1,32,32,512) -- we select one of the 512 nodes (index) and
# average over the rest (you can average selectively to affect
# only part of the image but there's not really a point):
loss = tf.math.reduce_mean(activation[:,:,:,target_index])
# Get the gradient, i.e. derivative of "loss" with respect to input
# and normalize.
gradients = tape.gradient(loss, img)
gradients /= tf.math.reduce_std(gradients)
# In the final step move the image in the direction of the gradient to
# increate the "loss" (our targeted activation). Note that the sign here
# is opposite to the typical gradient descent (our "loss" is the target
# activation which we maximize, not something we minimize).
img = img + gradients*step_size
img = tf.clip_by_value(img, -1, 1)
return loss, img
# Preprocessing of the image (converts from [0..255] to [-1..1]
img = tf.keras.applications.inception_v3.preprocess_input(starting_img)
img = tf.convert_to_tensor(img)
# Run the gradient ascent loop
loss, img = gradient_ascent(img, tf.constant(steps), tf.constant(step_size))
# Convert back to [0..255] and return the new image
img = tf.cast(255*(img + 1.0)/2.0, tf.uint8)
return img
```

3) Finally apply this procedure to a random image:

```
import numpy as np
import matplotlib.pyplot as plt
starting_img = np.random.randint(low=0,high=255,size=(300,300,3), dtype=np.uint8)
optimized_img = maximize_activation(starting_img, target_layer="block4_conv1", target_index=47, steps=10, step_size=0.1)
plt.imshow(np.array(optimized_img))
```

And here we go!

Looks like features. Now let's try to reproduce one of the OpenAI microscope images, node 4 of layer block4_conv1 -- here is my version:

And the OpenAI Microscope image:

Not identical, but clearly the same feature in both visualizations!

Finally here is a run with InceptionV3, just for the pretty pictures, this time starting with a non-random (black) image. And an animation of the image after every iteration.

Note: There's an *optional* bit to improve the speed (by about a factor of 2 on my laptop), just add this decorator in front of the `gradient_ascent`

function:

```
@tf.function(
# Decorator to increase the speed of the gradient_ascent function
input_signature=(
tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
tf.TensorSpec(shape=[], dtype=tf.int32),
tf.TensorSpec(shape=[], dtype=tf.float32),)
)
def gradient_ascent(img, steps, step_size): ...
```

This is basically how far I got in the time, the code can be found on my GitHub (link). But I do plan to look at some more interpretability techniques (maybe something for transformers or RL?) or more general AGI Safety ideas in the future!

Feel free to post a comment or send me a message if you have any questions or anything really, happy to chat about these things!

*I just want to thank the organizers of the AGI Safety Fundamentals programme again, for setting up the programme and all their support. I can highly recommend the programme, as well as the well-curated curriculum here if you just want to read through it yourself.*

# Planck cmb likelihoods

The Planck spacecraft is a satellite built to measure the Cosmic Microwave Background, the microwave radiation reaching us from the edge of the observable universe. Its goal is to detect tiny fluctuations (on the scale of 1 part in 100,000) of this radiation from different directions.

We use these observations to compare out theoretical models to reality. But because those fluctuations are based on randomness (quantum fluctuations in the early universe) we cannot predict the actual pattern on the sky. What we can predict, and test, are certain statistical properties. Most commonly we use the *multipoles* (usually referred to as the power spectrum) -- what are those? They describe how correlated or different the radiation in directions looks, e.g. *dipole* relating to the correlation of opposite directions. Another way to think of this is the correlation of two directions at a certain (angular) distance. The dipole describes correlation of points 180 degrees apart, quadrupole 90 degrees etc. The 180th moment describes points separated by an angle of just 1 degree and this turns out to be the "strongest" correlation. In simple terms, we could say the circumstances in the early universe allowed matter and energy to travel such that spots about 40 million light years apart were very correlated.

For my work I frequently compute this power spectrum and compare it to the observations of Planck. For most of my work I used the code cobaya which automatically performs the calculation internally and returns the likelihood of a set of parameters being compatible with the Planck observations. One day I ran a model which was not included in cobaya, so I had to compute the likelihood myself and found surprisingly little documentation on the likelihood.

I did some ~~reverse-engineering~~reading of the cosmoslik wrapper to find out how it works, and want to write about the results here. But before I continue, if you just want to get the likelihood of a given power spectrum use cosmoslik's `cosmoslik.likelihoods`

module! Simply pass a dictionary of power spectra and you get the corresponding loglikelihood value.

Just out of curiosity though, I wanted to figure out how exactly I can use the Planck likelihoods without any external programs, also to make sure that the wrapper was still functioning. cosmoslik's code turned out to be very helpful and largely what this is based on. So here's my writeup of how to use the clik python code (this is still a wrapper for the C & Fortran code but since it comes together with the likelihood code from the Planck team I won't dig deeper here). Firstly, assuming you have installed the likelihood and sourced the `clik_profile`

script, you can import the clik python module and load a clik likelihood file:

```
from clik import clik
lowT = clik("/path/to/baseline/plc_3.0/low_l/commander/commander_dx12_v3_2_29.clik")
```

Now the function `lowT`

can be called with a list as argument and returns the likelihood. To figure out which values to put in the list, we use the function `lowT.get_lmax()`

. The former returns `(29, -1, -1, -1, -1, -1)`

indicating it requires the TT spectra from l=0 to 29. The list maps to `(TT, EE, BB, TE, ?, ?)`

where the latter two should be `TB`

and `EB`

but I'm not sure about the order.
Next use `lowT.get_extra_parameter_names()`

to get the nuisance parameters that need to be appended to the list, in this case `('A_planck',)`

.

First let us get the power spectra, e.g. from CLASS

```
from classy import Class
cosmo = Class()
cosmo.set({'output': 'tCl pCl lCl', 'lensing': 'yes'})
cosmo.compute()
Cell = cosmo.lensed_cl()
```

Note that CLASS returns the power spectra in relative units while the likelihoods require absolute units (see Julien Lesgourgues's comment here for some context), so we convert to uKĀ² by multiplying by the mean CMB temperature squared.

```
Tcmb = 2.7255*1e6 #uK
Cl_TT = Cell['tt']*Tcmb**2
```

Also note that we just use the power spectrum C_ell, not the modified version D_ell. The latter is used as input for `cosmoslik`

.

Finally we can call the likelihood with a list of spectra and nuisance parameters:

```
A_planck = 1.000442
lowT([*Cl_TT[:30], A_planck])
```

For something close to bestfit LCDM the result should be around -12.

For a more complex example, let's look at the high-l TTTEEE likelihood. `highTTTEEE.lmax`

shows us we need (2508, 2508, -1, 2508, -1, -1), i.e. the TT, EE and TE spectra, and in that order. `highTTTEEE.extra_parameter_names`

lists a whopping 47 nuisance parameters which we have to pass in the right order as well.
Tipp: You can get those in the right order from a dictionary wit something like `nuisance_TTTEEE = [planck_nuisance_array[p] for p in highTTTEEE.extra_parameter_names]`

.
Finally `highTTTEEE(Cl_TT[:2509]+Cl_EE[:2509]+Cl_TE[:2509]+nuisance_TTTEEE)`

should return something like -1173, again for LCDM.