Understanding how convolution of images works in CNNs

I'd like to understand convolutional neural networks. Consider the picture:

[image: CNN layer diagram]

I don't understand why the result of applying convolution to the 32×32×3 input image is not of size 32×32×3×12. From what I understand, there are 12 filters, and we apply each filter to each of the 3 channels of the image to get 3 new images per filter.

Suppose the result of the first convolution really is 32×32×12; I don't understand how you get it.

Also, what happens when we do a second convolution? Say we do a second full convolution with 5 filters; is the result of the second convolution then 32×32×3×12×5?
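The shape bookkeeping can be checked directly in NumPy. This is a minimal sketch (assuming 3×3 filters with "same" padding, which the picture may or may not use): each filter spans all 3 input channels and its products are summed into a single number per position, so 12 filters produce an output of depth 12, not 3×12.

```python
import numpy as np

H = W = 32
C_in, C_out, k = 3, 12, 3          # assumed: 3x3 filters, "same" padding

image = np.random.default_rng(0).normal(size=(H, W, C_in))
filters = np.random.default_rng(1).normal(size=(C_out, k, k, C_in))

padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))
out = np.zeros((H, W, C_out))
for f in range(C_out):
    for i in range(H):
        for j in range(W):
            # each filter multiplies a k x k x C_in patch and sums over
            # ALL input channels, producing ONE number per position
            out[i, j, f] = np.sum(padded[i:i+k, j:j+k, :] * filters[f])

print(out.shape)  # (32, 32, 12)
```

The second convolution then treats the 32×32×12 volume as its input, so its 5 filters each have depth 12 and the result is 32×32×5.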

Using FFT in the following convolution in a simulation

I have the following convolution as part of a numerical simulation.

$$T(r)=\int d^3r_2\, p(r_2)\,f(r_2)\,\alpha(r-r_2)$$

My problem is that the analytical expressions for $f$ and $p$ exist, but I have the expression for $\alpha$ only in the Fourier domain, in the form $\alpha(k)$. I planned to evaluate it in the following way:

  1. Construct a $100\times100\times100$ grid using `meshgrid` and `linspace` in NumPy:

```python
ran = linspace(-1, 1, N_r)
x, y, z = meshgrid(ran, ran, ran)  # position space
```

  2. Construct the components `xf`, `yf`, `zf` in the Fourier domain from `x`, `y`, `z`:

```python
xf = fftn(x)
yf = fftn(y)
zf = fftn(z)
```

  3. Find the Fourier transform of $f(r)\times p(r)$ using `fftn` in NumPy.
  4. Multiply it with $\alpha(k)$.
  5. Take the inverse Fourier transform using `ifftn` in NumPy.

I am not very sure that the above method works, and I have failed to verify it properly. I tried using scipy.ndimage.convolve to compare its results with the inverse Fourier transform of the product in the Fourier domain. Is what I am doing in the code correct? And is there a way to verify that the method works on a simpler example?

Trying to verify:

I have tried the following to test the theory, but it does not seem to work. I expect the results res_1 and res_2 to be the same. I also used the function real to truncate the tiny imaginary part that results from the fftn and ifftn calls.

```python
x = linspace(-1, 1, 10)
xf = fftn(x)

def f(x):
    return x**2 + x**3*sin(x)

def g(k):
    return k**2 + k**3/(3 - k**2)

g_k = g(xf)
g_x = real(ifftn(g_k))

res_1 = img_con(g_x, f(x))
res_2 = real(ifftn(g(xf)*fftn(f(x))))

print(res_1)
print(res_2)
```

Am I doing something wrong?
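One way to sanity-check the theory on a simpler example: the DFT convolution theorem is exact for *circular* convolution, so a direct circular sum must match the FFT route to machine precision (note that `scipy.ndimage.convolve` computes a linear convolution with boundary handling, so it will only agree with the FFT result if a wrap-around boundary mode is used). A minimal 1-D check in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=16)
g = rng.normal(size=16)

# direct circular convolution: (f*g)[n] = sum_m f[m] g[(n-m) mod N]
direct = np.array([sum(f[m] * g[(n - m) % 16] for m in range(16))
                   for n in range(16)])

# FFT route: pointwise multiplication in the Fourier domain
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

print(np.allclose(direct, via_fft))  # True
```

If these two agree but your 3-D pipeline does not, the discrepancy is likely in how $\alpha(k)$ is sampled on the FFT frequency grid, not in the convolution theorem itself.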

Why is my image convolution function so slow?

I wasn't sure whether to post this on the machine learning board or this one, but I chose this one since my problem has more to do with optimization. I am trying to build a YOLO model from scratch in Python, but each convolution operation takes 10 seconds. Clearly I am doing something wrong, as YOLO is supposed to be super fast (able to produce results in real time). I don't need the network to run in real time, but training will be a nightmare if it takes several hours to run on one image. Please help me!

Here is my convolution function:

```python
import numpy as np

def convolve(image, filter, stride, modifier):
    new_image = np.zeros([image.shape[0],
                          _round((image.shape[1] - filter.shape[1]) / stride) + 1,
                          _round((image.shape[2] - filter.shape[2]) / stride) + 1], float)

    # convolve
    for channel in range(0, image.shape[0]):
        filterPositionX = 0
        filterPositionY = 0
        while filterPositionX < image.shape[1] - filter.shape[1] + 1:
            while filterPositionY < image.shape[2] - filter.shape[2] + 1:
                sum = 0
                for i in range(0, filter.shape[1]):
                    for j in range(0, filter.shape[2]):
                        if filterPositionX + i < image.shape[1] and filterPositionY + j < image.shape[2]:
                            sum += image[channel][filterPositionX + i][filterPositionY + j] * filter[channel][i][j]
                new_image[channel][int(filterPositionX / stride)][int(filterPositionY / stride)] = sum * modifier
                filterPositionY += stride
            filterPositionX += stride
            filterPositionY = 0

    # condense
    condensed_new_image = np.zeros([new_image.shape[1], new_image.shape[2]], float)
    for i in range(0, new_image.shape[1]):
        for j in range(0, new_image.shape[2]):
            sum = 0
            for channel in range(0, new_image.shape[0]):
                sum += new_image[channel][i][j]
            condensed_new_image[i][j] = sum

    condensed_new_image = np.clip(condensed_new_image, 0, 255)

    return condensed_new_image
```

Running the function on a 448×448 grayscale image with a 7×7 filter and a stride of 2 takes about 10 seconds. My computer has an i7 processor.
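For comparison, the same computation can be vectorized so the hot loops run in compiled NumPy code instead of the Python interpreter. The sketch below (a hypothetical `convolve_fast`; it omits the `modifier` factor, keeps the loop's no-flip, valid-window convention, and folds the channel "condense" step into one `einsum`) uses `numpy.lib.stride_tricks.sliding_window_view`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve_fast(image, kern, stride):
    # image: (channels, H, W); kern: (channels, kh, kw)
    # gather every kh x kw window at every position: (C, H', W', kh, kw)
    windows = sliding_window_view(image, kern.shape[1:], axis=(1, 2))
    windows = windows[:, ::stride, ::stride]
    # multiply-accumulate over channels and kernel entries at once;
    # summing over 'c' also performs the channel "condense" step
    out = np.einsum('chwij,cij->hw', windows, kern)
    return np.clip(out, 0, 255)
```

On a 448×448 image with a 7×7 kernel this runs in milliseconds rather than seconds, because the per-pixel Python loop overhead disappears.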

Origin of the convolution theorem

I am a chemist with some interest in signal processing. We occasionally use deconvolution to remove the instrument response from the desired signals. I am looking for the earliest reference that proposed the convolution theorem often utilized in signal processing (i.e., that convolution becomes a multiplication in the Fourier domain).

The Earliest Known Uses of Some of the Words of Mathematics website gives a lot of detail on the word convolution, but who was the first person to specifically show the above-mentioned property: the connection of the Fourier transform with convolution?

Here is the history of Convolution Operation: https://pulse.embs.org/january-2015/history-convolution-operation/

However, a mathematician privately disagreed with the historical account given in this article.


Training of two 3×3 convolution layers vs training one 5×5 convolution layer

I’m not 100% sure this is the right stackexchange, please feel free to redirect me to another one.

I know that two 3×3 convolution layers can be equivalent to one 5×5 convolution layer.

I also know that in many cases (maybe all of them) the training of both options is equivalent (in terms of results, not speed or optimisation).
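The equivalence of the linear parts is easy to check numerically: convolving the two 3×3 kernels together yields the effective 5×5 kernel. (This ignores biases and assumes no nonlinearity between the two layers; with a ReLU in between, the equivalence no longer holds.) A sketch with a hypothetical FFT-based helper:

```python
import numpy as np

def conv2d_full(a, b):
    # full linear 2-D convolution via zero-padded FFTs (NumPy only)
    s = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
    return np.real(np.fft.ifft2(np.fft.fft2(a, s) * np.fft.fft2(b, s)))

rng = np.random.default_rng(0)
k1 = rng.normal(size=(3, 3))
k2 = rng.normal(size=(3, 3))
x = rng.normal(size=(16, 16))

# without a nonlinearity between them, two 3x3 convolutions equal one
# convolution with the 5x5 kernel obtained by convolving the kernels
k_eff = conv2d_full(k1, k2)                      # shape (5, 5)
two_step = conv2d_full(conv2d_full(x, k1), k2)
one_step = conv2d_full(x, k_eff)
print(np.allclose(two_step, one_step))  # True
```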

Suppose I have a dataset on which I KNOW that no information whatsoever can be derived from any 3×3 window, but it can be from a 5×5 window.

If I wanted to, I could train a network with a 5×5 convolution layer, then manually transform it into two 3×3 convolution layers.

My question is: could I train a network directly with two 3×3 convolution layers?

My reasoning: what could the first convolution learn apart from overfitting the training dataset? Supposedly nothing. So the only layer that could learn some generalization of my data is the second one. But then my problem reduces to comparing ONE 3×3 convolution versus ONE 5×5 convolution.

In the general case, the 5×5 convolution would perform better. But with my restrictions on the dataset, maybe it becomes equivalent to the 3×3 convolution? In that case I could train the 3×3 convolution.

Thank you!

Continuity of convolution on $\mathcal{D}'_+$

Let $\mathcal{D}'_+:=\{T\in \mathcal{D}'(\mathbb{R}): \operatorname{supp}(T)\subset [0,\infty)\}$. Here $\mathcal{D}'(\mathbb{R})$ is the usual space of distributions on $\mathbb{R}$, equipped with the weak-$\ast$ topology induced by $\mathcal{D}(\mathbb{R})$, and $\mathcal{D}'_+$ is given the subspace topology induced from $\mathcal{D}'(\mathbb{R})$.

Question: Is convolution $\ast:\mathcal{D}'_+ \times \mathcal{D}'_+\rightarrow \mathcal{D}'_+$ separately continuous?

How to treat the output values after convolution with a Sobel kernel

I'm trying to write a program (to practice C++) that can detect the edges of an image. I'm using SFML to load the image.

I tried the Sobel operation and got confused by the output values. I take the intensity value of each pixel around my current pixel, multiply it by the corresponding value of the Sobel kernel, and add those products together.

My problem is that the input intensity values range from 0 to 255, but my output value can range from some negative value (it can be smaller than -255) to some positive value (it can be bigger than 255). I couldn't find any hint on how to convert these into greyscale values. My input image is greyscale, meaning r, g, b are equal. For the Sobel matrix/kernel I used a custom matrix class.

I hope the way I wrote this question isn't too confusing, and that I have provided enough code to show my problem.

```cpp
// The Sobel kernel for x looks like this:
int kernel_x[] = { 1, 0, -1,
                   2, 0, -2,
                   1, 0, -1 };

for (int y = 1; y < height - 1; y++)
{
    for (int x = 1; x < width - 1; x++)
    {
        int sum = 0;
        for (int i = -1; i <= 1; i++)
        {
            for (int j = -1; j <= 1; j++)
            {
                sum += img.getPixel(x + j, y + i).r * sobelkernel_x.at(j + 1, i + 1);
            }
        }
        //std::cout << sum << std::endl;
        cur_img->setPixel(x, y, sf::Color(sum, sum, sum));
    }
}
```
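For context on the range: with the 3×3 Sobel-x kernel, the positive and the negative weights each sum to 4, so on 8-bit input the raw response lies in [-1020, 1020]. Two common (but by no means mandatory) display conventions, sketched here in NumPy rather than C++ with hypothetical sample values, are clamping the magnitude or rescaling it:

```python
import numpy as np

# hypothetical raw Sobel-x responses; the true range on 8-bit input
# is [-1020, 1020] because the kernel weights sum to +/-4 per sign
raw = np.array([[-1020, -510, 0],
                [  255,  510, 1020]])

# option 1: magnitude clamped to 8 bits (simple, but saturates early)
clamped = np.clip(np.abs(raw), 0, 255).astype(np.uint8)

# option 2: magnitude rescaled from [0, 1020] down to [0, 255]
scaled = (np.abs(raw) * 255 // 1020).astype(np.uint8)
```

Either way, the sign is discarded for display; if the signed gradient is needed later (e.g. for gradient direction), keep the raw values in a wider integer type.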

Convolution and approximate heat kernel

I have the following question: let $A, B\in C^\infty((0,\infty)\times M\times M)$, where $M$ is a compact manifold. Define
\begin{align}
(A*B)(t,x,y) = \int_0^t\int_M A(t-s,x,z)B(s,z,y)\,dz\,ds.
\end{align}
For this to make sense, a sufficient condition is that actually $A, B\in C^\infty([0,\infty)\times M\times M)$. Suppose furthermore that $A(0,x,y) = \delta(x-y)$. Then the following holds:
\begin{align}
(\partial_t-\Delta_x)(A*B)(t,x,y) = B(t,x,y) + ((\partial_t-\Delta_x)A*B)(t,x,y).
\end{align}
See, for example, Isaac Chavel, Eigenvalues in Riemannian Geometry, page 153, Lemma 2. My question is: could this identity hold without assuming continuity up to $t=0$, but instead integrability? For example, if we assume that $A(t,x,y) = (4\pi t)^{-n/2}e^{-d^2(x,y)/4t}$ and $B$ is integrable, could this identity still hold?

L1 distance after Convolution

Given two discrete distributions $P$ and $Q$ with the same support $x_1,\cdots,x_n$, assume $K \in L^1(\mathbb{R})$ is a nonnegative function with $\int_\mathbb{R} K(x)\,dx = 1$, and let $K_h(x) = \frac{1}{h}K(\frac{x}{h})$.

I am wondering whether the following result holds:
$$\left| \int_\mathbb{R} |P * K_h(x) - Q * K_h(x)|\,dx - \sum_i|P_i - Q_i| \right| \rightarrow 0 \quad \text{as } h\rightarrow 0\,?$$
where $P*K_h = \sum_i P_iK_h(x-x_i)$ is the convolution between $P$ and $K_h$. In other words, I would like $\left\Vert P*K_h - Q*K_h\right\Vert_1$ to be close to $\left\Vert P-Q\right\Vert_1$ when $h$ is small.

Actually we have
$$\int_\mathbb{R} |P * K_h(x) - Q * K_h(x)|\,dx \leq \sum_i|P_i - Q_i| \int K_h(x-x_i)\,dx = \sum_i|P_i - Q_i|,$$
but I do not know what happens in the opposite direction.
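A quick numerical experiment (with an assumed Gaussian $K$ and a Riemann-sum approximation of the integral) suggests the convergence does hold once $h$ is small relative to the spacing of the support points:

```python
import numpy as np

# two discrete distributions on the same support
x = np.array([0.0, 1.0, 2.0])
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.5, 0.3])
l1_discrete = np.abs(P - Q).sum()          # 0.6

grid = np.linspace(-5.0, 7.0, 120001)
dx = grid[1] - grid[0]

def smoothed_l1(h):
    # P*K_h(t) = sum_i P_i K_h(t - x_i), with K a standard Gaussian
    K = lambda t: np.exp(-t**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))
    p = sum(Pi * K(grid - xi) for Pi, xi in zip(P, x))
    q = sum(Qi * K(grid - xi) for Qi, xi in zip(Q, x))
    return np.abs(p - q).sum() * dx

for h in [1.0, 0.3, 0.1, 0.03]:
    print(h, smoothed_l1(h))   # approaches 0.6 as h shrinks
```

Intuitively, once the bumps $K_h(\cdot - x_i)$ are essentially disjoint, no cancellation between $P_i - Q_i$ of opposite signs is possible and the two norms coincide; this is only numerical evidence, not a proof.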

I am also wondering whether the same statement still holds when the $\ell_1$ distance is replaced with other distances, like the Jensen–Shannon divergence.

Any hint would be appreciated.