Accelerating Neural Network Evaluation

Neural networks work by activating the neurons in each row based on a linear combination of the prior row's activations, using weights determined by the network's training.

This is equivalent to a matrix operation: multiplying the matrix of trained weights by the vector of activations from the prior row.
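
As a minimal sketch of that product, the snippet below evaluates one row of neurons from the prior row in plain Python. The function names, the example weights, and the choice of ReLU as the activation are all assumptions made for illustration; the page itself does not name an activation function.

```python
def relu(x):
    """Example activation (an assumption; the page does not name one)."""
    return max(0.0, x)


def evaluate_layer(weights, prior_row, activation=relu):
    """Return the next row's activations.

    Each output neuron is the activation applied to the dot product of
    one row of the weight matrix with the prior row's activations.
    """
    return [
        activation(sum(w * a for w, a in zip(weight_row, prior_row)))
        for weight_row in weights
    ]


# Hypothetical 2x2 weight matrix and 2-element prior row, for illustration only.
weights = [[0.5, -1.0],
           [2.0, 1.0]]
prior_row = [1.0, 2.0]

next_row = evaluate_layer(weights, prior_row)
print(next_row)  # [0.0, 4.0]
```

Every dot product here is independent of the others, which is what makes the operation a good candidate for parallel hardware.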

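Matrix operations are easily parallelized. Below is a table showing the number of operations needed to multiply an NxN matrix by a vector of length N (N^2 multiplications plus N(N-1) additions), the number of cycles needed if optimally parallelized (one cycle for all the multiplications, then log2(N) cycles of pairwise additions), and the number of cores required to achieve optimal parallelization (one per multiplication, N^2 in total).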

N	Operations	Parallelized Cycles	Cores Required
2	6	2	4
4	28	3	16
8	120	4	64
16	496	5	256
32	2,016	6	1,024
64	8,128	7	4,096
128	32,640	8	16,384
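
The figures in the table can be reproduced with a short script. This is a sketch under assumptions inferred from the table values rather than stated on the page: each core performs one multiplication or one addition per cycle, all N^2 products are computed in the first cycle, and each row's products are then summed pairwise. The function names are hypothetical.

```python
def matvec_costs(n):
    """Return (operations, parallel_cycles, cores) for an n x n matrix times a length-n vector."""
    multiplications = n * n      # one product per matrix entry
    additions = n * (n - 1)      # n - 1 additions to sum each of the n rows
    operations = multiplications + additions

    cores = n * n                # one core per product, all computed at once

    # Cycle 1: all products in parallel.
    # Remaining cycles: pairwise (tree) sum of the n products in each row,
    # halving the number of partial sums on every cycle.
    cycles = 1
    remaining = n
    while remaining > 1:
        remaining = (remaining + 1) // 2
        cycles += 1

    return operations, cycles, cores


print("N\tOperations\tParallelized Cycles\tCores Required")
for n in (2, 4, 8, 16, 32, 64, 128):
    operations, cycles, cores = matvec_costs(n)
    print(f"{n}\t{operations:,}\t{cycles}\t{cores:,}")
```

The operation count grows roughly as 2N^2, while the parallel cycle count grows only as log2(N), provided N^2 cores are available.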

Commercial CPUs top out at around 32 cores (for example, the AMD Ryzen Threadripper 3970X), so for anything larger than a 5x5 matrix, which already needs 25 cores, optimal parallelization requires GPUs, which have many more cores.
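
The 5x5 limit follows from needing N^2 cores: the largest N that fits is the integer square root of the core count. A small sketch of that arithmetic, with a hypothetical function name:

```python
import math


def largest_fully_parallel_n(cores):
    """Largest N for which N * N cores are available (one core per product)."""
    return math.isqrt(cores)


print(largest_fully_parallel_n(32))  # 5, since 5*5 = 25 <= 32 < 36 = 6*6
```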
