Accelerating Neural Network Evaluation
Neural networks work by activating the neurons in each layer based on a linear combination of the previous layer's activations, with weights determined by the network's training.
This is equivalent to the matrix operation

a' = f(W a)

where a is the previous layer's activation vector, W is the trained weight matrix, and f is the activation function applied elementwise.
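As a minimal sketch of this (using NumPy, with hypothetical weight and input values, and ReLU as one common choice of activation function):

```python
import numpy as np

def layer(W, a):
    """Evaluate one layer: apply the activation to the linear combination W @ a."""
    z = W @ a                # matrix-vector product of weights and prior activations
    return np.maximum(z, 0)  # ReLU activation, applied elementwise

# Hypothetical 2-neuron layer fed by a 2-neuron prior layer
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])
a = np.array([1.0, 1.0])
print(layer(W, a))
```

The entire forward pass is just this operation repeated once per layer, which is why speeding up matrix-vector products speeds up the whole network.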
Matrix operations are easily parallelized. Below is a table showing the number of operations needed to multiply an NxN matrix by a vector of length N, the number of cycles needed if optimally parallelized, and the number of cores to achieve optimal parallelization.
N | Operations | Parallelized Cycles | Cores Required |
---|---|---|---|
2 | 6 | 2 | 4 |
4 | 28 | 3 | 16 |
8 | 120 | 4 | 64 |
16 | 496 | 5 | 256 |
32 | 2,016 | 6 | 1,024 |
64 | 8,128 | 7 | 4,096 |
128 | 32,640 | 8 | 16,384 |
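The columns above follow closed-form formulas: 2N² − N total operations (N² multiplications plus N(N − 1) additions), 1 + log₂N cycles when optimally parallelized (one cycle for all multiplications in parallel, then a log₂N-deep tree of additions), and N² cores (one per multiplication). A quick sketch that reproduces the table:

```python
import math

def costs(n):
    """Return (operations, parallelized cycles, cores) for an NxN matrix-vector product."""
    ops = 2 * n * n - n             # N^2 multiplies + N(N-1) adds
    cycles = 1 + int(math.log2(n))  # 1 cycle of multiplies + log2(N) levels of adds
    cores = n * n                   # one core per multiplication
    return ops, cycles, cores

for n in [2, 4, 8, 16, 32, 64, 128]:
    print(n, *costs(n))
```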
Commercial CPUs top out at around 32 cores (e.g., the AMD Ryzen Threadripper 3970X), which is only enough for optimal parallelization of matrices up to 5x5 (25 cores; 6x6 already needs 36). For anything larger, we need GPUs, which offer thousands of cores.