Neural Networks for RF and Microwave Design

As seen in earlier discussions, implementing powerful second-order optimization techniques for neural network training results in significant training advantages. Second-order methods are typically much faster than BP, but may require computing or approximating an inverse Hessian matrix and storing it. For large neural networks, training becomes a very large-scale optimization problem. Decomposition is an important way to solve such problems. Several training algorithms that decompose the training process by training the neural network layer by layer have been proposed [47-49]. In [48], the weights w^L of the output layer and the output vector z^{L-1} of the previous layer are treated as two sets of variables. First, an optimal solution pair (w^L, z^{L-1}) is determined to minimize the sum-squared error between the neural network outputs and the desired outputs. The current solution z^{L-1} is then set as the desired output of the previous hidden layer, and the optimal weight vectors of the remaining hidden layers are obtained recursively.
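The layer-by-layer idea can be illustrated with a minimal sketch. It is not the algorithm of [48]; it is an alternating least-squares scheme written under several assumptions made only for illustration: a single tanh hidden layer, a linear output layer, a pseudoinverse step to obtain the hidden-layer targets, and a clipping bound of 0.95 so that those targets can be inverted through tanh. The names layerwise_fit and predict, and all parameter values, are hypothetical.

```python
import numpy as np


def predict(X, W1, W2):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    ones = np.ones((X.shape[0], 1))
    Zb = np.hstack([np.tanh(np.hstack([X, ones]) @ W1), ones])
    return Zb @ W2


def layerwise_fit(X, D, n_hidden=8, n_sweeps=20, seed=0):
    """Illustrative layer-by-layer training (assumed scheme, not that of [48]).

    X: (N, n_in) inputs, D: (N, n_out) desired outputs.
    Returns the best (W1, W2) pair found and the sum-squared error per sweep.
    """
    rng = np.random.default_rng(seed)
    ones = np.ones((X.shape[0], 1))
    Xb = np.hstack([X, ones])                      # inputs plus bias column
    W1 = rng.normal(scale=0.5, size=(Xb.shape[1], n_hidden))
    best, best_sse, history = None, np.inf, []

    for _ in range(n_sweeps):
        Z = np.tanh(Xb @ W1)                       # hidden outputs, z^{L-1}
        Zb = np.hstack([Z, ones])

        # Output layer: with z^{L-1} fixed, w^L is a linear least-squares fit.
        W2, *_ = np.linalg.lstsq(Zb, D, rcond=None)
        residual = Zb @ W2 - D
        sse = float(np.sum(residual ** 2))
        history.append(sse)
        if sse < best_sse:
            best, best_sse = (W1.copy(), W2.copy()), sse

        # Hidden targets: with w^L fixed, shift z^{L-1} by the minimum-norm
        # correction that cancels the output residual, then clip the result
        # so that it stays inside the range of tanh.
        Z_target = Z - residual @ np.linalg.pinv(W2[:-1])
        Z_target = np.clip(Z_target, -0.95, 0.95)

        # Hidden layer: treat Z_target as its desired output and fit the
        # weights by least squares on the pre-activations arctanh(Z_target).
        W1, *_ = np.linalg.lstsq(Xb, np.arctanh(Z_target), rcond=None)

    return best[0], best[1], history
```

Calling layerwise_fit(X, D) with X of shape (N, n_in) and D of shape (N, n_out) returns the weight pair with the lowest recorded error, and predict(X, W1, W2) evaluates the fitted network. The sweeps are not guaranteed to reduce the error monotonically, which is why the sketch keeps the best pair rather than the last one. With more hidden layers, the same target-propagation step would be repeated for each preceding layer.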
Linear programming can be used to solve large-scale linearized optimization problems. In [50], neural network training was linearized and formulated as a constrained linear programming problem. In that work, the weights are updated through small local changes, subject to the requirement that none of the individual sample errors increase, with the objective of maximizing the overall reduction in error. In [47], a layer-by-layer optimization of a neural network was...