Abstract
In this context we have studied neural networks, radial basis functions and Sugeno Controllers (fuzzy control); below we present some results on neural networks.
Learning Theory
Contact: andreas.hofinger@oeaw.ac.at
In a general setup, a neural network can be seen as a parameterized, typically nonlinear, mapping used in function approximation, classification and optimization problems. A popular class are so-called feedforward neural networks with one hidden layer and a linear output layer, which, with activation function \sigma, can be written in the form

\Psi_n(x) = \sum_{i=1}^{n} c_i \, \sigma( \langle w_i, x \rangle + \theta_i ).   (1)
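A minimal numerical sketch of evaluating such a single-hidden-layer network may clarify the structure; all parameter values below are hypothetical, and the sigmoid is just one common choice of activation function:

```python
import numpy as np

def sigma(t):
    """Sigmoidal activation function (a common, but not the only, choice)."""
    return 1.0 / (1.0 + np.exp(-t))

def network(x, w, theta, c):
    """Evaluate a single-hidden-layer feedforward network with linear output:
    Psi(x) = sum_i c_i * sigma(<w_i, x> + theta_i)."""
    return c @ sigma(w @ x + theta)

# Hypothetical parameters for a network with n = 3 hidden nodes and 2-d input.
w = np.array([[1.0, -0.5], [0.3, 0.8], [-1.2, 0.4]])   # inner weights w_i
theta = np.array([0.1, -0.2, 0.0])                      # thresholds theta_i
c = np.array([0.5, -1.0, 2.0])                          # outer weights c_i

xp = np.array([0.2, 0.7])
y = network(xp, w, theta, c)                            # a scalar output
```

Note that the output is linear in the outer weights c, but nonlinear in the inner weights w and thresholds theta; this nonlinearity is what the training process has to cope with.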
The essential difference to classical discretization methods like splines or finite elements (FE) is that no parameter is chosen a priori (like, e.g., the grid points in FE); instead, all of them are determined via a nonlinear optimization procedure, the so-called training or learning process. The major advantage of neural networks is their high approximation capability: for a large class of functions, a dimension-independent rate of convergence is attainable. This rate can be achieved only because of the inherent nonlinearity.
Finding optimal parameters of the network can be seen as a nonlinear parameter identification problem, although typically there is no interest in the parameters themselves, but mainly in the output of the network. It turns out that this problem is asymptotically ill-posed, which is not too surprising given the similarity to parameter identification: small data errors may lead to arbitrarily large errors in the optimal parameters as the number of nodes in the network tends to infinity. In addition, there is also a kind of nonlinear ill-posedness which is not asymptotic in nature: even networks consisting of only two nodes can lead to instabilities with respect to the parameters in (1) if the nodes get too close. For general, nonsmooth (e.g., piecewise linear) activation functions it is also possible to construct examples where the process is unstable when one is interested in the output measured in higher Sobolev norms (again even for finite networks).
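The instability of two close nodes can be illustrated numerically. In the sketch below (a sigmoid activation and a difference-quotient construction, both illustrative choices), the outputs of a family of two-node networks converge to a fixed function while the outer weights blow up like 1/h; conversely, small perturbations of the output correspond to arbitrarily large parameter changes:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

# Two-node network c1*sigma(w1*x) + c2*sigma(w2*x) with w2 = w1 + h.
# Choosing c1 = -1/h, c2 = 1/h, the OUTPUT converges (as h -> 0) to the
# fixed function x * sigma'(w1*x), while the PARAMETERS grow like 1/h.
x = np.linspace(-3.0, 3.0, 200)
w1 = 1.0
s = sigma(w1 * x)
target = x * s * (1.0 - s)           # x * sigma'(w1 * x)

for h in (1e-1, 1e-3, 1e-5):
    out = (-1.0 / h) * sigma(w1 * x) + (1.0 / h) * sigma((w1 + h) * x)
    print(h, np.max(np.abs(out - target)))   # output error shrinks ...
    # ... while the outer weights +-1/h grow without bound.
```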
For these reasons, the training of a neural network has to be regularized. Various regularization techniques have been applied to the illposed training process; these methods can be divided into
Tikhonov-type methods and
iterative regularization techniques.
In a first step, the regularization methods "weight decay" and "output smoothing" were investigated, and appropriate parameter choice rules were developed. Although these methods provide stability results and convergence rates, they do not exploit the underlying structure of the neural network, but treat it merely as the output of some (highly) nonlinear operator.
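As an illustration, a weight-decay objective of the Tikhonov type might look as follows; the concrete parameterization and the tanh activation are our illustrative choices, not the ones from the underlying papers:

```python
import numpy as np

def sigma(t):
    return np.tanh(t)   # smooth activation, an illustrative choice

def psi(x, p, n):
    """Network output on scalar inputs x, parameters p = (c, w, theta)."""
    c, w, theta = p[:n], p[n:2 * n], p[2 * n:]
    return sigma(np.outer(x, w) + theta) @ c

def weight_decay_objective(p, x, y, alpha, n):
    """Tikhonov-type 'weight decay': data misfit plus alpha * ||parameters||^2.
    The penalty (alpha > 0) stabilizes the otherwise ill-posed training."""
    residual = psi(x, p, n) - y
    return residual @ residual + alpha * (p @ p)

# Hypothetical usage: evaluate the objective for an n = 4 node network.
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x)                      # noisy samples would appear here
p0 = 0.1 * np.ones(3 * 4)                  # initial parameter vector
value = weight_decay_objective(p0, x, y, alpha=0.1, n=4)
```

This objective can then be handed to any nonlinear optimizer; the parameter choice rules mentioned above concern how alpha should depend on the noise level.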
Therefore, the natural next step is the investigation of iterative methods, such as Landweber's method or the Gauß-Newton method, which are relatives of the widely used (but unstable!) backpropagation algorithm. The typical nonlinearity or tangential cone conditions which guarantee convergence of these methods are not fulfilled if the network consists of more than one node ("if all neurons learn at the same time"). To overcome this problem, a so-called greedy algorithm was developed.
Adaptive Greedy Algorithm
In a greedy algorithm the size of the network is increased by only one node in each step; the nonlinear optimization process is thus split into a sequence of low-dimensional problems. Together with an a posteriori stopping rule, which restricts the final size of the network, convergence of the approximations can be shown for the case of noisy data.
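The structure of such a greedy step can be sketched as follows. This is only a schematic illustration, not the algorithm from the cited papers: the single-node subproblem is solved here by a crude random search, and the stopping rule is a simple discrepancy-type criterion with hypothetical constant tau:

```python
import numpy as np

def sigma(t):
    return np.tanh(t)

def best_node(x, residual, rng, trials=2000):
    """Low-dimensional subproblem: random search for a single node
    c * sigma(w*x + theta) fitting the current residual (illustration only)."""
    best, best_err = (0.0, 1.0, 0.0), np.inf
    for _ in range(trials):
        w, theta = rng.uniform(-3.0, 3.0, 2)
        basis = sigma(w * x + theta)
        denom = basis @ basis
        if denom < 1e-12:
            continue
        c = (residual @ basis) / denom      # optimal outer weight, closed form
        err = np.sum((residual - c * basis) ** 2)
        if err < best_err:
            best, best_err = (c, w, theta), err
    return best

def greedy_train(x, y, delta, tau=1.5, max_nodes=25, seed=0):
    """Greedy training sketch: add one node per step by fitting the residual;
    stop a posteriori once ||residual|| <= tau * delta (delta = noise level),
    which also restricts the final size of the network."""
    rng = np.random.default_rng(seed)
    nodes, residual = [], y.astype(float).copy()
    while len(nodes) < max_nodes and np.linalg.norm(residual) > tau * delta:
        c, w, theta = best_node(x, residual, rng)
        nodes.append((c, w, theta))
        residual = residual - c * sigma(w * x + theta)
    return nodes
```

The stopping index (the final number of nodes) thus plays the role of the regularization parameter.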
Although this approach in principle yields stable approximations, it is not ready for numerical implementation, since it requires unavailable information about the noise-free data (comparable to a bound on a certain norm).
Therefore an alternative, adaptive method was developed. It is feasible because the smoothness information above is not needed a priori, but is recovered adaptively from the data. The resulting algorithm leads to stable approximations and provides an optimal convergence rate as the noise level tends to zero.
Open Goals
Inverse Problems
Over the last years, neural networks have also become a frequently used method for solving inverse problems arising in many contexts, from geophysics to medical and economic applications. A sound theoretical basis for using neural networks as algorithms for solving such problems is still missing.
Generalization Error
Neural networks are typically used to approximate functions that are only known at a finite number of sampling points. So far, regularization has been investigated for the case of continuous data only; it therefore seems necessary to develop proper modifications of these methods to cover the case of discrete data. An important task in this context is the investigation of the generalization error, the type of error caused by incomplete or partially missing measurements. In this case, instead of the continuous functional, a discrete approximation based on the sampling points has to be minimized.
Nonsmooth Regularization
Another point of interest is the analysis of different (Tikhonov-type) regularization methods, e.g., the maximum entropy method or BV-regularization, which seem more appropriate for the analysis of neural networks with nonsmooth activation functions, such as the Heaviside function or low-order splines.
Network Architecture
An overall question to answer is the choice of the optimal network architecture. So far, we have derived methods for determining the optimal size of single-layer networks (corresponding to the stopping index of the greedy algorithm). As a next step, multilayer networks can also be considered. It is not obvious whether methods such as the greedy algorithm can be transferred to these more complex structures.
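The generalization-error goal above contrasts the continuous misfit functional with its discrete, sample-based approximation. A small numerical sketch of the gap between the two (f and the network output psi below are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):           # the function; in practice known only at sample points
    return np.sin(np.pi * x)

def psi(x):         # some fixed (hypothetical) network output
    return np.tanh(2.0 * x)

# Continuous misfit J = int_0^1 (f - psi)^2 dx, approximated on a dense grid.
xx = np.linspace(0.0, 1.0, 100001)
J = np.mean((f(xx) - psi(xx)) ** 2)

# Versus the discrete functional based on m random samples; the gap between
# the two is (a simple instance of) the generalization error and shrinks
# as the number of samples m grows.
for m in (10, 100, 10000):
    xs = rng.uniform(0.0, 1.0, m)
    J_m = np.mean((f(xs) - psi(xs)) ** 2)
    print(m, abs(J_m - J))
```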
 U. Bodenhofer, M. Burger, H. W. Engl, J. Haslinger, Regularized data-driven construction of fuzzy controllers, Journal of Inverse and Ill-posed Problems, 10 (2002), pp. 319-344.
 M. Burger, H. W. Engl, Training neural networks with noisy data as an ill-posed problem, Advances in Computational Mathematics, 13 (2000), pp. 335-354.
 M. Burger, A. Hofinger, Regularized greedy algorithms for network training with data noise, Computing, to appear (2004).
 M. Burger, A. Neubauer, Analysis of Tikhonov regularization for function approximation by neural networks, Neural Networks, 16 (2003).
 M. Burger, A. Neubauer, Error bounds for approximation with neural networks, Journal of Approximation Theory, 112 (2001), no. 2, pp. 235-250.
 A. Hofinger, Iterative regularization and training of neural networks, Diplomarbeit, Johannes Kepler University Linz, 2003.
 A. Hofinger, Nonlinear function approximation: Computing smooth solutions with an adaptive greedy algorithm, SFB Report 04-19, Johannes Kepler University Linz, SFB "Numerical and Symbolic Scientific Computing", 2004.
Uncertainty
Contact: andreas.hofinger@oeaw.ac.at
Another aspect which we want to address is "uncertainty", currently a hot topic in connection with inverse problems, especially in the U.S. While we (and most of the functional-analysis-based inverse problems community) treat the error in the data (and also in the model) via a bound in some function space norm, there a different, namely stochastic, error concept is used. Convergence then cannot be studied by giving error estimates in some norm; instead, one has to consider different convergence concepts for random variables.
One standard approach here is to replace convergence of deterministic quantities by convergence of the root mean square error (RMSE).
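Written out, this requirement reads as follows (the symbols x_\alpha^\delta for the regularized solution and x^\dagger for the exact one are our notation, not the document's):

```latex
% Deterministic convergence  \|x_\alpha^\delta - x^\dagger\| \to 0  is replaced
% by convergence of the root mean square error of the random variable x_\alpha^\delta:
\mathrm{RMSE}\bigl(x_\alpha^\delta, x^\dagger\bigr)
  \;=\; \Bigl( \mathbb{E}\bigl[ \, \| x_\alpha^\delta - x^\dagger \|^2 \bigr] \Bigr)^{1/2}
  \;\longrightarrow\; 0
  \qquad \text{as } \delta \to 0 .
```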
We consider a different approach: instead of convergence of random variables, the convergence of the underlying probability distributions is investigated. The proper tool for metrizing the convergence of probability distributions is the Prokhorov metric.
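For probability measures \mu, \nu on a metric space (X, d), the Prokhorov metric is defined by

```latex
d_P(\mu, \nu) \;=\; \inf \bigl\{ \varepsilon > 0 \;:\;
    \mu(A) \le \nu(A^\varepsilon) + \varepsilon
    \ \text{for all Borel sets } A \subseteq X \bigr\},
```

where A^\varepsilon denotes the \varepsilon-neighborhood of A; on separable metric spaces, d_P metrizes weak convergence of probability measures.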
One advantage of this approach is that the deterministic results are contained in the new theory: a deterministic quantity x corresponds to the delta distribution \delta_x, and d_P(\delta_x, \delta_y) = \min(d(x, y), 1), i.e., delta distributions converge with the same speed as the underlying variables. Another important feature is that infinite-dimensional problems can easily be considered, whereas in stochastic approaches the model must typically be discrete to allow for analysis.
So far, Tikhonov regularization for linear equations in a fully stochastic setting has been investigated, and results on convergence and convergence rates have been obtained. The results are derived for a very general setup that is applicable to a wide variety of (linear) stochastic ill-posed problems, containing for instance the following cases:

Deterministic equation with random data error:
Since the data are modelled as a random variable, the (pointwise) regularized solution is a random variable as well. Consequently, the convergence of regularized solutions to the least-squares solution in the deterministic setting now translates into the convergence of the distribution of the regularized solutions to the delta distribution concentrated at the least-squares solution.

Stochastic operator, approximated by an operator which is
- fixed and deterministic,
- fixed and deterministic with overlaid stochastic noise,
- stochastic and noise-free,
- stochastic with overlaid noise, or
- a combination of these.
A key step in the new approach is the lifting of a locally Lipschitz continuous map to the space of probability measures, which makes it possible to transfer existing deterministic estimates to the stochastic setting.
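The first case above can be illustrated by a small Monte Carlo experiment; the 2x2 model matrix, the a priori choice alpha = delta, and all numbers below are hypothetical. As the noise level delta shrinks, the samples of the Tikhonov-regularized solution concentrate at the exact solution, i.e., their distribution approaches a delta distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mildly ill-conditioned linear model A x = y (illustrative 2x2 example).
A = np.array([[1.0, 0.5], [0.0, 0.1]])
x_true = np.array([1.0, 2.0])
y_exact = A @ x_true

def tikhonov(y, alpha):
    """Tikhonov-regularized solution x_alpha = (A^T A + alpha I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ y)

# With random data error, the regularized solution is itself a random
# variable; its distribution concentrates at x_true as delta -> 0.
for delta in (1e-2, 1e-4, 1e-6):
    alpha = delta          # an illustrative a priori parameter choice
    samples = np.array([tikhonov(y_exact + delta * rng.standard_normal(2), alpha)
                        for _ in range(500)])
    spread = np.mean(np.linalg.norm(samples - x_true, axis=1))
    print(delta, spread)   # mean distance to x_true shrinks with delta
```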
 H. W. Engl, A. Wakolbinger, Continuity properties of the extension of a locally Lipschitz continuous map to the space of probability measures, Monatshefte für Mathematik, 100 (1985), pp. 85-103.
 H. W. Engl, A. Hofinger, S. Kindermann, Convergence rates in the Prokhorov metric for assessing uncertainty in ill-posed problems, Inverse Problems, 21 (2005), pp. 399-412.
Spezialforschungsbereich SFB F013 - Special Research Program of the FWF (Austrian Science Fund)