SysWatch provides a bundle of nonlinear regression techniques that form the core of any forecasting algorithm. We distinguish between parametric and nonparametric regressions. The main difference is that a nonparametric algorithm requires the whole training data set (point cloud) at evaluation time, whereas a parametric algorithm only needs its fitted parameters.
M = number of data points (vectors), K = number of dimensions (parameters)
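The K-nearest-neighbour (KNN) regression is a nonparametric method. For a given multivariate input data set $x_1, \dots, x_M \in \mathbb{R}^K$ and an output data set $y_1, \dots, y_M$ of the same size, we define for an evaluation record $x$ a reordering permutation $\pi$ that sorts the data points by their distance to $x$:
$$\|x - x_{\pi(1)}\| \le \|x - x_{\pi(2)}\| \le \dots \le \|x - x_{\pi(M)}\|$$
It follows an $\varepsilon$-neighbourhood made up of the $k$ nearest points, with $\varepsilon = \|x - x_{\pi(k)}\|$ (the neighbour count $k$ is not to be confused with the dimension $K$):
$$U^{\varepsilon}(x) = \{x_{\pi(1)}, \dots, x_{\pi(k)}\}$$
and therefore an estimate defined as
$$\hat{y}(x) = \frac{1}{k} \sum_{i=1}^{k} y_{\pi(i)}$$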
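The KNN estimate can be sketched in a few lines of Python. This is a minimal illustration, not SysWatch's implementation: it assumes Euclidean distance, an unweighted mean over the k nearest points, and data passed as plain lists; the function name `knn_regress` is hypothetical.

```python
import math
from typing import Sequence


def knn_regress(X: Sequence[Sequence[float]], Y: Sequence[float],
                x: Sequence[float], k: int) -> float:
    """K-nearest-neighbour estimate for a single evaluation record x."""
    if not 1 <= k <= len(X):
        raise ValueError("k must lie between 1 and the number of data points M")
    # Reordering permutation: indices of all M data points, sorted by their
    # Euclidean distance to the evaluation record x (nearest first).
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))
    # Neighbourhood = the first k indices; estimate = mean of their outputs.
    return sum(Y[i] for i in order[:k]) / k


# Usage: one-dimensional toy data with y = x^2.
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [0.0, 1.0, 4.0, 9.0]
print(knn_regress(X, Y, [1.2], k=2))  # neighbours x=1 and x=2 -> prints 2.5
```

Sorting all M distances costs O(M log M) per evaluation record, which illustrates why a nonparametric method needs the whole point cloud at evaluation time; for large data sets a spatial index such as a k-d tree is the usual speed-up.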
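The kernel regression is a nonlinear, data-driven regression algorithm. Its estimate is a weighted sum over the whole data set with normalized weights, where points similar to the evaluation record $x$ receive higher weights:
$$\hat{y}(x) = \frac{\sum_{i=1}^{M} \kappa_h(\|x - x_i\|)\, y_i}{\sum_{i=1}^{M} \kappa_h(\|x - x_i\|)}$$
Important ingredients of the regression are the kernel function $\kappa_h$ (e.g. Gaussian, $\kappa_h(u) = \exp\left(-u^2 / (2h^2)\right)$), which produces the weights, and the bandwidth parameter $h$, which is calibrated against the point cloud.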
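A minimal sketch of a Nadaraya-Watson kernel regression with a Gaussian kernel, assuming Euclidean distance. How SysWatch calibrates the bandwidth is not stated here; the sketch assumes leave-one-out cross-validation over a list of candidate bandwidths, and all function names (`gaussian_kernel`, `kernel_regress`, `calibrate_bandwidth`) are hypothetical.

```python
import math
from typing import Optional, Sequence


def gaussian_kernel(u: float, h: float) -> float:
    """Gaussian kernel weight for a distance u and bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2)


def kernel_regress(X: Sequence[Sequence[float]], Y: Sequence[float],
                   x: Sequence[float], h: float,
                   skip: Optional[int] = None) -> float:
    """Nadaraya-Watson estimate at x, optionally leaving out data point `skip`."""
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(X, Y)):
        if i == skip:
            continue
        w = gaussian_kernel(math.dist(xi, x), h)  # closer points get larger weights
        num += w * yi
        den += w
    if den == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth h")
    return num / den  # weighted sum with weights normalized to 1


def calibrate_bandwidth(X: Sequence[Sequence[float]], Y: Sequence[float],
                        candidates: Sequence[float]) -> float:
    """Return the candidate bandwidth with the smallest leave-one-out squared error."""
    def loo_error(h: float) -> float:
        return sum((Y[i] - kernel_regress(X, Y, X[i], h, skip=i)) ** 2
                   for i in range(len(X)))
    return min(candidates, key=loo_error)


# Usage: calibrate h on a small point cloud, then evaluate at a new record.
X = [[float(i)] for i in range(5)]
Y = [float(i) for i in range(5)]
h = calibrate_bandwidth(X, Y, [0.5, 1.0, 2.0])
print(h, kernel_regress(X, Y, [2.5], h))
```

Leaving each point out during calibration matters: with the point included, the smallest bandwidth would always win because every point would simply reproduce its own output.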
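The similarity sampling regression is a local, data-driven approximation method. For a given multivariate input data set and an output data set of the same size, we choose an $\varepsilon$-neighbourhood $U_t^{\varepsilon}$ of the evaluation record $x_t$, defined by
$$U_t^{\varepsilon} = \{\, x_i : \|x_i - x_t\| \le \varepsilon,\ i = 1, \dots, M \,\}$$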
The estimate is defined as
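Given a data set $\{(x_m, y_m)\}_{m=1}^{M}$ with $x_m \in \mathbb{R}^K$, a linear regression model assumes that the relationship between the dependent variable $Y$ and the regressor vector $X = (X_1, \dots, X_K)$ is linear. Thus, the model takes the form
$$Y = \beta_0 + \sum_{k=1}^{K} \beta_k X_k + e$$
where $e$ is an error term.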
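As a sketch, the coefficients can be estimated by ordinary least squares via the normal equations, using only the Python standard library. The function names (`solve`, `fit_linear`, `predict_linear`) are hypothetical, and the normal-equation approach is an assumption chosen for brevity.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system A z = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        if abs(aug[c][c]) < 1e-12:
            raise ValueError("singular system: regressors are linearly dependent")
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(X: Sequence[Sequence[float]], Y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [beta_0, beta_1, ..., beta_K]."""
    D = [[1.0, *x] for x in X]  # leading column of ones carries the intercept beta_0
    p = len(D[0])
    # Normal equations: (D^T D) beta = D^T Y
    DtD = [[sum(row[i] * row[j] for row in D) for j in range(p)] for i in range(p)]
    DtY = [sum(row[i] * y for row, y in zip(D, Y)) for i in range(p)]
    return solve(DtD, DtY)


def predict_linear(beta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate beta_0 + sum_k beta_k * x_k."""
    return beta[0] + sum(b * xk for b, xk in zip(beta[1:], x))
```

The normal equations square the condition number of the data matrix; for nearly collinear regressors an orthogonal-decomposition solver such as `numpy.linalg.lstsq` is the more robust choice.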
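Compared with the linear regression model, the generalized linear model additionally contains quadratic and bilinear terms, which make the regression surface nonlinear. The regression equation, with error term $e$, is:
$$Y = \beta_0 + \sum_{i=1}^{K} \beta_i X_i + \sum_{i=1}^{K} \beta_{ii} X_i^2 + \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \beta_{ij} X_i X_j + e$$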
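Because the model is still linear in its coefficients, it can be fitted by ordinary least squares after expanding each regressor vector into its linear, quadratic and bilinear terms. A minimal sketch under that assumption, using the standard library only; the names `quadratic_features`, `_solve` and `fit_quadratic` are hypothetical.

```python
from typing import List, Sequence


def quadratic_features(x: Sequence[float]) -> List[float]:
    """[1, x_1..x_K, x_1^2..x_K^2, x_i*x_j for i < j] for one regressor vector."""
    K = len(x)
    return ([1.0] + list(x) + [xi * xi for xi in x]
            + [x[i] * x[j] for i in range(K) for j in range(i + 1, K)])


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for A z = b."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        if abs(aug[c][c]) < 1e-12:
            raise ValueError("singular system: too few or degenerate data points")
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_quadratic(X: Sequence[Sequence[float]], Y: Sequence[float]) -> List[float]:
    """Least-squares coefficients in the order produced by quadratic_features."""
    D = [quadratic_features(x) for x in X]
    p = len(D[0])
    DtD = [[sum(row[i] * row[j] for row in D) for j in range(p)] for i in range(p)]
    DtY = [sum(row[i] * y for row, y in zip(D, Y)) for i in range(p)]
    return _solve(DtD, DtY)
```

For K regressors the expansion yields 1 + 2K + K(K-1)/2 coefficients, so the number of data points M must be at least that large (and the points must not be degenerate) for the fit to be determined.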
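Logistic regression is used to estimate the probability that a regressor vector belongs to a certain category (usually one of two categories) by mapping a given observation into the interval [0, 1]:
$$p(x) = P(Y = 1 \mid X = x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{k=1}^{K} \beta_k x_k\right)\right)}$$
For binary labels $y_m \in \{0, 1\}$, the logarithmic likelihood function divided by M is
$$\ell(\beta) = \frac{1}{M} \sum_{m=1}^{M} \left[ y_m \log p(x_m) + (1 - y_m) \log\left(1 - p(x_m)\right) \right]$$
An optimal coefficient vector $\beta$ can be found by maximizing $\ell(\beta)$, for example with gradient descent on $-\ell(\beta)$.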
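A minimal sketch of fitting the coefficients by gradient ascent on the log-likelihood divided by M (equivalently, gradient descent on its negative), assuming binary labels in {0, 1}, a fixed learning rate and a fixed number of steps. The names `sigmoid`, `prob`, `mean_log_likelihood` and `fit_logistic`, as well as the defaults `lr=0.5` and `steps=2000`, are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob(beta: Sequence[float], x: Sequence[float]) -> float:
    """p(x) = sigmoid(beta_0 + sum_k beta_k * x_k)."""
    return sigmoid(beta[0] + sum(b * xk for b, xk in zip(beta[1:], x)))


def mean_log_likelihood(beta: Sequence[float], X: Sequence[Sequence[float]],
                        Y: Sequence[int]) -> float:
    """Log-likelihood divided by M, for labels y in {0, 1}."""
    total = 0.0
    for x, y in zip(X, Y):
        p = min(max(prob(beta, x), 1e-15), 1.0 - 1e-15)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(X)


def fit_logistic(X: Sequence[Sequence[float]], Y: Sequence[int],
                 lr: float = 0.5, steps: int = 2000) -> List[float]:
    """Gradient ascent on the mean log-likelihood (= descent on its negative)."""
    beta = [0.0] * (len(X[0]) + 1)
    M = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(X, Y):
            r = y - prob(beta, x)  # gradient term: (y - p) * [1, x_1, ..., x_K]
            grad[0] += r / M
            for k, xk in enumerate(x, start=1):
                grad[k] += r * xk / M
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta
```

If the two categories are perfectly separable, the maximum-likelihood coefficients grow without bound; the fixed number of steps (or a regularization term) then acts as the stopping criterion.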
Radial basis function neural networks (RBFNNs) are a special type of artificial neural network (ANN) with only one hidden layer (in addition to one input and one output layer). ANNs have the major advantage that they can approximate almost any kind of relationship in data (RBF networks with enough hidden neurons are universal approximators of continuous functions), and some training schemes automatically add new or delete existing neurons or edges between neurons. The number of neurons I in the input layer equals the dimension of the input vector, and the number of neurons O in the output layer equals the dimension of the output vector. The number of neurons H in the hidden layer can vary; for the following theory, we assume that H is fixed. Every hidden neuron $h$ has its own activation function, a radial Gaussian function
$$\varphi_h(x_t) = \exp\left(-\frac{\|x_t - c_h\|^2}{2\sigma_h^2}\right)$$
with centre $c_h$ and standard deviation $\sigma_h$ for $h = 1, \dots, H$, and the Euclidean distance
$$\|x_t - c_h\| = \sqrt{\sum_{i=1}^{I} \left(x_{t,i} - c_{h,i}\right)^2}$$
where $x_{t,i}$ is the i-th value of the input vector at time t. Additionally, a multidimensional radial Gaussian function with covariance matrix $\Sigma_h$, $\varphi_h(x_t) = \exp\left(-\tfrac{1}{2}(x_t - c_h)^{\top} \Sigma_h^{-1} (x_t - c_h)\right)$, can produce better results. (Remark: all weights of the edges between the input neurons and the hidden neurons are equal to one.) The j-th output can be calculated as follows:
$$y_{t,j} = \sum_{h=1}^{H} w_{hj}\, \varphi_h(x_t), \qquad j = 1, \dots, O$$
where $w_{hj}$ is the weight of the edge between hidden neuron h and output neuron j.
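The forward pass of the network with scalar Gaussian activations can be sketched as follows. The function name `rbf_forward` and the nested-list layout `W[h][j]` for the weight between hidden neuron h and output neuron j are assumptions for illustration.

```python
import math
from typing import List, Sequence


def rbf_forward(x: Sequence[float], centres: Sequence[Sequence[float]],
                sigmas: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Outputs y_j = sum_h W[h][j] * phi_h(x) of an RBF network with H hidden neurons."""
    # Input-to-hidden edges all have weight one, so every hidden neuron sees x
    # unchanged and only evaluates its Gaussian around its own centre.
    phi = [math.exp(-math.dist(x, c) ** 2 / (2.0 * s ** 2))
           for c, s in zip(centres, sigmas)]
    n_out = len(W[0])  # number of output neurons O
    return [sum(phi[h] * W[h][j] for h in range(len(phi))) for j in range(n_out)]


# Usage: two hidden neurons (centres 0 and 1), two output neurons.
print(rbf_forward([0.0], [[0.0], [1.0]], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
```

The output layer is linear in the weights, which is why, once centres and standard deviations are fixed, the weights can be found by linear least squares.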
The training phase consists of several learning steps whose goal is to optimize the following items:
- weights of the edges between the hidden neurons and output neurons
- number of hidden neurons H
- centres of the activation functions of the hidden neurons
- standard deviations of the activation functions of the hidden neurons
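The steps above can be sketched with a common two-stage scheme. This is an assumption, not necessarily the training procedure SysWatch uses: H is kept fixed, the centres are simply H training points at evenly spaced indices (k-means clustering is a frequent alternative), all hidden neurons share the standard deviation sigma = d_max / sqrt(2H) (a standard heuristic, with d_max the largest distance between two centres), and the output weights are then the solution of a slightly regularized linear least-squares problem. All names (`train_rbf`, `rbf_predict`, `_solve`, `ridge`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for A z = b."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        if abs(aug[c][c]) < 1e-12:
            raise ValueError("singular system")
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rbf_predict(x, centres, sigmas, W) -> List[float]:
    """Network outputs y_j = sum_h W[h][j] * phi_h(x)."""
    phi = [math.exp(-math.dist(x, c) ** 2 / (2.0 * s ** 2)) for c, s in zip(centres, sigmas)]
    return [sum(p * W[h][j] for h, p in enumerate(phi)) for j in range(len(W[0]))]


def train_rbf(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]],
              H: int, ridge: float = 1e-8) -> Tuple[list, list, list]:
    """Two-stage training with a fixed number H of hidden neurons.
    X: M input vectors (dimension I); Y: M output vectors (dimension O)."""
    if not 1 <= H <= len(X):
        raise ValueError("H must lie between 1 and the number of data points")
    # Stage 1a: centres = H training points at evenly spaced indices.
    step = len(X) / H
    centres = [list(X[int(h * step)]) for h in range(H)]
    # Stage 1b: one shared standard deviation, sigma = d_max / sqrt(2H).
    d_max = max(math.dist(a, b) for a in centres for b in centres)
    sigma = d_max / math.sqrt(2 * H) if d_max > 0 else 1.0
    # Stage 2: output weights by ridge-regularized least squares on the
    # hidden-layer activations, one linear system per output neuron.
    Phi = [[math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2)) for c in centres] for x in X]
    G = [[sum(r[i] * r[j] for r in Phi) + (ridge if i == j else 0.0)
          for j in range(H)] for i in range(H)]
    O = len(Y[0])
    cols = [_solve(G, [sum(r[i] * y[j] for r, y in zip(Phi, Y)) for i in range(H)])
            for j in range(O)]
    W = [[cols[j][h] for j in range(O)] for h in range(H)]  # W[h][j] = w_hj
    return centres, [sigma] * H, W
```

Only the output weights are optimized exactly here; tuning the centres and standard deviations as well (e.g. by gradient descent on the training error) or growing and pruning H requires an iterative training loop on top of this sketch.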