Function approximation lies at the core of many problems in machine learning. DeepMind's latest research combines the advantages of neural networks and stochastic processes, proposing Neural Process models that achieve strong performance and high computational efficiency.
Function approximation lies at the core of many problems in machine learning. Over the past decade, one very popular approach to this problem has been deep neural networks. At their core, these networks are black-box function approximators that learn a single function from a large number of training data points. Most of the workload therefore falls on the training phase, while evaluation and testing reduce to fast forward passes. Although high test-time performance is valuable for many real-world applications, the fact that a network's output cannot be updated after training may be undesirable. Meta-learning, for example, is an increasingly popular research area that tackles exactly this limitation.
As an alternative to neural networks, one can also perform inference over a stochastic process to carry out function regression. The most common instance of this approach is the Gaussian Process (GP), a model with properties complementary to those of neural networks: GPs do not require an expensive training phase and can carry out inference conditioned on some observations, which makes them very flexible at test time.
In addition, GPs represent infinitely many different functions at locations that have not been observed, so they can capture the uncertainty of their predictions given some observations. However, GPs are computationally expensive: exact GPs scale cubically with the number of data points, and the current state-of-the-art approximations still scale quadratically. Furthermore, the available kernels are usually limited in their expressiveness, and an additional optimization procedure is required to identify the most suitable kernel and its hyperparameters for any given task.
Combining neural networks and stochastic processes, and thereby making up for some of the shortcomings of both approaches, has therefore attracted attention as a potential solution. In this work, the team of DeepMind research scientist Marta Garnelo proposes a class of neural-network-based models that learn stochastic processes, which they call Neural Processes (NPs). NPs share some basic properties with GPs: they learn distributions over functions, they can estimate the uncertainty of their predictions conditioned on context observations, and they shift some of the workload from training to test time, which gives the model additional flexibility.
More importantly, NPs generate predictions with very high computational efficiency. Given n context points and m target points, inference with a trained NP corresponds to a forward pass through a deep neural network, which scales as O(n + m) rather than the cubic cost of classical GPs. In addition, the model overcomes many of the restrictions of hand-designed kernels by learning an implicit kernel directly from the data.
The main contributions of this study are:
We propose Neural Processes, a class of models that combine the advantages of neural networks and stochastic processes.
We compare Neural Processes (NPs) to related work on meta-learning, deep latent variable models and Gaussian processes. Since NPs are linked to all of these areas, they form a bridge for comparison across many related topics.
We demonstrate the advantages and capabilities of NPs by applying them to a range of tasks, including 1-D regression, real-world image completion, Bayesian optimization and contextual bandits.
The Neural Process model
Figure 1: The Neural Process model.
(a) Graphical model of a Neural Process. x and y correspond to the data, with y = f(x); C and T denote the numbers of context points and target points respectively, and z is the global latent variable. A grey background indicates that the variable is observed.
(b) Computational diagram of a Neural Process. The variables in circles correspond to the variables of the graphical model in (a), the variables in boxes denote the intermediate representations of the NP, and the bold letters denote the computational modules: h - encoder, a - aggregator and g - decoder. In our implementation, h and g correspond to neural networks and a corresponds to a mean function. Solid lines indicate the generative process, dashed lines the inference process.
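In formulas, the generative process indicated by the solid lines can be sketched as the following latent-variable model; the Gaussian likelihood with a fixed variance σ² is a simplifying assumption made for this illustration:

```latex
% Sketch of the NP generative process (Gaussian likelihood assumed for illustration)
z \sim p(z), \qquad
p\big(y_{1:n} \mid x_{1:n}, z\big) \;=\; \prod_{i=1}^{n} \mathcal{N}\!\big(y_i \,\big|\, g(x_i, z),\, \sigma^2\big).
```

Predictions at new target locations are then obtained by sampling z conditioned on the context (dashed lines) and decoding it with g.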
In our NP implementation, we add two further desiderata: invariance to the order of the context points, and computational efficiency.
The resulting model can be broken down into the following three core components (see Figure 1b):
An encoder h from input space to representation space that takes in pairs of (x, y) context values and produces a representation r_i for each pair. We parameterize h as a neural network.
An aggregator a that summarizes the encoded inputs into a single, order-invariant global representation, for example by taking their mean.
A conditional decoder g that takes the sampled global latent variable z together with the new target locations as input and outputs predictions for the corresponding y values.
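To make these three components concrete, here is a minimal sketch in PyTorch. The layer sizes, the Gaussian parameterizations of z and of the output, and the class names (Encoder, Decoder, NeuralProcess) are illustrative assumptions rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """h: maps each (x, y) context pair to a representation r_i."""
    def __init__(self, x_dim, y_dim, r_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, 128), nn.ReLU(),
            nn.Linear(128, r_dim))

    def forward(self, x, y):                          # x: (B, N, x_dim), y: (B, N, y_dim)
        return self.net(torch.cat([x, y], dim=-1))    # r_i: (B, N, r_dim)

class Decoder(nn.Module):
    """g: maps a target location x* and a latent sample z to a predictive distribution."""
    def __init__(self, x_dim, z_dim, y_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * y_dim))                # predictive mean and log-variance

    def forward(self, x_target, z):                   # x_target: (B, M, x_dim), z: (B, z_dim)
        z = z.unsqueeze(1).expand(-1, x_target.size(1), -1)
        mu, log_var = self.net(torch.cat([x_target, z], dim=-1)).chunk(2, dim=-1)
        return mu, log_var.exp().sqrt()               # mean and std of p(y* | x*, z)

class NeuralProcess(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=64, z_dim=64):
        super().__init__()
        self.encoder = Encoder(x_dim, y_dim, r_dim)
        self.to_z = nn.Linear(r_dim, 2 * z_dim)       # aggregated r -> parameters of q(z)
        self.decoder = Decoder(x_dim, z_dim, y_dim)

    def forward(self, x_context, y_context, x_target):
        r_i = self.encoder(x_context, y_context)
        r = r_i.mean(dim=1)                           # a: order-invariant aggregation (mean)
        mu_z, log_var_z = self.to_z(r).chunk(2, dim=-1)
        z = mu_z + (0.5 * log_var_z).exp() * torch.randn_like(mu_z)  # reparameterized sample
        return self.decoder(x_target, z)
```

Averaging the per-pair representations r_i is what makes the model invariant to the order of the context points, and a single forward pass over n context and m target points gives the O(n + m) scaling mentioned above.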
Figure 2: Graphical models of related models (a-c) and of the Neural Process (d). Grey shading indicates observed variables. C denotes the context variables and T the target variables, i.e. the variables to be predicted given C.
Results
Figure 4. Pixel-wise regression on MNIST and CelebA
The diagram on the left shows how pixel-wise image completion can be framed as a 2-D regression task, where f(pixel coordinates) = pixel brightness. The figure on the right shows image completion results on MNIST and CelebA. The top images correspond to the context points provided to the model; for clarity, unobserved points are marked in blue for MNIST and in white for CelebA. Given the same context points, each row corresponds to a different sample. As the number of context points increases, the predicted pixels get closer to the underlying ground truth and the variance across samples decreases.
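As a rough illustration of this framing, the sketch below turns an image into context and target pairs for 2-D regression; the uniform random choice of context pixels and the [0, 1] normalization are assumptions made for the example, not details taken from the paper:

```python
import numpy as np

def image_to_regression_task(image, num_context, rng=None):
    """Frame image completion as 2-D regression: f(pixel coordinates) = pixel intensity."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([rows.ravel() / (h - 1), cols.ravel() / (w - 1)], axis=-1)  # x in [0, 1]^2
    values = image.reshape(h * w, -1) / 255.0                                     # y in [0, 1]
    idx = rng.choice(h * w, size=num_context, replace=False)  # reveal a random subset as context
    return coords[idx], values[idx], coords, values           # context x/y, target x/y (all pixels)

# Example: 10 observed pixels from a 28x28 MNIST-sized image; all pixels are targets.
x_c, y_c, x_t, y_t = image_to_regression_task(np.zeros((28, 28)), num_context=10)
```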
Figure 5. Thompson sampling with a Neural Process on a 1-D objective function
The plots show the optimization process over five iterations. Each predicted function (blue) is drawn by sampling a latent variable conditioned on an increasing number of context points (black). The ground-truth function is shown as a black dotted line. The red triangle marks the next evaluation point, which corresponds to the minimum of the sampled NP curve. The red circle in the following iteration marks the outcome of this evaluation, which becomes a new context point for the NP.
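The iterative procedure in Figure 5 amounts to Thompson sampling and can be sketched as below. The method model.sample_function and the fixed candidate grid are hypothetical stand-ins for however a trained NP exposes sampled predictions:

```python
import numpy as np

def np_thompson_optimize(model, objective, bounds, num_iters=5, grid_size=200):
    """Thompson-sampling-style optimization with a trained Neural Process (hypothetical API).

    `model.sample_function(x_context, y_context, x_candidates)` is assumed to return the
    values at `x_candidates` of a single function drawn from the NP given the context.
    """
    x_candidates = np.linspace(bounds[0], bounds[1], grid_size).reshape(-1, 1)
    x_context = np.empty((0, 1))
    y_context = np.empty((0, 1))
    for _ in range(num_iters):
        # Draw one function from the NP conditioned on the current context points (black dots).
        sampled = model.sample_function(x_context, y_context, x_candidates)
        # The next evaluation point (red triangle) is the minimum of the sampled curve.
        x_next = x_candidates[np.argmin(sampled)].reshape(1, 1)
        y_next = np.array([[objective(x_next.item())]])
        # The outcome (red circle) becomes a new context point for the next iteration.
        x_context = np.vstack([x_context, x_next])
        y_context = np.vstack([y_context, y_next])
    return x_context, y_context
```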
Table 1. Bayesian optimization with Thompson sampling
The average number of optimization steps required to reach the global minimum of a 1-D function generated by a Gaussian process. The values are normalized by the number of steps taken by random search. The performance of a Gaussian process with the correct kernel constitutes an upper bound on performance.
Table 2. Results on the wheel bandit problem for increasing values of Δ
Shown are the mean and standard error of the cumulative and simple regret over 100 runs. Results are normalized with respect to the performance of a uniform agent.
Discussion
We introduced Neural Processes, a family of models that combine stochastic processes and neural networks. NPs learn to represent distributions over functions and make flexible predictions conditioned on some context inputs. Instead of requiring a hand-written kernel, NPs learn an implicit measure directly from the data.
We applied NPs to a range of regression tasks to showcase their flexibility. The aim of this work is to introduce NPs and compare them to ongoing research; as such, the tasks presented here are diverse but relatively low-dimensional. Scaling NPs up to higher-dimensional problems is likely to highlight the benefits of their lower computational complexity and data-driven representations. Original title: [ICML Oral] DeepMind proposes a new direction for deep learning: Neural Process models
Article source: [WeChat ID: AI_ERA, WeChat public account: Xin Zhiyuan]. Please credit the source when reposting.