1. Some basic concepts
Figure 1. Basic process of machine learning
Training set: a set of data collected by observation, measurement, etc., in order to study the relationship between one variable (x) and another variable (y). Each observation yields a data pair (x, y), and the collection of these pairs is the training set. For example, to study the relationship between house area (x) and price (y), each sold house gives one data pair (x, y); observing 10 sold houses yields 10 such pairs, which together form a training set for studying the relationship between area and price (although the sample size is rather small). These data are generally collected from the real world (our purpose is to see the essence behind the phenomena).
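Such a training set can be sketched in code as a list of (x, y) pairs. The numbers below are invented purely for illustration; a real training set would come from actual sales records:

```python
# A toy training set of (x, y) pairs: house area -> price.
# All values are made up for illustration.
training_set = [
    (50, 150), (60, 180), (75, 210), (80, 230), (90, 260),
    (100, 285), (110, 310), (120, 340), (135, 375), (150, 410),
]

m = len(training_set)              # m = number of samples
xs = [x for x, _ in training_set]  # the independent variable of each sample
ys = [y for _, y in training_set]  # the dependent variable of each sample
print(m)  # 10
```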
Sample: each object used for training is a sample, such as one sold house.
Model: for historical reasons, a model in machine learning is also called a hypothesis (h); this h is the "essence" we want to find behind the phenomena. Establishing a model is usually a process of determining a functional form (remember those puzzles from winter-holiday workbooks: observe a sequence of numbers, what comes next?). The most common models are regression models (linear regression, logistic regression, etc.). For example, if we assume the relationship between house area and price is a linear regression model, it can be written as: hθ(x) = θ0 + θ1x ... (1) where h is the function value (it could be called y, but in machine learning y typically represents the known function values, i.e., the dependent variable; h is the predicted y), θ denotes the parameters of the function (which can also be seen as the weights of the independent variables: the greater a weight, the greater that variable's influence on y), and x is the independent variable.
Training the model: once a model is selected (choosing the right model requires rich experience), the general form of the function is determined. Training the model usually refers to the process of solving for the unknown parameters using the training set. The general form (1) above is the same as the line equation y = ax + b, just written differently. At this point we know the model is a straight line; to pin down the exact equation of this line we need to find the two unknown parameters, θ0 (the intercept) and θ1 (the slope). If the training set contains only two samples, we just need to solve a system of two linear equations in two unknowns.
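The two-sample case can be sketched as follows. The sample values are invented for illustration, and `fit_line_two_points` is a hypothetical helper name, not a standard function:

```python
# With exactly two samples, theta0 and theta1 are fully determined by the
# system of two linear equations:
#   y1 = theta0 + theta1 * x1
#   y2 = theta0 + theta1 * x2
def fit_line_two_points(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    theta1 = (y2 - y1) / (x2 - x1)   # slope
    theta0 = y1 - theta1 * x1        # intercept
    return theta0, theta1

# Two invented (area, price) samples:
theta0, theta1 = fit_line_two_points((50, 150), (100, 275))
print(theta0, theta1)  # 25.0 2.5, i.e. h(x) = 25 + 2.5 * x
```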
Feature: a feature is one of the independent variables (x) studied in a model; the features are the collection of all such variables. For example, in the house-price model, every factor that may affect the price can be seen as a feature: the house area, the city, the number of rooms, and so on. Choosing features when building a model is a deep subject in itself; there are even dedicated subfields studying feature selection and feature representation.
2. Representation of the training set
As mentioned above, the training set is a collection of many (x, y) pairs, where x is the independent variable and y is the dependent variable. It is generally assumed that changes in x cause y to change, i.e., the value of x determines the value of y. In the house-price model, if we could find all the factors that affect the price (all the x's) and determine the exact parameter (θ) of each factor, then the price (y) of any house could be predicted.
2.1 Representation of the single-factor training set
Single-factor means there is only one independent variable in the equation; this independent variable is represented by a lowercase letter x;
When multiple samples are collected, they are distinguished by a superscript in parentheses, written as x(1), x(2), ..., x(m), where m is the number of samples;
Matrix representation: vectors are generally denoted by lowercase letters and matrices by uppercase letters. All the x's of the single-factor samples can be written as an m × 1 matrix (m rows, 1 column; a matrix with only one column is a column vector):

x = ( x(1)
      x(2)
       ⋮
      x(m) )
2.2 Representation of the multi-factor training set
Multi-factor means there are multiple independent variables (multiple features) in the equation. Different features are distinguished by a subscript, written as x1, x2, ..., xn, where n is the number of features;
When there are multiple samples, the training set can be represented as an m × n matrix X (uppercase letter; m rows, n columns):

X = ( x1(1)  x2(1)  ...  xn(1)
      x1(2)  x2(2)  ...  xn(2)
       ⋮      ⋮     ⋱     ⋮
      x1(m)  x2(m)  ...  xn(m) )
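As a sketch, such an m × n matrix can be built with NumPy. The three features here (area, number of rooms, age) and all values are assumptions made up for illustration:

```python
import numpy as np

# Hypothetical design matrix: each row is one sample x(i),
# each column is one feature x_j (area, rooms, age).
X = np.array([
    [50.0, 2.0, 10.0],   # sample x(1)
    [80.0, 3.0, 5.0],    # sample x(2)
    [120.0, 4.0, 1.0],   # sample x(3)
])

m, n = X.shape           # m samples, n features
print(m, n)              # 3 3
print(X[1])              # the second sample, x(2)
print(X[:, 0])           # the first feature (area) across all samples
```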
2.3 Representation of the dependent variable in the training set
Whether single-factor or multi-factor, each sample contains only one dependent variable (y), so the y's only need to be distinguished by sample: y(1), y(2), ..., y(m), where m is the number of samples;
The column vector y is expressed as:

y = ( y(1)
      y(2)
       ⋮
      y(m) )
3. Representation of the parameters
Perhaps by convention, machine learning generally uses θ to represent the parameters. The parameters are the coefficients of the independent variables x (they can also be seen as the weights of the independent variables: the greater a weight, the greater that variable's influence on y). In principle there are as many parameters as independent variables, but just as the line equation y = ax + b has a constant term b in addition to the coefficient a of x, there is generally one more parameter than the number of independent variables: with n independent variables there are n + 1 parameters.
The final model is a specific equation, and the unknown parameters in this equation are determined during training. These parameters are the same for all samples: for example, the parameter of the first independent variable x1 is the same in the first sample x(1) as in any other sample x(i). Therefore the parameters need not be distinguished between samples, only between different independent variables, and all parameters can be represented by one (n + 1)-dimensional column vector θ:
θ = ( θ0
      θ1
       ⋮
      θn )
4. Model representation
The model discussed here is a specific function; as mentioned above, the model is generally denoted by h. The symbolic representation of the model is illustrated below with the linear regression model.
4.1 Direct Representation
Direct representation is the ordinary algebraic notation we used before learning linear algebra.
Single-variable linear regression equation: hθ(x) = θ0 + θ1x
Multi-variable linear regression equation: hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + ... + θnxn
4.2 Matrix representation
After learning linear algebra, the equations above can be expressed with matrices, which is not only more convenient to write but also more efficient to compute. Note that, to match the matrix representation, an x0 is added to the equations above, with x0 = 1, and θ0 serves as the parameter of x0.
Single-variable / multi-variable linear regression equation, for all m samples at once:

hθ(X) = Xθ = ( x0(1)  x1(1)  ...  xn(1)       ( θ0
               x0(2)  x1(2)  ...  xn(2)    ×    θ1
                ⋮      ⋮     ⋱     ⋮             ⋮
               x0(m)  x1(m)  ...  xn(m) )       θn )

Here X is an m × (n + 1) matrix in which each row represents a sample and each column represents a feature, and the result is an m × 1 column vector, where m is the number of samples and n is the number of independent variables (each column of X shares the same parameter, and each column represents the same feature across different samples);
When there is only one sample with multiple variables, it can also be written as:

hθ(x) = θᵀx = ( θ0  θ1  ...  θn ) ( x0
                                    x1
                                     ⋮
                                    xn )

where x is an (n + 1)-dimensional column vector, each row being the value of one variable.
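A minimal sketch, with invented sample values and parameters, showing that the matrix form Xθ and the direct form θ0 + θ1x1 + ... + θnxn give the same predictions. Note the constant column x0 = 1 that must be prepended to the raw features:

```python
import numpy as np

# Invented data: 3 samples, 2 raw features each.
X_raw = np.array([
    [50.0, 2.0],
    [80.0, 3.0],
    [120.0, 4.0],
])
m = X_raw.shape[0]

# Prepend the x0 = 1 column so X is m x (n + 1).
X = np.hstack([np.ones((m, 1)), X_raw])

# Hypothetical parameter vector (theta0, theta1, theta2).
theta = np.array([25.0, 2.5, 10.0])

# Matrix form: h = X theta, an m x 1 result (one prediction per sample).
h_matrix = X @ theta

# Direct form per sample: theta0 + theta1*x1 + theta2*x2.
h_direct = np.array([theta[0] + theta[1] * x1 + theta[2] * x2
                     for x1, x2 in X_raw])

print(h_matrix)  # [170. 255. 365.] -- identical to h_direct
```

The matrix form is preferred in practice: one vectorized product replaces an explicit loop over samples.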