Image source: http://en.wikipedia.org/wiki/Gradient_descent
The previous post, [Algorithm] Stochastic gradient descent as an Online Learning Model (Part 1), covered using SGD to fit a linear regression. This post tries to implement that process in R (note that the example below runs as a batch job, not online):
Test data
x0 <- c(1,1,1,1,1) # column of 1's
x1 <- c(1,2,3,4,5) # original x-values
# create the x- matrix of explanatory variables
x <- as.matrix(cbind(x0,x1))
# create the y-matrix of dependent variables
y <- as.matrix(c(3,7,5,11,14))
m <- nrow(y)
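For orientation (my own note, not in the original post), x ends up as a 5 x 2 matrix (a column of ones for the intercept plus the x-values) and y as a 5 x 1 column vector:
dim(x) # 5 2
dim(y) # 5 1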
Solving with matrix algebra
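For reference (not spelled out in the original post), the solve() line below computes the closed-form normal-equation solution theta = (X'X)^(-1) X'y, where X' denotes the transpose of X, so no iteration is involved.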
# analytical results with matrix algebra
solve(t(x)%*%x)%*%t(x)%*%y # w/o feature scaling
# results using canned lm function match results above
summary(lm(y ~ x[, 2])) # w/o feature scaling
RESULT
Intercept: 0.2
x[, 2]: 2.6
Now do the same thing the machine-learning way (gradient descent)
# define the cost function J(theta) = (1/(2m)) * sum((h(x) - y)^2)
cost <- function(x, theta){
  cost <- (1/(2*m)) * (x %*% t(theta) - y)^2
  return(colSums(cost))
}
# define the gradient function dJ/dtheta = (1/m) * sum((h(x) - y) * x), where h(x) = x * theta
# in matrix form this is as follows:
grad <- function(x, y, theta) {
  gradient <- (1/m) * (t(x) %*% ((x %*% t(theta)) - y))
  return(t(gradient))
}
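As a quick sanity check (my own addition, not part of the original post), both helpers can be evaluated at an all-zero theta before running the descent; with the test data above the initial cost works out to (1/(2m)) * sum(y^2) = 40:
theta0 <- mat.or.vec(1, ncol(x)) # 1 x 2 row vector of zeros
cost(x, theta0)                  # initial cost: 40 for this data
grad(x, y, theta0)               # initial gradient, a 1 x 2 row vector like theta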
# define the gradient descent update algorithm
grad.descent <- function(x, maxit = 1000, alpha = 0.05){
  theta <- mat.or.vec(1, ncol(x))  # initialize the parameters to zero
  costBefore <- 0   # cost from the previous iteration
  costDiff <- 1     # change in cost between iterations
  count <- 0        # iteration counter
  while (costDiff > 10e-10) {
    theta <- theta - alpha * grad(x, y, theta)  # take one gradient step
    costAfter <- cost(x, theta)                 # compute the new cost
    if (!is.finite(costAfter)) stop('did not converge, please reduce alpha')
    costDiff <- abs(costAfter - costBefore)
    costBefore <- costAfter
    count <- count + 1
    if (count == maxit) break  # stop after maxit iterations
  }
  print(paste("costDiff = ", costDiff))
  print(paste("count = ", count))
  return(theta)
}
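As an aside (my own example, not from the original post), the divergence guard inside the loop can be triggered by calling the function with an oversized learning rate; wrapping the call in try() lets the script continue to the real fit below:
try(grad.descent(x, 1000, 0.5)) # cost blows up, so this stops with "did not converge, please reduce alpha"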
print(grad.descent(x,1000,0.03))
RESULT
| x0 | x1 |
|---|---|
| 0.2029918 | 2.599171 |
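Since the intro notes that this example is a batch job rather than an online one, below is a rough sketch of what a per-sample (stochastic) variant could look like. This is my own adaptation, not code from the original post; the name sgd.descent and its defaults are made up for illustration, and it reuses the x, y, and m defined above:
# illustrative per-sample (stochastic) update; not from the original post
sgd.descent <- function(x, y, maxit = 10000, alpha = 0.01) {
  theta <- mat.or.vec(1, ncol(x))                 # 1 x 2 parameter row vector
  for (i in 1:maxit) {
    j <- sample(m, 1)                             # draw one observation at random
    xi <- x[j, , drop = FALSE]                    # its 1 x 2 feature row
    gradient <- ((xi %*% t(theta)) - y[j]) %*% xi # per-sample gradient, 1 x 2
    theta <- theta - alpha * gradient             # update with a single-sample step
  }
  return(theta)
}
set.seed(1)
print(sgd.descent(x, y)) # hovers near the batch result (~0.2, ~2.6), with noise from the constant step size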