2015年2月24日 星期二

[R] Gradient Descent(SGD) 演算法實作

圖片來源:http://en.wikipedia.org/wiki/Gradient_descent

先前的[Algorithm] Stochastic gradient descent(梯度下降法)作為Online Learning Model(即時訓練模型)(一)提到使用SGD的方式來跑linear regression。本文嘗試使用R來實作這段過程(但是範例是batch作業,非即時):



測試資料

x0 <- c(1,1,1,1,1) # column of 1's
x1 <- c(1,2,3,4,5) # original x-values

# create the x- matrix of explanatory variables

x <- as.matrix(cbind(x0,x1))

# create the y-matrix of dependent variables
y <- as.matrix(c(3,7,5,11,14))
m <- nrow(y)

用矩陣解

# analytical results with matrix algebra
solve(t(x)%*%x)%*%t(x)%*%y # w/o feature scaling

# results using canned lm function match results above
summary(lm(y ~ x[, 2])) # w/o feature scaling

>> result
Intercept: 0.2
x[, 2]: 2.6

改成用ML的方式做

# define the cost function   
cost <- function(x, theta){
  cost <-  (1 /(2*m)) * (x %*% t(theta)-y)^2 
  return(colSums(cost))
}

# define the gradient function dJ/dtheata: 1/m * (h(x)-y))*x where h(x) = x*theta
# in matrix form this is as follows:
grad <- function(x, y, theta) {
  gradient <- (1/m)* (t(x) %*% ((x %*% t(theta)) - y))
  return(t(gradient))
}

# define gradient descent update algorithm
grad.descent <- function(x, maxit = 1000, alpha = 0.05){

  theta <- mat.or.vec(1,ncol(x)) # Initialize the parameters
  #alpha = 0.05 # set learning rate
  costBefore = 0 # before cost
  costDiff = 1 # diff cost between before and after
  count = 0 # count max

  while (costDiff>10e-10) {
    theta <- theta - alpha * grad(x, y, theta)
    costAfter <- cost(x, theta) # compute cost

    if (costAfter==Inf) stop('not converge, please reduce alpha')

    costDiff <- abs(costAfter - costBefore)
    costBefore <- costAfter
    #print(costDiff)
    count = count +1
    if (count==maxit) {
      #print(count) 
      break
    }
  }
  print(paste("coustDiff = ", costDiff))
  print(paste("count = ", count))
  return(theta)
}

print(grad.descent(x,1000,0.03))
RESULT
x0 x1
0.2029918 2.599171
因為ml是追求近似值,所以跟解方程式的結果還是有點差異(可以自己設定差異)。就是一個簡單的實作。