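Ridge regression (also known as Tikhonov regularization) is a biased-estimation regression method designed for data with collinear features. It is essentially a modified least-squares estimator: it gives up the unbiasedness of ordinary least squares and accepts some loss of information and fit accuracy in exchange for regression coefficients that are more stable and more realistic. It handles ill-conditioned data better than ordinary least squares.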
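To make the idea concrete: ridge regression minimizes ||y - Xw||^2 + alpha * ||w||^2, which on centered data has the closed-form solution w = (X^T X + alpha * I)^(-1) X^T y. The sketch below checks that formula against scikit-learn's Ridge. The synthetic, nearly collinear data set and the choice alpha = 1.0 are assumptions made purely for illustration; they are not part of the original experiment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1,
# so the two features are strongly collinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

alpha = 1.0

# Closed-form ridge solution on centered data (the intercept is not penalized).
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

# The same model fitted by scikit-learn, plus plain least squares for comparison.
w_ridge = Ridge(alpha=alpha).fit(X, y).coef_
w_ols = LinearRegression().fit(X, y).coef_

print("closed-form ridge:", w_closed)
print("sklearn Ridge:    ", w_ridge)
print("ordinary LS:      ", w_ols)
```

On collinear data the least-squares coefficients tend to be large and unstable, while the ridge penalty pulls them toward smaller values that share the effect between the two features.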

import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error,r2_score
from sklearn import datasets

# CV stands for cross-validation
from sklearn.linear_model import LinearRegression,Ridge,Lasso,ElasticNet,ElasticNetCV,LassoCV
diabetes = datasets.load_diabetes()
X = diabetes['data']
y = diabetes['target'] 
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.15) 
lr = LinearRegression()

lr.fit(X_train,y_train)

# For a regressor, score() returns R^2, not accuracy
lr.score(X_test,y_test)

0.508409427784998

# Predict first: y_ is needed for the manual R^2 calculation below
y_ = lr.predict(X_test)

# R^2 is defined as (1 - u/v), where u is the residual sum of squares
# ((y_true - y_pred) ** 2).sum() and v is the total sum of squares
# ((y_true - y_true.mean()) ** 2).sum().
u = ((y_test - y_)**2).sum()
v = ((y_test - y_test.mean())**2).sum()
r2 = 1 - u/v
r2

0.508409427784998

display(y_.round(0),y_test)

array([192., 85., 134., 138., 264., 191., 142., 141., 291., 91., 253., 174., 164., 153., 167., 83., 229., 169., 92., 206., 174., 78., 197., 53., 163., 157., 104., 139., 211., 106., 77., 125., 117., 170., 82., 183., 162., 164., 218., 228., 181., 126., 169., 100., 120., 69., 211., 168., 111., 169., 187., 204., 163., 133., 154., 157., 165., 76., 153., 82., 114., 115., 97., 148., 71., 186., 165.])

array([164., 181., 124., 142., 308., 122., 185., 168., 270., 74., 281., 52., 109., 246., 181., 92., 99., 122., 91., 265., 143., 59., 131., 48., 216., 55., 65., 93., 288., 118., 77., 97., 61., 258., 51., 163., 144., 185., 296., 281., 141., 135., 171., 69., 177., 83., 220., 235., 109., 138., 257., 297., 151., 170., 210., 259., 110., 55., 185., 42., 87., 96., 84., 97., 134., 129., 131.])

r2_score(y_test,y_)

0.508409427784998

mean_squared_error(y_test,y_)

2684.848466337077

Using ridge regression

lr = LinearRegression()

lr.fit(X_train,y_train)

print(lr.score(X_test,y_test))

y_ = lr.predict(X_test)

mean_squared_error(y_test,y_)

0.508409427784998

2684.848466337077

ridge = Ridge(alpha=0.001)

ridge.fit(X_train,y_train)

print(ridge.score(X_test,y_test))

y_ = ridge.predict(X_test)

mean_squared_error(y_test,y_)

0.5077536734066447

2688.429904298921
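With alpha=0.001 the penalty is very weak, so the ridge fit is almost identical to ordinary least squares, which is why the two scores above barely differ. A minimal sketch of how the coefficients shrink as alpha grows follows. The fixed random_state=0 and the alpha grid are assumptions added here for reproducibility; the original split was not seeded, so the numbers will not match the outputs above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=0 is an assumption; the original split was unseeded.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

alphas = np.logspace(-5, 1, 7)  # illustrative grid: 1e-5, 1e-4, ..., 10
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(
        f"alpha={alpha:g}  ||coef||={coef_norms[-1]:.1f}  "
        f"test R^2={model.score(X_test, y_test):.4f}"
    )
```

The coefficient norm decreases as alpha increases; whether the test R^2 improves depends on the data and the split, which is what motivates choosing alpha by cross-validation.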

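When the candidate alphas include very small values, use np.logspace(-5,1,50): it spaces the candidates evenly on a log scale, so it samples small alphas far more densely than np.linspace(0.01,5,50). In the runs below the log-spaced grid gives a slightly higher test R^2 (0.5022 vs. 0.5006).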
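The difference between the two grids is easy to see by counting how many candidates fall below a small threshold. The threshold of 0.1 is an arbitrary choice for illustration.

```python
import numpy as np

log_grid = np.logspace(-5, 1, 50)   # 1e-5 ... 10, evenly spaced in log10
lin_grid = np.linspace(0.01, 5, 50) # 0.01 ... 5, evenly spaced in value

threshold = 0.1  # arbitrary cut-off for "small" alphas
n_log_small = int((log_grid < threshold).sum())
n_lin_small = int((lin_grid < threshold).sum())

print("log grid, first five values:", log_grid[:5])
print("lin grid, first five values:", lin_grid[:5])
print("candidates below", threshold, "->", n_log_small, "(log) vs", n_lin_small, "(lin)")
```

The linear grid jumps from 0.01 straight past 0.1 in a single step, so if the best alpha lies in the small range, only the log-spaced grid can locate it with any precision.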


from sklearn.linear_model import RidgeCV 


ridgeCV = RidgeCV(alphas=np.logspace(-5,1,50),scoring='r2',cv = 6)

ridgeCV.fit(X_train,y_train)

y_ = ridgeCV.predict(X_test)
r2_score(y_test,y_)

0.5021580806301859

ridgeCV = RidgeCV(alphas=np.linspace(0.01,5,50),scoring='r2',cv = 6)

ridgeCV.fit(X_train,y_train)

y_ = ridgeCV.predict(X_test)
r2_score(y_test,y_)

0.5006336933433428
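To see which alpha each grid actually selected, the fitted RidgeCV object exposes it as the alpha_ attribute. The sketch below repeats both searches and prints the chosen value. As before, random_state=0 is an assumption added for reproducibility, so the scores will differ from the unseeded outputs above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=0 is an assumption; the original split was unseeded.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

grids = {
    "logspace": np.logspace(-5, 1, 50),
    "linspace": np.linspace(0.01, 5, 50),
}
chosen = {}
for name, alphas in grids.items():
    model = RidgeCV(alphas=alphas, scoring="r2", cv=6).fit(X_train, y_train)
    chosen[name] = model.alpha_  # alpha with the best cross-validated score
    print(
        f"{name}: alpha_={model.alpha_:.5g}  "
        f"test R^2={model.score(X_test, y_test):.4f}"
    )
```

If the selected alpha sits at either end of a grid, that is usually a hint to widen the search range in that direction.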

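Copyright notice: unless otherwise stated, all articles on this site are original. Please credit the source when reposting.

Article link: http://wakemeupnow.cn/article/ridge/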