Applied Regression Analysis
Multiple regression using two variables, RM and LSTAT
# Price ~ RM + LSTAT + RM**2 + RM * LSTAT + LSTAT**2
# Price = b0 + b1 * rm + b2 * lstat + b3 * rm**2 + b4 * rm * lstat + b5 * lstat **2
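# Add polynomial term columns to the training set
X_train_rm_lstat_poly = poly.fit_transform(X_train_rm_lstat)
# Add polynomial term columns to the test set (transform only; poly was already fit on the training set)
X_test_rm_lstat_poly = poly.transform(X_test_rm_lstat)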
print(X_test_rm_lstat_poly[:2])
lin_reg.fit(X_train_rm_lstat_poly, y_train)
print(f'intercept: {lin_reg.intercept_}, coef: {lin_reg.coef_}')
[[ 8.259 3.54 68.211081 29.23686 12.5316 ]
[ 6.312 10.58 39.841344 66.78096 111.9364 ]]
intercept: 58.131040227957655, coef: [-1.76285033e+01 1.52009093e+00 2.09295492e+00 -3.53889752e-01 -3.14275848e-03]
y_pred_rm_lstat_poly = lin_reg.predict(X_test_rm_lstat_poly)
mse = mean_squared_error(y_test, y_pred_rm_lstat_poly)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_rm_lstat_poly)
print(f'Price ~ RM + LSTAT + RM**2 + RM * LSTAT + LSTAT**2: RMSE = {rmse}, R**2 = {r2}')
print('y true:', y_test[:5])
print('y pred:', y_pred_rm_lstat_poly[:5])
Price ~ RM + LSTAT + RM**2 + RM * LSTAT + LSTAT**2: RMSE = 5.715001544053191, R**2 = 0.6303959336867977
y true: [42.8 21.2 31. 14.1 25.3]
y pred: [50.29506251 22.34374022 28.85971088 16.00625841 25.07303372]
Multiple regression using two variables, RM and LSTAT (2)
# Price ~ RM + LSTAT + LSTAT**2
# Price = b0 + b1 * rm + b2 * lstat + b3 * lstat**2
X_train_last = np.c_[X_train_rm, X_train_lstat_poly]
X_test_last = np.c_[X_test_rm, X_test_lstat_poly]
print('X_train_last:', X_train_last[:2], '\n X_test_last: ', X_test_last[:2])
X_train_last: [[ 5.093 29.68 880.9024]
[ 6.251 16.44 270.2736]]
X_test_last: [[ 8.259 3.54 12.5316]
[ 6.312 10.58 111.9364]]
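As a rough illustration of how `np.c_` builds the `[RM, LSTAT, LSTAT**2]` matrix, here is a small sketch. The arrays `X_rm`, `X_lstat`, and `X_lstat_poly` are hypothetical stand-ins for the arrays prepared in the earlier steps; their values are copied from the two printed training rows.

```python
import numpy as np

# Hypothetical stand-ins for the arrays from the earlier steps.
X_rm = np.array([[5.093], [6.251]])          # RM column, shape (2, 1)
X_lstat = np.array([[29.68], [16.44]])       # LSTAT column, shape (2, 1)
X_lstat_poly = np.c_[X_lstat, X_lstat ** 2]  # [LSTAT, LSTAT**2], shape (2, 2)

# np.c_ concatenates along the column axis, so the result has 1 + 2 = 3 columns.
X_last = np.c_[X_rm, X_lstat_poly]
print(X_last.shape)
print(X_last)
```

Both inputs must have the same number of rows; otherwise `np.c_` raises a `ValueError`.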
lin_reg.fit(X_train_last, y_train) # fit/train
print(f'Price ~ RM + LSTAT + LSTAT**2: intercept: {lin_reg.intercept_}, coef {lin_reg.coef_}')
Price ~ RM + LSTAT + LSTAT**2: intercept: 11.976646227033507, coef [ 4.14148052 -1.79652146 0.03381396]
y_pred_last = lin_reg.predict(X_test_last) # predict/test
print('y true:', y_test[:5])
print('y predict:', y_pred_last[:5].round(2))
y true: [42.8 21.2 31. 14.1 25.3]
y predict: [40.25 22.9 26.39 16.14 28.23]
mse = mean_squared_error(y_test, y_pred_last)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_last)
print(f'Price ~ RM + LSTAT + LSTAT**2: RMSE = {rmse}, R**2 = {r2}')
Price ~ RM + LSTAT + LSTAT**2: RMSE = 5.592903859241363, R**2 = 0.6460199841225782
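To make the two metrics concrete, the sketch below computes RMSE and R**2 by hand with NumPy. The helper names `rmse` and `r2` are hypothetical, and the inputs are only the five printed rows from each model, so the numbers it prints will not match the full test-set scores reported above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# First five test targets and predictions, copied from the printed output.
y_true = np.array([42.8, 21.2, 31.0, 14.1, 25.3])
y_pred_full = np.array([50.29506251, 22.34374022, 28.85971088,
                        16.00625841, 25.07303372])    # full degree-2 model
y_pred_last = np.array([40.25, 22.9, 26.39, 16.14, 28.23])  # RM + LSTAT + LSTAT**2

for name, y_pred in [('full degree-2', y_pred_full),
                     ('RM + LSTAT + LSTAT**2', y_pred_last)]:
    print(f'{name}: RMSE = {rmse(y_true, y_pred)}, R**2 = {r2(y_true, y_pred)}')
```

These formulas correspond to what `np.sqrt(mean_squared_error(...))` and `r2_score(...)` compute in the code above.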