Python 70_ Using Linear Regression in scikit-learn, Part 2


Codezoy 2020. 2. 27. 18:43

import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

Set up random X and y values (y follows 4 + 3X plus Gaussian noise)

np.random.seed(1216)

X = 2 * np.random.rand(100, 1)

print('X shape:', X.shape) # 100x1 matrix (2-D ndarray) of values in [0.0, 2.0)

y = 4 + 3 * X + np.random.randn(100, 1)

print('y shape', y.shape)

X shape: (100, 1)

y shape (100, 1)

plt.scatter(X, y)

plt.show()

X_b: the matrix [1, X], i.e. X with a column of ones prepended

X_b = np.c_[np.ones((100, 1)), X]

print('X_b shape:', X_b.shape)

print(X_b[:5])

X_b shape: (100, 2) (the second column holds the X values)

[[1. 1.99119637]

[1. 1.24682182]

[1. 1.75218737]

[1. 1.77875572]

[1. 0.77352496]]
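As a small aside, the same [1, X] matrix can be built without hard-coding the row count of 100. The sketch below assumes the same seed and X as above; the names X_b_hstack and X_b_sklearn are made up for this illustration. add_dummy_feature is a scikit-learn preprocessing helper that prepends a constant column (1.0 by default).

```python
import numpy as np
from sklearn.preprocessing import add_dummy_feature

np.random.seed(1216)
X = 2 * np.random.rand(100, 1)

# Take the row count from X itself instead of writing 100 by hand
X_b_hstack = np.hstack([np.ones((X.shape[0], 1)), X])

# scikit-learn helper: prepends a column filled with 1.0 by default
X_b_sklearn = add_dummy_feature(X)

print(X_b_hstack.shape, X_b_sklearn.shape)
print(np.allclose(X_b_hstack, X_b_sklearn))
```

Either form should work as a drop-in for np.c_[np.ones((100, 1)), X] when the number of samples changes.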

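Plug the data into the closed-form formula known to solve multiple linear regression (the normal equation): theta = (X_b^T X_b)^-1 X_b^T y

# linalg module: Linear Algebra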

theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

print('theta =', theta_best)

theta = [[3.90187826]

[3.07247138]]
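The explicit inverse works well for this small, well-conditioned problem. As a rough sketch of alternatives (not part of the original walkthrough), the same theta can usually be obtained without forming the inverse, using np.linalg.solve, np.linalg.lstsq, or np.linalg.pinv. The names theta_inv, theta_solve, theta_lstsq, and theta_pinv are chosen here for illustration only.

```python
import numpy as np

np.random.seed(1216)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal equation with an explicit inverse, as in the post
theta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Solve (X_b^T X_b) theta = X_b^T y directly, without computing the inverse
theta_solve = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Least-squares solver applied to X_b theta = y
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)

# Moore-Penrose pseudo-inverse of X_b
theta_pinv = np.linalg.pinv(X_b) @ y

print(np.allclose(theta_inv, theta_solve))
print(np.allclose(theta_inv, theta_lstsq))
print(np.allclose(theta_inv, theta_pinv))
```

When X_b^T X_b is close to singular (for example with strongly correlated features), lstsq and pinv tend to be the safer choices.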

# Compare the theta found with the matrix formula against the theta computed by the LinearRegression class

lin_reg = LinearRegression()

lin_reg.fit(X, y)

print(f'intercept: {lin_reg.intercept_}, slope: {lin_reg.coef_}')

intercept: [3.90187826], slope: [[3.07247138]]
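Rather than comparing the printed numbers by eye, the agreement can be checked in code. This is a minimal sketch under the same seed and data as above; theta_sklearn, lin_reg_b, and theta_no_intercept are names introduced here for illustration. The second fit passes X_b with fit_intercept=False, so the column of ones plays the role of the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(1216)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

lin_reg = LinearRegression().fit(X, y)
# Stack intercept and slope into the same (2, 1) layout as theta_best
theta_sklearn = np.r_[lin_reg.intercept_, lin_reg.coef_.ravel()].reshape(-1, 1)

# Alternative: fit on X_b with the built-in intercept disabled,
# so coef_ (shape (1, 2)) holds both parameters
lin_reg_b = LinearRegression(fit_intercept=False).fit(X_b, y)
theta_no_intercept = lin_reg_b.coef_.T

print(np.allclose(theta_best, theta_sklearn))
print(np.allclose(theta_best, theta_no_intercept))
```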

X_test = [[0],

[1],

[2]]

# Matrix form: y = X_b @ theta

X_test_b = np.c_[np.ones((3, 1)), X_test]

print(X_test_b)

y_pred = X_test_b.dot(theta_best)

print(y_pred)

[[1. 0.]

[1. 1.]

[1. 2.]]

[[ 3.90187826]

[ 6.97434963]

[10.04682101]]

# Prediction using the scikit-learn package

predictions = lin_reg.predict(X_test)

print(predictions)

[[ 3.90187826]

[ 6.97434963]

[10.04682101]]
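The two prediction outputs match because, for a fitted linear model, predict() amounts to X @ coef_.T + intercept_. The sketch below reproduces that by hand under the same seed and data; the variable name manual is introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(1216)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

lin_reg = LinearRegression().fit(X, y)
X_test = np.array([[0], [1], [2]])

predictions = lin_reg.predict(X_test)

# Manual prediction: slope times x plus intercept, in matrix form
manual = X_test @ lin_reg.coef_.T + lin_reg.intercept_

print(np.allclose(predictions, manual))
```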

plt.scatter(X, y)

plt.plot(X_test, y_pred, 'ro-')

plt.show()
