R21_Rule-Based Classification (Classification Rules)

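Decision trees: divide and conquer
Rule learning: separate and conquer
Differences between divide and conquer and separate and conquer

○ Divide and conquer: a partition created by a split is never re-conquered; it can only be subdivided further. In other words, a tree is permanently constrained by its history of earlier decisions.

○ Separate and conquer: once a rule has been found, any example that is not covered by all of the rule's conditions can be re-conquered by a later rule.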
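The separate-and-conquer loop can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions: the `toy` data frame and the helpers `find_pure_rule()` and `separate_and_conquer()` are made up for this note, each rule has a single condition, and a rule is accepted only if it covers poisonous rows exclusively. Real rule learners such as RIPPER are considerably more elaborate.

```r
# Hypothetical toy data (not the mushroom data set)
toy <- data.frame(
  odor = c("foul", "foul", "none", "none", "none", "almond"),
  gill = c("narrow", "broad", "narrow", "broad", "broad", "broad"),
  type = c("poisonous", "poisonous", "poisonous", "edible", "edible", "edible"),
  stringsAsFactors = FALSE
)

# Find the single-condition rule "feature = value => positive" that covers
# only positive rows and as many of them as possible.
find_pure_rule <- function(data, target = "type", positive = "poisonous") {
  best <- NULL
  for (feature in setdiff(names(data), target)) {
    for (value in unique(data[[feature]])) {
      covered <- data[[feature]] == value
      pure <- all(data[[target]][covered] == positive)
      if (pure && (is.null(best) || sum(covered) > best$n)) {
        best <- list(feature = feature, value = value, n = sum(covered))
      }
    }
  }
  best
}

separate_and_conquer <- function(data, target = "type", positive = "poisonous") {
  rules <- list()
  remaining <- data
  while (any(remaining[[target]] == positive)) {
    rule <- find_pure_rule(remaining, target, positive)
    if (is.null(rule)) break  # no pure single-condition rule is left
    rules[[length(rules) + 1]] <- rule
    # "Separate": set aside the covered rows; every other row stays
    # available and can be conquered by a later rule.
    remaining <- remaining[remaining[[rule$feature]] != rule$value, , drop = FALSE]
  }
  rules
}

rules <- separate_and_conquer(toy)
for (r in rules) {
  cat(sprintf("(%s = %s) => poisonous (%d)\n", r$feature, r$value, r$n))
}
```

On this toy data the first rule uses `odor` and the second uses `gill`: the rows left over after the first rule are searched again across all features, which a tree branch cannot do once it has committed to a split.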

# Rule learner classifier: find the single most accurate rule.

mushroom <- read.csv(file = 'mlwr/mushrooms.csv',
                     encoding = 'UTF-8',
                     stringsAsFactors = TRUE) # read text columns as factors (the default before R 4.0)

> table(mushroom$type)

edible poisonous

4208 3916

# veil_type has the same value in every row, so it cannot serve as a classification criterion.

mushroom$veil_type <- NULL # drop the feature (column) that has only one factor level
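One way to spot such constant columns without inspecting them one by one is to count the distinct values per column. The sketch below uses a made-up `toy` data frame in place of the mushroom data; only the column name `veil_type` is taken from the notes above.

```r
# Hypothetical toy data frame standing in for the mushroom data
toy <- data.frame(
  odor      = factor(c("foul", "none", "almond", "none")),
  veil_type = factor(c("partial", "partial", "partial", "partial")),
  type      = factor(c("poisonous", "edible", "edible", "edible"))
)

# Count the distinct values in each column
n_levels <- sapply(toy, function(col) length(unique(col)))
print(n_levels)

# Columns with a single distinct value carry no information for classification
constant_cols <- names(n_levels)[n_levels == 1]

# Drop them all at once
toy[constant_cols] <- NULL
names(toy)
```

The same `sapply()` call should work on `mushroom` itself, where `veil_type` would be expected to be the only column reported with one distinct value.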

# Rule classifier: the One Rule (1R) classifier

install.packages('OneR')

library(OneR)

# Train the model

> mushroom_1R <- OneR(type ~ ., data = mushroom)

> mushroom_1R

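type: the target (dependent) variable to be predicted

x ~ y: x depends on y. Y ~ x + y + z: Y depends on the independent variables x, y, and z.

. (dot): all remaining variables in the data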
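The formula notation can be inspected directly in base R. The short sketch below uses a made-up two-row `toy` data frame; `all.vars()` and `terms()` are standard base R functions.

```r
# Hypothetical toy data frame with a target and two predictors
toy <- data.frame(type      = c("edible", "poisonous"),
                  odor      = c("none", "foul"),
                  cap_color = c("brown", "red"))

# Explicit formula: the response comes first, then the predictors
f_explicit <- type ~ odor + cap_color
all.vars(f_explicit)

# With a data frame supplied, terms() expands "." into every other column
f_dot <- type ~ .
predictors <- attr(terms(f_dot, data = toy), "term.labels")
predictors
```

So `type ~ .` on this toy data is shorthand for `type ~ odor + cap_color`.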

Call:

OneR.formula(formula = type ~ ., data = mushroom)

Rules:

If odor = almond then type = edible

If odor = anise then type = edible

If odor = creosote then type = poisonous

If odor = fishy then type = poisonous

If odor = foul then type = poisonous

If odor = musty then type = poisonous

If odor = none then type = edible

If odor = pungent then type = poisonous

If odor = spicy then type = poisonous

Accuracy:

8004 of 8124 instances classified correctly (98.52%)

# Classifying on the single variable odor gives the highest accuracy, 98.52%.
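The idea behind 1R can be reproduced by hand: for each feature, predict the majority class within every feature value, then keep the feature whose predictions are right most often. The sketch below is a simplified illustration on made-up data, not the `OneR` package's actual implementation; `one_rule_accuracy()` is a hypothetical helper.

```r
# Hypothetical toy data (not the mushroom data set)
toy <- data.frame(
  odor      = c("foul", "foul", "none", "none", "none", "almond"),
  cap_color = c("red", "brown", "brown", "brown", "red", "red"),
  type      = c("poisonous", "poisonous", "edible", "edible", "poisonous", "edible"),
  stringsAsFactors = FALSE
)

# Accuracy obtained when every value of `feature` predicts its majority class
one_rule_accuracy <- function(data, feature, target = "type") {
  counts <- table(data[[feature]], data[[target]])
  # For each feature value (row), the majority class gets max(count) rows right
  sum(apply(counts, 1, max)) / nrow(data)
}

features <- setdiff(names(toy), "type")
acc <- sapply(features, function(f) one_rule_accuracy(toy, f))
print(round(acc, 3))

# 1R keeps the single feature with the highest accuracy
best <- names(acc)[which.max(acc)]
best
```

On this toy data `odor` classifies 5 of 6 rows correctly and `cap_color` only 4 of 6, so `odor` is chosen.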

> mushroom_1R_cap <- OneR(type ~ cap_shape + cap_surface + cap_color,

+ data = mushroom)

> mushroom_1R_cap

Call:

OneR.formula(formula = type ~ cap_shape + cap_surface + cap_color,

data = mushroom)

Rules:

If cap_color = brown then type = edible

If cap_color = buff then type = poisonous

If cap_color = cinnamon then type = edible

If cap_color = gray then type = edible

If cap_color = green then type = edible

If cap_color = pink then type = poisonous

If cap_color = purple then type = edible

If cap_color = red then type = poisonous

If cap_color = white then type = edible

If cap_color = yellow then type = poisonous

Accuracy:

4836 of 8124 instances classified correctly (59.53%)

# Of the three variables cap_shape, cap_surface, and cap_color, cap_color predicts type best, with an accuracy of 59.53%.

> summary(mushroom_1R)

Call:

OneR.formula(formula = type ~ ., data = mushroom)

Rules:

If odor = almond then type = edible

If odor = anise then type = edible

If odor = creosote then type = poisonous

If odor = fishy then type = poisonous

If odor = foul then type = poisonous

If odor = musty then type = poisonous

If odor = none then type = edible

If odor = pungent then type = poisonous

If odor = spicy then type = poisonous

Accuracy:

8004 of 8124 instances classified correctly (98.52%)

Maximum in each column: '*'

Pearson's Chi-squared test:

X-squared = 7659.7, df = 8, p-value < 2.2e-16

# Among the mushrooms with odor = none, 120 are poisonous.
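The figure of 120 agrees with the accuracy line in the output above, which can be checked with simple arithmetic. All numbers in this sketch are copied from the console output in these notes.

```r
# Numbers copied from the OneR output and from table(mushroom$type)
total   <- 8124
correct <- 8004
edible    <- 4208
poisonous <- 3916

# Misclassified instances under the odor rule
errors <- total - correct
errors

# Accuracy as a percentage, rounded as in the OneR output
accuracy_pct <- round(100 * correct / total, 2)
accuracy_pct

# Sanity check: the two class counts add up to the total
edible + poisonous == total
```

On the real data, `table(mushroom$odor, mushroom$type)` should show where these errors come from: presumably the `none` row is the only odor value that contains both classes.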

# Use the RIPPER algorithm to improve performance.

install.packages('RWeka')

library(RWeka) # requires a Java installation to run

mushroom_ripper <- JRip(type ~ ., data = mushroom)

mushroom_ripper

JRIP rules:

===========

(odor = foul) => type=poisonous (2160.0/0.0)

(gill_size = narrow) and (gill_color = buff) => type=poisonous (1152.0/0.0)

(gill_size = narrow) and (odor = pungent) => type=poisonous (256.0/0.0)

(odor = creosote) => type=poisonous (192.0/0.0)

(spore_print_color = green) => type=poisonous (72.0/0.0)

(stalk_surface_below_ring = scaly) and (stalk_surface_above_ring = silky) => type=poisonous (68.0/0.0)

(habitat = leaves) and (cap_color = white) => type=poisonous (8.0/0.0)

(stalk_color_above_ring = yellow) => type=poisonous (8.0/0.0)

=> type=edible (4208.0/0.0)

Number of Rules : 9
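The rule list reads as an ordered if/else chain: rules are tried from top to bottom, the first match decides, and the last rule is the default. The sketch below transcribes the nine rules into a plain R function to make that reading concrete. `classify_ripper()` and the example mushrooms are made up for illustration (values such as "smooth" or "woods" are assumed feature values), and the counts in parentheses are taken to mean instances covered / instances misclassified, as I understand Weka's output convention.

```r
# Coverage counts copied from the eight poisonous rules in the JRip output
covered <- c(2160, 1152, 256, 192, 72, 68, 8, 8)
sum(covered)  # compare with the number of poisonous mushrooms in table(mushroom$type)

# Hypothetical helper: apply the printed rules in order to one mushroom,
# given as a named list of feature values
classify_ripper <- function(m) {
  if (m$odor == "foul") return("poisonous")
  if (m$gill_size == "narrow" && m$gill_color == "buff") return("poisonous")
  if (m$gill_size == "narrow" && m$odor == "pungent") return("poisonous")
  if (m$odor == "creosote") return("poisonous")
  if (m$spore_print_color == "green") return("poisonous")
  if (m$stalk_surface_below_ring == "scaly" &&
      m$stalk_surface_above_ring == "silky") return("poisonous")
  if (m$habitat == "leaves" && m$cap_color == "white") return("poisonous")
  if (m$stalk_color_above_ring == "yellow") return("poisonous")
  "edible"  # default rule: nothing above matched
}

# Two made-up mushrooms that differ only in gill size
m1 <- list(odor = "none", gill_size = "narrow", gill_color = "buff",
           spore_print_color = "white",
           stalk_surface_below_ring = "smooth",
           stalk_surface_above_ring = "smooth",
           habitat = "woods", cap_color = "brown",
           stalk_color_above_ring = "white")
m2 <- modifyList(m1, list(gill_size = "broad"))

classify_ripper(m1)  # matched by the second rule
classify_ripper(m2)  # falls through to the default rule
```

The eight coverage counts add up to 3916, the number of poisonous mushrooms, which is consistent with the default rule covering exactly the 4208 edible ones.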

> summary(mushroom_ripper)

=== Summary ===

Correctly Classified Instances 8124 100 %

Incorrectly Classified Instances 0 0 %

Kappa statistic 1

Mean absolute error 0

Root mean squared error 0

Relative absolute error 0 %

Root relative squared error 0 %

Total Number of Instances 8124

=== Confusion Matrix ===

a b <-- classified as

4208 0 | a = edible

0 3916 | b = poisonous
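The accuracy and kappa values in the summary can be recomputed from the confusion matrix. The sketch below uses the usual definition of Cohen's kappa, (observed agreement - expected agreement) / (1 - expected agreement); the matrix entries are copied from the output above, and the variable names are my own.

```r
# Confusion matrix copied from the summary: rows = actual, columns = classified as
cm <- matrix(c(4208,    0,
                  0, 3916),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("edible", "poisonous"),
                             predicted = c("edible", "poisonous")))

n <- sum(cm)

# Observed agreement: share of instances on the diagonal (the accuracy)
observed <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa
kappa_stat <- (observed - expected) / (1 - expected)

c(accuracy = observed, expected = round(expected, 4), kappa = kappa_stat)
```

With no off-diagonal entries the observed agreement is 1, so kappa is 1 as well, matching the summary.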
