Statistics Basics Summary (5): Correlation Coefficients (Pearson, Spearman, Kendall's Tau, Mutual Information)

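1. Pearson Correlation Coefficient

It measures the strength of a linear relationship. It is not suited to nonlinear relationships, where it can be close to 0 even when the two variables are strongly related.

Both variables must be numeric and take continuous values. If the values are not continuous (for example, ordinal or categorical data), the Pearson correlation coefficient is not appropriate.

It takes values from -1 to 1: 1 means a perfect positive linear relationship, -1 means a perfect negative linear relationship, and 0 means no linear relationship.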
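
As a rough illustration of the points above, the sketch below computes the Pearson coefficient directly from its definition (covariance divided by the product of the standard deviations) using only the standard library. The function name `pearson_manual` and the toy data are assumptions made for this example and are not part of the original post.

```python
import math

def pearson_manual(x, y):
    """Pearson r from the definition: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]  # deviations of x from its mean
    dy = [b - mean_y for b in y]  # deviations of y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical toy data: x is symmetric around 0.
x = list(range(-5, 6))
y_linear = [2 * v + 1 for v in x]   # exact linear relationship
y_quadratic = [v ** 2 for v in x]   # exact, but nonlinear, relationship

r_linear = pearson_manual(x, y_linear)
r_quadratic = pearson_manual(x, y_quadratic)

print(f"linear:    r = {r_linear:.2f}")
print(f"quadratic: r = {r_quadratic:.2f}")
```

With this data the linear case gives an r of 1, while the quadratic case gives an r of 0 even though y is completely determined by x. That is the sense in which Pearson's coefficient only "sees" linear relationships.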

✅ How to use it in Python

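from scipy.stats import pearsonr

# Compute the Pearson correlation coefficient (df is assumed to be a pandas DataFrame with these columns)
pearson_corr, _ = pearsonr(df['Study Hours'], df['Exam Scores'])  # _ receives the p-value
print(f"Pearson correlation coefficient: {pearson_corr}")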

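2. Nonparametric Correlation Coefficients

Correlation coefficients used when the data do not follow a normal distribution.

When no assumption can be made about the distribution of the data
When you also want to work with ordinal data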

(1) Spearman Correlation Coefficient

Measures how consistently the ranks of the two variables agree.

It reacts more sensitively to deviations and errors in the data than Kendall's tau.
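
One way to make "consistency between ranks" concrete is to note that Spearman's coefficient is the Pearson coefficient computed on the ranks of the data. The sketch below follows that idea with only the standard library. The helper names (`average_ranks`, `pearson`, `spearman_manual`) and the toy data are assumptions for illustration, and tied values are given the average of their ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation computed from the definition."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def spearman_manual(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

rho = spearman_manual(x, y)
r = pearson(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because y always increases when x increases, the ranks agree perfectly and rho comes out as 1, while Pearson's r on the raw values stays below 1 because the relationship is not a straight line.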

Reference: Spearman Rank Correlation (www.technologynetworks.com)

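(2) Kendall's Tau Correlation Coefficient

Computed from the proportions of concordant and discordant pairs among the ranks.

Even on the same data, the two coefficients can give different results, because Spearman's coefficient is more sensitive to errors.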
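
The pair-counting idea can be sketched as follows. This is a minimal illustration, assuming data without ties: it implements the tau-a variant, (concordant - discordant) / total pairs, whereas `scipy.stats.kendalltau` defaults to tau-b, which coincides with tau-a when there are no ties. The Spearman helper uses the classic no-ties formula 1 - 6·Σd² / (n(n² - 1)). All function names and data here are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables order this pair in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def ranks(values):
    """1-based ranks; valid only when all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    n = len(x)
    return 1 - 6 * d_squared / (n * (n * n - 1))

x = [1, 2, 3, 4, 5]
y_swap = [1, 2, 3, 5, 4]  # two neighbouring ranks swapped
y_far = [5, 2, 3, 4, 1]   # the two extreme ranks swapped

for label, y in [("adjacent swap", y_swap), ("extreme swap", y_far)]:
    print(f"{label}: tau = {kendall_tau_a(x, y):.2f}, rho = {spearman_no_ties(x, y):.2f}")
```

On the adjacent swap the two coefficients are 0.80 (tau) and 0.90 (rho); on the extreme swap they are -0.40 and -0.60. Spearman squares the rank differences, so a single large rank displacement moves it further than it moves Kendall's tau, which only counts how many pairs are out of order. This is one plausible reading of the sensitivity claim above, not a general proof.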

✅ How to use it in Python

# Spearman and Kendall's tau functions
from scipy.stats import spearmanr, kendalltau


# Spearman correlation coefficient
spearmanr_corr, _ = spearmanr(df['Customer Satisfaction'], df['Repurchase Intent'])  # _ receives the p-value; assign it to a named variable to print it
print(f"Spearman correlation coefficient: {spearmanr_corr}")


# Kendall's tau correlation coefficient
kendall_corr, _ = kendalltau(df['Customer Satisfaction'], df['Repurchase Intent'])
print(f"Kendall's tau correlation coefficient: {kendall_corr}")

Reference: Choosing the Right Correlation: Pearson vs. Spearman vs. Kendall's Tau (ishanjainoffical.medium.com)

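3. Mutual Information Correlation Coefficient

Uses mutual information to measure how strongly two variables are related
Detects nonlinear relationships based on the informational dependence between the variables
Computed from how much knowing one variable reduces the uncertainty about the other
Also applicable to categorical data.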
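
The "reduction in uncertainty" idea can be sketched from the definition I(X;Y) = Σ p(x,y) · ln( p(x,y) / (p(x) p(y)) ). The function below estimates the probabilities from simple counts and uses the natural logarithm, so the result is in nats. The function name and the toy labels are assumptions for this example.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information in nats, with probabilities estimated from counts."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_xy = c / n                  # joint probability of the pair (a, b)
        p_x = count_x[a] / n          # marginal probability of a
        p_y = count_y[b] / n          # marginal probability of b
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

# Hypothetical categorical data.
pets = ['cat', 'dog', 'cat', 'dog']
dependent = ['high', 'low', 'high', 'low']    # fully determined by pets
independent = ['high', 'high', 'low', 'low']  # carries no information about pets

mi_dependent = mutual_information(pets, dependent)
mi_independent = mutual_information(pets, independent)
print(f"dependent:   {mi_dependent:.4f}")
print(f"independent: {mi_independent:.4f}")
```

When one variable fully determines the other, the result here equals ln 2 (about 0.6931); when the two are independent, it is 0. Unlike the coefficients in the earlier sections, mutual information is never negative and is not capped at 1, so it indicates the strength of a dependence but not its direction.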

✅ How to use it in Python

import numpy as np
from sklearn.metrics import mutual_info_score  # mutual information

# Categorical example data
X = np.array(['cat', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat'])
Y = np.array(['high', 'low', 'high', 'high', 'low', 'low', 'high', 'low', 'low', 'high'])

# Compute the mutual information
mi = mutual_info_score(X, Y)
print(f"Mutual Information (categorical): {mi}")
