코딩하는 타코야끼

[Matplotlib] Lecture 2: Drawing Various Graphs (Scatter Plot / Bar Plot)

가스오부시 2023. 5. 9. 20:06

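1. Drawing a Scatter Plot

📍 Scatter plot

A graph that marks each observation as a point on a coordinate plane with an X axis and a Y axis.
It lets you check the correlation between variables (features) or how the observations cluster.

Use the scatter( ) method

1st argument: x values, 2nd argument: y values
Both the x and y values must be passed as arguments.
x and y each accept a scalar number or an array-like object:
- list
- tuple
- numpy array (ndarray)
- pandas Series
x and y must have the same number of elements.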

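import matplotlib.pyplot as plt

x = range(1, 1001, 50)
y = range(1001, 1, -50)   # decreases as x increases
y2 = range(1, 1001, 50)   # increases as x increases
print(len(x), len(y))
>>>
20 20

plt.scatter(x, y, label = 'A', marker = '<') # negative relationship: y falls as x rises
plt.scatter(x, y2, label = 'B', marker = '*', s = 100) # positive relationship: y2 rises with x

# Passing a list, e.g. plt.legend(['A label', 'B label']), sets the labels for A and B at once.
plt.legend(bbox_to_anchor = (1,1))
plt.grid(True, linestyle = ':')
plt.show()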

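📍 Settings

marker
- The marker is the shape of each point; it can be changed to one of the predefined values.
- Set it with the marker parameter of the scatter() method.
- https://matplotlib.org/stable/api/markers_api.html
s
- Number or array-like: the marker size, given as an area in points squared
alpha
- Transparency of each marker
- A float between 0 and 1 (0 is fully transparent, 1 is fully opaque; markers are opaque by default)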
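
The three settings above can be combined in one call. The following is a minimal sketch on made-up random data (the seed, value ranges, and the names `sizes` and `points` are assumptions for illustration, not part of the original lecture). It passes an array to `s` so that every point gets its own size.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)        # fixed seed; the data is made up
x = rng.uniform(0, 10, size=50)
y = rng.uniform(0, 10, size=50)
sizes = np.linspace(20, 200, 50)      # one size per point, in points squared

fig, ax = plt.subplots(figsize=(6, 4))
# marker: point shape, s: marker area, alpha: transparency
points = ax.scatter(x, y, marker='^', s=sizes, alpha=0.5, label='sample')
ax.legend()
ax.grid(True, linestyle=':')
```

Call plt.show() afterwards to display the figure. Because alpha is below 1, overlapping markers appear darker, which helps when many points share the same region.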

📍 Using scatter plots

import pandas as pd

df = pd.read_csv('../data/diamonds.csv')
df.shape
>>>
(53940, 10)

# df.info without parentheses only returns the bound method, so print the frame to preview it.
# Call df.info() (with parentheses) for the per-column summary.
print(df)
>>>
       carat        cut color clarity  depth  table  price     x     y     z
0       0.23      Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1       0.21    Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2       0.23       Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3       0.29    Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4       0.31       Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75
...      ...        ...   ...     ...    ...    ...    ...   ...   ...   ...
53935   0.72      Ideal     D     SI1   60.8   57.0   2757  5.75  5.76  3.50
53936   0.72       Good     D     SI1   63.1   55.0   2757  5.69  5.75  3.61
53937   0.70  Very Good     D     SI1   62.8   60.0   2757  5.66  5.68  3.56
53938   0.86    Premium     H     SI2   61.0   58.0   2757  6.15  6.12  3.74
53939   0.75      Ideal     D     SI2   62.2   55.0   2757  5.83  5.87  3.64

[53940 rows x 10 columns]

🌓 Visualization

plt.figure(figsize = (10,7))
plt.scatter(df["carat"], df['price'],
           alpha = 0.1
           ) # x usually holds the cause and y the result.

plt.title("Correlation between carat and price")
plt.xlabel('Carat')
plt.ylabel('Price')

# plt.grid(True, linestyle = ":")
plt.show()

🌓 Visualizing the correlation between carat and price

Computing the correlation coefficient

df[['carat', 'price']].corr()
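
Since diamonds.csv may not be at hand, here is a small sketch of the same corr() call on made-up data. The DataFrame `toy`, its values, and the linear price formula are assumptions for illustration only; they do not reproduce the diamonds dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)       # fixed seed; the data is made up
carat = rng.uniform(0.2, 2.0, size=200)
price = 4000 * carat + rng.normal(0, 300, size=200)   # hypothetical linear relation plus noise
toy = pd.DataFrame({'carat': carat, 'price': price})

# corr() returns a symmetric matrix of Pearson correlation coefficients by default
corr_matrix = toy[['carat', 'price']].corr()
r = corr_matrix.loc['carat', 'price']  # the single carat-price coefficient
print(corr_matrix)
```

The diagonal of the matrix is always 1 (each column against itself), so the value of interest is the off-diagonal entry.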

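🌓 Regression line

As x increases, y also increases: positive relationship (correlation coefficient between 0 and 1)
As x increases, y decreases: negative relationship (correlation coefficient between -1 and 0)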
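
One possible way to draw such a line is a least-squares fit with np.polyfit, sketched below on made-up data. The original post does not say how its regression line was produced, so the choice of np.polyfit, the true slope of 3, and the noise level are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)        # fixed seed; the data is made up
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=x.size)

# Degree-1 least-squares fit; coefficients come back highest degree first
slope, intercept = np.polyfit(x, y, deg=1)
r = np.corrcoef(x, y)[0, 1]           # Pearson correlation coefficient

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.4, label='observations')
ax.plot(x, slope * x + intercept, c='r', label='fitted line')
ax.legend()
```

The sign of the fitted slope matches the sign of the correlation coefficient: an upward line goes with a positive coefficient, a downward line with a negative one. Call plt.show() to display the figure.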

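2. Drawing a Bar Plot

📍 Bar plot

A graph that uses bars to compare quantities or values
Used to check the count of each class in categorical data
Use the bar(x, height) method
- x: x values, height: bar heights
  - x holds the categories, height holds the counts
barh(y, width) method
- Horizontal bar graph
- 1st argument: y values, 2nd argument: bar widths (the length of each horizontal bar)
Arguments
- First: the items being counted
- Second: the counts
xticks(), yticks(): redefine the tick marks on an axis.
xlim(), ylim(): redefine the value range of an axis.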

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

fruits = ['apple', 'pear', 'tangerine']
counts = [150, 70, 200]

plt.figure(figsize = (16,5))

plt.subplot(1, 2, 1)
plt.bar(fruits, counts, width = 0.5) # width: bar width as a fraction of the category spacing (typically 0 to 1)

# plt.text(x coordinate, y coordinate, 'text to display')
for x,y in enumerate(counts):
    plt.text(x - 0.05, y, str(y))

plt.title('Fruit counts')
plt.xlabel('Fruit')
plt.ylabel('Count')

# Change the value range of the y axis
plt.ylim(0, 250)
plt.grid(True)

plt.subplot(1, 2, 2)
plt.barh(fruits, counts, height = 0.4)
plt.title('Fruit counts') # sets the title of the subplot (axes)
plt.xlabel('Count')
plt.ylabel('Fruit')

plt.xlim(0, 250) # set the value range of the x axis
plt.grid(True)

plt.suptitle('Bar graphs', fontsize = 20) # sets the title of the figure
plt.tight_layout()
plt.show()
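
As an alternative to placing each value by hand with plt.text, Axes.bar_label (available from Matplotlib 3.4) can label the bars automatically. This is a minimal sketch, not the approach used in the original lecture; the English fruit names and the padding value are illustrative assumptions.

```python
import matplotlib.pyplot as plt

fruits = ['apple', 'pear', 'tangerine']
counts = [150, 70, 200]

fig, ax = plt.subplots()
bars = ax.bar(fruits, counts, width=0.5)   # returns a BarContainer
# bar_label writes each bar's value at its top; padding is an offset in points
labels = ax.bar_label(bars, padding=3)
ax.set_ylim(0, 250)                        # leave headroom for the labels
```

This removes the need to tune offsets such as x - 0.05 per chart, since the labels are centered on each bar. Call plt.show() to display the figure.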

📍 Using bar plots

Rainfall trends

df = pd.read_excel('data/강수량.xlsx')
df.shape
>>>
(4, 10)

df.set_index('계절', inplace = True)  # use the '계절' (season) column as the index

🌓 Comparing seasonal rainfall in 2009 with a bar graph

plt.bar(df.index, df[2009], width = 0.7)
plt.title('Seasonal rainfall in 2009')

# Write each value just above its bar
for x,y in enumerate(df[2009]):
    plt.text(x - 0.16 , y + 5, str(y))

plt.ylim(0,800)
plt.ylabel('Rainfall')
plt.xlabel('Season')

# plt.grid(True, linestyle =":")
plt.show()

🌓 Change in summer rainfall by year => the main interest is the trend over time --> line plot

plt.figure(figsize = (10,4))
# df.loc['여름'] selects the summer row; the columns are the years
plt.plot(df.columns, df.loc['여름'], marker = '.', c = 'r')
plt.bar(df.columns, df.loc['여름'])

for x,y in zip(df.columns, df.loc['여름']):
    plt.text(x - 0.25, y + 6, str(y))

plt.xlabel('Year')
plt.ylabel('Rainfall')
plt.ylim(0,1200)

# plt.grid(True, linestyle = ':')
plt.show()

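🌓 Seasonal rainfall in 2010 and 2011 => grouped (side-by-side) bar graph

width = 0.3
x = np.arange(4)
# Shift each series by half a bar width so the two bars sit next to each other
plt.bar(x - width/2, df[2010], width = width, label = '2010')
plt.bar(x + width/2, df[2011], width = width, label = '2011')

plt.xticks(x, labels = df.index) # first argument: tick positions, labels: tick labels
plt.legend()
plt.show()

# pandas can draw the same grouped bar graph directly from the DataFrame
df[[2010,2011]].plot(kind='bar')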
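
A stacked bar graph, where one series sits on top of another, is a different layout from the side-by-side bars drawn with x - width/2 and x + width/2. The sketch below shows one way to stack bars using the bottom parameter of bar(). The season names, the rainfall values, and the labels 'year A' and 'year B' are made up for illustration and are not taken from the Excel file.

```python
import numpy as np
import matplotlib.pyplot as plt

seasons = ['spring', 'summer', 'autumn', 'winter']
rain_a = np.array([200, 600, 250, 80])    # made-up values
rain_b = np.array([250, 700, 300, 100])   # made-up values

fig, ax = plt.subplots()
lower = ax.bar(seasons, rain_a, label='year A')
# bottom=rain_a starts each second-series bar where the first one ends
upper = ax.bar(seasons, rain_b, bottom=rain_a, label='year B')
ax.legend()
```

Stacking emphasizes the combined total per season, while grouped bars make it easier to compare the two years against each other. Call plt.show() to display the figure.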
