[Pandas] Understanding DataFrame loc and iloc

Python 2024. 4. 19. 06:16


Keywords.

Location by Label or by Integer.

Introduction.

Pandas DataFrames support selection using loc and iloc.

loc and iloc can be read as short for Location by Label and Location by Integer, respectively.

With loc and iloc you can extract a Series or a sub-DataFrame.

In this post, I'll walk through loc and iloc with examples.

loc.

loc selects data from a DataFrame by label.

A label is the actual name of an index entry or a column.

Let's start with a simple example.

The example below is a DataFrame with 3 rows and 5 columns.

The first row holds lowercase letters.

The second and third rows hold Korean characters and digits (stored as strings), respectively.

import pandas as pd

data = [
	["a","b","c","d","e"], 
	["가","나","다","라","마"], 
	["1","2","3","4","5"]
]

df = pd.DataFrame(data)

To select all rows, pass every index label.

Since no index was specified, pandas assigns the default labels 0, 1, and 2 to the three rows.

df.loc[ [0,1,2] ]

   0  1  2  3  4
0  a  b  c  d  e
1  가  나  다  라  마
2  1  2  3  4  5

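The same selection can also be written as a slice (range).

"df.loc[0:2]" does the same thing as "df.loc[ [0,1,2] ]".

In other words, inside loc, 0:2 is treated like [0,1,2]: unlike ordinary Python slicing, a loc slice includes the end label.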

df.loc[0:2]

   0  1  2  3  4
0  a  b  c  d  e
1  가  나  다  라  마
2  1  2  3  4  5
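Because the end label is included, loc slicing behaves differently from ordinary Python slicing. The sketch below reuses the data above and compares loc[0:2] with iloc[0:2] (iloc is covered later in this post); the variable names by_label and by_position are only for illustration.

```python
import pandas as pd

# Same data as above: no index given, so the row labels are the defaults 0, 1, 2
data = [
    ["a", "b", "c", "d", "e"],
    ["가", "나", "다", "라", "마"],
    ["1", "2", "3", "4", "5"],
]
df = pd.DataFrame(data)

# loc slices by label and includes the end label: labels 0, 1, 2
by_label = df.loc[0:2]

# iloc slices by position and excludes the end position, like a Python list: positions 0, 1
by_position = df.iloc[0:2]

print(len(by_label))     # 3
print(len(by_position))  # 2
```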

When the index is made of strings, not numbers.

This time, let's set the index to strings.

import pandas as pd

data = [
	["a","b","c","d","e"], 
	["가","나","다","라","마"], 
	["1","2","3","4","5"]
]

index = ["Eng", "Kor", "Num"]

df = pd.DataFrame(data, index = index)

The index labels are now Eng, Kor, and Num, so selecting by the numbers 0, 1, 2 no longer works with loc.

df.loc[[0,1,2]]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1153, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1382, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1322, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1520, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6115, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6176, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([0, 1, 2], dtype='int64')] are in the [index]"

Because the index labels are now strings rather than numbers, select rows by those labels (Location by Label).

print(df.loc["Eng"])

print(df.loc["Kor"])

print(df.loc[ ["Eng", "Num"] ])

0    a
1    b
2    c
3    d
4    e
Name: Eng, dtype: object

0    가
1    나
2    다
3    라
4    마
Name: Kor, dtype: object

     0  1  2  3  4
Eng  a  b  c  d  e
Num  1  2  3  4  5

loc returns either a Series or a DataFrame.

Passing a single index label, as in df.loc[ "Eng" ], returns a Series.

Passing a list of index labels, as in df.loc[ ["Eng", "Num"] ], returns a DataFrame.
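Since a label can also be a column name, loc also accepts a row label and a column label separated by a comma. The minimal sketch below checks the return types described above on the same string-indexed DataFrame; the column labels here are simply the default integers 0 to 4, and the names row, rows, and cell are illustrative.

```python
import pandas as pd

data = [
    ["a", "b", "c", "d", "e"],
    ["가", "나", "다", "라", "마"],
    ["1", "2", "3", "4", "5"],
]
df = pd.DataFrame(data, index=["Eng", "Kor", "Num"])

# A single row label -> Series, named after that label
row = df.loc["Eng"]
# A list of row labels -> DataFrame, even when the list has only one element
rows = df.loc[["Eng"]]
# A row label plus a column label -> a single value
cell = df.loc["Kor", 2]

print(type(row).__name__)   # Series
print(type(rows).__name__)  # DataFrame
print(cell)                 # 다
```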

iloc.

iloc can be read as short for Location by Integer.

Whereas loc takes actual index labels, iloc uses the position of each row in the DataFrame.

In the example below, the DataFrame has a label index, yet integer-based selection still works.

import pandas as pd

data = [
	["a","b","c","d","e"], 
	["가","나","다","라","마"], 
	["1","2","3","4","5"]
]

index = ["Eng", "Kor", "Num"]

df = pd.DataFrame(data, index = index)
df.iloc[ [0,1,2] ]

     0  1  2  3  4
Eng  a  b  c  d  e
Kor  가  나  다  라  마
Num  1  2  3  4  5
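Because iloc only cares about positions, it behaves like indexing into a Python list, including negative positions counted from the end. The sketch below, on the same labeled DataFrame, also shows that iloc accepts a row position and a column position together; the variable names are illustrative.

```python
import pandas as pd

data = [
    ["a", "b", "c", "d", "e"],
    ["가", "나", "다", "라", "마"],
    ["1", "2", "3", "4", "5"],
]
df = pd.DataFrame(data, index=["Eng", "Kor", "Num"])

# Position 1 is the second row, whatever its label is
second = df.iloc[1]
# Negative positions count from the end, as with Python lists
last = df.iloc[-1]
# Row position 2, column position 4
cell = df.iloc[2, 4]

print(second.name)  # Kor
print(last.name)    # Num
print(cell)         # 5
```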

However, selecting row 4, which does not exist (the valid positions are 0 to 2), raises the following error.

df.iloc[4]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1153, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1714, in _getitem_axis
    self._validate_integer(key, axis)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1647, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
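The error above applies to a single position. A slice that runs past the end behaves like a Python list slice and is simply truncated, which the sketch below illustrates; the names raised and tail are hypothetical, used only for the check.

```python
import pandas as pd

data = [
    ["a", "b", "c", "d", "e"],
    ["가", "나", "다", "라", "마"],
    ["1", "2", "3", "4", "5"],
]
df = pd.DataFrame(data, index=["Eng", "Kor", "Num"])

# A single out-of-range position raises IndexError ...
try:
    df.iloc[4]
    raised = False
except IndexError:
    raised = True

# ... but a slice that goes past the end is silently cut off at the last row
tail = df.iloc[1:10]

print(raised)            # True
print(list(tail.index))  # ['Kor', 'Num']
```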

iloc on a reordered DataFrame.

Use the reindex method to change the order of the DataFrame's index.

With the original index order Eng -> Kor -> Num, iloc[0] returns the Series labeled "Eng".

After the index is reordered to Kor -> Eng -> Num, iloc[0] returns the Series labeled "Kor".

import pandas as pd

data = [
	["a","b","c","d","e"], 
	["가","나","다","라","마"], 
	["1","2","3","4","5"]
]

index = ["Eng", "Kor", "Num"]

df = pd.DataFrame(data, index = index)

print(df.iloc[0])

df = df.reindex(["Kor", "Eng", "Num"])

print(df.iloc[0])

0    a
1    b
2    c
3    d
4    e
Name: Eng, dtype: object

0    가
1    나
2    다
3    라
4    마
Name: Kor, dtype: object
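The flip side is that loc is unaffected by reordering: a label keeps pointing to the same row. The sketch below keeps the original DataFrame and the reindexed one side by side (the name reordered is illustrative) to compare both accessors.

```python
import pandas as pd

data = [
    ["a", "b", "c", "d", "e"],
    ["가", "나", "다", "라", "마"],
    ["1", "2", "3", "4", "5"],
]
df = pd.DataFrame(data, index=["Eng", "Kor", "Num"])
reordered = df.reindex(["Kor", "Eng", "Num"])

# Position 0 now points to a different row ...
print(df.iloc[0].name, reordered.iloc[0].name)  # Eng Kor

# ... but the label "Eng" still points to the same row in both
print(df.loc["Eng"].equals(reordered.loc["Eng"]))  # True
```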

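Finding a dropped row ( drop ).

The DataFrame drop method removes specific rows.

The key point is that the dropped row's index label is removed from the DataFrame too, while the remaining rows keep their original labels.

The example below shows how drop changes the DataFrame.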

import pandas as pd

data = [
	["a","b","c","d","e"], 
	["가","나","다","라","마"], 
	["1","2","3","4","5"]
]

df = pd.DataFrame(data)
print("original DataFrame")
print(df)

df.drop(0, inplace=True, axis=0)
print("removed DataFrame")
print(df)

original DataFrame

   0  1  2  3  4
0  a  b  c  d  e
1  가  나  다  라  마
2  1  2  3  4  5

removed DataFrame

   0  1  2  3  4
1  가  나  다  라  마
2  1  2  3  4  5

In the output above, note that once the row with index 0 is dropped, the DataFrame no longer has an index label 0.

Let's now try loc and iloc with 0 in this situation.

loc[0], which looks up the label 0, raises an exception.

df.loc[0]

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 414, in get_loc
    return self._range.index(new_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1153, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1393, in _getitem_axis
    return self._get_label(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexing.py", line 1343, in _get_label
    return self.obj.xs(label, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 4236, in xs
    loc = index.get_loc(key)
          ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 416, in get_loc
    raise KeyError(key) from err
KeyError: 0

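iloc[0], on the other hand, works as expected.

This is because iloc looks up data by its actual position in the DataFrame, not by its label. Position 0 now holds the row labeled 1, as "Name: 1" in the output below shows.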

df.iloc[0]

0    가
1    나
2    다
3    라
4    마
Name: 1, dtype: object
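To make the label/position mismatch after drop explicit, the sketch below checks which labels remain and converts a label to its position with Index.get_loc. It assigns the result of drop instead of using inplace=True, which is only a style choice here and gives the same DataFrame.

```python
import pandas as pd

data = [
    ["a", "b", "c", "d", "e"],
    ["가", "나", "다", "라", "마"],
    ["1", "2", "3", "4", "5"],
]
df = pd.DataFrame(data)
df = df.drop(0, axis=0)  # same result as df.drop(0, inplace=True, axis=0)

# Label 0 is gone; labels 1 and 2 remain
print(0 in df.index)   # False
print(list(df.index))  # [1, 2]

# Position 0 and label 1 now refer to the same row
print(df.iloc[0].equals(df.loc[1]))  # True

# Index.get_loc converts a label into its position
print(df.index.get_loc(1))  # 0
```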


ABOUT ME

코딩수집
