From Python Pandas Error ValueError: -1 is not in range to Two Kinds of Pandas Series Indexing Methods: label-based indexing (loc) and position-based indexing (iloc)
Yesterday, I encountered the error ValueError: -1 is not in range when I tried to index Python Pandas Series data using [-1]. I found many online references but didn't get a valid solution. Today, I looked up McKinney's book, Python for Data Analysis, and found that he mentions this point in Subchapter 5.4 Integer Indexes [1]. Reading McKinney's introduction, I realized that, because of its special data structure, a Pandas Series can be indexed in two ways: by label and by position. The error above occurs because plain [] indexing on a Series with an integer index is ambiguous between the two, and Pandas resolves the ambiguity as a label lookup. In the following text, I'll record this in detail.
First, create a variable ser whose type is the fundamental Pandas Series, i.e. pandas.core.series.Series:
import pandas as pd
import numpy as np
ser = pd.Series(np.arange(3.))
ser, type(ser)
(0 0.0
1 1.0
2 2.0
dtype: float64,
pandas.core.series.Series)
Now, if we try to select the last element of ser with ser[-1], the error ValueError: -1 is not in range occurs (Pandas re-raises it as KeyError: -1, as the end of the traceback shows):
ser[-1]
ValueError Traceback (most recent call last)
File G:\...\venv\Lib\site-packages\pandas\core\indexes\range.py:391, in RangeIndex.get_loc(self, key, method, tolerance)
390 try:
--> 391 return self._range.index(new_key)
392 except ValueError as err:
ValueError: -1 is not in range
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[10], line 1
----> 1 ser[-1]
...
KeyError: -1
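The same failure can be reproduced outside a notebook. The sketch below is a minimal example, assuming a recent pandas where the lookup surfaces as KeyError (as in the traceback above); the flag name failed is just an illustrative variable, not part of any API.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# With the default integer index, ser[-1] is a *label* lookup,
# and there is no label -1, so pandas raises KeyError.
try:
    ser[-1]
    failed = False
except KeyError as err:
    failed = True
    print("KeyError:", err)  # shows the key that was not found
```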
The reason is that, in this case, Pandas cannot tell whether we want label-based or position-based indexing, and it resolves the ambiguity in favor of labels. Specifically, ser = pd.Series(np.arange(3.)) doesn't specify the index argument (used for label-based indexing) when creating the Series, so Pandas automatically generates integer index labels from 0 to 2, as we can see in the output above. So when we execute ser[-1], Pandas treats -1 as an index label (label-based indexing) rather than as a position counted from the end (position-based indexing, which is what built-in Python lists and tuples use). Since -1 is not among the labels 0, 1 and 2, Pandas reports that -1 is not in the index range.
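To make this concrete, we can inspect the automatically generated index directly. This is a small sketch; the variable names labels, has_minus_one and values are only for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# The automatically generated labels form a RangeIndex covering 0, 1, 2.
print(ser.index)  # RangeIndex(start=0, stop=3, step=1)
labels = list(ser.index)

# -1 is simply not one of those labels, which is why ser[-1] fails.
has_minus_one = -1 in ser.index

# A plain Python list has no labels, so [-1] there is always positional.
values = [0.0, 1.0, 2.0]
last_from_list = values[-1]
```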
To verify this, I create a new Series ser2 whose only index label is -1, and create ser3 by concatenating ser and ser2:
ser2 = pd.Series([4], index=[-1])
ser3 = pd.concat([ser, ser2])
ser3
0 0.0
1 1.0
2 2.0
-1 4.0
dtype: float64
This time, ser3[-1] works and returns the so-called “last” element, because -1 is now an actual label of ser3:
ser3[-1]
4.0
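That the element above is “last” is a coincidence of the concatenation order. The sketch below puts ser2 first instead (ser5 is a hypothetical name used only here): ser5[-1] still returns the element labeled -1, even though it is now the first row, while iloc[-1] returns the element at the last position.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))
ser2 = pd.Series([4], index=[-1])

# Put the -1 label at the front this time.
ser5 = pd.concat([ser2, ser])

by_label = ser5[-1]           # label -1 -> 4.0, although it is the first row
by_position = ser5.iloc[-1]   # last position -> 2.0
```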
To avoid this kind of ambiguity, we can create a Series with a non-integer index from the start by explicitly specifying the index argument, for example:
ser4 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
ser4
a 0.0
b 1.0
c 2.0
dtype: float64
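Then we can select the last element by position-based indexing: since -1 cannot be one of the string labels, Pandas falls back to treating it as a position (in recent pandas versions, 2.1 and later, this fallback is deprecated and emits a FutureWarning):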
ser4[-1]
2.0
Or, by label-based indexing:
ser4['c']
2.0
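Both selections can also be written with explicit indexers, so the code no longer depends on how plain [] interprets an integer key. A minimal sketch; last_label and the other variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ser4 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])

# Explicit positional access: no reliance on the [] fallback.
last_by_position = ser4.iloc[-1]

# Explicit label access: fetch the label at the last position, then look it up.
last_label = ser4.index[-1]          # 'c'
last_by_label = ser4.loc[last_label]
```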
Alternatively, we can keep the default integer index, as in the first example, and use the iloc indexer to tell Pandas that -1 denotes a position rather than a label:
ser = pd.Series(np.arange(3.))
ser, ser.iloc[-1]
(0 0.0
1 1.0
2 2.0
dtype: float64,
2.0)
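iloc is not the only position-based option for reading a single value. The sketch below compares it with iat (scalar access by integer position) and with indexing the underlying NumPy array; the names a, b and c are just illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# Three position-based ways to read the last value, none of which look at labels.
a = ser.iloc[-1]        # general positional indexer
b = ser.iat[-1]         # scalar access by integer position
c = ser.to_numpy()[-1]  # index the underlying NumPy array directly
```

iloc is the most general of the three (it also accepts slices and lists of positions), while iat only returns a single scalar.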
By the way, this is a good opportunity to distinguish iloc from loc. The pandas documentation describes the iloc indexer [2] as:
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
whereas it describes loc [3] as:
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
We can verify and compare the two in a special case, where the integer labels do not coincide with the positions:
ser = pd.Series(np.arange(4.), index=[-1, 0, 1, 2])
ser, ser.iloc[-1], ser.loc[-1]
(-1 0.0
0 1.0
1 2.0
2 3.0
dtype: float64,
3.0,
0.0)
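The same Series also shows two further differences: plain [] with an integer key on an integer index behaves like loc, and slicing with loc includes the stop label while iloc excludes the stop position (just like Python list slicing). A short sketch; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(4.), index=[-1, 0, 1, 2])

# Scalar lookups: plain [] on an integer index uses labels, like .loc.
label_zero = ser[0]         # label 0 -> 1.0
pos_zero = ser.iloc[0]      # position 0 -> 0.0

# Slices: .iloc excludes the stop position, .loc includes the stop label.
pos_slice = ser.iloc[0:2]   # positions 0, 1 -> [0.0, 1.0]
label_slice = ser.loc[0:2]  # labels 0, 1, 2 -> [1.0, 2.0, 3.0]
```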
References