From Python Pandas Error ValueError: -1 is not in range to Two Kinds of Pandas Series Indexing Methods: label-based indexing (loc) and position-based indexing (iloc)

Aug. 01, 2024

Yesterday, I ran into the error ValueError: -1 is not in range when I tried to index a Pandas Series with [-1]. I found many online references but no valid solution. Today, I looked it up in McKinney’s book, Python for Data Analysis, and found that he covers this point in Subchapter 5.4, Integer Indexes [1]. From McKinney’s explanation, I realized that there are two ways to index the elements of a Pandas Series because of its data structure: by label and by position. The error above occurs because, for a Series with an integer index, Pandas treats an integer inside [] as a label rather than a position. I’ll record the details below.

First, create a variable ser of the basic Pandas Series type, i.e. pandas.core.series.Series:

import pandas as pd
import numpy as np

ser = pd.Series(np.arange(3.))
ser, type(ser)

(0    0.0
 1    1.0
 2    2.0
 dtype: float64,
 pandas.core.series.Series)

At this point, if we try to select the last element of ser with ser[-1], Pandas raises KeyError: -1, which is caused internally by ValueError: -1 is not in range:

ser[-1]

ValueError                                Traceback (most recent call last)
File G:\...\venv\Lib\site-packages\pandas\core\indexes\range.py:391, in RangeIndex.get_loc(self, key, method, tolerance)
    390 try:
--> 391     return self._range.index(new_key)
    392 except ValueError as err:

ValueError: -1 is not in range

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[10], line 1
----> 1 ser[-1]

...

KeyError: -1

The reason is that, for a Series with an integer index, Pandas cannot reliably infer whether the user wants label-based or position-based indexing, so it treats an integer key as a label. Specifically, the code ser = pd.Series(np.arange(3.)) doesn’t specify the index argument (used for label-based indexing) when creating the Series, so Pandas automatically generates the integer index labels 0 to 2, as we can see above. So, when we execute ser[-1], Pandas looks for an element with the label -1, i.e. it uses label-based indexing, rather than the position-based indexing used for built-in Python lists and tuples. This is why Pandas reports that -1 is not in the index range.
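The diagnosis above can be checked directly. The following is a minimal sketch, assuming a reasonably recent Pandas version; the variable name lookup_failed is only illustrative. It inspects the automatically generated index and confirms that -1 is not one of its labels:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# With no explicit index, Pandas builds a RangeIndex of integer labels.
print(type(ser.index).__name__)
print(list(ser.index))

# -1 is not one of those labels, so a label lookup has nothing to find.
print(-1 in ser.index)

# ser[-1] is therefore a failed label lookup, which surfaces as KeyError.
try:
    ser[-1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(lookup_failed)
```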

To verify this point, I create a new Series ser2 whose only index label is -1, and build ser3 by concatenating ser and ser2:

ser2 = pd.Series([4], index=[-1])
ser3 = pd.concat([ser, ser2])
ser3

 0    0.0
 1    1.0
 2    2.0
-1    4.0
dtype: float64

At this point, we can index the so-called “last” element with:

ser3[-1]

4.0

To avoid this kind of ambiguity, we could create a non-integer indexed Series from the start by explicitly passing the index argument, for example:

ser4 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
ser4

a    0.0
b    1.0
c    2.0
dtype: float64

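Then we can select the last element by position-based indexing. Note that this integer fallback on a non-integer index is deprecated in recent Pandas versions (2.1 and later), which emit a FutureWarning and recommend ser4.iloc[-1] instead: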

ser4[-1]

2.0

Or, by label-based indexing:

ser4['c']

2.0

Or, we can still create the Series as in the first case, but use the iloc indexer to tell Pandas that -1 denotes a position rather than a label:

ser = pd.Series(np.arange(3.))
ser, ser.iloc[-1]

(0    0.0
 1    1.0
 2    2.0
 dtype: float64,
 2.0)
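Besides iloc, there are a few other ways to ask for the last element by position without depending on the index labels. The sketch below is only illustrative, and its variable names are my own; it uses the existing Pandas accessors iat, to_numpy() and tail():

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# Positional indexer: accepts integers, slices and lists of positions.
last_by_iloc = ser.iloc[-1]

# Positional accessor for a single scalar value.
last_by_iat = ser.iat[-1]

# Plain NumPy array, so ordinary Python-style indexing applies.
last_by_numpy = ser.to_numpy()[-1]

# Keeps the result as a one-element Series, including its label.
last_as_series = ser.tail(1)

print(last_by_iloc, last_by_iat, last_by_numpy)
print(last_as_series)
```

All four ignore the labels entirely, so they behave the same whether the index holds integers, strings or anything else.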

By the way, this is a good chance to tell iloc from loc. The documentation describes iloc [2] as:

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

whereas loc [3] is described as:

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

We can verify and compare their functions in a special case:

ser = pd.Series(np.arange(4.), index=[-1,0,1,2])
ser, ser.iloc[-1], ser.loc[-1]

(-1    0.0
  0    1.0
  1    2.0
  2    3.0
 dtype: float64,
 3.0,
 0.0)
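The same distinction shows up in slicing and in boolean selection. The following is a small sketch on the same Series, with variable names chosen only for illustration. A loc slice includes its stop label, an iloc slice excludes its stop position like a Python list, and iloc expects a plain boolean array rather than a boolean Series:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(4.), index=[-1, 0, 1, 2])

# Label slice: both endpoints are labels, and the stop label is included.
by_label = ser.loc[-1:1]

# Position slice: behaves like a Python list slice, the stop is excluded.
by_position = ser.iloc[0:2]

# Boolean selection: loc accepts a boolean Series directly,
# while iloc needs the mask converted to a plain array first.
mask = ser > 1.0
by_mask_loc = ser.loc[mask]
by_mask_iloc = ser.iloc[mask.to_numpy()]

print(by_label.tolist())
print(by_position.tolist())
print(by_mask_loc.tolist(), by_mask_iloc.tolist())
```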


References