pyspark.pandas.DataFrame.idxmax

DataFrame.idxmax(axis=0)

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

Note

This API collects all rows that contain a maximum value using to_pandas(), on the assumption that the number of such rows is usually small.
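The strategy in the note can be sketched in plain pandas. This is only an illustration of the idea (filter down to rows holding a maximum, then run a local idxmax on that small set), not the library's actual implementation; `idxmax_via_filter` is a hypothetical helper name.

```python
import pandas as pd


def idxmax_via_filter(pdf: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: filter to rows holding a maximum, then idxmax locally."""
    # Step 1: one maximum per column (a cheap aggregation even on a large frame).
    col_max = pdf.max()
    # Step 2: keep only rows where at least one column equals its column maximum.
    mask = pdf.eq(col_max, axis="columns").any(axis=1)
    candidates = pdf[mask]
    # Step 3: the candidate set is assumed to be small, so a local idxmax is cheap.
    return candidates.idxmax()


pdf = pd.DataFrame({"a": [1, 2, 3, 2],
                    "b": [4.0, 2.0, 3.0, 1.0],
                    "c": [300, 200, 400, 200]})
result = idxmax_via_filter(pdf)
```

For this frame only rows 0 and 2 survive the filter, and the result matches `pdf.idxmax()`. If many rows tie for a maximum, the collected set grows accordingly, which is the cost the note warns about.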

Parameters

axis: {0 or ‘index’, 1 or ‘columns’}, default 0. The axis to use. With 0 or ‘index’, return the row label of the maximum for each column; with 1 or ‘columns’, return the column label of the maximum for each row.

Returns

Series

See also

Series.idxmax

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf
   a    b    c
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmax()
a    2
b    0
c    2
dtype: int64
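The "first occurrence" and NA-exclusion rules from the summary are easiest to see with ties and missing values. The sketch below uses plain pandas, whose `idxmax` this API is modeled on; that pandas-on-Spark returns the same labels for this data is an assumption here, not something verified against a Spark session.

```python
import numpy as np
import pandas as pd

# Column "a" has a tie for the maximum (5.0 at "x" and "y") plus a NaN;
# column "b" starts with a NaN and has a tie at "y" and "z".
pdf = pd.DataFrame(
    {"a": [1.0, 5.0, 5.0, np.nan],
     "b": [np.nan, 2.0, 7.0, 7.0]},
    index=["w", "x", "y", "z"],
)

# NaN entries are skipped, and the first label holding the maximum is returned.
first_max = pdf.idxmax()
```

Here `first_max` maps `a` to `'x'` and `b` to `'y'`: the NaN entries are ignored and the earlier of the tied labels is reported.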

For axis=1, return the column label of the maximum value in each row:

>>> psdf.idxmax(axis=1)
0    c
1    c
2    c
3    c
dtype: object

For a DataFrame with MultiIndex columns:

>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
   a    b    c
   x    y    z
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmax()
a  x    2
b  y    0
c  z    2
dtype: int64
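As the output above suggests, the result for MultiIndex columns is itself indexed by the column tuples. A minimal sketch of reading individual entries follows, again in plain pandas and assuming the pandas-on-Spark Series can be indexed the same way.

```python
import pandas as pd

pdf = pd.DataFrame({"a": [1, 2, 3, 2],
                    "b": [4.0, 2.0, 3.0, 1.0],
                    "c": [300, 200, 400, 200]})
pdf.columns = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])

result = pdf.idxmax()

# Look up one column by its full tuple label.
row_for_a = result[("a", "x")]

# Select by the first level only; the remaining level becomes the index.
top_level_b = result["b"]
```

`row_for_a` is the row label `2`, and `top_level_b` is a one-element Series indexed by `'y'`.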