Split a string column in Pandas if a small letter is followed by a capital letter or a digit

Md. Al-Imran Abir
Aug 8, 2022
Pandas logo

Let’s say we have a dataset like the following:

Example dataset

For better understanding, I have used the names of familiar tennis players. Now we want to extract the names of every player into a separate column. The problem is that different rows contain different numbers of player names. Because of this, I couldn't use a regex directly in pd.Series.str.split or pd.Series.str.extract, because it inserted NaN for the rows that have fewer than three names (experienced users may have a solution using these methods). So I wrote a function for this task and used it with pd.Series.apply(). The full code is given below:

Full code and output for the example dataset
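The full code above is only available as an image, so here is a minimal sketch of the same idea rather than the exact original. It assumes the concatenated names sit in a column called Players and that the function should return a list of names per row; the column name, the helper name split_names and the sample rows are hypothetical. The split uses a zero-width regular expression with Python's re module, so the split happens between two characters and nothing is removed.

```python
import re

import pandas as pd

# Zero-width pattern: matches the empty position between a lowercase letter
# and a following uppercase letter or digit, so no characters are consumed.
SPLIT_PATTERN = re.compile(r"(?<=[a-z])(?=[A-Z0-9])")


def split_names(text):
    """Hypothetical helper: split a concatenated string wherever a lowercase
    letter is immediately followed by an uppercase letter or a digit.

    Returns a list, so different rows can yield different numbers of parts.
    """
    if not isinstance(text, str):
        # Leave missing values (NaN/None) untouched.
        return text
    return SPLIT_PATTERN.split(text)


# Hypothetical example data; "Players" is an illustrative column name.
df = pd.DataFrame(
    {
        "Players": [
            "Roger FedererRafael NadalNovak Djokovic",
            "Serena WilliamsVenus Williams",
            "Andy Murray",
        ]
    }
)

# Apply the function row by row; each cell of the new column holds a list.
df["Names"] = df["Players"].apply(split_names)
print(df["Names"].tolist())
# [['Roger Federer', 'Rafael Nadal', 'Novak Djokovic'], ['Serena Williams', 'Venus Williams'], ['Andy Murray']]
```

Because each cell holds a list, rows with one, two or three players coexist without any NaN padding. Keep in mind that a purely character-based rule also splits names that contain an internal capital, such as "McEnroe" (between "c" and "E").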

I came across this problem while I was doing an assignment for a Coursera course on Pandas. The dataset that was used for that assignment was from this Wikipedia article.

Here is how we can get that data:

Real data

Then we can use the apply method on this DataFrame to get the desired output.
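If you prefer to stay with the vectorized string methods, here is a hedged sketch of an alternative (not the approach used in this post). With the default expand=False, pd.Series.str.split with a zero-width regex returns one list per row, so shorter rows are not padded; with expand=True the parts are spread over columns and the shorter rows are padded with missing values. It assumes pandas 1.4 or later (for the regex parameter) and again uses a hypothetical Players column with illustrative data.

```python
import pandas as pd

# Hypothetical sample data stored as plain Python strings (object dtype).
df = pd.DataFrame(
    {
        "Players": [
            "Roger FedererRafael NadalNovak Djokovic",
            "Serena WilliamsVenus Williams",
            "Andy Murray",
        ]
    },
    dtype=object,
)

# Split between a lowercase letter and an uppercase letter or digit.
pattern = r"(?<=[a-z])(?=[A-Z0-9])"

# expand=False (the default): one list per row; the lists may differ in length.
names = df["Players"].str.split(pattern, regex=True)

# expand=True: one column per part; shorter rows are padded with missing values.
wide = df["Players"].str.split(pattern, expand=True, regex=True)

print(names.tolist())
print(wide)
```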
