Split a string column in Pandas if a small letter is followed by a capital letter or a digit
Let’s say we have a dataset like the following:
To make the example easier to follow, I have used the names of familiar tennis players. We want to extract each player's name into a separate column. The problem is that the number of player names varies from row to row. Because of this, I couldn't use a regex directly in pd.Series.str.split or pd.Series.str.extract: it inserted NaN for the rows that have fewer than three names (experienced users may have a solution using these methods). So I wrote a function for this task and passed it to pd.Series.apply(). The full code is given below:
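A minimal sketch of such a function, not necessarily the one used originally. The sample strings, the names BOUNDARY and split_names, and the Series called players are all assumptions made for illustration. The idea is to split at every position where a lowercase letter is immediately followed by an uppercase letter or a digit, so each row yields a list of whatever length it needs and no NaN padding is involved.

```python
import re
import pandas as pd

# Zero-width boundary: a lowercase letter on the left,
# an uppercase letter or a digit on the right.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z0-9])")

def split_names(text):
    """Split text at each lowercase -> uppercase/digit boundary (hypothetical helper)."""
    if not isinstance(text, str):
        return []  # treat missing values as "no names"
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]

# Illustrative data only; the real dataset is not reproduced here.
players = pd.Series([
    "Roger FedererRafael NadalNovak Djokovic",
    "Serena WilliamsVenus Williams",
    "Andy Murray",
])

# Each row becomes a list with as many names as it contains.
names = players.apply(split_names)
```

Splitting on a zero-width pattern with re.split needs Python 3.7 or later. Returning a list per row keeps rows with one, two, or three names on equal footing.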
I came across this problem while I was doing an assignment for a Coursera course on Pandas. The dataset that was used for that assignment was from this Wikipedia article.
Here is how we may get that:
Then we can use the apply method on this DataFrame and get our desired output.
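As a rough illustration of that last step, the sketch below applies the same kind of split to one column of a DataFrame and then uses DataFrame.explode to get one name per row. The column names match_id, players, and player_list and the sample rows are assumptions, since the actual table is not shown here.

```python
import re
import pandas as pd

# Hypothetical frame standing in for the real table.
df = pd.DataFrame({
    "match_id": [1, 2],
    "players": ["Roger FedererRafael NadalNovak Djokovic", "Andy Murray"],
})

# Split where a lowercase letter is followed by an uppercase letter or a digit.
pattern = re.compile(r"(?<=[a-z])(?=[A-Z0-9])")

# apply runs the split on every cell of the column and stores a list per row.
df["player_list"] = df["players"].apply(pattern.split)

# explode turns each list element into its own row, repeating the other columns.
long_df = df.explode("player_list").reset_index(drop=True)
```

The exploded form sidesteps the NaN padding entirely, because a row with a single name simply contributes a single row.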