
startswith function gives misleading results when passed a series #3485

Closed
@darindillon

Description


Using pandas 0.10.1. I think this is probably "working correctly as designed", but the design seems non-obvious and there's not much documentation on this.
The Series object has a vectorized .str.startswith() method, which works when you give it a constant string. But if you give it a Series instead of a string, it "works" (in the sense of returning data), but that data is not what you'd expect. It appears the function was simply not designed to handle being passed a Series, which is OK, but passing a Series seems like a logical thing to do. Can we at least make the function raise an error rather than return misleading data, so people don't get confused? I had to spend half an hour debugging to figure out why this wasn't working as I expected.

I want to select the rows where column 'two' starts with the string in column 'one':

import pandas
d = pandas.DataFrame([['hi', 'hi 2'], ['bye', 'bye 2']], columns=['one', 'two'])
d.two.str.startswith(d.one)  # Returns False for everything, but you'd expect it to be True
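
For anyone who needs the row-by-row comparison described above, here is a minimal sketch of one possible workaround. It is not part of pandas: `startswith_elementwise` is a hypothetical helper name, and the third row (`'foo'`, `'bar'`) is added only to show a non-matching case. It pairs the two columns up in plain Python and treats any non-string value as "no match".

```python
import pandas as pd

# Same shape as the frame in the report, plus one illustrative non-matching row.
d = pd.DataFrame(
    [['hi', 'hi 2'], ['bye', 'bye 2'], ['foo', 'bar']],
    columns=['one', 'two'],
)


def startswith_elementwise(values, prefixes):
    """Hypothetical helper: test values[i].startswith(prefixes[i]) row by row.

    Non-string entries (for example NaN) are treated as not matching.
    """
    return pd.Series(
        [
            isinstance(v, str) and isinstance(p, str) and v.startswith(p)
            for v, p in zip(values, prefixes)
        ],
        index=values.index,
    )


mask = startswith_elementwise(d['two'], d['one'])
selected = d[mask]  # rows where 'two' starts with the string in 'one'
```

This runs a Python-level loop rather than a vectorized operation, so it may be slow on large frames.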

The startswith function doesn't handle a Series argument. Ideally, we should change it so that it does. But if that's too much work, can we at least have it raise a "series is not supported, pass a constant string instead" error?
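
As a rough illustration of the second request, a user-side guard could look like the sketch below. `strict_startswith` is a hypothetical wrapper, not a pandas function, and it says nothing about how pandas itself should implement the check; it only assumes that `Series.str.startswith` accepts a constant string.

```python
import pandas as pd


def strict_startswith(series, pat):
    """Hypothetical wrapper: refuse anything that is not a constant string."""
    if not isinstance(pat, str):
        # Fail loudly instead of returning data that looks valid but is not.
        raise TypeError(
            "pat must be a constant string, got %s" % type(pat).__name__
        )
    return series.str.startswith(pat)
```

With this wrapper, `strict_startswith(d.two, 'hi')` behaves like the built-in method, while `strict_startswith(d.two, d.one)` raises a `TypeError` straight away.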

Labels

Bug, Strings (String extension data type and string data)
