Python compare string to list of words

Python strings can be compared using the comparison operators: ==, !=, <, >, <=, >=.

For example:

"Alice" == "Bob" # False
"Alice" != "Bob" # True

"Alice" < "Bob" # True
"Alice" > "Bob" # False

"Alice" <= "Bob" # True
"Alice" >= "Bob" # False

Python comes with a set of built-in comparison operators: ==, !=, <, >, <=, >=.

You commonly see comparisons made between numeric types in Python. But you can compare strings just as well. As it turns out, comparing strings translates to comparing numbers under the hood.

Before jumping into the details, let’s briefly see how to compare strings in Python.

Comparing Strings with == and !=

Comparing strings with equal to and not equal to operators is easy to understand. You can check if a string is or is not equal to another string.

For example:

name = "Jack"

print(name == "John")
print(name != "John")

Output:

False
True
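Since == compares one pair of strings at a time, checking a string against a list of words comes down to repeating that check for every word. Here is a minimal sketch; the word list `words` and the helper `matches_any` are hypothetical names used only for illustration:

```python
# Hypothetical word list and helper, used for illustration only.
words = ["Jack", "John", "Alice"]

def matches_any(text, candidates):
    """Return True if text is equal to at least one string in candidates."""
    return any(text == candidate for candidate in candidates)

print(matches_any("Jack", words))  # True
print(matches_any("jack", words))  # False, the comparison is case-sensitive
print("Jack" in words)             # True, the in operator also relies on equality
```

For a plain membership check, the in operator is usually enough. A helper like this is mostly useful when you want to change how each pair is compared.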

Comparing Strings with <, >, <=, >=

To compare strings alphabetically, you can use the operators <, >, <=, >=.

For instance, let’s compare the names “Alice” and “Bob”. This comparison corresponds to checking if Alice is before Bob in alphabetical order.

print("Alice" < "Bob")

Output:

True
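The same ordering is what the built-in sorted(), min() and max() functions use when they are given strings. A small sketch with made-up names:

```python
# Made-up list of names, for illustration only.
names = ["Charlie", "Alice", "Bob"]

ordered = sorted(names)  # sorts using the same ordering as the < operator
print(ordered)           # ['Alice', 'Bob', 'Charlie']
print(min(names))        # Alice
print(max(names))        # Charlie
```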

Now you have the tools for comparing strings in Python. Next, let’s understand how the string comparison works behind the scenes.

String Unicodes in Python

In reality, comparing Python strings means comparing integers under the hood.

To understand how it works, you first need to understand the concept of Unicode.

Python strings use the Unicode Standard for representing characters. This means each character has a unique integer code assigned to it. It is this Unicode integer value that is compared when comparing strings in Python.

Here is the Unicode table for English characters (also known as the ASCII values).

Code  Character  Code  Character  Code  Character  Code  Character
64    @          80    P          96    `          112   p
65    A          81    Q          97    a          113   q
66    B          82    R          98    b          114   r
67    C          83    S          99    c          115   s
68    D          84    T          100   d          116   t
69    E          85    U          101   e          117   u
70    F          86    V          102   f          118   v
71    G          87    W          103   g          119   w
72    H          88    X          104   h          120   x
73    I          89    Y          105   i          121   y
74    J          90    Z          106   j          122   z
75    K          91    [          107   k          123   {
76    L          92    \          108   l          124   |
77    M          93    ]          109   m          125   }
78    N          94    ^          110   n          126   ~
79    O          95    _          111   o

When a Python program compares strings, it compares the Unicode values of the characters.

By the way, to check the Unicode value of a character, you do not have to look it up in this table. Instead, you can use the built-in ord() function.

For instance:

>>> ord('a')
97
>>> ord('b')
98
>>> ord('c')
99
>>> ord('d')
100

Now, let’s check the Unicode values for the capitalized versions of the above four characters:

>>> ord('A')
65
>>> ord('B')
66
>>> ord('C')
67
>>> ord('D')
68

As you can see, the Unicode values for capital characters differ from their lowercase counterparts. This highlights an important point: Python is case-sensitive with characters and strings.

For example, the result of this comparison:

'A' < 'a'

yields True.

This is because:

  • The ord() function returns 65 for ‘A’.
  • The ord() function returns 97 for ‘a’.
  • Therefore, 65 < 97 evaluates to True.
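If case should not matter, one common approach is to normalize both strings before comparing them. The sketch below uses str.casefold() for that; the helper name `equal_ignoring_case` is hypothetical and used only for illustration:

```python
def equal_ignoring_case(a, b):
    """Compare two strings for equality after case-folding both of them."""
    return a.casefold() == b.casefold()

print("alice" == "Alice")                        # False, 'a' (97) differs from 'A' (65)
print(equal_ignoring_case("alice", "Alice"))     # True
print("apple" < "Banana")                        # False, 'a' (97) is greater than 'B' (66)
print("apple".casefold() < "Banana".casefold())  # True, "apple" comes before "banana"
```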

How Python String Comparison Works Under the Hood

When you compare strings in Python, they are compared character by character using their Unicode values.

When you compare two characters, the process is rather simple. But what happens when you compare strings, that is, sequences of characters?

Let’s demonstrate the process with examples.

Example 1—Which String Comes First in Alphabetic Order

Let’s compare the two names “Alice” and “Bob” to see if “Alice” is less than “Bob”:

>>> print("Alice" < "Bob")
True

This states that “Alice” is less than “Bob”. In real life, this means that Alice comes before Bob in alphabetical order, which totally makes sense.

But how does Python know it?

Python starts by comparing the first characters of the strings. In the case of “Alice” and “Bob”, it starts by checking if ‘A’ is less than ‘B’ in Unicode:

>>> ord('A') < ord('B') # Corresponds to 65 < 66
True

Since ord('A') returns 65 and ord('B') returns 66, the comparison evaluates to True.

This means Python does not need to continue any further. Based on the first letters it is already able to determine that “Alice” is less than “Bob” because ‘A’ is less than ‘B’ in Unicode.

This is the simplest way to understand how Python compares strings.
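To make the numbers behind this visible, you can list the Unicode value of every character in each name. This is only a sketch, and `code_points` is a hypothetical helper name:

```python
def code_points(text):
    """Return the Unicode code point of every character in text."""
    return [ord(ch) for ch in text]

alice = code_points("Alice")
bob = code_points("Bob")

print(alice)           # [65, 108, 105, 99, 101]
print(bob)             # [66, 111, 98]
print(alice[0] < bob[0])  # True, 65 < 66 already decides the comparison
print(alice < bob)     # True, lists of integers are compared element by element too
```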

Let’s look at a slightly trickier example, where the compared strings start with the same letter.

Example 2—How to Compare Strings with Equal First Letters

What if the first letters are equal when comparing two strings? No problem, Python then compares the second letters.

For instance, let’s check if “Axel” comes before “Alex” in alphabetical order.

print("Axel" < "Alex")

The result:

False

This suggests that Alex comes before Axel, which is indeed the case.

Let’s see how Python was able to determine this:

  1. The first letters are compared. Both are ‘A’, so there is a “tie”. The comparison continues to the next characters.
  2. The second characters are ‘x’ and ‘l’. The Unicode value of ‘x’ is 120 and that of ‘l’ is 108. Because 120 < 108 is False, the whole string comparison returns False.
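The two steps above can be sketched as a small function that finds the first position where the strings differ. The name `first_difference` is hypothetical, and this is an illustration of the idea rather than how the interpreter is actually implemented:

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ.

    Returns None if the strings agree on every position they share.
    """
    for index, (ch_a, ch_b) in enumerate(zip(a, b)):
        if ch_a != ch_b:
            return index
    return None

i = first_difference("Axel", "Alex")
print(i)                                 # 1
print(ord("Axel"[i]), ord("Alex"[i]))    # 120 108
print(ord("Axel"[i]) < ord("Alex"[i]))   # False, same as "Axel" < "Alex"
```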

Example 3—How to Compare Strings with Identical Beginning

What if the strings are otherwise equal, but one of them has additional characters at the end?

For instance, can you determine if “Alex” comes before “Alexis” in alphabetical order?

Let’s check this using Python:

print("Alex" < "Alexis")

Result:

True

In this case, every character the two strings share is equal, so the Python interpreter simply treats the longer string as the greater one. In other words, “Alex” is before “Alexis” in alphabetical order.
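Putting the three examples together, the whole comparison can be sketched in a few lines of Python. The function name `less_than` is hypothetical. The sketch mirrors the behavior described above and is not the interpreter's actual implementation:

```python
def less_than(a, b):
    """Sketch of how a < b behaves for two strings."""
    # Walk through both strings in parallel until the characters differ.
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # The first differing pair decides the result.
            return ord(ch_a) < ord(ch_b)
    # All shared positions are equal, so the shorter string is the smaller one.
    return len(a) < len(b)

print(less_than("Alice", "Bob"))    # True
print(less_than("Axel", "Alex"))    # False
print(less_than("Alex", "Alexis"))  # True
print(less_than("Alex", "Alex"))    # False, equal strings are not less than each other
```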

Now you understand how the string comparison works under the hood in Python.

Finally, let’s take a look at an interesting application of string comparison by comparing timestamps.

Compare Timestamps in Python with String Comparison

In this guide, you have learned that each character in Python has a Unicode value, which is an integer. Numeric characters are no exception.

For example, the string “1” has a Unicode value of 49, “2” has a Unicode value of 50, and so on:

>>> ord("1")
49
>>> ord("2")
50

The Unicode value of a numeric character grows as the number grows.

This means comparing single-digit numeric strings gives you the correct numeric order:

>>> "5" < "8"
True
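Keep in mind that this only matches numeric order when the strings have the same number of digits, because the comparison still goes character by character. A short sketch of the pitfall and two possible workarounds (converting to int, or zero-padding with str.zfill()):

```python
as_strings = "10" < "9"                      # compares '1' (49) with '9' (57) first
as_numbers = int("10") < int("9")            # compares the actual numbers
as_padded = "10".zfill(3) < "9".zfill(3)     # compares "010" with "009"

print(as_strings)  # True, even though 10 is not less than 9
print(as_numbers)  # False
print(as_padded)   # False, equal-length strings follow numeric order
```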

But why would you ever compare numbers as strings?

Comparing numeric strings is useful when talking about ISO 8601 timestamps of format 2021-12-14T09:30:16+00:00.

For example, let’s check if “2021-12-14T09:30:16+00:00” comes before “2022-01-01T00:00:00+00:00”:

>>> "2021-12-14T09:30:16+00:00" < "2022-01-01T00:00:00+00:00"
True
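One way to double-check a result like this is to parse both timestamps and compare the resulting datetime objects. The sketch below assumes Python 3.7 or newer (for datetime.fromisoformat) and uses a hypothetical helper name, `is_earlier`. The third timestamp is made up to show that plain string comparison is only reliable when both timestamps share the same format and UTC offset:

```python
from datetime import datetime

def is_earlier(a, b):
    """Return True if ISO 8601 timestamp a is an earlier moment than b."""
    return datetime.fromisoformat(a) < datetime.fromisoformat(b)

t1 = "2021-12-14T09:30:16+00:00"
t2 = "2022-01-01T00:00:00+00:00"
t3 = "2021-12-14T10:00:00+02:00"  # made-up example, equal to 08:00:00 UTC

print(t1 < t2, is_earlier(t1, t2))  # True True, same offset so both agree
print(t1 < t3, is_earlier(t1, t3))  # True False, different offsets so they disagree
```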

