Strings in Python – pyofpython

String

A Python string is a sequence of Unicode characters. Python does not have a separate character data type. It has a built-in class “str” to handle string operations, so even a single character is a string of length one.

How to create a string?

Strings are created by writing text within single, double or triple quotes. The type() function can be used to verify the result. Let’s write some strings.

So, we have a variable named avengers and the string stored in it is I am Iron Man. One can observe how single, double and triple quotes have been used to write the string. Also, using the type() function we have cross-verified that the type is str. First things first, why so many types of quotes? Always remember that a pair of one type of quotes can be enclosed within a pair of another type. By pair I mean the starting and ending quotes. But what’s the advantage?

When we have to print some specific portion or all of the string within quotes, then, we can enclose one set of quoted string within another set of quotes and print the result. Like in the above example, we want to print 3000 with quotes. This can be done by using single quotes to write 3000, and, the entire string within double quotes. If you want to print double quotes and single quotes as well, then, everything can be enclosed within triple quotes.
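The original screenshots are not reproduced here, so the following is only a minimal sketch of the idea. The variable name avengers and the text I am Iron Man come from the article; the sentences around 3000 are assumed wording chosen for illustration.

```python
# The same text written with single, double and triple quotes.
avengers_single = 'I am Iron Man'
avengers_double = "I am Iron Man"
avengers_triple = '''I am Iron Man'''

print(type(avengers_single))  # <class 'str'>
print(avengers_single == avengers_double == avengers_triple)  # True

# Single quotes nested inside double quotes are ordinary characters.
quoted = "I love you '3000'"
print(quoted)

# Triple quotes allow both single and double quotes inside the text.
both = '''He said "I love you '3000'" and left'''
print(both)
```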

Can we have a pair of different quotes? Say, single quote to start with and double quote to end with.

Syntax error!!! I was thinking that Python can do anything. But a string must end with the same type of quote it started with.

Triple quotes have one more important application. They are used when a string needs to span multiple lines. They are also commonly used to write multi-line comments, although strictly speaking such a block is just a string literal that is never used. (Remember, # is used to write single-line comments.)

Isn’t it similar to using a backslash? Try it. So, basically, these quotes save us from using some delimiters (not always though, but yes, many times). So, let’s begin with some important string functions.
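As a rough illustration of the multi-line use (the three lines of text are assumed, not taken from the original screenshot), a triple-quoted string keeps its line breaks, which is the same as writing \n by hand:

```python
# A triple-quoted string may span several lines; the line breaks are kept.
multi_line = """I am Iron Man
I am Groot
Hulk smash"""

# The same text on one line with explicit \n escape sequences.
with_escapes = "I am Iron Man\nI am Groot\nHulk smash"

print(multi_line)
print(multi_line == with_escapes)  # True
```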

String Functions

len(): This function gives the length of the string (including spaces, if any).

isdigit(): This function checks whether every character in the string is a digit. If the string is non-empty and consists only of digits, it returns True.

What if the string is alphanumeric?

So, it returns True only when the entire string is a digit sequence.

isalnum(): This function checks whether every character in the string is alphanumeric, that is, a letter or a digit.

So, it can be observed that spaces and special characters will not work. Only letters and digits will give a True value.

isalpha(): This function checks whether every character in the string is a letter. If the string is non-empty and has only letters, it returns True.

What about an alphanumeric string?

No. The string has to be a sequence of letters only. But what about spaces? Can we have letters only, but space separated? Or comma separated?

A space-separated or comma-separated string of letters returns False when checked with isalpha(), because spaces and commas are not letters.
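The three checks above can be compared side by side. This is a small sketch with assumed sample strings (3000, Mark42 and so on are not from the original screenshots):

```python
samples = ["3000", "Mark42", "IronMan", "Iron Man", "Iron,Man", ""]

# Print each sample with the result of the three checks.
for text in samples:
    print(repr(text), text.isdigit(), text.isalnum(), text.isalpha())

all_digits = "3000".isdigit()        # True: every character is a digit
mixed_digit = "Mark42".isdigit()     # False: letters are present
mixed_alnum = "Mark42".isalnum()     # True: only letters and digits
spaced_alnum = "Iron Man".isalnum()  # False: the space is not alphanumeric
letters = "IronMan".isalpha()        # True: only letters
spaced_alpha = "Iron Man".isalpha()  # False: the space is not a letter
comma_alpha = "Iron,Man".isalpha()   # False: the comma is not a letter
```

Note that the empty string gives False for all three checks.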

upper(): This function converts the string into uppercase.

lower(): This function converts the string into lowercase.

isspace(): If everything within the quotes is whitespace, it returns a True value.

What if the string is empty?

OK!! The string must have at least one character for isspace() to return a True value, and every character must be whitespace. Space is not the only whitespace character: tab (\t), newline (\n), form feed (\f), carriage return (\r) and a few other Unicode whitespace characters also count.

Strings have some useful methods for searching. They are startswith(), endswith() and find().
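A quick sketch of these cases, using assumed sample strings:

```python
only_spaces = "   ".isspace()     # True: three spaces
empty = "".isspace()              # False: there is no character at all
control = " \t\n\f\r".isspace()   # True: all of these are whitespace
with_text = " Tony ".isspace()    # False: letters are present

print(only_spaces, empty, control, with_text)
```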

startswith(): If the string starts with the given argument, then it returns a True value.

endswith(): If the string ends with the given argument, then it returns a True value.

find(): It searches for the given substring and returns the index at which it first occurs. If the substring is not found, it returns -1.

Cool!! So, it’s returning the index value. Do remember that indexing starts at 0 and that spaces are counted like any other character.
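The three search methods can be tried on the I am Iron Man string from earlier. The substrings searched for here are assumptions for illustration:

```python
avengers = "I am Iron Man"

starts = avengers.startswith("I am")  # True
ends = avengers.endswith("Man")       # True

# Index 0 is 'I', 1 is a space, 2-3 are 'am', 4 is a space, so 'Iron' starts at 5.
position = avengers.find("Iron")      # 5
missing = avengers.find("Hulk")       # -1 because 'Hulk' does not occur

print(starts, ends, position, missing)
```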

replace(): This function replaces a substring with another substring. It takes two required parameters: the first is the substring to be replaced and the second is the substring to replace it with. An optional third parameter limits how many occurrences are replaced.

The function replace() replaces the sub-string Are Gone with another substring Assemble.

split(): It splits the string at every occurrence of the separator given in the argument and returns a list.

We have split the avengers variable that was used in the replace() example above. Since the substrings are separated by a space, splitting on a space returns a list of strings. The second output shows that if we split on a value that is not present in the string, we get a list containing the entire string as its only element.

What should the string look like if we want the second command shown above to split it?

So, whenever a dot is encountered in the string, it is split at that point.
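The replace() and split() steps can be sketched together. The substrings Are Gone and Assemble come from the article; the full starting text Avengers Are Gone and the dotted string are assumptions.

```python
avengers = "Avengers Are Gone"  # assumed starting text
avengers = avengers.replace("Are Gone", "Assemble")
print(avengers)  # Avengers Assemble

words = avengers.split(" ")   # split on a space
no_match = avengers.split(".")  # '.' is absent: one-element list

# A string that actually contains dots is split at each dot.
dotted = "Avengers.Assemble".split(".")

print(words, no_match, dotted)
```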

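join(): It joins the items of an iterable into one string, placing the string it is called on between them. If a plain string is passed as the argument, each of its characters is treated as an item.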

The string stored in variable iron_man is joined by ‘-’ sign.
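A minimal sketch of join(), assuming iron_man holds the text Tony (the original value is not visible in the extracted article):

```python
iron_man = "Tony"  # assumed value

# Joining a string places the separator between its characters.
joined_chars = "-".join(iron_man)

# Joining a list places the separator between the list items.
joined_words = "-".join(["Iron", "Man"])

print(joined_chars)  # T-o-n-y
print(joined_words)  # Iron-Man
```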

strip(): If the string has whitespace at the beginning or end, the strip() function removes it.

Notice the spaces before and after the string. The function strip() strips off the whitespace present before and after the entire string. Spaces within the string are not affected.
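For instance, with an assumed padded string:

```python
padded = "   I am Iron Man   "
stripped = padded.strip()

print(repr(padded))    # '   I am Iron Man   '
print(repr(stripped))  # 'I am Iron Man'

# Tabs and newlines at the ends are whitespace too and are removed as well.
mixed = "\t Tony \n".strip()
print(repr(mixed))     # 'Tony'
```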

swapcase(): It swaps the case of every letter in the string, from uppercase to lowercase and from lowercase to uppercase.

title(): It capitalizes the first letter of each word in the string.

capitalize(): It capitalizes only the first letter of the string and converts the rest to lowercase.
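The case-changing methods, including upper() and lower() from earlier, can be compared on one deliberately mixed-case sample (an assumed string):

```python
text = "i am iRON mAN"

upper_text = text.upper()         # 'I AM IRON MAN'
lower_text = text.lower()         # 'i am iron man'
swapped = text.swapcase()         # 'I AM Iron Man'
titled = text.title()             # 'I Am Iron Man'
capitalized = text.capitalize()   # 'I am iron man'

print(upper_text, lower_text, swapped, titled, capitalized, sep="\n")
```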

ord(): It converts a single character to its Unicode code point, an integer. For ASCII characters this is the same as the ASCII value.

An error. It says that ord() expected a single character, but a string length of 5 is found. It means that the ord() function only works for a single character.

Now it’s good. Sorry Groot.

We can also check for some Unicode value as well.

chr(): It converts an integer to the character with that Unicode code point. For values below 128 this is the ASCII character.

One can check if chr() supports multi-characters or single characters only like ord().
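A short sketch of both functions. The five-letter word Groot matches the error described above; the characters A and € are assumed examples.

```python
code_a = ord("A")      # 65
code_euro = ord("€")   # 8364, a non-ASCII Unicode code point

# ord() accepts exactly one character; a longer string raises TypeError.
try:
    ord("Groot")
    multi_char_ok = True
except TypeError as error:
    multi_char_ok = False
    print("TypeError:", error)

# chr() is the inverse: it takes one integer and returns one character.
char_a = chr(65)
char_euro = chr(8364)

print(code_a, code_euro, char_a, char_euro)
```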

isprintable(): It checks whether the string consists only of printable characters. It gives True if the string is empty or all the characters it contains are printable. Characters produced by escape sequences such as \n and \t are non-printable, while an ordinary space is printable.

Unlike isalpha(), isdigit(), isalnum() and isspace(), the isprintable() method returns True for an empty string. The isascii() method, added in Python 3.7, behaves the same way.
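A minimal sketch with assumed sample strings:

```python
plain = "I am Groot".isprintable()        # True: letters and spaces are printable
with_newline = "I am\nGroot".isprintable()  # False: \n is not printable
empty = "".isprintable()                  # True: the empty string passes

# For comparison, these checks return False on an empty string.
empty_alpha = "".isalpha()
empty_space = "".isspace()

print(plain, with_newline, empty, empty_alpha, empty_space)
```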

Python String Concatenation

Concatenation literally means joining things together. The simplest form of concatenation is writing string literals next to each other. Two or more individually quoted strings separated only by whitespace are automatically concatenated.

But do remember that this works only with string literals and not with any combination of data types. We cannot concatenate an integer with a string this way. Concatenation using operators is done with the ‘+’ and ‘*’ signs.

Good. The two strings are concatenated. But you can see that we can make it more readable. If we give space inside a quote before writing Iron Man, maybe we can achieve what we want. Let’s check.

We can also multiply strings using ‘*’ sign. But what will a multiplied string look like?

This is a multiplied, concatenated string. Since 10 is within quotes, it is a string and not a number. When it is multiplied by the integer 2, it is repeated, and the result is 1010 and not 20. (A string can only be multiplied by an integer, not by another string.) But this raises a question in my mind. Can we concatenate a string with an integer?

An error occurred. Looking at the last line we can see that it’s a TypeError, which tells us that a string cannot be concatenated with an integer. If we still want to concatenate these two, we can convert the integer to a string using the str() function.
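The forms of concatenation discussed above can be sketched as follows. The strings are assumed examples built around the article's Iron Man text and its '10' case.

```python
# Adjacent string literals are joined automatically.
adjacent = "I am" "Iron Man"       # 'I amIron Man'

# A space inside the quotes makes the '+' result readable.
plus = "I am" + " Iron Man"        # 'I am Iron Man'

# A string times an integer repeats the string.
repeated = "10" * 2                # '1010', not 20

# A string plus an integer raises TypeError.
try:
    broken = "Mark" + 42
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting the integer with str() makes it work.
fixed = "Mark" + str(42)           # 'Mark42'

print(adjacent, plus, repeated, mixed_ok, fixed, sep="\n")
```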

Later we’ll see that we can make really meaningful sentences very easily by using both concatenating signs.

Python String Operations

Identity Operators

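The identity operators ‘is’ and ‘is not’ can be used on strings. They check whether two names refer to the same object, not whether the text is equal, so use == to compare content.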

Membership Operators

The membership operators in and not in can be used on string to find substring.

The output will be Boolean. Also, don’t forget the case-sensitive factor while checking for substrings.
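A small sketch of the membership operators on an assumed string, showing the case-sensitivity mentioned above:

```python
avengers = "I am Iron Man"

found = "Iron" in avengers          # True
wrong_case = "iron" in avengers     # False: the check is case-sensitive
absent = "Hulk" not in avengers     # True

# Lowering both sides gives a case-insensitive check.
ignore_case = "iron" in avengers.lower()  # True

print(found, wrong_case, absent, ignore_case)
```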

Comparison Operators

Strings can be compared using relational operators. A less-than or greater-than check between strings compares them character by character using Unicode code points. For letters of the same case this matches alphabetical order, but every uppercase letter sorts before every lowercase letter.

Since I comes before T in alphabetical order, the output value returned is True. What if both strings start with the same character?

OKEH!! So in the movie Tony is of course superior to Thanos. But what happened here? Since the first letter of both is T, the second letters are compared. o is greater than h, so To is greater than Th, and hence the output is True.

Case-sensitivity has to be taken care of. Hulk is not equal to hulk.

Now it’s good.
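The comparisons discussed above can be reproduced roughly like this. The names Tony, Thanos and Hulk come from the article; Iron is an assumed first operand starting with I.

```python
first = "Iron" < "Tony"      # True: 'I' comes before 'T'
second = "Tony" > "Thanos"   # True: 'T' ties, then 'o' > 'h'
same = "Hulk" == "hulk"      # False: comparison is case-sensitive

# Normalizing the case on both sides makes the strings compare equal.
same_ignoring_case = "Hulk".lower() == "hulk".lower()  # True

# Uppercase letters have smaller code points than lowercase ones.
upper_before_lower = "Z" < "a"  # True

print(first, second, same, same_ignoring_case, upper_before_lower)
```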

Arithmetic Operators

Strings support some arithmetic operations as well. Addition is basically concatenation but we can also perform multiplication on strings.

Now this is quite cute (I mean the character Groot). Meanwhile, if you look at this code, there are a few things to observe. I have introduced a space before am and Gr. This introduces the space in the final output as well. I have multiplied O by 2, which gives OO. So, use it wisely to make readable strings. What if there was no space?
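The screenshot is missing, so this is only a guess at the kind of expression described: pieces joined with + and one letter repeated with *. The exact pieces are assumptions.

```python
# Spaces are written inside the quotes before 'am' and 'Gr'.
groot = "I" + " am" + " Gr" + "o" * 2 + "t"
print(groot)     # I am Groot

# Without those spaces the words run together.
no_space = "I" + "am" + "Gr" + "o" * 2 + "t"
print(no_space)  # IamGroot
```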

Logical Operators

The logical operators and, or and not can be used with strings. One of the most important points regarding strings is that an empty string is treated as False and any non-empty string as True.

and: If the value on the left-hand side is false, the and operator returns the left-hand value without evaluating the right. If the value on the left is true, the value on the right-hand side is returned.

Let’s check these three outputs. In the first case, both strings are true. So, the value of the right side is returned.

In the second case, the left side is an empty string. An empty string means false Boolean value. And if the left side value is false, then the left hand side value is returned.

Similarly, in the third case, the left hand side value is True (not an empty string), then, the value of the right hand side of the and operator will be returned. Here, it is an empty string.

or: If the value on the left-hand side is true, the or operator returns the left-hand value. If the value on the left is false, the value on the right-hand side is returned.

In the first case, both strings are true. So, the value of the left side is returned.

In the second case, the left side is an empty string. An empty string means false Boolean value. And if the left side value is false, then the right hand side value is returned.

Similarly, in the third case, the left hand side value is True (not an empty string), then, the value of the left hand side of or operator will be returned.

not: It returns the Boolean complement of the string’s truth value. An empty string is false, so not applied to an empty string returns True. For any non-empty string, it returns False.
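The three cases for and and or, plus not, can be sketched with assumed strings. Note that and and or return one of the operands themselves rather than True or False.

```python
# and: returns the left value if it is false, otherwise the right value.
and_both = "Tony" and "Thanos"      # 'Thanos'
and_left_empty = "" and "Thanos"    # ''
and_right_empty = "Tony" and ""     # ''

# or: returns the left value if it is true, otherwise the right value.
or_both = "Tony" or "Thanos"        # 'Tony'
or_left_empty = "" or "Thanos"      # 'Thanos'
or_right_empty = "Tony" or ""       # 'Tony'

# not: always returns a real Boolean.
not_empty = not ""                  # True
not_full = not "Tony"               # False

print(repr(and_both), repr(and_left_empty), repr(and_right_empty))
print(repr(or_both), repr(or_left_empty), repr(or_right_empty))
print(not_empty, not_full)
```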

String Slicing

Like lists, strings can be sliced when we need to display a part, or substring, of the entire string. The syntax is the same as list slicing and uses the ‘:’ symbol. It is written [a:b], where a is the index at which the slice starts and b is the index at which it ends. The sliced portion runs from a up to, but not including, b.

So, the slice will start from index 1 and go up to index 2, excluding index 3.

The slicing process is exactly similar to that of list and whatever we have studied there in list slicing, they all are applicable in string slicing as well. We will use them directly in programming later.
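A few slices on an assumed string, Tony Stark, to make the index rule concrete:

```python
hero = "Tony Stark"

middle = hero[1:3]     # 'on': indexes 1 and 2, index 3 excluded
first_name = hero[:4]  # 'Tony': from the start up to index 3
last_name = hero[5:]   # 'Stark': from index 5 to the end
tail = hero[-5:]       # 'Stark': the last five characters
backwards = hero[::-1] # 'kratS ynoT': a step of -1 reverses the string

print(middle, first_name, last_name, tail, backwards, sep="\n")
```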

Python String Formatters

Strings can be formatted by writing the letter ‘f’ directly before the opening quote and placing each variable in curly braces at the exact place where its value should appear.

OOPS!! Syntax Error. OK. Let me remove the space between f and the starting quote.

Another method to format a string is the % operator with %s placeholders. It’s similar to using %d for integers and %f for floating-point numbers. The curly-brace variables of the previous method are replaced with %s, and the actual variables are supplied after the string using the % operator. Let’s check the same example, now using %s.

There is another method to format a string. It is the .format() method. The variables are passed as comma-separated arguments (if there is more than one) to format(). The positions of the variables are marked within the string using curly braces holding the values 0, 1, 2… Let’s again check the above example.

Here, {0} is mapping with Name variable and {1} is mapping with City variable. Can we have some other numbers in place of 0 and 1?

IndexError. This is because there are only two variables, so the valid index values are 0 and 1. If there had been a third variable, {2} would have returned its value. The numbers refer to the positions of the arguments. We can also use empty curly braces, in which case the variables are filled in the order they are passed to format().

Let’s change the positions of Name and City.

OKEH!!! So, NewYork belongs to SpiderMan as much as SpiderMan belongs to NewYork. But one has to be clever enough to use string Formatters with empty curly braces. Otherwise, the end statement can end up being something hilarious and meaningless.

We can also use a Dictionary. Use ** followed by the dictionary name in format() parameter.
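The formatting styles covered in this section can be compared on one sentence. The variable names Name and City and their values come from the article; the sentence wording and the dictionary name hero are assumptions.

```python
Name = "SpiderMan"
City = "NewYork"

f_string = f"{Name} belongs to {City}"
percent = "%s belongs to %s" % (Name, City)
indexed = "{0} belongs to {1}".format(Name, City)

# Swapping the indexes, or the argument order with empty braces,
# swaps the sentence.
swapped_index = "{1} belongs to {0}".format(Name, City)
swapped_args = "{} belongs to {}".format(City, Name)

# An index with no matching argument raises IndexError.
try:
    "{2}".format(Name, City)
    index_ok = True
except IndexError:
    index_ok = False

# Dictionary keys can be used as names with ** unpacking.
hero = {"Name": "SpiderMan", "City": "NewYork"}  # hypothetical dictionary
from_dict = "{Name} belongs to {City}".format(**hero)

print(f_string, swapped_index, from_dict, sep="\n")
```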

There are some other methods which enhance the string format.

zfill(): It pads the string with zeros on the left side. The parameter passed to zfill() is the overall width of the string after padding, not the number of zeros to add. If that width is the same as the length of the string, no zeros are padded.

The string Tony has length 4. The overall width requested is 6. Thus, two zeros are padded to the left of Tony to make its length 6.

If the width required is less than or equal to the width of the given string, then, no zero padding is done and the string is returned as it is.
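In code, the Tony example looks roughly like this (the widths other than 6 are assumed extra cases):

```python
padded = "Tony".zfill(6)      # '00Tony': two zeros bring the width to 6
same_width = "Tony".zfill(4)  # 'Tony': already 4 wide
narrower = "Tony".zfill(2)    # 'Tony': never truncated

# For numeric strings, zeros are inserted after a leading sign.
signed = "-42".zfill(5)       # '-0042'

print(padded, same_width, narrower, signed)
```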

center(): This function aligns the string to the center. It takes two parameters. The first is the width and the second, which is optional, is the padding character. The width is the overall width of the string after center alignment. Let’s check both of these parameters.

Looking at the first example, Mark70 has a length of 6. The width passed as parameter is 10. This means the string is placed in the center with two spaces before it and two spaces after it, making the total width 10.

In the second example, the first 2 and last 2 spaces are filled with a dash sign. It’s similar to padding with zeros. Here, the padding is done with a dash sign.

rjust(): This function right-justifies the string. It takes two parameters. The first is the width and the second, which is optional, is the padding character. The width is the overall width of the string after right alignment. Let’s check both of these parameters.

Looking at the first example, Mark70 has a length of 6. The width passed as parameter is 10. This means four spaces are added on the left, shifting the string to the right side.

In the second example, the first 4 spaces are filled with a dash sign. It’s similar to padding with zeros. Here, the padding is done with a dash sign.
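Both center() and rjust() can be checked with the Mark70 string and width 10 used above:

```python
name = "Mark70"

centered = name.center(10)             # '  Mark70  '
centered_dash = name.center(10, "-")   # '--Mark70--'
right = name.rjust(10)                 # '    Mark70'
right_dash = name.rjust(10, "-")       # '----Mark70'

# repr() makes the padding spaces visible.
print(repr(centered), repr(centered_dash), repr(right), repr(right_dash), sep="\n")
```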

rstrip(): This is an extension to the strip() function. It strips or removes the trailing characters from the string. By default, any whitespace characters present at the right are stripped off. A specific set of characters to be stripped from the right side can be passed as an argument to rstrip().

Let’s do some selective right side stripping.

This is selective stripping from the right side. We are stripping the characters 4 and 1 from our string. The argument is treated as a set of characters, not as a substring, so the order in which they are written does not matter. Passing 41 or 14 gives the same result. Stripping stops at the first character from the right that is not in the set. Let’s check one more example.

I think it’s quite clear now.
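A sketch of rstrip() with and without an argument. The 41 characters come from the article; the string Mark41 itself is an assumption.

```python
# With no argument, trailing whitespace is removed.
no_arg = "Mark41   ".rstrip()        # 'Mark41'

# With an argument, any trailing characters from that set are removed.
order_41 = "Mark41".rstrip("41")     # 'Mark'
order_14 = "Mark41".rstrip("14")     # 'Mark': same set, same result

# Stripping continues while the last character is in the set.
repeated = "Mark4114".rstrip("41")   # 'Mark'

print(repr(no_arg), order_41, order_14, repeated)
```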

ljust(): This function left-justifies the string. It takes two parameters. The first is the width and the second, which is optional, is the padding character. The width is the overall width of the string after left alignment. Let’s check both of these parameters. We are taking the same example that was used in rjust().

Looking at the first example, Mark70 has a length of 6. The width passed as parameter is 10. This means that, after the 6 characters, 4 spaces are added.

In the second example, the first 6 positions hold the characters of the string, and then 4 dashes complete the width of 10. It’s similar to padding with zeros. Here, the padding is done with a dash sign.
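The same Mark70 example with ljust():

```python
name = "Mark70"

left = name.ljust(10)            # 'Mark70    '
left_dash = name.ljust(10, "-")  # 'Mark70----'

print(repr(left), repr(left_dash))
```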

lstrip(): This is an extension to the strip() function. It strips or removes the leading characters from the string. By default, any whitespace characters present at the left are stripped off. A specific set of characters to be stripped from the left side can be passed as an argument to lstrip().

This is selective stripping from the left side. We are stripping the characters 4 and 1 from our string. As with rstrip(), the argument is a set of characters, so the order in which they are written does not matter. In the second example, 41.69 has been stripped off from the left side.
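A matching sketch for lstrip(). The values 41 and 41.69 come from the article; the rest of each string is assumed.

```python
no_arg = "   Mark41".lstrip()          # 'Mark41'
digits = "41Mark".lstrip("41")         # 'Mark'
reordered = "14Mark".lstrip("41")      # 'Mark': order does not matter

# Every leading character found in the set '41.69' is removed.
decimal = "41.69Mark".lstrip("41.69")  # 'Mark'

print(no_arg, digits, reordered, decimal)
```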

partition(sep): It splits the string at the first occurrence of sep. The output is a single tuple of three items: the string before the separator, the separator itself, and the string after the separator.

rpartition(sep): It splits the string at the last occurrence of sep. The output is a single tuple of three items: the string before the separator, the separator itself, and the string after the separator.
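The difference between the two is easiest to see when the separator occurs more than once. The string below is an assumed example.

```python
alias = "Tony Stark aka Iron Man aka Mark70"

first_split = alias.partition(" aka ")
last_split = alias.rpartition(" aka ")

print(first_split)  # ('Tony Stark', ' aka ', 'Iron Man aka Mark70')
print(last_split)   # ('Tony Stark aka Iron Man', ' aka ', 'Mark70')

# If the separator is absent, two of the three items are empty strings.
not_found = alias.partition("xyz")
not_found_right = alias.rpartition("xyz")
```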

rsplit(sep=None, maxsplit=-1): It splits the string into a list of substrings. Both parameters are optional. With sep left as None, the string is split on whitespace. The default value of maxsplit is -1, which means there is no limit on the number of splits. Let’s see how this function works.

Now set the <sep> parameter equal to aka.

It splits the string into two parts, before and after aka. Let’s check the values of maxsplit with 1 and -1.

With maxsplit = -1 the string is split at every occurrence of the separator, as in normal splitting. With maxsplit = 1 only one split is made, and since rsplit() counts from the right, the string is split at the last occurrence of the separator.

split(sep=None, maxsplit=-1): It splits the string into substrings like rsplit(). The only difference is that with maxsplit it splits from left to right instead of right to left. When sep is given, consecutive delimiters produce empty strings. When sep is omitted, consecutive whitespace characters are treated as a single delimiter.
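A sketch comparing rsplit() and split(). The separator aka comes from the article, but the string here is assumed and contains aka twice so that the effect of maxsplit is visible.

```python
alias = "Tony aka IronMan aka Mark70"

# No arguments: split on whitespace.
on_whitespace = alias.rsplit()

# Separator given, no limit: every ' aka ' is a split point.
all_parts = alias.rsplit(" aka ")

# maxsplit=1: rsplit() uses the last separator, split() the first.
from_right = alias.rsplit(" aka ", 1)  # ['Tony aka IronMan', 'Mark70']
from_left = alias.split(" aka ", 1)    # ['Tony', 'IronMan aka Mark70']

# Explicit separator: consecutive delimiters give empty strings.
with_empty = "a,,b".split(",")         # ['a', '', 'b']

# No separator: runs of whitespace count as one delimiter.
collapsed = "a   b".split()            # ['a', 'b']

print(on_whitespace, all_parts, from_right, from_left, sep="\n")
```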

splitlines([keepends]): It splits the string into lines and returns them in a list. The characters that constitute a line boundary include \n (line feed), \r (carriage return), \r\n, \v, \f and a few Unicode separators such as \u2028 and \u2029.

Consecutive line boundary characters are returned as empty strings as shown above. You can see that output is a list.

If the parameter is passed as True or 1, then line boundaries are intact in the output list as well.
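A minimal sketch of splitlines() with and without keepends, on an assumed string that mixes \n and \r\n and has two boundaries in a row:

```python
text = "Tony\nStark\r\n\nIron Man"

lines = text.splitlines()
print(lines)       # ['Tony', 'Stark', '', 'Iron Man']

# keepends=True leaves each line boundary attached to its line.
with_ends = text.splitlines(True)
print(with_ends)   # ['Tony\n', 'Stark\r\n', '\n', 'Iron Man']
```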

Bytes

A bytes object is an immutable sequence of single-byte values, each of which is a small integer ranging from 0 to 255. A string literal with the prefix b is termed a byte string. A byte string works with single, double and triple quotes. Only ASCII characters are allowed in byte literals. If a byte value is greater than 127, it must be written with a proper escape sequence. Let’s check these things.

If the prefix b is combined with r (written rb or br), processing of escape sequences is disabled.
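These points can be checked with a few assumed literals:

```python
data = b"Iron Man"
print(type(data))      # <class 'bytes'>

# Indexing a bytes object gives an integer, not a one-character string.
first_byte = data[0]   # 73, the ASCII code of 'I'

# Values above 127 are written with an escape sequence.
high = b"\xff"
high_value = high[0]   # 255

# bytes objects are immutable, so item assignment fails.
try:
    data[0] = 74
    mutable = True
except TypeError:
    mutable = False

# With the r prefix, \n stays as a backslash followed by 'n'.
escaped_len = len(b"a\nb")   # 3
raw_len = len(rb"a\nb")      # 4

print(first_byte, high_value, mutable, escaped_len, raw_len)
```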

What’s new in Python 3.8?

In Python 3.8, f-strings gained an ‘=’ specifier. It goes like this: f'{variable=}'. It gives the name of the variable (or the text of the expression) along with its value.

In short, it represents the whole expression. Again, use it wisely according to the need. Because, you can observe that in the above output Name = SpiderMan and City = NewYork is not looking elegant here. So, use it where the entire expression is required.
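A short sketch using the Name and City variables from the formatting section. It needs Python 3.8 or later. By default the value is shown with repr(), so strings appear with their quotes.

```python
Name = "SpiderMan"
City = "NewYork"

debug = f"{Name=}, {City=}"
print(debug)       # Name='SpiderMan', City='NewYork'

# Spaces written around '=' are preserved in the output.
spaced = f"{Name = }"
print(spaced)      # Name = 'SpiderMan'

# Any expression works, not just a plain variable name.
length = f"{len(Name)=}"
print(length)      # len(Name)=9
```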