What are the 4 data types in Python?
You’ve learnt how to do quite a few things in Python in the first three chapters. You’ve seen how most things that happen in a computer program can be categorised as store, repeat, decide, or reuse. In this Chapter, you’ll focus on the store category. You’ll find out more about data types and why things aren’t always as simple as with the examples you’ve
seen so far. The simplest description of a computer program is the following one: A computer program stores data, and it does stuff with that data. Granted, this is not the most detailed and technical definition you’ll find, but it summarises a program perfectly. You need information in any computer program. And then you need to do something with that information. No computer program can exist without data. The data could be a simple
name or number as in the Angry Goblin game, or you could have a very complex and highly structured data set. And if you’re using a programming language, you’ll want to perform some actions with the data and transform it in one way or another. Deciding how you want your program to store the data is an important decision you’ll need to make as a programmer. Languages such as Python provide many alternatives for you to consider. In this Chapter, you’ll read about how data types are handled in
Python and about the different categories of data types. The first part of this Chapter will cover some of the theory related to data types. In the second part, you’ll work on a new project in which you’ll work out all the words that Jane Austen used when writing Pride and Prejudice and how often she used each word. Can you guess the five most common words in Pride and Prejudice? You’ll find out the answer when you complete the project later on in this Chapter. You’ve already come across several of the most important data types in Python. You’ve used integers and floats, strings, Booleans, and lists. There are many more data types in Python. Many, many more! As you learn about more data types, it is helpful to learn about the categories of data types. This is because some data types may have similar properties, and when you know how to deal with a category of data type, you’ll be better equipped to deal
with a new data type you’ll encounter from the same category. In this section, you’ll learn about sequences, iterables, mutable and immutable data types. Don’t be put off by the obscure wording. Like in every other profession, programmers like to use complex-sounding words to make the subject look difficult and elusive. Cannot-be-changed doesn’t sound as grand as immutable! My aim in this book is the
opposite, but we cannot escape using the terms you’ll find in documentation, error messages and other texts. You’ve already used iteration in Python. When you repeat a block of code several times using a for loop, you’re iterating. A data type is iterable if Python can go through the
items in the data type one after the other, such as in a for loop.

You’ll recall that you can always check the data type by using the type() function. The output you’ll get from printing the data types for these five variables is:

You’re already familiar with the first four of these. You’ll recall that you used the This code will simply print out the numbers 0 to 9. Now you know more about functions, you’ll recognise The following code is identical to the

You can explore which data types are iterable by trying to use each variable in a for loop. The variable The last line of the error message has all the information that you need. The error is a TypeError. You’ll get the same error message when you try the You’ll get no errors saying that these data types are not iterable in this case:

When you iterate through a string, each character is considered one at a time. So the variable You may have noticed that you used the variable In the

Another category of data types is a sequence. A sequence is a data type that has ordered items within it. You’ve already seen how you can use indexing on both lists and strings, which are both sequences, to extract an item from a specific position, for example:

There is a lot of overlap between data types that are sequences and those that are iterables.
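As a rough sketch of how these two categories can be checked in code, the hypothetical helper is_iterable() below relies on the built-in iter() function, which raises a TypeError for an object that Python cannot loop over. The helper name and the sample values are assumptions made for illustration; the try and except keywords are Python’s standard way of handling an error without stopping the program.

```python
def is_iterable(value):
    """Return True if Python can loop over value (hypothetical helper)."""
    try:
        iter(value)  # built-in function: raises TypeError for non-iterables
    except TypeError:
        return False
    return True


# Illustrative sample values, not taken from the text
samples = [5, 3.14, "hello", [2, 4, 6], range(3)]
for sample in samples:
    print(type(sample).__name__, is_iterable(sample))

# Lists and strings are also sequences, so indexing by position works
print([2, 4, 6][0])
print("hello"[0])
```

Numbers are reported as not iterable, while the string, the list and the range object are.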
For the time being, it’s fine if you want to think of these two categories as the same, but for completeness, I’ll clarify that they’re not the same. There are some iterable data types that are not sequences. Let’s keep using lists and strings in this section. You’ve seen that both these data types are iterable, and they are sequences. In the last code example in the previous section, you’ve seen how you can extract an item from either a list or a
string based on the position within the data structure. You can now try to reassign a new value to a certain position. Let’s start with lists:

In the second line, you’ve reassigned what’s being stored in the third place within the list, since the index 2 refers to the third item. Now you can try to do the same with a string:

The trick that worked for lists does not work with strings. You get a TypeError because strings are immutable: they cannot be changed once they’ve been created. Lists are mutable, which means that they are flexible containers that can change. You can add new items to a list or replace existing ones. Mutable data types are ideal for data that is likely to change while a program is running. Note that even with immutable data types, you’re still allowed to overwrite the entire variable. So if I did want to change the third letter of my name to In this
case, you have reassigned new data to the variable There is a lot of terminology in coding. Here’s a new term you
haven’t encountered so far: method. If you know what a function is, and you do from the previous Chapter, you also know what a method is. A method is a function that is associated with a specific data type and acts on that data type. This description will make a lot more sense with some examples. Let’s start with standard functions. You’ve seen You’ve also already seen one method being used in the previous Chapter. When you created a list and then you wanted to add items to the list you used The value Methods behave in the same way as functions. The only difference is that in addition to any data included in the parentheses as arguments, methods also have direct
access to the data stored in the object they are attached to. In the example above, the method If you’re using an IDE such as PyCharm, you’ll have noticed by now that these tools have autocompletion to make writing your code easier. If you start typing As Let’s look at a few more list methods: The list You have removed the first item from the list by using the index Note that as you’re using the same list repeatedly, the list is now shrinking as you’ve used The These methods are list
methods. They are defined to work on lists. Let’s have a look at some string methods now:

As before with lists, if you’re using an IDE and you type the name of a string followed by a full stop or period, you’ll see all the methods that are associated with strings. Let’s go back to the last examples you’ve been trying out:

The string methods did not change the original variable, whereas the list methods changed the list directly. However, there’s a reason why string methods and list methods behave differently. Strings are immutable. They are not meant to change once they’ve been created. For this reason, the string methods do not modify the original string but instead return a copy. If you want to replace the old string with the new one returned by the method, you’ll have to do so explicitly:

    >>> my_name = "Stephen"
    >>> my_name = my_name.upper()
    >>> my_name
    'STEPHEN'

You assign the copy of the string returned by upper() back to the variable my_name. As lists are mutable, their methods act directly on the original data. A common bug occurs when a programmer forgets about this distinction and either reassigns the result of a list method, or calls a string method without reassignment and then expects the variable to contain the new data.

You’ve encountered several data types already. Some data types, called data structures, store a collection of values. Python has three basic data structures. You’ve already used one of these, the list. A list is a sequence and an iterable, and it’s mutable. You’ll shortly learn about the other two basic data structures in Python: tuples and dictionaries. Python has many other data types, including data structures. In the second part of this book, you’ll learn about more advanced data structures used for dealing with quantitative datasets in science, finance, and other data-driven fields.

Tuples

The second of Python’s basic data structures is the tuple. You have a choice on how to pronounce this! Some pronounce the term as tup-el, rhyming with couple. Others pronounce it as two pill.
Although tuple is not a common English word (it’s a term that appears in mathematics), it shares its root with the endings of words such as triple, quadruple, or multiple. You’ll see the link between tuple and these words soon.

You can create a tuple in a similar way as you would a list. However, the type of bracket associated with a tuple is the parenthesis rather than the square bracket:

    >>> some_numbers = (3, 5, 67, 12, 3, 5)
    >>> type(some_numbers)
    <class 'tuple'>

You can create a list with the same numbers so that you can explore the differences and similarities between lists and tuples:

    >>> same_numbers_in_list = [3, 5, 67, 12, 3, 5]
    >>> type(same_numbers_in_list)
    <class 'list'>

Tuples are also sequences, and you can use indexing and slicing on tuples in the same way as you do with lists:

    >>> some_numbers[1]
    5
    >>> some_numbers[2:4]
    (67, 12)
    >>> some_numbers[-1]
    5

Notice that when you use indexing or slicing on any sequence, you’ll always use the square brackets immediately after the variable name, even for sequences that are not lists. You can check whether tuples are iterable by using the tuple in a for loop:

    >>> for item in some_numbers:  # some_numbers is a tuple
    ...     print(item)
    ...
    3
    5
    67
    12
    3
    5

You’ve been able to iterate through the tuple, so tuples are iterable. You can now check for mutability. You can compare reassigning the value for one of the items in a tuple with the case when you’re using a list:

    >>> same_numbers_in_list[2] = 1000
    >>> same_numbers_in_list
    [3, 5, 1000, 12, 3, 5]

When using a list such as same_numbers_in_list, the reassignment works. Now try the same with the tuple:

    >>> some_numbers[2] = 1000
    Traceback (most recent call last):
      ...
    TypeError: 'tuple' object does not support item assignment

This is the same error you got when you tried to reassign a new letter into a string. Tuples, like strings, are immutable. A tuple is an immutable sequence.

At this point, you may be wondering why you need tuples at all. They seem to be like lists but with fewer features! However, when you want to create a sequence of items and you know that this sequence will not change in your code, creating a tuple is the safer option.
It makes it less likely that your code will accidentally change a value in the tuple because, if your code tries to do so, you’ll get an error. Using a tuple when you know the data should not change means you’re less likely to introduce bugs in your code. Tuples are also more memory efficient than lists, but you don’t need to worry about memory issues for the time being.

Although the round brackets, or parentheses, are the type of brackets associated with tuples, you can also omit the parentheses altogether when creating tuples:

    >>> some_numbers = 3, 5, 67, 12, 3, 5
    >>> type(some_numbers)
    <class 'tuple'>

As with lists, you can store any data type in a tuple, including other data structures:

    >>> another_tuple = (3, True, [4, 5], "hello", (0, 1, 2), 5.5)

This tuple contains six items:

- the integer 3
- the Boolean True
- the list [4, 5]
- the string "hello"
- the tuple (0, 1, 2)
- the float 5.5
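One way to confirm what each of the six items is, offered here as a small sketch rather than as part of the original example, is to loop through the tuple and print each item alongside its type. The list item_types is an assumed name used only to collect the results.

```python
another_tuple = (3, True, [4, 5], "hello", (0, 1, 2), 5.5)

item_types = []  # assumed name, used only for this illustration
for item in another_tuple:
    # type(item).__name__ gives the short name of the type, such as "int"
    item_types.append(type(item).__name__)
    print(item, type(item))

print(len(another_tuple))
```

Note that although the items of a tuple cannot be reassigned, a mutable item stored inside it, such as the list here, can still be changed.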
For the time being, you don’t need to worry about whether to use a list or a tuple. However, you’ll come across tuples as you code, as Python uses this data structure often.

Dictionaries

The third basic data structure is the dictionary. Imagine you need to store the test marks for students in a class. You could do this with two lists:

    >>> student_names = ["John", "Kate", "Trevor", "Jane", "Mark", "Anne"]
    >>> test_results = [67, 86, 92, 55, 59, 79]

As long as the names and test results are stored in the same order, you can then extract the values in the same positions in each list. For example, if you extract the second item in each list, you’ll have the name "Kate" and the test result 86.

Although you can do this, it’s not ideal. If you want to find Mark’s score, you’ll first need to find out which position in the list Mark occupies and then get the value that’s in the same position in the second list. And what if Trevor leaves the school and needs to be removed from the lists? You’ll need to make sure that his name and his mark are both removed from the two separate lists. Things will get even more complicated if you need to store the test marks for several subjects for each student instead of storing just the test mark for one subject. In the White Room analogy, you’re creating two separate boxes, one labelled student_names and the other labelled test_results, with nothing linking the two.

You’ve already seen how the square brackets are associated with lists and the round brackets with tuples. Luckily, you still have more types of brackets left on your keyboard! It’s time to use the curly brackets now. You can create a dictionary with all the student names and marks:

    >>> student_marks = {"John": 67, "Kate": 86, "Trevor": 92, "Jane": 55, "Mark": 59, "Anne": 79}
    >>> type(student_marks)
    <class 'dict'>

In addition to using curly brackets, you’ll have noticed another difference when creating a dictionary. Each item in a dictionary contains two parts separated by a colon. The first part of each item is called the key. The second part is called the value.
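To make the comparison concrete, here is a minimal sketch of looking up Mark’s score, first with the two separate lists and then with the dictionary. It assumes the list method index(), which returns the position of the first matching item; the variable names position, mark_from_lists and mark_from_dict are illustrative.

```python
student_names = ["John", "Kate", "Trevor", "Jane", "Mark", "Anne"]
test_results = [67, 86, 92, 55, 59, 79]

# Two lists: find the position first, then use it in the second list
position = student_names.index("Mark")
mark_from_lists = test_results[position]

# Dictionary: the key leads straight to its value
student_marks = {
    "John": 67, "Kate": 86, "Trevor": 92,
    "Jane": 55, "Mark": 59, "Anne": 79,
}
mark_from_dict = student_marks["Mark"]

print(mark_from_lists, mark_from_dict)
```

Both approaches give the same mark, but the two-list version only works for as long as the lists stay in step with each other.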
So in the first item of the dictionary above, the key is "John" and the value is 67.

The order of the items in a dictionary is not important. This is different to lists, tuples, and strings, in which the order of items is a defining feature of the data structure. What matters in a dictionary is the association between the key and its value. If you try to access an item from a dictionary using the same indexing you’ve used for lists, tuples, and strings, you’ll get an error:

    >>> student_marks[2]
    Traceback (most recent call last):
      ...
    KeyError: 2

Trying to access the third item in the dictionary doesn’t make sense as a dictionary is not a sequence. Instead, we can use the key to access values from within the dictionary:

    >>> student_marks["Trevor"]
    92

Let’s see whether dictionaries are mutable:

    >>> student_marks["Kate"] = 99
    >>> student_marks
    {'John': 67, 'Kate': 99, 'Trevor': 92, 'Jane': 55, 'Mark': 59, 'Anne': 79}

The answer is ‘yes’. Dictionaries are a mutable data type. You’ve
changed the value associated with "Kate". You can also update a value based on its current value:

    >>> student_marks["Anne"] = student_marks["Anne"] + 1
    >>> student_marks
    {'John': 67, 'Kate': 99, 'Trevor': 92, 'Jane': 55, 'Mark': 59, 'Anne': 80}

Python will look at what’s on the right of the assignment operator = first, and then assign the result to the key on the left.

What happens if you try to access the value of a key that doesn’t exist?

    >>> student_marks["Matthew"]
    Traceback (most recent call last):
      ...
    KeyError: 'Matthew'

The key "Matthew" doesn’t exist in the dictionary, so you get a KeyError. However, you can add a new item by assigning a value to a new key:

    >>> student_marks["Matthew"] = 50
    >>> student_marks
    {'John': 67, 'Kate': 99, 'Trevor': 92, 'Jane': 55, 'Mark': 59, 'Anne': 80, 'Matthew': 50}

You can now have a look at some of the methods that you can use with dictionaries. You can start with a couple of methods that are not very exciting but can be very useful:

    >>> student_marks.keys()
    dict_keys(['John', 'Kate', 'Trevor', 'Jane', 'Mark', 'Anne', 'Matthew'])
    >>> student_marks.values()
    dict_values([67, 99, 92, 80, 59, 80, 50])

Another useful method is get():

    >>> student_marks.get("Anne")
    80
    >>> student_marks.get("Zahra")
    >>>

When you use get() with a key that doesn’t exist, you don’t get an error. The method returns None instead, which is why nothing is displayed. You can also supply a default value as a second argument:

    >>> student_marks.get("Anne", "There is no student with this name")
    80
    >>> student_marks.get("Zahra", "There is no student with this name")
    'There is no student with this name'

How about looping through a dictionary?

    >>> for stuff in student_marks:
    ...     print(stuff)
    ...
    John
    Kate
    Trevor
    Jane
    Mark
    Anne
    Matthew

You don’t get an error, but you can see that only some of the information has been stored in the variable stuff: the keys, but not the values.

The Pride & Prejudice Project: Analysing Word Frequencies

It’s time to start working on a new project to consolidate lots of what you’ve learned so far. Your task is to read and analyse Jane Austen’s Pride and Prejudice. Except, you won’t be reading the novel! In this project, you’ll find all the words that Jane Austen used in the book and how often each one was used. I mentioned earlier that you wouldn’t be reading the book. However, your computer program will. So, before you start working on the Pride & Prejudice project, you’ll find out how to read data from an external source.
You’ll need the file named pride_and_prejudice.txt.

NOTE: As the content of The Python Coding Book is currently being gradually released, this repository is not final, so you may need to download it again in the future when there are more files that you’ll need in later chapters.

Making files accessible to your project

The simplest way to make sure you can access a file from your Python project is to place it in the project folder, the same folder where your Python scripts are located. If you’re using an IDE such as PyCharm, you can drag a file from your computer into the Project sidebar to move the file. Alternatively, you can locate the folder containing your Python scripts on your computer and simply move the files you need into that folder as you would move any other file on your computer.

Tip: In PyCharm, if the Project sidebar is open you can click on the project name (top line) and then show the contextual menu with a control-click (Mac) or right-click (Windows/Linux). One of the options will be Reveal in Finder or Show in Explorer depending on what operating system you’re using.

Reading Data From a Text File

In all the programs you’ve written so far, all the data you used was data you typed into your program directly. This is rarely the case in real life. In many coding applications, your program will need access to data available in some other form. In the second half of this book, you’ll spend a lot of time looking at various ways of importing and accessing data from external sources. Here, you’ll look at one of the most basic yet very useful ways of importing data: reading from a text file. The text
of the book is in the text file pride_and_prejudice.txt. Just as you can open a file in your computer’s operating system, you can open the file within your Python program. It’s time to open a new Python script and get started:

    open("pride_and_prejudice.txt")

The function open() returns an object, which you can assign to a variable and print:

    file = open("pride_and_prejudice.txt")
    print(file)

The output from this code is not quite what you might expect:
Let’s ignore this for now. You can think of the object
returned from open() as your program’s connection to the file. You can use it to read the file’s contents:

    file = open("pride_and_prejudice.txt")
    text = file.read()
    print(text)

The method read() returns the contents of the file. You can check the data type of what it returns:

    file = open("pride_and_prejudice.txt")
    text = file.read()
    print(type(text))

You’ll find that text is a string. Note that the first and last lines of the text file are not part of the book but are the credit for this open-source ebook that we’re using. You can remove these lines from your version if you wish.

Before we proceed, there’s some housekeeping you need to do. You should never leave a file open longer than you need to. The simplest way to take care of this is by closing the file once you’ve brought the data into a Python variable:

    file = open("pride_and_prejudice.txt")
    text = file.read()
    file.close()
    print(text)

Although opening and closing the file using open() and close() works, there’s a more convenient way that uses the with keyword:

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    print(text)

The code above achieves the same goal as the previous version. The file is automatically closed once the code within the with block has finished running.

The P&P Project: Reading and Cleaning the Data

From the point of view of your program, all you’ve got so far is a single string. A very long, single string. What you ideally need is the individual words. One of the Python string methods will come to your rescue for this task. You can explore this method in the Console:

    >>> some_text = "This is a sentence which is stored as one single string"
    >>> some_text.split()
    ['This', 'is', 'a', 'sentence', 'which', 'is', 'stored', 'as', 'one', 'single', 'string']

The split() method returns a list containing the individual words in the string. You can now use this method on the Pride and Prejudice text:

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    words = text.split()
    print(words)

Have a look at the words in the list that’s printed out. I’m not showing the output here as Jane Austen wrote many words! Your task is to find all the words that have been used in the book. Do you mind whether a word is capitalised in some instances and not in others? No, not for the problem you’re trying to solve.
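To see why capitalisation matters for this task, consider the short sketch below. The sample sentence is made up for illustration. The sketch uses the list method count() and the string method lower(), which returns a lowercase copy of a string, to show that Python treats differently capitalised versions of a word as different strings, so they would be counted separately.

```python
sample = "The cat saw the other cat"  # made-up sentence for illustration
words = sample.split()

# Strings that differ only in capitalisation are not equal
print("The" == "the")

# Counting "the" in the original words misses the capitalised "The"
count_before = words.count("the")

# Converting to lowercase first means both occurrences are counted
count_after = sample.lower().split().count("the")

print(count_before, count_after)
```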
How about whether a word comes at the end of a sentence and is followed by a full stop or period? Or perhaps a comma or other punctuation mark? In any project in which you bring in data from an external source, you’ll need to clean the data before you can start using it. What’s required to clean the data will depend on what data you have and what you want to do with the data. In this case, cleaning the data means removing capitalisation and removing punctuation marks so that you’re left with just the lowercase words.

You can start by removing all capitalisation in the text. You’ve used the upper() string method earlier in this Chapter, and there’s a matching lower() method. You could apply it to every word in the list. However, there’s a simpler way. When you write a computer program, you’ll never write your code from line 1 to the last line in order. You’ll jump back and forth as you add code and make changes throughout your script. In this case, you’re better off returning to where you had a single, long string and applying the lower() method there:

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    text.lower()
    words = text.split()
    print(words)

Did this work? Are all the words in the list you print out lowercase? No, they’re not. You’ll recall when we talked about mutable and immutable data types and compared how list methods and string methods behave. String methods do not change the string they’re acting on. Instead, they return a copy. You can get around this by reassigning the string returned to the same variable:

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    text = text.lower()
    words = text.split()
    print(words)

The output from this code is the following. The list is truncated in the output shown here for display purposes:
The first few words in the list highlight a
few problematic entries. If you look at the 8th and 14th entries, you’ll see the same word with and without a trailing comma. Python treats these as different strings:

    >>> "prejudice," == "prejudice"
    False

The extra comma in the first case makes the strings different. Since you’ll need to identify which words are repeated, you’ll need to remove this comma and all other punctuation marks. Earlier in this Chapter, you came across another string method that will come in useful here, replace():

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    text = text.lower()
    text = text.replace(",", " ")
    words = text.split()
    print(words)

You’re replacing all commas with a space in the whole text before splitting the string into a list of words:
The commas have all gone. The commas you still see are the ones separating the words in the list. These are Python commas and not commas in the text.

You’ll now need to repeat this for all the punctuation marks. You’ve seen that you have two options for repeating code, the for loop and the while loop. A for loop is the better fit here:

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    text = text.lower()
    for punctuation_mark in ".,?!;:-":
        text = text.replace(punctuation_mark, " ")
    words = text.split()
    print(words)

You can loop over any iterable data type, and a string is an iterable. Therefore, you can write a string with several punctuation marks in the for loop statement. You can scan through the output of your code to make sure that none of the punctuation marks you listed in the for loop statement are still there.

Now you need to try to think of all possible punctuation marks. Or just rely on the ready-to-use string that Python has in one of its built-in modules called string:

    >>> import string
    >>> string.punctuation
    '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

This module contains a string named punctuation, which you can use in your for loop:

    import string

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    text = text.lower()
    for punctuation_mark in string.punctuation:
        text = text.replace(punctuation_mark, " ")
    words = text.split()
    print(words)

You’ve completed the first part of this project. You’ve read in the text from a text file and cleaned the data. You can now remove the final print() line.

You learned about commenting as one way of making your code more readable. You can add a few concise and well-placed comments in your code:

    import string

    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    # Clean data by removing capitalisation and punctuation marks
    text = text.lower()
    for punctuation_mark in string.punctuation:
        text = text.replace(punctuation_mark, " ")

    # Split string into a list of strings containing all the words
    words = text.split()

There isn’t an ideal number of comments you’ll need to put in. This depends on your preference and style.
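The cleaning steps can also be tried on a short string, which makes it easier to check the result by eye. The sketch below wraps the same steps in a hypothetical function, clean_text(); the function name and the sample sentence are assumptions made for illustration and are not part of the project code.

```python
import string


def clean_text(text):
    """Return a lowercase copy of text with punctuation replaced by spaces."""
    text = text.lower()
    for punctuation_mark in string.punctuation:
        text = text.replace(punctuation_mark, " ")
    return text


sample = "It is a truth, universally acknowledged!"  # illustrative sample
words = clean_text(sample).split()
print(words)
```

Because each punctuation mark is replaced with a space rather than removed, a hyphenated word is split into two words, and a word containing a straight apostrophe, such as don't, becomes two separate entries.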
I chose not to comment the first section where the file is being opened as the code seems clear enough. The comment in the final block is there to remind the reader that Comments are not the only way to make the code
more readable. Choosing descriptive variable names is just as important. Naming the variable defined in the for loop punctuation_mark makes it clear what the loop is doing.

The P&P Project: Analysing the Word Frequencies

You’ve converted the text into a list of words that have been cleaned. How do you proceed from this point? Let’s time travel back to the pre-computer age, and you can ask yourself: "How would I perform this task using pen and paper if I had lots of time to spare?" Try writing down the steps you’ll need before reading on.

Here are the steps you’ll need to do to solve this problem:
You’ve now created the algorithm you’ll need to follow. Your next task is to translate these steps from English into Python, except for the last step that won’t translate well!

Before you can start writing the code for this next section, you have another important decision to make. What’s the best way to store your data as you go along? What data structure should you use? Although you can store the data in two lists, one containing the words and the other containing the number of times each word appears in the book, you’ve learned earlier in this Chapter that there’s a better way. You can use a dictionary.

It’s useful to create a dummy version of the dictionary you want in the Console to visualise it and so that you can experiment with it as needed:

    >>> some_words = {"hello": 3, "python": 8, "bye": 1}
    >>> some_words["computer"] = 2  # Add new word
    >>> some_words
    {'hello': 3, 'python': 8, 'bye': 1, 'computer': 2}
    >>> some_words["python"] = some_words["python"] + 1  # Increment count for existing word
    >>> some_words
    {'hello': 3, 'python': 9, 'bye': 1, 'computer': 2}

Now you know what form your dictionary will take and how to add new words and increment the count for words already in the dictionary, you can go back to the P&P script. You’ll first need to create a variable with an empty dictionary stored in it:

    # follows on from code you've already written above
    # which reads from file and cleans the data

    word_frequencies = {}

Next, you need to repeat the steps you listed earlier, going through each word in the text.
You’ll need a for loop:

    # follows on from code you've already written above
    # which reads from file and cleans the data

    word_frequencies = {}

    for word in words:
        if word not in word_frequencies.keys():
            word_frequencies[word] = 1

It is good practice to use the singular version of the name you used for your list as the variable you define in the for loop statement. For each word in the list of words, you’re checking if the word is already in the dictionary. You can explore how this check works in the Console:

    >>> # using the same variable some_words you created and modified earlier
    >>> some_words
    {'hello': 3, 'python': 9, 'bye': 1, 'computer': 2}
    >>> some_words.keys()
    dict_keys(['hello', 'python', 'bye', 'computer'])
    >>> 'python' in some_words
    True
    >>> 'monday' in some_words
    False
    >>> 'monday' not in some_words
    True

The dictionary method keys() returns the keys of the dictionary. As the examples show, you can also use the in keyword directly on the dictionary to check whether a key is present.

There’s one last step left. You’ll need to add an else block to increment the count for words that are already in the dictionary:

    # follows on from code you've already written above
    # which reads from file and cleans the data

    word_frequencies = {}

    # Loop through list of words to populate dictionary
    for word in words:
        if word not in word_frequencies.keys():
            word_frequencies[word] = 1
        else:
            word_frequencies[word] = word_frequencies[word] + 1

Time to reveal what the variable word_frequencies contains:

    print(word_frequencies)

The output is a very long dictionary. Only a truncated version is displayed here, but you’ll be able to see the whole dictionary in your version:
You have a dictionary with all the words in the book and the number of times each word is used. You can find out how many unique words there are in the book:

    print(len(word_frequencies))

The length of the dictionary is the number of keys it contains, which is the number of unique words in the cleaned text.

"Can I sort the dictionary based on word frequencies?" is a common question I’m asked. The answer is yes. Although it is possible to do this directly within Python, you’ll take a different approach in the next section, which will then allow you to sort using a tool you’re probably already familiar with.

Looping Through a Dictionary

Earlier in this Chapter, when you first learned about dictionaries, you saw that you could loop through a dictionary. However, you could only iterate through the
keys of the dictionary. Let’s look at a better way of looping through a dictionary with a for loop.

A very brief detour first. Create a tuple containing two items:

    >>> numbers = (5, 2)

You can unpack this tuple into two separate variables:

    >>> first, second = numbers
    >>> first
    5
    >>> second
    2

Unpacking works with other sequences as well, not just tuples. Now, let’s look at another dictionary method called items():

    >>> some_words = {'hello': 3, 'python': 9, 'bye': 1, 'computer': 2}
    >>> some_words.items()
    dict_items([('hello', 3), ('python', 9), ('bye', 1), ('computer', 2)])

This method returns an object of type dict_items, which you can loop through:

    >>> for something in some_words.items():
    ...     print(something)
    ...
    ('hello', 3)
    ('python', 9)
    ('bye', 1)
    ('computer', 2)

You can now loop through a dictionary and get access to all the information in the dictionary, not just the keys. The variable something contains a tuple in each iteration of the loop, and you can unpack this tuple directly in the for loop statement:

    >>> for word, frequency in some_words.items():
    ...     print(word)
    ...     print(frequency)
    ...
    hello
    3
    python
    9
    bye
    1
    computer
    2

You’re now defining two variables in the for loop statement.

Writing Data To a Spreadsheet

You almost have all the tools you need to go through the dictionary containing words and word frequencies and write its contents into a spreadsheet. There’s only one thing missing. You’ve seen how to open and read a file. You can also open and write to a file:

    >>> file = open("test_file.txt", "w")
    >>> file.write("Good Morning.\nI'm writing this to a file. Hurray!")
    49
    >>> file.close()

You’ve opened the file in a similar way as before, with one difference. There are now two arguments in
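open(). The string "w" indicates that you’re opening the file for writing. A word of warning: if you open a file that already exists using the "w" mode, its existing contents will be overwritten. Earlier in this Chapter when you opened pride_and_prejudice.txt, you didn’t include a second argument because the default mode is "r", which opens a file for reading. If you locate your project folder on your computer, you’ll now be able to find a new file called test_file.txt.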
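If you look back at the string you used as an input argument in write(), you’ll see that it includes "\n", which represents a newline character. You may also have noticed that the write() method returned a number, 49, which is the number of characters written to the file. You can now return to the P&P project, and you’re ready to export the data to a spreadsheet. You’ll use the CSV file format, which is the most straightforward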
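spreadsheet format. CSV stands for comma-separated values. A CSV file is a standard text file in which the values on each line are separated by commas. The steps you’ll need to create a spreadsheet are:

1. Open a new file for writing
2. Write the header line
3. Loop through the dictionary and write each word and its frequency on a new line
4. Close the file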
You can now translate these steps into Python:

    # follows on from existing code written earlier in P&P project

    # Export words and frequencies to a CSV spreadsheet
    file = open("words in Pride and Prejudice.csv", "w")

    # Write header line
    file.write("Word,Frequency\n")

    # Loop through dictionary and write key-value pairs to csv
    for word, frequency in word_frequencies.items():
        file.write(f"{word},{frequency}\n")

    file.close()

Before the loop, you’re writing the top line of your spreadsheet, which is the header row. You’ll need to include the newline character "\n" at the end of the line. Within the for loop, you’re writing each word and its frequency, separated by a comma, on a line of its own.

You can now find a file called "words in Pride and Prejudice.csv" in your project folder, which you can open with spreadsheet software.

Here’s the full code for the P&P project. In the version below, the CSV file is opened using a with statement:

    import string

    ####
    # PART 1: read and clean data
    with open("pride_and_prejudice.txt") as file:
        text = file.read()

    # Clean data by removing capitalisation and punctuation marks
    text = text.lower()
    for punctuation_mark in string.punctuation:
        text = text.replace(punctuation_mark, " ")

    # Split string into a list of strings containing all the words
    words = text.split()

    ####
    # PART 2: find words and their frequencies
    word_frequencies = {}

    # Loop through list of words to populate dictionary
    for word in words:
        if word not in word_frequencies.keys():
            word_frequencies[word] = 1
        else:
            word_frequencies[word] = word_frequencies[word] + 1

    ####
    # PART 3: Export words and frequencies to a CSV spreadsheet
    with open("words in Pride and Prejudice.csv", "w") as file:
        # Write header line
        file.write("Word,Frequency\n")

        # Loop through dictionary and write key-value pairs to csv
        for key, value in word_frequencies.items():
            file.write(f"{key},{value}\n")

In the Snippets section at the end of this Chapter, you’ll see a modified version of this code that packages the various parts into functions. You’ll also add an extra section that creates a simple quiz which will present you with a random word from the book and you’ll need to guess how often it appears in the book.

Conclusion

Data is a key part of every computer program.
Programming languages like Python have a large range of data types and data structures to deal with all requirements. Learning the differences and similarities between different data types, how to convert between data types and how to manipulate data stored in variables is a key part of learning to code. In this Chapter, you’ve covered:
Your next stop on this journey will be full of errors and bugs! The next Chapter focuses on the differences between errors and bugs, how to deal with errors, and how to find and fix bugs.

Snippets

List Comprehensions and Other Comprehensions

In this Chapter and in the previous one, you have learned about a common and useful algorithm in programming. You need to populate a list or another data structure. You first create an empty list and then you use a for loop to populate it.

For example, let’s assume you have a list of names, and you’d like to create a new list containing the same names in uppercase letters:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = []
    for name in names:
        new_names.append(name.upper())

    print(new_names)

The list new_names now contains the uppercase version of each name.
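The steps you use in this algorithm are:

1. Create an empty list
2. Loop through the original list using a for loop
3. Append a new item to the new list in each iteration of the loop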
These three steps are so common in programming that Python has a shorter way of achieving the same result. You can use list comprehensions:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = [name.upper() for name in names]

    print(new_names)

You’ve now replaced the three lines in the original code, which create the empty list and then populate it using a for loop, with a single line.
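As a quick check, the two versions can be placed side by side. This is a minimal sketch: the function names uppercase_with_loop and uppercase_with_comprehension are made up for this illustration and don’t appear in the main text.

```python
def uppercase_with_loop(names):
    """Classic version: create an empty list, loop, and append."""
    new_names = []
    for name in names:
        new_names.append(name.upper())
    return new_names


def uppercase_with_comprehension(names):
    """Same result using a list comprehension."""
    return [name.upper() for name in names]


names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

loop_result = uppercase_with_loop(names)
comprehension_result = uppercase_with_comprehension(names)

print(loop_result)
print(loop_result == comprehension_result)  # True: both lists hold the same items
```

Neither version changes the original list. Both build a new one.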
Another way to look at what’s happening in this list comprehension is to translate this line into plain English. The translation would read like this: Create a list called new_names and put name.upper() in it for each name in names.

Efficiency of list comprehensions

List comprehensions make code more compact and quicker to write. However, there are also some efficiency advantages when using list comprehensions compared to the longer version. When you create an empty list, the computer program doesn’t know how large the list should be and allocates a certain amount of memory for this list. Remember that code is executed one line at a time, and the for loop adds one item at a time, so the list may need to grow as items are added.

With a list comprehension, as the whole process happens on a single line, the program can build the list more efficiently. You’ll read more about the difference in efficiency between these two versions of code in a later Chapter when you’ll also compare this to other ways of solving the same problem.

List comprehensions with conditional statements

Let’s assume that in your new list, you’d only like to have the uppercase versions of the names
that are four letters long. In the classic version of the algorithm, you can add an if statement inside the for loop:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = []
    for name in names:
        if len(name) == 4:
            new_names.append(name.upper())

    print(new_names)

Only when the length of name is equal to 4 is the uppercase version of the name appended to new_names.
You can achieve the same result with a list comprehension:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = [name.upper() for name in names if len(name) == 4]

    print(new_names)

This code gives the same output as the longer version before it. You’ve managed to squeeze four lines of code into one. The translation now reads: Create a list called new_names and put name.upper() in it for each name in names if the length of name is equal to 4.

Let’s make one final addition. If the name is not four letters long, then you’d like to place the string "xxxx" in the new list instead:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = []
    for name in names:
        if len(name) == 4:
            new_names.append(name.upper())
        else:
            new_names.append("xxxx")

    print(new_names)

This code gives the following result:

    ['JOHN', 'MARY', 'xxxx', 'xxxx', 'GABY']
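The if and else branches in this loop differ only in the value they append. As a small aside, Python can make that choice in a single expression, often called a conditional expression. The sketch below is an illustration only, and the variable name new_item is an assumption introduced here.

```python
names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

new_names = []
for name in names:
    # Conditional expression: pick one of two values on a single line
    new_item = name.upper() if len(name) == 4 else "xxxx"
    new_names.append(new_item)

print(new_names)  # ['JOHN', 'MARY', 'xxxx', 'xxxx', 'GABY']
```

Written this way, every name leads to exactly one call to append(), so the loop is back to the familiar pattern of creating an empty list, looping, and appending.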
List comprehensions can also be used to achieve this:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = [name.upper() if len(name) == 4 else "xxxx" for name in names]

    print(new_names)

The if and else keywords now come before the for keyword, not after it. If this were the main text, I’d refer you to a snippet to explain why this is the case, as it’s not key to understanding list comprehensions. However, you’re already reading a snippet! The short answer is that the if...else here is a conditional expression that chooses which value to place in the list, whereas an if that comes after the for filters which items are included at all.

Other comprehensions

List comprehensions are the most commonly used type of comprehension. However, you can use comprehensions with other data structures, too. If you want to create a dictionary where the keys are the names in the list of names, and the value of each key is the length of the name, you could write the following code:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    name_lengths = {}
    for name in names:
        name_lengths[name] = len(name)

    print(name_lengths)

The algorithm is similar to the earlier version with lists. You create an empty dictionary and then iterate through the original list using a for loop, adding a key-value pair in each iteration.
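You can achieve the same output using a dictionary comprehension:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    name_lengths = {name: len(name) for name in names}

    print(name_lengths)

The statement before the for keyword is now a key-value pair, name: len(name), and the comprehension is enclosed in curly brackets. Dictionary comprehensions can get a bit more complex if the keys and values come from different data structures.

You may be tempted to try the same thing with a tuple:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = (name.upper() for name in names)

    print(new_names)

The output from this code gives the following result:

    <generator object <genexpr> at 0x...>

This is not a tuple. The parentheses create a generator instead, and the memory address at the end will differ on your computer.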
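To see what the version with parentheses creates, you can inspect the object directly. This is a rough sketch that uses only built-in functions and the standard library’s types module. It is here only to confirm the type of the object, not to explain generators.

```python
import types

names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

new_names = (name.upper() for name in names)

# The parentheses create a generator, not a tuple
print(isinstance(new_names, tuple))  # False
print(isinstance(new_names, types.GeneratorType))  # True
print(type(new_names).__name__)  # generator
```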
You’ll read about generators later on in this book. Therefore, I won’t dwell on them at this stage. If you’d like to create a tuple using a comprehension, you can do so as follows:

    names = ["John", "Mary", "Ishan", "Sue", "Gaby"]

    new_names = tuple(name.upper() for name in names)

    print(new_names)

And this does indeed give a tuple:

    ('JOHN', 'MARY', 'ISHAN', 'SUE', 'GABY')
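Going back to dictionary comprehensions, the case where keys and values come from different data structures can be sketched with the built-in zip() function, which pairs up items from two sequences. The ages list below is invented purely for this illustration and is not part of the original example.

```python
names = ["John", "Mary", "Ishan", "Sue", "Gaby"]
ages = [32, 45, 27, 51, 38]  # hypothetical data, made up for this sketch

# zip() pairs each name with the age in the same position
name_ages = {name: age for name, age in zip(names, ages)}

print(name_ages)
# {'John': 32, 'Mary': 45, 'Ishan': 27, 'Sue': 51, 'Gaby': 38}
```

The part before the for keyword is still a key-value pair. The only change is that the loop now unpacks two items at a time.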
Comprehensions can look a bit weird initially. However, once you get used to them, you’ll see them as a convenient and useful way of writing neater and more efficient code. And they’ll save you a few lines of code, too!

This site was launched in May 2021. The main text of the book is now almost complete; a couple of chapters are still being finalised. Blog posts are also published regularly.

What are the 4 data types?

Data is broadly classified into four categories:

- Nominal data
- Ordinal data
- Discrete data
- Continuous data

What are data types in Python?

Python provides a range of built-in data types.

How many data types are there in Python?

In a programming language like Python, there are mainly 4 data types, including:

- String: a collection of Unicode characters (letters, numbers and symbols)
- Numerical: these data types store numerical values like integers, floating-point numbers and complex numbers

What are the 4 data types? Give examples of each type.

- Boolean (e.g., True or False)
- Character (e.g., a)
- Date (e.g., 03/01/2016)
- Double (e.g., 1.79769313486232E308)
- Floating-point number (e.g., 1.234)
- Integer (e.g., 1234)
- Long (e.g., 123456789)
- Short (e.g., 0)
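Not every generic type in the list above exists as a separate built-in type in Python. For instance, a single character is simply a string, and there are no separate built-in short or long integer types in Python 3. A minimal sketch using the built-in type() function shows how Python classifies a few sample values. The sample values are chosen here for illustration.

```python
values = [True, "a", 1.234, 1234, 123456789, ["a", "b"], {"word": 1}, ("a", "b")]

for value in values:
    # type() returns the class of the value; __name__ gives its short name
    print(repr(value), "->", type(value).__name__)

# Collect the type names using a list comprehension
type_names = [type(value).__name__ for value in values]
```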