Learn “NoSQL” in MySQL
MySQL supports a native JSON data type that supports automatic validation and optimized storage and access of the JSON documents. Although JSON data should preferably be stored in a NoSQL database such as MongoDB, you may still encounter tables with JSON data from time to time. In the first section of this post, we will introduce how to extract data from a JSON field in MySQL with simple statements. And in the second section, we will introduce how to aggregate data from MySQL tables into JSON arrays or objects which can then be used conveniently in your applications.
The system to be set up is similar to the one introduced in the previous article regarding how to execute SQL queries in Python. If you have set up the system according to the instructions in that article, you can proceed to the next section. If not, you can follow the simplified instructions below to set up your system. For a detailed explanation of the commands and options, please refer to the previous article.
Essentially, we will start a local MySQL server in a Docker container:
You can execute SQL queries in the console started above directly. Alternatively, if you prefer to use a graphical interface, you can install and use DBeaver, which is a great graphical database manager for all kinds of databases. It’s really worth a shot if you have been struggling with MySQL workbench. For more details about how to install and set up DBeaver, this article has a short but helpful summary.
Let’s first explore the common MySQL functions and operators that can be used to extract data from a JSON field.
There are two main types of JSON values in MySQL:
- JSON array — A list of values separated by commas and enclosed within square brackets [[]].
- JSON object — A dictionary/hashmap/object [the name is different in different programming languages] with a set of key-value pairs separated by commas and enclosed within curly brackets [{}].
The JSON arrays and objects can be nested within each other, as we will see later.
We can use the JSON_EXTRACT
function to extract data
from a JSON field. The basic syntax is:
JSON_EXTRACT[json_doc, path]
For a JSON array, the path is specified with $[index]
, where the index starts from 0:
mysql> SELECT JSON_EXTRACT['[10, 20, 30, 40]', '$[0]'];
+------------------------------------------+
| JSON_EXTRACT['[10, 20, 30, 40]', '$[0]'] |
+------------------------------------------+
| 10 |
+------------------------------------------+
For a JSON object, the path is specified with $.key
, where the key
is a key of the object.
mysql> SELECT JSON_EXTRACT['{"name": "John", "age": 30}', '$.name'];
+-------------------------------------------------------+
| JSON_EXTRACT['{"name": "John", "age": 30}', '$.name'] |
+-------------------------------------------------------+
| "John" |
+-------------------------------------------------------+
If there are only two arguments for JSON_EXTRACT
as used above, we can use the ->
operator which is an alias for JSON_EXTRACT
. To demonstrate the usage of this operator, we need to have a table with a JSON
field. Please copy the following SQL queries and execute them in the MySQL console or DBeaver:
Especially, MySQL handles strings used in JSON context using the utf8mb4
character set and utf8mb4_bin
collation. A character set is a set of symbols and encodings, and a collation is a set of rules for comparing characters in a character set. It’s better
to create the table with a JSON field using the corresponding character set and collation.
Because utf8mb4_bin
is a binary collation, the keys are case-sensitive and we need to specify them with the correct case:
Now we can use the ->
operator to extract data from the JSON field:
As we see, ->
is just a shortcut or alias for JSON_EXTRACT
.
Interestingly, the quotes are still there for test_name
and test_id
. This is not what we want. We want the quotes to be removed, similar to the name
field.
To remove the quotes of the extracted value, we need to use the JSON_UNQUOTE
function. Since JSON_UNQUOTE[JSON_EXTRACT[…]]
is so commonly used, there is a shortcut operator for this combination as well, namely ->>
. Let's see it in practice:
It proves that ->>
and JSON_UNQUOTE[JSON_EXTRACT[...]]
have the same results. Since ->>
is much less to type, it’s preferred in most
cases.
However, if you want to extract data from a nested JSON array or JSON object, you cannot use chained ->
or ->>
. You can only use ->
and ->>
for the top level and need to use JSON_EXTRACT
for nested levels. Let’s extract the scores for each student:
Cheers! It works as expected.
Key takeaway for extracting data from a JSON field in MySQL:
- Use
$.key
to extract the value of a key from a JSON object. - Use
$[index]
to extract the value of an element from a JSON array. - Use
->
as a shortcut forJSON_EXTRACT
if the value is not a string. - Use
->>
as a shortcut forJSON_UNQUOTE[JSON_EXTRACT[...]]
if the value is a string and you want to remove the quotes for the extracted string. - If you want to extract data from a nested JSON array or JSON object, you cannot use chained
->
or->>
. You can only use->
and->>
for the top level and need to useJSON_EXTRACT
for nested levels.
There are quite a bunch of other functions for working with JSON data in MySQL. However, if you need to use these functions to validate/search your JSON field or perform CRUD operations on it, you should seriously consider using MongoDB to store the JSON field. MongoDB is much more professional and convenient in dealing with non-structured data [documents].
Above we have introduced how to extract values from a JSON field in MySQL. Now we will learn the opposite and explore how to select JSON data from MySQL tables. To continue with this section, we need some dummy data to play with. Please copy the following SQL queries and run them in MySQL console or DBeaver:
For this table, the default character and collate are used. With these two queries, we create a table that stores the extracted data from the first section. This is a common task for data piping and analysis, namely, performing some data analysis after data cleaning. In practice, you would want to store the scores in a separate table so the tables are more normalized. However, here the data is put in the same table for simplicity of demonstration.
We can now aggregate data into a JSON array with the JSON_ARRARYAGG
function:
We can also aggregate the data into a JSON object using the JSON_OBJECTAGG
function:
The aggregated data can then be used in your application directly. JSON_ARRARYAGG
and JSON_OBJECTAGG
can save you the effort of aggregating the data in your application and can be handy sometimes. For example, you can then use the json.loads[]
method to convert the JSON string to an array or dictionary in Python.
If you need to execute the plain SQL queries of JSON_ARRARYAGG
and
JSON_OBJECTAGG
in Python, you can use the SQLAlchemy package as demonstrated in this post.
In this article, we have introduced how to work with JSON data in MySQL. In the first section, the functions and operators that are used to extract data from a JSON field are discussed with simple examples. And in the second section, we did the opposite and aggregated normalized data into JSON arrays or objects which can then be used directly in your program. Normally we should avoid storing non-structured data [documents] in MySQL. However, if it cannot be avoided, the knowledge in this article should be helpful for your work.