How to use MongoDB to Elasticsearch sync
Status: Deprecated
This tutorial is written for an outdated version of Transporter. You can read the Ubuntu 16.04 version of this tutorial instead, which uses a more recent version of Transporter.
Introduction
Elasticsearch facilitates full-text search of your data, while MongoDB excels at storing it. Using MongoDB to store your data and Elasticsearch for search is a common architecture.
You will often need to migrate data from MongoDB to Elasticsearch in bulk. Writing your own program for this, although a good exercise, can be tedious. Transporter, an open-source utility developed by Compose (a cloud platform for databases), takes care of this task very efficiently.
This tutorial shows you how to use Transporter to quickly copy data from MongoDB to Elasticsearch with custom transformations.

Goals
In this article, we are going to cover how to copy data from MongoDB to Elasticsearch on Ubuntu 14.04, using the Transporter utility.
We’ll start with a quick overview showing you how to install MongoDB and Elasticsearch, although we won’t go into detail about data modeling in the two systems. Feel free to skim through the installation steps if you have already installed both of them. Then we’ll move on to Transporter.
The instructions are similar for other versions of Ubuntu, as well as other Linux distributions.

Prerequisites
Please complete the following prerequisites.
Step 1 — Installing MongoDB
Import the MongoDB repository’s public key.
Create a list file for MongoDB.
Reload the local package database.
Install the MongoDB packages:
Notice that each package contains the associated version number. Once the installation completes you can start, stop, and check the status of the service. It will start automatically after installation. Try to connect to the MongoDB instance running as a service:
If it is up and running, you will see something like this:
This means the database server is running! You can exit now:
Step 2 — Installing Java
Java is a prerequisite for Elasticsearch. Let’s install it now. First, add the repository:
Update your package lists again:
Install Java:
Accept the license when prompted.

Step 3 — Installing Elasticsearch
Now we’ll install Elasticsearch. First, create a new directory where you will install the search software, and move into it.
Visit Elasticsearch’s download page to find the latest version, then download it. At the time of writing this article, the latest version was 1.5.0.
Install unzip:
Unzip the archive:
Navigate to the directory where you extracted it:
Launch Elasticsearch by issuing the following command:
It will take a few seconds for Elasticsearch to start up. You’ll see some startup logs as it does. Elasticsearch will now be running in the terminal window.
Keep this terminal open. Make another SSH connection to your server in another terminal window and check if your instance is up and running:
9200 is the default port for Elasticsearch. If everything goes well, you will see output similar to that shown below:
Step 4 — Installing Mercurial
Next we’ll install the revision control tool Mercurial.
Verify that Mercurial is installed correctly:
You will get the following output if it is installed correctly:
Step 5 — Installing Go
Transporter is written in the Go language, so you need to install Go.
For Go to work properly, you need to set the following environment variables:
Create a folder for Go from your
Update your path:
Log out of your current SSH session and log in again. You can close just the session where you’ve been working and keep the Elasticsearch session running. This step is crucial for your environment variables to get updated. Log in again, and verify that your variable has been added:
This should display the new path for Go. In our case, it will be:
If it does not display the path correctly, please double-check the steps in this section. Once our Create a file named
Now copy this brief “Hello, world” program below to the newly opened file. The entire point of this file is to help us verify that Go is working.
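A minimal program of this kind might look like the following. It is a sketch that assumes nothing beyond Go's standard `fmt` package; the constant name `greeting` is our own choice for illustration, and any file name you save it under is up to you.

```go
package main

import "fmt"

// greeting is the text the program prints; the constant name is our own choice.
const greeting = "Hello, world"

func main() {
	// Print the greeting followed by a newline.
	fmt.Println(greeting)
}
```

If Go is set up correctly, running the file prints the greeting and exits without errors.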
Once done, save the file and exit the editor. Then, from your home directory, run the file with Go:
You should see this output:
If you see the “Hello, world” message, then Go is installed correctly. Now go to the
Step 6 — Installing Git
We’ll use Git to install Transporter. Install Git with the following command:
Step 7 — Installing Transporter
Now create and move into a new directory for Transporter. Since the utility was developed by Compose, we’ll call the directory
This is where Clone the Transporter GitHub repository:
Move into the new directory:
Take ownership of the
Make sure
Run the
This step might take a while, so be patient. Once it’s done you can build Transporter.
If all goes well, it will complete without any errors or warnings. Check that Transporter is installed correctly by running this command:
You should see output like this:
So the installation is complete. Now, we need some test data in MongoDB that we want to sync to Elasticsearch.

Troubleshooting:
If you get the following error:
This means that your
Try logging out and logging in again. If the error still persists, use the following command instead:
Step 8 — Creating Sample Data
Now that we have everything installed, we can proceed to the data syncing part. Connect to MongoDB:
You should now see the MongoDB prompt,
Insert some sample documents into a collection named
Select the contents you just entered:
This should display the results shown below (
Now you can exit from the database:
A bit of terminology:
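To make the terminology concrete, here is a small Go sketch of how a MongoDB database and collection line up with an Elasticsearch index and type, using the `foo` database and `bar` collection from this tutorial. The `database.collection` string form, the `target` struct, and the `splitNamespace` helper are all assumptions made for illustration; they are not taken from Transporter's code.

```go
package main

import (
	"fmt"
	"strings"
)

// target describes where a document lands in Elasticsearch.
type target struct {
	Index string // corresponds to the MongoDB database
	Type  string // corresponds to the MongoDB collection
}

// splitNamespace is a hypothetical helper: it splits a "database.collection"
// string and maps the two halves to an Elasticsearch index and type.
func splitNamespace(ns string) (target, error) {
	parts := strings.SplitN(ns, ".", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return target{}, fmt.Errorf("namespace %q is not of the form database.collection", ns)
	}
	return target{Index: parts[0], Type: parts[1]}, nil
}

func main() {
	dest, err := splitNamespace("foo.bar")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("index=%s type=%s\n", dest.Index, dest.Type)
}
```

The point of the sketch is only the correspondence itself: the database name becomes the index, and the collection name becomes the type.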
Our ultimate goal is to sync the data from the bar collection of the foo database in MongoDB to the bar type of the foo index in Elasticsearch.

Step 9 — Configuring Transporter
Now we can move on to the configuration changes needed to migrate our data from MongoDB to Elasticsearch. Transporter requires a config file (
Move to the
Config File
You can take a look at the example
The new file is similar but updates some of the URIs and a few of the other settings to match what’s on our server. Let’s copy the contents from here and paste into the new
Copy the contents below into the file. Once done, save the file as described earlier.
Notice the
Similarly, the URI for Elasticsearch will look like:
Save the file.

Application File
Now, open the
Replace the sample contents of the file with the contents shown below:
Save the file and exit. Here’s a brief explanation of our pipeline.
Options include:
Transformation File
Now, the last piece of the puzzle is the transformation. If you recall, we stored two records in MongoDB with
Let’s say we want the documents being stored in Elasticsearch to have another field called
Paste the contents below into the file. Save and exit as described earlier.
The first line is necessary to tackle the way Transporter handles MongoDB’s
This is a simple transformation for the example, but with a little JavaScript you can do more complex data manipulation as you prepare your data for searching.

Step 10 — Executing the Transformation
Now that we are done with the setup, it’s time to sync and transform our data. Make sure Elasticsearch is running! If it isn’t, start it again in a new terminal window:
In your original terminal, make sure you are in the
Execute the following command to copy the data:
The Check Elasticsearch to verify that the data got copied, with our transformation:
You will get a result like this:
Notice the field

Conclusion
Now we know how to use Transporter to copy data from MongoDB to Elasticsearch, and how to apply transformations to our data while syncing. You can apply much more complex transformations in the same way, and you can chain multiple transformations in the pipeline. If you are doing multiple transformations, it’s good practice to keep them in separate files and chain them. That way, each transformation stays usable on its own in the future.
So, that’s pretty much it. You can check out the Transporter project on GitHub to stay updated on the latest changes in the API. You might also want to check out this tutorial about basic CRUD operations in Elasticsearch.
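The idea of keeping transformations small and chaining them can be sketched in Go as follows. This is an illustration of the pattern only: Transporter's own transformation files are JavaScript, and every name here (`document`, `transform`, `addFullName`, `dropField`, `chain`, and the field names `firstName`, `lastName`, `fullName`, and `internalNote`) is a hypothetical choice for the example, not part of Transporter.

```go
package main

import "fmt"

// document is a loosely typed record, similar to a JSON object.
type document map[string]interface{}

// transform takes a document and returns the (possibly modified) document.
type transform func(document) document

// addFullName is a hypothetical transformation: it joins two name fields
// into a new one. The field names are assumptions chosen for illustration.
func addFullName(doc document) document {
	first, _ := doc["firstName"].(string)
	last, _ := doc["lastName"].(string)
	doc["fullName"] = first + " " + last
	return doc
}

// dropField returns a transformation that removes the named field.
func dropField(name string) transform {
	return func(doc document) document {
		delete(doc, name)
		return doc
	}
}

// chain composes transformations so that they run from left to right.
func chain(steps ...transform) transform {
	return func(doc document) document {
		for _, step := range steps {
			doc = step(doc)
		}
		return doc
	}
}

func main() {
	// Each step is usable on its own; chain only decides the order.
	pipeline := chain(addFullName, dropField("internalNote"))
	out := pipeline(document{
		"firstName":    "Ada",
		"lastName":     "Lovelace",
		"internalNote": "draft",
	})
	fmt.Println(out["fullName"])
}
```

Because each step has the same signature, any one of them can be reused in another pipeline or tested in isolation, which is the benefit the advice above is pointing at.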