Migrate Data From Atlas to Self-Hosted MongoDB

In this blog post, we will discuss how to migrate data from MongoDB Atlas to self-hosted MongoDB. There are a few third-party tools on the market for migrating data from Atlas to Percona Server for MongoDB (PSMDB), such as MongoPush, Hummingbird, and MongoShake. Today, we are going to discuss how to use MongoShake to migrate and sync data from Atlas to PSMDB.

NOTE: These tools are not officially supported by Percona.

MongoShake is a powerful tool that facilitates the migration of data from one MongoDB cluster to another. Below are step-by-step instructions on how to install MongoShake and use it to migrate data from Atlas to PSMDB. So, let’s get started!

Prerequisites:

A MongoDB Atlas account. I created a test account (replica set) and loaded sample data with one click in Atlas:

  1. Create an account in Atlas.
  2. Create a cluster.
  3. Once a cluster is created, go to browse collections.
  4. It will prompt you to load sample data. Once you click on it, you will see the sample data like below.
    Atlas atlas-mhnnqy-shard-0 [primary] test> show dbs
    sample_airbnb        52.69 MiB
    sample_analytics      9.44 MiB
    sample_geospatial     1.23 MiB
    sample_guides        40.00 KiB
    sample_mflix        109.43 MiB
    sample_restaurants    6.42 MiB
    sample_supplies       1.05 MiB
    sample_training      46.77 MiB
    sample_weatherdata    2.59 MiB
    admin               336.00 KiB
    local                20.35 GiB
    Atlas atlas-mhnnqy-shard-0 [primary] test>

An EC2 instance with PSMDB installed. I installed PSMDB on the EC2 machine:

rs0 [direct: primary] test>

rs0 [direct: primary] test> show dbs
admin   40.00 KiB
config  12.00 KiB
local   40.00 KiB
rs0 [direct: primary] test>

Make sure Atlas and PSMDB are both on the same DB version (I have also used this tool with MongoDB 4.2, which is already EOL).

PSMDB version:

rs0 [direct: primary] test> db.version()
6.0.9-7
rs0 [direct: primary] test>

MongoDB Atlas version:

Atlas atlas-mhnnqy-shard-0 [primary] test> db.version()
6.0.10
Atlas atlas-mhnnqy-shard-0 [primary] test>

To install MongoShake, follow these steps:

Step 1: Install Go
Ensure that Go is installed on your system. If not, download it from the official website and follow the installation instructions. I used Amazon Linux 2, so I used the command below to install Go:

sudo yum install golang -y

Step 2: Install MongoShake
Open the terminal and run the following command to install MongoShake:

git clone https://github.com/alibaba/MongoShake.git
  1. The clone will create a folder with the name MongoShake.
  2. cd MongoShake.
  3. Run the ./build.sh script.

Once you have installed MongoShake, you need to configure it for the migration process. Here’s how:

  1. The configuration file (collector.conf) is under the conf dir in the MongoShake dir.
  2. In the config file, you can edit the source URI (for either a replica set or a sharded cluster), as well as the tunnel (the method by which you are migrating the data). If you are migrating directly, the value will be direct. You can also edit the log file path and log file name. Below are some important parameters:
    mongo_urls = mongodb+srv://gautam:****@cluster0.teeeayh.mongodb.net/  // Atlas conn string
    tunnel.address = mongodb://127.0.0.1:27017 // PSMDB conn string
    sync_mode = all 				        // default incr
    log.dir = /home/percona/MongoShake/log/    // default /root/mongoshake/

    sync_mode options: all/full/incr.

  • all means full synchronization + incremental synchronization (copy the data, then apply the oplog after the initial sync completes).
  • full means full synchronization only (only copy the data).
  • incr means incremental synchronization only (only apply the oplog).

There are other parameters in the configuration file as well, which you can tune as per your needs. For example, if you want to read data from a Secondary node so that you do not overwhelm the Primary with reads, you can set the parameter below:

mongo_connect_mode = secondaryPreferred
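Putting the settings above together, a minimal collector.conf for this migration might look like the following sketch (the connection strings and log path are placeholders; any parameter not shown stays at its default):

```ini
# Source: Atlas connection string (SRV record) -- placeholder credentials
mongo_urls = mongodb+srv://user:****@cluster0.example.mongodb.net/
# Read from a secondary so the primary is not overwhelmed
mongo_connect_mode = secondaryPreferred
# Write directly into the target PSMDB instance
tunnel = direct
tunnel.address = mongodb://127.0.0.1:27017
# Full copy first, then tail the oplog (all = full + incr)
sync_mode = all
log.dir = /home/percona/MongoShake/log/
```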

Step 3: Once you are done with the configuration, run MongoShake in a screen session, as shown below:

./bin/collector.linux -conf=conf/collector.conf -verbose 0

Step 4: Monitor the log file in the log directory to check the progress of migration.

Below is the sample log when you start MongoShake:

[2023/09/25 21:09:13 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
[2023/09/25 21:09:13 UTC] [INFO] Close client with mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/
[2023/09/25 21:09:13 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
[2023/09/25 21:09:19 UTC] [INFO] Close client with mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/
[2023/09/25 21:09:19 UTC] [INFO] GetAllTimestamp biggestNew:{1695675385 26}, smallestNew:{1695675385 26}, biggestOld:{1695668185 9}, smallestOld:{1695668185 9}, MongoSource:[url[mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/], name[atlas-mhnnqy-shard-0]], tsMap:map[atlas-mhnnqy-shard-0:{7282839399442677769 7282870323207208986}]
[2023/09/25 21:09:19 UTC] [INFO] all node timestamp map: map[atlas-mhnnqy-shard-0:{7282839399442677769 7282870323207208986}] CheckpointStartPosition:{1 0}
[2023/09/25 21:09:19 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
[2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 Regenerate checkpoint but won't persist. content: {"name":"atlas-mhnnqy-shard-0","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
[2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 checkpoint using mongod/replica_set: {"name":"atlas-mhnnqy-shard-0","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}, ckptRemote set? [false]
[2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 syncModeAll[true] ts.Oldest[7282839399442677769], confTsMongoTs[4294967296]
[2023/09/25 21:09:19 UTC] [INFO] start running with mode[all], fullBeginTs[7282870323207208986[1695675385, 26]]

You will see the below log once the full sync is completed and incr starts (incr means it will start syncing live data via the oplog):

[2023/09/25 22:12:04 UTC] [INFO] GetAllTimestamp biggestNew:{1695679924 3}, smallestNew:{1695679924 3}, biggestOld:{1695677613 1}, smallestOld:{1695677613 1}, MongoSource:[url[mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/], name[atlas-mhnnqy-shard-0]], tsMap:map[atlas-mhnnqy-shard-0::{7282879892394344449 7282889818063765507}]
[2023/09/25 22:12:04 UTC] [INFO] ------------------------full sync done!------------------------
[2023/09/25 22:12:04 UTC] [INFO] oldestTs[7282879892394344449[1695677613, 1]] fullBeginTs[7282889689214746625[1695679894, 1]] fullFinishTs[7282889818063765507[1695679924, 3]]
[2023/09/25 22:12:04 UTC] [INFO] finish full sync, start incr sync with timestamp: fullBeginTs[7282889689214746625[1695679894, 1]], fullFinishTs[7282889818063765507[1695679924, 3]]
[2023/09/25 22:12:04 UTC] [INFO] start incr replication

You will see the logs like this when both nodes are in sync (when lag is 0, i.e., tps=0):

[2023/09/25 22:14:41 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
[2023/09/25 22:14:46 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
[2023/09/25 22:14:51 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=25, filter=25, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
[2023/09/25 22:14:56 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=25, filter=25, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
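Rather than eyeballing the log, you can watch for the in-sync condition programmatically by parsing the incr-stage lines for the tps field. This is a small sketch (the regular expression simply matches the tps=N token shown in the log samples above):

```python
import re

# Matches the "tps=<number>" token in MongoShake incr-stage log lines
TPS_RE = re.compile(r"\btps=(\d+)\b")

def is_in_sync(log_line: str) -> bool:
    """Return True if an incr-stage log line reports tps=0 (lag is 0)."""
    m = TPS_RE.search(log_line)
    return m is not None and int(m.group(1)) == 0

line = ("[2023/09/25 22:14:41 UTC] [INFO] [name=atlas-mhnnqy-shard-0, "
        "stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0]")
print(is_in_sync(line))  # True
```

You could run this over `tail -f` output of the MongoShake log and trigger the cutover checks once tps stays at 0.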

Once the full data replication process is complete and both clusters are in sync, you can stop pointing the application to Atlas. Check the logs of MongoShake, and when the lag is 0, as we can see in the above logs, stop the replication/sync from Atlas or stop MongoShake. Verify that the data has been successfully migrated to PSMDB. You can use MongoDB shell or any other client to connect to the PSMDB instance to verify this.
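One way to script this verification is to pull per-collection document counts from both sides and diff them. The comparison itself is a pure function; the part that gathers counts from a live server is sketched in comments with pymongo and placeholder connection strings:

```python
def diff_counts(source: dict, target: dict) -> dict:
    """Return namespaces whose document counts differ (or exist on only one side)."""
    return {ns: (source.get(ns), target.get(ns))
            for ns in set(source) | set(target)
            if source.get(ns) != target.get(ns)}

# Gathering the counts from a live server would look like this with pymongo
# (placeholder connection strings; system DBs and MongoShake's checkpoint DB
# are skipped because they are not part of the migrated data):
#
#   from pymongoimport MongoClient  # noqa -- illustration only
#   SKIP = {"admin", "local", "config", "mongoshake"}
#   def collection_counts(uri):
#       client = MongoClient(uri)
#       return {f"{db}.{coll}": client[db][coll].count_documents({})
#               for db in client.list_database_names() if db not in SKIP
#               for coll in client[db].list_collection_names()}
#
#   mismatches = diff_counts(
#       collection_counts("mongodb+srv://user:****@cluster0.example.mongodb.net/"),
#       collection_counts("mongodb://127.0.0.1:27017"))

atlas_counts = {"sample_guides.planets": 8, "sample_supplies.sales": 5000}
psmdb_counts = {"sample_guides.planets": 8, "sample_supplies.sales": 5000}
print(diff_counts(atlas_counts, psmdb_counts))  # {} -> counts match
```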

MongoDB Atlas databases and their collection counts:

Database: sample_airbnb
-----
Collection 'listingsAndReviews' documents: 5555

Database: sample_analytics
-----
Collection 'transactions' documents: 1746
Collection 'accounts' documents: 1746
Collection 'customers' documents: 500

Database: sample_geospatial
-----
Collection 'shipwrecks' documents: 11095

Database: sample_guides
-----
Collection 'planets' documents: 8

Database: sample_mflix
-----
Collection 'embedded_movies' documents: 3483
Collection 'users' documents: 185
Collection 'theaters' documents: 1564
Collection 'movies' documents: 21349
Collection 'comments' documents: 41079
Collection 'sessions' documents: 1

Database: sample_restaurants
-----
Collection 'neighborhoods' documents: 195
Collection 'restaurants' documents: 25359

Database: sample_supplies
-----
Collection 'sales' documents: 5000

Database: sample_training
-----
Collection 'posts' documents: 500
Collection 'trips' documents: 10000
Collection 'grades' documents: 100000
Collection 'routes' documents: 66985
Collection 'inspections' documents: 80047
Collection 'companies' documents: 9500
Collection 'zips' documents: 29470

Database: sample_weatherdata
-----
Collection 'data' documents: 10000


Atlas atlas-mhnnqy-shard-0 [primary] sample_weatherdata>


PSMDB databases and their collection counts:

rs0 [direct: primary] test> show dbs
admin                80.00 KiB
config              240.00 KiB
local               468.00 KiB
mongoshake           56.00 KiB
sample_airbnb        52.20 MiB
sample_analytics      9.21 MiB
sample_geospatial   984.00 KiB
sample_guides        40.00 KiB
sample_mflix        108.17 MiB
sample_restaurants    5.57 MiB
sample_supplies     980.00 KiB
sample_training      40.50 MiB
sample_weatherdata    2.39 MiB
rs0 [direct: primary] test>
Database: sample_airbnb
-----
Collection 'listingsAndReviews' documents: 5555

Database: sample_analytics
-----
Collection 'transactions' documents: 1746
Collection 'accounts' documents: 1746
Collection 'customers' documents: 500

Database: sample_geospatial
-----
Collection 'shipwrecks' documents: 11095

Database: sample_guides
-----
Collection 'planets' documents: 8

Database: sample_mflix
-----
Collection 'embedded_movies' documents: 3483
Collection 'users' documents: 185
Collection 'theaters' documents: 1564
Collection 'movies' documents: 21349
Collection 'comments' documents: 41079
Collection 'sessions' documents: 1

Database: sample_restaurants
-----
Collection 'neighborhoods' documents: 195
Collection 'restaurants' documents: 25359

Database: sample_supplies
-----
Collection 'sales' documents: 5000

Database: sample_training
-----
Collection 'posts' documents: 500
Collection 'trips' documents: 10000
Collection 'grades' documents: 100000
Collection 'routes' documents: 66985
Collection 'inspections' documents: 80047
Collection 'companies' documents: 9500
Collection 'zips' documents: 29470

Database: sample_weatherdata
-----
Collection 'data' documents: 10000


rs0 [direct: primary] sample_weatherdata>

Above, you can see we have verified the data in PSMDB. Now, update the application's connection string to point to PSMDB.

NOTE: Sometimes, during the migration process, it is possible for some indexes to fail to replicate. So, during the data verification process, please verify the indexes, and if an index is missing, create it on the target before the cutover.
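The index check can be scripted the same way: compare the per-collection index lists from both sides (the shape below follows what getIndexes() returns) and flag anything missing on the target. A minimal sketch of the pure comparison:

```python
def missing_indexes(source_indexes: list, target_indexes: list) -> list:
    """Return index names present on the source collection but absent on the target.

    Each index is a dict like the documents returned by getIndexes(),
    e.g. {"name": "city_1", "key": {"city": 1}}.
    """
    target_names = {idx["name"] for idx in target_indexes}
    return [idx["name"] for idx in source_indexes
            if idx["name"] not in target_names]

src = [{"name": "_id_", "key": {"_id": 1}},
       {"name": "city_1", "key": {"city": 1}}]
tgt = [{"name": "_id_", "key": {"_id": 1}}]
print(missing_indexes(src, tgt))  # ['city_1'] -> recreate this index before cutover
```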

Conclusion

MongoShake simplifies the process of migrating MongoDB data from Atlas to self-hosted MongoDB. By following the steps outlined in this blog, you can install, configure, and use MongoShake to migrate your data from MongoDB Atlas. Percona experts can assist you with the migration as well.

To learn more about the enterprise-grade features available in the license-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered? 

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

Download Percona Distribution for MongoDB Today!
