Real-time synchronization of MongoDB data to Elasticsearch based on Node.js

Time: 2021-07-12

1. Preface

Our company needs Elasticsearch for full-text search, while MongoDB is used for persistent storage, and we want changes in MongoDB to be synchronized to Elasticsearch in real time. At first we were on Elasticsearch v1.7.2, where the MongoDB River plugin solved this problem, but as Elasticsearch was upgraded we found that the river mechanism had been abandoned. What a pity. Further searching turned up mongo-connector, a Python tool that is highly recommended on MongoDB's official website; however, we also need to synchronize the attachment information inside our documents to Elasticsearch, and mongo-connector does not handle attachment synchronization well. Was there a Node.js option for data synchronization? On GitHub I found node-elasticsearch-sync, which is easy to use but too simple to support complex data filtering and attachment synchronization. There is always a way out, though: taking node-elasticsearch-sync as a reference, I wrote my own synchronization tool, node-mongodb-es-connector.

2. Preparation

2.1 Installing MongoDB

To install MongoDB, download it from the official website:

https://www.mongodb.com/

PS: for how to build a MongoDB replica set, see:

https://www.cnblogs.com/ljhdo…
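
The connector reads MongoDB's oplog, which only exists on replica set members, so a replica set (even a single-node one) is required. As a reference, initializing a replica set in the mongo shell can look like this (the hosts and replica set name below are taken from the example configuration later in this article; adapt them to your own environment):

// Run once in the mongo shell against one of the members (hypothetical hosts,
// matching the example configuration used later in this article).
rs.initiate({
    _id: "my_replica",
    members: [
        { _id: 0, host: "localhost:29031" },
        { _id: 1, host: "localhost:29032" },
        { _id: 2, host: "localhost:29033" }
    ]
})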

2.2 Installing Elasticsearch

To install Elasticsearch, download it from the official website:

https://www.elastic.co/cn/dow…

PS: please search for how to install elasticsearch-head, Kibana, Logstash, and other related tools yourself.

2.3 Installing Node.js

To install Node.js, download it from the official website:

http://nodejs.cn/

PS: don't forget to install npm as well; please look up the details yourself.

These are the prerequisites for using node-mongodb-es-connector.

2.4 node-mongodb-es-connector download address

github: https://github.com/zhr8521007…

npm: https://www.npmjs.com/package…

3. File structure

├── crawlerDataConfig                 project synchronization configuration (add the configuration of the data you want to synchronize here)
│   ├── mycards.json                  the only file you need to add or modify yourself; it only provides an example and can be deleted if unused
│   └── ……
├── lib
│   ├── pool
│   │   ├── elasticsearchpool.js      Elasticsearch connection pool
│   │   └── mongodbpool.js            MongoDB connection pool
│   ├── promise
│   │   ├── elasticsearchpromise.js   Elasticsearch method class
│   │   └── mongopromise.js           MongoDB method class
│   ├── util
│   │   ├── fswatcher.js              configuration file watcher (monitors the configuration files in the crawlerDataConfig directory)
│   │   ├── logger.js                 log class
│   │   ├── oplogfactory.js           methods executed when MongoDB oplog events fire (insert, update, delete)
│   │   ├── tail.js                   monitors whether the MongoDB data has changed
│   │   └── util.js                   utility class
│   └── main.js                       main method (synchronizes the existing MongoDB data to Elasticsearch on first startup)
├── logs
│   ├── logger-2018-03-23.log         synchronization log
│   └── ……
├── test
│   ├── img
│   │   ├── elasticsearch.jpg         image, needs no explanation
│   │   ├── mongodb.jpg               image, needs no explanation
│   │   └── structure.jpg             image, needs no explanation
│   └── test.js                       test class (empty)
├── app.js                            startup file
├── index.js                          interface file (only adds, deletes, and modifies configuration files)
├── package-lock.json
├── package.json
├── README.md                         English document (markdown)
├── README.zh-cn.md                   Chinese document (markdown)
└── LICENSE
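
To make the roles of tail.js and oplogfactory.js more concrete, here is a minimal sketch of the underlying idea, written against the official mongodb Node.js driver (3.x); it is not the project's actual code. A tailable cursor is opened on local.oplog.rs, and every oplog entry for the watched collection is turned into an insert, update, or delete against Elasticsearch:

const { MongoClient, Timestamp } = require('mongodb');

// Minimal sketch (not the project's actual code; assumes the "mongodb" 3.x driver):
// tail the replica set oplog and react to changes on the watched collection.
async function tailOplog() {
    const client = await MongoClient.connect(
        'mongodb://UserAdmin:pass1234@localhost:29031,localhost:29032,localhost:29033' +
        '/?replicaSet=my_replica&authSource=admin'
    );

    const oplog = client.db('local').collection('oplog.rs');
    const cursor = oplog.find(
        {
            ns: 'myTest.carts',                                           // database.collection to watch
            ts: { $gt: new Timestamp(0, Math.floor(Date.now() / 1000)) }  // only entries newer than "now"
        },
        { tailable: true, awaitData: true }                               // keep the cursor open and wait for new data
    );

    while (await cursor.hasNext()) {
        const entry = await cursor.next();
        switch (entry.op) {
            case 'i': /* insert -> index the new document into Elasticsearch */ break;
            case 'u': /* update -> re-index the changed document             */ break;
            case 'd': /* delete -> remove the document from Elasticsearch    */ break;
        }
    }
}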

Example configuration file mycards.json (this file only provides an example):

{
    "mongodb": {
        "m_database": "myTest",
        "m_collectionname": "carts",
        "m_filterfilds": {
            "version" : "2.0"
        },
        "m_returnfilds": {
            "cName": 1,
            "cPrice": 1,
            "cImgSrc": 1
        },
        "m_connection": {
            "m_servers": [
                "localhost:29031",
                "localhost:29032",
                "localhost:29033"
            ],
            "m_authentication": {
                "username": "UserAdmin",
                "password": "pass1234",
                "authsource":"admin",
                "replicaset":"my_replica",
                "ssl":false
            }
        },
        "m_documentsinbatch": 5000,
        "m_delaytime": 1000
    },
    "elasticsearch": {
        "e_index": "mycarts",
        "e_type": "carts",
        "e_connection": {
            "e_server": "http://localhost:9200",
            "e_httpauth": {
                "username": "EsAdmin",
                "password": "pass1234"
            }
        },
        "e_pipeline": "mypipeline",
        "e_iscontainattachment": true
    }
}
    • m_database – the database in MongoDB to monitor
    • m_collectionname – the collection in MongoDB to monitor
    • m_filterfilds – query conditions in MongoDB; at present only simple query conditions are supported (the default value is null)
    • m_returnfilds – the fields that MongoDB should return (the default value is null)
    • m_connection

      • m_servers – the addresses of the MongoDB servers (replica set, array format)
      • m_authentication – if MongoDB requires login authentication, use the following configuration (the default value is null)

        - username – the username for the MongoDB connection
        - password – the password for the MongoDB connection
        - authsource – the MongoDB authentication database, defaults to admin
        - replicaset – the name of the MongoDB replica set
        - ssl – whether the MongoDB connection uses SSL (the default value is false)
    • m_documentsinbatch – the number of documents sent from MongoDB to Elasticsearch in one batch; you can set a larger value, the default is 1000
    • m_delaytime – the interval between batches sent to Elasticsearch (the default value is 1000 ms)
    • e_index – the index in Elasticsearch
    • e_type – the type in Elasticsearch, mainly used for the bulk API (see the sketch after this list)
    • e_connection

      • e_server – the Elasticsearch connection string
      • e_httpauth – if Elasticsearch requires login authentication, use the following configuration (the default value is null)

        - username – the username for the Elasticsearch connection
        - password – the password for the Elasticsearch connection
    • e_pipeline – the name of the pipeline in Elasticsearch
    • e_iscontainattachment – whether the pipeline contains attachment rules (the default value is false)
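
As a rough illustration of how the e_* settings are used, the sketch below pushes one batch of MongoDB documents into Elasticsearch with the bulk API. It assumes the legacy elasticsearch npm client (5.x+ API) and is not the project's actual code:

const elasticsearch = require('elasticsearch');

// Rough illustration (not the project's actual code): how e_server, e_httpauth,
// e_index, e_type and e_pipeline from the configuration above could be used
// with the legacy "elasticsearch" npm client.
const esClient = new elasticsearch.Client({
    host: 'http://localhost:9200',         // e_server
    httpAuth: 'EsAdmin:pass1234'           // e_httpauth -> "username:password"
});

// docs: one batch of documents read from MongoDB (at most m_documentsinbatch).
async function pushBatch(docs) {
    const body = [];
    docs.forEach((doc) => {
        body.push({ index: { _id: String(doc._id) } });  // action line: reuse the MongoDB _id
        const { _id, ...source } = doc;                  // document line: everything except _id
        body.push(source);
    });

    return esClient.bulk({
        index: 'mycarts',                  // e_index
        type: 'carts',                     // e_type
        pipeline: 'mypipeline',            // e_pipeline (optional)
        body: body
    });
}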

4. How to use

Users can create their own configuration files in the crawlerDataConfig directory in advance; the files must be in JSON format.

In the project root directory, open a command window and run:

    node app.js
    

After the project starts, modify a configuration file (such as mycards.json) and the data will be synchronized in real time; or modify a piece of data in MongoDB, and the change will also be synchronized to Elasticsearch in real time.
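
For example, updating a matching document in the mongo shell (hypothetical data, shown only to illustrate the trigger) will be picked up from the oplog and pushed to Elasticsearch:

    // Hypothetical update in the mongo shell; inserts and deletes on the
    // watched collection are synchronized in the same way.
    use myTest
    db.carts.updateOne(
        { cName: "iPhone X" },
        { $set: { cPrice: 6999 } }
    )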

PS: how do you synchronize MongoDB's main documents together with their attachment information to Elasticsearch?

By using an Elasticsearch pipeline.

First, create a pipeline in Elasticsearch:

    PUT _ingest/pipeline/mypipeline
    {
      "description" : "Extract attachment information from arrays",
      "processors" : [
        {
          "foreach": {
            "field": "attachments",
            "processor": {
              "attachment": {
                "target_field": "_ingest._value.attachment",
                "field": "_ingest._value.data"
              }
            }
          }
        }
      ]
    }
    

Then set the e_pipeline field in the configuration file (such as mycards.json):

    "e_pipeline": "mypipeline"
    

5. Results

Data in MongoDB:

[screenshot: data in MongoDB]

Data in Elasticsearch:

[screenshot: data in Elasticsearch]