ES/DB compare script #459

Merged
merged 15 commits into from
Feb 7, 2020
3 changes: 3 additions & 0 deletions .gitignore
@@ -47,3 +47,6 @@ jspm_packages
!.elasticbeanstalk/*.global.yml
.DS_Store
.idea

# Report that may be generated by the `scripts/es-db-compare` script
report.html
7 changes: 5 additions & 2 deletions package.json
@@ -22,7 +22,8 @@
"test": "NODE_ENV=test npm run lint && NODE_ENV=test npm run sync:es && NODE_ENV=test npm run sync:db && NODE_ENV=test ./node_modules/.bin/istanbul cover ./node_modules/mocha/bin/_mocha -- --timeout 10000 --require babel-core/register $(find src -path '*spec.js*') --exit",
"test:watch": "NODE_ENV=test ./node_modules/.bin/mocha -w --require babel-core/register $(find src -path '*spec.js*')",
"seed": "babel-node src/tests/seed.js --presets es2015",
"demo-data": "babel-node local/seed"
"demo-data": "babel-node local/seed",
"es-db-compare": "babel-node scripts/es-db-compare"
},
"repository": {
"type": "git",
@@ -53,8 +54,11 @@
"express-request-id": "^1.1.0",
"express-sanitizer": "^1.0.2",
"express-validation": "^0.6.0",
"handlebars": "^4.5.3",
"http-aws-es": "^4.0.0",
"joi": "^8.0.5",
"jsondiffpatch": "^0.4.1",
"jsonpath": "^1.0.2",
"jsonwebtoken": "^8.3.0",
"lodash": "^4.17.11",
"memwatch-next": "^0.3.0",
@@ -64,7 +68,6 @@
"pg": "^7.11.0",
"pg-native": "^3.0.0",
"sequelize": "^5.8.7",
"jsonpath": "^1.0.2",
"swagger-ui-express": "^4.0.6",
"tc-core-library-js": "appirio-tech/tc-core-library-js.git#v2.6.3",
"traverse": "^0.6.6",
57 changes: 57 additions & 0 deletions scripts/es-db-compare/README.md
@@ -0,0 +1,57 @@
# Script to find mismatches between data in DB and ES

We keep all the data in two places: in the DB (database) and in ES (Elasticsearch index). Every change made to the data in the DB is also reflected in Elasticsearch. Under some circumstances, however, the data in ES and the DB can become inconsistent.

This script finds the inconsistencies between the data in ES and the DB and creates a report.
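
The compare code in this change tags each finding as `dbOnly`, `esOnly`, or `mismatch`. The sketch below illustrates those three categories only. `classifyRecords` is a hypothetical helper, not part of the script; it assumes both sides are arrays of plain objects that share a key field, and it uses Node's `util.isDeepStrictEqual` instead of the script's own `diffData` utility.

```js
const { isDeepStrictEqual } = require('util');

// Hypothetical helper for illustration only; not part of scripts/es-db-compare.
// Classifies records by a shared key into dbOnly, esOnly and mismatch.
function classifyRecords(dbRecords, esRecords, hashKey) {
  const esByKey = new Map(esRecords.map(record => [record[hashKey], record]));
  const dbKeys = new Set(dbRecords.map(record => record[hashKey]));
  const result = { dbOnly: [], esOnly: [], mismatch: [] };

  for (const dbRecord of dbRecords) {
    const key = dbRecord[hashKey];
    if (!esByKey.has(key)) {
      // Present in the DB but missing from ES.
      result.dbOnly.push(key);
    } else if (!isDeepStrictEqual(dbRecord, esByKey.get(key))) {
      // Present on both sides, but the contents differ.
      result.mismatch.push(key);
    }
  }

  for (const esRecord of esRecords) {
    if (!dbKeys.has(esRecord[hashKey])) {
      // Present in ES but missing from the DB.
      result.esOnly.push(esRecord[hashKey]);
    }
  }

  return result;
}
```

Unlike the real script, this sketch reports only which keys differ, not the JSON path of each difference, and it does not skip ignored paths.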

## Configuration

The following properties can be set through environment variables:

- `PROJECT_START_ID`: if set, only projects with an id greater than or equal to this value are compared.
- `PROJECT_END_ID`: if set, only projects with an id less than or equal to this value are compared.
- `PROJECT_LAST_ACTIVITY_AT`: if set, only projects whose `lastActivityAt` property is greater than or equal to this value are compared.
- `REPORT_S3_BUCKET`: if set, the report is uploaded to this S3 bucket; otherwise the report is saved to disk.
- `AWS_ACCESS_KEY_ID`: AWS credentials, required to upload the report to the S3 bucket.
- `AWS_SECRET_ACCESS_KEY`: AWS credentials, required to upload the report to the S3 bucket.

Some fields may always mismatch between ES and the DB.
The `ignoredPaths` variable in `scripts/es-db-compare/constants.js` holds a list of JSON paths that are ignored
during the comparison. You may need to add, modify, or delete items in this list.
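
As a rough illustration of path-based ignoring, the sketch below matches a delta path against a list of patterns. The pattern format is an assumption made for this example (dot-separated segments where `*` matches any single segment) and may not match the actual entries in `constants.js`; `matchesIgnoredPattern` is a hypothetical name and is not the script's `isIgnoredPath`.

```js
// Hypothetical matcher for illustration only.
// ASSUMPTION: patterns are dot-separated strings and `*` matches exactly one segment.
function matchesIgnoredPattern(ignoredPatterns, pathSegments) {
  return ignoredPatterns.some((pattern) => {
    const patternSegments = pattern.split('.');
    if (patternSegments.length !== pathSegments.length) {
      return false;
    }
    // Compare segment by segment; array indexes are compared as strings.
    return patternSegments.every(
      (segment, index) => segment === '*' || segment === String(pathSegments[index])
    );
  });
}
```

With the assumed format, a pattern such as `metadata.forms.*.updatedAt` would cover the `updatedAt` field of every form while leaving the other fields subject to comparison.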

### Required

- `PROJECT_START_ID` and `PROJECT_END_ID` must be set together.
- At least one of the `PROJECT_START_ID`/`PROJECT_END_ID` pair or `PROJECT_LAST_ACTIVITY_AT` must be set before running the script.
- To upload the report to AWS S3, set the `REPORT_S3_BUCKET`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` environment variables.
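
The rules above can be expressed as a small check. This is a minimal sketch under stated assumptions, not the validation the script actually performs: `validateConfig` and `isSet` are hypothetical names, and an empty string is treated as "not set".

```js
// Hypothetical helpers for illustration only; not part of scripts/es-db-compare.
// ASSUMPTION: an undefined or empty variable counts as "not set".
const isSet = value => value !== undefined && value !== '';

// Returns a list of human-readable problems; an empty list means the config is usable.
function validateConfig(env) {
  const errors = [];
  const hasStart = isSet(env.PROJECT_START_ID);
  const hasEnd = isSet(env.PROJECT_END_ID);
  const hasLastActivity = isSet(env.PROJECT_LAST_ACTIVITY_AT);

  // The two ID bounds must be given together.
  if (hasStart !== hasEnd) {
    errors.push('PROJECT_START_ID and PROJECT_END_ID must be set together');
  }
  // At least one selection criterion is needed.
  if (!hasStart && !hasEnd && !hasLastActivity) {
    errors.push('Set PROJECT_START_ID with PROJECT_END_ID, or PROJECT_LAST_ACTIVITY_AT');
  }
  // Uploading to S3 needs the bucket name plus both credentials.
  if (isSet(env.REPORT_S3_BUCKET)
    && !(isSet(env.AWS_ACCESS_KEY_ID) && isSet(env.AWS_SECRET_ACCESS_KEY))) {
    errors.push('REPORT_S3_BUCKET requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY');
  }
  return errors;
}
```

Calling `validateConfig(process.env)` before starting the comparison would let a script fail fast with all problems listed at once.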

## Usage

Set up the configuration and run `npm run es-db-compare` on the command line.
The script then generates an HTML report named `report.html` in the current directory.

Example commands:

- Generate a report comparing ALL the projects:

```bash
PROJECT_LAST_ACTIVITY_AT=0 npm run es-db-compare
```

- Generate a report comparing projects that have been updated on **26 December 2019** or later:

```bash
PROJECT_LAST_ACTIVITY_AT="2019-12-26" npm run es-db-compare
```

- Generate a report comparing projects within an ID range:

```bash
PROJECT_START_ID=5000 PROJECT_END_ID=6000 npm run es-db-compare
```

- Any of the commands above can also be run with the `REPORT_S3_BUCKET`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` environment variables set, to upload the report to an S3 bucket:

```bash
REPORT_S3_BUCKET=<S3 bucket name> AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID> AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY> PROJECT_LAST_ACTIVITY_AT="2019-12-26" npm run es-db-compare
```
152 changes: 152 additions & 0 deletions scripts/es-db-compare/compareMetadata.js
@@ -0,0 +1,152 @@
/* eslint-disable no-console */
/* eslint-disable consistent-return */
/* eslint-disable no-restricted-syntax */
/* eslint-disable no-param-reassign */
/*
 * Compare metadata between ES and DB.
 */
const lodash = require('lodash');

const scriptUtil = require('./util');
const scriptConstants = require('./constants');

const hashKeyMapping = {
  ProjectTemplate: 'id',
  ProductTemplate: 'id',
  ProjectType: 'key',
  ProductCategory: 'key',
  MilestoneTemplate: 'id',
  OrgConfig: 'id',
  Form: 'id',
  PlanConfig: 'id',
  PriceConfig: 'id',
  BuildingBlock: 'id',
};

/**
 * Process a single delta.
 *
 * @param {String} modelName the model name the delta belongs to
 * @param {Object} delta the diff delta.
 * @param {Object} dbData the data from DB
 * @param {Object} esData the data from ES
 * @param {Object} finalData the data patched
 * @returns {Object|undefined} the delta with DB/ES copies attached, or undefined if the delta type is not handled
 */
function processDelta(modelName, delta, dbData, esData, finalData) {
  const hashKey = hashKeyMapping[modelName];
  // A top-level array delta means a whole record exists on one side only.
  if (delta.dataType === 'array' && delta.path.length === 1) {
    if (delta.type === 'delete') {
      console.log(`one dbOnly found for ${modelName} with ${hashKey} ${delta.originalValue[hashKey]}`);
      return {
        type: 'dbOnly',
        modelName,
        hashKey,
        hashValue: delta.originalValue[hashKey],
        dbCopy: delta.originalValue,
      };
    }
    if (delta.type === 'add') {
      console.log(`one esOnly found for ${modelName} with ${hashKey} ${delta.value[hashKey]}`);
      return {
        type: 'esOnly',
        modelName,
        hashKey,
        hashValue: delta.value[hashKey],
        esCopy: delta.value,
      };
    }
  }
  // Any other delta is a field-level mismatch inside a record present on both sides.
  if (['add', 'delete', 'modify'].includes(delta.type)) {
    const path = scriptUtil.generateJSONPath(lodash.slice(delta.path, 1));
    const hashValue = lodash.get(finalData, lodash.slice(delta.path, 0, 1))[hashKey];
    const hashObject = lodash.set({}, hashKey, hashValue);
    const dbCopy = lodash.find(dbData, hashObject);
    const esCopy = lodash.find(esData, hashObject);
    console.log(`one mismatch found for ${modelName} with ${hashKey} ${hashValue}`);
    return {
      type: 'mismatch',
      kind: delta.type,
      modelName,
      hashKey,
      hashValue,
      path,
      dbCopy,
      esCopy,
    };
  }
}


/**
 * Compare Metadata data from ES and DB.
 *
 * @param {Object} dbData the data from DB
 * @param {Object} esData the data from ES
 * @returns {Object} the data to feed handlebars template
 */
function compareMetadata(dbData, esData) {
  const data = {
    nestedModels: {},
  };

  const countInconsistencies = () => {
    lodash.set(data, 'meta.totalObjects', 0);
    lodash.forEach(data.nestedModels, (model) => {
      const counts = Object.keys(model.mismatches).length + model.dbOnly.length + model.esOnly.length;
      lodash.set(model, 'meta.counts', counts);
      data.meta.totalObjects += counts;
    });
  };

  const storeDelta = (modelName, delta) => {
    if (lodash.isUndefined(data.nestedModels[modelName])) {
      data.nestedModels[modelName] = {
        mismatches: {},
        dbOnly: [],
        esOnly: [],
      };
    }
    if (delta.type === 'mismatch') {
      if (lodash.isUndefined(data.nestedModels[modelName].mismatches[delta.hashValue])) {
        data.nestedModels[modelName].mismatches[delta.hashValue] = [];
      }
      data.nestedModels[modelName].mismatches[delta.hashValue].push(delta);
      return;
    }
    if (delta.type === 'dbOnly') {
      data.nestedModels[modelName].dbOnly.push(delta);
      return;
    }
    if (delta.type === 'esOnly') {
      data.nestedModels[modelName].esOnly.push(delta);
    }
  };

  for (const refPath of Object.keys(scriptConstants.associations.metadata)) {
    const modelName = scriptConstants.associations.metadata[refPath];
    const { deltas, finalData } = scriptUtil.diffData(
      dbData[refPath],
      esData[refPath],
      {
        hashKey: hashKeyMapping[modelName],
        modelPathExprssions: lodash.set({}, modelName, '[*]'),
      },
    );
    for (const delta of deltas) {
      if (scriptUtil.isIgnoredPath(`metadata.${refPath}`, delta.path)) {
        continue; // eslint-disable-line no-continue
      }
      const deltaWithCopy = processDelta(modelName, delta, dbData[refPath], esData[refPath], finalData);
      if (deltaWithCopy) {
        storeDelta(modelName, deltaWithCopy);
      }
    }
  }
  countInconsistencies();
  return data;
}

module.exports = {
  compareMetadata,
};