Description
Hello all. I was asked in the neo4j-dcos Slack channel to raise this as an issue.
tl;dr: bolt+routing doesn't appear to work with the DC/OS neo4j-ee Universe package.
Below is the longer writeup.
We have an issue that we hope you can help us solve: the JavaScript driver's Bolt routing doesn't appear to work. Our understanding is that by specifying the "bolt+routing" scheme in the JavaScript driver, we can connect to any node, and the driver will discover the rest of the cluster and choose which nodes to use for each operation.
We have neo4j-ee running in a Mesosphere DC/OS cluster on Google Cloud.
We used the neo4j Mesosphere Universe package to launch a basic three-node cluster (core nodes only). The neo4j cluster uses the standard private DC/OS network (9.0.0.0/8).
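To make that expectation concrete, here is a minimal sketch of the selection step as we understand it. It is not the driver's actual implementation: `makeRoutingTable` and `pickServer` are hypothetical names used only for illustration, and the addresses are the ones our cluster reports further down in this issue.

```javascript
// Hypothetical model of a routing table: three address lists plus
// a round-robin cursor per access mode.
function makeRoutingTable(routers, readers, writers) {
  return {
    routers: routers,
    readers: readers,
    writers: writers,
    next: { READ: 0, WRITE: 0 }
  };
}

// Pick the next server for the given access mode ('READ' or 'WRITE'),
// cycling through the matching list.
function pickServer(table, mode) {
  const pool = mode === 'WRITE' ? table.writers : table.readers;
  if (pool.length === 0) {
    throw new Error('No ' + mode + ' servers available');
  }
  const server = pool[table.next[mode] % pool.length];
  table.next[mode] += 1;
  return server;
}

const table = makeRoutingTable(
  ['9.0.3.130:7687', '9.0.4.130:7687', '9.0.5.130:7687'], // ROUTE
  ['9.0.4.130:7687', '9.0.5.130:7687'],                   // READ
  ['9.0.3.130:7687']                                      // WRITE
);

console.log(pickServer(table, 'WRITE')); // prints 9.0.3.130:7687
console.log(pickServer(table, 'READ'));  // prints 9.0.4.130:7687
console.log(pickServer(table, 'READ'));  // prints 9.0.5.130:7687
```

In other words, we expected a write issued through a follower to end up on the leader without us having to know which node that is.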
To confirm that we are running the EE version, we can check the version:
http://localhost:7474/db/manage/server/version
{
"edition" : "enterprise",
"version" : "3.1.2"
}
The cluster also appears to have been installed and booted correctly. The output logs show that each node found all the other nodes. E.g.,
<snip>
Discovering cluster with initial members: [9.0.3.130:5000, 9.0.5.130:5000, 9.0.4.130:5000]
<snip>
We run our tests from a Docker container running Node.js. In the container we install the neo4j JavaScript driver with:
$ npm install neo4j-driver@next
(We have also tried neo4j-driver and neo4j-driver@latest.)
We have a simple Node app for checking connectivity (write, then read):
const neo4j = require('neo4j-driver').v1;

const neo4j_user = "a_user";
const neo4j_pass = "a_pass";
const neo4j_ip = "9.0.3.130"; // IP of the core node to connect to

const driver = neo4j.driver(`bolt+routing://${neo4j_ip}`, neo4j.auth.basic(neo4j_user, neo4j_pass));
const session = driver.session();

session
  // Write a node first...
  .run("CREATE (a:Person {name:'Arthur', title:'King'})")
  .then(function () {
    // ...then read it back.
    return session.run("MATCH (a:Person) WHERE a.name = 'Arthur' RETURN a.name AS name, a.title AS title");
  })
  .then(function (result) {
    console.log(result.records[0].get("title") + " " + result.records[0].get("name"));
    session.close();
    driver.close();
  })
  .catch(function (err) {
    // Release the session and driver before reporting the failure.
    session.close();
    driver.close();
    console.log(err);
  });
This works as long as the IP specified is the leader's. If we give it the IP of one of the followers, we get the following error message:
{ Error: Could not perform discovery. No routing servers available.
at new Neo4jError (/home/node/node_modules/neo4j-driver/lib/v1/error.js:67:132)
at newError (/home/node/node_modules/neo4j-driver/lib/v1/error.js:57:10)
at /home/node/node_modules/neo4j-driver/lib/v1/internal/connection-providers.js:222:35
at process._tickDomainCallback (internal/process/next_tick.js:135:7) code: 'ServiceUnavailable' }
If we use the plain 'bolt' scheme instead of 'bolt+routing', writes to non-leader nodes fail with a 'not a leader' error message.
As a sanity check, we confirmed that all the nodes have the ROUTE role. They do:
CALL dbms.cluster.routing.getServers()
[
addresses [9.0.3.130:7687]
role WRITE
,
addresses [9.0.4.130:7687, 9.0.5.130:7687]
role READ
,
addresses [9.0.4.130:7687, 9.0.3.130:7687, 9.0.5.130:7687]
role ROUTE
]
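The same sanity check can be written as a small script. This is only a sketch: it assumes the output above maps to a list of `{ addresses, role }` entries, and `addressesByRole` and `missingRouters` are hypothetical helpers, not part of the driver.

```javascript
// Assumed shape of the procedure output shown above.
const servers = [
  { addresses: ['9.0.3.130:7687'], role: 'WRITE' },
  { addresses: ['9.0.4.130:7687', '9.0.5.130:7687'], role: 'READ' },
  { addresses: ['9.0.4.130:7687', '9.0.3.130:7687', '9.0.5.130:7687'], role: 'ROUTE' }
];

// Hypothetical helper: collect the addresses listed under each role.
function addressesByRole(entries) {
  const byRole = {};
  entries.forEach(function (entry) {
    byRole[entry.role] = (byRole[entry.role] || []).concat(entry.addresses);
  });
  return byRole;
}

// Hypothetical helper: which expected members are NOT advertised as routers?
function missingRouters(entries, expectedMembers) {
  const routers = addressesByRole(entries).ROUTE || [];
  return expectedMembers.filter(function (member) {
    return routers.indexOf(member) === -1;
  });
}

const members = ['9.0.3.130:7687', '9.0.4.130:7687', '9.0.5.130:7687'];
console.log(missingRouters(servers, members)); // prints []
```

An empty list means every core member is advertised with the ROUTE role, which is what we see.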
We also checked that the cluster's DNS name resolves to all three nodes (from inside the running neo4j containers):
$ dig core-neo4j.marathon.containerip.dcos.thisdcos.directory
<snip>
;; ANSWER SECTION:
core-neo4j.marathon.containerip.dcos.thisdcos.directory. 5 IN A 9.0.5.130
core-neo4j.marathon.containerip.dcos.thisdcos.directory. 5 IN A 9.0.4.130
core-neo4j.marathon.containerip.dcos.thisdcos.directory. 5 IN A 9.0.3.130
<snip>
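For anyone who wants to repeat the DNS check from the Node test container rather than with dig, something like the following should work. It is a sketch under assumptions: `sameAddresses` and `checkClusterDns` are hypothetical helpers, `dns.resolve4` is Node's standard resolver call, and the lookup is only meaningful from inside the DC/OS network, so the usage is left commented out.

```javascript
const dns = require('dns');

// Hypothetical helper: true when both lists hold the same addresses,
// regardless of order.
function sameAddresses(a, b) {
  const left = a.slice().sort();
  const right = b.slice().sort();
  return left.length === right.length &&
    left.every(function (addr, i) { return addr === right[i]; });
}

// Hypothetical helper: resolve the A records for `hostname` and report
// whether they match the expected member IPs.
function checkClusterDns(hostname, expected, callback) {
  dns.resolve4(hostname, function (err, addresses) {
    if (err) {
      return callback(err);
    }
    callback(null, sameAddresses(addresses, expected), addresses);
  });
}

// Usage (run inside the DC/OS network):
// checkClusterDns(
//   'core-neo4j.marathon.containerip.dcos.thisdcos.directory',
//   ['9.0.3.130', '9.0.4.130', '9.0.5.130'],
//   function (err, ok, addresses) {
//     console.log(err || { ok: ok, addresses: addresses });
//   }
// );
```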
We also ran the above Node app in an interactive session (REPL) to look at the routing tables of the driver/session.
When we use the leader IP, we get:
> session._writeConnectionHolder._connectionProvider._routingTable.routers
RoundRobinArray { _items: [ '9.0.3.130' ], _offset: 0 }
When we use either of the follower IPs, we get:
> session._writeConnectionHolder._connectionProvider._routingTable.routers
RoundRobinArray { _items: [ '9.0.4.130' ], _offset: 0 }
The single entry is simply whichever IP we passed to the driver. It appears that the routing table is never populated with the rest of the cluster.
So, at this point we are a little stuck and not sure what the issue is. Everything appears to be configured correctly, but we cannot connect to a non-leader node and have routing work, which is a feature we would love to use.
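To spell out the gap between what the driver holds and what the cluster advertises, here is a small comparison sketch. `withDefaultPort` and `missingFromDriver` are hypothetical helpers; the sketch assumes IPv4 addresses and that an entry without a port means the default Bolt port 7687.

```javascript
// Hypothetical helper: append the default Bolt port when none is given
// (IPv4 only; an IPv6 literal would need different handling).
function withDefaultPort(address) {
  return address.indexOf(':') === -1 ? address + ':7687' : address;
}

// Hypothetical helper: routers the cluster advertises that the driver's
// routing table does not contain.
function missingFromDriver(observedRouters, advertisedRouters) {
  const known = observedRouters.map(withDefaultPort);
  return advertisedRouters.filter(function (addr) {
    return known.indexOf(addr) === -1;
  });
}

// ROUTE addresses as returned by dbms.cluster.routing.getServers().
const advertised = ['9.0.4.130:7687', '9.0.3.130:7687', '9.0.5.130:7687'];

// `_items` as read from the driver's private routing table in the REPL.
console.log(missingFromDriver(['9.0.4.130'], advertised));
// prints [ '9.0.3.130:7687', '9.0.5.130:7687' ]
```

With a populated table we would expect this to print an empty list no matter which node we connect to first.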