Cassandra state detected DOWN

Issue

While upgrading Contrail from 2.0 to 2.20, I got stuck on the following error.

# contrail-status
...
contrail-database-nodemgr initializing (Cassandra state detected DOWN.)
....

Following the trail with contrail-status

1) “contrail-status” is just a Python script located at /usr/bin/contrail-status

2) It asks supervisord about services

supervisorctl -s unix:///tmp/supervisord_database.sock status
contrail-database RUNNING pid 9262, uptime 3 days, 4:15:59
contrail-database-nodemgr RUNNING pid 6043, uptime 0:10:51
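
To make step 2 concrete, here is a minimal sketch of how such status output could be turned into a name-to-state mapping. This is not the actual contrail-status code; the function name parse_supervisor_status is made up for illustration, and the sample text is the output shown above.

```python
def parse_supervisor_status(output):
    """Map each service name to its supervisord state (e.g. RUNNING)."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # A status line starts with "<name> <STATE> ..."; skip anything shorter.
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


# Sample taken from the supervisorctl output above.
sample = """\
contrail-database RUNNING pid 9262, uptime 3 days, 4:15:59
contrail-database-nodemgr RUNNING pid 6043, uptime 0:10:51
"""
states = parse_supervisor_status(sample)
print(states)
```

Note that both processes are RUNNING from supervisord's point of view, so the "DOWN" state must come from a later check.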

3) Then it queries an HTTP URL for every service

curl http://localhost:8103/Snh_SandeshUVECacheReq?x=NodeStatus

4) Finally, it parses ProcessStatus from the response

<ProcessStatus>
  <module_id type="string" identifier="1">contrail-database-nodemgr</module_id>
  <instance_id type="string" identifier="2">0</instance_id>
  <state type="string" identifier="3">Non-Functional</state>
  <description type="string" identifier="5">Cassandra state detected DOWN.</description>
</ProcessStatus>
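
Step 4 can be sketched roughly like this, assuming the response contains one or more ProcessStatus elements shaped like the fragment above. The helper parse_process_status is a hypothetical name, not something taken from contrail-status, and the real response may wrap this element in more XML.

```python
import xml.etree.ElementTree as ET


def parse_process_status(xml_text):
    """Return (module_id, state, description) for each ProcessStatus element."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() also matches the root itself, so nested and bare elements both work.
    for status in root.iter("ProcessStatus"):
        results.append((
            status.findtext("module_id"),
            status.findtext("state"),
            status.findtext("description"),
        ))
    return results


# Sample taken from the ProcessStatus fragment above.
sample = """<ProcessStatus>
  <module_id type="string" identifier="1">contrail-database-nodemgr</module_id>
  <instance_id type="string" identifier="2">0</instance_id>
  <state type="string" identifier="3">Non-Functional</state>
  <description type="string" identifier="5">Cassandra state detected DOWN.</description>
</ProcessStatus>"""

for module, state, description in parse_process_status(sample):
    print(module, state, description)
```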

Why does contrail-database-nodemgr report “Non-Functional”?

I found that “Cassandra state detected DOWN” corresponds to the “FAIL_STATUS_SERVER_PORT” error type. Contrail sets this error in its code when the command below fails, so that command is failing for some reason.

cassandra-cli --host 127.0.0.1 --batch < /dev/null | grep 'Connected to:'

Cassandra listens ONLY on the management IP (not 127.0.0.1), so the check against 127.0.0.1 cannot connect.

# netstat -lnptu|grep 9160
tcp        0      0 172.17.0.8:9160         0.0.0.0:*               LISTEN      9262/java
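
The same conclusion can be reached programmatically. Below is a small sketch that extracts the listening addresses for a port from netstat -lnptu style output and checks whether loopback is covered. The function name listening_addresses and the column layout it relies on are assumptions based on the single line shown above; only IPv4 addresses are considered in the loopback check.

```python
def listening_addresses(netstat_output, port):
    """Return the local IPs on which something listens on the given TCP port."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local addr, foreign addr, state, pid/program
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(host)
    return addresses


# Sample taken from the netstat output above.
sample = "tcp        0      0 172.17.0.8:9160         0.0.0.0:*               LISTEN      9262/java"
addrs = listening_addresses(sample, 9160)

# The port is reachable via 127.0.0.1 only if bound to loopback or to all interfaces.
reachable_on_loopback = any(a in ("127.0.0.1", "0.0.0.0") for a in addrs)
print(addrs, reachable_on_loopback)
```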

FIX

We need to add the following to “/etc/contrail/contrail-database-nodemgr.conf”

[DEFAULT]
hostip=172.17.0.8
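
Before restarting, it may be worth confirming that the setting is actually picked up as an INI value. Here is a minimal sketch using Python's configparser; read_hostip is a hypothetical helper, and this is only an illustration of reading the option, not how nodemgr itself loads its configuration.

```python
import configparser


def read_hostip(conf_text):
    """Return the hostip value from the [DEFAULT] section, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Options under [DEFAULT] are exposed through defaults().
    return parser.defaults().get("hostip")


# Sample taken from the config fragment above.
sample = "[DEFAULT]\nhostip=172.17.0.8\n"
print(read_hostip(sample))
```

To check the real file, the same function could be fed the contents of /etc/contrail/contrail-database-nodemgr.conf read from disk.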

Restart nodemgr

supervisorctl -s unix:///tmp/supervisord_database.sock restart contrail-database-nodemgr