Details on recent changes in dropwizard-cassandra and some plans for upcoming releases.

I recently wrote about a project that allows you to easily integrate Cassandra with Dropwizard, and wanted to give an update on some of the recent changes in the project.

First, I need to acknowledge Nick Telford for his contributions to the library. He's brought some very helpful ideas and improvements that I'm sure will benefit others, and his experience with Dropwizard has been invaluable in maintaining consistency.

Recent Changes

These are some of the highlights. For full details, please see the .

Unbundling

One of Nick's improvements was to separate the building of a Cluster from the bootstrapping phase. The original Bundle implementation wasn't really necessary, and with Nick's changes it is now both simpler and easier to use. The CassandraBundle class has been marked as deprecated and will be removed in a future release, so I'd recommend switching to the more direct usage (as documented in the ) if you haven't already.

HealthCheck

In the first release, CassandraHealthCheck used Cluster#connect() to establish and initialise a session to the Cassandra cluster. This approach was chosen as it appeared to be a lightweight and reliable way to force connection to the Cassandra nodes without needing any knowledge of the data contained within it. This would essentially prove that the cluster is up and we can connect to it (but obviously not confirm whether it's the correct cluster).

Unfortunately, after a period of extended use, it emerged that there was a memory leak in the application. After some investigation, this was narrowed down to the DataStax Cassandra driver itself, and the way Session instances are managed. The Cluster.Manager maintains a set of session instances and holds on to them until the cluster is garbage collected, regardless of whether the session or cluster is closed. To spell it out a bit more clearly: in a typical application, every session you create will remain in memory until your application terminates.
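To make the retention pattern concrete, here is a simplified sketch. It is not the driver's code: LeakyManager, FakeSession and LeakDemo are hypothetical stand-ins that only mimic the behaviour described above, where closing a session does not remove it from the manager's set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a driver session; NOT the DataStax Session class.
final class FakeSession implements AutoCloseable {
    private boolean closed;

    @Override
    public void close() {
        // Marks the session as closed, but nothing un-registers it from the manager.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in that mimics the retention behaviour described in the post.
final class LeakyManager {
    // Every session ever created is added here and never removed.
    private final Set<FakeSession> sessions = new HashSet<>();

    FakeSession connect() {
        FakeSession session = new FakeSession();
        sessions.add(session);
        return session;
    }

    int retainedSessions() {
        return sessions.size();
    }
}

final class LeakDemo {
    // Opens and closes one session per simulated health check run.
    static int runHealthChecks(LeakyManager manager, int count) {
        for (int i = 0; i < count; i++) {
            try (FakeSession session = manager.connect()) {
                // Connecting was the whole check in the first release.
            }
        }
        return manager.retainedSessions();
    }
}
```

With this stand-in, running the check 1,000 times leaves 1,000 closed sessions still referenced by the manager, which is roughly what a frequently polled health check did to the application's heap.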

This issue has been raised on the DataStax issue tracker, but I've not yet seen any feedback or progress at the time of writing. I'm hoping this can be picked up soon and rectified, but until then please be careful how you use Session instances in your application.

As a result of this bug, an initial change was pushed out to use Host#isUp() (retrieved from the driver) as a means of determining whether any nodes are available. Unfortunately this proved to be an unreliable means of determining application health as it was not responsive enough and didn't invoke any direct communication with Cassandra at the time the health check was executed. This meant that there were a few (sometimes many) false positives, which is not acceptable.

The JDBC-style validation query approach was finally chosen for CassandraHealthCheck, with a default query on a system table that can be overridden in configuration. The advantages of this approach are many:

  • The health check requires communication with the Cassandra cluster at the time of execution
  • Each application can determine the query that is most appropriate for their use case
  • The health check can validate not only that Cassandra is up, but that we're connected to the correct cluster (this relies on a custom validation query, of course)
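As a rough sketch of the validation query idea (not the library's actual CassandraHealthCheck), the check below runs a configurable query through an existing connection and reports failure if it throws. CqlExecutor and ValidationQueryCheck are hypothetical names, and the default query string is an assumption; the post only states that the default targets a system table.

```java
import java.util.Optional;

// Hypothetical stand-in for whatever executes CQL, e.g. a single long-lived session.
interface CqlExecutor {
    void execute(String cql) throws Exception;
}

final class ValidationQueryCheck {
    // Assumed default for illustration; the real default is not given in the post.
    static final String DEFAULT_QUERY = "SELECT key FROM system.local";

    private final CqlExecutor executor;
    private final String validationQuery;

    ValidationQueryCheck(CqlExecutor executor, String validationQuery) {
        this.executor = executor;
        // Fall back to the default when no override is configured.
        this.validationQuery = validationQuery == null ? DEFAULT_QUERY : validationQuery;
    }

    String validationQuery() {
        return validationQuery;
    }

    // Returns an empty Optional when healthy, otherwise a description of the failure.
    Optional<String> check() {
        try {
            // Forces a round trip to the cluster at the time the check runs.
            executor.execute(validationQuery);
            return Optional.empty();
        } catch (Exception e) {
            return Optional.of("Validation query failed: " + e.getMessage());
        }
    }
}
```

An application that wants to confirm it is talking to the right cluster could pass its own query against one of its own tables in place of the default.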

Build Overloads

Another improvement by Nick is an overload for CassandraFactory#build() that does not require a Dropwizard Environment. An example of when this might be useful is when creating a Command that requires interaction with a Cassandra cluster (e.g. schema migration). Of course, there are many other scenarios where this might be useful.

The only behavioural difference between the two overloads is in who manages the Cluster. If you provide an Environment, the Cluster will be managed for you via Dropwizard's Managed objects; if not, you will be responsible for closing the Cluster appropriately as there is no environment lifecycle to hook into.
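The ownership difference can be sketched with a few stand-in types. None of these are the real Dropwizard or dropwizard-cassandra classes: StubCluster, StubLifecycle, StubFactory and MigrationCommandSketch are hypothetical and only model who ends up closing the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a driver Cluster.
final class StubCluster implements AutoCloseable {
    private boolean closed;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in for an environment lifecycle that closes what it manages.
final class StubLifecycle {
    private final List<StubCluster> managed = new ArrayList<>();

    void manage(StubCluster cluster) {
        managed.add(cluster);
    }

    // Called at application shutdown in this sketch.
    void stop() {
        for (StubCluster cluster : managed) {
            cluster.close();
        }
    }
}

// Hypothetical factory modelling the two overloads described above.
final class StubFactory {
    // With a lifecycle: the cluster is registered and closed on the caller's behalf.
    StubCluster build(StubLifecycle lifecycle) {
        StubCluster cluster = build();
        lifecycle.manage(cluster);
        return cluster;
    }

    // Without one: the caller owns the cluster and must close it.
    StubCluster build() {
        return new StubCluster();
    }
}

final class MigrationCommandSketch {
    // A command-style caller with no environment, so it closes the cluster itself.
    static StubCluster run(StubFactory factory) {
        StubCluster cluster = factory.build();
        try (StubCluster owned = cluster) {
            // Schema migration statements would run here.
        }
        return cluster; // returned only so the caller can observe that it was closed
    }
}
```

In a command, try-with-resources (or an equivalent finally block) is one simple way to make sure the cluster is closed even if the work inside fails.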

Roadmap

Versioning

At present, dropwizard-cassandra has been versioned without reference to the version of Dropwizard that it is built against. Going forward, this will need to change.

I've seen some suggestions on the Dropwizard mailing list around versioning strategies, with one proposal of ${dropwizard.version}-${library.version}. While this might work, I'm personally more a fan of inverting these so the library version comes first, i.e. ${library.version}-${dropwizard.version} (as used in the Scala community, where libraries are suffixed with the version of Scala they're compiled against).

Either way, the distinction is less important than the desire to standardise this across all Dropwizard contrib libraries. There doesn't appear to be an agreed versioning strategy yet, but I plan to pursue this as a priority. Any suggestions are welcome!

Suggestions Welcome

Beyond the versioning changes, please feel free to make suggestions or log bugs on the project .



Published

21 June 2014
