DatabaseRecorder - recording and replaying the database connection

I’ve had this idea to record calls to the database connection for replaying “real” results later on a while ago. I’ve now found a few hours of time on the train and tried it out.

tl’dr: It doesn’t seem worth the effort. There’s no performance gain so far.

Here’s what I found:

Initial implementation

I’ve started out setting up a simple library database_recorder which wraps the execute method on the ActiveRecord::Base.connection into a capture method. The capture method will either record or replay.

If it records it will simply push the result of the current call to connection.execute (i.e. the results from the database) into an array. An at_exit hook will then marshal this array and store it on the disk.
If it replays it will (assuming the array has been unmarshalled and loaded before) shift a result set from the array and directly return it – not calling the database this time (this approach probably won’t work well with an async database driver).

There’s a simle convenience method that makes it easy to hook this up in a Cucumber env.rb:

# features/env.rb
DatabaseRecorder.record_or_replay!(ENV['REPLAY'])

Now one can control the record vs replay mode by setting the REPLAY env variable like this:

$ cucumber              ; record the db connection
$ REPLAY=true cucumber  ; replay it

After throwing up a first simple implementation I’ve tried this with the Cucumber feature suite of adva-cms2 which contains 92 scenarios with 869 steps and uses an SQlite3 database for the tests. I was surprised to see that quite some features actually passed already.

Tweaking it

A few more tweaks were necessary though:

Recording and replaying execute isn’t enough, one also has to take care of such methods as last_insert_row_id which sits on the native database connection object which only is accessible as an instance variable. The implementation of the ActiveRecord SQlite3 adapter makes it so that there’s no single method and no single collection of methods that could be recorded/replayed like this. Instead I had to wrap the execute method on the ActiveRecord::Base.connection and last_insert_row_id and changes on the native SQlite3 database connection object it holds. A little bit ugly, but it works (and once again proves that stuff like LoD is totally valid for Ruby/Rails code as well and one should care about it for widely used libraries).

Once I’ve figured this out almost all scenarios passed. The remaining ones were related to the users signup process.

It now occured to me that my initial assumption (of course) wasn’t accurate: an application won’t always pass exactly the same data down to the database even with tests always running in the same order and test data not being generated randomly but explicitely defined. There’s stuff like random tokens! Wow.

Ok, so I added a line Devise.stubs(:friendly_token).returns('12345') … which made all tests pass. Basically the revelation here is that in order to make this record/replay approach work one needs to make absolutely sure that data is passed to the database in a completely deterministic manner. This may or may not an issue depending on the app, testing philosophy/assumptions and other things.

Now all 92 scenarios were green!

Results

The db_session file that contains the recorded, marshalled database connection results is ~1.4 MB big … just saying.

When I now looked at the run times of both test suite runs I was shocked to notice that both took almost exactly the same time. In fact the test run which was recording the database connection was slightly faster than the one which replayed it. So far I have no clue about the reasons for that. I would think the code that actually “replays” results is simple enough (just doing Array.shift) to outrule the actual database connection. I might overlook something else (garbage collection? … was running on MRI Ruby 1.8.7) but so far it seems that recording test runs are consistently (very) slightly faster than replaying ones.

How disappointing, this would have been nice.

Maybe someone else will pick this experiment up and find something I’ve missed!

Otherwise, I guess, at least I’ve tried it and proved myself wrong ;)