2016 July 14

Types of Pain

Just sharing some pains for the first commit(s) to the SQL Server Rails v5 support effort. These are very raw findings I captured while working. The idea is to share the pains/gains in making a 3rd party adapter for Rails 5. Goals include:

Have fun with ActiveRecord v5.
Learn where things are hard for us because we may be doing it wrong!
Learn where things are hard for 3rd party adapters in general.
Make contributions to Rails when it makes sense.
Share how the SQL Server adapter is different.

The title of this post is for humor only. The internals of Rails' ActiveRecord have moved forward with leaps and bounds. The ones most interesting for me & the adapter are the new type system. I wrote a little about Sean Griffin's excellent work in a post called ActiveRecord 4.2's Type Casting last year. This year, I was totally expecting Sean to deliver on his promises to decouple types and quoting to help make ActiveRecord better.

Forward: Our Prepared Statement Methodology

Is completely different than most other adapters whereby we use the sp_executesql stored procedure to cache frequently used query plans at the database itself. See this 2011 Engine Yard Blog which talks about this methodology.

Schema Reflection

The SchemaStatements#columns is the first thing that usually breaks when we upgrade for major Rails releases. Like many other low level adapter parts, when it breaks... NOTHING works till you get it fixed. I had a particularly fun time with the changes needed this time around.

Our lowest level method to look up all things schema related is our column_definitions method. The SQL here is responsible for joining many info schema tables to give us everything we need to know for the current database as well as other schemas. Even though there are caches in ActiveRecord just for schema reflection, we found this SQL has to be fast. Hence we create our own binds to leverage sp_executesql's statement cache abilities.

Previously we created binds by making a new column object using the INFORMATION_SCHEMA.COLUMNS.TABLE_NAME of type nvarchar(128) with whatever table name value requested by ActiveRecord. Optionally we did the same with the TABLE_SCHEMA if a table name identifier was fully qualified. This kept our DB reflection fast. However, binds no longer contain column information. This is what I found in the ActiveRecord::Relation class where binds and substitutions are made:

QueryAttribute.new(arel_attr.name, value, klass.type_for_attribute(arel_attr.name))

In some ActiveRecord tests, we see the following:

ActiveRecord::Relation::QueryAttribute.new "id", 20, Post.type_for_attribute("id")

It was now apparent to me how ActiveRecord broke columns out of the type/cast/quote layer by using columns further up the stack. The Relation::QueryAttribute is actually a subclass of ActiveRecord::Attribute. My solution to fix schema reflection was to manually create some query attributes for our binds.

nv128 = SQLServer::Type::UnicodeVarchar.new limit: 128
binds << Relation::QueryAttribute.new('TABLE_NAME', identifier.object, nv128)
binds << Relation::QueryAttribute.new('TABLE_SCHEMA', identifier.schema, nv128) unless identifier.schema.blank?
results = sp_executesql(sql, 'SCHEMA', binds)

Now we were able to reflect on the database again. However, because we need to know the SQL Server data type for our binds to work with sp_executesql, we have extended our types to respond to a sqlserver_type similar to the simplified type. This way, we can ask our binds for type.sqlserver_type and get the tidy type arguments needed for sp_executesql. That was a lot of type'in 😀

module ActiveRecord
  module ConnectionAdapters
    module SQLServer
      module Type
        module Sql

          def sqlserver_type
            type.to_s
          end

        end
        ::ActiveModel::Type::Value.include SQLServer::Type::Sql
      end
    end
  end
end

Updating Our Types

This is where things get hard for us. Unlike other databases, quoting a Ruby type for the database is often dependent on the SQL Server data type of a specific column. For example, a string might be easy for MySQL. But for SQL Server, only non-national (single byte) strings/types can be quoted with single quotes. For multi-byte strings of database type nchar, nvarchar, ntext, and nvarchar(max) – we have to quote using the N'string' "national" prefix. Fun! More interesting are supporting our date and time types. Many of which have different fractional precision and quoting is important to each.

ActiveRecord's types have been moved to ActiveModel and each needs their own serialize(value) implementation vs the old type_cast_for_database(value) which now resides at the connection and a type caster map object. We easily updated our type objects to use the serialize method. But I have yet to dig down to see how many types I will have to couple to quoting as the project moves forward.

At first glance, our column objects are getting smaller and the type objects are getting fatter.

During Inserts

Our exec_insert relied on knowing two critical things. First, if a identity column is being inserted. Second, the target table name of that insert so that we enable identity inserts (table scope) for the duration of that insert statement. Since binds no longer have column information, we are going to have to start trusting the pk parameter signaling that an ID has been set. We were somewhat doing this anyway in our sql_for_insert method. Also, once again, we need to pick up the pattern of using regular expressions on the sql value to pull the table name out. It appears the PostgreSQL adapter does this as well. It will be interesting to see if this slows us down or not.