Using Ruby 1.9 Ripper

While Ripper parses your code, it continuously fires events (or “calls callbacks”) whenever it finds something interesting. There are two types of events: scanner (lexer) events and parser events.

The scanner goes through the code from left to right, character by character. When it finds something it knows (such as a keyword, whitespace or a semicolon), it fires a corresponding event that you can react to. The parser works on a higher level: it watches for known Ruby constructs (such as a symbol, a method call or a class definition) and also fires events.

You can check the available events by outputting Ripper::SCANNER_EVENTS and Ripper::PARSER_EVENTS.
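
As a quick sketch (the local variable names are just for illustration), the two constants are plain arrays of symbols, stored without the on_ prefix and without the @ character:

```ruby
require 'ripper'

# Both constants are arrays of event names as symbols.
scanner_events = Ripper::SCANNER_EVENTS
parser_events  = Ripper::PARSER_EVENTS

puts "#{scanner_events.size} scanner events, e.g. #{scanner_events.first(5).inspect}"
puts "#{parser_events.size} parser events, e.g. #{parser_events.first(5).inspect}"

# The events used in the examples of this article are all listed here.
p scanner_events.include?(:int)   # scanner event behind on_int
p parser_events.include?(:binary) # parser event behind on_binary
```

The exact number of events depends on your Ruby version.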

You can respond to these events by simply defining methods named :"on_#{event_name}" (omitting the @ character for scanner events). As long as you do not interfere with this (which you might want to do), the parser always passes the results from the last inner parser events to the current parser event. E.g.:

require 'ripper'

class DemoBuilder < Ripper::SexpBuilder
  def on_int(token) # scanner event
    super.tap { |result| p result }
  end

  def on_binary(left, operator, right) # parser event
    super.tap { |result| p result }
  end
end

src = "1 + 1"
DemoBuilder.new(src).parse

This outputs:

[:@int, "1", [1, 0]]
[:@int, "1", [1, 4]]
[:binary, [:@int, "1", [1, 0]], :+, [:@int, "1", [1, 4]]]

When a scanner event is fired you can check the current position, which allows for tracking detailed positioning information. Ripper::SexpBuilder records it in the result of each scanner event, and you can also always ask for it yourself via lineno and column. Positions are given as [row, column], with the row being 1-based and the column 0-based. On parser level events the current position is not very useful (and not passed to your event callbacks) because parser events are fired when the parser recognizes a known Ruby construct as completed, i.e. at the end of the construct.
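
Here is a minimal sketch of position tracking in a scanner callback. IdentLocator and its locations method are made-up names for this example; lineno and column are the accessors that stock Ripper provides:

```ruby
require 'ripper'

# Hypothetical helper class: remembers where each identifier token starts.
class IdentLocator < Ripper
  def locations
    @locations ||= []
  end

  def on_ident(token) # scanner event
    # lineno is 1-based, column is 0-based
    locations << [token, lineno, column]
    token
  end
end

locator = IdentLocator.new("foo\n  bar")
locator.parse
p locator.locations # [["foo", 1, 0], ["bar", 2, 2]]
```

The callback returns the token unchanged, so whatever the parser does with it afterwards is not affected.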

Scanner events are fired “just so”, i.e. the scanner finds something and calls your callback method. The return values might or might not be passed to parser events. Parser events, on the other hand, build a meaningful tree, and their return values are always passed to the next (outer) event. You can generally think of events as being fired “from the inside out”, starting with low-level scanner events.

You can examine the hierarchy of these events by doing:

require "ripper"
require "pp"
src = "1 + 1"
pp Ripper::SexpBuilder.new(src).parse

will output:


[:program,
 [:stmts_add,
  [:stmts_new],
  [:binary, [:@int, "1", [1, 0]], :+, [:@int, "1", [1, 4]]]]]

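You can think of this as a nested method call where the first element of each array is the method name and the rest are the arguments. In the example above there would be 6 method calls. Each :@int call would receive the token "1" and its position ([1, 0] for the first one, [1, 4] for the second). The :binary call would receive the result of the first :@int call, the :+ operator and the result of the second :@int call. :stmts_new would not receive any arguments, :stmts_add would receive the results of :stmts_new and :binary, and :program would receive the result of :stmts_add.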

When executed the (theoretical) interpreter would first evaluate the innermost arguments, right? That’s exactly what Ripper does, too. It will first fire the first @int event, then the second one and then pass the return values of these two events (together with the :+ operator token) to the next outer method, which is the :binary event in this case.

(“Theoretical” of course refers to these particular s-expressions. There are languages that are built on exactly this concept, such as Lisp.)
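
To make the “inside out” idea a bit more tangible, here is a small sketch that returns plain Ruby values from the callbacks instead of s-expressions. TinyCalculator is a made-up class for this article and only handles integer literals and binary operators:

```ruby
require 'ripper'

# Hypothetical example class: every callback returns a value, and Ripper
# hands that value to the next outer callback.
class TinyCalculator < Ripper
  # Scanner event: turn the token text into an Integer.
  def on_int(token)
    token.to_i
  end

  # Parser event: left and right are the return values of inner events,
  # operator is a Symbol such as :+ or :*.
  def on_binary(left, operator, right)
    left.public_send(operator, right)
  end

  def on_stmts_new
    []
  end

  def on_stmts_add(statements, statement)
    statements << statement
  end

  # Outermost event: its return value is what #parse returns.
  def on_program(statements)
    statements.last
  end
end

result = TinyCalculator.new("1 + 2 * 3").parse
p result # 7
```

The inner 2 * 3 is completed first, so its on_binary call runs before the one for the addition, which then receives 6 as its right argument.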

As you can see, even though the scanner fires events for whitespace, no whitespace characters are passed to any of the parser callbacks. I don’t know if anything else happens to them, but of course you can define callbacks for the different kinds of whitespace and do something useful with them. The same is true for comments and for quite a few other things that don’t make a semantic difference in Ruby (such as parentheses for method calls etc.).
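
A sketch of what such callbacks could look like, assuming you just want to collect whitespace and comments on the side (TriviaCollector and trivia are names invented for this example):

```ruby
require 'ripper'

# Hypothetical collector for tokens that never show up in the s-expression.
class TriviaCollector < Ripper::SexpBuilder
  def trivia
    @trivia ||= []
  end

  def on_sp(token) # scanner event for spaces and tabs
    trivia << [:sp, token]
    super
  end

  def on_comment(token) # scanner event for comments
    trivia << [:comment, token]
    super
  end
end

collector = TriviaCollector.new("1 + 1 # add")
sexp = collector.parse

pp sexp             # the tree contains neither whitespace nor the comment
pp collector.trivia # but the callbacks have seen both
```

Calling super keeps the default SexpBuilder behaviour, so the resulting tree is the same as without the collector.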

To examine all events in the order they are actually fired you can use the event log that ships with Ripper2Ruby:


src = "1 + 1"
Ripper::EventLog.out(src)

will output:


@int                1
@sp                 " "
@op                 +
@sp                 " "
@int                1
binary
stmts_new
stmts_add
program
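
If you don’t have Ripper2Ruby at hand, something similar can be sketched with stock Ripper by defining a callback for every known event. EventRecorder below is a made-up stand-in, not the Ripper2Ruby implementation, and its output format differs slightly:

```ruby
require 'ripper'

# Hypothetical stand-in for an event log, built on stock Ripper only.
class EventRecorder < Ripper
  def events
    @events ||= []
  end

  # Scanner events receive the token text.
  Ripper::SCANNER_EVENTS.each do |name|
    define_method(:"on_#{name}") do |token|
      events << ["@#{name}", token]
      token
    end
  end

  # Parser events receive the results of inner events; only the name is logged.
  Ripper::PARSER_EVENTS.each do |name|
    define_method(:"on_#{name}") do |*args|
      events << [name.to_s]
      args.first
    end
  end
end

recorder = EventRecorder.new("1 + 1")
recorder.parse

recorder.events.each do |name, token|
  puts token ? "#{name.ljust(20)}#{token.inspect}" : name
end
```

Each callback returns its first argument, which mirrors what plain Ripper does when no callback is defined.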

I’m not an expert here but Ripper’s s-expressions and events seemed to make more sense to me than ParseTree’s stuff. Ripper still doesn’t seem to be completely consistent though.

E.g. for word lists (i.e. Arrays that are defined using %w() syntax) different events are fired depending on whether you use %w() or %W():

src = '%W(foo bar)'
pp Ripper::SexpBuilder.new(src).parse

outputs:


[:program,
 [:stmts_add,
  [:stmts_new],
  [:words_add,
   [:words_add,
    [:words_new],
    [:word_add, [:word_new], [:@tstring_content, "foo", [1, 3]]]],
   [:word_add, [:word_new], [:@tstring_content, "bar", [1, 7]]]]]]

But on the other hand:


src = '%w(foo bar)'
pp Ripper::SexpBuilder.new(src).parse

outputs:


[:program,
 [:stmts_add,
  [:stmts_new],
  [:qwords_add,
   [:qwords_add, [:qwords_new], [:@tstring_content, "foo", [1, 3]]],
   [:@tstring_content, "bar", [1, 7]]]]]

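As you can see, for qwords (i.e. the non-interpolating version) the word-level events seem to be missing: there is nothing corresponding to :word_add and :word_new, and the :@tstring_content tokens are passed to :qwords_add directly. I can’t see any good reason for this.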

Also, Ripper seems to get the method call operator wrong when you use "::":


src = "A::b()"
pp Ripper::SexpBuilder.new(src).parse

outputs:


[:program,
 [:stmts_add,
  [:stmts_new],
  [:method_add_arg,
   [:call,
    [:var_ref, [:@const, "A", [1, 0]]],
    :".",
    [:@ident, "b", [1, 3]]],
   [:arg_paren, nil]]]]

Note the :"." symbol, which should be a :"::" symbol.

In quite a few situations I’ve found the events ambiguous or not explicit enough. E.g. for the closing parenthesis in a word list like %w(foo bar) Ripper fires a :@tstring_end event, which is the same event it fires for the closing parenthesis of a String such as %(foobar).
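
You can see this for yourself with Ripper.lex, which returns the scanner events as a flat list. The closing_event helper is just a name made up for this sketch:

```ruby
require 'ripper'

# Hypothetical helper: returns the event name and text of the last token.
def closing_event(src)
  _position, event, token = Ripper.lex(src).last
  [event, token]
end

word_list_close = closing_event('%w(foo bar)')
string_close    = closing_event('%(foobar)')

p word_list_close # [:on_tstring_end, ")"]
p string_close    # [:on_tstring_end, ")"]
```

So from the closing token alone you can’t tell which kind of literal just ended; you have to remember which opening token you saw.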

It gets really weird when you try to build something from the events that Ripper fires for Heredocs, or even stacked Heredocs combined with method calls on the Heredoc opener token (maybe the weirdest Ruby construct anyway). In general though this stuff is fun to work with and quite obvious once you get the idea :)