Ruby Performance: Map-Join vs. Gsub
There are plenty of cases in my Rails applications when I’m sorting through data that I can’t sort with just a good database query (which is almost always the best option when possible). As such, I’ve become more interested in “algorithmic efficiency”. In my recent work on exercism.io I came across a programming challenge to do a simple string substituion. My solution at its core looked like this (DNA_TO_RNA
is a hash that maps one char to another):
dna_str.gsub(/[#{DNA_TO_RNA.keys.join}]/, DNA_TO_RNA)
…however, as I perused other students’ solutions, I noticed that a more common approach was this…
dna_str.each_char.map {|char| DNA_TO_RNA[char]}.join
I was curious about either of the two was a better performing algorithm. Truthfully I liked the readability of the latter solution, as my solution would require many Rubyists to look up String#gsub to see why/how this worked. In fact, I only discovered this technique while reading through the Ruby docs as part of this exercise.
Here is the benchmark that I ran:
Benchmark.ips do |x|
x.report('dna_convert_with_gsub') do |times|
i = 0
while i < times
Complement.of_dna('ACGTGGTCTTAA')
i += 1
end
end
x.report('dna_convert_with_map_join') do |times|
i = 0
while i < times
Complement.of_dna2('ACGTGGTCTTAA')
i += 1
end
end
x.compare!
end
…and here are the benchmark results:
Calculating -------------------------------------
dna_convert_with_gsub
6199 i/100ms
dna_convert_with_map_join
14988 i/100ms
-------------------------------------------------
dna_convert_with_gsub
70749.2 (±4.8%) i/s - 353343 in 5.006677s
dna_convert_with_map_join
197282.1 (±3.2%) i/s - 989208 in 5.019425s
Comparison:
dna_convert_with_map_join: 197282.1 i/s
dna_convert_with_gsub: 70749.2 i/s - 2.79x slower
I was somewhat expecting that gsub might be better than iterating over each character of the string, but alas…the map/join technique is noticeably faster. I didn’t try out various string lengths, so I imagine that could be a factor. I did run multiple benchmarks, and all produced nearly identical results. Of note…this was run using Ruby 2.1.4.