This post discusses an issue I ran into while building a text editor plugin that finds the class/function scope to which the current line in the editor belongs (http://atom.io/packages/ctags-status). The problem breaks into two parts: (i) given a set of possibly overlapping ranges on a one-dimensional line, find the ranges that cover a given point; (ii) given a set of overlapping ranges stacked in ascending order of their starting points (the higher in the stack, the later in the sequence), get the topmost range. Note that this is not a hard problem; this post simply documents how I encountered and worked on it.
So, here is the story.
When I built the early version of the plugin, I wanted to ship it as soon as possible and see whether anyone downloaded it (the Atom editor does not yet expose plugin usage data to authors, so the only number I have is the download count). Thus, there was not much of a thought process in those days.
The early implementation models each scope as a range with a start and end line. Finding the scope that the current line belongs to then becomes a range search problem. Ranges overlap when scopes are nested, and in that case the start and end lines of the inner scope are always enclosed by those of the outer scopes. So I can sort all scopes by start line in ascending order, and the innermost scope on the current line is the last one in the sequence whose line range encloses the current line. That is O(N log N) preprocessing plus an O(N) lookup. I was happy with it.
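The early approach can be sketched roughly like this (the names and scope shapes are illustrative, not the plugin's actual code): sort the scopes once, then scan for the last range that encloses the current line.

```javascript
// Sketch of the early approach: scopes sorted by start line, then a
// linear scan that keeps the LAST enclosing range, which is the
// innermost scope when scopes are properly nested.
function findScope(scopes, line) {
  let found = null;
  for (const scope of scopes) {
    if (scope.start > line) break;          // later scopes cannot match
    if (line <= scope.end) found = scope;   // this scope encloses the line
  }
  return found;
}

// O(N log N) preprocessing: sort by start line, ascending.
const scopes = [
  { name: 'ClassA',    start: 1,  end: 20 },
  { name: 'methodFoo', start: 3,  end: 10 },
  { name: 'methodBar', start: 12, end: 18 },
].sort((a, b) => a.start - b.start);

console.log(findScope(scopes, 5).name);     // innermost scope on line 5
```

Each cursor move triggers one O(N) scan like this, which is where the time went in a file with dozens of scopes.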
So far so good?
The issue did not surface until I used the plugin to browse a long source file with dozens of functions (yup, shouldn't such a file be split for readability?). After I kept moving the cursor down for a while, its movement was no longer smooth. The plugin has to find the scope on every cursor line change, and when I fired up the profiler, I found 300 - 400 ms being spent on scope finding during a run of dozens of consecutive cursor line changes. I was not sure the plugin was really the cause of the UX problem, but it was the part taking most of the processing time. So, time for optimization!
Since this is a range search problem, k-d trees, segment trees, and interval trees quickly came to mind. There are several factors to consider in picking a solution: (i) availability of an existing implementation (I don't like reinventing without enhancement), (ii) speed of insert / delete / update (when a source file is edited, there is a high chance that scopes move), and (iii) lookup speed, of course. While I was still deciding which search tree best fit the problem, I asked myself a question: why not simply hash the scope(s) on each line? A simple hash with a stack in each bucket is a good fit because:
(i) I just need a JavaScript object (hash) and arrays (stacks) to build it.
(ii) A typical source file has fewer than a thousand lines and a dozen or so scopes. The worst case is thousands of pointers (lines * scopes) referring to a dozen strings (scope names), which should not take much space.
(iii) A file edit can move many scopes at once (e.g. inserting a new line at the top of the file pushes every scope down), so maintaining a tree structure via insert / update / delete is, in the worst case, like rebuilding it. Building the whole hash takes O(NL), the number of scopes times the number of lines in the file, i.e. several thousand iterations. The build happens offline and I don't expect it to take long, so I am happy with that.
(iv) O(1) lookup, the best I can get.
As a result, the plugin now uses a hash for scope finding.
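The hash can be built along these lines (again an illustrative sketch, not the plugin's actual code): map every line number to a stack of the scopes covering it, pushing scopes in ascending start-line order so the innermost scope ends up on top.

```javascript
// Sketch of the hash-based approach: O(N * L) build, O(1) lookup.
// Each bucket is a stack; inner (later-starting) scopes land on top.
function buildScopeHash(scopes) {
  const hash = {};
  const sorted = [...scopes].sort((a, b) => a.start - b.start);
  for (const scope of sorted) {
    for (let line = scope.start; line <= scope.end; line++) {
      (hash[line] = hash[line] || []).push(scope.name);
    }
  }
  return hash;
}

const hash = buildScopeHash([
  { name: 'ClassA',    start: 1, end: 20 },
  { name: 'methodFoo', start: 3, end: 10 },
]);

// On each cursor line change, the topmost entry is the innermost scope.
const stack = hash[5] || [];
console.log(stack[stack.length - 1]);   // 'methodFoo'
```

The per-cursor-move cost drops to a single object lookup, and a file edit simply triggers a rebuild of the whole hash.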