
Diffmerge tool ranking code
That's quite a nice set of algorithms you implemented in just one week. I haven't looked at the code yet, so my points may be uninformed; I'm afraid I'm just going to be offering some opinions. I would propose going for n-grams + prefix bonus.

Regarding your computation "fear" (here we drift a little bit more into research ideas - you may skip this): the scoring and evaluation can also be done "post mortem", i.e., if we track which tokens the users enter and what they finally select from the proposal list, we get into the comfortable position of knowing what would have been the best solution. Think of it as a kind of feedback from the programmer about what was actually relevant. Say a programmer enters "gl" and triggers code completion. The system now proposes, say, 20 elements that match the regex "gl". She then selects "getLayoutData()" from the list of proposals. Now assume you have this kind of information for a few thousand completion events. With this data you can replay the completion events, try out several ranking strategies, and identify the one that performs best over all queries, i.e., the one that presents the actually selected proposal at the highest position most of the time. With such an approach it becomes quite easy to evaluate which algorithm works best, or whether we should use a weighted scoring function, etc. We just have to share the completion events.

But as I said, with this we would get more and more into research. Anyway, let's see how q-grams work out first. On the other hand, I'm sure that such a system would probably be the most advanced (context-free) completion system Eclipse has ever seen.
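The replay idea above can be sketched roughly as follows. This is a minimal illustration, not code from the plugin: `CompletionEvent`, `topOneAccuracy`, and the toy prefix scorer are all hypothetical names, and the scorer is a stand-in for whatever ranking strategy (n-grams, Levenshtein, ...) is being evaluated.

```java
import java.util.*;
import java.util.function.BiFunction;

// Sketch of the "post mortem" evaluation: replay recorded completion events
// (typed prefix, proposal list, proposal actually chosen) and count how often
// a given ranking strategy puts the chosen proposal at the top.
public class ReplayEvaluation {

    // A recorded completion event; names are illustrative, not from the code base.
    record CompletionEvent(String prefix, List<String> proposals, String selected) {}

    // Fraction of events where the strategy ranks the selected proposal first.
    static double topOneAccuracy(List<CompletionEvent> events,
                                 BiFunction<String, String, Double> score) {
        int hits = 0;
        for (CompletionEvent e : events) {
            List<String> ranked = new ArrayList<>(e.proposals());
            // Higher score = better match, so sort in descending score order.
            ranked.sort(Comparator.comparingDouble((String p) -> -score.apply(e.prefix(), p)));
            if (ranked.get(0).equals(e.selected())) hits++;
        }
        return (double) hits / events.size();
    }

    // Toy scoring function: length of the shared prefix (case-insensitive).
    static double commonPrefixScore(String query, String proposal) {
        String a = query.toLowerCase(), b = proposal.toLowerCase();
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public static void main(String[] args) {
        List<CompletionEvent> events = List.of(
            new CompletionEvent("ge",
                List.of("getLayoutData", "grabExcessSpace", "getText"),
                "getLayoutData"));
        System.out.println(topOneAccuracy(events, ReplayEvaluation::commonPrefixScore));
    }
}
```

Swapping in another `score` function is all it takes to compare strategies over the same recorded events.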

So I tried to find an extension of this algorithm that could weight results by the length of the common prefix. I could not find precisely what I wanted (if you know of one, please let me know), so I added a home-made calculation to get more precise results. Now the Levenshtein result is adjusted to take the common prefix into account. I don't really know if my calculation is entirely correct, but it is quite simple and for now it seems quite acceptable.

The second limitation is that Levenshtein does not really take subwords into account as we intend it.

Attachment: not_subwords_levenshtein.jpg

So I will try to find improvements, but let me know what you think about that.
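One possible shape for such a prefix-aware adjustment is sketched below. The bonus formula (half a point per shared leading character) is my own illustration, not the actual home-made calculation from the patch:

```java
// Sketch of a prefix-aware Levenshtein score: lower is better.
// The 0.5-per-character prefix bonus is an assumed, illustrative weight.
public class PrefixLevenshtein {

    // Classic dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Length of the shared prefix, ignoring case.
    static int commonPrefixLength(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length()
               && Character.toLowerCase(a.charAt(i)) == Character.toLowerCase(b.charAt(i))) i++;
        return i;
    }

    // Raw edit distance, reduced by a bonus for each shared leading character,
    // so proposals sharing a prefix with the query rank higher.
    static double score(String query, String proposal) {
        return levenshtein(query, proposal) - 0.5 * commonPrefixLength(query, proposal);
    }

    public static void main(String[] args) {
        // "getText" shares the prefix "getTe" with the query; "setText" shares nothing.
        System.out.println(score("getTe", "getText"));  // lower (better)
        System.out.println(score("getTe", "setText"));
    }
}
```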

I worked mainly on Levenshtein's since it seems to be cleaner for what we want (at least for now). After some tests, I found some limitations in our current usage.

The first limitation is that it lacks a way to weight results by the common prefix they share: tell me if I'm wrong, but we want to prioritize proposals that share a common prefix with the query, right? Here is what I get on a simple example: as you can see, "equalsIgnoreCase" is thrown to the bottom of the list due to its length.

Attachment: Subwords_default_completion.jpg
Attachment: Subwords_levenshtein_completion.jpg

I am still running some tests to see how these algorithms behave with upper/lower-case characters and whether they entirely take "subwords" into account as we need. I can provide the sources if you want to check for yourself. As the images show, I tested on a TextArea object, which holds a lot of items. More tests have to be done to check for possible limitations in other use cases, but I can already say that I did not notice any performance loss in displaying proposals.

I think we will have to implement an extra proposal sorter extension to keep the original presentation (first methods, then fields); it is quite frustrating when using completion otherwise.
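The "equalsIgnoreCase" effect is easy to reproduce with a plain Levenshtein distance: every character a long identifier has beyond the query costs one edit, so a completely unrelated but short name can outrank the method the user probably wants. The identifiers below are just illustrative:

```java
// Demonstrates the length penalty of raw Levenshtein distance on long identifiers.
public class LengthPenaltyDemo {

    // Classic dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String query = "equals";
        System.out.println(levenshtein(query, "equals"));           // 0: exact match
        System.out.println(levenshtein(query, "toString"));         // 8: unrelated, but short
        System.out.println(levenshtein(query, "equalsIgnoreCase")); // 10: relevant, but long
    }
}
```

With raw distances, "toString" (8 edits) ranks above "equalsIgnoreCase" (10 edits), which is exactly the problem the screenshots show.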

Well, I just finished implementing a "benchmark plugin" to compare the efficiency of proposal ranking algorithms. Here is what I did: besides the default subwords proposals, I added two other engines that order proposals by relevance using, respectively, the Jaro-Winkler and Levenshtein algorithms (which I coded from scratch, as required by the EPL, helped by descriptions found on the web). I made some screenshots to illustrate all of this.

As you can see, the Levenshtein algorithm is very interesting for sorting items by accuracy. In some cases the Jaro algorithm can be confusing (it ends up with jumbled, less understandable results). Thus, even though both algorithms compare strings letter by letter, Levenshtein's seems to favor matching "subwords" as we intend them (successions of similar letters), while Jaro's simply counts matching letters over the whole strings.

One major problem raised by this feature is that it mixes method and field proposals.
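A rough sketch of such a side-by-side comparison is below, with deliberately simplified scorers: a plain edit distance and a crude "matching letters anywhere" count standing in for the real Levenshtein and Jaro-Winkler engines. Neither is the plugin's actual implementation; the point is only the interchangeable-engine structure.

```java
import java.util.*;
import java.util.function.BiFunction;

// Sketch of a benchmark: rank the same proposal list with two interchangeable
// scoring engines and compare the resulting orderings.
public class RankingBenchmark {

    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Engine 1: edit distance (negated so that higher score = more relevant).
    static double editDistanceScore(String q, String p) {
        return -levenshtein(q.toLowerCase(), p.toLowerCase());
    }

    // Engine 2: count query letters found anywhere in the proposal, in the
    // spirit of counting matches over whole strings (NOT real Jaro-Winkler).
    static double sharedLetterScore(String q, String p) {
        List<Character> pool = new ArrayList<>();
        for (char c : p.toLowerCase().toCharArray()) pool.add(c);
        int matches = 0;
        for (char c : q.toLowerCase().toCharArray())
            if (pool.remove((Character) c)) matches++;
        return matches;
    }

    // Return proposals sorted by descending score for the given query.
    static List<String> rank(List<String> proposals, String query,
                             BiFunction<String, String, Double> score) {
        List<String> out = new ArrayList<>(proposals);
        out.sort(Comparator.comparingDouble((String p) -> -score.apply(query, p)));
        return out;
    }

    public static void main(String[] args) {
        List<String> proposals = List.of("getText", "grabExcessSpace", "toString");
        System.out.println(rank(proposals, "getTe", RankingBenchmark::editDistanceScore));
        System.out.println(rank(proposals, "getTe", RankingBenchmark::sharedLetterScore));
    }
}
```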
