Scarab has a nifty feature called "Duplicate Checking." The Issue entry process is broken up into a well-defined set of steps, which lets Scarab search the database for pre-existing Issues that match the one being entered. The primary goal is to help prevent duplicate Issues from being entered into the database. Of course, it is also possible to skip the duplicate checking step and enter the Issue directly. The steps involved are:
The feature works as follows: Attributes are assigned to a Module, and a certain set of those Attributes is marked as being available on the first screen.
Duplicate checking uses the Lucene Search Engine to index any free-form text entry Attributes (like 'Summary') and then applies a set of logic like this:
For example, suppose the first duplicate checking page has the following Attributes: Summary (text), Platform (select menu), and OS (select menu). The user sets the values for these Attributes to "Nasty bug in Scarab", "All", and "Windows 3.1". The search would be spelled out like this: ("Nasty" OR "bug" OR "Scarab") AND (All) AND (Windows 3.1). We left out "in" because Lucene would most likely not index it by default: it is a very common English word (a "stop word").
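The way the example query above is assembled can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Scarab's actual code: the class name DuplicateQuerySketch, its methods, and the small stop-word list are all hypothetical, and the real search goes through Lucene rather than plain string building.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateQuerySketch {

    // Hypothetical stop-word list for illustration only;
    // Lucene's analyzers define their own.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    /** ORs together the words of a free-form text value, dropping stop words. */
    public static String textClause(String text) {
        List<String> terms = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase())) {
                terms.add("\"" + word + "\"");
            }
        }
        return "(" + String.join(" OR ", terms) + ")";
    }

    /** ANDs the text clause with one clause per select-menu value. */
    public static String buildQuery(String summary, String... selectValues) {
        StringBuilder query = new StringBuilder(textClause(summary));
        for (String value : selectValues) {
            query.append(" AND (").append(value).append(")");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // prints ("Nasty" OR "bug" OR "Scarab") AND (All) AND (Windows 3.1)
        System.out.println(buildQuery("Nasty bug in Scarab", "All", "Windows 3.1"));
    }
}
```

The words of the text Attribute are combined with OR so that an Issue matching any of them is found, while each select menu value is combined with AND so that it must match.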
When Lucene returns the results, which are shown on the second page, it orders them from top to bottom by the weight given to each match. In other words, Issues that match the query exactly appear at the top, and Issues that are only a fuzzy match are closer to the bottom.
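The ordering described above amounts to sorting the hits by descending weight. Lucene returns its results ranked this way on its own, so the following is only a sketch of the idea; the Hit class, its fields, and the Issue ids in the usage line are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RankedResultsSketch {

    /** Hypothetical holder for one search hit: an Issue id and its match weight. */
    public static final class Hit {
        public final String issueId;
        public final double score;

        public Hit(String issueId, double score) {
            this.issueId = issueId;
            this.score = score;
        }
    }

    /** Returns a copy of the hits ordered from the strongest match to the weakest. */
    public static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return sorted;
    }
}
```

For example, ranking hits with weights 0.4, 1.0, and 0.7 puts the Issue with weight 1.0 first and the one with weight 0.4 last.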
NOTE: The definition of how things are searched is still up in the air. It is currently hard-coded in the Java source code but can be modified fairly easily. Once we have larger sets of data, we will work to refine the search definition in order to reduce the number of false positives.