Lucene is an open-source Java full-text search library that makes it easy to add search functionality to an application or website. After looking into the best way to handle the work I started last week, I found that it is the best tool for going through the SNOMED CT files.
The files were made to be loaded straight into a database. Because we do not plan on doing that, indexing seems to be the next best thing. Lucene lets me index the files so that I can quickly search for the information I need for my reference terms, which cuts down the time it takes to process the SNOMED files. For each line in a file, I split it on the tab delimiters and tell Lucene what each column is, so that I can search on the data in that column. For instance, the sct2_Relationship_Full_INT_20130131 file has sourceId and destinationId columns that I need to search on, so I put each of them into a StringField:
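// Assuming fileFields is the String[] from splitting the line on tabs,
// where columns 4 and 5 hold sourceId and destinationId.
doc.add(new StringField("sourceId", fileFields[4], Field.Store.YES));
doc.add(new StringField("destinationId", fileFields[5], Field.Store.YES));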
This lets me tell Lucene that I want to find termId “1234” in the sourceId field. Because a StringField is indexed as a single untokenized value, the match is exact.
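The splitting step that comes before the fields are added could look roughly like the sketch below. It uses only the Java standard library. The class name RelationshipLineParser, the parseLine helper and the sample values are made up for illustration, and the column order is assumed to follow the header line of the relationship file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelationshipLineParser {

    // Splits one tab-delimited line and pairs each value with its column name.
    static Map<String, String> parseLine(String[] header, String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] fileFields = line.split("\t", -1);
        if (fileFields.length != header.length) {
            throw new IllegalArgumentException(
                "Expected " + header.length + " columns but found " + fileFields.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            row.put(header[i], fileFields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        // Assumed header of the relationship file; check it against the real first line.
        String[] header = ("id\teffectiveTime\tactive\tmoduleId\tsourceId\tdestinationId"
            + "\trelationshipGroup\ttypeId\tcharacteristicTypeId\tmodifierId").split("\t");
        // Made-up values, not real SNOMED CT data.
        String line = "1\t20130131\t1\t2\t1234\t5678\t0\t9\t10\t11";
        Map<String, String> row = parseLine(header, line);
        System.out.println(row.get("sourceId") + " -> " + row.get("destinationId"));
    }
}
```

Looking columns up by name rather than by position makes it harder to index the wrong column by accident if a file turns out to have a different layout.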
What the structure looks like:
Index –> Document1, Document2, etc… –> Field1, Field2, etc…
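To make that structure concrete, here is a minimal sketch that builds a small in-memory index with one Document per line and then searches the sourceId field for an exact value. It is not the code I actually ran. It assumes Lucene 8.x or 9.x on the classpath (ByteBuffersDirectory and searcher.doc are version dependent), and the class name, the helper names, the column positions 4 and 5 and the sample lines are all made up for illustration.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RelationshipIndexSketch {

    // Index: one Document per line, each with a sourceId and a destinationId Field.
    static Directory buildIndex(List<String> lines) throws IOException {
        Directory dir = new ByteBuffersDirectory(); // in-memory, for the sketch only
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (String line : lines) {
                String[] fileFields = line.split("\t", -1);
                Document doc = new Document();
                // Assumed column positions: 4 = sourceId, 5 = destinationId.
                doc.add(new StringField("sourceId", fileFields[4], Field.Store.YES));
                doc.add(new StringField("destinationId", fileFields[5], Field.Store.YES));
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    // Returns the stored destinationId of every Document whose sourceId matches exactly.
    static List<String> destinationsFor(Directory dir, String sourceId) throws IOException {
        List<String> result = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TermQuery query = new TermQuery(new Term("sourceId", sourceId));
            // The limit of 100 hits is arbitrary for this sketch.
            for (ScoreDoc hit : searcher.search(query, 100).scoreDocs) {
                result.add(searcher.doc(hit.doc).get("destinationId"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Made-up lines in the assumed ten-column layout, not real SNOMED CT data.
        List<String> lines = Arrays.asList(
            "1\t20130131\t1\t2\t1234\t5678\t0\t9\t10\t11",
            "2\t20130131\t1\t2\t4321\t8765\t0\t9\t10\t11");
        Directory dir = buildIndex(lines);
        System.out.println(destinationsFor(dir, "1234"));
    }
}
```

Field.Store.YES is what allows destinationId to be read back from a hit. A field that is indexed but not stored can be searched on, but its value cannot be retrieved from the matching Document.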
Lucene was not hard to learn. But it is only an API: the hard part of indexing is done for you, but it is up to the user to figure out how to parse the file data into Lucene and how to get the data back out, which depends on how you put the data into the Documents.
Lucene made the search processing more than 10 times faster.