Learning to Use Lucene

http://lucene.apache.org/

Lucene is an open-source Java full-text search library that makes it easy to add search functionality to an application or website. After looking into the best way to handle the work I started last week, I found that it is the best tool for going through the SNOMED CT files.

The SNOMED CT files were designed to be loaded straight into a database. Because we do not plan on doing that, indexing them seems to be the next best thing. Lucene let me index the files so that I could quickly look up the information I needed for my reference terms, which cut down the time it took to process the SNOMED files.

For each line in a file, I split it on the tab delimiters and told Lucene what each column is, so that I could search on the data in that column. For instance, the sct2_Relationship_Full_INT_20130131 file has sourceId and destinationId columns that I need to search on, so I put each of them into a StringField:

// fileFields is one line of the file split on tabs; index 4 is sourceId, index 5 is destinationId
doc.add(new StringField("sourceId", fileFields[4], Field.Store.YES));
doc.add(new StringField("destinationId", fileFields[5], Field.Store.YES));
This lets me ask Lucene to find termId "1234" in the sourceId field. Because a StringField is indexed as a single, untokenized value, the match is exact.
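The lines above hard-code column positions 4 and 5. Since each RF2 file starts with a header row naming its columns, one option is to look positions up by name instead. The sketch below is a hypothetical helper (the class `Rf2Columns` and its methods are my own illustration, not part of Lucene), and the data row is made up; only the column names come from the relationship file's header.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): maps RF2 column names to positions
// using the header row, so data rows can be read by column name instead of by index.
public class Rf2Columns {
    private final Map<String, Integer> indexByName = new HashMap<>();

    public Rf2Columns(String headerLine) {
        // RF2 files are tab-delimited and the first line names the columns.
        String[] names = headerLine.split("\t");
        for (int i = 0; i < names.length; i++) {
            indexByName.put(names[i], i);
        }
    }

    // Returns the value in the named column of an already-split data row.
    public String get(String[] row, String columnName) {
        Integer i = indexByName.get(columnName);
        if (i == null) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return row[i];
    }

    public static void main(String[] args) {
        String header = "id\teffectiveTime\tactive\tmoduleId\tsourceId\tdestinationId"
                + "\trelationshipGroup\ttypeId\tcharacteristicTypeId\tmodifierId";
        // Made-up data row, for illustration only.
        String line = "1\t20130131\t1\tm1\t1234\t5678\t0\tt1\tc1\tmod1";
        Rf2Columns cols = new Rf2Columns(header);
        String[] row = line.split("\t", -1); // -1 keeps trailing empty columns
        System.out.println(cols.get(row, "sourceId") + " -> " + cols.get(row, "destinationId"));
    }
}
```

With this, the indexing code could call `cols.get(fileFields, "sourceId")` rather than relying on `fileFields[4]`, which keeps working if a file's column order ever differs.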

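The structure looks like this:
Index -> Document1, Document2, etc. -> Field1, Field2, etc.
In my case, each line of a SNOMED file becomes one Document, and each column I need becomes a Field in that Document.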
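To make the Index -> Document -> Field structure concrete, here is a minimal end-to-end sketch that indexes relationship lines and then finds every destinationId for a given sourceId with a `TermQuery`. It assumes a recent Lucene 9.x (9.5 or later, for `storedFields()`); older releases differ in a few of these calls. The class and method names are hypothetical, the rows are made up, and the index lives in memory (`ByteBuffersDirectory`) purely for the demo.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RelationshipIndex {

    // Index: one Document per relationship line, one Field per column we search on.
    static void index(Directory dir, List<String> lines) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String line : lines) {
                String[] fileFields = line.split("\t", -1);
                Document doc = new Document();
                // StringField: indexed as one untokenized value, so lookups are exact.
                doc.add(new StringField("sourceId", fileFields[4], Field.Store.YES));
                doc.add(new StringField("destinationId", fileFields[5], Field.Store.YES));
                writer.addDocument(doc);
            }
        } // closing the writer commits the documents
    }

    // Search: destinationIds of up to maxHits relationships with exactly this sourceId.
    static List<String> destinationsOf(Directory dir, String sourceId, int maxHits) throws Exception {
        List<String> result = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("sourceId", sourceId)), maxHits);
            for (ScoreDoc hit : hits.scoreDocs) {
                // Read back the stored field values for this hit.
                Document doc = reader.storedFields().document(hit.doc);
                result.add(doc.get("destinationId"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up, tab-delimited rows in the relationship file's column order.
        List<String> rows = List.of(
                "1\t20130131\t1\tm1\t1234\t5678\t0\tt1\tc1\tmod1",
                "2\t20130131\t1\tm1\t1234\t9012\t0\tt1\tc1\tmod1",
                "3\t20130131\t1\tm1\t5678\t3456\t0\tt1\tc1\tmod1");
        // In-memory index for the demo; FSDirectory.open(path) would keep it on disk.
        try (Directory dir = new ByteBuffersDirectory()) {
            index(dir, rows);
            System.out.println(destinationsOf(dir, "1234", 100));
        }
    }
}
```

Note that `search(query, n)` returns at most `n` hits, so `maxHits` should be set high enough for the concept with the most relationships.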

Here is a great description of what Lucene is and how to use it: http://www.darksleep.com/lucene/
Here is a tutorial with code: http://www.lucenetutorial.com/

Lucene was not hard to learn. It is only an API, though: it does the hard part of indexing for you, but it is up to you to figure out how to parse the file data into Lucene and how to get the data back out, and the second depends on how you put the data into the Documents.
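One example of how the way data goes into a Document shapes how it comes back out: the same text indexed as a `StringField` and as a `TextField` answers queries very differently. The sketch below assumes Lucene 9.x; the field names `termExact` and `termText` and the description string are hypothetical, chosen only for illustration.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FieldTypeDemo {

    // Counts documents whose field contains exactly this indexed term.
    // TermQuery does not analyze its input, so the text must match the indexed token.
    static int count(Directory dir, String field, String text) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(new TermQuery(new Term(field, text)));
        }
    }

    // Indexes the same description text two ways in one Document.
    static Directory buildIndex(String description) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // Hypothetical field names: one untokenized copy, one analyzed copy.
            doc.add(new StringField("termExact", description, Field.Store.YES));
            doc.add(new TextField("termText", description, Field.Store.YES));
            writer.addDocument(doc);
        }
        return dir;
    }

    public static void main(String[] args) throws Exception {
        try (Directory dir = buildIndex("Myocardial infarction")) {
            System.out.println(count(dir, "termExact", "Myocardial infarction")); // prints 1
            System.out.println(count(dir, "termExact", "infarction"));            // prints 0
            System.out.println(count(dir, "termText", "infarction"));             // prints 1
            System.out.println(count(dir, "termText", "Myocardial"));             // prints 0
        }
    }
}
```

The `StringField` copy only matches the whole original value, which suits IDs like sourceId. `StandardAnalyzer` splits the `TextField` copy into lowercased words, so it matches "infarction" but not "Myocardial" as typed, because a `TermQuery` is not run through the analyzer.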

Using Lucene made the search processing more than 10 times faster.
