lucene

Zend Lucene And PDF Documents Part 5: Conclusion

17th November 2009 - 5 minutes read time

If you have been following the last four posts, you should now have an application that lets you view and edit PDF metadata, extracts the document contents for search indexing, and lets users search that index.

The one final thing to do is to sort out what happens when any PDF metadata is changed. At the moment the application will allow us to change the metadata as much as we like, but these changes will not be replicated in our search index. To pick them up we currently have to fully re-index everything. This is obviously the wrong way to go about things, and the solution is quite simple. All we need to do is open the file controllers/PdfController.php and change the editmetaAction() method so that when the PDF metadata is saved, the search index is updated. Add the following code to the editmetaAction() method, just before the redirect.
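As a rough sketch of what such an update could look like (this is not the original listing): Zend_Search_Lucene cannot modify a document in place, so a changed PDF has to be deleted from the index and added again. The helper name replaceIndexedDocument() and the idea that each PDF is stored under a unique "key" field are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical helper, not part of Zend Framework or of this series.
 *
 * $index    is expected to behave like the object returned by
 *           Zend_Search_Lucene::open(), i.e. it offers find(), delete(),
 *           addDocument() and commit().
 * $query    selects the stale copy, for example 'key:' . $key, where
 *           "key" is an assumed keyword field identifying one PDF.
 * $document is the rebuilt Zend_Search_Lucene_Document for that PDF.
 */
function replaceIndexedDocument($index, $query, $document)
{
    // Remove every indexed copy that matches the query.
    foreach ($index->find($query) as $hit) {
        $index->delete($hit->id);
    }

    // Add the fresh copy and write the change to the index.
    $index->addDocument($document);
    $index->commit();
}
```

In editmetaAction() a call like this would sit after the metadata has been saved and before the redirect, so only the one changed document is touched and a full re-index is not needed.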

Zend Lucene And PDF Documents Part 3: Indexing The Documents

5th November 2009 - 14 minutes read time

Last time we reached the stage where we had PDF metadata and the extracted contents of PDF documents ready to be fed into our search indexing classes so that we can search them.

The first thing needed is a couple of configuration options. These control where our Lucene index and the PDF files to be indexed will be kept. Add the following options to your configuration file (called application.ini if you used Zend Tool to create your application).

  luceneIndex = \path\to\lucene\index
  filesDirectory = \path\to\pdf\files\