Zend Framework
Using The Zend Framework FlashMessenger
Sun, 08/14/2011 - 20:21 | by philipnorton42
The FlashMessenger in Zend Framework has a slightly odd name, as it has nothing to do with Adobe Flash. It is a controller action helper, defined in the class Zend_Controller_Action_Helper_FlashMessenger, that stores and retrieves messages held temporarily in the user's session. This is useful if you want to combine form validation with redirection, because you can print out the messages after the next page has loaded. If you are familiar with Drupal then this class acts in much the same way as the drupal_set_message() function.
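To make the store-then-redirect-then-display idea concrete, here is a minimal plain PHP sketch of the pattern. It is not the Zend class: SimpleFlashMessenger and the $session array (standing in for $_SESSION) are hypothetical names used only for illustration, and the real helper has more nuanced rules about when messages become visible. In a Zend Framework 1 controller the real helper is normally reached through $this->_helper->flashMessenger, using its addMessage() and getMessages() methods.

```php
<?php
// Illustrative stand-in for the flash-message pattern; NOT the Zend class.
// SimpleFlashMessenger and $session are hypothetical names.
class SimpleFlashMessenger
{
    /** @var array Reference to the session-like storage. */
    private $storage;

    public function __construct(array &$storage)
    {
        // Bind to the caller's array so changes persist between "requests".
        $this->storage = &$storage;
        if (!isset($this->storage['flash'])) {
            $this->storage['flash'] = array();
        }
    }

    // Queue a message so that a later request can display it.
    public function addMessage($message)
    {
        $this->storage['flash'][] = $message;
    }

    // Return the queued messages and clear them, so each is shown only once.
    public function getMessages()
    {
        $messages = $this->storage['flash'];
        $this->storage['flash'] = array();
        return $messages;
    }
}

// Request 1: validation fails, so store a message and then redirect.
$session = array(); // stands in for $_SESSION
$flash = new SimpleFlashMessenger($session);
$flash->addMessage('Please enter a valid email address.');

// Request 2: the page we redirected to reads the messages once.
$flash = new SimpleFlashMessenger($session);
foreach ($flash->getMessages() as $message) {
    echo $message, PHP_EOL;
}
```

Because getMessages() empties the queue, reloading the second page shows nothing, which is the behaviour that makes flash messages suitable for one-off form feedback.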
Beginning Zend Framework by Armando Padilla
Sun, 04/24/2011 - 17:34 | by philipnorton42
I was lucky enough to pick up a couple of free books from the recent PHP Unconference Europe, one of which was Beginning Zend Framework by Armando Padilla. Having not looked into Zend Framework for a while, I thought I would read the book to refresh my knowledge, catch up, and post a review.
The premise of the book was to create a sample application to keep track of music artist information, with each chapter building on the code from the previous. The first few chapters are about installing Apache, PHP and MySQL and some UML diagrams of the application that will be built. After reading this I was actually enthusiastic about the application and couldn't wait to get started.
Zend Lucene And PDF Documents Part 5: Conclusion
Tue, 11/17/2009 - 14:44 | by philipnorton42
If you have been following the last four posts you should now have an application that will allow you to view and edit PDF metadata, extract the document contents for search indexing, and allow users to search that index.
The one final thing to do is to sort out what happens when any PDF metadata is changed. At the moment the application will allow us to change the metadata as much as we like, but these changes will not be replicated in our search index. To pick them up we currently have to fully re-index everything. This is obviously the wrong way to go about things, and the solution is quite simple. All we need to do is open the file controllers/PdfController.php and change the editmetaAction() method so that when the PDF metadata is saved, the search index is updated. Add the following code to the editmetaAction() method, just before the redirect.
Zend Lucene And PDF Documents Part 4: Searching
Mon, 11/16/2009 - 11:35 | by philipnorton42
Last time we had indexed our PDF documents and were ready to add a search form to our application. Adding search requires two things: the form to enter the search terms into, and an action to control what happens when the form is submitted.
Zend Lucene And PDF Documents Part 3: Indexing The Documents
Thu, 11/05/2009 - 10:03 | by philipnorton42
Last time we had reached the stage where we had PDF metadata and the extracted contents of PDF documents ready to be fed into our search indexing classes so that we can search them.
The first thing needed is a couple of configuration options. These control where our Lucene index and the PDF files to be indexed will be kept. Add the following options to your configuration file (called application.ini if you used Zend Tool to create your application).
luceneIndex = \path\to\lucene\index
filesDirectory = \path\to\pdf\files\
Zend Lucene And PDF Documents Part 2: PDF Data Extraction
Mon, 10/26/2009 - 23:33 | by philipnorton42
Last time we looked at viewing and saving metadata to PDF documents using Zend Framework. The next step before we try to index them with Zend Lucene is to extract the data out of the documents themselves. I should note here that we can't extract the data perfectly from every PDF document, and we certainly can't turn any images or tables in the PDF into recognisable text. There is a little issue with extracting the text because we are essentially looking at compressed data. The text isn't saved into the document, it is rendered into the document using a font. So what we need to do is extract this data into some format that Lucene can tokenize.
Because we are just getting the text out of the document for our search index we can take a few short-cuts in order to get as much textual data out of the document as possible. All of this data might not be fully readable and we will definitely lose any formatting and images, but for our purposes we don't really need them. The idea is to retrieve as much relevant and indexable content as possible for Zend Lucene to tokenize. Also, it is not possible to extract the data out of encrypted PDF documents.
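As a rough illustration of the kind of short-cut described above, the sketch below decompresses each content stream and pulls out the strings drawn with the Tj text operator. This is an assumption about one workable approach, not necessarily the code used in this series: extractPdfText() is a hypothetical helper name, it only handles Flate-compressed (zlib) streams and simple literal strings, and it ignores font encodings, TJ arrays and hex strings, so real documents will often yield only partial text.

```php
<?php
// Hypothetical helper: pull text out of Flate-compressed PDF content streams.
// A sketch for search indexing only; it makes no attempt to keep formatting.
function extractPdfText($pdfData)
{
    $text = '';

    // Grab the raw bytes between each "stream" / "endstream" pair.
    if (!preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdfData, $streams)) {
        return $text;
    }

    foreach ($streams[1] as $raw) {
        // Assume FlateDecode (zlib) compression; skip streams that fail,
        // such as images or data using another filter.
        $decoded = @gzuncompress($raw);
        if ($decoded === false) {
            continue;
        }

        // Simple text is drawn with the Tj operator: (some text) Tj
        if (preg_match_all('/\((.*?)\)\s*Tj/s', $decoded, $strings)) {
            $text .= implode(' ', $strings[1]) . ' ';
        }
    }

    return trim($text);
}

// Usage with a tiny hand-built stream rather than a real PDF file.
$fakePdf = "5 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    . gzcompress("BT (Hello World) Tj ET")
    . "\nendstream\nendobj";

echo extractPdfText($fakePdf), PHP_EOL; // prints "Hello World"
```

Streams that cannot be decompressed are skipped, which matches the goal of collecting as much indexable text as possible instead of failing on the first unreadable object.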