I explained the Sahana project in my previous blog post. Today I'm going to describe the Sahana OCR project, which is a sub-module of the main Sahana project. You can refer to my research paper about the extension I did for this project.
The Sahana OCR project was created to automate data entry into the system, and it was developed on the .NET framework using Visual C++. It was built by many volunteers as well as GSoC contributors such as Omega and Gihan over the past two years. I selected this project as my level 3 project and made some improvements to the existing system to improve the module's character recognition.
After a disaster we have to collect data about the victims and manage it using Sahana, as I mentioned in my earlier post. We can download the form template from the Sahana server, print the forms as hard copies, distribute them to the victims, ask them to fill in the forms, and collect the forms back. Then that data has to be entered manually into the Sahana system. In a large disaster, however, manual data entry is difficult and very time consuming. So the Sahana OCR project was developed to automate that data entry process using optical character recognition (OCR) technology.
This is a sample of the forms distributed to the victims; you can refer to the wiki about this module to get a better understanding of the forms and the xforms. sahana_xform
So here we have to provide the Sahana OCR module with the scanned image of a filled form and an XML file called the xform, which holds the coordinates of the data fields in the form template. The module then sends the two files to the Form_processor, which is the manager class of the system. First the Form_processor applies rotation compensation to the image; the system uses five black squares on the form to identify the form's correct orientation. Then it validates the form image against the xform, checking whether the xform coordinates actually match the form image. If they do not, the system reports an error.
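The rotation-compensation step can be sketched as follows. This is a minimal, hypothetical illustration (not the project's actual code, and the function name is my own): given the detected centroids of two of the black reference squares that should lie on the same horizontal line, the skew angle is simply the angle of the line through them.

```cpp
#include <cmath>

// Hypothetical sketch: estimate the form's skew angle from two
// reference-square centroids that should be horizontally aligned.
// Returns the rotation (in degrees) needed to deskew the image.
double skewAngleDegrees(double x1, double y1, double x2, double y2) {
    const double kPi = 3.14159265358979323846;
    // Angle of the line through the two centroids relative to horizontal.
    double angle = std::atan2(y2 - y1, x2 - x1) * 180.0 / kPi;
    // Rotating by the negative of this angle levels the line.
    return -angle;
}
```

In the real module the image itself would then be rotated by this angle before any further processing.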
After validation the image is sent to the image processor, which finds the input areas of the form, then the data fields within them, and finally the letter boxes, according to the xform coordinates. The data fields, input areas and letter boxes are shown in the image above.
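To make that hierarchy concrete, here is a simplified, hypothetical sketch (my own names, not the project's code) of how letter boxes could be derived from a data field: given the field's bounding rectangle from the xform and its number of letter boxes, each box is an equal-width slice of the field.

```cpp
#include <vector>

// Hypothetical sketch of the xform geometry: a data field's bounding
// rectangle is split into equal-width letter boxes, one per character.
struct Rect {
    int x, y, width, height;
};

std::vector<Rect> splitIntoLetterBoxes(const Rect& field, int boxCount) {
    std::vector<Rect> boxes;
    int boxWidth = field.width / boxCount;
    for (int i = 0; i < boxCount; ++i) {
        boxes.push_back({field.x + i * boxWidth, field.y, boxWidth, field.height});
    }
    return boxes;
}
```

Each resulting rectangle can then be cropped from the scanned image and passed on for recognition.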
The image processor was developed using the OpenCV libraries, and it worked very well in this project. You can find more about the OpenCV libraries by referring here.
Those letter boxes are then sent to the Ocr class, which recognizes the character in each letter box. The initial Ocr class was developed using the FANN neural network libraries, and recognition accuracy was very poor because of the lack of training of the neural network. That is the point where I contributed to this project. After discussing with the previous mentors and developers of this project, I used the Tesseract OCR engine, which is maintained by Google, to replace the FANN neural network.
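Swapping one recognition engine for another is easiest when recognition sits behind a small interface. Here is a hypothetical sketch of that design (the class names are my own, not the project's): the rest of the pipeline talks only to a `CharacterRecognizer` interface, so a Tesseract-backed implementation can replace a FANN-backed one without touching the Form_processor. The stub below stands in for a real engine.

```cpp
#include <string>

// Hypothetical interface sketch: the pipeline depends only on this,
// so a FANN-backed or Tesseract-backed engine can be swapped freely.
class CharacterRecognizer {
public:
    virtual ~CharacterRecognizer() = default;
    // Recognize the single character in a letter-box image. For this
    // illustration the "image" is stubbed as a string key.
    virtual char recognize(const std::string& letterBoxImage) = 0;
};

// Stub standing in for a real Tesseract-backed implementation.
class StubRecognizer : public CharacterRecognizer {
public:
    char recognize(const std::string& letterBoxImage) override {
        // A real engine would run OCR here; the stub echoes the first byte.
        return letterBoxImage.empty() ? '?' : letterBoxImage[0];
    }
};
```

With this shape, changing engines is a one-line change at the point where the recognizer is constructed.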
You can get more information about Tesseract from here.
I did some testing of Tesseract with handwritten characters to check whether it was suitable for our module. The initial recognition accuracy with the FANN library was below 30%; with Tesseract it increased to more than 80%, and across the different data sets I tested, the worst case was still around 80%. I realized that it is a very powerful tool and selected it for the project. Then I customized Tesseract to meet our expectations and changed the Form_processor to make it compatible with Tesseract.
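Those accuracy figures come from comparing recognized characters against known ground truth. A minimal sketch of that measurement (my own illustration, not the project's actual test harness):

```cpp
#include <algorithm>
#include <string>

// Hypothetical sketch: per-character recognition accuracy as a
// percentage, comparing OCR output to the ground-truth string.
double accuracyPercent(const std::string& recognized, const std::string& truth) {
    if (truth.empty()) return 0.0;
    std::size_t correct = 0;
    std::size_t n = std::min(recognized.size(), truth.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (recognized[i] == truth[i]) ++correct;
    }
    return 100.0 * correct / truth.size();
}
```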
Finally the recognized letters are consolidated into strings and assigned to the appropriate labels such as Name, Address and Date of Birth. I then developed a GUI for the module that gives a graphical view of this process. The results are displayed in the GUI's results tab, and users can save the data to a file by submitting it.
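The consolidation step can be sketched like this (a simplified illustration with hypothetical names): the characters recognized for each field are joined into a single string and stored against the field's label.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: join each field's recognized characters into a
// string and map it to the field's label (Name, Address, ...).
std::map<std::string, std::string> consolidate(
    const std::map<std::string, std::vector<char>>& recognizedByField) {
    std::map<std::string, std::string> result;
    for (const auto& entry : recognizedByField) {
        result[entry.first] = std::string(entry.second.begin(), entry.second.end());
    }
    return result;
}
```

The resulting label-to-string map is what the GUI displays and what gets written out on submission.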
This was the greatest project I have been involved in so far, because it gave me a lot of knowledge and it was a great opportunity to make contacts with some interesting people involved in the project. I intend to keep working on it, to push the accuracy up to at least 97% and to extend the rest of the project, for example by uploading the data directly to the server. Here are some more ideas I have for further work on this project:
· Train Tesseract further on handwritten characters.
· Increase the accuracy by reading the labels. Currently the OCR only reads the letters in the letter boxes and outputs the result, so there are conflicts when recognizing "0" and "O" (the first is the digit zero, the second the letter O). If we read the label, we can apply logic such as: letter boxes under the Name label cannot contain the digit zero, and letter boxes under the Date of Birth label cannot contain the letter O. In that way, label reading can further improve the accuracy.
· Embed the data in the URL given to each form. That may make it easier to transfer data from one place to another.
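The label-based idea in the second bullet can be sketched as a post-correction pass (a hypothetical illustration, not existing code): each label implies a field type, and ambiguous characters in the recognized text are corrected to match it.

```cpp
#include <string>

// Hypothetical sketch of label-aware post-correction: alphabetic fields
// such as Name cannot contain the digit '0', and numeric fields such as
// Date of Birth cannot contain the letter 'O'.
enum class FieldType { Alphabetic, Numeric };

std::string correctAmbiguous(std::string text, FieldType type) {
    for (char& c : text) {
        if (type == FieldType::Alphabetic && c == '0') c = 'O';
        if (type == FieldType::Numeric && c == 'O') c = '0';
    }
    return text;
}
```

The same pattern extends to other confusable pairs such as "1"/"I" and "5"/"S".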
So please try out this project, and try to come up with some innovative ideas to serve people in the world when they are in need.
You can refer to some more articles about this project here.