Sunday, February 28, 2010

Sahana OCR

I explained the Sahana project in my previous blog post. Today I'm going to explain the Sahana OCR project, which is a sub-module of the main Sahana project. You can refer to my research paper about the extension I did for this project.

The Sahana OCR project was created to automate data entry into the system, and it was developed on the .NET Framework using Visual C++. Over the past two years it has been developed by many volunteers as well as GSoC contributors such as Omega and Gihan. I selected this project as my level 3 project and made some improvements to the existing system to improve the module's character recognition.

After a disaster we have to collect data about the victims and manage it using Sahana, as I mentioned in my earlier post. We can download the form template from the Sahana server, print hard copies, distribute the forms to the victims, ask them to fill in the forms and collect them back. Then we have to enter that data into the Sahana system manually. But in a large disaster, manual data entry is difficult and very time consuming. So the Sahana OCR project was developed to automate that data entry process using optical character recognition (OCR) technology.

This is a sample of the forms distributed to the victims, and you can refer to the wiki about this module to get a better understanding of the forms and the xforms. (Image: sahana_xform)

Here we have to provide the Sahana OCR module with the scanned image of the filled form and an XML file called the xform, which holds the coordinates of the data fields in the form template. The module then sends the two files to the Form_processor, which is the manager class of the system. First the Form_processor applies rotation compensation to the image. The system uses five black squares on the form to identify the correct orientation of the form. Then it validates the form image against the xform, checking that the xform coordinates match the form image. If they do not, the system gives an error.

After validation, the image is sent to the image processor, which finds the input areas of the form, then the data fields within them, and finally the letter boxes, according to the xform coordinates. The data fields, input areas and letter boxes are shown in the image above.

The image processor was developed using the OpenCV libraries, which worked very well in this project. You can find out more about the OpenCV libraries by referring here.

Then those letter boxes are sent to the Ocr class, which recognizes the character in each letter box. The initial Ocr class was developed using the FANN neural network library, and the recognition accuracy was very poor because the neural network had not been trained enough. That is the point where I contributed to this project. After discussing with the previous mentors and developers of this project, I used the Tesseract OCR engine, a Google product, to replace the FANN neural network.

You can get information about Tesseract from here.

I did some testing with Tesseract on handwritten characters to check whether it was suitable for our module. The initial recognition accuracy with the FANN library was below 30%. With Tesseract it rose to 80% or more, with 80% being the worst case across the different data sets I tested. I realized that it is a very powerful tool and selected it for the project. Then I customized Tesseract to match our expectations and changed the Form_processor to make it compatible with Tesseract.

Finally the recognized letters are consolidated into strings and assigned to the appropriate labels such as Name, Address and Date of Birth. I then developed a GUI for the module which gives a graphical view of this process. The results are displayed in the GUI's results tab. It then allows users to save the data to a file by submitting it.

It was the greatest project I have been involved in so far, because it gave me a lot of knowledge and it was a great opportunity to make contact with some interesting personalities involved in the project. I plan to work on this project further, to bring the accuracy up to at least 97% and to complete the rest of the project, such as uploading the data directly to the server. Here are some more ideas I have for further work on this project.

· Training Tesseract further on handwritten characters.

· Increasing the accuracy by reading the labels. Currently the OCR only reads the letters in the letter boxes and outputs the result. In that case there are conflicts when recognizing “0” and “O”, where the first is the digit zero and the second is the letter O. If we read the label, we can apply the logic that the letter boxes under the Name label cannot contain the digit zero, and that the letter boxes under the Date of Birth label cannot contain the letter O. In that way, label reading can further improve the accuracy.

· Embedding the data in the URL given to each form. That may make it easier to transfer data from one place to another.
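
The label-reading idea can be sketched in a few lines of shell. This is only an illustration of the rule described in the list, not code from the Sahana OCR module: fix_field is a hypothetical helper, and the label names are the ones mentioned earlier in this post. Fields such as Address can hold both letters and digits, so the sketch leaves them untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sahana OCR): fix_field LABEL TEXT
# Resolves 0/O confusion using the label the letter boxes belong to.
fix_field() {
    label=$1
    text=$2
    case "$label" in
        "Name")
            # A name cannot contain digits, so a recognized zero must be the letter O.
            printf '%s\n' "$text" | tr '0' 'O'
            ;;
        "Date of Birth")
            # A date cannot contain letters, so a recognized O (or o) must be a zero.
            printf '%s\n' "$text" | tr 'Oo' '00'
            ;;
        *)
            # Other labels may mix letters and digits, so leave the text as it is.
            printf '%s\n' "$text"
            ;;
    esac
}

fix_field "Name" "J0HN"
fix_field "Date of Birth" "1O/O2/198O"
```

The same pattern could be extended to other look-alike pairs, such as 1 and l, once the label tells us which kind of character to expect.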

So please try this project and come up with some innovative ideas to serve people around the world when they are in need.

You can find some more articles about this project here.

Wednesday, February 24, 2010

How to link another folder to your www web folder.

In my last post I described the way to link your public_html folder to the www folder. Today I will describe a simple way to link a folder to your /var/www to make your web development work on Linux easier.
1. To do that, first make a folder in your favorite place using the mkdir command.
mkdir /home/abc/myFolder

2. Then go to your /var/www/ folder.
cd /var/www

3. Then link your folder to the www folder using the ln command.
sudo ln -s /home/abc/myFolder

This creates a link in the www folder, and you can see it there under the same name as the folder you linked.
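
If you want to try the linking step without touching the real /var/www, here is a small sketch that does the same thing inside a scratch directory. The scratch paths only stand in for /home/abc and /var/www and are assumptions for the demo. It also writes the link name out explicitly as the second argument to ln, which gives the same result as running ln from inside the www folder.

```shell
#!/bin/sh
# Scratch directories stand in for /home/abc and /var/www (demo assumption).
scratch=$(mktemp -d)
mkdir -p "$scratch/home/abc/myFolder" "$scratch/var/www"
echo "hello" > "$scratch/home/abc/myFolder/index.html"

# Same idea as step 3, with the link name given explicitly.
ln -s "$scratch/home/abc/myFolder" "$scratch/var/www/myFolder"

# The entry in the www folder is a symbolic link, and files are visible through it.
ls -l "$scratch/var/www"
cat "$scratch/var/www/myFolder/index.html"
```

No sudo is needed here because the scratch folder belongs to you. Delete the scratch directory when you have finished looking at it.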

Now you can run the files in myFolder through the web server, using the folder name in the address.

So use this method to make your life easier when using servers for advanced development.

What is public HTML

If you work with any kind of web server on the Linux operating system, you have probably suffered enough from warning messages such as “This is write disabled file, please make the file write enable” or “Cannot move the file in to www folder”. Is it a headache for you? Here is a great and simple solution to your problem.

On Linux you usually have to copy your files, such as HTML and PHP files, into the /var/www/ folder to run them on your favorite web server, such as the Apache web server. But this folder is protected, and you need permission to copy, remove or change files in it. So when we are doing web development with a large number of files moving here and there, it is difficult to work in this kind of environment.

Instead, you can use another folder, away from the www folder and not protected, as your www folder. You can make changes in that folder as you wish and still get the same functions as in the /var/www folder. The following steps create a public_html folder and make it accessible under your user name as http://localhost/~ABC.

  1. Make a folder in your user folder. Create it as your own user, without sudo, so that it is not a protected folder owned by root. As an example, use the following command at the command prompt:

mkdir /home/ABC/public_html/

  2. Go to the folder of your web server. As an example, if you are using the apache2 web server you can use the following command:

cd /etc/apache2/

  3. Then enable the userdir module, which links your public_html folder to the web server.

sudo cp -r mods-available/userdir.* mods-enabled/

  4. Then restart your web server.

sudo /etc/init.d/apache2 restart

Now you are ready to use your public_html folder as your web server's working folder.
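
One thing that may still get in the way is file permissions. Apache usually runs as a different user from you, so it needs to be able to pass through your home folder and read what is inside public_html. The sketch below shows one common permission layout in a scratch directory that stands in for /home/ABC. The scratch path and the exact modes are assumptions for the demo, so adjust them to your own setup.

```shell
#!/bin/sh
# A scratch directory stands in for /home/ABC (demo assumption).
home=$(mktemp -d)
mkdir "$home/public_html"
echo "<h1>It works</h1>" > "$home/public_html/index.html"

# Other users may pass through the home folder but not list its contents.
chmod 711 "$home"
# Other users may list and read what is inside public_html.
chmod 755 "$home/public_html"
chmod 644 "$home/public_html/index.html"

# Show the resulting permissions.
ls -ld "$home" "$home/public_html" "$home/public_html/index.html"
```

With this layout the web server can read the pages, while the rest of your home folder stays closed to other users.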

You can access the folder by typing the address http://localhost/~ABC in the address bar.


Most importantly, if your computer is connected to a network and others access it over that network, you can use this folder as the one you expose to the outside. Others can reach that folder without you enabling any extra permissions, and you can keep your protected data away from it so that others can't access that data. Otherwise, if you enabled permissions on the www folder, others might be able to access other important data on your computer. Others can reach this folder through your computer's network address.

So please try this and feel the difference.

Wednesday, February 3, 2010

Sahana FOSS disaster management system

Sahana is an open source disaster management system which has been deployed to manage a number of disaster situations all over the world.

Initially it was developed to manage the situation that arose from the Indian Ocean tsunami disaster, which occurred on the 26th of December 2004. The Lanka Software Foundation did the initial development of Sahana phase 1 with the help of many organizations and volunteers from various parts of the world. At that time there was no other free software that could handle all of these tasks, such as handling victims, managing camps, managing the volunteers and managing the NGOs. So this was a good solution for all of these issues.

Initially Sahana contained 7 modules to help with the management of a disaster. They are:

  1. Missing Person Registry - Helping to reduce trauma by effectively finding missing persons

  2. Organization Registry - Coordinating and balancing the distribution of relief organizations in the affected areas and connecting relief groups allowing them to operate as one

  3. Request Management System - Registering and Tracking all incoming requests for support and relief up to fulfillment and helping donors connect to relief requirements

  4. Camp Registry - Tracking the location and numbers of victims in the various camps and temporary shelters setup all around the affected area

  5. Volunteer Management - Coordinate the contact info, skills, assignments and availability of volunteers and responders

  6. Inventory Management - Tracking the location, quantities, expiry of supplies stored for utilization in a disaster

  7. Situation Awareness - Providing a GIS overview of the situation at hand for the benefit of the decision makers

But now it has more than 20 modules and day by day it is increasing its scope to be more reliable and useful for the society.

You can visit here and get an overview about the project.

It has been successfully deployed in the following situations over the past years.

  1. Tsunami - Sri Lanka 2005 - Officially deployed in the CNO for the Government of Sri Lanka

  2. AsianQuake - Pakistan 2005 - Officially deployed with NADRA for the Government of Pakistan

  3. Southern Leyte Mudslide Disaster - Philippines 2006 - Officially deployed with the NDCC and ODC for the Government of Philippines

  4. Sarvodaya - Sri Lanka 2006 - Deployed for Sri Lanka's largest NGO

  5. Terre des Hommes - Sri Lanka 2006 - Deployed with new Child Protection Module

  6. Yogyakarta Earthquake - Indonesia 2006 - Deployed by ACS, urRemote and Indonesian whitewater association and Indonesian Rescue Source

  7. New York City - 2007-08 - Pre-deployed in support of the City of New York’s Coastal Storm Plan

  8. Peru Ica Earthquake - 2007 - Deployed for the Government of Peru

  9. Chengdu - Sichuan Province Earthquake 2008 - Deployed by Chengdu Police

  10. Haiti Earthquake - 2010

This is a web-based system, and the initial development was done with Apache, MySQL, and PHP/Perl. It has now been redeveloped using Python, and that version is called SahanaPy.

The Sahana Software Foundation has now been set up to take care of the Sahana project, and it will be a great step for the future of Sahana.

The source code can be downloaded from here using CVS:

cvs -z3 co -r gsoc_2009 -P sahana

and it can then be run using a WAMP/LAMP server. Here are some tips to run the source code.

When it is first set up, it automatically creates a database and the corresponding tables on the server, so it is then ready to provide its services.

So try it and join the Sahana community to help people when they really need you.