Wednesday, October 27, 2010

My SahanaOCR project for GSoC 2010

This is my first post in about 8 months. I was very busy with my training period and the GSoC competition, so I was unable to write anything on my blog. Fortunately, I have finally managed to find some time again, so I decided to continue writing about the SahanaOCR project from where I stopped in my last post.

As I mentioned earlier, I had worked with the SahanaOCR system before, so I came up with the idea of applying to GSoC with the SahanaOCR project. I wrote a proposal discussing the major implementations and developments that I planned to do during the session. Here is my project proposal, which may give you a better idea of how to write an acceptable project proposal for GSoC.

My plan was to improve the most necessary features of the current project. The proposal put forward 5 main ideas for the existing Sahana OCR project:

    1. Make the system platform independent

    2. Integrate the Tesseract code into the project

    3. Differentiate the different forms by a bar code embedded in each form, so that the system handles those forms automatically

    4. Read the labels of the data fields and identify the type of each data field, so that restrictions can be given to Tesseract to recognize numbers or letters separately

    5. Automatically send the data to the databases of the relevant modules

Once these steps are completed, the Sahana OCR project will be able to carry out the whole process of extracting data from forms with great accuracy.
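To make the fourth idea a bit more concrete, here is a rough sketch in Python of how a field label could be turned into a restriction for Tesseract. This is only an illustration and not the code from my project: the keyword lists and the function names (`classify_field`, `whitelist_for`, `tesseract_config`) are made up for this example, and I am assuming the restriction is passed through Tesseract's `tessedit_char_whitelist` variable on the command line.

```python
import re
import string

# Hypothetical keyword lists, chosen only for this example.
# Real forms would need their own lists.
NUMERIC_HINTS = ("age", "phone", "number", "count")
ALPHA_HINTS = ("name", "city", "village", "occupation")


def classify_field(label):
    """Guess the type of a data field from the words in its label."""
    words = re.findall(r"[a-z]+", label.lower())
    if any(hint in words for hint in NUMERIC_HINTS):
        return "numeric"
    if any(hint in words for hint in ALPHA_HINTS):
        return "alpha"
    return "mixed"


def whitelist_for(label):
    """Return the characters allowed for this field, or None for no restriction."""
    kind = classify_field(label)
    if kind == "numeric":
        return string.digits
    if kind == "alpha":
        return string.ascii_letters
    return None


def tesseract_config(label):
    """Build the command-line option (assumed form: -c VAR=VALUE) for this field."""
    chars = whitelist_for(label)
    if chars is None:
        return ""
    return "-c tessedit_char_whitelist=" + chars


if __name__ == "__main__":
    for label in ("Full Name:", "Phone number", "Address"):
        print(label, "->", classify_field(label), tesseract_config(label))
```

A label such as "Phone number" is treated as numeric, so only digits would be allowed when that field is recognized, while a label the sketch does not know (like "Address") is left unrestricted.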

After I had been active on the Sahana mailing list and exchanged a lot of ideas on the project with earlier mentors and contributors, my project proposal was accepted for GSoC 2010. That day gave me the same happiness as the day I passed my Advanced Level exam. My mentor was Mr. Gihan Chamara, who graduated from my department last year. During the whole project he was a great strength to me, and Mr. Chamindra, Mr. Jo and Mr. Hayesha also guided me to make this task a success. I started my project implementation on 24th May, and the project, which ran until 10th September, gave me a lot of memorable experiences.


It taught me how to plan a project, how to overcome the challenges we meet along the way, how to behave within public communities, how to get help from others, as well as some other things which were new to me.

Here is the wiki page that I maintained during the project; you can refer to it for more information on the technical details and on how the project ran.

Here is the link to my project report, which I submitted at the end of the project period; it contains all the details of the current status of the project.

Here is my project branch on Launchpad, where I submitted my code.

Even though the GSoC competition is over, my effort on SahanaOCR is not. My wish is to continue working on it and to stay with it until it becomes a usable standalone product.