Thursday, May 3, 2018

The Second Leg

We knew what problems we faced, and started tackling them one at a time. Our aim was to perfect our project by the end of the semester.

Inclusion of more fonts

We scanned through all the books we had, and started listing all the different fonts used in them. Since we had already converted one font, the process went more smoothly for the new fonts we discovered. We also used the source code from some open-source converters we found online, which sped up the process considerably. There were some extremely obscure fonts too, so rare that even their font files weren't available on the internet. Another type of font we couldn't convert was Type 1 fonts, which were discontinued by Adobe a few years ago. Devlys010 and Walkman Chanakya were the two major new fonts that we added to our converter. At the end of this, we were able to convert a lot of the books provided to us. Our mentor also arranged more books from the Rajasthan board, the Chhattisgarh board, and the CBSE.
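To give a feel for what converting a font involves, here is a minimal Python sketch of a table-driven converter. It is not our actual script. The mapping entries are illustrative placeholders rather than a verified table for Devlys010 or Walkman Chanakya, and the names `LEGACY_TO_UNICODE` and `convert` are made up for this example. It assumes the legacy font stores the short "i" vowel sign before its consonant (visual order), whereas Unicode stores it after.

```python
# Illustrative placeholder table: legacy glyph codes -> Unicode Devanagari.
# These entries are assumptions for the example, not a verified font mapping.
LEGACY_TO_UNICODE = {
    "d": "\u0915",   # consonant ka
    "e": "\u092E",   # consonant ma
    "y": "\u0932",   # consonant la
    "k": "\u093E",   # vowel sign aa
    "ks": "\u094B",  # vowel sign o (two legacy codes, one Unicode character)
    "f": "\u093F",   # vowel sign i (typed BEFORE the consonant in the legacy text)
}

I_MATRA = "\u093F"


def convert(text):
    """Convert legacy-encoded text to Unicode using the table above."""
    # Try longer codes first so "ks" is not split into "k" + "s".
    keys = sorted(LEGACY_TO_UNICODE, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(LEGACY_TO_UNICODE[key])
                i += len(key)
                break
        else:
            # Unknown character: copy it through unchanged.
            out.append(text[i])
            i += 1

    # Move each "i" vowel sign from before its consonant to after it.
    j = 0
    while j < len(out) - 1:
        if out[j] == I_MATRA:
            out[j], out[j + 1] = out[j + 1], out[j]
            j += 2
        else:
            j += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("dey"))
    print(convert("fd"))
```

A real converter also has to handle conjuncts, half letters and other reordering cases, which this sketch leaves out.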

Font Sizing

We moved on to our next problem: the difference in the sizes of the fonts. A closely related issue was the difference in their appearance. We solved these problems in two steps: identifying Unicode fonts that closely resembled each legacy font, and then finding an appropriate size conversion for each pair. Both tasks were time-consuming, and not something we liked doing, but they had to be done. These conversions were then added to our original script, taking it one step closer to completion.
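The two steps can be pictured as a lookup table that pairs each legacy font with a similar-looking Unicode font and a scale factor. The Python sketch below is only an illustration of that idea. The font names and the factor are hypothetical, not the values we settled on, and `convert_run` is a name invented for this example.

```python
# Hypothetical substitution table: legacy font -> (Unicode font, size scale).
# The names and the 0.8 factor are made up for illustration.
FONT_SUBSTITUTIONS = {
    "LegacyFontA": ("UnicodeFontA", 0.8),
}


def convert_run(font, size):
    """Return the replacement font and adjusted point size for a text run."""
    # Fonts that are not in the table are left unchanged.
    target, scale = FONT_SUBSTITUTIONS.get(font, (font, 1.0))
    return target, round(size * scale, 1)


if __name__ == "__main__":
    print(convert_run("LegacyFontA", 10.0))
    print(convert_run("SomeOtherFont", 12.0))
```

Keeping the pairs in one table means adding a new font is a one-line change, which matters when the list of fonts keeps growing.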

Headings

Our final task was to tag the text in the documents according to its position in the hierarchical structure. InDesign supports six levels of headings, h1 through h6, for exporting to various formats. These heading tags are important for a visually impaired reader to understand the structure of the book. We used a tree structure to decide heading levels, with a proper descending order, i.e., h2 after h1, h3 after h2, and so on. One assumption we made here was that the first heading-type object in the document would be h1. All the remaining text was categorized as paragraph, and the document was ready for exporting.
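Here is a rough Python sketch of how such levels can be assigned. It is not our InDesign script. It assumes that heading-type objects can be ranked by font size, which is a simplification made for this example, and `assign_heading_levels` is a hypothetical name. A stack stands in for the path from the root of the tree to the current heading.

```python
def assign_heading_levels(sizes, max_level=6):
    """Map heading font sizes (in document order) to levels 1..max_level.

    Assumption for this sketch: a smaller font size means a deeper heading.
    """
    stack = []   # sizes of the currently open ancestor headings, outermost first
    levels = []
    for size in sizes:
        # Close every open heading that is not strictly larger than this one.
        while stack and stack[-1] <= size:
            stack.pop()
        # The level is one deeper than the number of open ancestors,
        # so a heading can never skip a level (no h3 directly under h1).
        level = min(len(stack) + 1, max_level)
        levels.append(level)
        if len(stack) < max_level:
            stack.append(size)
    return levels


if __name__ == "__main__":
    for size, level in zip([24, 18, 14, 18, 24, 12],
                           assign_heading_levels([24, 18, 14, 18, 24, 12])):
        print(f"{size}pt -> h{level}")
```

Since the stack starts empty, the first heading always comes out as h1, which matches the assumption mentioned above.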

The Final Deliverable

Our final deliverable, which was published online as well as provided to NGOs, consisted of two scripts: one for converting fonts with appropriate sizing, and the other for tagging headings. The only problem we couldn't resolve was the tagging of tables and lists in the document, but these objects were rarely present in the books we converted. The output file generated by exporting the converted InDesign document can be read by any screen reader or document reader that supports Hindi.
The project directory can be found at https://github.com/prakhariitd/COP315-Hindi-fonts-to-Unicode

Our scripts are quite fast: a 300-page book can be converted in under 2 minutes.
This is how our project works:
