Wednesday, May 2, 2018

The Font Conversion

We started working on the problem in December 2017 under the guidance of Akashdeep sir, our PhD mentor. The first font we had to convert was Chanakya, and converting it was the original aim of the project.

The deliverable we had to produce was an InDesign script that could convert the non-Unicode text in a document. We chose InDesign because our primary goal was to get publishers to use our script, so that they could publish their books online directly in Unicode. InDesign also offers the most editing freedom over a document. Our mentor handled the procurement of new master files from various school boards, while we worked on the conversion.
  

The Dictionary

The first step in converting one font into another is to create a dictionary that maps each character in the first font to the corresponding character in the second. We used a program called FontForge, which displays every glyph in a font along with its code, and with it we could map two fonts to each other. However, each font has hundreds of characters, and the glyphs of a non-Unicode font follow no standard order, so building a dictionary by hand was extremely tedious and time-consuming. To make things easier, we created a tool in Visual Basic that let us map each non-Unicode character to its Unicode counterpart and build the dictionary quickly.

In the tool, we selected the source non-Unicode font and the target Unicode font, picked a character code in the source font, and then typed the matching character using the Unicode Devanagari available to us. Repeating this for every character produced the dictionary.
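A dictionary like this is essentially a lookup table from legacy character codes to Unicode strings. The sketch below illustrates the idea in Python (our actual deliverable was an InDesign script, so this is only an illustration). The entries in `LEGACY_TO_UNICODE` are hypothetical placeholders, not the real Chanakya mapping, and `convert` is a name chosen for this example.

```python
# Hypothetical excerpt of a legacy-to-Unicode dictionary.
# Keys are characters as stored in the non-Unicode document;
# values are the Unicode Devanagari text they are drawn as.
# These entries are placeholders, NOT the real Chanakya table.
LEGACY_TO_UNICODE = {
    "d": "\u0915",  # क
    "k": "\u093e",  # ा (vowel sign AA)
    "e": "\u092e",  # म
    "y": "\u0932",  # ल
}


def convert(text: str, table: dict) -> str:
    """Replace each legacy character with its Unicode equivalent.

    Characters missing from the table are copied unchanged, so gaps
    in the dictionary stay visible during proofreading.
    """
    return "".join(table.get(ch, ch) for ch in text)
```

With these placeholder entries, `convert("dke", LEGACY_TO_UNICODE)` returns `"काम"`. Copying unknown characters through, rather than dropping them, is what makes missing dictionary entries easy to spot in the output.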

The Exceptions

The dictionary alone wasn't sufficient for a complete conversion, because each font had some exceptional characters that were formed by unique combinations and couldn't be mapped with the Dictionary tool. We had to identify these exceptions manually by proofreading the converted output, and then map each one to its correct Unicode form by hand.
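One way to fold such exceptions into the conversion is sketched below in Python, under stated assumptions. Exceptions live in a separate table that is checked before the dictionary, longest sequence first, so a combination wins over its individual pieces. The sketch also handles one case that is common in legacy Devanagari fonts: the short-i vowel sign (ि) is stored before its consonant, while Unicode stores it after. Whether Chanakya behaves exactly this way is an assumption here. All tables, codes, and function names are hypothetical.

```python
# Hypothetical sketch: exception rules for sequences the one-to-one
# dictionary cannot express, plus reordering of the short-i vowel sign.
# All legacy codes below are placeholders, not the real Chanakya table.

# Multi-character legacy sequences mapped as a unit (placeholder entry).
EXCEPTIONS = {
    "Js": "\u0936\u094d\u0930",  # श्र, assumed to be drawn as one composite
}

# Ordinary one-to-one entries (placeholders).
LEGACY_TO_UNICODE = {
    "d": "\u0915",  # क
    "e": "\u092e",  # म
    "f": "\u093f",  # ि (vowel sign I), assumed stored BEFORE its consonant
}

I_MATRA = "\u093f"


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants क (U+0915) to ह (U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def apply_tables(text: str) -> str:
    """Map exceptions first (longest match wins), then single characters."""
    keys = sorted(EXCEPTIONS, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(EXCEPTIONS[key])
                i += len(key)
                break
        else:
            # No exception matched: fall back to the dictionary,
            # copying unknown characters unchanged.
            out.append(LEGACY_TO_UNICODE.get(text[i], text[i]))
            i += 1
    return "".join(out)


def reorder_i_matra(text: str) -> str:
    """Move a vowel sign I from before a consonant to after it.

    Unicode stores ि after its consonant even though it is drawn first.
    Only a single following consonant is handled; conjuncts would need
    a longer look-ahead.
    """
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] == I_MATRA and is_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


def legacy_to_unicode(text: str) -> str:
    """Apply the exception and dictionary tables, then fix vowel order."""
    return reorder_i_matra(apply_tables(text))
```

With these placeholder tables, `legacy_to_unicode("fde")` yields `"किम"`: the vowel sign is mapped first and then moved after क. Keeping exceptions in their own table means each one found during proofreading can be added without touching the main dictionary.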

The First Product

By the end of the winter, we had a working InDesign script that could convert documents set in Chanakya to Unicode fonts. However, several problems remained:
  • Most books contained more than one font, so we weren't able to convert them completely.
  • Two different fonts rarely have the same glyph dimensions at the same font size, so converting the font made text overflow its frames. Some of the text was pushed beyond the edge of the page and became invisible.
  • We also needed proper tagging in the books, so that blind readers could use the exported versions while following the correct hierarchy of the text.
So, to solve these issues, we decided to expand our project and continue it through the upcoming semester.
