Hindi font samples
For the DARPA TIDES "surprise language" exercise I pulled up from
tertiary storage full-page images from the font sample booklet of some
Hindi printing press. While there is no direct OCR component to the
exercise (and why not, Charles?), those shops that actually use OCR to
enhance their data collection efforts may still find this useful for
calibrating their software.
Here is the gzipped tar file; a sample page
(page 2 of 48 300 dpi TIFFs) is included below. If you perform any work
on these pages that would enhance the dataset (rotate the images the
right way, segment out the various samples, ground-truth the data,
etc.), I would appreciate it if you'd give me back a copy so that I could
post it here -- your contribution will be fully acknowledged.
Andras Kornai
June 5 2003
Ground truth (for testing with Tesseract) for the Marathi pages is now kindly provided by
ShreeDevi Kumar at github
February 23 2016