The bidder will have to scan and OCR 15 German books containing novels and short stories. The 15 volumes comprise about 3,000 pages. All are typeset in Times or Antiqua at 9 pt or larger. The enclosed example is a 1200 dpi scan of a double page from two different volumes; this is the smallest type that has to be scanned.

From the scans, the German text has to be retrieved as plain text in ISO 8859-1 or UTF-8 encoding. Chapters, sub-headings, italic and bold text have to be marked with tags. End-of-line hyphens have to be deleted where applicable.

Since the texts are from 18th- and 19th-century sources, standard spell checking cannot be applied. High OCR quality is therefore crucial for the job. The error rate of the resulting texts must be 1 error per 10,000 characters or better.

The material to be scanned will be delivered as 15 hardcover volumes, 19 x 11 cm, each with 150 to 400 pages. The volumes may be cut for better and/or automated (flatbed) scanning, but the loose pages have to be returned complete and in their original order after the job.

The job will start with one of the 15 volumes as a test. After delivery and a positive quality test, the other volumes will be sent to the bidder. Please quote your price for the complete set of 15 volumes with 3,000 pages and also state a price per page for optional additional volumes.
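To make the quality threshold concrete, here is a minimal sketch of how a sample could be checked. It assumes that an "error" means one character-level insertion, deletion or substitution relative to a manually proofread reference passage; the posting does not define this, so that definition and the function names `levenshtein`, `errors_per_10k` and `meets_requirement` are illustrative assumptions only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def errors_per_10k(reference: str, ocr_output: str) -> float:
    """Character errors per 10,000 reference characters.

    Assumption: the reference is a proofread transcription of the
    same passage as the OCR output.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) * 10_000 / len(reference)


def meets_requirement(reference: str, ocr_output: str, limit: float = 1.0) -> bool:
    """True if the sample stays within the stated limit of
    1 error per 10,000 characters."""
    return errors_per_10k(reference, ocr_output) <= limit
```

The comparison takes time proportional to the product of the two text lengths, so in practice it would be run on proofread sample pages rather than on a whole volume at once.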
## Deliverables
1) A digital plain-text version of the scanned volumes with tagged font formats. It must be delivered by e-mail as soon as possible and also on CD/DVD.
2) The original hardcover books (which may have been cut for easier scanning).
3) All scanned pages as graphics files with one page or double page per file. Pages must be organized in folders named after the book titles, and the file names must be the page numbers (3 digits, leading zeroes). The file format must use lossless compression (e.g. GIF or lossless JPG). Scanned files must be delivered on CD/DVD.
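The folder and file naming rule for the scans can be sketched as follows. The function name `scan_path`, the example title and the default `gif` extension are assumptions for illustration; the posting also does not say how a double-page file should be numbered, so this sketch simply takes one page number per file.

```python
from pathlib import PurePosixPath


def scan_path(book_title: str, page_number: int, extension: str = "gif") -> PurePosixPath:
    """Build '<book title>/<NNN>.<extension>' for one scanned file.

    The page number is zero-padded to 3 digits, which is enough for
    volumes of 150 to 400 pages.
    """
    if not 0 <= page_number <= 999:
        raise ValueError("page number must fit into 3 digits")
    return PurePosixPath(book_title) / f"{page_number:03d}.{extension}"


# Hypothetical title, used only to show the resulting layout.
print(scan_path("Novellen Band 1", 7))    # Novellen Band 1/007.gif
print(scan_path("Novellen Band 1", 152))  # Novellen Band 1/152.gif
```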
## Platform
Character set: ISO-8859-1 or UTF-8.
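Older German texts may contain characters that ISO-8859-1 cannot represent, for example the long s (ſ) or typographic quotation marks, while ä, ö, ü and ß are covered. The following sketch shows one possible way to pick between the two permitted encodings for a given text; the function name `choose_encoding` and the fall-back rule are assumptions, not part of the requirements.

```python
def choose_encoding(text: str) -> str:
    """Return 'iso-8859-1' if every character of the text fits into
    that character set, otherwise fall back to 'utf-8'."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return "utf-8"
    return "iso-8859-1"


def to_bytes(text: str) -> bytes:
    """Encode the text with the encoding chosen above."""
    return text.encode(choose_encoding(text))


print(choose_encoding("Größe"))   # iso-8859-1
print(choose_encoding("Waſſer"))  # utf-8
```

Using UTF-8 for all volumes would avoid mixing encodings across the delivery; the check above is mainly useful for spotting characters that would be lost in ISO-8859-1.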