Western Knight Center for Specialized Journalism

How to Extract Text from an Image Using ABBYY FineReader

The Western Knight Center is now the Knight Digital Media Center.

After scanning a book, newspaper, or magazine, you end up with a set of pictures (graphic files, not text files) that need to be run through a text recognition program; ABBYY FineReader is one of the best for this. Recognition is the process of extracting text from an image, and this guide walks through that process in more detail.

Steps to Extract Text from an Image

Follow the steps below:

  • Opening a file. Open the picture(s) that you plan to recognize. Note that you can open not only image formats but also, for example, DjVu and PDF files. This lets you quickly recognize a whole book, since books shared online are usually distributed in these formats.
  • Editing. It rarely makes sense to accept automatic recognition right away. You can do so if the book contains only text, with no pictures or tables, and was scanned in excellent quality. In other cases, it is better to set all areas manually. Usually, you first need to remove unnecessary areas from the page. To do this, click the edit button on the panel.
  • Then keep only the area you want to continue working with. There is a tool for trimming unnecessary borders: in the column on the right, select the crop mode.
  • Next, select the area you want to keep. In the picture below, it is highlighted in red. If you have several pictures open, you can apply the crop to all of them at once, which is more convenient than cutting each one separately. Note that at the bottom of this panel there is another useful tool, the eraser. With it you can erase streaks, page numbers, specks, stray special characters, and selected sections of the image. After you click to crop the edges, your original picture should change so that only the working area remains. Then you can exit the image editor.
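
The batch-cropping idea in the last step can be sketched outside FineReader. The following is a minimal Python illustration and not FineReader's own mechanism: each page is modelled as a grid of pixel values (a list of rows), and the helpers `crop_page` and `crop_all` are hypothetical names introduced here to show one crop box being applied to every page.

```python
from typing import List, Tuple

Page = List[List[int]]            # assumed model: a page is rows of pixel values
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def crop_page(page: Page, box: Box) -> Page:
    """Return only the part of the page that lies inside the box."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


def crop_all(pages: List[Page], box: Box) -> List[Page]:
    """Apply the same crop box to every page instead of cropping one by one."""
    return [crop_page(page, box) for page in pages]


# A tiny 4x4 "page": 0 marks the unwanted border, 1 marks the text area.
sample_page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

cropped_pages = crop_all([sample_page, sample_page], (1, 1, 3, 3))
print(cropped_pages[0])  # [[1, 1], [1, 1]]
```

Using a single box for all pages only works well when the scans share the same layout and margins; pages that were scanned at different offsets would need their own boxes.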

Selection of Areas

On the panel above the open picture there are small rectangular buttons that define the type of each recognition area. There are several of them; let's briefly consider the most common one. Text is the main area type: it marks where the program will focus and try to extract text from the image. We will highlight this area in our example. After selection, the area is shaded light green. Then you can proceed to the next step. Once all areas are set, click the recognize command in the menu. Nothing else needs to be done in this step. The recognition time depends on the number of pages in your document and the power of your computer.

On average, one full page scanned in good quality takes 10-20 seconds on a PC of average power (by today's standards).
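
To get a rough feel for what this means for a whole book, the per-page figure can be multiplied by the page count. The sketch below is only back-of-the-envelope arithmetic based on the 10-20 second range quoted above; the function name `estimate_recognition_seconds` and the 300-page example are illustrative assumptions, and real timings depend on scan quality and hardware.

```python
from typing import Tuple


def estimate_recognition_seconds(
    pages: int,
    low_per_page: float = 10.0,   # lower bound from the text: 10 s per page
    high_per_page: float = 20.0,  # upper bound from the text: 20 s per page
) -> Tuple[float, float]:
    """Return a (low, high) estimate of total recognition time in seconds."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return pages * low_per_page, pages * high_per_page


# Example: a hypothetical 300-page book.
low, high = estimate_recognition_seconds(300)
print(f"{low / 60:.0f}-{high / 60:.0f} minutes")  # 50-100 minutes
```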

Check Errors

Whatever the original quality of the images, there are almost always errors after recognition. So far, no program can completely eliminate the need for human review.

Click the check option and ABBYY FineReader will show you, one by one, the places in the document where it was unsure. Your task is to compare the original picture (the program shows that spot enlarged) with the recognized variant and either confirm it or correct it and then approve. The program then moves to the next difficult spot, and so on until the entire document is checked. More detailed conversion tips are available here.
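
The review loop described above can be modelled in a few lines. This is a simplified sketch of the workflow, not FineReader's API: the `Fragment` class, the `uncertain` flag, and the `verify` function are hypothetical names, and the reviewer is represented by a callback that returns `None` to approve a fragment or a string to correct it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fragment:
    text: str        # what the recognizer produced
    uncertain: bool  # True if the recognizer flagged this spot for review


def verify(fragments: List[Fragment],
           review: Callable[[str], Optional[str]]) -> str:
    """Walk through the fragments, asking the reviewer only about uncertain ones."""
    checked = []
    for fragment in fragments:
        if fragment.uncertain:
            correction = review(fragment.text)
            # None means "approve as recognized"; otherwise use the correction.
            checked.append(fragment.text if correction is None else correction)
        else:
            checked.append(fragment.text)
    return " ".join(checked)


# Example: the reviewer fixes "qu1ck" and approves "fox" unchanged.
corrections = {"qu1ck": "quick"}
recognized = [
    Fragment("The", False),
    Fragment("qu1ck", True),
    Fragment("fox", True),
]
print(verify(recognized, corrections.get))  # The quick fox
```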

A partnership of...

Funded by the John S. and
James L. Knight Foundation