Batch OCR Documents

Batch OCR Documents adds searchable text to any documents that do not already contain text, using any of the supported OCR languages. Each document is processed individually using the settings chosen for the batch. Xodo PDF Studio can also run OCR with two languages at once. For more information on running OCR with two languages, see OCR Preferences.

What is OCR?

Optical character recognition (OCR) is the mechanical or electronic conversion of images of typed or printed text into machine-encoded searchable text data.

 

How to OCR a Batch of PDFs

  1. On the toolbar, go to the Batch tab > Document > OCR.
  2. Set the options for the batch process. Additional details for each of the settings are available below.
    • Select a language from the drop-down or click Download OCR Languages to download a new one.
    • Choose any additional OCR options.
    • Using the File List, select the files that need to be processed.
    • Set the destination settings for the processed batch files.
    • If needed, set any open passwords to be attempted when processing files.
  3. Once all of the settings are complete, click Start... to begin the batch process.

 

Batch OCR Settings

Select Language

Language - Select the language used to OCR the documents.

Download OCR Languages - Opens the language download manager

Discard Invisible Text - Removes any previous OCR text that has been added to the page.

Auto Deskew Images - When checked, if a document’s text or images are slanted too far in one direction or are otherwise misaligned, Xodo PDF Studio will attempt to rotate the document automatically to correct the alignment.

File List

Add Files - Displays a file chooser to add individual files to the list.

Add Folder - Displays a file chooser that adds the contents of a directory to the list.

Set Default Batch Directory - When checked, all files from the default batch directory will be added to the File List each time a batch dialog is opened.

Include Subfolders - When checked, any supported file types found within subfolders of the chosen default batch directory are also included.
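The effect of Include Subfolders can be illustrated with a short Python sketch. This is not Xodo PDF Studio's own code: the function name `collect_batch_files` is hypothetical, and it assumes for simplicity that ".pdf" is the only supported file type.

```python
from pathlib import Path


def collect_batch_files(directory, include_subfolders=False, extensions=(".pdf",)):
    """Return the files a batch directory would contribute to the File List.

    Hypothetical helper: the extension list is an assumption, not the
    application's actual set of supported file types.
    """
    root = Path(directory)
    # rglob walks every subfolder; glob looks only at the top level.
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in extensions
    )
```

With `include_subfolders=False`, only files directly inside the directory are returned. With `True`, matching files at any depth are added as well.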

[icon] - Removes the selected file(s) from the list.

[icon] - Moves the selected file(s) up the list.

[icon] - Moves the selected file(s) down the list.

[icon] - Moves the selected file(s) to the top of the list.

[icon] - Moves the selected file(s) to the bottom of the list.

Save Files To

Destination Folder

Use Source Folder - When this option is selected, each output file is saved in the same folder as its original PDF document.

Destination Folder - This option allows you to set a single destination folder for all of the processed files. You can type the destination manually or click the "..." button to open a directory chooser and set the destination folder.

  • Preserve Folder Structure: When checked, the output files will be placed within a new folder (within the specified destination folder) using the file's parent directory name.
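As an illustration of Preserve Folder Structure, the Python sketch below computes where an output file would land. The function name `output_path` is hypothetical and this is only a model of the behavior described above, not the application's implementation.

```python
from pathlib import Path


def output_path(source_file, destination, preserve_structure=False):
    """Compute the output location for one processed file (illustrative only)."""
    source = Path(source_file)
    target_dir = Path(destination)
    if preserve_structure:
        # Place the file in a subfolder named after its parent directory.
        target_dir = target_dir / source.parent.name
    return target_dir / source.name
```

For example, with a source file `in/invoices/a.pdf` and the destination `out`, the result is `out/a.pdf` when the option is unchecked and `out/invoices/a.pdf` when it is checked.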

File Name Pattern

Use Source Filename - Saves the document using its original name. If a file with the same name already exists in the directory, a number will be appended to the output file name to avoid duplicate file names.
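The duplicate-name handling can be sketched in Python as follows. The exact format of the appended number is not documented here, so the `_1`, `_2` suffix style and the function name `unique_output_path` are assumptions made for illustration.

```python
from pathlib import Path


def unique_output_path(directory, filename):
    """Return a path in `directory` that does not collide with an existing file.

    Assumption: the number is appended as "_<n>" before the extension.
    """
    candidate = Path(directory) / filename
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    # Keep increasing the number until an unused name is found.
    while candidate.exists():
        candidate = Path(directory) / f"{stem}_{n}{suffix}"
        n += 1
    return candidate
```

If `report.pdf` already exists in the output directory, this sketch returns `report_1.pdf`. If that exists too, it returns `report_2.pdf`, and so on.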

New Filename - When this option is selected, enter a new filename to use for the output files. An incremental counter, starting at zero, is appended to the name entered in this field for each document. Custom variables may also be used to further distinguish the output files. The available variables are:

  • $filename - The file name (no extension) that the document was opened from
  • $counter - An automatically incrementing number
  • $day - The day of the month
  • $month - The current month, using two digits
  • $year - The current year, using four digits
  • $shortyear - The current year, using two digits
  • $second - The current second
  • $minute - The current minute
  • $hour - The current hour, 1-12
  • $ampm - AM or PM
  • $longhour - The current hour, 0-23
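To show how a pattern built from these variables might expand, here is a minimal Python sketch. It is not the application's implementation: the function name `expand_pattern` is hypothetical, and the zero-padding of `$day`, `$minute`, and `$second`, as well as the unpadded `$hour` and `$longhour`, are assumptions, since only the month and year widths are stated above.

```python
import re
from datetime import datetime


def expand_pattern(pattern, filename, counter, now=None):
    """Expand $variables in a file name pattern (illustrative sketch)."""
    now = now or datetime.now()
    values = {
        "filename": filename,
        "counter": str(counter),
        "day": f"{now.day:02d}",          # padding is an assumption
        "month": f"{now.month:02d}",      # two digits, as documented
        "year": f"{now.year:04d}",        # four digits, as documented
        "shortyear": f"{now.year % 100:02d}",
        "second": f"{now.second:02d}",    # padding is an assumption
        "minute": f"{now.minute:02d}",    # padding is an assumption
        "hour": str(now.hour % 12 or 12),  # 1-12
        "ampm": "AM" if now.hour < 12 else "PM",
        "longhour": str(now.hour),         # 0-23
    }
    # Replace all variables in a single pass so that text substituted for
    # $filename is never rescanned for further variables.
    names = "|".join(sorted(values, key=len, reverse=True))
    return re.sub(rf"\$({names})", lambda m: values[m.group(1)], pattern)
```

Under these assumptions, the pattern `$filename_ocr_$counter_$year-$month-$day` applied to a document opened from `scan.pdf`, with counter 3 on 5 March 2024, expands to `scan_ocr_3_2024-03-05`.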

Overwrite Files - When set, if a file with the same name already exists in the directory it will be overwritten with the newly output document.

Note: This cannot be undone. Make sure that all of your settings are correct before starting the batch process.

Passwords to try when opening documents

To set a password, click in the password field or on the Edit button, then enter the password you want to use. You can enter up to four passwords, which will be tried on password-protected PDFs during the batch process.

Note: The passwords entered here will only be used for this batch process and will not be stored anywhere else. Passwords will have to be entered for each new batch process.
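The idea of trying a short list of passwords on each protected file can be sketched in Python as below. The names `first_working_password` and `try_open` are hypothetical: `try_open` stands in for whatever routine attempts to open a PDF with a given password and reports success, and the order in which the passwords are tried is an assumption.

```python
def first_working_password(try_open, passwords, limit=4):
    """Return the first password that opens the document, or None.

    `try_open` is a caller-supplied function (an assumption of this sketch)
    that takes a password and returns True if the document opens with it.
    """
    # Only the first `limit` passwords are attempted, mirroring the
    # four password fields in the batch dialog.
    for password in passwords[:limit]:
        if try_open(password):
            return password
    return None
```

A file for which none of the listed passwords works yields `None` in this sketch. How the batch process itself handles such a file is not covered here.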