Perform Python OCR on all images in a folder at the same time


If you have a folder full of images containing text that you need to extract, either into a separate folder of text files named after the corresponding images or into a single file, the code below does exactly that.

This article covers the basics of OCR (Optical Character Recognition) in Python and shows how to create a .txt output file for each image in the main folder and save it in a predetermined location.

Required libraries:

pip3 install pillow

The os module is part of the Python standard library and does not need to be installed.

You'll also need Tesseract-OCR and the pytesseract library. Tesseract-OCR is a standalone program that must be downloaded and installed separately, and pytesseract can be installed using pip3 install pytesseract.
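Because pytesseract is only a wrapper around the Tesseract program, it can help to check that the executable can be found before running any OCR. The following is a minimal sketch, assuming the executable is named tesseract and is on the system PATH; the helper name find_tesseract is hypothetical and not part of pytesseract.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("Tesseract was not found on PATH; install it or add it to PATH.")
    else:
        print("Tesseract found at:", location)
```

If Tesseract is installed somewhere that is not on PATH, pytesseract can be pointed at it by assigning the full path of the executable to pytesseract.pytesseract.tesseract_cmd.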

The following Python implementation saves the text of each image in its own file:

# Python program to extract text from all the images in a folder
# storing the text in corresponding files in a different folder
from PIL import Image
import pytesseract as pt
import os


def main():
    # path for the folder for getting the raw images
    path = r"E:\GeeksforGeeks\images"

    # path for the folder for getting the output
    tempPath = r"E:\GeeksforGeeks\textFiles"

    # iterating the images inside the folder
    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)

        # applying OCR using pytesseract
        text = pt.image_to_string(img, lang="eng")

        # removing the extension (e.g. .jpg) from the image name
        baseName = os.path.splitext(imageName)[0]

        fullTempPath = os.path.join(tempPath, "time_" + baseName + ".txt")
        print(text)

        # saving the text for each image in a separate .txt file
        with open(fullTempPath, "w") as file1:
            file1.write(text)


if __name__ == '__main__':
    main()
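Note that os.listdir returns every entry in the folder, so a stray non-image file or a subfolder would also be passed to Image.open. The following is a minimal sketch of filtering by extension and deriving the output file name; the helper names and the list of extensions are assumptions made for illustration and are not part of the original program.

```python
import os

# Assumed set of extensions to treat as images; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def list_image_names(folder):
    """Return the sorted names of files in `folder` that look like images."""
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # Skip subfolders and files whose extension is not in the list.
        if os.path.isfile(full_path) and extension in IMAGE_EXTENSIONS:
            names.append(name)
    return sorted(names)


def text_file_name(image_name, prefix="time_"):
    """Map an image name such as 'scan.jpg' to 'time_scan.txt'."""
    base_name = os.path.splitext(image_name)[0]
    return prefix + base_name + ".txt"
```

Using os.path.splitext rather than slicing off the last four characters keeps the name correct for extensions of other lengths, such as .jpeg or .tiff.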

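Input image: image_sample1

Output: [output image]

To store the text from all the images in a single file instead, the code changes slightly: the output file is opened in append mode, so the text of each image is added to the end of the same file.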

<pre><code class="language-python line-numbers"># extract text from all the images in a folder
# storing the text in a single file
from PIL import Image
import pytesseract as pt
import os


def main():
    # path for the folder for getting the raw images
    path = r"E:\GeeksforGeeks\images"

    # path to the file in which the output needs to be kept
    fullTempPath = r"E:\GeeksforGeeks\output\outputFile.txt"

    # iterating the images inside the folder
    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)

        # applying OCR using pytesseract
        text = pt.image_to_string(img, lang="eng")

        # "a+" creates the file if it is not present
        # and appends the text if it is
        with open(fullTempPath, "a+") as file1:
            # writing the name of the image
            file1.write(imageName + "\n")

            # writing the content of the image
            file1.write(text + "\n")

    # printing the output file
    with open(fullTempPath, "r") as file2:
        print(file2.read())


if __name__ == '__main__':
    main()
</code></pre>

Input images: image_sample1, image_sample2

Output: [output image]

This produces a single file containing the text extracted from every image in the folder. The file format is as follows:

Name of the image
Content of the image
Name of the next image and so on .....
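The format described above can also be produced with the output file opened once rather than once per image. The sketch below assumes the OCR step is passed in as a function (the hypothetical parameter ocr), so the writing logic can be tried without Tesseract installed. The function name write_combined_file is likewise an assumption for illustration.

```python
import os


def write_combined_file(folder, output_path, ocr):
    """Append the image name, then the image text, for each file in `folder`.

    `ocr` is any function that takes a file path and returns its text.
    The output file should live outside `folder` so it is not listed as input.
    """
    with open(output_path, "a+") as output_file:
        for image_name in sorted(os.listdir(folder)):
            input_path = os.path.join(folder, image_name)
            # Name of the image on one line, its content on the following lines.
            output_file.write(image_name + "\n")
            output_file.write(ocr(input_path) + "\n")
```

With pytesseract, the ocr argument could be something like lambda p: pt.image_to_string(Image.open(p), lang="eng"), using the same imports as the program above.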
