We will use the Tesseract OCR library.
How to install?
On Linux:
sudo apt-get install tesseract-ocr
pip3 install pillow pytesseract
On Mac:
brew install tesseract
brew install tesseract-lang
pip3 install pillow pytesseract
Then correct the Tesseract path in pytesseract.py.
Find pytesseract.py. Its location depends on your Python installation, for example:
"/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py"
Change tesseract_cmd = 'tesseract' so that it points to the Tesseract executable,
e.g.,
tesseract_cmd = '/usr/local/bin/tesseract'
(You can run "which tesseract" to confirm where the executable is installed.)
Note:
You can skip the previous step and instead add the following line to any new OCR Python script:
pytesseract.pytesseract.tesseract_cmd = '<path-to-tesseract-bin>'
e.g.,
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
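Rather than hard-coding the path, a script could try to locate the executable itself. The following is a minimal sketch, not part of pytesseract: the helper names find_binary and configure_pytesseract are hypothetical, and the fallback paths are assumptions about common install locations (Homebrew on Intel and Apple Silicon Macs, and the usual Linux package location).

```python
import os
import shutil

# Assumed common install locations; adjust for your own system.
DEFAULT_FALLBACKS = (
    "/usr/local/bin/tesseract",     # Homebrew on Intel Macs
    "/opt/homebrew/bin/tesseract",  # Homebrew on Apple Silicon Macs
    "/usr/bin/tesseract",           # typical Linux package location
)


def find_binary(name="tesseract", fallbacks=DEFAULT_FALLBACKS):
    """Return the path to an executable, or None if it cannot be found."""
    # First ask the operating system to search the PATH.
    found = shutil.which(name)
    if found:
        return found
    # Otherwise try each fallback path and keep the first executable file.
    for candidate in fallbacks:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable, if there is one."""
    import pytesseract  # imported lazily so find_binary works without it

    path = find_binary()
    if path is None:
        raise RuntimeError("tesseract executable not found; is it installed?")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at the top of a script would then replace the hard-coded assignment shown above.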
Test OCR from the command line
tesseract -l ara image.png text
This converts image.png to text.txt, using Arabic (-l ara) as the recognition language. Tesseract adds the .txt extension to the output name itself, so give the name without it.
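If the command fails because the Arabic data is missing, it can help to check which languages Tesseract has installed. On Debian or Ubuntu the Arabic data usually comes from a separate package (tesseract-ocr-ara). The sketch below wraps the real "tesseract --list-langs" command; the function names parse_langs and installed_langs are hypothetical, and the parser assumes the output is one header line followed by one language code per line.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the output of `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, which starts with "List of available languages".
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_langs(tesseract_cmd="tesseract"):
    """Run Tesseract and return the list of installed language codes."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some older releases print the list to stderr instead of stdout.
    return parse_langs(result.stdout or result.stderr)
```

For example, checking "ara" in installed_langs() before running OCR gives a clearer error than a failed recognition.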
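Simple Python script to convert an image to text
from PIL import Image
import pytesseract
# Point pytesseract at the Tesseract executable (adjust for your system)
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
# Open the image and run OCR with the Arabic language data
im = Image.open('/Users/rafie/Desktop/ocr.png')
text = pytesseract.image_to_string(im, lang='ara')
print(text)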
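To keep the result instead of only printing it, the text can be written to a file. Arabic output should be saved as UTF-8. This is a small sketch under stated assumptions: the names default_recognize and image_to_text_file are hypothetical, and the recognize parameter exists only so the file-writing part can be tried without Tesseract installed.

```python
from pathlib import Path


def default_recognize(image_path, lang):
    """Run OCR on one image with pytesseract and return the text."""
    # Imported lazily so the helper below can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as im:
        return pytesseract.image_to_string(im, lang=lang)


def image_to_text_file(image_path, lang="ara", recognize=default_recognize):
    """OCR an image and save the text next to it as a UTF-8 .txt file."""
    text = recognize(image_path, lang)
    # ocr.png becomes ocr.txt in the same directory.
    out_path = Path(image_path).with_suffix(".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling image_to_text_file('/Users/rafie/Desktop/ocr.png') would write the text to ocr.txt in the same folder.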