With Docsumo’s free OCR tool, you can extract data accurately from any image or PDF document, in any layout, without manual setup. Our deep learning data extraction technology greatly reduces manual errors and saves countless hours every month.

If you turn on OCR, Gmail converts the image attachment to text, detects the credit card number, and moves the message to quarantine. Note: OCR doesn’t scan images embedded in attached files, such as Adobe PDF or Microsoft Word documents. And, it’s not always 100% accurate.

Scanning documents into images can be time-consuming as it requires manual input. OCR saves individuals and businesses time and money by converting images into text data that other business software can read.

Good OCR software should also have a user-friendly interface that is easy to work with. Data cleansing and formatting matter too when selecting OCR software: it should clean and format the extracted data so that quality and consistency are maintained.

How Do I Know if a PDF Has OCR Functionality?

There are several ways to check whether your PDF has OCR functionality. Open the PDF and check whether you can search for a word in the file or whether you can select any of the text. If you cannot search in the PDF or select text, it is probably just a scanned image.
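The same check can be automated. The following is a minimal sketch, assuming the pdftotext command-line tool (from poppler-utils) is installed on the server; hasSearchableText and pdfHasTextLayer are hypothetical helper names, not part of any library. It treats a PDF as searchable when text extraction yields anything other than whitespace.

```php
<?php
// Returns true when text extracted from a PDF contains anything besides
// whitespace. pdftotext separates pages with form feeds, so an image-only
// PDF typically yields nothing but whitespace characters.
function hasSearchableText(string $extractedText): bool
{
    return preg_replace('/\s+/', '', $extractedText) !== '';
}

// Hypothetical helper: runs pdftotext (assumed to be installed) and asks it
// to write the text to stdout by passing "-" as the output file.
function pdfHasTextLayer(string $pdfPath): bool
{
    $lines = [];
    $exitCode = 0;
    exec('pdftotext ' . escapeshellarg($pdfPath) . ' - 2>/dev/null', $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext failed for $pdfPath");
    }
    return hasSearchableText(implode("\n", $lines));
}
```

Calling pdfHasTextLayer('/path/to/file.pdf') returns false for a scan that has not been through OCR. A PDF whose text layer is empty or damaged would also return false, so treat the result as a hint rather than proof.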

How to Install an OCR Library for Image-to-Text Conversion

To convert an image to text using PHP, you can utilize Optical Character Recognition (OCR) libraries. One popular library for this purpose is Tesseract OCR. Here’s an example of how you can use Tesseract OCR to convert an image to text:

  • Install Tesseract OCR on your server. You can follow the installation instructions specific to your operating system from the official Tesseract OCR documentation.
  • Install the Tesseract OCR PHP wrapper library. It is a Composer package rather than a compiled PHP extension, and you can install it with the following command:

composer require thiagoalessio/tesseract_ocr
  • Use the following PHP code to perform the image to text conversion:

 

<?php
require_once 'vendor/autoload.php';
 
use thiagoalessio\TesseractOCR\TesseractOCR;
 
// Specify the path to the image file
$imagePath = '/path/to/image.jpg';
 
// Create a new instance of TesseractOCR
$tesseract = new TesseractOCR($imagePath);
 
// Set any additional options if needed
// $tesseract->lang('eng'); // Specify the language of the text in the image
 
// Perform the OCR and get the extracted text
$text = $tesseract->run();
 
// Output the extracted text
echo $text;
?>

Alternatively, wrap the call in a function that handles errors:

<?php
// Composer's autoloader; adjust the path if your vendor directory is elsewhere
require_once 'vendor/autoload.php';

use thiagoalessio\TesseractOCR\TesseractOCR;

function imageToText($imagePath)
{
    try {
        // Create a TesseractOCR object and set the image path
        $ocr = new TesseractOCR($imagePath);

        // You can optionally set additional options like language, config file, etc.
        // $ocr->lang('eng')->configFile('path/to/config/file');
        $ocr->lang('eng');

        // Run OCR and get the result
        $result = $ocr->run();

        return $result;
    } catch (Exception $e) {
        // Return the error message instead of letting the exception propagate
        return "Error: " . $e->getMessage();
    }
}

// Replace 'namefile.png' with the path to your image file
$imagePath = 'namefile.png';
$resultText = imageToText($imagePath);

echo "OCR Result:\n";
echo $resultText;
?>

  • Make sure to replace /path/to/image.jpg with the actual path to your image file.
  • Remember to include the Tesseract OCR PHP library by requiring the autoload file generated by Composer. Also, you can set additional options for TesseractOCR if needed, such as specifying the language of the text in the image.
  • Please note that the accuracy of the OCR process may vary depending on the quality and clarity of the image.

 

How to Install OCR on a VPS Server

To install Tesseract OCR on Ubuntu 22.04, you can follow these steps:

Update the package lists on your server:
sudo apt update
Install Tesseract OCR and its dependencies:
sudo apt install tesseract-ocr
Install additional language data if needed. Each language is identified by a three-letter code, which is also the code you pass to Tesseract when selecting a language.

The languages currently covered are

  • Bengali (ben)
  • Gujarati (guj)
  • Hindi (hin)
  • Kannada (kan)
  • Malayalam (mal)
  • Meetei Meyak (mni)
  • Oriya (ori)
  • Punjabi (pan)
  • Santali (sat)
  • Tamil (tam)
  • Telugu (tel)
  • English (eng)

For example, to install English language support:
sudo apt install tesseract-ocr-eng
Verify the installation by checking the Tesseract OCR version:
tesseract --version
This should display the installed version of Tesseract OCR.
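
The same check can be made from PHP before any OCR call is attempted. The following is a minimal sketch, assuming the tesseract binary is on the PATH of the user that runs PHP; parseTesseractVersion and installedTesseractVersion are hypothetical helper names, not part of any library.

```php
<?php
// Extracts the version number from the first line printed by
// "tesseract --version", for example "tesseract 4.1.1". Returns null if
// the line does not look like a Tesseract version banner.
function parseTesseractVersion(string $firstLine): ?string
{
    if (preg_match('/^tesseract\s+v?(\d+(?:\.\d+)*)/i', trim($firstLine), $matches)) {
        return $matches[1];
    }
    return null;
}

// Runs the CLI and returns the detected version, or null when the binary
// is missing or prints something unexpected. Stderr is merged into stdout
// in case the banner is printed there.
function installedTesseractVersion(): ?string
{
    $lines = [];
    $exitCode = 0;
    exec('tesseract --version 2>&1', $lines, $exitCode);
    if ($exitCode !== 0 || $lines === []) {
        return null;
    }
    return parseTesseractVersion($lines[0]);
}
```

If installedTesseractVersion() returns null, the web application can show a clear "Tesseract is not installed" message instead of failing later inside the OCR call.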

Once Tesseract OCR is installed, you can use it in your PHP code as shown in the examples above. Remember to include the Tesseract OCR PHP library by requiring the autoload file generated by Composer and set any additional options if needed.

Please note that Tesseract OCR supports various languages, and you can install language data for the specific languages you require.
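
To see which language data is actually present, the tesseract CLI offers the --list-langs option. The following is a minimal sketch of how its output could be checked from PHP; parseLanguageList and missingLanguages are hypothetical helper names, and the exact wording of the header line is an assumption that may differ between Tesseract versions.

```php
<?php
// Parses the output of "tesseract --list-langs" into an array of language
// codes. The first line is assumed to be a header that starts with
// "List of available languages"; each following line holds one code.
function parseLanguageList(string $cliOutput): array
{
    $codes = [];
    foreach (preg_split('/\R/', $cliOutput) as $line) {
        $line = trim($line);
        if ($line === '' || stripos($line, 'List of available languages') === 0) {
            continue;
        }
        $codes[] = $line;
    }
    return $codes;
}

// Returns the required codes that are not in the installed list.
function missingLanguages(array $required, array $installed): array
{
    return array_values(array_diff($required, $installed));
}
```

In practice the input would come from something like shell_exec('tesseract --list-langs 2>&1'), and any codes returned by missingLanguages(['hin', 'eng'], $installed) are the ones that still need to be installed.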

 

To use TesseractOCR with PHP to extract text from images containing both Hindi and English languages, you can specify multiple language parameters in the TesseractOCR command. Here’s an example:

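$imagePath = '/path/to/your/image.jpg';
// Tesseract appends ".txt" to the output name, so leave the extension off here
$outputFile = '/path/to/output';

// Quote both paths so spaces or special characters cannot break the command
$command = 'tesseract ' . escapeshellarg($imagePath) . ' ' . escapeshellarg($outputFile) . ' -l hin+eng';
exec($command);

// Read the extracted text from the file Tesseract created
$extractedText = file_get_contents($outputFile . '.txt');

// Output the extracted text
echo $extractedText;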

In the $command variable, we specify both the Hindi (hin) and English (eng) language parameters using the + symbol to indicate that we want to extract text from both languages.

Make sure you have TesseractOCR installed on your Ubuntu 22.04 VPS and that the necessary language data for Hindi and English is also installed. Adjust the $imagePath variable to point to your actual image file, and set the desired path and filename for the $outputFile variable.

After executing the command, the extracted text will be saved in the specified output file. You can then read the contents of the output file using file_get_contents() or perform any further processing as needed.

Remember to handle any error checking and validation for the image file and output file paths in your PHP code.
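
One way to add that checking is sketched below, assuming the tesseract CLI is installed and that language codes contain only letters, digits, and underscores (an assumption made here for validation). The names buildOcrCommand and extractTextChecked are hypothetical helpers, not part of Tesseract or any PHP library.

```php
<?php
// Builds the tesseract command with every argument shell-quoted.
// $outputBase is the output path WITHOUT an extension; tesseract adds ".txt".
function buildOcrCommand(string $imagePath, string $outputBase, array $languages): string
{
    if ($languages === []) {
        throw new InvalidArgumentException('At least one language code is required.');
    }
    foreach ($languages as $language) {
        // Assumption: codes look like "hin", "eng" or "chi_sim".
        if (!preg_match('/^[A-Za-z0-9_]+$/', $language)) {
            throw new InvalidArgumentException("Invalid language code: $language");
        }
    }
    return sprintf(
        'tesseract %s %s -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($outputBase),
        escapeshellarg(implode('+', $languages))
    );
}

// Validates both paths, runs tesseract, and returns the recognized text.
function extractTextChecked(string $imagePath, string $outputBase, array $languages): string
{
    if (!is_file($imagePath) || !is_readable($imagePath)) {
        throw new RuntimeException("Image not found or not readable: $imagePath");
    }
    $outputDir = dirname($outputBase);
    if (!is_dir($outputDir) || !is_writable($outputDir)) {
        throw new RuntimeException("Output directory is not writable: $outputDir");
    }

    $lines = [];
    $exitCode = 0;
    exec(buildOcrCommand($imagePath, $outputBase, $languages) . ' 2>&1', $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException('tesseract failed: ' . implode("\n", $lines));
    }

    $text = file_get_contents($outputBase . '.txt');
    if ($text === false) {
        throw new RuntimeException("Could not read {$outputBase}.txt");
    }
    return $text;
}
```

A call such as extractTextChecked('/path/to/your/image.jpg', '/path/to/output', ['hin', 'eng']) can then be wrapped in a try/catch block, so that a missing image or a failed tesseract run produces a readable error message rather than empty output.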
