
Tesseract OCR

Posted: Tue 31 May 2016, 04:29
by Geoffrey
Tesseract OCR compiled in Slacko 6.3.0, command line OCR to read text from images.

tesseract-3.04.01.pet

Dependency

leptonica-1.73-i686.pet

I know very little about OCR. I downloaded the required eng.traineddata, which is included in this pet, and it seems to work OK with a sample text image.
It's just something to play with; some might find it useful, though it really needs a GUI frontend to make things easier.

Code:

# tesseract
Usage:
  tesseract --help | --help-psm | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  -psm NUM              Specify page segmentation mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Single options:
  -h, --help            Show this help message.
  --help-psm            Show page segmentation modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters to stdout.
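
As a rough sketch of how the usage text above turns into an actual command: the helper below only assembles a command line in the order the usage line gives (imagename, outputbase, then options). The function name build_ocr_cmd and the file name sample.png are made up for illustration, and the command is run only if tesseract and the image are actually present.

```shell
#!/bin/sh
# Assemble a tesseract 3.04-style command line from its parts.
# build_ocr_cmd is a hypothetical helper, not part of tesseract.
build_ocr_cmd() {
    image=$1          # input image, e.g. sample.png (hypothetical file)
    outbase=$2        # output base name; tesseract appends .txt to it
    lang=${3:-eng}    # language, defaults to eng (the traineddata in the pet)
    psm=${4:-3}       # page segmentation mode, 3 is the default per the help text
    printf 'tesseract %s %s -l %s -psm %s\n' "$image" "$outbase" "$lang" "$psm"
}

# PSM 6: assume a single uniform block of text.
cmd=$(build_ocr_cmd sample.png sample eng 6)
echo "$cmd"

# Run it only when tesseract is installed and the image exists.
# Unquoted $cmd relies on word splitting, so file names must not contain spaces.
if command -v tesseract >/dev/null 2>&1 && [ -f sample.png ]; then
    $cmd && cat sample.txt
fi
```

Running tesseract --list-langs first is a quick way to confirm that eng is actually available before trying an image.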

Re: Tesseract OCR

Posted: Tue 31 May 2016, 12:01
by rcrsn51
Geoffrey wrote:really needs a GUI frontend to make things easier.
There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.

Try PuppyOCR and compare.

Posted: Tue 31 May 2016, 12:30
by Pelo
Try PuppyOCR and compare. Don't forget that the Puppy team often adapts applications such as Tesseract to make things easier for Puppy users. I have used PuppyOCR and kept it not only in my toolbox but also in the cloud, available for download when needed.
A GUI has been added (in English).

Re: Tesseract OCR

Posted: Tue 31 May 2016, 13:29
by Geoffrey
rcrsn51 wrote:
Geoffrey wrote:really needs a GUI frontend to make things easier.
There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.
Yeah, I spotted that soon after posting this. I'm in the process of packaging a Qt GUI frontend for it. I know that may seem a little on the heavy side, but it's nice to try something different.

Posted: Tue 31 May 2016, 15:08
by musher0
@Geoffrey: Is this (Tesseract) your own work?

Tesseract is a multi-OS application

Posted: Tue 31 May 2016, 16:26
by Pelo
Tesseract is a multi-OS application that has been "puppified" as PuppyOCR. There is discussion about it on the Ubuntu forum.
The software does not do everything, and results depend heavily on the document being OCRed. It is painstaking work, worth doing for documents that have value.
Ubuntu documentation
Ubuntu review available in English.

Posted: Tue 31 May 2016, 22:35
by Geoffrey
musher0 wrote:@Geoffrey: Is this (Tesseract) your own work?
If you consider compiling it from source code my own work, then ya, I compiled both tesseract-3.04.01 and leptonica-1.73-i686.
I stripped out the DEV files, as they were over 60 MB worth; the only thing I added was the eng.traineddata,
which I downloaded as an eng.traineddata.gz from somewhere or other. :wink:

Confirmed that PuppyOCR does the job

Posted: Sun 31 Dec 2017, 08:41
by hamoudoudou
Confirmed that PuppyOCR does the job without installing Tesseract or anything else. The devs did a good job with this little application.
Only a French dictionary needs to be added in order to run the check for misrecognized words.
Feedback artfulpup