Tesseract OCR

Geoffrey · #1 Post by **Geoffrey** » Tue 31 May 2016, 04:29

Tesseract OCR, compiled in Slacko 6.3.0: a command-line OCR tool that reads text from images.

tesseract-3.04.01.pet

Dependency

leptonica-1.73-i686.pet

I know very little about OCR. I downloaded the required eng.traineddata, which is included in this pet, and it seems to work OK with a sample text image.
It's just something to play with; some might find it useful. It really needs a GUI frontend to make things easier.

Code:

# tesseract
Usage:
  tesseract --help | --help-psm | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  -psm NUM              Specify page segmentation mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Single options:
  -h, --help            Show this help message.
  --help-psm            Show page segmentation modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters to stdout.
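
As a rough illustration of the usage text above, here is a minimal sketch of how the command line fits together. The helper name build_ocr_cmd and the file names sample.png and out are hypothetical placeholders, not part of the pet. The sketch only prints the commands so they can be checked before running them for real.

```shell
#!/bin/sh
# Sketch only: assemble a tesseract 3.04 command line from the usage text.
# build_ocr_cmd, sample.png and "out" are hypothetical names for illustration.

build_ocr_cmd() {
    image=$1        # input image file
    outbase=$2      # output base name; tesseract writes the text to <outbase>.txt
    lang=${3:-eng}  # language, matching the eng.traineddata shipped in the pet
    psm=${4:-3}     # page segmentation mode; 3 is the documented default
    # Options go after "imagename outputbase", as shown in the usage line.
    printf 'tesseract %s %s -l %s -psm %s\n' "$image" "$outbase" "$lang" "$psm"
}

# Whole page with the default segmentation mode.
build_ocr_cmd sample.png out
# Same image treated as a single text line (mode 7).
build_ocr_cmd sample.png out eng 7
```

Copy a printed line into a terminal to run the OCR. For a small crop containing one line or one word, modes 7 and 8 from the list above are the ones to try first.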

rcrsn51 · #2 Post by **rcrsn51** » Tue 31 May 2016, 12:01

Geoffrey wrote:really needs a GUI frontend to make things easier.

There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.

Pelo · #3 Post by **Pelo** » Tue 31 May 2016, 12:30

Try PuppyOCR and compare. Don't forget that the Puppy team often adapts applications such as Tesseract to make things easier for Puppy users. I have used PuppyOCR and kept it not only in my toolbox but also in the cloud, available for download when needed.
A GUI has been added (in English).

Geoffrey · #4 Post by **Geoffrey** » Tue 31 May 2016, 13:29

rcrsn51 wrote:
Geoffrey wrote:really needs a GUI frontend to make things easier.
There already is one. It's called pic2txt and it's been around for years. If you look at some of the other OCR threads in this section, you will find references to it.

Yeah, I spotted that soon after posting this. I'm in the process of packaging a Qt GUI frontend for it. I know that may seem a little on the heavy side, but it's nice to try something different.

musher0 · #5 Post by **musher0** » Tue 31 May 2016, 15:08

@Geoffrey: Is this (Tesseract) your own work?

Pelo · #6 Post by **Pelo** » Tue 31 May 2016, 16:26

Tesseract is a multi-OS application that has been "puppified" by PuppyOCR. There are posts about it on the Ubuntu forum.
The software does not do everything, and results depend heavily on the document being OCRed. It is painstaking work, suited to documents that have value.
Ubuntu documentation
Ubuntu review available in English.

Geoffrey · #7 Post by **Geoffrey** » Tue 31 May 2016, 22:35

musher0 wrote:@Geoffrey: Is this (Tesseract) your own work?

If you consider compiling it from source code my own work, then yes: I compiled both tesseract-3.04.01 and leptonica-1.73-i686.
I stripped out the DEV files, as they came to over 60 MB. The only thing I added was eng.traineddata,
which I downloaded as eng.traineddata.gz from somewhere or other.

hamoudoudou · #8 Post by **hamoudoudou** » Sun 31 Dec 2017, 08:41

Confirmed that PuppyOCR does the job without installing Tesseract or anything else. The devs did a good job with this little application.
Only a French dictionary needs to be added to run the analysis of wrong words.
Feedback artfulpup
