How to get Tesseract running with a project in Visual Studio 2010

I have a C++ project in Visual Studio 2010 and want to use OCR in it. I came across many "tutorials" for Tesseract, but sadly all I got was a headache and wasted time.

In my project I have an image stored as a Mat. One solution to my problem is to save this Mat as an image (image.jpg, for example) and then call the Tesseract executable like this:

system("tesseract.exe image.jpg out");

This produces an output file, out.txt, which I then open with

infile.open ("out.txt");

to read the output from Tesseract.
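
The workaround described above can be sketched with the standard library only. This is a minimal sketch, not the asker's actual code: the helper names buildTesseractCommand, readTextFile and ocrViaExecutable are hypothetical, and it assumes tesseract.exe is reachable on the PATH and that the file names contain no spaces.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Builds the command line from the question, e.g. "tesseract.exe image.jpg out".
// Tesseract appends ".txt" to the output base name itself.
std::string buildTesseractCommand(const std::string& imagePath,
                                  const std::string& outputBase) {
    return "tesseract.exe " + imagePath + " " + outputBase;
}

// Reads a whole text file into a string; returns an empty string
// if the file cannot be opened.
std::string readTextFile(const std::string& path) {
    std::ifstream infile(path.c_str());
    if (!infile) return std::string();
    std::ostringstream buffer;
    buffer << infile.rdbuf();
    return buffer.str();
}

// Hypothetical wrapper: runs the executable, then reads "<outputBase>.txt".
// Assumes tesseract.exe is on the PATH (an assumption, not from the question).
std::string ocrViaExecutable(const std::string& imagePath,
                             const std::string& outputBase) {
    const std::string command = buildTesseractCommand(imagePath, outputBase);
    if (std::system(command.c_str()) != 0) return std::string();
    return readTextFile(outputBase + ".txt");
}
```

Every call starts a new process and makes two round trips to the disk, which is why this approach gets expensive when it has to run on every video frame.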

This all works like a charm, but it is not an optimal solution. In my project I am processing video, so a save/call .exe/write/read cycle at 10+ FPS is not what I am looking for. I want to integrate Tesseract into my existing code so that I can pass a Mat as an argument and immediately get the result as a string.

Do you know of any good tutorial (preferably step-by-step) for using Tesseract OCR with Visual Studio 2010? Or do you have your own solution?


OK, I figured it out, but it works for the Release and Win32 configuration only (no Debug or x64). There are many linker errors under the Debug configuration.


1. First of all, download the prepared library folder (Tesseract + Leptonica) here:

Mirror 1 (Google Drive)

Mirror 2 (MediaFire)

2. Extract tesseract.zip to C:\

3. In Visual Studio, go to C/C++ > General > Additional Include Directories

Insert C:\tesseract\include

4. Under Linker > General > Additional Library Directories

Insert C:\tesseract\lib

5. Under Linker > Input > Additional Dependencies, add the Tesseract and Leptonica .lib files from C:\tesseract\lib

Sample code should look like this:

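#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>

using namespace std;

int main(void){

    // Initialise the engine with the English language data.
    tesseract::TessBaseAPI api;
    if (api.Init("", "eng", tesseract::OEM_DEFAULT) != 0) {
        cerr << "Could not initialise Tesseract." << endl;
        return 1;
    }

    cout<<"File name:";
    char image[256];
    cin.getline(image, sizeof(image));

    // Check that Leptonica can load the image, then release it.
    PIX   *pixs = pixRead(image);
    if (pixs == NULL) return 1;
    pixDestroy(&pixs);

    // Run OCR on the file and print the recognised text.
    STRING text_out;
    api.ProcessPages(image, NULL, 0, &text_out);
    cout << text_out.string();

    return 0;
}
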

For interaction with OpenCV and Mat type images look HERE
