Installing tesseract for C++ on Windows 10
I am having problems while installing tesseract to develop in C++ on Windows 10.
Can anyone provide a guide to get:
1. Leptonica (required by tesseract) lib and includes
2. Tesseract lib and includes
3. Link both to project (e.g. Visual Studio)
so that example from https://github.com/tesseract-ocr/tesseract/wiki/APIExample works:
#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete api;
    delete[] outText;
    pixDestroy(&image);

    return 0;
}
Recommended answer
I had been trying to link the tesseract library to my C++ project in Visual Studio 2019 for a couple of days and finally managed to do it. None of the threads I found, nor even the official tesseract documentation, has a full list of instructions on what to do.
I'll list what I did; hopefully it will help someone. I don't claim it's the optimal way to do it.
There are basic tips in the official tesseract documentation. Go to the "Windows" section. I did install sw and cppan, but I guess it wasn't necessary. The main thing here is installing vcpkg. It requires Git, so I installed that first. Then:
> cd c:\tools (I installed it in c:\tools, you may choose any dir)
> git clone https://github.com/microsoft/vcpkg
> .\vcpkg\bootstrap-vcpkg.bat
> .\vcpkg\vcpkg install tesseract:x64-windows-static (I used the x64 version)
> .\vcpkg\vcpkg integrate install
At this point everything should work, they said: headers should be found and libs should be linked. But neither worked for me.
Change project configuration to Release x64 (or Release x86 if you installed x86 tesseract).
To include headers: go to project properties -> C/C++ -> General. Set Additional Include Directories to C:\tools\vcpkg\installed\x64-windows-static\include (or wherever you installed vcpkg).
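A quick way to confirm that the include directory is actually picked up is a compile-time probe. This is only a sketch: it assumes a compiler with __has_include (standard since C++17), and the HAVE_* macro names and the include_path_report function are made up for illustration.

```cpp
#include <string>

// Probe the include path at compile time. __has_include is standard in C++17;
// on compilers without it, both headers are simply reported as missing.
#if defined(__has_include)
#  if __has_include(<tesseract/baseapi.h>)
#    define HAVE_TESSERACT_HEADER 1   // hypothetical macro name
#  endif
#  if __has_include(<leptonica/allheaders.h>)
#    define HAVE_LEPTONICA_HEADER 1   // hypothetical macro name
#  endif
#endif
#ifndef HAVE_TESSERACT_HEADER
#  define HAVE_TESSERACT_HEADER 0
#endif
#ifndef HAVE_LEPTONICA_HEADER
#  define HAVE_LEPTONICA_HEADER 0
#endif

// Builds a two-line report saying which headers the compiler could see.
std::string include_path_report() {
    std::string report;
    report += "tesseract/baseapi.h: ";
    report += HAVE_TESSERACT_HEADER ? "found" : "missing";
    report += "\nleptonica/allheaders.h: ";
    report += HAVE_LEPTONICA_HEADER ? "found" : "missing";
    report += "\n";
    return report;
}
```

Printing include_path_report() from main shows "missing" for a header when the Additional Include Directories setting does not apply to the active configuration.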
To link libraries: project properties -> Linker -> General. Set Additional Library Directories to C:\tools\vcpkg\installed\x64-windows-static\lib
Project properties -> C/C++ -> Code Generation. Set Runtime Library to Multi-threaded (/MT). Otherwise I got errors like "runtime mismatch static vs DLL".
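If it is unclear which runtime a given configuration really uses, the predefined MSVC macros can tell. A small sketch, relying on the documented behaviour that MSVC defines _DLL for /MD and /MDd but not for /MT and /MTd; the function name is my own.

```cpp
#include <string>

// Reports which C runtime flavour this translation unit was compiled against.
// MSVC defines _DLL only when the DLL runtime (/MD or /MDd) is selected.
std::string runtime_library_flavour() {
#if !defined(_MSC_VER)
    return "not MSVC";
#elif defined(_DLL)
    return "DLL CRT (/MD or /MDd)";
#else
    return "static CRT (/MT or /MTd)";
#endif
}
```

With the x64-windows-static triplet used above, the project should report the static CRT; the DLL CRT is the combination that gave me the mismatch errors.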
The tesseract lib couldn't link to its dependencies, so I added every lib that had been installed into C:\tools\vcpkg\installed\x64-windows-static\lib.
Project properties -> Linker -> Input. I set Additional Dependencies to archive.lib;bz2.lib;charset.lib;gif.lib;iconv.lib;jpeg.lib;leptonica-1.80.0.lib;libcrypto.lib;libpng16.lib;libssl.lib;libwebpmux.lib;libxml2.lib;lz4.lib;lzma.lib;lzo2.lib;openjp2.lib;tesseract41.lib;tiff.lib;tiffxx.lib;turbojpeg.lib;webp.lib;webpdecoder.lib;webpdemux.lib;xxhash.lib;zlib.lib;zstd_static.lib;%(AdditionalDependencies)
And after that it finally compiled and launched.
But... api->Init returned -1. To work with tesseract you need a tessdata directory with .traineddata files for the languages you need.
Download tessdata. I got it from the official docs. BTW, tessdata_fast worked better than tessdata_best for my purposes :) So I downloaded the single "eng" file and saved it as C:\tools\TesseractData\tessdata\eng.traineddata.
Then I added the environment variable TESSDATA_PREFIX with the value C:\tools\TesseractData\tessdata. I also added C:\tools\TesseractData to the Path variable (just in case).
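When Init fails, it helps to check the data location before blaming the library. Below is a minimal sketch using only the standard library. It assumes, as in my setup, that TESSDATA_PREFIX points directly at the folder holding the .traineddata files. Both function names are hypothetical. MSVC may flag std::getenv as deprecated and suggest _dupenv_s instead.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Path where <lang>.traineddata is expected inside the given tessdata directory.
fs::path traineddata_path(const fs::path& tessdata_dir, const std::string& lang) {
    return tessdata_dir / (lang + ".traineddata");
}

// Returns an empty string when the language file is found under TESSDATA_PREFIX,
// otherwise a short description of what is wrong.
std::string check_tessdata(const std::string& lang) {
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix == nullptr || *prefix == '\0') {
        return "TESSDATA_PREFIX is not set";
    }
    const fs::path file = traineddata_path(prefix, lang);
    std::error_code ec;  // non-throwing overload: errors count as "not found"
    if (!fs::is_regular_file(file, ec)) {
        return "missing file: " + file.string();
    }
    return "";
}
```

Calling check_tessdata("eng") before api->Init and printing any non-empty result makes the -1 much easier to diagnose. Remember that Visual Studio has to be restarted to see a newly added environment variable.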
And after all this it is finally working for me.