Trained models with fast variant of the "best" LSTM models + legacy models
Stefan Weil ced78752cc Rename frk -> deu_latf (ISO 639-3, ISO 15924)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-03-09 11:04:25 +01:00
script Add scripts from tessdata_best (converted to fast integer models) 2018-05-10 13:42:02 +02:00
tessconfigs@3decf1c825 Update tessconfigs 2019-10-23 13:31:04 +02:00
.gitmodules Update URL for tessconfigs submodule (use HTTPS) 2019-10-11 13:06:09 +02:00
afr.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
amh.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ara.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
asm.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
aze.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
aze_cyrl.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
bel.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ben.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
bod.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
bos.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
bre.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
bul.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
cat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ceb.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ces.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
chi_sim.traineddata Update traineddata LSTM model with best model converted to integer 2018-05-10 09:14:36 +02:00
chi_sim_vert.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
chi_tra.traineddata Update traineddata LSTM model with best model converted to integer 2018-05-10 09:14:36 +02:00
chi_tra_vert.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
chr.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
configs Add tessconfigs submodule and links for required tessdata files 2019-09-03 16:08:29 +02:00
cos.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
cym.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
dan.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
dan_frak.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
deu.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
deu_frak.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
deu_latf.traineddata Rename frk -> deu_latf (ISO 639-3, ISO 15924) 2024-03-09 11:04:25 +01:00
div.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
dzo.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
ell.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
eng.traineddata Remove cube components from traineddata and update version component 2018-05-10 08:57:17 +02:00
enm.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
epo.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
equ.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
est.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
eus.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
fao.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
fas.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
fil.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
fin.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
fra.traineddata Remove cube components from traineddata and update version component 2018-05-10 08:57:17 +02:00
frm.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
fry.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
gla.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
gle.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
glg.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
grc.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
guj.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
hat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
heb.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
hin.traineddata Only int best model for hin, san, mar, nep, tel and kan 2018-03-22 17:24:26 +05:30
hrv.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
hun.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
hye.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
iku.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ind.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
isl.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ita.traineddata ita: Remove ita.config from ita.traineddata (fix issue #18) 2020-11-30 21:59:00 +01:00
ita_old.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
jav.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
jpn.traineddata Update traineddata LSTM model with best model converted to integer 2018-05-10 09:14:36 +02:00
jpn_vert.traineddata Remove parameter textord_tabfind_vertical_horizontal_mix 2018-03-29 12:51:24 +02:00
kan.traineddata Only int best model for hin, san, mar, nep, tel and kan 2018-03-22 17:24:26 +05:30
kat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
kat_old.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
kaz.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
khm.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
kir.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
kmr.traineddata rename kur_ara.traineddata to kmr.traineddata 2019-03-14 13:12:03 +00:00
kor.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
kor_vert.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
lao.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
lat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
lav.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
LICENSE Add Apache license file 2019-06-13 20:14:26 +02:00
lit.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ltz.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
mal.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
mar.traineddata Only int best model for hin, san, mar, nep, tel and kan 2018-03-22 17:24:26 +05:30
mkd.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
mlt.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
mon.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
mri.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
msa.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
mya.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
nep.traineddata Only int best model for hin, san, mar, nep, tel and kan 2018-03-22 17:24:26 +05:30
nld.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
nor.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
oci.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
ori.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
osd.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
pan.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
pdf.ttf Add tessconfigs submodule and links for required tessdata files 2019-09-03 16:08:29 +02:00
pol.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
por.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
pus.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
que.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
README.md Update README.md 2020-10-20 17:04:25 +02:00
ron.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
rus.traineddata Remove cube components from traineddata and update version component 2018-05-10 08:57:17 +02:00
san.traineddata Only int best model for hin, san, mar, nep, tel and kan 2018-03-22 17:24:26 +05:30
sin.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
slk.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
slk_frak.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
slv.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
snd.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
spa.traineddata Remove cube components from traineddata and update version component 2018-05-10 08:57:17 +02:00
spa_old.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
sqi.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
srp.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
srp_latn.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb 2018-03-22 15:37:57 +05:30
sun.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
swa.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
swe.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
syr.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
tam.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
tat.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
tel.traineddata Only int best model for hin, san, mar, nep, tel and kan 2018-03-22 17:24:26 +05:30
tgk.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
tgl.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
tha.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
tir.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
ton.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
tur.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
uig.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
ukr.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
urd.traineddata remove legacy model from indic and arabic script languages 2018-03-22 21:47:04 +05:30
uzb.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
uzb_cyrl.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
vie.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
yid.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30
yor.traineddata Update LSTM Models to integerized tessdata_best for files<25mb 2018-03-22 15:55:50 +05:30

tessdata

These language data files work only with Tesseract 4.0.0 and newer. They are based on the sources in tesseract-ocr/langdata on GitHub. (still to be updated for 4.0.0 - 20180322)
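Because the files require Tesseract 4.0.0 or newer, a script can check the installed version before using them. The following is a minimal sketch, not part of this repository: the helper names `parse_tesseract_version` and `supports_these_files` are hypothetical, and the code assumes that the first line of the `tesseract --version` output contains a dotted version number such as `4.1.1`.

```python
import re
from typing import Optional, Tuple

# Minimum version stated in this README.
MINIMUM_VERSION = (4, 0, 0)


def parse_tesseract_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) from `tesseract --version` output.

    Assumption: the first non-empty line carries a dotted version number.
    Returns None when no such number is found.
    """
    lines = version_output.strip().splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", lines[0])
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def supports_these_files(version: Optional[Tuple[int, ...]]) -> bool:
    """True when the parsed version is at least 4.0.0."""
    return version is not None and version >= MINIMUM_VERSION
```

To apply it to a real installation, one option is to run `subprocess.run(["tesseract", "--version"], capture_output=True, text=True)` and pass `result.stdout or result.stderr` to `parse_tesseract_version`, since the stream used for the version text may differ between releases.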

These files contain models for the legacy Tesseract engine (--oem 0) as well as for the new LSTM neural-network-based engine (--oem 1).
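The engine is selected on the command line with `--oem`. As an illustration, the sketch below only assembles an argument list for the `tesseract` executable; the function name `build_tesseract_command` and the file names in the usage note are hypothetical. It deliberately accepts only the two engine modes named above, although Tesseract defines further `--oem` values.

```python
from typing import List, Optional

LEGACY_ENGINE = 0  # --oem 0, legacy engine
LSTM_ENGINE = 1    # --oem 1, LSTM neural net engine


def build_tesseract_command(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    oem: int = LSTM_ENGINE,
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a tesseract command line as a list of arguments.

    Only the two engine modes described in this README are accepted here.
    """
    if oem not in (LEGACY_ENGINE, LSTM_ENGINE):
        raise ValueError("this sketch only handles --oem 0 and --oem 1")
    command = ["tesseract"]
    if tessdata_dir is not None:
        # Point Tesseract at a checkout of this repository.
        command += ["--tessdata-dir", tessdata_dir]
    command += [image_path, output_base, "-l", lang, "--oem", str(oem)]
    return command
```

For example, `build_tesseract_command("page.png", "page", lang="deu", oem=LEGACY_ENGINE)` yields a list that can be handed to `subprocess.run(..., check=True)` on a machine where Tesseract is installed.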

The LSTM models (--oem 1) in these files have been updated to the integerized versions of the models in tessdata_best on GitHub, so they should be faster but are probably slightly less accurate than tessdata_best.

tessdata_fast on GitHub provides an alternative set of integerized LSTM models that have been built with a smaller network. The tessdata_fast files are the ones packaged for Debian and Ubuntu.

The legacy Tesseract models (--oem 0) have been removed from the Indic and Arabic script language files.
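A caller that prefers the legacy engine therefore needs a fallback for those languages. The sketch below is an assumption-laden illustration: the set `LEGACY_REMOVED` was compiled by hand from the files in the listing above whose last commit message reads "remove legacy model from indic and arabic script languages", so it may be incomplete, and other files may also lack a legacy model. The names `LEGACY_REMOVED` and `pick_oem` are hypothetical.

```python
# Language codes whose commit message in the file listing says the legacy
# model was removed. Hand-compiled; treat as illustrative, not authoritative.
LEGACY_REMOVED = frozenset({
    "ara", "asm", "ben", "bod", "dzo", "fas", "guj",
    "khm", "lao", "mal", "mya", "ori", "pan", "pus",
    "sin", "snd", "syr", "tam", "tha", "uig", "urd",
})


def pick_oem(lang: str, prefer_legacy: bool = False) -> int:
    """Return 0 (legacy) or 1 (LSTM) for the given language code.

    Falls back to the LSTM engine when the legacy model is listed as removed.
    Languages not in LEGACY_REMOVED are assumed to still ship a legacy model.
    """
    if prefer_legacy and lang not in LEGACY_REMOVED:
        return 0
    return 1
```

With this helper, `pick_oem("ara", prefer_legacy=True)` falls back to the LSTM engine, while `pick_oem("eng", prefer_legacy=True)` selects the legacy engine.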

tessdata for 3.04 or 3.05

Get language data files for Tesseract 3.04 or 3.05 from the 3.04 tree.

More information and a complete list of all languages is available in the Tesseract wiki.

All data in the repository are licensed under the Apache-2.0 License; see the file LICENSE.