/jp/ - Otaku Culture

Anonymous Tue Aug 29 00:22:07 2017 No.17549462
File: 234 KB, 572x279, file.png

I just added OCR output repair logic to Spark Reader.

I don't really want to post about this because I'll probably get called a shill and hurt Spark Reader's reputation (if it has one), but I figure it might help someone more than it hurts. Correcting OCR output by hand is really fucking annoying. I see people complain all the time that OCR tools like capture2text are bad, and if OCR worked better, it would be easier for beginners to read manga without furigana.

This takes advantage of the fact that SR has a sort-of kind-of intelligent parser (better than chiitrans and kanji.moe, at least). It looks for characters that are problematic for OCR and tries replacing them with similar-looking characters that give the line a better-seeming parse. The logic is really stupid, but it works well for the worst cases. It basically makes capture2text usable, and it's probably one of the few uses of parsing that aren't bad for learning.
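For anyone curious what "swap look-alike characters and keep whichever parses best" could look like, here is a minimal Python sketch. It is not Spark Reader's actual code. The confusion table, the tiny lexicon, and the cost function (`CONFUSABLE`, `LEXICON`, `parse_cost`, `repair_line`) are all made up for illustration, and a real parser scores a line with a full dictionary rather than a handful of words.

```python
from itertools import product

# Hypothetical confusion table: small/large kana pairs that OCR tends to mix up.
CONFUSABLE = {
    "っ": "つ", "つ": "っ",
    "ゃ": "や", "や": "ゃ",
    "ゅ": "ゆ", "ゆ": "ゅ",
    "ょ": "よ", "よ": "ょ",
}

# Toy stand-in for a real dictionary (assumption, for the demo only).
LEXICON = {"つれて", "って", "やろ", "手", "を", "つかんで", "て"}


def parse_cost(line, lexicon=LEXICON):
    """Cheapest segmentation cost: 1 per known word, 3 per unknown character."""
    max_len = max(map(len, lexicon))
    best = [0] + [float("inf")] * len(line)
    for end in range(1, len(line) + 1):
        # Fallback: treat the last character as unknown.
        best[end] = best[end - 1] + 3
        # Try every known word that ends at this position.
        for start in range(max(0, end - max_len), end):
            if line[start:end] in lexicon:
                best[end] = min(best[end], best[start] + 1)
    return best[-1]


def repair_line(line, lexicon=LEXICON):
    """Try every combination of look-alike swaps and keep the cheapest parse."""
    options = [(ch, CONFUSABLE[ch]) if ch in CONFUSABLE else (ch,) for ch in line]
    candidates = ("".join(chars) for chars in product(*options))

    def key(candidate):
        # Break ties in favour of the candidate that changes the fewest characters.
        changes = sum(a != b for a, b in zip(candidate, line))
        return (parse_cost(candidate, lexicon), changes)

    return min(candidates, key=key)


print(repair_line("っれてってゃろ"))
print(repair_line("手をっかんでて"))
```

The sketch only substitutes characters, so stray junk is left in place. The number of candidates doubles with every confusable character in the line, which is fine for short manga lines but would need pruning for longer text.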

Before:

>元の場所まで
>っれてってゃろ

After:

>元の場所まで
>つれてってやろ

Before:

>大丈夫`おれが
>手をっかんでて
>ゃろからょ!

After:

>大丈夫`おれが
>手をつかんでて
>やろからよ!

(It doesn't delete stray nonsense characters like `)

It messes up sometimes, so it runs on a per-line basis, not on everything SR has loaded on screen.

Before:

>ゅめゆめ
>藁悟しておく
>ように、ニヤ

After:

>ゆめゆめ
>藁悟レておく
>ように、ニャ

Here you would run it on the first and third lines but not the second, where it wrongly turns し into レ. If you use capture2text, change its settings so that it doesn't remove newlines.
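The per-line idea can be sketched like this in Python. Again, this is only an illustration and not how SR implements it. `repair_selected_lines` and `toy_repair` are hypothetical names, and `toy_repair` is a deliberately dumb stand-in that reproduces both the good fixes and the bad one from the example above.

```python
from typing import Callable, Iterable


def repair_selected_lines(
    text: str, selected: Iterable[int], repair: Callable[[str], str]
) -> str:
    """Apply `repair` only to the 0-based line indices in `selected`."""
    chosen = set(selected)
    lines = text.split("\n")  # relies on the OCR tool keeping newlines
    return "\n".join(
        repair(line) if i in chosen else line for i, line in enumerate(lines)
    )


def toy_repair(line: str) -> str:
    """Stand-in repair step: two helpful swaps and one harmful one."""
    return line.replace("ゅ", "ゆ").replace("ヤ", "ャ").replace("し", "レ")


ocr_text = "ゅめゆめ\n藁悟しておく\nように、ニヤ"

# Repair only the first and third lines; the second is left as OCR produced it.
print(repair_selected_lines(ocr_text, [0, 2], toy_repair))
```

If the OCR tool strips newlines, the whole capture becomes one line and there is nothing to select, which is why the newline setting matters.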

It's in the right-click menu for lines of text, right under "I know this word". https://github.com/wareya/Spark-Reader/releases/tag/rollingtestrelease

The examples are from the first few pages of 夢喰いメリー.
