SWIFT – Google Vision Text Detection Response

I am using the Google Vision API to perform text recognition on receipt images. I get some good results, but the format of the response is very unreliable: if there is a large gap between pieces of text on the same line, the text after the gap is not printed on that line but further down in the output.

For example, for the following receipt image I get this response:

4x Löwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3,30 26,401
3x Hefe-Weissbier a 3,30 9,90 1
1x Saft 0,25
1x Grosses Wasser
1x Vegetarische Varia
1x Gyros
1x Baby Kalamari Gefu
2x Gyros Folie
1x Schafskäse Ofen
1x Bifteki Metaxa
1x Schweinefilet Meta
1x St ifado
1x Tee
2,50 1
2,40 1
9,90 1
8,90 1
12,90
a 9,9019,80 1
6,90 1
11,90 1
13,90 1
14,90 1
2,10 1

This makes it hard to associate prices with the item text: the first few lines come back as expected, but after that the output becomes much less useful. The ideal response would look like this:

4x Löwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3,30 26,401
3x Hefe-Weissbier a 3,30 9,90 1
1x Saft 0,25 2,50 1
1x Grosses Wasser 2,40 1
1x Vegetarische Varia 9,90 1
1x Gyros 8,90 1
1x Baby Kalamari Gefu 12,90 1
2x Gyros Folie a 9,9019,80 1
1x Schafskäse Ofen 6,90 1
1x Bifteki Metaxa 11,90 1
1x Schweinefilet Meta 13,90 1
1x St ifado 14,90 1
1x Tee 2,10 1

Or close to that.
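With output in the ideal shape, associating each price with its item comes down to parsing one line at a time. The Swift sketch below is only an illustration: `ReceiptItem` and `parseReceiptLine` are hypothetical names, and it assumes each item line starts with `<n>x `, that prices are written with a decimal comma and two decimals, and that the last such price on a line is the line total (the trailing `1` is ignored).

```swift
import Foundation

// Hypothetical result type; field names are illustrative only.
struct ReceiptItem {
    let quantity: Int
    let name: String
    let total: Decimal
}

/// Parses a line such as "1x Gyros 8,90 1" into quantity, name and total.
/// Assumes a leading "<n>x ", prices like "12,90", and that the last
/// price on the line is the line total.
func parseReceiptLine(_ line: String) -> ReceiptItem? {
    let qtyPattern = try! NSRegularExpression(pattern: "^(\\d+)x\\s+")
    let pricePattern = try! NSRegularExpression(pattern: "\\d+,\\d{2}")
    let whole = NSRange(line.startIndex..., in: line)

    guard let qtyMatch = qtyPattern.firstMatch(in: line, range: whole),
          let qtyRange = Range(qtyMatch.range(at: 1), in: line),
          let quantity = Int(line[qtyRange]),
          let nameStart = Range(qtyMatch.range, in: line)?.upperBound,
          let lastPrice = pricePattern.matches(in: line, range: whole).last,
          let priceRange = Range(lastPrice.range, in: line),
          nameStart <= priceRange.lowerBound
    else { return nil }  // e.g. an orphaned price line such as "2,50 1"

    // Decimal(string:) expects "." as the separator, so swap the German comma.
    let priceText = line[priceRange].replacingOccurrences(of: ",", with: ".")
    guard let total = Decimal(string: priceText) else { return nil }

    var name = String(line[nameStart..<priceRange.lowerBound])
    // Drop a unit price such as " a 3,30" that sits right before the total.
    if let unit = name.range(of: "\\s+a\\s+\\d+,\\d{2}\\s*$", options: .regularExpression) {
        name.removeSubrange(unit)
    }
    return ReceiptItem(quantity: quantity,
                       name: name.trimmingCharacters(in: .whitespaces),
                       total: total)
}
```

Under these assumptions, `parseReceiptLine("2x Gyros Folie a 9,9019,80 1")` yields quantity 2, name "Gyros Folie" and total 19.80, while lines without a leading quantity return `nil`.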

Is there a formatting option that can be added to the API request to get a different response? I have had success with Tesseract, where you can change the output format to get this result, and I would like to know if there is something similar for the Vision API.

I understand that the API returns letter coordinates that could be used for this, but I hope I don't have to go into that much depth.
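If you do end up using the coordinates, a common approach is to group words whose vertical positions overlap into one visual line and then read each line left to right. A minimal Swift sketch follows. The `Word` type is hypothetical and simplified (in the JSON response, each `textAnnotations` entry after the first carries a `description` and a `boundingPoly` with `vertices`), and the half-height tolerance is an assumed value that may need tuning for skewed or curved receipts.

```swift
import Foundation

// Hypothetical, simplified word type. In practice the text would come from
// an annotation's "description" and the box from "boundingPoly.vertices".
struct Word {
    let text: String
    let minX: Double, minY: Double, maxX: Double, maxY: Double
    var midY: Double { (minY + maxY) / 2 }
    var height: Double { maxY - minY }
}

/// Groups words into visual lines: a word joins the current line when its
/// vertical centre lies within half a word height of that line's first word.
/// Assumes a roughly upright, unskewed receipt.
func reconstructLines(_ words: [Word]) -> [String] {
    var lines: [[Word]] = []
    for word in words.sorted(by: { $0.midY < $1.midY }) {
        if let anchor = lines.last?.first,
           abs(word.midY - anchor.midY) <= anchor.height / 2 {
            lines[lines.count - 1].append(word)
        } else {
            lines.append([word])
        }
    }
    // Within each visual line, order words left to right.
    return lines.map { line in
        line.sorted { $0.minX < $1.minX }.map(\.text).joined(separator: " ")
    }
}
```

With this grouping, a price printed far to the right ends up on the same line as its item name, as long as the two share roughly the same vertical position.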

You can add feature hints to the JSON request. For this receipt image, DOCUMENT_TEXT_DETECTION gives good results:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.stack.imgur.com/TRTXo.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
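To send the same request from a Swift app instead of the documentation pane, you can mirror the JSON with `Encodable` types. This is a sketch: the type and function names are made up, and it assumes API-key authentication via a `key` query parameter on the `https://vision.googleapis.com/v1/images:annotate` endpoint; check the Vision API documentation for the authentication method your project actually uses.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Encodable mirror of the JSON request body (names are illustrative).
struct AnnotateRequest: Encodable {
    struct ImageSource: Encodable { let imageUri: String }
    struct Image: Encodable { let source: ImageSource }
    struct Feature: Encodable { let type: String }
    struct Request: Encodable { let image: Image; let features: [Feature] }
    let requests: [Request]
}

/// Builds (but does not send) a POST to images:annotate asking for
/// DOCUMENT_TEXT_DETECTION. `apiKey` is a placeholder.
func makeAnnotateRequest(imageURL: String, apiKey: String) throws -> URLRequest {
    let body = AnnotateRequest(requests: [
        .init(image: .init(source: .init(imageUri: imageURL)),
              features: [.init(type: "DOCUMENT_TEXT_DETECTION")])
    ])
    var components = URLComponents(string: "https://vision.googleapis.com/v1/images:annotate")!
    components.queryItems = [URLQueryItem(name: "key", value: apiKey)]
    var request = URLRequest(url: components.url!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

The returned `URLRequest` could then be sent with something like `URLSession.shared.dataTask(with:completionHandler:)`.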

You can copy the JSON above and paste it into the Request Body in the Try this API pane on the documentation page. Results:

4x LOwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3, 3026, 40 1
3x Hefe-Weissbier a 3,30990 1
1x Saft 0,25 2, 50 1
1x Grosses Wasser 2, 40 1
1x Vegetarische Varia 9,90 1
1x Gyros 8,90 1
1x Baby Kalamari Gefu 12,90 !
2x Gyros Folie a 9,9019, 80 1
1x Schaf skäse Ofen 6,90 1
1x Bifteki Metaxa 11,90 1
1x Schweinefilet Meta 13,90 1
1x Stifado 14, 90 1
1x Tee 2, 10 1
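With DOCUMENT_TEXT_DETECTION, the response's `fullTextAnnotation.text` field holds the whole detected text with line breaks, so in code you can read that field and split it into lines like the ones above. The sketch below decodes only that field; the type names are hypothetical, and the test input is a hand-written minimal JSON, not real API output.

```swift
import Foundation

// Only the fields needed here; the real response carries much more
// (pages, blocks, paragraphs, words and symbols with bounding boxes).
struct AnnotateResponse: Decodable {
    struct FullTextAnnotation: Decodable { let text: String }
    struct Response: Decodable { let fullTextAnnotation: FullTextAnnotation? }
    let responses: [Response]
}

/// Decodes an images:annotate response body and splits the full text into
/// non-empty lines. Returns [] if no text was detected.
func receiptLines(from data: Data) throws -> [String] {
    let decoded = try JSONDecoder().decode(AnnotateResponse.self, from: data)
    guard let text = decoded.responses.first?.fullTextAnnotation?.text else { return [] }
    return text.split(separator: "\n").map(String.init)
}
```

The lines still contain OCR errors such as `2, 50`, so any price parsing downstream should tolerate them.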

At the moment, Google Vision's configuration options are far inferior to Tesseract's. Since Google supports both projects, guess which one will get higher priority in the future?

