Flow and Azure Cognitive Services Vision Service-OCR

Last time we looked at generating thumbnail images via the Vision Service. This time we’re going to use the service’s Optical Character Recognition (OCR) process to get the text out of a photograph of a document.

Photo by Miguel Á. Padriñán from Pexels

Button, Button

We’re going to use a Flow button trigger from the Flow phone app to start this flow. In Microsoft Flow, select New -> Instant – From blank. Give it a name and select “From Microsoft Flow”.

Create instant flow dialogue from Flow.

We only need a single input of type file for our Flow button app. Go ahead and add the input and give it an appropriate label and prompt text.

Button dialogue with single input asking for a file

If you’re not familiar with how to use the Flow mobile app for phones, you can refer to my previous blog posts on the topic.

Look and See

Now that we have our input, we can send it to the Vision Service. Add an action and search for “computer vision api”. Select that connector and then the “Optical Character Recognition (OCR) to Text” action. (Note that this action is currently designated as being in preview.)

Select the Image Source as type “Image Content” and the Image Content as the Image output from the trigger.

OCR step showing source as image content and content as the image output from the button trigger
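
For the curious, the connector is a wrapper around a REST call to the Vision Service. Here’s a minimal Python sketch of what a roughly equivalent request might look like. The resource host name, the key, the `v3.2` version segment and the `build_ocr_request` helper are all assumptions for illustration and are not taken from the flow itself. The code only builds the request; it doesn’t send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ocr_request(endpoint, key, image_bytes, api_version="v3.2"):
    """Build (but do not send) a POST to the Computer Vision OCR operation.

    endpoint, key and api_version are placeholders; substitute the values
    from your own Cognitive Services resource.
    """
    # "unk" asks the service to auto-detect the language.
    query = urlencode({"language": "unk", "detectOrientation": "true"})
    url = f"{endpoint.rstrip('/')}/vision/{api_version}/ocr?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Raw image bytes in the body, like "Image Content" in the flow.
        "Content-Type": "application/octet-stream",
    }
    return Request(url, data=image_bytes, headers=headers, method="POST")


request = build_ocr_request(
    "https://example-resource.cognitiveservices.azure.com",  # hypothetical host
    "YOUR-KEY-HERE",                                          # hypothetical key
    b"...image bytes...",
)
print(request.get_method(), request.full_url)
```

In the flow, the connection you create for the connector supplies the key and endpoint, so none of this has to be written by hand.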

Put It In An Email

The last step is to do something with the text. The options are only as limited as Flow itself. You could throw it in a text document, add it to an Excel spreadsheet, send it back to Cognitive Services for sentiment analysis or anything else.

In this case we’re just going to email it back to ourselves. Add a new action at the end and search for whatever email sending option you’d like to use. In my case, I’ll use the Office 365 “Send an email (V2)” action.

For the content of the email body, I’ll just select the “Detected Text” output from the OCR step. That’s it. Save and test it.

A couple of things to remember. First, images for OCR must be no larger than 4200×4200 pixels and no more than 20MB in file size. Images that exceed either limit will cause the step to error out. Also, remember that the quality of the image directly affects the quality of the OCR.
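
If you ever call the service from code instead of a button, a quick pre-check can save a failed request. The sketch below simply encodes the two limits quoted above. The function name is made up for illustration, and it assumes 1MB means 1024 × 1024 bytes and that you already know the image’s dimensions and size.

```python
MAX_SIDE_PX = 4200                # dimension limit quoted above
MAX_BYTES = 20 * 1024 * 1024      # 20MB, assuming 1MB = 1024 * 1024 bytes


def ocr_limit_problems(width_px, height_px, size_bytes):
    """Return a list of reasons the image breaks the limits (empty if it fits)."""
    problems = []
    if width_px > MAX_SIDE_PX or height_px > MAX_SIDE_PX:
        problems.append(
            f"dimensions {width_px}x{height_px} exceed {MAX_SIDE_PX}x{MAX_SIDE_PX}"
        )
    if size_bytes > MAX_BYTES:
        problems.append(f"file size {size_bytes} bytes exceeds {MAX_BYTES} bytes")
    return problems


# A typical 12-megapixel phone photo fits; an oversized scan does not.
print(ocr_limit_problems(4032, 3024, 3_000_000))
print(ocr_limit_problems(6000, 4000, 25 * 1024 * 1024))
```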

The Tests

Here’s the original picture taken of the packaging from a docking station.

And here’s the text that I received. The formatting is gone: line breaks and layout are lost, and everything comes back as one run of text. But for a good picture, the accuracy is pretty good.


dynadock@ U3.O UNIVERSAL USB DOCKING STATIONFeaturing: Dual Monitor Support Use your laptop screen, plus two additional monitors, up to 2048 x 1152 resolution each. Six (6) USB Ports With so many, you'll never have to decide which one to remove. Gigabit Ethernet Port When you need to connect to a high-performance network. 5.1 Channel Audio Because for some, stereo sound just isn't enough. USB Sleep & Charge. Lets you power your smart phone and other portables even when your laptop is off or not connected.Front CISub
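
The single run of text is easier to picture once you see how OCR results are usually structured. As a rough sketch, assume the service returns JSON that nests regions, then lines, then words. Flattening that into one string throws the layout away. The sample response below is hand-written for illustration, and `flatten_ocr` is a hypothetical helper; this isn’t necessarily how the connector builds its “Detected Text” output.

```python
def flatten_ocr(result):
    """Join every word in a regions -> lines -> words structure with spaces."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    # Joining with a space discards line and region boundaries entirely.
    return " ".join(lines)


# Hand-written sample in the assumed shape (not real service output).
sample = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Six"}, {"text": "(6)"}, {"text": "USB"}, {"text": "Ports"}]},
                {"words": [{"text": "With"}, {"text": "so"}, {"text": "many"}]},
            ]
        }
    ]
}

print(flatten_ocr(sample))
```

If you need the layout back, you would have to work from the per-line data instead of the flattened string.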

Now, here’s a second test with a low quality image taken of text on a computer monitor.

And here’s the text that was output. You can see the accuracy has dropped sharply, and much of the text is garbled beyond use. So picture quality makes a huge difference.


Chapter 0; To SKerlock Holmes, kne ismways THÉ woman, havg„ seld9P1 Yfärd Yim rpention her under any other name, In eyes shé eclißSes and, nates theQmoieoof her sex,/' was not that fie fell any emotion akin to love for Irene Adler. Ali emo-* tioåS, ang !hat 'éne pg,rticu14rly, $Tere abhorrent to his eold, ikeciSé, but #491irÅ$ly b@lanced mind. He take Athe fiaostl perfeci. reasoning and observing, machine tfia{ thehworlå fias seen, but as a lover ne wouiå have placed hlmselfin a false pos1Uon,//He never s Oke of theosofter passions, save witha gibe ana asneef10They were admirable things forthe oßserver=exeellen!' (or drawing the veil fromwmen'smotives and acti6ns. fog the
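
One way to put a rough number on the damage is a character-level similarity score. The sketch below uses Python’s standard `difflib` to compare the start of the output above with what is presumably the passage on screen (the opening of “A Scandal in Bohemia”; the reference wording is an assumption, since the original image text isn’t reproduced here). Treat it as a crude indicator, not a proper OCR accuracy metric.

```python
from difflib import SequenceMatcher


def similarity(expected, actual):
    """Return a 0..1 character-level similarity ratio between two strings."""
    return SequenceMatcher(None, expected, actual).ratio()


# Assumed reference text for the photographed passage.
expected = (
    "To Sherlock Holmes she is always the woman. I have seldom heard him "
    "mention her under any other name."
)
# Copied from the start of the OCR output above.
actual = (
    "To SKerlock Holmes, kne ismways THÉ woman, havg„ seld9P1 Yfärd Yim "
    "rpention her under any other name,"
)

score = similarity(expected, actual)
print(f"similarity: {score:.2f}")
```

A character-level score like this is forgiving, because a word with a single wrong letter still mostly matches even though it’s useless to a reader.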

Fin

There are a number of apps out there by Microsoft, Google and others that can OCR text from a picture, and they do a much better job of it than our little flow here. But that’s ok. It’s still fun to play with the OCR and other Vision services. And it might do in a pinch if company policy allows using Flow, but not those other apps. Something to keep in mind.

The final flow should look something like this:

final flow showing all three steps
