My daughter is in Primary 2, my son in Kindergarten 2. As part of Singapore’s education system, they learn Chinese (at varying levels) as their mother tongue, which also means weekly reviews and spelling practice at home. For example, my son needs to practice the following spelling this week:

Example of some (simple) homework

So basically, there are some words that they need to practice writing, and for my daughter also some short sentences. As a parent, I am of course trying to support them as much as possible with this, and help them whenever and however I can. But as I know very limited Chinese, there’s only so much I can do there. The two biggest problems for me are:

  1. I can only recognise a few characters and know their Hanyu Pinyin (the romanisation of the Chinese characters), so I can’t read out the text to my children. For example, for #5 above I know yi (one) and ma (horse), but do not know the second character. So if I were to try to read the three characters for #5, I would only be able to say “yi something ma”. For #1, I don’t know the characters at all, so I wouldn’t be of any help here.
  2. I do not know the meaning of the characters, or what they translate to in English. The few characters that I do recognise are usually not enough. Additionally, when multiple characters are combined to form a word, I would usually not know that word’s translation either. To give an example which I do know: xiao means small and xin means heart. What’s “xiao xin”? “Careful” – something I learned before, but wouldn’t have known just from the individual characters.

So, to help them a bit more, I would need to be able to read the given text by having the corresponding Hanyu Pinyin available, and also have a translation as additional support.

Idea: Image analysis and translation with Azure Cognitive Services

This is where I suddenly thought of Azure Cognitive Services: “Cognitive Services brings AI within reach of every developer—without requiring machine learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps. Enable developers of all skill levels to easily add AI capabilities to their apps.” Specifically, I was thinking of the following process:

  1. Take a picture of the homework
  2. Run the picture through some OCR tool to retrieve all text in it. Here, the Text recognition component of Computer Vision should be able to help me
  3. Use the Translator component to translate the Chinese text into English, and also convert it into Hanyu Pinyin
  4. Put it all together by creating a new image consisting of three parts:
    1. The original homework image on the left
    2. The English translation in the middle
    3. The Hanyu Pinyin transliteration on the right

A relatively straightforward process, and I wanted to do as much of it as possible directly on my laptop. The good thing (well, one of many) about Computer Vision and Translator is that both provide a REST API, so you can call them from whichever programming language you prefer. As I wanted to get something done quickly, I decided to proceed with PowerShell.

But first, I had to set up the required resources in Azure. I set up a dedicated Resource Group and deployed an instance each of Computer Vision and Translator. Even better, both offer a free tier with a generous number of calls (more than I’ll actually need)!
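If you prefer scripting this over clicking through the portal, the same setup can be done with the Azure CLI. A minimal sketch, using placeholder names of my own (rg-homework, cv-homework, tr-homework) and the F0 free tiers:

az group create --name rg-homework --location southeastasia
az cognitiveservices account create --name cv-homework --resource-group rg-homework --kind ComputerVision --sku F0 --location southeastasia --yes
az cognitiveservices account create --name tr-homework --resource-group rg-homework --kind TextTranslation --sku F0 --location southeastasia --yes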

Once deployed, I also grabbed the respective keys for the two services, as I will need them to call the corresponding REST APIs from my PowerShell script.
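The keys can be copied from the portal, or again retrieved via the CLI (same placeholder names as above):

az cognitiveservices account keys list --name cv-homework --resource-group rg-homework
az cognitiveservices account keys list --name tr-homework --resource-group rg-homework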

The PowerShell Script

And with that, it’s time to have a look at the script! To give it some flexibility, I use two parameters: sourcefile, where I pass the image file I want to analyse, and fontsize (default set to 16) to allow me to change the font size as needed. The beginning of the script defines a few additional variables (e.g. the keys). Please note that I hardcoded the languages (Chinese, English, Hanyu Pinyin) as part of the relevant parameters in the different URLs. As a first step, I’m calling the Computer Vision API to extract the text from the provided image:

param([Parameter(Mandatory)]$sourcefile, [int]$fontsize=16)

Add-Type -AssemblyName System.Drawing

$cvkey = "<insert your Computer Vision Key>"
$trkey = "<insert your Translator Key>"
$trlocation = "<insert your Translator location, e.g. southeastasia>"
$trURL = "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=zh-Hans&to=en"
$trlURL = "https://api.cognitive.microsofttranslator.com/transliterate?api-version=3.0&language=zh-Hans&fromScript=Hans&toScript=Latn"
$ocrURL = "https://southeastasia.api.cognitive.microsoft.com/vision/v3.2/ocr?language=zh-Hans"

$bytes = [System.IO.File]::ReadAllBytes($sourcefile)

Write-Host -ForegroundColor Green "Analysing image"
$submitimage = Invoke-RestMethod -Uri $ocrURL -Method POST `
                  -Body $bytes `
                  -Headers @{'Content-Type'= 'application/octet-stream'; 'Ocp-Apim-Subscription-Key' = $cvkey} 

As a second step, I’m preparing the output image. I use triple the width of the source image (plus two pixels for the divider lines), and copy the source image to its left third.

$sourceimage = new-object System.Drawing.Bitmap $sourcefile
$filename = $sourcefile.Insert($sourcefile.LastIndexOf("."),"-tr") 
$bmp = new-object System.Drawing.Bitmap (($sourceimage.Width*3)+2),$sourceimage.Height 

$font = new-object System.Drawing.Font "Arial", $fontsize
$brushBg = [System.Drawing.Brushes]::White 
$brushFg = [System.Drawing.Brushes]::Black 
$brushPY = [System.Drawing.Brushes]::Blue

$graphics = [System.Drawing.Graphics]::FromImage($bmp) 
$graphics.FillRectangle($brushBg,0,0,$bmp.Width,$bmp.Height) 
$graphics.DrawImage($sourceimage, 0, 0, $sourceimage.Width, $sourceimage.Height)
$graphics.DrawLine($brushFg, ($sourceimage.Width+1),0,($sourceimage.Width+1),$sourceimage.Height)
$graphics.DrawLine($brushFg, (($sourceimage.Width*2)+2),0,(($sourceimage.Width*2)+2),$sourceimage.Height)

Finally, we get to the second interesting part. We run the text we extracted earlier through the translation (Chinese to English) and transliteration (Hanyu Pinyin) web services. The Computer Vision OCR call returns a JSON response in which the Chinese characters are listed individually, grouped into regions, lines, and words. So we walk through each region’s lines and join the words of each line back into a full sentence before translating. The response also tells us where each piece of text is located in the source image, via its boundingBox, which lets us place the translation and transliteration at the corresponding positions in our new image.

Extract of a Computer Vision OCR response
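Structurally, such a response looks roughly like the following illustrative extract (the values here are made up, but the shape matches what the loop below consumes):

{
    "language": "zh-Hans",
    "regions": [
        {
            "boundingBox": "25,30,380,160",
            "lines": [
                {
                    "boundingBox": "25,30,96,32",
                    "words": [
                        { "boundingBox": "25,30,30,32", "text": "小" },
                        { "boundingBox": "58,30,30,32", "text": "心" }
                    ]
                }
            ]
        }
    ]
}
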
foreach($region in $submitimage.regions) {
    foreach($line in $region.lines) {
        $lineText = "";
        foreach($word in $line.words) {
            $lineText += $word.text            
        }
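        # Anchor the output at the line's first word; boundingBox is a "left,top,width,height" string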
        $boundingBox = $line.words[0].boundingBox -split ","
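        # Keep a running log of all extracted lines in output.txt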
        $lineText | Out-File -Append -Encoding utf8 -FilePath output.txt
        $headers = @{}
        $headers.Add("Ocp-Apim-Subscription-Key",$trkey)
        $headers.Add("Content-Type","application/json; charset=utf-8")
        $headers.Add("Ocp-Apim-Subscription-Region",$trlocation)
        $lineText = @{'Text' = $($lineText)}
        $lineText = $lineText | ConvertTo-Json

        $translationResults = Invoke-RestMethod -Method POST -Uri $trURL -Headers $headers -Body ([System.Text.Encoding]::UTF8.GetBytes("[$($lineText)]"))
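        # The translate response is an array with one element per input text; each element carries a list of translations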
        $translation = $translationResults.translations[0].text
        $conversionResult = Invoke-RestMethod -Method POST -Uri $trlURL -Headers $headers -Body ([System.Text.Encoding]::UTF8.GetBytes("[$($lineText)]"))
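        # The transliterate response likewise has one element per input text, with the Pinyin in its text property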
        $conversion = $conversionResult.text
        $graphics.DrawString($translation,$font,$brushFg,([int]$boundingBox[0]+$sourceimage.Width+1),$boundingBox[1]) 
        $graphics.DrawString($conversion,$font,$brushPY,([int]$boundingBox[0]+($sourceimage.Width*2)+2),$boundingBox[1]) 
    }
}

Write-Host -ForegroundColor Green "All done"

$graphics.Dispose() 
$bmp.Save($filename) 
Invoke-Item $filename  
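Running it is then a one-liner. A quick usage sketch, assuming the script is saved as translate-homework.ps1 and the photo as homework.jpg (both placeholder names of mine):

.\translate-homework.ps1 -sourcefile .\homework.jpg -fontsize 18

Thanks to the filename handling at the top, the result is saved as homework-tr.jpg next to the original and opened automatically.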

Putting it all together, when we run the image of the homework sheet from the beginning of this post through this script, we receive the following image as the result:

End result after processing the image containing the homework. From left to right: original image, English translation, Hanyu Pinyin

A more complex example

With the above, it is much easier for me to sit down with my children and go through their homework! I simply take a photo of it and run it through the script above to get my supporting image.
