AI Explained: Computer Vision and Image to Text

Maarten Ectors
2 min read · Nov 8, 2024


How can you build a solution that quickly tells you what is in an image? Meta released Llama 3.2 Vision, available in Ollama as llama3.2-vision, which can be used for this.

After you install Ollama, you can start the model from a terminal with: ollama run llama3.2-vision
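The same model can also be called from a program instead of the interactive prompt. The following is a minimal sketch, assuming Ollama is running locally on its default port 11434 and using its /api/generate REST endpoint. The helper names build_payload and ask_about_image are illustrative and not part of Ollama.

```python
import base64
import json
import urllib.request

# Default address of a locally running Ollama server (an assumption about your setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, image_bytes: bytes, model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for one non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # The API expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_about_image(prompt: str, image_path: str) -> str:
    """Send the prompt and one image to Ollama and return the model's text answer."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With this approach the image travels in the request body, so the prompt does not need to contain the file path: ask_about_image("What is in this image?", "/home/remote/Desktop/animals.jpg").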

So now let’s ask a very specific question in a very concrete format:

be brief and put the answer about how many animals there are between <numberanimals></numberanimals> and the type of animal between <typeanimal></typeanimal>. Do not answer anything else. Here is the image: /home/remote/Desktop/animals.jpg

The answer should be:

<numberanimals>4</numberanimals><typeanimal>Giraffes</typeanimal>

Because the image (not reproduced here) showed four giraffes.
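Asking for the answer between fixed tags makes it easy to pick the values out in code. A minimal sketch, assuming the model followed the requested format; extract_tag is a hypothetical helper name:

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the text between <tag> and </tag>, or None if the tag is missing."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


answer = "<numberanimals>4</numberanimals><typeanimal>Giraffes</typeanimal>"
count = int(extract_tag(answer, "numberanimals"))
animal = extract_tag(answer, "typeanimal")
print(count, animal)  # prints: 4 Giraffes
```

Returning None for a missing tag lets the calling code decide what to do when the model ignores the format, rather than failing somewhere less obvious.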

Now we can use this for a lot of business applications as well, for example reading a sample invoice (image not reproduced here).

A simple prompt like:

Look at the image /home/remote/Desktop/invoice.png and put the name of the company who is on the invoice between <company></company>, their address between <address></address> and the total sum owned between <total></total>. Do not answer anything else.

will return:

<company>East Repair Inc.</company><address>1912 Harvest Lane, New York, NY 12210</address><total>$154.06</total>
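For a business workflow, the tagged answer would usually be turned into a typed record before it goes anywhere else. The sketch below is one way to do that. The Invoice class and parse_invoice function are illustrative names, and the amount handling assumes a US-style number format such as $1,234.56.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    company: str
    address: str
    total: Decimal


def parse_invoice(answer: str) -> Invoice:
    """Turn the tagged model answer into an Invoice, failing loudly on missing tags."""
    fields = {}
    for tag in ("company", "address", "total"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", answer, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> in model answer")
        fields[tag] = match.group(1).strip()
    # Drop the currency symbol and thousands separators (assumes "." is the decimal point).
    amount = re.sub(r"[^\d.]", "", fields["total"])
    return Invoice(fields["company"], fields["address"], Decimal(amount))


sample = (
    "<company>East Repair Inc.</company>"
    "<address>1912 Harvest Lane, New York, NY 12210</address>"
    "<total>$154.06</total>"
)
invoice = parse_invoice(sample)
print(invoice.company, invoice.total)  # prints: East Repair Inc. 154.06
```

Decimal is used instead of float so that monetary amounts are not subject to binary rounding.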

This kind of structured extraction is useful for many use cases. As with any generative model, though, you need to watch out for hallucinations. For example:

Look at the image /home/remote/Desktop/gold.png and put the month and year gold was at its peak between <highest_month></highest_month> and the price between <price></price>. Do not answer anything else

returned:

<highest_month>August 2020</highest_month><price>$2,075.05/oz</price>

That was not the right answer for the image (not reproduced here).
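One inexpensive safeguard, offered here as a suggestion rather than something from the experiment above, is to ask the same question several times and only accept an answer that most runs agree on. Anything else goes to a person for review. The consensus function and its min_share threshold are hypothetical. Agreement between runs does not prove the answer is correct, since a model can be consistently wrong, so this only catches unstable answers.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus(answers: Sequence[str], min_share: float = 0.6) -> Optional[str]:
    """Return the most common answer if at least min_share of runs agree, else None."""
    if not answers:
        return None
    normalized = [a.strip() for a in answers]
    value, count = Counter(normalized).most_common(1)[0]
    return value if count / len(normalized) >= min_share else None
```

A result of None means the runs disagreed too much to trust any single answer. For values that matter, such as invoice totals, checking the extracted number against a second source remains the safer option.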

This blog post is part of a series of short AI explainers.

If your business needs help with AI, why don’t we connect?


Maarten leads Profit Growing Innovator. The focus is on helping businesses strategically transform through innovation and outrun disruption.