AI Explained: Computer Vision and Image to Text

Nov 8, 2024

How can you build a solution that quickly tells you what is in an image? Meta released Llama 3.2 Vision (llama3.2-vision), a multimodal model that can be used for exactly this.

After you install Ollama, you can start the model with: ollama run llama3.2-vision
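The examples in this post use the interactive CLI, where the image path is simply written into the prompt. If you want to call the model from a program instead, Ollama also serves a local REST API. The sketch below assumes the server is running on its default port (11434) and that the model has already been pulled; over the API the image is sent as a base64 string in the images field rather than as a path. The helper names build_payload and ask are hypothetical, and the 300-second timeout is an arbitrary assumption (vision models can be slow on CPU).

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port, 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, image_bytes: bytes, model: str = "llama3.2-vision") -> dict:
    """Build a non-streaming /api/generate request body with one image attached."""
    return {
        "model": model,
        "prompt": prompt,
        # The REST API takes images as base64 strings, not as file paths.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON reply instead of a stream of partial chunks.
        "stream": False,
    }


def ask(prompt: str, image_path: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the prompt plus image to Ollama and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The non-streaming reply carries the model's answer in "response".
        return json.loads(response.read())["response"]
```

For example, ask("What is in this image?", "/home/remote/Desktop/animals.jpg") should return the model's answer as a plain string, provided the server is up and llama3.2-vision is available locally.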

Now let’s ask a very specific question and request the answer in a very concrete format:

be brief and put the answer about how many animals there are between <numberanimals></numberanimals> and the type of animal between <typeanimal></typeanimal>. Do not answer anything else. Here is the image: /home/remote/Desktop/animals.jpg

The answer should be:

<numberanimals>4</numberanimals><typeanimal>Giraffes</typeanimal>
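The tags are what make this usable from code: a program can fill in a prompt template and pull each value out of the reply. Below is a minimal sketch, assuming Python 3.7+; build_tagged_prompt and parse_tagged_answer are hypothetical helpers, not part of Ollama. The image path at the end of the prompt mirrors the CLI usage shown above.

```python
import re
from typing import Dict, List


def build_tagged_prompt(fields: Dict[str, str], image_path: str) -> str:
    """Ask for each field between its own <tag></tag> pair, and nothing else."""
    parts = [f"{description} between <{tag}></{tag}>" for tag, description in fields.items()]
    return (
        "Be brief and put "
        + " and ".join(parts)
        + ". Do not answer anything else. Here is the image: "
        + image_path
    )


def parse_tagged_answer(answer: str, tags: List[str]) -> Dict[str, str]:
    """Extract the trimmed text inside each expected tag; raise if one is missing."""
    result = {}
    for tag in tags:
        pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
        match = re.search(pattern, answer, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> in model answer: {answer!r}")
        result[tag] = match.group(1).strip()
    return result
```

Searching for each tag, rather than requiring the whole reply to match, means a stray sentence around the tags does not break parsing, while a missing tag raises an error instead of silently yielding nothing.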

Because the image was:

The same approach works for many business applications. Below is a sample invoice:

A simple prompt like:

Look at the image /home/remote/Desktop/invoice.png and put the name of the company who is on the invoice between <company></company>, their address between <address></address> and the total sum owed between <total></total>. Do not answer anything else.

will return:

<company>East Repair Inc.</company><address>1912 Harvest Lane, New York, NY 12210</address><total>$154.06</total>
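To feed a reply like this into, say, an accounting workflow, the tagged text can be turned into a typed record. This is a sketch under the assumption that totals use a dollar-style format with "." as the decimal separator (a European "1.154,06" would be misread); InvoiceSummary, extract_tag and parse_invoice are illustrative names.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class InvoiceSummary:
    company: str
    address: str
    total: Decimal


def extract_tag(answer: str, tag: str) -> str:
    """Return the trimmed text between <tag> and </tag>, or raise if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", answer, re.DOTALL)
    if match is None:
        raise ValueError(f"missing <{tag}> in model answer: {answer!r}")
    return match.group(1).strip()


def parse_invoice(answer: str) -> InvoiceSummary:
    """Turn the tagged invoice reply into a typed record."""
    raw_total = extract_tag(answer, "total")
    # Assumption: dollar-style amounts such as "$1,154.06"; drop everything
    # except digits and the "." decimal separator.
    digits = re.sub(r"[^0-9.]", "", raw_total)
    if not digits:
        raise ValueError(f"no amount found in total: {raw_total!r}")
    return InvoiceSummary(
        company=extract_tag(answer, "company"),
        address=extract_tag(answer, "address"),
        total=Decimal(digits),
    )
```

Decimal is used for the total because binary floats cannot represent most cent amounts exactly, which matters once totals are summed or compared.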

This is clearly useful for many use cases. As with anything involving LLMs, though, you need to be careful with hallucinations:

Look at the image /home/remote/Desktop/gold.png and put the month and year gold was at its peak between <highest_month></highest_month> and the price between <price></price>. Do not answer anything else

returned:

<highest_month>August 2020</highest_month><price>$2,075.05/oz</price>

That was not the right answer for this image:
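There is no reliable way to spot a hallucination from a single answer, but one cheap heuristic is to ask the same question several times and only accept an answer that most runs agree on. This is a sketch of that idea (often called self-consistency), not a fix: it only helps if the runs can actually differ, for example with a non-zero sampling temperature, and a model that is consistently wrong will still pass. The consensus helper and its 0.8 threshold are assumptions.

```python
from collections import Counter
from typing import List, Optional


def consensus(answers: List[str], min_agreement: float = 0.8) -> Optional[str]:
    """Return the majority answer if enough runs agree, else None to flag it.

    Answers are compared after trimming whitespace and lowercasing, and the
    returned value is in that normalized form. The 0.8 default is an
    arbitrary assumption, not a recommended setting.
    """
    if not answers:
        return None
    normalized = [a.strip().lower() for a in answers]
    value, count = Counter(normalized).most_common(1)[0]
    return value if count / len(normalized) >= min_agreement else None
```

For example, you could collect several replies to the gold prompt, extract the month from each, and pass the list to consensus; a None result is a cue to check the chart by hand.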

This blog post is part of a series of short AI explainers. Be sure to also check out:

RAG: Using private information to answer AI questions
LangChain: Using different tools, e.g. weather API, to answer AI questions.
Computer Vision: How to see what is in an image.
LLMs for content classification: How to let LLMs classify content into different categories.
LLM Performance: How to get the best performance from LLMs.
MLOps and launching AI in production
AI Explained: MCP — the USB of LLMs and the key to Agentic AI

If you are more interested in the business side of AI:

If your business needs help with AI, why don’t we connect?

Written by Maarten Ectors
