AI Explained: Computer Vision and Image to Text
How can you quickly build a solution that tells you what is in an image? Meta released llama3.2-vision, a model that accepts images as input and can be used for this.
After you install Ollama, you can start the model with the command ollama run llama3.2-vision
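If you would rather call the model from a script than from the interactive prompt, the sketch below shows one way to do it through Ollama's local REST API. It assumes Ollama is serving on its default address (http://localhost:11434) and that the model has already been pulled. The function names build_payload and ask_vision_model are made up for this example.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving its REST API on the default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_bytes, model="llama3.2-vision"):
    """Build the JSON body for one non-streaming request with one image."""
    return {
        "model": model,
        "prompt": prompt,
        # The API expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a stream of chunks.
        "stream": False,
    }


def ask_vision_model(prompt, image_path, url=OLLAMA_URL):
    """Send the prompt and image to Ollama and return the model's text answer."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, a call such as ask_vision_model("How many animals are in this image?", "/home/remote/Desktop/animals.jpg") would return the model's answer as a string. Note that the image is sent as a separate field here, so the path does not need to appear in the prompt text.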
So now let’s ask a very specific question in a very concrete format:
be brief and put the answer about how many animals there are between <numberanimals></numberanimals> and the type of animal between <typeanimal></typeanimal>. Do not answer anything else. Here is the image: /home/remote/Desktop/animals.jpg
The answer should be:
<numberanimals>4</numberanimals><typeanimal>Giraffes</typeanimal>
Because the image was:
Now we can use this for a lot of business applications as well. Below is a sample invoice:
A simple prompt like:
Look at the image /home/remote/Desktop/invoice.png and put the name of the company who is on the invoice between <company></company>, their address between <address></address> and the total sum owed between <total></total>. Do not answer anything else.
will return:
<company>East Repair Inc.</company><address>1912 Harvest Lane, New York, NY 12210</address><total>$154.06</total>
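The reason for asking for tags is that the answer becomes easy to process in code. Here is a minimal sketch of that step, assuming the model sticks to the requested format. The helper names extract_tag and parse_invoice are invented for this example, and the total is converted to a Decimal so it can be used in calculations.

```python
import re
from decimal import Decimal, InvalidOperation


def extract_tag(text, tag):
    """Return the stripped text between <tag> and </tag>, or None if absent."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_total(raw):
    """Convert a string such as '$154.06' to a Decimal, or None if it fails."""
    if raw is None:
        return None
    # Drop the currency symbol and thousands separators before converting.
    cleaned = re.sub(r"[^\d.]", "", raw)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def parse_invoice(answer):
    """Turn the tagged answer into a dict with company, address and total."""
    return {
        "company": extract_tag(answer, "company"),
        "address": extract_tag(answer, "address"),
        "total": parse_total(extract_tag(answer, "total")),
    }
```

Any field that comes back as None means the model did not follow the format, which is a good signal to retry the prompt or send the invoice to a person.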
You can clearly see how this is useful for many use cases. As with anything, though, you need to be careful with hallucinations:
Look at the image /home/remote/Desktop/gold.png and put the month and year gold was at its peak between <highest_month></highest_month> and the price between <price></price>. Do not answer anything else
returned:
<highest_month>August 2020</highest_month><price>$2,075.05/oz</price>
That was not the right answer for this image:
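One inexpensive guard is to ask the same question several times and only accept the result when the answers agree. This is a technique added here for illustration and was not part of the experiment above. It is also only a partial defence: a model can repeat the same wrong answer every time, so agreement lowers the risk without proving the answer is correct. The sketch assumes you have already collected the raw answers as strings; agreed_answer and its min_share parameter are hypothetical names.

```python
import re
from collections import Counter


def extract_tag(text, tag):
    """Return the stripped text between <tag> and </tag>, or None if absent."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def agreed_answer(answers, tags, min_share=1.0):
    """Return the most common parsed answer if enough runs agree, else None.

    min_share is the fraction of all runs that must give the same values
    for every tag; 1.0 means every run has to agree.
    """
    parsed = [tuple(extract_tag(a, t) for t in tags) for a in answers]
    # Runs that are missing a tag cannot count as agreement.
    complete = [p for p in parsed if None not in p]
    if not complete:
        return None
    best, count = Counter(complete).most_common(1)[0]
    if count / len(answers) < min_share:
        return None
    return dict(zip(tags, best))
```

When agreed_answer returns None, treat the result as unreliable and have a person look at the image. For values that matter, such as prices or invoice totals, comparing against a trusted data source is a stronger check than agreement alone.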
This blog post is part of a series of short AI explainers. Be sure to also check out:
- RAG: Using private information to answer AI questions
- LangChain: Using different tools, e.g. weather API, to answer AI questions.
- Computer Vision: How to see what is in an image.
- LLMs for content classification: How to let LLMs classify content in different categories.
- LLM Performance: How to get the best performance from LLMs.
If your business needs help with AI, why don’t we connect?