Talk with Your Images Using Gemini
Introduction
Have you ever wanted to extract meaningful insights from an image just by talking to it? Thanks to advancements in AI, you can now analyze and interact with your images using Google's Gemini AI. Whether it’s extracting text, identifying objects, or understanding complex visual elements, Gemini makes it easier than ever to engage with images in a conversational way.
Why Use AI for Image Analysis?
Traditionally, analyzing an image required complex computer vision techniques, but AI models like Gemini simplify the process by offering:
- Automated Image Interpretation – Extracts text, objects, and contextual insights.
- Conversational Responses – Allows you to interact with your images naturally.
- Scalability – Processes many images just as easily as one.
By encoding images into base64 and passing them to Gemini, we can leverage these capabilities seamlessly.
Setting Up the Environment
Before we dive into the code, ensure you have the necessary dependencies installed:
pip install --upgrade --quiet google-genai
Additionally, you need access to Google Cloud's Vertex AI platform, with Gemini enabled in your project.
Encoding Images to Base64
To send images to Gemini, we first need to convert them into a format that AI models can understand—base64 encoding. Here’s how you can do it:
import base64
from pathlib import Path


def encode_image_to_base64(image_path: str) -> str | None:
    """Read an image file from disk and return its contents as a base64 string."""
    image_file = Path(image_path)
    if not image_file.is_file():
        print(f"Error: File not found - {image_path}")
        return None
    try:
        return base64.b64encode(image_file.read_bytes()).decode("utf-8")
    except Exception as e:
        print(f"Error encoding image: {e}")
        return None
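As a quick sanity check, here is a minimal usage sketch. It reuses the download.jpg file name from the full script later in this post; substitute any image you have on disk:

encoded = encode_image_to_base64("download.jpg")
if encoded:
    # Print the size and a short prefix of the base64 string to confirm it worked.
    print(f"Encoded {len(encoded)} base64 characters, starting with: {encoded[:32]}...")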
Getting AI Responses from Gemini
Once we have the base64-encoded image, we can pass it to Gemini for analysis.
Note: You need to set up the Google Cloud CLI on your system and authenticate before running the code (for example, by running gcloud auth application-default login to create application default credentials).
Code:
import base64

from google import genai
from google.genai.types import Part

# Get the project ID and location from the Vertex AI platform / Google Cloud console.
PROJECT_ID = "your-project-id"
LOCATION = "your-location"
MODEL_ID = "gemini-model-id"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)


def get_gemini_response(base64_image: str) -> str | None:
    """Send the base64-encoded image to Gemini and return its text response."""
    prompt = (
        "You are an expert in analyzing images. Please extract all information from the provided image.\n"
        "Response Format: Simple Text, no markdown; no bullet points."
    )
    try:
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=[
                Part.from_bytes(data=base64.b64decode(base64_image), mime_type="image/jpeg"),
                prompt,
            ],
        )
        return response.text if response else None
    except Exception as e:
        print(f"Error getting AI response: {e}")
        return None
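The snippet above assumes the input is a JPEG. If you expect other formats such as PNG, one possible tweak (a sketch, not part of the original script) is to infer the MIME type from the file name with the standard-library mimetypes module and fall back to JPEG:

import mimetypes

def guess_image_mime_type(image_path: str) -> str:
    """Guess the MIME type from the file extension, defaulting to JPEG."""
    mime_type, _ = mimetypes.guess_type(image_path)
    return mime_type or "image/jpeg"

# Example: guess_image_mime_type("photo.png") returns "image/png"

You could then pass the result to Part.from_bytes instead of the hardcoded "image/jpeg", which would also mean handing the file path (not just the base64 string) to get_gemini_response.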
Running the Complete Script
Now, let’s put everything together and test our AI-powered image analysis:
def main():
    image_path = "download.jpg"
    base64_image = encode_image_to_base64(image_path)
    if base64_image:
        response = get_gemini_response(base64_image)
        if response:
            print("Response received:\n", response)
        else:
            print("Failed to get a response from Gemini.")
    else:
        print("Failed to encode the image.")


if __name__ == "__main__":
    main()
Test Example
For testing, let's assume we provide an image of a cat as input. The output response from Gemini could be:
Response received:
Here's what I can tell about the image: It's a close-up shot of a tabby cat. The cat has a brown and black striped coat, and its eyes appear to be a shade of green or yellow. The background is a dark solid color.
Conclusion
With just a few lines of Python code, you can now talk to your images and extract valuable insights using Google's Gemini AI. Whether you're analyzing historical documents, identifying objects, or automating workflows, this technique opens up a world of possibilities.
Next Steps
- Try using different images and observe the responses.
- Experiment with different prompts for varied insights (see the sketch after this list).
- Integrate Gemini’s image analysis with chatbots or automation tools.
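For the prompt experiments, one way to avoid editing the function each time is a small variant that takes the prompt as an argument. This is a hypothetical helper, not part of the original repository; it reuses the client, MODEL_ID, Part, and base64 imports from the script above:

def ask_image(base64_image: str, prompt: str, mime_type: str = "image/jpeg") -> str | None:
    """Send an arbitrary prompt alongside the image and return Gemini's text reply."""
    try:
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=[
                Part.from_bytes(data=base64.b64decode(base64_image), mime_type=mime_type),
                prompt,
            ],
        )
        return response.text if response else None
    except Exception as e:
        print(f"Error getting AI response: {e}")
        return None

# Example: ask_image(base64_image, "List every object you can see in this photo.")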
Got any cool ideas for using Gemini with images? Share them in the comments! 🚀
Code
https://github.com/saswatsamal/talkwithphotowithgemini/
Acknowledgments
This is a project built during the Vertex sprints held by Google's ML Developer Programs team. Thanks to the MLDP team for their generous support in providing GCP credits to help facilitate this project.