Analyze images to generate captions, detect objects, or perform OCR
Enhance and upscale images with face restoration
Segment objects in images and videos using text prompts
Generate depth maps from images
Tag images with labels
FitDiT is a high-fidelity virtual try-on model.
Convert images of screens to structured elements
Generate clickable coordinates on a screenshot
Interact with Florence-2 to analyze images and generate descriptions
Find similar images from a collection
Generate 3D depth maps from images and videos
Generate depth maps from images
Extract text from images using OCR
Find similar images by uploading a photo
Segment body parts in images
Meta Llama 3 8B with LLaVA multimodal capabilities
Art Institute of Chicago Gallery
Find images matching a text query
Compute normals for images and videos
Vote on background-removed images to rank models