Screenshot to HTML
Convert screenshots to HTML code
Microsoft Phi-3-Vision-128k
Generate image descriptions
VideoLLaMA2
Media understanding
ChartGemma
Generate insights from charts using text prompts
Mapping the AI OS community
Visualize a network map of AI users and organizations
Paligemma Doc
Try PaliGemma on document understanding tasks
Joy Caption Alpha Two Vqa Test One
Ask questions about images and get detailed answers
PicQ
Demo for MiniCPM-o 2.6 to answer questions about images
Paligemma2 Vqav2
PaliGemma2 LoRA finetuned on VQAv2
Chinese LLaVA
Follow visual instructions in Chinese
OFA-Visual_Question_Answering
Answer questions about images
Llama 3.2V 11B Cot
Generate descriptions and answers by combining text and images
Compare Docvqa Models
Compare different visual question answering models
EMNLP 2022 Papers
Display EMNLP 2022 papers on an interactive map
Magiv2 Demo
Transcribe manga chapters with character names
GET
Select a cell type to generate a gene expression plot
Experimental nanoLLaVA WebGPU
Generate answers by combining image and text inputs
Voronoi Cloth
Generate animated Voronoi patterns as cloth
GenAI Document QnA With Vision
Ask questions about text or images
Clembench
Browse and compare language model leaderboards