[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. (See the inference sketch after this list.)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language models proposed by Alibaba Cloud.
Align Anything: Training All-modality Model with Feedback
DeepSeek-VL: Towards Real-World Vision-Language Understanding
🚀 Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention.
Collection of AWESOME vision-language models for vision tasks
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. (A late-interaction scoring sketch appears after this list.)
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Get clean data from tricky documents, powered by vision-language models ⚡
Overview of Japanese LLMs.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
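For the LLaVA entry above: a minimal sketch of querying a LLaVA-style model through the Hugging Face transformers integration. The checkpoint name "llava-hf/llava-1.5-7b-hf", the local image path, and the prompt template are assumptions based on the community-maintained port; the official LLaVA repo ships its own training and inference scripts.

```python
# Minimal sketch: image question answering with a LLaVA-style VLM via the
# Hugging Face transformers port (assumed checkpoint: llava-hf/llava-1.5-7b-hf).
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"          # assumed community port of LLaVA-1.5
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")              # placeholder path to any local image
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```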
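For the ColVision entry (ColPali, ColQwen2, ColSmol): these retrievers score a text query against a document page by ColBERT-style late interaction over multi-vector embeddings. The sketch below implements the generic MaxSim scoring rule in plain PyTorch; the tensor shapes and random inputs are illustrative assumptions, not the colpali-engine API.

```python
# Generic ColBERT-style late-interaction (MaxSim) scoring, as used by
# ColPali-style retrievers. The real models produce the per-token query
# embeddings and per-patch page embeddings; random tensors stand in here.
import torch
import torch.nn.functional as F

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (num_query_tokens, dim); doc_emb: (num_doc_patches, dim)."""
    sim = query_emb @ doc_emb.T            # similarity of every query token to every page patch
    return sim.max(dim=1).values.sum()     # best-matching patch per token, summed over tokens

# Toy usage with normalized random embeddings standing in for model outputs.
q = F.normalize(torch.randn(16, 128), dim=-1)     # 16 query tokens
d = F.normalize(torch.randn(1024, 128), dim=-1)   # 1024 page patches
print(maxsim_score(q, d))
```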