Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
🦙 echoOLlama: A real-time voice AI platform powered by local LLMs. Features WebSocket streaming, voice interactions, and OpenAI API compatibility. Built with FastAPI, Redis, and PostgreSQL. Perfect for private AI conversations and custom voice assistants.
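For orientation, here is a minimal sketch of the kind of FastAPI WebSocket streaming endpoint such a platform is built around; the `/ws` path, message format, and chunked echo are illustrative assumptions, not echoOLlama's actual code:

```python
# Minimal sketch of a FastAPI WebSocket endpoint that streams text back in chunks.
# Hypothetical: the /ws route and chunked echo stand in for a real local-LLM call.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def chat_stream(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            # A real server would stream tokens from a local LLM here.
            for token in prompt.split():
                await websocket.send_text(token + " ")
            await websocket.send_text("[DONE]")
    except WebSocketDisconnect:
        pass  # client closed the connection
```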
Build a simple, basic multimodal large model from scratch. 🤖
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, images, or both, it provides powerful and flexible image retrieval for arbitrary inputs. Perfect for research and demos.
🎉 [ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
This repository showcases a collection of innovative projects by Charan H U, focusing on cutting-edge technologies such as facial emotion recognition, fitness tracking, and multi-model applications. Each project demonstrates practical implementations of advanced AI/ML techniques, making it a valuable resource for developers and researchers.
An AI multi-model application using RAG and LangChain.
Evaluating ‘Graphical Perception’ with Multimodal Large Language Models
Elarova — A smart, multimodal research assistant designed to help students by combining speech, text, and other input modes for efficient academic research and study support. Powered by state-of-the-art speech recognition, text-to-speech, and AI models, including meta-llama/llama-4-scout-17b-16e-instruct, with an easy-to-use Gradio web interface.
ElaMath is a smart, voice-enabled math assistant that helps students solve and understand math problems using both spoken questions and images. It is powered by the multimodal meta-llama/llama-4-scout-17b-16e-instruct model via the Groq API, combined with Whisper for speech recognition and ElevenLabs/gTTS for natural voice responses.
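For context, a minimal sketch of a multimodal (text + image) request to that model through the Groq Python SDK; the prompt and image URL are placeholders, and this is not ElaMath's actual code:

```python
# Minimal sketch of a text+image chat completion via the Groq SDK (pip install groq).
# Assumes GROQ_API_KEY is set; the image URL below is a hypothetical placeholder.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve the math problem in this image, step by step."},
            {"type": "image_url", "image_url": {"url": "https://example.com/problem.png"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```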
Uses the MAIRA-2 multimodal transformer to generate grounded or non-grounded radiology reports from chest X-rays.
Multi-Modal Healthcare Assistant
A tool that uses a multimodal LLM to generate testing instructions for any digital product's features, based on screenshots.
This repo contains an integration of LangChain with the Google Gemini LLM.
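As a rough illustration of what such an integration typically looks like, a minimal sketch using the langchain-google-genai package; the model name and prompt are assumptions, not taken from the repo:

```python
# Minimal sketch of calling Google Gemini through LangChain
# (pip install langchain-google-genai; assumes GOOGLE_API_KEY is set).
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # hypothetical model choice

response = llm.invoke("Summarize the benefits of multimodal LLMs in two sentences.")
print(response.content)
```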