Codebase for Linguistic Collapse: Neural Collapse in (Large) Language Models [NeurIPS 2024] [arXiv:2405.17767]
A multi-threaded GitHub scraper to collect Python code with docstrings from public repositories, creating a well-documented dataset for the JaraConverse LLM model.
This repository accompanies the paper Lexical Substitution as Causal Language Modeling, in Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), Mexico City, Mexico. Association for Computational Linguistics.
Causal language modeling and intent classification using GPT-2.
A quick and easy way to interact with open-source LLMs.
An AI generated picturebook.
Fine-tuning (or training from scratch) the library models for language modeling on a text dataset for GPT, GPT-2, ALBERT, BERT, DistilBERT, RoBERTa, XLNet... GPT and GPT-2 are trained or fine-tuned using a causal language modeling (CLM) loss, while ALBERT, BERT, DistilBERT and RoBERTa are trained or fine-tuned using a masked language modeling (MLM) loss.
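The difference between the two objectives comes down to how the labels are built: a CLM loss trains each position to predict the next token, while an MLM loss masks random positions and trains the model to recover only those. A minimal sketch of the label construction (the -100 ignore index follows the Hugging Face convention; the token and mask IDs here are hypothetical):

```python
# Sketch: label construction for causal vs. masked language modeling.
# -100 is the conventional "ignore this position" label index; token IDs
# and MASK_ID are arbitrary stand-ins for a real tokenizer's output.
import random

IGNORE = -100
MASK_ID = 103  # hypothetical [MASK] token id

def clm_labels(input_ids):
    """CLM: every position is trained to predict the NEXT token."""
    # inputs: t0 t1 t2 t3  ->  labels: t1 t2 t3 (last position has no target)
    return input_ids[1:] + [IGNORE]

def mlm_corrupt(input_ids, mask_prob=0.15, seed=0):
    """MLM: mask random positions; only masked positions contribute to the loss."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in input_ids:
        if rng.random() < mask_prob:
            corrupted.append(MASK_ID)  # the model sees [MASK]
            labels.append(tok)         # and must recover the original token
        else:
            corrupted.append(tok)
            labels.append(IGNORE)      # unmasked positions are ignored
    return corrupted, labels

ids = [5, 17, 42, 8, 99]
print(clm_labels(ids))  # [17, 42, 8, 99, -100]
print(mlm_corrupt(ids))
```

A real training loop (e.g. with GPT-2 or BERT in transformers) computes a cross-entropy loss over these labels, skipping every position marked -100.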
Converts a BERT model into a GPT-style autoregressive decoder.