LLM Research Papers: The 2024 List


It’s been a very eventful and exciting year in AI research. This is especially true if you are interested in LLMs.

I had big plans for this December edition: a new article discussing all my research highlights from 2024. I still plan to publish it, but due to an accident and a serious injury, I am currently unable to work at a computer and finish the draft. I hope to recover in the upcoming weeks and be back on my feet soon.

In the meantime, I want to share my running bookmark list of the many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but it may come in handy for anyone looking for some gems to read over the holidays.

Thanks for your understanding and support. I hope to make a full recovery and be back with the Research Highlights 2024 article in a few weeks!

  • 1 Jan, Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2401.00788

  • 2 Jan, A Comprehensive Study of Knowledge Editing for Large Language Models, https://arxiv.org/abs/2401.01286

  • 2 Jan, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, https://arxiv.org/abs/2401.01325

  • 2 Jan, Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, https://arxiv.org/abs/2401.01335

  • 2 Jan, LLaMA Beyond English: An Empirical Study on Language Capability Transfer, https://arxiv.org/abs/2401.01055

  • 3 Jan, A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity, https://arxiv.org/abs/2401.01967

  • 4 Jan, LLaMA Pro: Progressive LLaMA with Block Expansion, https://arxiv.org/abs/2401.02415

  • 4 Jan, LLM Augmented LLMs: Expanding Capabilities through Composition, https://arxiv.org/abs/2401.02412

  • 4 Jan, Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM, https://arxiv.org/abs/2401.02994

  • 5 Jan, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, https://arxiv.org/abs/2401.02954

  • 5 Jan, Denoising Vision Transformers, https://arxiv.org/abs/2401.02957

  • 7 Jan, Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon, https://arxiv.org/abs/2401.03462

  • 8 Jan, Mixtral of Experts, https://arxiv.org/abs/2401.04088

  • 8 Jan, MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts, https://arxiv.org/abs/2401.04081

  • 8 Jan, A Minimaximalist Approach to Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2401.04056

  • 9 Jan, RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation, https://arxiv.org/abs/2401.04679

  • 10 Jan, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, https://arxiv.org/abs/2401.05566

  • 11 Jan, Transformers are Multi-State RNNs, https://arxiv.org/abs/2401.06104

  • 11 Jan, A Closer Look at AUROC and AUPRC under Class Imbalance, https://arxiv.org/abs/2401.06091

  • 12 Jan, An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models, https://arxiv.org/abs/2401.06692

  • 16 Jan, Tuning Language Models by Proxy, https://arxiv.org/abs/2401.08565

  • 16 Jan, Scalable Pre-training of Large Autoregressive Image Models, https://arxiv.org/abs/2401.08541

  • 16 Jan, Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, https://arxiv.org/abs/2401.08500

  • 16 Jan, RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406

  • 17 Jan, ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967

  • 18 Jan, DiffusionGPT: LLM-Driven Text-to-Image Generation System, https://arxiv.org/abs/2401.10061

  • 18 Jan, Self-Rewarding Language Models, https://arxiv.org/abs/2401.10020

  • 18 Jan, VMamba: Visual State Space Model, https://arxiv.org/abs/2401.10166

  • 19 Jan, Knowledge Fusion of Large Language Models, https://arxiv.org/abs/2401.10491

  • 22 Jan, SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, https://arxiv.org/abs/2401.12168

  • 22 Jan, WARM: On the Benefits of Weight Averaged Reward Models, https://arxiv.org/abs/2401.12187

  • 22 Jan, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, https://arxiv.org/abs/2401.12070

  • 24 Jan, MambaByte: Token-free Selective State Space Model, https://arxiv.org/abs/2401.13660

  • 24 Jan, SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection, https://arxiv.org/abs/2401.13160

  • 25 Jan, Rethinking Patch Dependence for Masked Autoencoders, https://arxiv.org/abs/2401.14391

  • 25 Jan, Pix2gestalt: Amodal Segmentation by Synthesizing Wholes, https://arxiv.org/abs/2401.14398

  • 25 Jan, Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities, https://arxiv.org/abs/2401.14405

  • 26 Jan, EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, https://arxiv.org/abs/2401.15077

  • 29 Jan, MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, https://arxiv.org/abs/2401.15947

  • 29 Jan, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380

  • 31 Jan, KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, https://arxiv.org/abs/2401.18079

  • 1 Feb, Efficient Exploration for LLMs, https://arxiv.org/abs/2402.00396

  • 1 Feb, OLMo: Accelerating the Science of Language Models, https://arxiv.org/abs/2402.00838

  • 1 Feb, Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?, https://arxiv.org/abs/2402.00841

  • 1 Feb, Repeat After Me: Transformers are Better than State Space Models at Copying, https://arxiv.org/abs/2402.01032

  • 2 Feb, LiPO: Listwise Preference Optimization through Learning-to-Rank, https://arxiv.org/abs/2402.01878

  • 2 Feb, FindingEmo: An Image Dataset for Emotion Recognition in the Wild, https://arxiv.org/abs/2402.01355

  • 3 Feb, More Agents Is All You Need, https://arxiv.org/abs/2402.05120

  • 5 Feb, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, https://arxiv.org/abs/2402.03300

  • 6 Feb, MobileVLM V2: Faster and Stronger Baseline for Vision Language Model, https://arxiv.org/abs/2402.03766

  • 6 Feb, A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention, https://arxiv.org/abs/2402.03902

  • 6 Feb, Scaling Laws for Downstream Task Performance of Large Language Models, https://arxiv.org/abs/2402.04177

  • 6 Feb, MOMENT: A Family of Open Time-series Foundation Models, https://arxiv.org/abs/2402.03885

  • 6 Feb, Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, https://arxiv.org/abs/2402.03749

  • 6 Feb, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620

  • 7 Feb, Grandmaster-Level Chess Without Search, https://arxiv.org/abs/2402.04494

  • 7 Feb, Direct Language Model Alignment from Online AI Feedback, https://arxiv.org/abs/2402.04792

  • 8 Feb, Buffer Overflow in Mixture of Experts, https://arxiv.org/abs/2402.05526

  • 9 Feb, The Boundary of Neural Network Trainability is Fractal, https://arxiv.org/abs/2402.06184

  • 11 Feb, ODIN: Disentangled Reward Mitigates Hacking in RLHF, https://arxiv.org/abs/2402.07319

  • 12 Feb, Policy Improvement using Language Feedback Models, https://arxiv.org/abs/2402.07876

  • 12 Feb, Scaling Laws for Fine-Grained Mixture of Experts, https://arxiv.org/abs/2402.07871

  • 12 Feb, Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model, https://arxiv.org/abs/2402.07610

  • 12 Feb, Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping, https://arxiv.org/abs/2402.07610

  • 12 Feb, Suppressing Pink Elephants with Direct Principle Feedback, https://arxiv.org/abs/2402.07896

  • 13 Feb, World Model on Million-Length Video And Language With RingAttention, https://arxiv.org/abs/2402.08268

  • 13 Feb, Mixtures of Experts Unlock Parameter Scaling for Deep RL, https://arxiv.org/abs/2402.08609

  • 14 Feb, DoRA: Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2402.09353

  • 14 Feb, Transformers Can Achieve Length Generalization But Not Robustly, https://arxiv.org/abs/2402.09371

  • 15 Feb, BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data, https://arxiv.org/abs/2402.08093

  • 15 Feb, Recovering the Pre-Fine-Tuning Weights of Generative Models, https://arxiv.org/abs/2402.10208

  • 15 Feb, Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906

  • 16 Feb, FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models, https://arxiv.org/abs/2402.10986

  • 17 Feb, OneBit: Towards Extremely Low-bit Large Language Models, https://arxiv.org/abs/2402.11295

  • 18 Feb, LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration, https://arxiv.org/abs/2402.11550

  • 19 Feb, Reformatted Alignment, https://arxiv.org/abs/2402.12219

  • 19 Feb, AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling, https://arxiv.org/abs/2402.12226

  • 19 Feb, Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs, https://arxiv.org/abs/2402.12030

  • 19 Feb, LoRA+: Efficient Low Rank Adaptation of Large Models, https://arxiv.org/abs/2402.12354

  • 20 Feb, Neural Network Diffusion, https://arxiv.org/abs/2402.13144

  • 21 Feb, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, https://arxiv.org/abs/2402.13616

  • 21 Feb, LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, https://arxiv.org/abs/2402.13753

  • 21 Feb, Large Language Models for Data Annotation: A Survey, https://arxiv.org/abs/2402.13446

  • 22 Feb, TinyLLaVA: A Framework of Small-scale Large Multimodal Models, https://arxiv.org/abs/2402.14289

  • 22 Feb, Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, https://arxiv.org/abs/2402.14740

  • 23 Feb, Genie: Generative Interactive Environments, https://arxiv.org/abs/2402.15391

  • 27 Feb, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, https://arxiv.org/abs/2402.17764

  • 27 Feb, Sora Generates Videos with Stunning Geometrical Consistency, https://arxiv.org/abs/2402.17403

  • 27 Feb, When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, https://arxiv.org/abs/2402.17193

  • 29 Feb, Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models, https://arxiv.org/abs/2402.19427

  • 1 Mar, Learning and Leveraging World Models in Visual Representation Learning, https://arxiv.org/abs/2403.00504

  • 3 Mar, Improving LLM Code Generation with Grammar Augmentation, https://arxiv.org/abs/2403.01632

  • 3 Mar, The Hidden Attention of Mamba Models, https://arxiv.org/abs/2403.01590

  • 4 Mar, Training-Free Pretrained Model Merging, https://arxiv.org/abs/2403.01753

  • 4 Mar, Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures, https://arxiv.org/abs/2403.02308

  • 5 Mar, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, https://arxiv.org/abs/2403.03218

  • 5 Mar, Evolution Transformer: In-Context Evolutionary Optimization, https://arxiv.org/abs/2403.02985

  • 5 Mar, Enhancing Vision-Language Pre-training with Rich Supervisions, https://arxiv.org/abs/2403.03346

  • 5 Mar, Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, https://arxiv.org/abs/2403.03206

  • 5 Mar, Design2Code: How Far Are We From Automating Front-End Engineering?, https://arxiv.org/abs/2403.03163

  • 6 Mar, ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, https://arxiv.org/abs/2403.03853

  • 6 Mar, Backtracing: Retrieving the Cause of the Query, https://arxiv.org/abs/2403.03956

  • 6 Mar, Learning to Decode Collaboratively with Multiple Language Models, https://arxiv.org/abs/2403.03870

  • 6 Mar, SaulLM-7B: A pioneering Large Language Model for Law, https://arxiv.org/abs/2403.03883

  • 6 Mar, Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning, https://arxiv.org/abs/2403.03864

  • 6 Mar, 3D Diffusion Policy, https://arxiv.org/abs/2403.03954

  • 6 Mar, MedMamba: Vision Mamba for Medical Image Classification, https://arxiv.org/abs/2403.03849

  • 6 Mar, GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, https://arxiv.org/abs/2403.03507

  • 6 Mar, Stop Regressing: Training Value Functions via Classification for Scalable Deep RL, https://arxiv.org/abs/2403.03950

  • 7 Mar, How Far Are We from Intelligent Visual Deductive Reasoning?, https://arxiv.org/abs/2403.04732

  • 7 Mar, Common 7B Language Models Already Possess Strong Math Capabilities, https://arxiv.org/abs/2403.04706

  • 8 Mar, Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context, https://arxiv.org/abs/2403.05530

  • 8 Mar, Is Cosine-Similarity of Embeddings Really About Similarity?, https://arxiv.org/abs/2403.05440

  • 8 Mar, LLM4Decompile: Decompiling Binary Code with Large Language Models, https://arxiv.org/abs/2403.05286

  • 9 Mar, Algorithmic Progress in Language Models, https://arxiv.org/abs/2403.05812

  • 11 Mar, Stealing Part of a Production Language Model, https://arxiv.org/abs/2403.06634

  • 12 Mar, Chronos: Learning the Language of Time Series, https://arxiv.org/abs/2403.07815

  • 13 Mar, Simple and Scalable Strategies to Continually Pre-train Large Language Models, https://arxiv.org/abs/2403.08763

  • 13 Mar, Language Models Scale Reliably With Over-Training and on Downstream Tasks, https://arxiv.org/abs/2403.08540

  • 14 Mar, BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences, https://arxiv.org/abs/2403.09347

  • 14 Mar, LocalMamba: Visual State Space Model with Windowed Selective Scan, https://arxiv.org/abs/2403.09338

  • 14 Mar, GiT: Towards Generalist Vision Transformer through Universal Language Interface, https://arxiv.org/abs/2403.09394

  • 14 Mar, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, https://arxiv.org/abs/2403.09611

  • 15 Mar, RAFT: Adapting Language Model to Domain Specific RAG, https://arxiv.org/abs/2403.10131

  • 18 Mar, TnT-LLM: Text Mining at Scale with Large Language Models, https://arxiv.org/abs/2403.12173

  • 18 Mar, Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression, https://arxiv.org/abs/2403.15447

  • 19 Mar, PERL: Parameter Efficient Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2403.10704

  • 20 Mar, RewardBench: Evaluating Reward Models for Language Modeling, https://arxiv.org/abs/2403.13787

  • 20 Mar, LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, https://arxiv.org/abs/2403.13372

  • 21 Mar, RakutenAI-7B: Extending Large Language Models for Japanese, https://arxiv.org/abs/2403.15484

  • 22 Mar, SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series, https://arxiv.org/abs/2403.15360

  • 22 Mar, Can Large Language Models Explore In-Context?, https://arxiv.org/abs/2403.15371

  • 22 Mar, LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement, https://arxiv.org/abs/2403.15042

  • 25 Mar, LLM Agent Operating System, https://arxiv.org/abs/2403.16971

  • 26 Mar, The Unreasonable Ineffectiveness of the Deeper Layers, https://arxiv.org/abs/2403.17887

  • 27 Mar, BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text, https://arxiv.org/abs/2403.18421

  • 27 Mar, ViTAR: Vision Transformer with Any Resolution, https://arxiv.org/abs/2403.18361

  • 27 Mar, Long-form Factuality in Large Language Models, https://arxiv.org/abs/2403.18802

  • 27 Mar, Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, https://arxiv.org/abs/2403.18814

  • 26 Mar, LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, https://arxiv.org/abs/2403.17919

  • 26 Mar, Mechanistic Design and Scaling of Hybrid Architectures, https://arxiv.org/abs/2403.17844

  • 28 Mar, MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions, https://arxiv.org/abs/2403.19651

  • 28 Mar, Model Stock: All We Need Is Just a Few Fine-Tuned Models, https://arxiv.org/abs/2403.19522

  • 1 Apr, Do Language Models Plan Ahead for Future Tokens?, https://arxiv.org/abs/2404.00859

  • 1 Apr, Bigger is not Always Better: Scaling Properties of Latent Diffusion Models, https://arxiv.org/abs/2404.01367

  • 1 Apr, The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis, https://arxiv.org/abs/2404.01204

  • 1 Apr, Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models, https://arxiv.org/abs/2404.04478

  • 2 Apr, Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models, https://arxiv.org/abs/2404.02258

  • 2 Apr, Long-context LLMs Struggle with Long In-context Learning, https://arxiv.org/abs/2404.02060

  • 2 Apr, Emergent Abilities in Reduced-Scale Generative Language Models, https://arxiv.org/abs/2404.02204

  • 2 Apr, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, https://arxiv.org/abs/2404.02151

  • 3 Apr, On the Scalability of Diffusion-based Text-to-Image Generation, https://arxiv.org/abs/2404.02883

  • 3 Apr, BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827

  • 3 Apr, Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models, https://arxiv.org/abs/2404.02747

  • 4 Apr, Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences, https://arxiv.org/abs/2404.02151

  • 4 Apr, Training LLMs over Neurally Compressed Text, https://arxiv.org/abs/2404.03626

  • 4 Apr, CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues, https://arxiv.org/abs/2404.03820

  • 5 Apr, ReFT: Representation Finetuning for Language Models, https://arxiv.org/abs/2404.03592

  • 5 Apr, Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data, https://arxiv.org/abs/2404.03862

  • 5 Apr, Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation, https://arxiv.org/abs/2404.04256

  • 8 Apr, AutoCodeRover: Autonomous Program Improvement, https://arxiv.org/abs/2404.05427

  • 8 Apr, Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, https://arxiv.org/abs/2404.05892

  • 8 Apr, CodecLM: Aligning Language Models with Tailored Synthetic Data, https://arxiv.org/abs/2404.05875

  • 9 Apr, MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, https://arxiv.org/abs/2404.06395

  • 9 Apr, Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models, https://arxiv.org/abs/2404.06209

  • 9 Apr, LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, https://arxiv.org/abs/2404.05961

  • 10 Apr, Adapting LLaMA Decoder to Vision Transformer, https://arxiv.org/abs/2404.06773

  • 10 Apr, Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, https://arxiv.org/abs/2404.07143

  • 11 Apr, LLoCO: Learning Long Contexts Offline, https://arxiv.org/abs/2404.07979

  • 11 Apr, JetMoE: Reaching Llama2 Performance with 0.1M Dollars, https://arxiv.org/abs/2404.07413

  • 11 Apr, Best Practices and Lessons Learned on Synthetic Data for Language Models, https://arxiv.org/abs/2404.07503

  • 11 Apr, Rho-1: Not All Tokens Are What You Need, https://arxiv.org/abs/2404.07965

  • 12 Apr, Pre-training Small Base LMs with Fewer Tokens, https://arxiv.org/abs/2404.08634

  • 12 Apr, Dataset Reset Policy Optimization for RLHF, https://arxiv.org/abs/2404.08495

  • 13 Apr, LLM In-Context Recall is Prompt Dependent, https://arxiv.org/abs/2404.08865

  • 15 Apr, State Space Model for New-Generation Network Alternative to Transformers: A Survey, https://arxiv.org/abs/2404.09516

  • 15 Apr, Chinchilla Scaling: A Replication Attempt, https://arxiv.org/abs/2404.10102

  • 15 Apr, Learn Your Reference Model for Real Good Alignment, https://arxiv.org/abs/2404.09656

  • 16 Apr, Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study, https://arxiv.org/abs/2404.10719

  • 16 Apr, Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies, https://arxiv.org/abs/2404.08197

  • 16 Apr, How Faithful Are RAG Models? Quantifying the Tug-of-War Between RAG and LLMs’ Internal Prior, https://arxiv.org/abs/2404.10198

  • 17 Apr, A Survey on Retrieval-Augmented Text Generation for Large Language Models, https://arxiv.org/abs/2404.10981

  • 18 Apr, When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes, https://arxiv.org/abs/2404.12365

  • 18 Apr, Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing, https://arxiv.org/abs/2404.12253

  • 18 Apr, OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data, https://arxiv.org/abs/2404.12195

  • 19 Apr, The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions, https://arxiv.org/abs/2404.13208

  • 22 Apr, How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study, https://arxiv.org/abs/2404.14047

  • 22 Apr, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, https://arxiv.org/abs/2404.14219

  • 22 Apr, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, https://arxiv.org/abs/2404.14619

  • 22 Apr, A Survey on Self-Evolution of Large Language Models, https://arxiv.org/abs/2404.14662

  • 23 Apr, Multi-Head Mixture-of-Experts, https://arxiv.org/abs/2404.15045

  • 23 Apr, NExT: Teaching Large Language Models to Reason about Code Execution, https://arxiv.org/abs/2404.14662

  • 23 Apr, Graph Machine Learning in the Era of Large Language Models (LLMs), https://arxiv.org/abs/2404.14928

  • 24 Apr, Retrieval Head Mechanistically Explains Long-Context Factuality, https://arxiv.org/abs/2404.15574

  • 25 Apr, Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding, https://arxiv.org/abs/2404.16710

  • 25 Apr, Make Your LLM Fully Utilize the Context, https://arxiv.org/abs/2404.16811

  • 28 Apr, LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report, https://arxiv.org/abs/2405.00732

  • 30 Apr, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737

  • 30 Apr, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543

  • 30 Apr, A Primer on the Inner Workings of Transformer-based Language Models, https://arxiv.org/abs/2405.00208

  • 30 Apr, When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively, https://arxiv.org/abs/2404.19705

  • 30 Apr, KAN: Kolmogorov–Arnold Networks, https://arxiv.org/abs/2404.19756

  • 1 May, Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3, https://arxiv.org/abs/2405.00664

  • 1 May, Self-Play Preference Optimization for Language Model Alignment, https://arxiv.org/abs/2405.00675

  • 1 May, A Careful Examination of Large Language Model Performance on Grade School Arithmetic, https://arxiv.org/abs/2405.00332

  • 2 May, Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models, https://arxiv.org/abs/2405.01535

  • 3 May, What Matters When Building Vision-Language Models?, https://arxiv.org/abs/2405.02246

  • 5 May, Is Flash Attention Stable?, https://arxiv.org/abs/2405.02803

  • 7 May, vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention, https://arxiv.org/abs/2405.04437

  • 7 May, xLSTM: Extended Long Short-Term Memory, https://arxiv.org/abs/2405.04517

  • 8 May, You Only Cache Once: Decoder-Decoder Architectures for Language Models, https://arxiv.org/abs/2405.05254

  • 8 May, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, https://arxiv.org/abs/2405.04434

  • 8 May, Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models, https://arxiv.org/abs/2405.05417

  • 9 May, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, https://arxiv.org/abs/2405.05904

  • 10 May, Value Augmented Sampling for Language Model Alignment and Personalization, https://arxiv.org/abs/2405.06639

  • 12 May, PHUDGE: Phi-3 as Scalable Judge, https://arxiv.org/abs/2405.08029

  • 13 May, RLHF Workflow: From Reward Modeling to Online RLHF, https://arxiv.org/abs/2405.07863

  • 15 May, LoRA Learns Less and Forgets Less, https://arxiv.org/abs/2405.09673

  • 15 May, Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model, https://arxiv.org/abs/2405.09215

  • 16 May, Chameleon: Mixed-Modal Early-Fusion Foundation Models, https://arxiv.org/abs/2405.09818

  • 17 May, Towards Modular LLMs by Building and Reusing a Library of LoRAs, https://arxiv.org/abs/2405.11157

  • 19 May, SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization, https://arxiv.org/abs/2405.11582

  • 20 May, MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2405.12130

  • 22 May, Attention as an RNN, https://arxiv.org/abs/2405.13956

  • 22 May, Dense Connector for MLLMs, https://arxiv.org/abs/2405.13800

  • 23 May, AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability, https://arxiv.org/abs/2405.14129

  • 23 May, SimPO: Simple Preference Optimization with a Reference-Free Reward, https://arxiv.org/abs/2405.14734

  • 23 May, Instruction Tuning With Loss Over Instructions, https://arxiv.org/abs/2405.14394

  • 24 May, The Road Less Scheduled, https://arxiv.org/abs/2405.15682

  • 26 May, Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training, https://arxiv.org/abs/2405.15319

  • 26 May, gzip Predicts Data-dependent Scaling Laws, https://arxiv.org/abs/2405.16684

  • 27 May, Trans-LoRA: Towards Data-free Transferable Parameter Efficient Finetuning, https://arxiv.org/abs/2405.17258

  • 28 May, VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections, https://arxiv.org/abs/2405.17991

  • 28 May, LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models, https://arxiv.org/abs/2405.18377

  • 29 May, Contextual Position Encoding: Learning to Count What’s Important, https://arxiv.org/abs/2405.18719

  • 2 Jun, Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback, https://arxiv.org/abs/2406.00888

  • 3 Jun, Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models, https://arxiv.org/abs/2406.06563

  • 3 Jun, OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2406.01775

  • 3 Jun, The Geometry of Categorical and Hierarchical Concepts in Large Language Models, https://arxiv.org/abs/2406.01506

  • 3 Jun, Towards Scalable Automated Alignment of LLMs: A Survey, https://arxiv.org/abs/2406.01252

  • 4 Jun, Scalable MatMul-free Language Modeling, https://arxiv.org/abs/2406.02528

  • 4 Jun, Block Transformer: Global-to-Local Language Modeling for Fast Inference, https://arxiv.org/abs/2406.02657

  • 6 Jun, Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, https://arxiv.org/abs/2406.04271

  • 6 Jun, The Prompt Report: A Systematic Survey of Prompting Techniques, https://arxiv.org/abs/2406.06608

  • 6 Jun, Transformers Need Glasses! Information Over-Squashing in Language Tasks, https://arxiv.org/abs/2406.04267

  • 6 Jun, Are We Done with MMLU?, https://arxiv.org/abs/2406.04127

  • 6 Jun, Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step, https://arxiv.org/abs/2406.04314

  • 7 Jun, Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach, https://arxiv.org/abs/2406.04594

  • 7 Jun, CRAG — Comprehensive RAG Benchmark, https://arxiv.org/abs/2406.04744

  • 7 Jun, WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild, https://arxiv.org/abs/2406.04770

  • 7 Jun, Mixture-of-Agents Enhances Large Language Model Capabilities, https://arxiv.org/abs/2406.04692

  • 7 Jun, BERTs are Generative In-Context Learners, https://arxiv.org/abs/2406.04823

  • 7 Jun, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination, https://arxiv.org/abs/2406.05132

  • 8 Jun, Creativity Has Left the Chat: The Price of Debiasing Language Models, https://arxiv.org/abs/2406.05587

  • 10 Jun, Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, https://arxiv.org/abs/2406.06525

  • 10 Jun, Margin-aware Preference Optimization for Aligning Diffusion Models Without Reference, https://arxiv.org/abs/2406.06424

  • 10 Jun, Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning, https://arxiv.org/abs/2406.06469

  • 10 Jun, Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters, https://arxiv.org/abs/2406.05955

  • 10 Jun, Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching, https://arxiv.org/abs/2406.06326

  • 11 Jun, An Image is Worth 32 Tokens for Reconstruction and Generation, https://arxiv.org/abs/2406.07550

  • 11 Jun, TextGrad: Automatic "Differentiation" via Text, https://arxiv.org/abs/2406.07496

  • 11 Jun, Simple and Effective Masked Diffusion Language Models, https://arxiv.org/abs/2406.07524

  • 11 Jun, Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement, https://arxiv.org/abs/2406.07138

  • 11 Jun, Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, https://arxiv.org/abs/2406.07522

  • 12 Jun, Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, https://arxiv.org/abs/2406.08464

  • 12 Jun, What If We Recaption Billions of Web Images with LLaMA-3?, https://arxiv.org/abs/2406.08478

  • 12 Jun, Large Language Model Unlearning via Embedding-Corrupted Prompts, https://arxiv.org/abs/2406.07933

  • 12 Jun, Large Language Models Must Be Taught to Know What They Don’t Know, https://arxiv.org/abs/2406.08391

  • 12 Jun, An Empirical Study of Mamba-based Language Models, https://arxiv.org/abs/2406.07887

  • 12 Jun, Discovering Preference Optimization Algorithms with and for Large Language Models, https://arxiv.org/abs/2406.08414

  • 13 Jun, Transformers Meet Neural Algorithmic Reasoners, https://arxiv.org/abs/2406.09308

  • 13 Jun, MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding, https://arxiv.org/abs/2406.09297

  • 13 Jun, An Image is Worth More Than 16×16 Patches: Exploring Transformers on Individual Pixels, https://arxiv.org/abs/2406.09415

  • 13 Jun, FouRA: Fourier Low Rank Adaptation, https://arxiv.org/abs/2406.08798

  • 14 Jun, Bootstrapping Language Models with DPO Implicit Rewards, https://arxiv.org/abs/2406.09760

  • 14 Jun, Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs, https://arxiv.org/abs/2406.10209

  • 14 Jun, Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs, https://arxiv.org/abs/2406.10216

  • 16 Jun, THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation, https://arxiv.org/abs/2406.10996

  • 17 Jun, Task Me Anything, https://arxiv.org/abs/2406.11775

  • 17 Jun, How Do Large Language Models Acquire Factual Knowledge During Pretraining?, https://arxiv.org/abs/2406.11813

  • 17 Jun, mDPO: Conditional Preference Optimization for Multimodal Large Language Models, https://arxiv.org/abs/2406.11839

  • 17 Jun, Nemotron-4 340B Technical Report, https://arxiv.org/abs/2406.11704

  • 17 Jun, DataComp-LM: In Search of the Next Generation of Training Sets for Language Models, https://arxiv.org/abs/2406.11794

  • 17 Jun, Tokenization Falling Short: The Curse of Tokenization, https://arxiv.org/abs/2406.11687

  • 17 Jun, DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence, https://arxiv.org/abs/2406.11931

  • 17 Jun, Unveiling Encoder-Free Vision-Language Models, https://arxiv.org/abs/2406.11832

  • 17 Jun, Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level, https://arxiv.org/abs/2406.11817

  • 17 Jun, HARE: HumAn pRiors, a key to small language model Efficiency, https://arxiv.org/abs/2406.11410

  • 17 Jun, Measuring memorization in RLHF for code completion, https://arxiv.org/abs/2406.11715

  • 17 Jun, Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts, https://arxiv.org/abs/2406.12034

  • 18 Jun, From RAGs to Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information for Factual Queries, https://arxiv.org/abs/2406.12824

  • 18 Jun, Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, https://arxiv.org/abs/2406.12624

  • 19 Jun, Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, https://arxiv.org/abs/2406.13121

  • 20 Jun, Instruction Pre-Training: Language Models are Supervised Multitask Learners, https://arxiv.org/abs/2406.14491

  • 20 Jun, Can LLMs Learn by Teaching? A Preliminary Study, https://arxiv.org/abs/2406.14629

  • 21 Jun, A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems, https://arxiv.org/abs/2406.14972

  • 21 Jun, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, https://arxiv.org/abs/2406.15319

  • 21 Jun, MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression, https://arxiv.org/abs/2406.14909

  • 21 Jun, Efficient Continual Pre-training by Mitigating the Stability Gap, https://arxiv.org/abs/2406.14833

  • 24 Jun, Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers, https://arxiv.org/abs/2406.16747

  • 24 Jun, WARP: On the Benefits of Weight Averaged Rewarded Policies, https://arxiv.org/abs/2406.16768

  • 24 Jun, Adam-mini: Use Fewer Learning Rates To Gain More, https://arxiv.org/abs/2406.16793

  • 25 Jun, The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, https://arxiv.org/abs/2406.17557

  • 25 Jun, LongIns: A Challenging Long-context Instruction-based Exam for LLMs, https://arxiv.org/abs/2406.17588

  • 25 Jun, Following Length Constraints in Instructions, https://arxiv.org/abs/2406.17744

  • 26 Jun, A Closer Look into Mixture-of-Experts in Large Language Models, https://arxiv.org/abs/2406.18219

  • 26 Jun, RouteLLM: Learning to Route LLMs with Preference Data, https://arxiv.org/abs/2406.18665

  • 26 Jun, Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs, https://arxiv.org/abs/2406.18629

  • 27 Jun, Dataset Size Recovery from LoRA Weights, https://arxiv.org/abs/2406.19395

  • 27 Jun, From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data, https://arxiv.org/abs/2406.19292

  • 27 Jun, Changing Answer Order Can Decrease MMLU Accuracy, https://arxiv.org/abs/2406.19470

  • 28 Jun, Direct Preference Knowledge Distillation for Large Language Models, https://arxiv.org/abs/2406.19774

  • 28 Jun, LLM Critics Help Catch LLM Bugs, https://arxiv.org/abs/2407.00215

  • 28 Jun, Scaling Synthetic Data Creation with 1,000,000,000 Personas, https://arxiv.org/abs/2406.20094

  • 1 Jul, LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives, https://arxiv.org/abs/2407.01490

  • 1 Jul, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219

  • 1 Jul, Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models, https://arxiv.org/abs/2407.01906

  • 1 Jul, Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion, https://arxiv.org/abs/2407.01392

  • 1 Jul, Eliminating Position Bias of Language Models: A Mechanistic Approach, https://arxiv.org/abs/2407.01100

  • 2 Jul, MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, https://arxiv.org/abs/2407.02490

  • 2 Jul, TokenPacker: Efficient Visual Projector for Multimodal LLM, https://arxiv.org/abs/2407.02392

  • 2 Jul, Reasoning in Large Language Models: A Geometric Perspective, https://arxiv.org/abs/2407.02678

  • 2 Jul, RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs, https://arxiv.org/abs/2407.02485

  • 3 Jul, AgentInstruct: Toward Generative Teaching with Agentic Flows, https://arxiv.org/abs/2407.03502

  • 3 Jul, HEMM: Holistic Evaluation of Multimodal Foundation Models, https://arxiv.org/abs/2407.03418

  • 4 Jul, Mixture of A Million Experts, https://arxiv.org/abs/2407.04153

  • 5 Jul, Learning to (Learn at Test Time): RNNs with Expressive Hidden States, https://arxiv.org/abs/2407.04620

  • 9 Jul, Vision Language Models Are Blind, https://arxiv.org/abs/2407.06581

  • 9 Jul, Self-Recognition in Language Models, https://arxiv.org/abs/2407.06946

  • 10 Jul, Inference Performance Optimization for Large Language Models on CPUs, https://arxiv.org/abs/2407.07304

  • 11 Jul, Gradient Boosting Reinforcement Learning, https://arxiv.org/abs/2407.08250

  • 11 Jul, FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision, https://arxiv.org/abs/2407.08608

  • 12 Jul, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models, https://arxiv.org/abs/2407.09025

  • 12 Jul, New Desiderata for Direct Preference Optimization, https://arxiv.org/abs/2407.09072

  • 12 Jul, Context Embeddings for Efficient Answer Generation in RAG, https://arxiv.org/abs/2407.09252

  • 15 Jul, Qwen2 Technical Report, https://arxiv.org/abs/2407.10671

  • 15 Jul, The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism, https://arxiv.org/abs/2407.10457

  • 15 Jul, From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients, https://arxiv.org/abs/2407.11239

  • 16 Jul, GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression, https://arxiv.org/abs/2407.12077

  • 16 Jul, Scaling Diffusion Transformers to 16 Billion Parameters, https://arxiv.org/abs/2407.11633

  • 16 Jul, NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?, https://arxiv.org/abs/2407.11963

  • 17 Jul, Patch-Level Training for Large Language Models, https://arxiv.org/abs/2407.12665

  • 17 Jul, LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models, https://arxiv.org/abs/2407.12772

  • 17 Jul, A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks, https://arxiv.org/abs/2407.12994

  • 17 Jul, Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models, https://arxiv.org/abs/2407.12327

  • 18 Jul, Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation, https://arxiv.org/abs/2407.13481

  • 18 Jul, Weak-to-Strong Reasoning, https://arxiv.org/abs/2407.13647

  • 18 Jul, Understanding Reference Policies in Direct Preference Optimization, https://arxiv.org/abs/2407.13709

  • 18 Jul, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623

  • 19 Jul, BOND: Aligning LLMs with Best-of-N Distillation, https://arxiv.org/abs/2407.14622

  • 19 Jul, Compact Language Models via Pruning and Knowledge Distillation, https://arxiv.org/abs/2407.14679

  • 19 Jul, LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference, https://arxiv.org/abs/2407.14057

  • 22 Jul, Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training, https://arxiv.org/abs/2407.15892

  • 22 Jul, DDK: Distilling Domain Knowledge for Efficient Large Language Models, https://arxiv.org/abs/2407.16154

  • 23 Jul, Generation Constraint Scaling Can Mitigate Hallucination, https://arxiv.org/abs/2407.16908

  • 23 Jul, Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach, https://arxiv.org/abs/2407.16833

  • 23 Jul, Course-Correction: Safety Alignment Using Synthetic Preferences, https://arxiv.org/abs/2407.16637

  • 26 Jul, Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?, https://arxiv.org/abs/2407.16607

  • 28 Jul, Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge, https://arxiv.org/abs/2407.19594

  • 29 Jul, Improving Retrieval Augmented Language Model with Self-Reasoning, https://arxiv.org/abs/2407.19813

  • 29 Jul, Apple Intelligence Foundation Language Models, https://arxiv.org/abs/2407.21075

  • 30 Jul, ThinK: Thinner Key Cache by Query-Driven Pruning, https://arxiv.org/abs/2407.21018

  • 31 Jul, The Llama 3 Herd of Models, https://arxiv.org/abs/2407.21783

  • 31 Jul, Gemma 2: Improving Open Language Models at a Practical Size, https://arxiv.org/abs/2408.00118

  • 1 Aug, SAM 2: Segment Anything in Images and Videos, https://arxiv.org/abs/2408.00714

  • 2 Aug, POA: Pre-training Once for Models of All Sizes, https://arxiv.org/abs/2408.01031

  • 2 Aug, RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, https://arxiv.org/abs/2408.01262

  • 2 Aug, A Survey of Mamba, https://arxiv.org/abs/2408.01129

  • 3 Aug, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800

  • 5 Aug, RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, https://arxiv.org/abs/2408.02545

  • 5 Aug, Self-Taught Evaluators, https://arxiv.org/abs/2408.02666

  • 5 Aug, BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba, https://arxiv.org/abs/2408.02600

  • 7 Aug, EXAONE 3.0 7.8B Instruction Tuned Language Model, https://arxiv.org/abs/2408.03541

  • 7 Aug, 1.5-Pints Technical Report: Pretraining in Days, Not Months — Your Language Model Thrives on Quality Data, https://arxiv.org/abs/2408.03506

  • 8 Aug, Conversational Prompt Engineering, https://arxiv.org/abs/2408.04560

  • 8 Aug, Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP, https://arxiv.org/abs/2408.04303

  • 12 Aug, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, https://arxiv.org/abs/2408.06292

  • 15 Aug, Hermes 3 Technical Report, https://arxiv.org/abs/2408.12570

  • 19 Aug, Customizing Language Models with Instance-wise LoRA for Sequential Recommendation, https://arxiv.org/abs/2408.10159

  • 20 Aug, Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information, https://arxiv.org/abs/2408.10615

  • 20 Aug, To Code, or Not To Code? Exploring Impact of Code in Pre-training, https://arxiv.org/abs/2408.10914

  • 21 Aug, LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796

  • 22 Aug, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570

  • 22 Aug, Controllable Text Generation for Large Language Models: A Survey, https://arxiv.org/abs/2408.12599

  • 23 Aug, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233

  • 26 Aug, A Practitioner’s Guide to Continual Multimodal Pretraining, https://arxiv.org/abs/2408.14471

  • 26 Aug, Building and better understanding vision-language models: insights and future directions, https://arxiv.org/abs/2408.12637

  • 26 Aug, CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, https://arxiv.org/abs/2408.14572

  • 27 Aug, The Mamba in the Llama: Distilling and Accelerating Hybrid Models, https://arxiv.org/abs/2408.15237

  • 28 Aug, ReMamba: Equip Mamba with Effective Long-Sequence Modeling, https://arxiv.org/abs/2408.15496

  • 29 Aug, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737

  • 31 Aug, LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models, https://arxiv.org/abs/2409.00509

  • 1 Oct, Addition is All You Need for Energy-efficient Language Models, https://arxiv.org/abs/2410.00907

  • 2 Oct, Quantifying Generalization Complexity for Large Language Models, https://arxiv.org/abs/2410.01769

  • 2 Oct, When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1, https://arxiv.org/abs/2410.01792

  • 2 Oct, Were RNNs All We Needed?, https://arxiv.org/abs/2410.01201

  • 3 Oct, Selective Attention Improves Transformer, https://arxiv.org/abs/2410.02703

  • 3 Oct, LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, https://arxiv.org/abs/2410.02707

  • 3 Oct, LLaVA-Critic: Learning to Evaluate Multimodal Models, https://arxiv.org/abs/2410.02712

  • 7 Oct, Differential Transformer, https://arxiv.org/abs/2410.05258

  • 7 Oct, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229

  • 8 Oct, ARIA: An Open Multimodal Native Mixture-of-Experts Model, https://arxiv.org/abs/2410.05993

  • 8 Oct, O1 Replication Journey: A Strategic Progress Report — Part 1, https://arxiv.org/abs/2410.18982

  • 8 Oct, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983

  • 9 Oct, From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning, https://arxiv.org/abs/2410.06456

  • 10 Oct, KV Prediction for Improved Time to First Token, https://arxiv.org/abs/2410.08391

  • 11 Oct, Baichuan-Omni Technical Report, https://arxiv.org/abs/2410.08565

  • 13 Oct, MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models, https://arxiv.org/abs/2410.10139

  • 13 Oct, LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models, https://arxiv.org/abs/2410.09732

  • 15 Oct, AFlow: Automating Agentic Workflow Generation, https://arxiv.org/abs/2410.10762

  • 15 Oct, Toward General Instruction-Following Alignment for Retrieval-Augmented Generation, https://arxiv.org/abs/2410.09584

  • 21 Oct, Pre-training Distillation for Large Language Models: A Design Space Exploration, https://arxiv.org/abs/2410.16215

  • 23 Oct, MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models, https://arxiv.org/abs/2410.17637

  • 23 Oct, Scalable Ranked Preference Optimization for Text-to-Image Generation, https://arxiv.org/abs/2410.18013

  • 23 Oct, Scaling Diffusion Language Models via Adaptation from Autoregressive Models, https://arxiv.org/abs/2410.17891

  • 24 Oct, Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback, https://arxiv.org/abs/2410.19133

  • 25 Oct, Counting Ability of Large Language Models and Impact of Tokenization, https://arxiv.org/abs/2410.19730

  • 25 Oct, A Survey of Small Language Models, https://arxiv.org/abs/2410.20011

  • 26 Oct, Accelerating Direct Preference Optimization with Prefix Sharing, https://arxiv.org/abs/2410.20305

  • 27 Oct, Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse, https://arxiv.org/abs/2410.21333

  • 28 Oct, LongReward: Improving Long-context Large Language Models with AI Feedback, https://arxiv.org/abs/2410.21252

  • 28 Oct, ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference, https://arxiv.org/abs/2410.21465

  • 29 Oct, Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications, https://arxiv.org/abs/2410.21943

  • 30 Oct, CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation, https://arxiv.org/abs/2410.23090

  • 31 Oct, What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective, https://arxiv.org/abs/2410.23743

  • 31 Oct, GPT or BERT: why not both?, https://arxiv.org/abs/2410.24159

  • 31 Oct, Language Models can Self-Lengthen to Generate Long Texts, https://arxiv.org/abs/2410.23933

  • 1 Nov, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations, https://arxiv.org/abs/2411.00640

  • 1 Nov, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412

  • 1 Nov, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models, https://arxiv.org/abs/2411.00492

  • 3 Nov, Sample-Efficient Alignment for LLMs, https://arxiv.org/abs/2411.01493

  • 4 Nov, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350

  • 4 Nov, "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization, https://arxiv.org/abs/2411.02355

  • 4 Nov, Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study, https://arxiv.org/abs/2411.02462

  • 5 Nov, HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems, https://arxiv.org/abs/2411.02959

  • 6 Nov, Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination, https://arxiv.org/abs/2411.03823

  • 6 Nov, Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding, https://arxiv.org/abs/2411.04282

  • 6 Nov, Number Cookbook: Number Understanding of Language Models and How to Improve It, https://arxiv.org/abs/2411.03766

  • 7 Nov, Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models, https://arxiv.org/abs/2411.04996

  • 7 Nov, BitNet a4.8: 4-bit Activations for 1-bit LLMs, https://arxiv.org/abs/2411.04965

  • 7 Nov, Scaling Laws for Precision, https://arxiv.org/abs/2411.04330

  • 8 Nov, Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation, https://arxiv.org/abs/2411.05966

  • 8 Nov, Balancing Pipeline Parallelism with Vocabulary Parallelism, https://arxiv.org/abs/2411.05288

  • 11 Nov, Toward Optimal Search and Retrieval for RAG, https://arxiv.org/abs/2411.07396

  • 12 Nov, Large Language Models Can Self-Improve in Long-context Reasoning, https://arxiv.org/abs/2411.08147

  • 12 Nov, Stronger Models are NOT Stronger Teachers for Instruction Tuning, https://arxiv.org/abs/2411.07133

  • 12 Nov, Direct Preference Optimization Using Sparse Feature-Level Constraints, https://arxiv.org/abs/2411.07618

  • 13 Nov, Cut Your Losses in Large-Vocabulary Language Models, https://arxiv.org/abs/2411.09009

  • 15 Nov, Does Prompt Formatting Have Any Impact on LLM Performance?, https://arxiv.org/abs/2411.10541

  • 17 Nov, SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization, https://arxiv.org/abs/2411.11909

  • 17 Nov, SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration, https://arxiv.org/abs/2411.10958

  • 18 Nov, Bi-Mamba: Towards Accurate 1-Bit State Space Models, https://arxiv.org/abs/2411.11843

  • 19 Nov, RedPajama: an Open Dataset for Training Large Language Models, https://arxiv.org/abs/2411.12372

  • 20 Nov, Hymba: A Hybrid-head Architecture for Small Language Models, https://arxiv.org/abs/2411.13676

  • 20 Nov, Loss-to-Loss Prediction: Scaling Laws for All Datasets, https://arxiv.org/abs/2411.12925

  • 21 Nov, When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training, https://arxiv.org/abs/2411.13476

  • 21 Nov, Multimodal Autoregressive Pre-training of Large Vision Encoders, https://arxiv.org/abs/2411.14402

  • 21 Nov, Natural Language Reinforcement Learning, https://arxiv.org/abs/2411.14251

  • 22 Nov, Large Multi-modal Models Can Interpret Features in Large Multi-modal Models, https://arxiv.org/abs/2411.14982

  • 23 Nov, MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs, https://arxiv.org/abs/2411.15296

  • 23 Nov, TÜLU 3: Pushing Frontiers in Open Language Model Post-Training, https://arxiv.org/abs/2411.15124

  • 24 Nov, LLMs Do Not Think Step-by-step In Implicit Reasoning, https://arxiv.org/abs/2411.15862

  • In progress…

  • This magazine is a personal passion project that does not offer direct compensation. However, for those who wish to support me, please consider purchasing a copy of my Build a Large Language Model (From Scratch) book. (I am confident you'll get a lot out of it, as it explains how LLMs work at a level of detail not found anywhere else.)

    If you read the book and have a few minutes to spare, I’d really appreciate a brief review. It helps us authors a lot!

