Ben Burtenshaw’s Post


NLP Engineer

ORPO seems to be a streamlined way of aligning language models right now. It's really handy because benchmarks show that you can get better results than SFT plus DPO in just one step. The main advantages of ORPO:

📊 Mistral fine-tuned with ORPO scores 66.19 on IFEval.
🦄 ORPO is one method that replaces the two stages of SFT + DPO/PPO.
🏆 ORPO outperforms SFT + DPO on Phi-2, Llama 2, Qwen, and Mistral.
🚈 It doesn't require a reference model, so it uses less memory.

This post by Argilla and MantisNLP is a concise overview of the method to bootstrap your experiments. More references are in the comments, including the original repo, the paper, and a minimal implementation for reduced compute.

Argilla

🔎 What if you only need a base model for preference alignment? Welcome to ORPO, summarized in our latest blog with MantisNLP. ORPO eliminates the complexity of multi-stage RLHF and instead uses an odds ratio-based loss. The result? Better performance and significant savings in compute and memory! 🌟 And the cherry on top? The method is showcased on one of our favorite datasets, Argilla's UltraFeedback Binarized.
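For context, the loss described in the ORPO paper combines the standard SFT (negative log-likelihood) loss with an odds-ratio term that favors the chosen response over the rejected one. Because the odds are computed from the policy model's own probabilities, no frozen reference model is needed. In LaTeX notation, following the paper:

\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\left[ \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}} \right]

\mathcal{L}_{\mathrm{OR}} = -\log \sigma\left( \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \right), \qquad \mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}

Here y_w and y_l are the chosen and rejected responses, and \lambda weights the odds-ratio term against the SFT loss.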

RLHF and alternatives: ORPO

argilla.io

I would also take a look at the paper (https://lnkd.in/eMZjKseK) and repo (https://lnkd.in/e4966VfT) by the ORPO authors.

And see this post from last week, where I shared a quickstart notebook for using ORPO in a limited setup (https://lnkd.in/enXY47rB).
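For anyone who wants to see roughly what an ORPO run looks like in code, here is a minimal sketch using TRL's ORPOTrainer with the UltraFeedback Binarized preferences dataset. It is illustrative only: the model choice, hyperparameters, and the dataset-flattening step are assumptions rather than the exact setup from the blog or the notebook.

# Illustrative sketch only: assumes TRL's ORPOTrainer/ORPOConfig and the
# argilla/ultrafeedback-binarized-preferences-cleaned dataset; the model and
# hyperparameters below are placeholders, not the settings from the blog post.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # any causal base model works here
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Preference data with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

# Assumption: "chosen"/"rejected" are lists of chat messages, so flatten them
# to the final assistant reply; adjust this mapping to the dataset's schema.
def to_text(example):
    example["chosen"] = example["chosen"][-1]["content"]
    example["rejected"] = example["rejected"][-1]["content"]
    return example

dataset = dataset.map(to_text)

config = ORPOConfig(
    output_dir="mistral-orpo",
    beta=0.1,  # weight of the odds-ratio term (lambda in the paper)
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=1,
    logging_steps=10,
)

# Single-stage training on a base model: no separate reference model is passed.
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions name this argument processing_class
)
trainer.train()

On a limited setup you would typically layer a PEFT/LoRA adapter and quantized model loading on top of this to keep memory down.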

Christopher Williams

Quantitative Analyst | Data Driven Investor

3w

Sounds interesting... but I am still too scared to click on the image.
