BGE Reranker v2 M3, developed by BAAI, is a text reranking model: given a query and a candidate passage, it outputs a relevance score that can be used to reorder search results or document rankings (the task is exposed as text classification). With just over 568 million parameters, it builds on the XLM-RoBERTa architecture. Its context length of 8192 tokens lets it score longer documents and more complex queries, which is an advantage where context matters.
At 568 million parameters the model is small next to typical LLMs, which keeps it cheap to run. It is available in FP16, requiring about 1.6 GB of VRAM, so it can be deployed on a wide range of hardware, including laptops and mid-range desktops. Typical use cases include reranking search engine results, improving document retrieval systems, and refining content recommendation candidates. It suits users who want a balance between ranking quality and resource use.
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| FP16 | 16 | 1.08 GB | 1.58 GB | 2.08 GB | 98% |
How to run BGE Reranker v2 M3
Pick a runtime — copy & paste. Commands are pre-filled with this model’s repo.
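Python, using the Sentence-Transformers library. Rerankers load through its CrossEncoder class rather than SentenceTransformer.
1. Install
pip install sentence-transformers
2. Run
from sentence_transformers import CrossEncoder
# Load the reranker as a cross-encoder; it scores (query, passage) pairs.
model = CrossEncoder("BAAI/bge-reranker-v2-m3")
scores = model.predict([("what is a panda?", "The giant panda is a bear native to China.")])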
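A reranker is normally used to sort a list of candidate passages rather than to score a single pair. The sketch below shows only that sorting step. `rank_passages` and the hard-coded scores are hypothetical; in practice the scores would come from the model, for example one score per `(query, passage)` pair from a cross-encoder's `predict` call, assuming the model is loaded through Sentence-Transformers' `CrossEncoder` class.

```python
from typing import List, Optional, Sequence, Tuple


def rank_passages(
    passages: Sequence[str],
    scores: Sequence[float],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("need exactly one score per passage")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical scores standing in for the reranker's output.
passages = [
    "Paris is the capital of France.",
    "Bananas are yellow.",
    "France is in Europe.",
]
scores = [0.97, 0.02, 0.41]
top = rank_passages(passages, scores, top_n=2)
```

Keeping the sort separate from the model call makes it easy to swap in a different scorer or to cache scores.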
How much VRAM do I need to run BGE Reranker v2 M3?
BGE Reranker v2 M3 needs about 1.58 GB of VRAM at FP16, the only format listed for this model. FP16 is half precision; running in full precision (FP32) would roughly double the weight footprint.
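The memory figures can be sanity-checked with simple arithmetic: the weights take roughly parameter count times bytes per parameter. The sketch below assumes the 568 million parameters quoted in the description. The 0.5 GB overhead is also an assumption, inferred from the gap between the File Size and VRAM Needed columns in the table (1.58 minus 1.08), not a measured value.

```python
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def estimate_vram_gb(file_size_gb: float, overhead_gb: float = 0.5) -> float:
    """File size plus a fixed allowance for activations and runtime buffers.

    The 0.5 GB default is an assumption inferred from the table, not a
    measurement; real overhead grows with batch size and sequence length.
    """
    return file_size_gb + overhead_gb


fp16_weights = weight_size_gb(568e6, 16)  # weights only, a little over 1.1 GB
fp32_weights = weight_size_gb(568e6, 32)  # twice the FP16 figure
vram_fp16 = estimate_vram_gb(1.08)        # starts from the file size in the table
```

The FP16 estimate lands in the same range as the 1.08 GB file size listed in the table, which suggests the table is derived in a similar way.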
Which quant should I pick?
FP16 is the only option listed for this model, so there is nothing to choose between here. Q4_K_M and Q8_0 are GGUF quantization levels commonly offered for larger generative models; neither appears in the table above.
What GPU do I need to run BGE Reranker v2 M3?
To run BGE Reranker v2 M3, you need a GPU with at least 1.6 GB of VRAM. This figure applies to the FP16 version, the only one listed above.
Is BGE Reranker v2 M3 good for coding?
BGE Reranker v2 M3 is primarily designed for multilingual text reranking and may not be optimized for coding tasks. For coding, models specifically trained on code datasets are more suitable.
BGE Reranker v2 M3 vs Llama 3.1 8B?
BGE Reranker v2 M3 has 0.568 billion parameters, making it significantly smaller than Llama 3.1 8B. While BGE Reranker v2 M3 excels in multilingual reranking, Llama 3.1 8B offers broader capabilities and better performance on a wider range of tasks.
Can I run BGE Reranker v2 M3 on a Mac?
Yes, you can run BGE Reranker v2 M3 on a Mac as long as it has about 1.6 GB of memory available for the model. On Apple silicon Macs the GPU shares unified memory with the CPU, so that amount comes out of system memory rather than dedicated VRAM.
How much VRAM does BGE Reranker v2 M3 need?
BGE Reranker v2 M3 requires about 1.6 GB of VRAM at FP16 (1.58 GB in the table above). No other precision or quantization level is listed.
Is BGE Reranker v2 M3 censored?
The question mostly does not apply. BGE Reranker v2 M3 outputs relevance scores for query-passage pairs rather than generated text, so it has no refusal behavior to configure or bypass.
Is BGE Reranker v2 M3 commercial-use allowed?
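Yes. BGE Reranker v2 M3 is distributed under a permissive open-source license (listed here as MIT), which allows commercial use provided the copyright and license notices are retained. Confirm the exact license text in the model repository before redistributing.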
BGE Reranker v2 M3 context length?
BGE Reranker v2 M3 supports a context length of up to 8192 tokens, making it suitable for handling long documents and complex queries.
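Because a reranker reads the query and the passage together, both have to fit inside the 8192-token window. A minimal sketch of splitting an over-long document into overlapping windows follows. `split_for_reranker` is a hypothetical helper, whitespace-separated words stand in for real tokens, and the allowance of 4 special tokens is an assumption; for exact counts, use the model's own tokenizer.

```python
from typing import List

MAX_TOKENS = 8192    # context length stated above
SPECIAL_TOKENS = 4   # assumed allowance for separator and boundary tokens


def split_for_reranker(query: str, document: str,
                       max_tokens: int = MAX_TOKENS,
                       overlap: int = 64) -> List[str]:
    """Split `document` into windows that fit alongside `query`.

    Words approximate tokens here; a real tokenizer usually yields more
    tokens than words, so leave extra margin in practice.
    """
    budget = max_tokens - len(query.split()) - SPECIAL_TOKENS
    if budget <= overlap:
        raise ValueError("query leaves no room for the passage")
    words = document.split()
    step = budget - overlap  # consecutive windows share `overlap` words
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + budget]))
        if start + budget >= len(words):
            break  # this window already reaches the end of the document
    return chunks
```

Each window can then be scored against the query separately, with the document's score taken as, for example, the maximum over its windows.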
Does BGE Reranker v2 M3 support function calling?
BGE Reranker v2 M3 does not natively support function calling. It is primarily designed for text reranking tasks.
BGE Reranker v2 M3 quantization options?
Only an FP16 build of BGE Reranker v2 M3 is listed on this page. No 4-bit or 8-bit quantized versions appear in the table above.
Can BGE Reranker v2 M3 run on CPU?
While BGE Reranker v2 M3 can run on a CPU, it is significantly slower compared to running on a GPU. For optimal performance, a GPU is recommended.
BGE Reranker v2 M3 fine-tuning?
BGE Reranker v2 M3 can be fine-tuned on specific datasets to improve its performance on particular tasks or domains.
BGE Reranker v2 M3 system requirements?
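BGE Reranker v2 M3 requires a GPU with at least 1.6 GB of VRAM and a modern CPU. The table above lists 2.08 GB of RAM for the model itself, so a machine with 8 GB of RAM leaves comfortable headroom. It also needs a Python environment with the relevant libraries installed.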
BGE Reranker v2 M3 performance benchmark?
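No community benchmark runs have been reported for BGE Reranker v2 M3 on this page yet, so there is no measured throughput to cite. Because a reranker scores query-passage pairs rather than generating text, throughput is better expressed in pairs per second than in tokens per second, and it varies with hardware, batch size, and passage length.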
BGE Reranker v2 M3 for RAG?
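BGE Reranker v2 M3 can serve as the reranking stage of a Retrieval-Augmented Generation (RAG) pipeline. A first-stage retriever fetches candidate passages, the reranker reorders them by relevance to the query, and the top passages are passed to the generator, which helps it produce more accurate responses.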
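The retrieve-then-rerank pattern can be sketched as below. Every name here is hypothetical, and the toy `overlap_score` stands in for both stages so the example runs without downloading a model. In a real pipeline the first stage would be a vector or keyword search and the `reranker` callable would wrap the model's scoring call.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def retrieve_then_rerank(query: str, corpus: Sequence[str],
                         first_stage: Scorer, reranker: Scorer,
                         k: int = 20, n: int = 3) -> List[str]:
    """Keep the top-k passages from a cheap first stage, rescore them with
    the more expensive reranker, and return the best n."""
    candidates = sorted(corpus, key=lambda d: first_stage(query, d), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:n]


def build_prompt(query: str, passages: Sequence[str]) -> str:
    """Assemble the context block that would be handed to a generator model."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}"


corpus = [
    "the cat sat on the mat",
    "dogs bark loudly",
    "a cat chased the mouse",
]
best = retrieve_then_rerank("cat mat", corpus, overlap_score, overlap_score, k=2, n=1)
prompt = build_prompt("cat mat", best)
```

Limiting the reranker to the top-k candidates matters because a cross-encoder runs one forward pass per query-passage pair, so its cost grows linearly with the number of candidates.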
BGE Reranker v2 M3 for agents?
BGE Reranker v2 M3 can be integrated into conversational agents to enhance their ability to rank and select the most relevant responses from a set of candidates.
BGE Reranker v2 M3 for coding vs general?
BGE Reranker v2 M3 is better suited for general text reranking tasks rather than coding-specific tasks. For coding, consider models trained on code datasets like CodeParrot or Codex.
BGE Reranker v2 M3 vs ChatGPT?
BGE Reranker v2 M3 is specialized for multilingual text reranking, while ChatGPT is a general-purpose language model. ChatGPT is better for generating coherent text and handling a wide range of tasks.
BGE Reranker v2 M3 download size?
The download size for BGE Reranker v2 M3 is approximately 1.1 GB, including the model weights and necessary files.
Best quant for BGE Reranker v2 M3?
Only FP16 is listed for BGE Reranker v2 M3, so that is the format to use. If 4-bit or 8-bit builds become available, the usual trade-off applies: 4-bit saves the most memory, while 8-bit stays closer to FP16 accuracy.