Shisa V2

Categories: japanese, llm, shisa
Author: Leonard Lin
Published: April 14, 2025

We’re proud to announce Shisa V2, the latest generation of our bilingual Japanese-English language models from Shisa.AI. Over the past few months, our team has been pushing incredibly hard on the frontier of Japanese language training, and today we’re excited to share our results.

TL;DR

Shisa V2 sets new SOTA in Japanese benchmarks across all size classes (7B–70B), improving JA accuracy by up to +32% vs base models. Open weights, commercial-ready, available now on Hugging Face.

Introducing Shisa V2

The Shisa V2 model family sets a new standard for open-weight Japanese language models. With parameter sizes ranging from 7B to 70B, Shisa V2 delivers exceptional performance across the board, achieving state-of-the-art or near state-of-the-art results in Japanese benchmarks at every model size.

| License | Model | Parameters | Context Length | JA AVG | EN AVG |
|---|---|---|---|---|---|
| Apache 2.0 | shisa-v2-qwen2.5-7b | 7B | 128K/8K | 71.06 | 54.86 |
| Llama 3.1 | shisa-v2-llama3.1-8b | 8B | 128K | 70.83 | 54.75 |
| Apache 2.0 | shisa-v2-mistral-nemo-12b | 12B | 128K | 72.83 | 53.33 |
| MIT | shisa-v2-unphi4-14b | 14B | 16K | 75.89 | 60.10 |
| Apache 2.0 | shisa-v2-qwen2.5-32b | 32B | 128K/8K | 76.97 | 67.41 |
| Llama 3.3 | shisa-v2-llama3.3-70b | 70B | 128K | 79.72 | 67.71 |

Performance at every size class

Consistent Quality, Robust Scaling

While our focus remains on improving Japanese language capabilities in smaller models suited to local and edge applications, we found that our improved data and training pipeline also scales robustly to larger, more capable models.

Recent open-weight LLMs have greatly improved their baseline Japanese capabilities, but we found that there’s still significant value in applying additional training.

Shisa V2 Improvement vs Base Models

| Shisa V2 | Base Model | Base JA AVG | Shisa V2 JA AVG | Improvement |
|---|---|---|---|---|
| shisa-v2-qwen2.5-7b | Qwen 2.5 7B Instruct | 65.30 | 71.06 | +8.8% |
| shisa-v2-llama3.1-8b | Llama 3.1 8B Instruct | 53.43 | 70.83 | +32.6% |
| shisa-v2-mistral-nemo-12b | Mistral Nemo Instruct 2407 | 58.44 | 72.83 | +24.6% |
| shisa-v2-unphi4-14b | Microsoft Phi-4 (Unsloth) | 72.47 | 75.89 | +4.7% |
| shisa-v2-qwen2.5-32b | Qwen 2.5 32B Instruct | 66.79 | 76.97 | +15.3% |
| shisa-v2-llama3.3-70b | Llama 3.3 70B Instruct | 72.75 | 79.72 | +9.6% |
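For clarity, the Improvement figures here are the relative gains in JA AVG over each base model:

$$
\text{Improvement} = \frac{\text{JA AVG}_{\text{Shisa V2}} - \text{JA AVG}_{\text{base}}}{\text{JA AVG}_{\text{base}}} \times 100\%,
\qquad \text{e.g. } \frac{70.83 - 53.43}{53.43} \approx +32.6\% \text{ for the 8B model.}
$$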

Built for Real-world Performance

Shisa V2 models excel not only on traditional metrics but also on new Japanese evaluations we developed to cover important, previously unmeasured downstream use cases:

  • shisa-jp-ifeval: Advanced instruction-following tasks in Japanese
  • shisa-jp-rp-bench: Personas, role-play, and multi-turn conversational capabilities
  • shisa-jp-tl-bench: High-quality Japanese-English translation proficiency

We will soon open-source these benchmarks to further benefit the broader Japanese LLM community.

Open and Accessible

We’re committed to openness. All the Shisa V2 models are released under the most permissive licenses the base models allow, and we’ve preferred to release Apache 2.0 or MIT licensed models to ensure unrestricted research, experimentation, and commercial use.

For those who are unaware, Japan is somewhat unique in that it has explicit copyright carve-outs for AI training. For more on this, see our writeup from last year on Copyright and AI Training Data in Japan.

Try Shisa V2 Now

The entire Shisa V2 model lineup is now available on Hugging Face in the Shisa V2 collection.
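For a quick local test, here is a minimal sketch using Hugging Face Transformers. The repo id shown assumes the models live under the shisa-ai organization (double-check the exact name against the Shisa V2 collection), and the generation settings are just reasonable defaults:

```python
# Minimal sketch: chatting with a Shisa V2 model via Hugging Face Transformers.
# NOTE: the repo id below is an assumption -- verify it in the Shisa V2 collection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2-llama3.1-8b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for the 8B model
    device_map="auto",
)

# Bilingual chat: ask a question in Japanese.
messages = [{"role": "user", "content": "四国の4つの県を教えてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```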

UPDATE: Thanks to Jon Durbin and Chutes.ai, you can now also try chatting with our best Shisa V2 70B model!

We have also spent considerable time and effort testing different data and training approaches, and we will be publishing additional reports and documentation in the coming weeks. Stay tuned for more on how we're pushing the boundaries of Japanese language modeling.


Compute for training these models was provided by Ubitus K.K. and METI GENIAC.