Shisa V2

Categories: japanese, llm, shisa
Author: Leonard Lin
Published: April 14, 2025

We’re proud to announce Shisa V2, the latest generation of our bilingual Japanese-English language models from Shisa.AI. Over the past few months, our team has been pushing incredibly hard on the frontier of Japanese language training, and today we’re excited to share our results.

TL;DR

Shisa V2 sets new SOTA in Japanese benchmarks across all size classes (7B–70B), improving JA accuracy by up to +32% vs base models. Open weights, commercial-ready, available now on Hugging Face.

Introducing Shisa V2

The Shisa V2 model family sets a new standard for open-weight Japanese language models. With parameter sizes ranging from 7B to 70B, Shisa V2 delivers exceptional performance across the board, achieving state-of-the-art or near state-of-the-art results in Japanese benchmarks at every model size.

| License | Model | Parameters | Context Length | JA AVG | EN AVG |
|---|---|---|---|---|---|
| Apache 2.0 | shisa-v2-qwen2.5-7b | 7B | 128K/8K | 71.06 | 54.86 |
| Llama 3.1 | shisa-v2-llama3.1-8b | 8B | 128K | 70.83 | 54.75 |
| Apache 2.0 | shisa-v2-mistral-nemo-12b | 12B | 128K | 72.83 | 53.33 |
| MIT | shisa-v2-unphi4-14b | 14B | 16K | 75.89 | 60.10 |
| Apache 2.0 | shisa-v2-qwen2.5-32b | 32B | 128K/8K | 76.97 | 67.41 |
| Llama 3.3 | shisa-v2-llama3.3-70b | 70B | 128K | 79.72 | 67.71 |

Performance at every size class

Consistent Quality, Robust Scaling

While our focus remains on improving Japanese language capabilities in smaller models suited to local and edge applications, we found that our improved data and training pipeline also scales robustly to larger, more capable models.

Recent open-weight LLMs have greatly improved their baseline Japanese capabilities, but we found that there’s still significant value in applying additional training.

Shisa V2 Improvement vs Base Models

| Shisa V2 | Base Model | Base JA AVG | Shisa V2 JA AVG | Improvement |
|---|---|---|---|---|
| shisa-v2-qwen2.5-7b | Qwen 2.5 7B Instruct | 65.30 | 71.06 | +8.8% |
| shisa-v2-llama3.1-8b | Llama 3.1 8B Instruct | 53.43 | 70.83 | +32.6% |
| shisa-v2-mistral-nemo-12b | Mistral Nemo Instruct 2407 | 58.44 | 72.83 | +24.6% |
| shisa-v2-unphi4-14b | Microsoft Phi-4 (Unsloth) | 72.47 | 75.89 | +4.7% |
| shisa-v2-qwen2.5-32b | Qwen 2.5 32B Instruct | 66.79 | 76.97 | +15.3% |
| shisa-v2-llama3.3-70b | Llama 3.3 70B Instruct | 72.75 | 79.72 | +9.6% |
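For clarity, the Improvement figures here are the relative gains in JA AVG over each base model:

$$
\text{Improvement} = \frac{\text{JA AVG}_{\text{Shisa V2}} - \text{JA AVG}_{\text{base}}}{\text{JA AVG}_{\text{base}}} \times 100\%,
\qquad \text{e.g. } \frac{70.83 - 53.43}{53.43} \approx +32.6\% \text{ for the 8B model.}
$$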

Built for Real-world Performance

Shisa V2 models excel not only on traditional metrics but also on new Japanese evaluations we developed to cover important, previously unmeasured downstream use cases:

  • shisa-jp-ifeval: Advanced instruction-following tasks in Japanese
  • shisa-jp-rp-bench: Personas, role-play, and multi-turn conversational capabilities
  • shisa-jp-tl-bench: High-quality Japanese-English translation proficiency

We will soon open-source these benchmarks to further benefit the broader Japanese LLM community.

Open and Accessible

We’re committed to openness. All the Shisa V2 models are released under the most permissive licenses the base models allow, and we’ve preferred to release Apache 2.0 or MIT licensed models to ensure unrestricted research, experimentation, and commercial use.

For those who are unaware, Japan is somewhat unique in that it has explicit copyright carve-outs for AI training. For more on this, see our writeup from last year on Copyright and AI Training Data in Japan.

Try Shisa V2 Now

The entire Shisa V2 model lineup is now available on Hugging Face in the Shisa V2 collection.
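For a quick local test, here is a minimal sketch using Hugging Face Transformers. The repo id shown assumes the models live under the shisa-ai organization (double-check the exact name against the Shisa V2 collection), and the generation settings are just reasonable defaults:

```python
# Minimal sketch: chatting with a Shisa V2 model via Hugging Face Transformers.
# NOTE: the repo id below is an assumption -- verify it in the Shisa V2 collection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2-llama3.1-8b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights for the 8B model
    device_map="auto",
)

# Bilingual chat: ask a question in Japanese.
messages = [{"role": "user", "content": "四国の4つの県を教えてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```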

UPDATE: Thanks to Jon Durbin and Chutes.ai, you can now also try chatting with our best Shisa V2 70B model!

We have also spent considerable time and effort testing different data and training approaches, and we will be publishing additional reports and documentation in the coming weeks. Stay tuned for more on how we're pushing the boundaries of Japanese language modeling.


Compute for training these models was provided by Ubitus K.K. and METI GENIAC.