Shisa.AI

JP-TL-Bench: Why Translation Direction Matters (JA↔︎EN)

Unlike most existing translation evals, JP-TL-Bench reports separate directional scores for JA→EN and EN→JA. Does that really matter? Here, we look at real model outputs to show why it does.

JP-TL-Bench: Anchored Pairwise LLM Evaluation for Bidirectional Japanese-English Translation (Japanese)

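We are releasing JP-TL-Bench, an open benchmark for Japanese-English translation development that answers the question "which one is actually better?" when both translations look good.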

JP-TL-Bench: Anchored Pairwise LLM Evaluation for Bidirectional Japanese-English Translation

Announcing JP-TL-Bench, an open benchmark that finally answers: which of these two good translations is actually better? Built for Japanese-English translation development.

Shisa V2.1: Smaller, Smarter, More Accessible

Announcing Shisa V2.1 - our latest lineup of bilingual Japanese-English models, with improved Japanese language performance at better efficiency, and new capable, edge-friendly models as small as 1.2B in size.

Shisa V2.1: Smaller, Smarter, Easier to Use (Japanese)

Major improvements in Japanese LLM performance and efficiency, plus a newly launched API.

Shisa.AI Develops a Multilingual LLM with the Highest Performance Among Japan-Made Models

Achieves Japanese performance surpassing GPT-4; the model is released as open source today.

Shisa V2 405B: Japan’s Highest Performing LLM

We are incredibly excited to announce one more addition to the Shisa V2 family of open-source, SOTA JA/EN bilingual models: Shisa V2 405B.

Qwen 3 Japanese Performance

Similar to our previous Llama 4 Japanese Performance review, here’s an initial one for Alibaba’s latest Qwen 3 release. This is going to be more of a first look/preview, and…

Official Release of the Shisa V2 Series: Next-Generation Bilingual LLMs Specialized for Japanese Tasks, Available for Free

Achieves top Japanese benchmark scores across multiple model classes.

Shisa V2

We’re proud to announce Shisa V2, the latest generation of our bilingual Japanese-English language models from Shisa.AI.

Llama 4 Japanese Performance

Last weekend Meta launched Llama 4, starting with two models: Scout - a 17B active parameter, 16 expert (109B total parameter) model, and Maverick - a 17B active parameter…

1 Million Downloads of shisa-gamma-7b-v1!

Exactly one year ago, Sakana AI first published Evolving New Foundation Models: Unleashing the Power of Automating Model Development, which used our shisa-gamma-7b-v1 as…

Tuning for Efficient Inferencing with vLLM on MI300X

Over the past couple of weeks I’ve been testing on an 8 x AMD MI300X node provided by Hot Aisle. I’ll have an article on some of my experiments training with MI300’s…

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

All models have biases and most Instruct/Chat models are aligned for “safety”, with Western moral biases, etc. There’s spirited debate on when and where those lines should…

Copyright and AI Training Data in Japan

Currently, per Japanese copyright law (PDF), re-affirmed as current policy in April 2023 by Keiko Nagaoka, the Japanese Minister of Education, Culture, Sports, Science, and…

Evaling llm-jp-eval (evals are hard)

With training of shisa-v2 starting in earnest, I’ve been digging a bit more into llm-jp-eval, which I used as a quick and simple benchmark to help track shisa-v1…

Sakana AI Evolves Models with shisa-gamma-7b-v1

Sakana AI just published some exciting new work on Evolutionary Model Merges of LLMs, applying evolutionary techniques to discover optimal ways of combining different models.…

Shisa 7B

Shisa 7B was the original Japanese-English bilingual model that kicked everything off. The model card is posted here for posterity.