Overview
- Large-scale tests showed DeepSeek-R1 degraded code quality or halted generation when prompts referenced topics such as Uighurs, Falun Gong, Taiwan, Tibet or Tiananmen.
- CrowdStrike reported the model refused to generate code about Falun Gong in roughly 45% of trials and sometimes aborted after preparing an answer.
- Observed flaws included hard-coded credentials, missing session management and authentication, weak hashing, and storing passwords in plaintext.
- The study used 6,050 prompts per model with each task repeated five times to verify reproducibility and documented consistent security failures.
- Researchers hypothesize a censorship filter or training side effect as the cause and warn that similar value-based fine-tuning in other commercial models could induce comparable risks.