Algorithmic Censorship Resistance Toolkit

About

The Algorithmic Censorship Resistance Toolkit is a collection of tactics that obfuscate and encode text to make it unreadable by machines. The tools are inspired by both existing online practices and new approaches developed through research.

The Algorithmic Censorship Resistance Toolkit was designed and built by Qianqian Ye, in collaboration with Xiaowei Wang, as part of The Future of Memory, a project commissioned through the Mozilla Creative Media Award.

Research

Online, words are a form of expression. From hashtags to political dissent, words have the power to build new worlds and take down old ones. At the same time, language has become a form of data: it is used to train machine learning systems for profit, and it has become an arena for automated censorship and moderation.

Automated censorship has led to a surge of creativity as netizens scramble to “fool the machine”, using everything from homophones to images and newly invented characters that bypass OCR (optical character recognition).

Algorithmic Censorship by Social Platforms: Power and Resistance.

Jennifer Cobbe, 2019.

(Can’t) Picture This: An Analysis of Image Filtering on WeChat Moments.

Jeffrey Knockel, Lotus Ruan, Masashi Crete-Nishihata, and Ron Deibert, 2018.

Resisting the Censorship Infrastructure in China.

Yubo Kou, Yong Ming Kow, and Xinning Gui, 2017.

The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects.

Jeffrey Knockel, Masashi Crete-Nishihata, and Lotus Ruan, 2018.

Tranßcripting: playful subversion with Chinese characters.

Li Wei, Zhu Hua, 2018.

Contribute

GitHub

An ongoing collection of algorithmic censorship resistance tactics

Hanzi Maker

Creates new characters by combining components from the Chinese radical system. Based on characters invented during the 2014 Hong Kong protests, as documented in the paper “Tranßcripting: playful subversion with Chinese characters”.

Made by: The Future of Memory team
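An invented character has no Unicode code point, so it cannot be typed as ordinary text. One plain-text way to describe such a character is an Ideographic Description Sequence (IDS): an operator from Unicode's Ideographic Description Characters block (starting at U+2FF0) followed by its components. The sketch below assumes that approach. It is not how the Hanzi Maker itself works (the tool composes characters visually), and the `describe` helper and layout names are hypothetical.

```python
# Hypothetical helper: describe a (possibly unencoded) character as a
# Unicode Ideographic Description Sequence (IDS).

# Binary IDS operators: each combines exactly two components.
BINARY_OPERATORS = {
    "left_right": "\u2FF0",     # ⿰ left to right
    "above_below": "\u2FF1",    # ⿱ above to below
    "full_surround": "\u2FF4",  # ⿴ full surround
}


def describe(layout: str, first: str, second: str) -> str:
    """Return an IDS that places two components in the given layout."""
    try:
        operator = BINARY_OPERATORS[layout]
    except KeyError:
        raise ValueError(f"unsupported layout: {layout!r}") from None
    return operator + first + second


# ⿰木木 describes the existing character 林 ("forest").
print(describe("left_right", "木", "木"))

# A component may itself be an IDS, so layouts nest.
print(describe("above_below", "艹", describe("left_right", "木", "木")))
```

Because components can be nested sequences, a handful of binary operators is enough to describe most left/right and top/bottom compositions; ternary layouts such as ⿲ are left out of this sketch.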

Chinese Homophone

In Chinese, a tonal language, different characters and words can sound alike or nearly alike. For example, 河蟹 (pinyin: héxiè, “river crab”) sounds similar to 和谐 (pinyin: héxié, “harmony”), and is used to mock the CCP’s promotion of a “harmonious society”. This tool replaces censored words with their homophones; readers infer the intended meaning from context.

Made by: The Future of Memory team
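The substitution step can be sketched as a dictionary lookup. This is a minimal sketch assuming a hand-built table; the tool's actual word list is not described here, so the only entry is the example above, and `HOMOPHONES` and `to_homophones` are hypothetical names.

```python
import re

# Hypothetical lookup table (censored word -> homophone). Only the example
# from the description above is included; a real tool would need many more.
HOMOPHONES = {
    "和谐": "河蟹",  # héxié "harmony" -> héxiè "river crab"
}


def to_homophones(text: str, table: dict[str, str] = HOMOPHONES) -> str:
    """Replace each listed word in text with its homophone in a single pass."""
    if not table:
        return text
    # Longest words first, so an entry that contains a shorter one wins.
    pattern = re.compile(
        "|".join(re.escape(word) for word in sorted(table, key=len, reverse=True))
    )
    return pattern.sub(lambda match: table[match.group(0)], text)


print(to_homophones("构建和谐社会"))  # 构建河蟹社会
```

A single regular-expression pass is used instead of repeated `str.replace` calls so that a substituted homophone is never itself substituted again by a later table entry. Requires Python 3.9+ for the `dict[str, str]` annotation.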

Morse Code Generator

In recent years, netizens in China have used Morse code to post sensitive information on WeChat and Weibo.

Made by: Burak Özdemir
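How the generator works internally is not described here. As a minimal sketch, assuming International Morse code for Latin letters and digits only, encoding is a per-character table lookup. Chinese text would first have to be romanized or converted to numeric codes, which this sketch does not attempt; the function names are hypothetical.

```python
import string

# International Morse code for A-Z and 0-9, listed in alphabetical/numeric order.
_LETTER_CODES = (
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
    "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --.."
).split()
_DIGIT_CODES = "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----.".split()
MORSE = dict(zip(string.ascii_uppercase + string.digits, _LETTER_CODES + _DIGIT_CODES))
DECODE = {code: char for char, code in MORSE.items()}


def to_morse(text: str) -> str:
    """Encode text: letters separated by spaces, words by ' / '."""
    encoded_words = []
    for word in text.upper().split():
        missing = [char for char in word if char not in MORSE]
        if missing:
            raise ValueError(f"no Morse code for {missing[0]!r}")
        encoded_words.append(" ".join(MORSE[char] for char in word))
    return " / ".join(encoded_words)


def from_morse(signal: str) -> str:
    """Decode the format produced by to_morse()."""
    return " ".join(
        "".join(DECODE[code] for code in word.split())
        for word in signal.split(" / ")
    )


print(to_morse("SOS"))                     # ... --- ...
print(from_morse(to_morse("meet at 9")))   # MEET AT 9
```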

Mojibake Generator

Mojibake, which in Japanese means “character changing”, is garbled text produced when bytes are decoded using a different character encoding than the one they were encoded with.

Made by: James Stanley
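The effect is easy to reproduce with Python's built-in codecs: encode text with one codec, then decode the bytes with another. This is a sketch of the phenomenon itself, not of how the generator above is built; `garble` and `repair` are hypothetical helpers.

```python
def garble(text: str, written_as: str = "utf-8", read_as: str = "latin-1") -> str:
    """Encode text with one codec and decode the bytes with another."""
    # errors="replace" keeps the demo from crashing on bytes the reading
    # codec cannot map (latin-1 never fails, but codecs such as gbk can).
    return text.encode(written_as).decode(read_as, errors="replace")


def repair(garbled: str, written_as: str = "utf-8", read_as: str = "latin-1") -> str:
    """Undo garble() by running the two steps in reverse."""
    return garbled.encode(read_as).decode(written_as)


print(garble("café"))                         # cafÃ©
# Two U+FFFD replacement characters in UTF-8, read back as GBK:
print(garble("\ufffd\ufffd", read_as="gbk"))  # 锟斤拷
print(repair(garble("审查")))                 # 审查
```

The UTF-8/Latin-1 pair is reversible because Latin-1 maps every byte to exactly one character; pairs that substitute replacement characters, as `errors="replace"` can, lose information and cannot be repaired this way.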

Text to Emoji Converter

Used throughout the world, emoji is an efficient and expressive way to convey all manner of content, from politically sensitive topics to whole novels.

Made by: Emoji Translate Team

Text to Hex Converter

Converts plain text (ASCII) into hexadecimal. Students have also stored messages on the Ethereum blockchain in hex to avoid censorship and takedown.

Made by: Browserling
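Text-to-hex conversion writes each byte of the encoded text as two hexadecimal digits. A minimal sketch using the standard library, assuming UTF-8 (which produces the same bytes as ASCII for plain ASCII text, and also covers Chinese characters); the function names are hypothetical.

```python
def to_hex(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes, then write each byte as two hex digits."""
    return text.encode(encoding).hex()


def from_hex(hex_string: str, encoding: str = "utf-8") -> str:
    """Reverse to_hex(); an optional '0x' prefix is ignored."""
    return bytes.fromhex(hex_string.removeprefix("0x")).decode(encoding)


print(to_hex("Hi"))        # 4869
print(to_hex("你"))        # e4bda0
print(from_hex("0x4869"))  # Hi
```

The `0x` prefix is accepted on decoding because Ethereum tooling usually displays hex data that way. `str.removeprefix` requires Python 3.9+.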

Martian Language Generator

Martian Language (火星文; pinyin: huǒxīngwén) is popular among bloggers and combines many strategies to bypass censorship, including replacing characters with homophones, SMS slang, and swapping in characters with similar radicals.

Made by: Aies

Braille Translator

Translating text into Braille is another strategy Chinese netizens have used to avoid having posts taken down. Braille can be read with a refreshable Braille display.

Made by: BrailleTranslator.org
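Braille text can be posted as ordinary Unicode: each cell in the Braille Patterns block is U+2800 plus a bitmask in which dot n sets bit n−1. The sketch below assumes uncontracted (Grade 1) English Braille letters only. Chinese Braille works differently and is not covered, and the names are hypothetical.

```python
# Raised dots for each letter in uncontracted (Grade 1) English Braille.
LETTER_DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15",
    "f": "124", "g": "1245", "h": "125", "i": "24", "j": "245",
    "k": "13", "l": "123", "m": "134", "n": "1345", "o": "135",
    "p": "1234", "q": "12345", "r": "1235", "s": "234", "t": "2345",
    "u": "136", "v": "1236", "w": "2456", "x": "1346", "y": "13456",
    "z": "1356",
}

BLANK_CELL = 0x2800  # U+2800 BRAILLE PATTERN BLANK


def cell(dots: str) -> str:
    """Map dot numbers (1-8) to a Unicode Braille Patterns character."""
    mask = 0
    for dot in dots:
        mask |= 1 << (int(dot) - 1)  # dot n sets bit n-1
    return chr(BLANK_CELL + mask)


def to_braille(text: str) -> str:
    """Transliterate letters and spaces; anything else is rejected."""
    cells = []
    for char in text.lower():
        if char == " ":
            cells.append(chr(BLANK_CELL))
        elif char in LETTER_DOTS:
            cells.append(cell(LETTER_DOTS[char]))
        else:
            raise ValueError(f"no Braille cell for {char!r} in this sketch")
    return "".join(cells)


print(to_braille("hi"))  # ⠓⠊
```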