The English Word Atlas is the foundational linguistic dataset that powers Synapse. It began as a practical solution for managing open-source word lists and evolved into a vocabulary creation tool that produced a custom word set tailored for the game.
Evolution of the Toolkit
The project started as a Python utility for managing various word lists (GSL, AWL, etc.) and evolved into a library for programmatic vocabulary design. The core WordlistBuilder tool creates word lists with specific properties by:
- Integrating Diverse Sources: Combining historical lists (Roget's Thesaurus, Swadesh) with modern frequency data
- Applying Complex Filters: Adding/removing words based on semantic categories, frequency bands, and linguistic attributes
- Supporting Phrases: Treating multi-word phrases as core elements with full data coverage
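The three steps above can be illustrated with a small sketch. It is not the library's actual WordlistBuilder API: the class `SketchBuilder`, the `Entry` record, the method names, the sample frequency scores, and the category labels are all hypothetical, chosen only to show how combining sources, filtering, and phrase support might fit together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one word or multi-word phrase."""
    text: str
    frequency: float              # illustrative frequency score, not real data
    categories: frozenset = frozenset()


class SketchBuilder:
    """Hypothetical stand-in for illustration; not the real WordlistBuilder."""

    def __init__(self, entries):
        # Index every entry by its text so phrases and single words
        # are looked up in exactly the same way.
        self._index = {e.text: e for e in entries}
        self._selected = set()

    def add_source(self, words):
        # Merge a source list, keeping only items that have data coverage.
        self._selected |= {w for w in words if w in self._index}
        return self

    def remove_category(self, category):
        # Drop every selected item tagged with the given category.
        self._selected = {
            w for w in self._selected
            if category not in self._index[w].categories
        }
        return self

    def keep_frequency_band(self, low, high):
        # Keep only items whose frequency score lies in [low, high].
        self._selected = {
            w for w in self._selected
            if low <= self._index[w].frequency <= high
        }
        return self

    def build(self):
        return sorted(self._selected)


entries = [
    Entry("water", 5.2, frozenset({"nature"})),
    Entry("fire", 4.9, frozenset({"nature"})),
    Entry("the", 7.7, frozenset({"function"})),
    Entry("give up", 4.1, frozenset({"action"})),   # multi-word phrase
    Entry("analyze", 3.8, frozenset({"academic"})),
]

historical_list = ["water", "fire", "the"]           # stand-in for a historical source
modern_list = ["analyze", "give up", "unlisted"]     # "unlisted" has no data coverage

result = (
    SketchBuilder(entries)
    .add_source(historical_list)
    .add_source(modern_list)
    .remove_category("function")
    .keep_frequency_band(4.0, 6.0)
    .build()
)
print(result)  # ['fire', 'give up', 'water']
```

Chaining the calls keeps each design decision (which sources, which filters) visible as one line, which is one plausible way to make a vocabulary reproducible.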
Custom Vocabulary for Synapse
The Word Atlas produced a bespoke vocabulary for Synapse with two key properties:
- Swadesh-Inspired: Built on conceptually fundamental words, drawing on the Swadesh list's focus on universal core vocabulary for accessibility and meaning
- Semantically Distributed: Intentionally designed with a linear frequency distribution, moving away from the natural Zipfian curve where a few words dominate. This ensures balanced and varied gameplay
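As a rough illustration of the second property, the sketch below shows one possible way to flatten a Zipf-like frequency profile: split the log-frequency range into equal-width bands and take the same number of words from each. This is an assumption about how such balancing could be done, not a description of the method the Word Atlas actually uses; the function name `sample_evenly_by_frequency` and the frequency counts are made up for the example.

```python
import math


def sample_evenly_by_frequency(freqs, n_bands, per_band):
    """Pick up to per_band words from each equal-width log-frequency band."""
    logs = {word: math.log10(count) for word, count in freqs.items()}
    lo, hi = min(logs.values()), max(logs.values())
    # Fall back to a width of 1.0 if every word has the same frequency.
    width = (hi - lo) / n_bands or 1.0

    bands = [[] for _ in range(n_bands)]
    for word, value in logs.items():
        # Clamp so the most frequent word lands in the last band.
        idx = min(int((value - lo) / width), n_bands - 1)
        bands[idx].append(word)

    chosen = []
    for band in bands:
        # Most frequent first, alphabetical tie-break, for determinism.
        band.sort(key=lambda w: (-freqs[w], w))
        chosen.extend(band[:per_band])
    return chosen


# Made-up counts with a steep, Zipf-like drop-off.
freqs = {
    "the": 1_000_000,
    "of": 500_000,
    "water": 30_000,
    "fire": 20_000,
    "stone": 2_000,
    "bark": 200,
    "louse": 10,
}

chosen = sample_evenly_by_frequency(freqs, n_bands=5, per_band=1)
print(chosen)
```

Because each band contributes equally, rare and common words are represented in the same proportion, instead of the handful of very frequent words dominating the set.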
The English Word Atlas represents a complete journey: from functional need to vocabulary creation tool, finally producing the bespoke vocabulary that makes Synapse possible.
Repository: github.com/neumanns-workshop/english-word-atlas