English Word Atlas Creation

The English Word Atlas began as a utility for customizing open-source word lists and evolved into a linguistically aware library. Its main output for Synapse was a custom, semantically distributed vocabulary inspired by Swadesh list principles, providing a balanced foundation for the game.

The English Word Atlas is the foundational linguistic dataset that powers Synapse. It started as a practical way to manage open-source word lists and grew into the vocabulary creation tool that produced the game's custom word set.

Evolution of the Toolkit

The project started as a Python utility for managing established word lists such as the General Service List (GSL) and the Academic Word List (AWL), and grew into a library for programmatic vocabulary design. Its core tool, WordlistBuilder, creates word lists with specific properties by:

  • Integrating Diverse Sources: Combining historical lists (Roget's Thesaurus, Swadesh) with modern frequency data
  • Applying Complex Filters: Adding/removing words based on semantic categories, frequency bands, and linguistic attributes
  • Supporting Phrases: Treating multi-word phrases as core elements with full data coverage
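The combination of sources, filters, and phrase support described above can be sketched as a small chainable builder. This is a minimal illustration only: the class name SketchWordlistBuilder, its methods, the Entry fields, and the sample frequencies are all hypothetical and are not taken from the library's actual WordlistBuilder API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One vocabulary item; `text` may be a single word or a multi-word phrase."""
    text: str
    frequency: float       # illustrative per-million figure, not real data
    categories: frozenset  # semantic categories, e.g. {"nature"}
    sources: frozenset     # lists the item came from, e.g. {"swadesh"}


class SketchWordlistBuilder:
    """Hypothetical chainable builder; not the library's real WordlistBuilder."""

    def __init__(self, entries):
        self._pool = {e.text: e for e in entries}
        self._selected = set()

    def add_source(self, source):
        # Add every pooled item that appears in the named source list.
        self._selected |= {t for t, e in self._pool.items() if source in e.sources}
        return self

    def remove_category(self, category):
        # Drop selected items tagged with the given semantic category.
        self._selected = {
            t for t in self._selected if category not in self._pool[t].categories
        }
        return self

    def keep_frequency_band(self, low, high):
        # Keep only selected items whose frequency lies in [low, high].
        self._selected = {
            t for t in self._selected if low <= self._pool[t].frequency <= high
        }
        return self

    def build(self):
        return sorted(self._selected)


def entry(text, frequency, categories, sources):
    return Entry(text, frequency, frozenset(categories), frozenset(sources))


# Made-up sample data; phrases sit in the pool alongside single words.
POOL = [
    entry("water", 300.0, ["nature"], ["swadesh"]),
    entry("fire", 150.0, ["nature"], ["swadesh"]),
    entry("the", 50000.0, ["function"], ["gsl"]),
    entry("analyze", 40.0, ["academic"], ["awl"]),
    entry("ice cream", 12.0, ["food"], ["gsl"]),
]

wordlist = (
    SketchWordlistBuilder(POOL)
    .add_source("swadesh")
    .add_source("gsl")
    .remove_category("function")
    .keep_frequency_band(10, 1000)
    .build()
)
print(wordlist)  # ['fire', 'ice cream', 'water']
```

Each step returns the builder itself, so a list definition reads as a sequence of add and remove operations. The phrase "ice cream" passes through the same filters as the single words.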

Custom Vocabulary for Synapse

The Word Atlas produced a bespoke vocabulary for Synapse with two key properties:

  1. Swadesh-Inspired: Built on conceptually fundamental words, following the Swadesh list's focus on core vocabulary shared across human languages, which keeps the word set accessible and meaningful

  2. Semantically Distributed: Intentionally designed with a linear frequency distribution rather than the natural Zipfian curve, in which a few words dominate. This keeps gameplay balanced and varied
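To make the contrast in the second property concrete, the sketch below compares how much of the total frequency mass the top-ranked words hold under an idealized Zipfian curve (frequency proportional to 1/rank) and under a linearly decreasing one. The vocabulary size of 1000, the function names, and the idealized curves are assumptions chosen for illustration; they do not describe the actual Synapse vocabulary or how it was built.

```python
def zipf_weights(n):
    """Idealized Zipfian frequencies: weight proportional to 1 / rank."""
    return [1 / rank for rank in range(1, n + 1)]


def linear_weights(n):
    """Linearly decreasing frequencies: rank 1 gets n, rank n gets 1."""
    return [n - rank + 1 for rank in range(1, n + 1)]


def top_share(weights, k):
    """Fraction of the total weight held by the k highest-ranked items."""
    return sum(weights[:k]) / sum(weights)


N = 1000  # assumed vocabulary size, for illustration only
zipf_top = top_share(zipf_weights(N), 10)
linear_top = top_share(linear_weights(N), 10)

print(f"Top 10 of {N} words, Zipfian share: {zipf_top:.1%}")
print(f"Top 10 of {N} words, linear share:  {linear_top:.1%}")
```

Under these assumptions the ten most frequent words account for roughly 39% of all occurrences on the Zipfian curve but only about 2% on the linear one, which is the kind of flattening the design aims for.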

The English Word Atlas traces a complete arc: from a functional need, to a vocabulary creation tool, to the bespoke vocabulary that makes Synapse possible.

Repository: github.com/neumanns-workshop/english-word-atlas