TTS pronunciation rules

Speech accuracy improvements and Text-To-Speech
Jan 2025

PolyAI relies on Text-to-Speech (TTS) to generate natural, human-like audio for enterprise voice assistants. However, uncommon words, brand names, domain-specific terminology, or structured inputs like addresses and phone numbers were frequently mispronounced.

Engineers and Dialog Designers had no way to fix these issues inside the platform and had to modify backend configuration files instead. I designed a dedicated TTS Pronunciation Rules feature that brings regex-based matching, IPA replacements, and plain-text replacement support, giving experts full control over pronunciation and dramatically improving the usability and speed of their workflow.

Role
Lead product designer (solo designer)
Team
Collaborated with a PM, a full-stack engineer, a QA engineer, the linguistics (Dialog Design, DD) team, and a technical writer
Problem definition
TTS mispronunciations lowered voice quality and created friction during client deployments, especially for enterprise customers with specialised vocabulary.
The internal Dialog Design team had to manually add rules via backend files, which slowed iteration, introduced engineering dependency, and limited their ability to respond quickly to client feedback.
Without an accessible in-product interface, correcting pronunciation required external tools, context switching, and guesswork.
Goals and success metrics  
  • Allow internal linguistics experts to create, edit, and delete TTS pronunciation rules in the UI to reduce engineering dependency and operational overhead.
  • Improve accuracy and naturalness of generated speech across enterprise use cases.
  • Decrease time needed to fix pronunciation errors and improve satisfaction and autonomy of the DD team.
Research and Insights
I interviewed members of the linguistics (Dialog Design) team to understand how they currently corrected pronunciations using IPA (International Phonetic Alphabet), Regex matching, and backend constants. The insights highlighted several core needs.
Experts needed full control. IPA and regex are powerful but require precision.
Workflow was fragmented and slow. Fixes lived outside the product, hidden in backend configuration files.
Future users would be less technical. The system needed to be flexible enough for experts now but easier later.
Technical constraints and Strategy
I collaborated with an engineer on a strategy centered on empowering expert users with flexible control while keeping the system predictable, safe, and scalable. This approach allowed the team to embed pronunciation logic into the product without compromising reliability.
Design exploration
The design focused on a simple TTS Rules object with three fields: Match (Expression), Replacement, and a Case Sensitive toggle. We added inline regex validation, flexible open-text inputs for expert use, and a clean list view showing each rule with examples and edit options. This created a fast, reliable workflow tailored for linguistics experts.
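To make the rule shape concrete, here is a minimal Python sketch of how a rule with those three fields could be validated and applied to text before it reaches TTS. It is an illustration under assumptions, not the platform's actual implementation: the names TTSRule, validate_match, and apply_rules are hypothetical, and the sample rules are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TTSRule:
    """Hypothetical rule shape mirroring the three fields in the design."""
    match: str                    # regular expression to look for in the text
    replacement: str              # plain text or IPA string to substitute
    case_sensitive: bool = False  # the Case Sensitive toggle


def validate_match(pattern: str) -> Optional[str]:
    """Return an error message if the regex does not compile, else None.

    This is the kind of check inline validation could run as the user types.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"Invalid expression: {exc}"
    return None


def apply_rules(text: str, rules: List[TTSRule]) -> str:
    """Apply each rule in list order (the ordering is an assumption)."""
    for rule in rules:
        flags = 0 if rule.case_sensitive else re.IGNORECASE
        text = re.sub(rule.match, rule.replacement, text, flags=flags)
    return text


# Invented sample rules: an acronym respelling and a phone-number split.
rules = [
    TTSRule(r"\bSQL\b", "sequel", case_sensitive=True),
    TTSRule(r"\b(\d{3})-(\d{4})\b", r"\1 \2"),
]

print(validate_match("(unclosed"))  # prints an error message, not None
print(apply_rules("Reset your SQL password or call 555-0199", rules))
```

Rejecting a pattern that fails to compile before the rule is saved is one way to reduce the risk of incorrect rule creation, while the open-text Match and Replacement fields keep full regex and IPA flexibility for experts.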
Final design and Validation
I tested early prototypes with the DD team to verify usability for real use cases, including brand name corrections, phone number formatting, and industry-specific terminology adjustments. The resulting updates helped ensure the interface aligned tightly with expert workflows and reduced the risk of incorrect rule creation.
Design artefacts
All components and states were created using the PolyAI design system and documented for engineering handoff. I defined every interaction state, error message, and edge case to ensure accuracy and scalability in development.
Impact and Outcomes
Initial metrics looked acceptable, yet user feedback after 30 days revealed early signs of low satisfaction. This helped clarify where the experience still fell short and where further improvement was needed.
Next steps...
I analysed the feedback, rated its importance, and aligned priorities with the PM and product goals to guide the next steps toward a smoother, more scalable pronunciation workflow. The planned improvements build on the foundation we released and help move the tool from expert-only to broadly accessible.