Poli
A voice tutor that helps digital learners
practice speaking a new language
and connect with a new culture
2020
8 weeks
Poli can understand and respond in mixed speech, allowing learners to fall back on their primary language as they learn new words and grammar.
Context
Over 300 different languages are spoken in the US today. In India, there are over 400. We live in diverse, multicultural societies, and English remains the dominant language for communicating across cultural lines. But as we work with each other, make friends, form deeper connections, fall in love, or want to appreciate people from another culture, we must try to understand the languages they speak.
While existing learning tools help with reading and writing comprehension, they do little to improve speaking fluency. Interviews with learners revealed that they were often embarrassed to speak and struggled to keep up with the pace of native speakers. The lack of practice then leads them to forget what they have learned, or they simply lose interest. They also needed cultural context on how native speakers use the language, which would help them connect better with a new culture.
Poli is a convenient companion that helps reduce the resistance to learning a new language.
Poli employs code-switching
In linguistics, code-switching is a phenomenon where bilingual speakers switch between two or more languages in the same conversation. People code-switch to fit into a particular community or social group. It makes them come across as "one of their own." Politicians, comedians and advertising agencies exploit it heavily to connect with their audiences. People also code-switch when they are learning a new language. Teachers switch to their native language to help students understand new or difficult concepts, build better relationships with them and repeat essential points.
If learners cannot recall a word or phrase in the language they are learning, they replace it with one from the language in which they are comfortable. This habit helps with associative learning and retention. It sometimes becomes the natural way of speaking, leading to localised hybridisation of languages. Since code-switching is entrenched in people's identity, it could be an effective strategy for learning a new language and for providing new cultural contexts for how it is spoken.
Topics are the basis of interacting with Poli
Topics help learners to keep their responses focused and make for meaningful conversations. For example, if you’re learning Hindi, you can ask Poli for a topic, and it can ask you to describe what the weather is like today. As you respond, it can identify switches and give you suggestions of the words you may not recall, or help you with sentence structures.
Topics can be informal everyday conversations, like talking about the weather, or formal dialogues, like addressing different people or discussing work-related matters. As you get better at speaking, Poli uses spaced repetition to increase the complexity of topics steadily.
Following a conversation while learning can be hard, so you can adjust Poli’s pace to make it respond as fast or as slowly as you want.
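A minimal sketch of how spaced repetition over topics might look, assuming a Leitner-style box system. The write-up does not say which scheduling scheme Poli uses, so the TopicCard class, the interval values and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TopicCard:
    name: str
    difficulty: int   # 1 = easiest (assumed scale)
    box: int = 0      # Leitner box index; higher means better known
    due_in: int = 0   # practice sessions until the topic is due again


# Sessions to wait before a topic in each box returns (assumed values).
INTERVALS = [1, 2, 4, 8]


def review(card: TopicCard, succeeded: bool) -> TopicCard:
    """Promote the topic one box on success, send it back to box 0 otherwise."""
    box = min(card.box + 1, len(INTERVALS) - 1) if succeeded else 0
    return TopicCard(card.name, card.difficulty, box, INTERVALS[box])


def next_topic(cards: List[TopicCard], max_difficulty: int) -> Optional[TopicCard]:
    """Pick the hardest topic that is due and within the learner's level."""
    due = [c for c in cards if c.due_in == 0 and c.difficulty <= max_difficulty]
    return max(due, key=lambda c: c.difficulty, default=None)
```

Raising max_difficulty as the learner succeeds is one way the complexity of topics could increase steadily while older topics keep returning at longer intervals.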
How it works
Poli’s prototype was built by experimenting with different natural language processing technologies and neural networks. Google’s Cloud Speech-to-Text model transcribes utterances; an on-device parser then analyses the text to understand the intent of the statement. This includes identifying the language of switched words, the parts of speech they belong to, and any unique entity labels. With this information, Poli can match appropriate topics and synthesise responses using WaveNet speech synthesis. The prototype uses a Raspberry Pi and a hacked PS3 Eye, and is built on Mycroft, an open-source voice assistant platform.
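As an illustration of the language-identification step, here is a minimal sketch that flags switched words in a transcript. It assumes the transcript arrives with Hindi written in Devanagari and English in Latin script, and it labels words purely by Unicode script. The function names are hypothetical, and this is a simplification rather than the parser used in the prototype.

```python
from typing import List, Tuple


def token_language(token: str) -> str:
    """Label a token 'hi' if it contains any Devanagari character, else 'en'."""
    # The Devanagari Unicode block spans U+0900 to U+097F.
    return "hi" if any("\u0900" <= ch <= "\u097f" for ch in token) else "en"


def find_switches(utterance: str, target: str = "hi") -> List[Tuple[int, str]]:
    """Return (position, token) for every word not in the target language."""
    tokens = utterance.split()
    return [(i, tok) for i, tok in enumerate(tokens)
            if token_language(tok) != target]


# A learner practising Hindi falls back on the English word "weather".
print(find_switches("आज weather बहुत अच्छा है"))  # [(1, 'weather')]
```

Script detection breaks down for romanised Hindi or for languages that share a script, where a lexicon or a trained language-identification model would be needed instead.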
New roles and motives
Poli is an attempt to design new roles and motives for voice assistants, beyond making our lives more streamlined and efficient. Language is at the heart of any culture. Unlike visual interfaces, which can be homogenised across countries and cultures, conversational interfaces must come from an understanding of how people use a language. Mixed speech is a very natural way for us to communicate with each other, and I wanted to explore how grasping these nuances can make for more natural interactions with intelligent devices.
Prototyping and Testing
Namaste world! My initial experiments used the Google AIY Voice Kit to programme basic mixed-speech interactions like turning a light on and off. But I wanted to move beyond input-output interactions and see if I could achieve a somewhat flexible conversation.
Giving Poli its name
I used Mycroft Precise, an RNN-based wake-word listener, to recognise the words "Ok Poli."
1. Data Gathering – Recordings of the words “Ok Poli” were gathered from different people. Recordings of rhyming and similar sounding words, random other words and ambient noises were also included to reduce false activations.
2. Training – Data was split into training and test sets, and the model was retrained multiple times as more data was gathered.
3. Testing – Once training was done, the wake word was tested to see if it triggered only when the right words were uttered, against background noise and at varying distances.
4. Result – After about 4-5 rounds of improvements, the final model was integrated with the other modules.
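The split in the training step can be sketched as below. This is a generic, reproducible train/test split over recording file names, not Mycroft Precise's own tooling. The function name, the file names and the 20% test fraction are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_recordings(paths: Sequence[str],
                     test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle recordings reproducibly and return (train, test) lists."""
    shuffled = sorted(paths)               # fixed starting order
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names for ten "Ok Poli" recordings.
wake_word = [f"ok_poli_{i:02d}.wav" for i in range(10)]
train, test = split_recordings(wake_word)
print(len(train), len(test))  # 8 2
```

Splitting wake-word and not-wake-word recordings separately keeps both classes represented in the test set. Because adding recordings changes the shuffle, each new round of data gathering produces a fresh split.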
Crafting conversations
Speech and conversation are so intuitive to us that exchanging complex ideas feels effortless, yet they are very challenging to replicate with a machine. So I held online role-playing exercises to design the interactions with Poli. We enacted sample dialogue prompts while I recorded responses. It was like making a choose-your-own-adventure book. I also used the working prototype as a probe to gauge people’s reactions and get new ideas for topics.