
Yesterday, I ordered food online. However, something went a little wrong, and I contacted support. They called me, and for a moment I thought it was a bot or a recorded voice or something. And I hated it. Then I realized it was a human on the line.

I was planning to build an LLM + TTS + speech recognition pipeline and deploy it on an A311D, to see if I could practice a British accent with it. Now I'm rethinking what I want to do. The way we're going doesn't lead to a good destination. I would hate having to talk to a voice-enabled chatbot as a support agent rather than a human.

And don't get me wrong: voice-enabled chatbots can have tons of good uses. But replacing human support agents with LLMs is not one of them, I don't think.
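A minimal sketch of the voice-assistant loop described above: speech recognition feeds an LLM, whose reply is spoken back via TTS. All three component functions here are stand-in stubs (the phrases and the "RP" coaching reply are invented for illustration); in a real deployment on something like an A311D board, each would wrap an actual model or engine.

```python
def transcribe(audio: bytes) -> str:
    """Stub ASR: pretend the audio decodes to a fixed phrase."""
    return "how do I pronounce 'schedule' in a British accent?"

def generate_reply(prompt: str) -> str:
    """Stub LLM: return a canned accent-coaching reply."""
    return f"You asked: {prompt!r}. In RP it's 'SHED-yool'."

def synthesise(text: str) -> bytes:
    """Stub TTS: 'synthesise' by simply encoding the text as bytes."""
    return text.encode("utf-8")

def voice_turn(audio_in: bytes) -> bytes:
    """One conversational turn: ASR -> LLM -> TTS."""
    transcript = transcribe(audio_in)
    reply = generate_reply(transcript)
    return synthesise(reply)

# Drive one turn with a dummy audio frame.
audio_out = voice_turn(b"\x00\x01")
print(audio_out.decode("utf-8"))
```

The point of the structure is that each stage is swappable: the same `voice_turn` wiring holds whether the stubs are replaced by an on-device ASR model, a local LLM, or a cloud API.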

#LLM #AI #TTS

For the past couple of years, as each new @mozilla #CommonVoice dataset of #voice #data is released, I've been using @observablehq to visualise the #metadata coverage across the 100+ languages in the dataset.

Version 17 was released yesterday (big ups to the team - EM Lewis-Jong, @jessie, Gina Moape, Dmitrij Feller) and there are some super interesting insights from the visualisation:

➡ Catalan (ca) now has more data in Common Voice than English (en) (!)

➡ The language with the highest average audio utterance duration at nearly 7 seconds is Icelandic (is). Perhaps Icelandic words are longer? I suspect so!

➡ Spanish (es), Bangla (Bengali) (bn), Mandarin Chinese (zh-CN) and Japanese (ja) all have a lot of recorded utterances that have not yet been validated. Albanian (sq) has the highest percentage of validated utterances, followed closely by Erzya / Arisa (myv).

➡ Votic (vot) has the highest percentage of invalidated utterances, but with 76% of utterances invalidated, I wonder if this language has been the target of deliberate invalidation activity (invalidating valid sentences, or recording sentences to be deliberately invalid) given the geopolitical instability in Russia currently.
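The per-locale metrics behind insights like these can be sketched in a few lines: validated-utterance percentage and average clip duration, computed from per-language clip counts. The counts below are illustrative placeholders, not real Common Voice figures (those live in the `cv-dataset` repository), and the field layout is an assumption for the sketch.

```python
# Illustrative, made-up per-locale stats: NOT real Common Voice data.
stats = {
    # locale: (validated_clips, invalidated_clips, total_duration_secs)
    "sq":  (9_500, 200, 48_000),
    "myv": (9_000, 400, 47_000),
    "vot": (1_200, 3_800, 22_000),
}

def validated_pct(validated: int, invalidated: int) -> float:
    """Share of reviewed clips that passed validation, as a percentage."""
    reviewed = validated + invalidated
    return 100.0 * validated / reviewed if reviewed else 0.0

def avg_duration(total_secs: int, validated: int, invalidated: int) -> float:
    """Mean clip length in seconds across all reviewed clips."""
    clips = validated + invalidated
    return total_secs / clips if clips else 0.0

# Rank locales by validated percentage, highest first.
for locale, (v, i, secs) in sorted(
        stats.items(),
        key=lambda kv: validated_pct(kv[1][0], kv[1][1]),
        reverse=True):
    print(f"{locale}: {validated_pct(v, i):.1f}% validated, "
          f"{avg_duration(secs, v, i):.1f}s avg clip")
```

With real counts substituted in, the same two functions reproduce the "highest percentage validated" and "highest average utterance duration" rankings discussed above.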

See the visualisation here and let me know your thoughts below!

➡ observablehq.com/@kathyreid/mo

Observable - Mozilla Common Voice v17 dataset metadata coverage: uses "@d3/stacked-horizontal-bar-chart" to visualise the Common Voice metadata coverage, with data taken from the Common Voice `cv-dataset` repository. Includes splits by age range (raw, and scaled to 100% so the metadata coverage of low-resource languages is more visible) and splits by gender.

Last week, as part of my #PhD program at the #ANU School of #cybernetics, I gave my final presentation, which is a summary of my methods and #research findings. I covered my interview work, the #dataset documentation analysis work I've been doing and my analysis work around #accents in @mozilla's #CommonVoice platform.

There were some insightful and thought-provoking questions from my panel and audience members, and of course - so many ideas for future research inquiry!

A huge thanks to my panel, chaired so well by Professor Alexandra Zafiroglu, to Dr Elizabeth Williams, my meticulous, methodical and always-encouraging Primary Supervisor, and to my co-supervisors Dr Jofish Kaye and Dr Paul Wong 黃仲熙 for their deep expertise in #HCI and #data respectively.

Similarly, a huge thank you to my #PhD cohort - Charlotte Bradley, Tom Chan, Danny Bettay and Sam Backwell - as well as the other cohorts in the School - for your encouragement and intellectual journeying.

#Quantiphi is working with #NeMo to build a modular generative AI solution to improve worker productivity. Nvidia also announced four inference GPUs, optimized for a diverse range of emerging LLM and generative AI applications. Each GPU is optimized for specific #AIInference workloads while also featuring specialized software.

#SpeechAI, #supercomputing in the #cloud, and #GPUs for #LLMs and #GenerativeAI among #Nvidia’s next big moves | #AI
venturebeat.com/ai/speech-ai-s

💡 Interesting read on how one of the biggest commercial players out there plans to use Mozilla Open Voice data to make speech AI more inclusive and open to more languages.

💬 Sounds idealistic, but Open Voice datasets are created by unpaid volunteers who donate hours and hours of their speech. Not sure whether I feel comfortable with that, tbh.

💭 Thoughts?

venturebeat.com/ai/nvidia-ente

VentureBeat - Nvidia takes on Meta and Google in the speech AI technology race, by Victor Dey