mastodon.world is one of the many independent Mastodon servers you can use to participate in the fediverse.
Generic Mastodon server for anyone to use.

#speechAI

pinage404.rss :nixos:<p>As of today, my computer can __nicely__ read aloud for me!</p><p>I'm lazy and I read slowly, so I don't like reading and I skip a lot of articles.</p><p>I have been looking for a solution for several months.</p><p><a href="https://mamot.fr/tags/Accessibility" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Accessibility</span></a> <a href="https://mamot.fr/tags/A11y" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>A11y</span></a> <a href="https://mamot.fr/tags/Orca" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Orca</span></a> <a href="https://mamot.fr/tags/WebBrowser" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WebBrowser</span></a> <a href="https://mamot.fr/tags/ZenBrowser" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ZenBrowser</span></a> <a href="https://mamot.fr/tags/Firefox" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Firefox</span></a> <a href="https://mamot.fr/tags/Piper" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Piper</span></a> <a href="https://mamot.fr/tags/Pied" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Pied</span></a> <a href="https://mamot.fr/tags/SpeechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechAI</span></a> <a href="https://mamot.fr/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mamot.fr/tags/Nix" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nix</span></a> <a href="https://mamot.fr/tags/NixOS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NixOS</span></a></p>
Farooq | فاروق<p>Yesterday, I ordered food online. However, the order went a little wrong, so I contacted support. They called me and, for a moment, I thought it was a bot or a recorded voice or something. And I hated it. Then I realized it was a human on the line.</p><p>I was planning to build an LLM + TTS + speech recognition setup and deploy it on an A311D, to see if I could practice a British accent with it. Now I'm rethinking what I want to do. The way we are going doesn't lead to a good destination. I would hate it if I had to talk to a voice-enabled chatbot as a support agent rather than a human.</p><p>And don't get me wrong. Voice-enabled chatbots can have tons of good uses. But replacing humans with LLMs is not a good one, I think.</p><p><a href="https://cr8r.gg/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> <a href="https://cr8r.gg/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://cr8r.gg/tags/TTS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TTS</span></a> <a href="https://cr8r.gg/tags/ASR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ASR</span></a> <a href="https://cr8r.gg/tags/speechrecognition" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechrecognition</span></a> <a href="https://cr8r.gg/tags/speechai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechai</span></a> <a href="https://cr8r.gg/tags/ML" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ML</span></a> <a href="https://cr8r.gg/tags/MachineLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MachineLearning</span></a> <a href="https://cr8r.gg/tags/chatbot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>chatbot</span></a> <a href="https://cr8r.gg/tags/chatbots" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>chatbots</span></a> <a href="https://cr8r.gg/tags/artificialintelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>artificialintelligence</span></a></p>
Dirk Schnelle-Walka<p>SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems. Multi-modal LLM system simulates human communication using speech and generates human-like dialogues with consistent content, rhythm, &amp; emotion.</p><p>Funnily, they also elaborate on a "think before you speak" design aspect. This might also be applicable to our everyday lives. </p><p>doi: 10.48550/arXiv.2401.03945 <br><a href="https://mastodontech.de/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> <a href="https://mastodontech.de/tags/multimodal" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>multimodal</span></a> <a href="https://mastodontech.de/tags/speechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechAI</span></a> <a href="https://mastodontech.de/tags/multiagent" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>multiagent</span></a> <a href="https://mastodontech.de/tags/conversationalai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>conversationalai</span></a></p>
Winbuzzer<p>Amazon’s New Nova Sonic Voice Model Targets Voice AI Rivals With Real-Time Expressive Output</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/VoiceAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VoiceAI</span></a> <a href="https://mastodon.social/tags/NovaSonic" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NovaSonic</span></a> <a href="https://mastodon.social/tags/AmazonAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AmazonAI</span></a> <a href="https://mastodon.social/tags/AlexaPlus" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AlexaPlus</span></a> <a href="https://mastodon.social/tags/AIModel" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIModel</span></a> <a href="https://mastodon.social/tags/SpeechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechAI</span></a> <a href="https://mastodon.social/tags/RealTimeVoice" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RealTimeVoice</span></a> <a href="https://mastodon.social/tags/BedrockAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>BedrockAI</span></a> <a href="https://mastodon.social/tags/AIassistant" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIassistant</span></a></p><p><a href="https://winbuzzer.com/2025/04/08/amazons-new-nova-sonic-voice-model-targets-voice-ai-rivals-with-real-time-expressive-output-xcxwbn/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">winbuzzer.com/2025/04/08/amazo</span><span class="invisible">ns-new-nova-sonic-voice-model-targets-voice-ai-rivals-with-real-time-expressive-output-xcxwbn/</span></a></p>
Winbuzzer<p>ChatGPT’s Advanced Voice Mode Expands to Web and Improves Conversational Flow</p><p><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/ChatGPT" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ChatGPT</span></a> <a href="https://mastodon.social/tags/VoiceAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VoiceAI</span></a> <a href="https://mastodon.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.social/tags/AIAssistants" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIAssistants</span></a> <a href="https://mastodon.social/tags/Chatbots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Chatbots</span></a> <a href="https://mastodon.social/tags/SpeechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechAI</span></a> <a href="https://mastodon.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenAI</span></a> </p><p><a href="https://winbuzzer.com/2025/03/25/chatgpts-advanced-voice-mode-expands-to-web-with-real-time-conversations-xcxwbn/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">winbuzzer.com/2025/03/25/chatg</span><span class="invisible">pts-advanced-voice-mode-expands-to-web-with-real-time-conversations-xcxwbn/</span></a></p>
Kathy Reid<p>For the past couple of years, as each new <span class="h-card" translate="no"><a href="https://mozilla.social/@mozilla" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>mozilla</span></a></span> <a href="https://aus.social/tags/CommonVoice" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CommonVoice</span></a> dataset of <a href="https://aus.social/tags/voice" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>voice</span></a> <a href="https://aus.social/tags/data" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>data</span></a> is released, I've been using <span class="h-card" translate="no"><a href="https://vis.social/@observablehq" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>observablehq</span></a></span> to visualise the <a href="https://aus.social/tags/metadata" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>metadata</span></a> coverage across the 100+ languages in the dataset. </p><p>Version 17 was released yesterday (big ups to the team - EM Lewis-Jong, <span class="h-card" translate="no"><a href="https://mastodon.social/@jessie" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>jessie</span></a></span>, Gina Moape, Dmitrij Feller) and there's some super interesting insights from the visualisation: </p><p>➡ Catalan (ca) now has more data in Common Voice than English (en) (!)</p><p>➡ The language with the highest average audio utterance duration at nearly 7 seconds is Icelandic (is). Perhaps Icelandic words are longer? I suspect so!</p><p>➡ Spanish (es), Bangla (Bengali) (bn), Mandarin Chinese (zh-CN) and Japanese (ja) all have a lot of recorded utterances that have not yet been validated. 
Albanian (sq) has the highest percentage of validated utterances, followed closely by Erzya / Arisa (myv).</p><p>➡ Votic (vot) has the highest percentage of invalidated utterances, but with 76% of utterances invalidated, I wonder if this language has been the target of deliberate invalidation activity (invalidating valid sentences, or recording sentences to be deliberately invalid) given the geopolitical instability in Russia currently. </p><p>See the visualisation here and let me know your thoughts below!</p><p>➡ <a href="https://observablehq.com/@kathyreid/mozilla-common-voice-v17-dataset-metadata-coverage" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">observablehq.com/@kathyreid/mo</span><span class="invisible">zilla-common-voice-v17-dataset-metadata-coverage</span></a></p><p><a href="https://aus.social/tags/linguistics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>linguistics</span></a> <a href="https://aus.social/tags/languages" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>languages</span></a> <a href="https://aus.social/tags/data" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>data</span></a> <a href="https://aus.social/tags/VoiceAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VoiceAI</span></a> <a href="https://aus.social/tags/VoiceData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VoiceData</span></a> <a href="https://aus.social/tags/SpeechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechAI</span></a> <a href="https://aus.social/tags/SpeechData" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechData</span></a> <a href="https://aus.social/tags/DataViz" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataViz</span></a></p>
Kathy Reid<p>Last week, as part of my <a href="https://aus.social/tags/PhD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PhD</span></a> program at the <a href="https://aus.social/tags/ANU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ANU</span></a> School of <a href="https://aus.social/tags/cybernetics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cybernetics</span></a>, I gave my final presentation, which is a summary of my methods and <a href="https://aus.social/tags/research" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>research</span></a> findings. I covered my interview work, the <a href="https://aus.social/tags/dataset" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dataset</span></a> documentation analysis work I've been doing and my analysis work around <a href="https://aus.social/tags/accents" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>accents</span></a> in <span class="h-card" translate="no"><a href="https://mozilla.social/@mozilla" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>mozilla</span></a></span>'s <a href="https://aus.social/tags/CommonVoice" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CommonVoice</span></a> platform. </p><p>There were some insightful and thought-provoking questions from my panel and audience members, and of course - so many ideas for future research inquiry! 
</p><p>A huge thanks to my panel, chaired so well by Professor Alexandra Zafiroglu, to Dr Elizabeth Williams, my meticulous, methodical and always-encouraging Primary Supervisor, and to my co-supervisors Dr Jofish Kaye and Dr Paul Wong 黃仲熙 for their deep expertise in <a href="https://aus.social/tags/HCI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HCI</span></a> and <a href="https://aus.social/tags/data" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>data</span></a> respectively. </p><p>Similarly, a huge thank you to my <a href="https://aus.social/tags/PhD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PhD</span></a> cohort - Charlotte Bradley, Tom Chan, Danny Bettay and Sam Backwell - as well as the other cohorts in the School - for your encouragement and intellectual journeying. </p><p><a href="https://aus.social/tags/PhD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PhD</span></a> <a href="https://aus.social/tags/PhDlife" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PhDlife</span></a> <a href="https://aus.social/tags/cybernetics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cybernetics</span></a> <a href="https://aus.social/tags/milestone" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>milestone</span></a> <a href="https://aus.social/tags/ANU" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ANU</span></a> <a href="https://aus.social/tags/voiceAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>voiceAI</span></a> <a href="https://aus.social/tags/speechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechAI</span></a> <a href="https://aus.social/tags/ASR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ASR</span></a> <a href="https://aus.social/tags/SpeechRecognition" class="mention hashtag" 
rel="nofollow noopener" target="_blank">#<span>SpeechRecognition</span></a></p>
Norobiik @Norobiik@noc.social<p><a href="https://noc.social/tags/Quantiphi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Quantiphi</span></a> is working with <a href="https://noc.social/tags/NeMo" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NeMo</span></a> to build a modular generative AI solution to improve worker productivity. Nvidia also announced four inference GPUs, optimized for a diverse range of emerging LLM and generative AI applications. Each GPU is designed to be optimized for specific <a href="https://noc.social/tags/AIInference" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIInference</span></a> workloads while also featuring specialized software.</p><p><a href="https://noc.social/tags/SpeechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechAI</span></a>, <a href="https://noc.social/tags/supercomputing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>supercomputing</span></a> in the <a href="https://noc.social/tags/cloud" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cloud</span></a>, and <a href="https://noc.social/tags/GPUs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPUs</span></a> for <a href="https://noc.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> and <a href="https://noc.social/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> among <a href="https://noc.social/tags/Nvidia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nvidia</span></a>’s next big moves | <a href="https://noc.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <br><a href="https://venturebeat.com/ai/speech-ai-supercomputing-cloud-gpus-llms-generative-ai-nvidia-next-big-moves/" rel="nofollow noopener" 
target="_blank"><span class="invisible">https://</span><span class="ellipsis">venturebeat.com/ai/speech-ai-s</span><span class="invisible">upercomputing-cloud-gpus-llms-generative-ai-nvidia-next-big-moves/</span></a></p>
Matt Coler<p>Hello Mastodon! Here's my belated <a href="https://fediscience.org/tags/introduction" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>introduction</span></a>. I am an Associate Professor of Language &amp; Technology and the Director of the MSc Voice Technology at the University of Groningen. <a href="https://fediscience.org/tags/OpenAcess" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAcess</span></a> Ambassador.</p><p>Interests: <a href="https://fediscience.org/tags/SpeechTechnology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpeechTechnology</span></a>, <a href="https://fediscience.org/tags/VoiceTechnology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VoiceTechnology</span></a>, voice <a href="https://fediscience.org/tags/synthesis" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>synthesis</span></a>, <a href="https://fediscience.org/tags/speech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speech</span></a> <a href="https://fediscience.org/tags/recognition" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>recognition</span></a> <a href="https://fediscience.org/tags/ASR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ASR</span></a>, <a href="https://fediscience.org/tags/speechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechAI</span></a>, <a href="https://fediscience.org/tags/multisensory" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>multisensory</span></a> perception, <a href="https://fediscience.org/tags/audition" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>audition</span></a>, <a href="https://fediscience.org/tags/soundscapes" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>soundscapes</span></a>, <a 
href="https://fediscience.org/tags/SituatedCognition" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SituatedCognition</span></a>, music, Andean languages, <a href="https://fediscience.org/tags/Aymara" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Aymara</span></a>, <a href="https://fediscience.org/tags/Frisian" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Frisian</span></a></p>
Maaike<p>💡 Interesting read on how one of the biggest commercial players out there plans to use Mozilla Common Voice data to make speech AI more inclusive and open to more languages.</p><p>💬 Sounds idealistic, but Common Voice datasets are created by unpaid volunteers who donate hours and hours of their speech. Not sure whether I feel comfortable with that, tbh.</p><p>💭 Thoughts?</p><p><a href="https://venturebeat.com/ai/nvidia-enters-the-speech-ai-race-joining-meta-and-google/" rel="nofollow noopener" target="_blank"><span class="invisible">https://</span><span class="ellipsis">venturebeat.com/ai/nvidia-ente</span><span class="invisible">rs-the-speech-ai-race-joining-meta-and-google/</span></a></p><p><a href="https://mastodon.design/tags/ethicalAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ethicalAI</span></a> <a href="https://mastodon.design/tags/transparentai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>transparentai</span></a> <a href="https://mastodon.design/tags/voice" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>voice</span></a> <a href="https://mastodon.design/tags/speech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speech</span></a> <a href="https://mastodon.design/tags/voicetech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>voicetech</span></a> <a href="https://mastodon.design/tags/speechtech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechtech</span></a> <a href="https://mastodon.design/tags/speechAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>speechAI</span></a></p>