{"id":160,"date":"2006-07-17T08:27:57","date_gmt":"2006-07-17T15:27:57","guid":{"rendered":"http:\/\/www.red-sweater.com\/blog\/160\/dont-speak-twice-its-all-right"},"modified":"2006-07-17T14:57:18","modified_gmt":"2006-07-17T21:57:18","slug":"dont-speak-twice-its-all-right","status":"publish","type":"post","link":"https:\/\/redsweater.com\/blog\/160\/dont-speak-twice-its-all-right","title":{"rendered":"Don&#8217;t Speak Twice, It&#8217;s All Right"},"content":{"rendered":"<style type=\"text\/css\"><!-- .caption { border-style:dashed; border-width:1px; border-color:#BBBBBB; margin-left:20px; padding:10px;}--><\/style>\n<p>Andy Lee sent me a bunch of excellent feedback about FlexTime, and let me know about a strange, 100% reproducible crashing bug.  If you configure FlexTime such that both the ending cue of one activity and the starting cue of the one that follows are &#8220;Speak Text&#8221; cues, then the application crashes.\n<\/p>\n<p>\nFirst thought: damn, I&#8217;m glad I put a beta out. Second thought: good lord, what have I done!?\n<\/p>\n<p>\nUnfortunately, the bug is not in my code. I was able to reproduce the problem quite easily with the simplest of command line tools:\n<\/p>\n<div class=\"caption\">\n<pre>\n#import &lt;Carbon\/Carbon.h&gt;<br \/>\nmain()\n{\n\tStr255 string1 = &quot;\\pHello&quot;;\n\tStr255 string2 = &quot;\\pHello Again&quot;;<br \/>\n\tSpeakString(string1);\n\tSpeakString(string2);\n}\n<\/pre>\n<\/div>\n<p>\nYou may not have realized that it was quite so simple to accomplish spoken text on a Mac. 
Unfortunately, the simplicity is deceptive, since compiling and running the above tool results in a nasty crash:\n<\/p>\n<div class=\"caption\">\n<pre>\nProgram received signal EXC_BAD_ACCESS, Could not access memory.\nReason: KERN_PROTECTION_FAILURE at address: 0x00000000\n0x0020da94 in MTBEAudioUnitSoundOutput::BufferComplete ()\n(gdb) bt\n#0  0x0020da94 in MTBEAudioUnitSoundOutput::BufferComplete ()\n#1  0x7006e9bc in AUScheduledSoundPlayerEntry ()\n#2  0x700097c8 in DefaultOutputAUEntry ()\n#3  0x700049d0 in DefaultOutputAUEntry ()\n#4  0x700da1a8 in dyld_stub__keymgr_get_and_lock_processwide_ptr ()\n#5  0x90bd9d24 in CallComponent ()\n#6  0x942647a0 in AudioUnitUninitialize ()\n#7  0x94178368 in AudioUnitNodeInfo::Uninitialize ()\n#8  0x94178300 in AudioUnitGraph::Uninitialize ()\n#9  0x941786a4 in AudioUnitGraph::Dispose ()\n#10 0x941785f0 in DisposeAUGraph ()\n#11 0x0020e0a8 in MTBEAudioUnitSoundOutput::~MTBEAudioUnitSoundOutput ()\n#12 0x0020a78c in SpeechChannelManager::~SpeechChannelManager ()\n#13 0x002215f4 in SECloseSpeechChannel ()\n#14 0x91997974 in KillSpeechChannel ()\n#15 0x91995088 in KillPrivateChannels ()\n#16 0x919969d8 in SpeakString ()\n#17 0x00002cec in main ()\n<\/pre>\n<\/div>\n<p>\nSpeakString has been around for a long time. Long before Mac OS X and long before CoreAudio, where the crash appears to be happening. I would guess it didn&#8217;t <em>use<\/em> to crash, but when it was ported to Mac OS X, something got overlooked and now it leads to whammy land.\n<\/p>\n<p>\nOK, so how do I work around the problem? It is clearly related to attempting to speak text while some text is already being spoken. 
Maybe if I could coddle the Speech Manager a little bit, I could prevent it from crashing.\n<\/p>\n<p>\nFrom the <a href=\"http:\/\/developer.apple.com\/documentation\/Carbon\/Reference\/Speech_Synthesis_Manager\/Reference\/reference.html#\/\/apple_ref\/c\/func\/SpeakString\">Speech Synthesis Manager Reference<\/a> documentation for SpeakString, we see that the behavior for overlapping speech is (supposed to be) very well defined:\n<\/p>\n<p><div class=\"caption\">\n&#8220;If SpeakString is called while a prior string is still being spoken, the sound currently being synthesized is interrupted immediately. Conversion of the new text into speech is then begun. If you pass a zero-length string (or, in C, a null pointer) to SpeakString, the Speech Synthesis Manager stops any speech previously being synthesized by SpeakString without generating additional speech. If your application uses SpeakString, it is often a good idea to stop any speech in progress whenever your application receives a suspend event. Calling SpeakString with a zero-length string has no effect on speech channels other than the one managed internally by the Speech Synthesis Manager for the SpeakString function.)&#8221;\n<\/div>\n<\/p>\n<p>\nTranslation: what we&#8217;re doing is supposed to work. But maybe by overdoing it we can achieve the desired goal. If Mac OS X falls down on the &#8220;interrupting immediately&#8221; behavior, perhaps we can manually stop any previous sound to help it keep its bearings. According to the documentation, calling &#8220;SpeakString(NULL)&#8221; should effectively cancel playback. Unfortunately, injecting it into my simple crash case changes nothing. Worse, when I add it to my live application, I observe a new failure path. 
The text &#8220;pure virtual method called&#8221; is printed to the console, with the following backtrace:\n<\/p>\n<div class=\"caption\">\n<pre>\n(gdb) x\/s $r4\n0x52c1ad8 &lt;_zn24mtmbraisedsinecrossfader7scoeffse+5916&gt;:<br \/>\t \"pure virtual method called\\n\"\n(gdb) bt\n#0  0x90014ac0 in write ()\n#1  0x052b2814 in MTFEClone::VisitCommand ()\n#2  0x05287604 in MTBEWorker::WorkLoop ()\n#3  0x052866f4 in MTBEWorkerStartMPTask ()\n#4  0x90bc1900 in PrivateMPEntryPoint ()\n#5  0x9002bc28 in _pthread_body ()\n<\/pre>\n<\/div>\n<p>\nWell, this can is getting wormier and wormier. It is starting to look like I won&#8217;t be able to take advantage of the ease and simplicity of SpeakString. Ten years ago, sure. But in 2006 SpeakString es muy sucky. It&#8217;s probably time to start looking at the more advanced speech API, where I&#8217;m responsible for managing my own speech channels. With responsibility also comes (we hope) the ability to save ourselves from certain doom.\n<\/p>\n<p>\nBut let&#8217;s say I just <em>need<\/em> to stick with SpeakString, because I have a demo in 5 minutes, or users are just screaming bloody murder about this bug. There is a <em>crude workaround<\/em> that takes all the asynchronous fun out of speech, but also prevents the crash. 
By explicitly waiting for the Speech Manager to be done with any previous speech, I can prevent it from maiming itself:\n<\/p>\n<div class=\"caption\">\n<pre>\n#import &lt;Carbon\/Carbon.h&gt;<br \/>\nmain()\n{\n        Str255 string1 = &quot;\\pHello&quot;;\n        Str255 string2 = &quot;\\pHello Again&quot;;<br \/>\n        while (SpeechBusy()) { ; }\n        SpeakString(string1);\n        while (SpeechBusy()) { ; }\n        SpeakString(string2);\n        while (SpeechBusy()) { ; }\n}\n<\/pre>\n<\/div>\n<p>\nThis also &#8220;works&#8221; in FlexTime, for some definition of &#8220;working.&#8221; But it can cause hideous stalls in the playback UI, since I&#8217;m blocking there for an indeterminate length of time. Passable in a beta release, but not acceptable for a finished product.\n<\/p>\n<p>\nSigh. I&#8217;m going to have to do real work. But you don&#8217;t have to. <a href=\"http:\/\/www.red-sweater.com\/blog\/downloads\/RSSafeSpeaker.zip\">RSSafeSpeaker<\/a> is a simple singleton class designed to make worry-free overlapping speech easy for the Cocoa programmer. Instead of trying to manage a number of open speech channels, this class takes the approach that it&#8217;s &#8220;good enough&#8221; to just allocate and deallocate a channel for every speech made. Obviously for some purposes this will not be suitable, and you&#8217;ll want to manage a pool of open channels. For the &#8220;everyday, get this done easily&#8221; use though, I hope you&#8217;ll find this class handy. Rewriting our previous example using RSSafeSpeaker:\n<\/p>\n<div class=\"caption\">\n<pre>\nNSString* string1 = @&quot;Hello&quot;;\nNSString* string2 = @&quot;Hello Again&quot;;\n<br \/>\n[[RSSafeSpeaker sharedInstance] speakString:string1];\n[[RSSafeSpeaker sharedInstance] speakString:string2];\n<\/pre>\n<\/div>\n<p>\nNo crashes! And I get to use NSString. Everything is better. 
This is a good example of a situation where the shortcomings of Apple&#8217;s API caused me grief and made me go to a lot of extra work. But it&#8217;s also an example of a situation where the extra work won&#8217;t be for naught. It&#8217;s a <em>good idea<\/em> for me to use the &#8220;deeper&#8221; speech APIs, because it&#8217;s inevitable that I&#8217;ll want to have finer control over the playback effects in my application. It was just a lot easier to choose &#8220;SpeakString&#8221; as the quickest solution. If anything else persnickety comes up, I&#8217;ll be in an excellent position to respond quickly and effectively. All in all, time well spent!\n<\/p>\n<p>\nOh, and in case anybody was worried, I <em>did<\/em> report the crashing bug to Apple (<a href=\"rdar:\/\/problem\/4633582\">rdar:\/\/problem\/4633582<\/a>).<\/p>\n<p>\n<strong>Update:<\/strong> Oh man, don&#8217;t I feel like a dork!  I somehow missed the presence of NSSpeechSynthesizer altogether.  Thanks to Jim Correia for pointing it out to me via email. It <em>does<\/em> seem to work, and doesn&#8217;t crash.  Of course, now that I&#8217;ve got my own infrastructure in place, I might as well keep using it, since it will ultimately give me more control over the playback options.  But NSSpeechSynthesizer does seem a better choice for most purposes.<\/p>\n<p>It looks like each NSSpeechSynthesizer corresponds to a &#8220;speech channel,&#8221; so if you actually want to overlap voices (instead of just causing the previous speech to be canceled), you&#8217;d need to allocate multiple speech synths (similarly to how my RSSafeSpeaker allocates a speech channel for each request).<\/p>\n<p>Thanks again to Jim for sharing this! I am embarrassed to have overlooked it&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Andy Lee sent me a bunch of excellent feedback about FlexTime, and let me know about a strange, 100% reproducible crashing bug. 
If you configure FlexTime such that both the ending cue of one activity and the starting cue of the one that follows are &#8220;Speak Text&#8221; cues, then the application crashes. First thought: damn [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14,31,11,15],"tags":[],"class_list":["post-160","post","type-post","status-publish","format-standard","hentry","category-apple","category-carbon","category-cocoa","category-programming"],"_links":{"self":[{"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/comments?post=160"}],"version-history":[{"count":0,"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/posts\/160\/revisions"}],"wp:attachment":[{"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/media?parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/categories?post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/redsweater.com\/blog\/wp-json\/wp\/v2\/tags?post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}