Alright, rather than waiting for a fix I did it myself...
All you need to do is add an extra condition that checks whether the page is the client (if it isn't, it's the agent) to the following line:
if(!this.wt){window.focus();}
So it becomes:
if((!this.wt) && (window.location.href.indexOf('client') >= 0)){window.focus();}
Now the agents have to rely on the sound alone and check their chat windows, but they can actually work efficiently. That's better than losing focus in the middle of typing. I'll also post this in Tips & Tricks.
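For anyone who wants to try the condition on its own before patching the script, here is a rough sketch of the same check pulled out into a function. The name shouldFocus and the example URLs are made up for illustration and are not part of the original script. It assumes, as the change above does, that client URLs contain 'client' and agent URLs do not.

```javascript
// Hypothetical helper (not in the original script): returns true only when
// the wt flag is falsy AND the page URL contains the substring 'client'.
function shouldFocus(wt, href) {
  return !wt && href.indexOf('client') >= 0;
}

// In the browser, the patched line would then read roughly:
// if (shouldFocus(this.wt, window.location.href)) { window.focus(); }

// Example URLs below are invented for illustration only.
console.log(shouldFocus(false, 'http://example.com/client.php')); // client page, wt not set
console.log(shouldFocus(false, 'http://example.com/agent.php'));  // agent page
```

Note that indexOf matches 'client' anywhere in the URL, including the query string, so if an agent URL happens to contain that word the window will still grab focus.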