Voice Chat System

Overview

The Multiverse Voice System provides a highly scalable architecture for managing voice traffic for virtual worlds.

The Multiverse voice chat system supports:

  • Positional voice for large groups of users.
  • Non-positional voice, ideal for raid groups.
  • Non-positional "broadcast" for presentations where a small number of speakers can be heard by a large number of listeners.

The Multiverse Voice Server is a server plug-in that can handle hundreds of users per server CPU.

The Multiverse voice system uses the Speex codec to encode speech. Speex supports a wide range of quality levels and corresponding bit rates, and has excellent voice activity detection, so voice data flows to the server only while the user is actually speaking.

The speaker's client is entirely responsible for generating the encoded form of the speech, and any client can decode voice data from any other client, regardless of the quality level.

Both client and server voice chat features are layered, with the lowest layers being very general, and higher levels making more specific implementation choices. In most cases, you will want to use the high-level interfaces, but if you need more customization, the low-level interfaces are available.

The Client voice manager

The Sampleworld asset repository includes a voice chat configuration dialog box:

[Image: Config dialog.jpg, the voice chat configuration dialog box]

You can use this dialog box "as is" to enable basic voice chat configuration, or customize it for your world.

With the default Sampleworld key bindings, users display the voice chat dialog box by pressing Ctrl-V. They then use the controls in the dialog box to:

  • Enable or disable voice chat.
  • Select the microphone and playback devices.
  • Set input and playback volume levels.
  • Set "push-to-talk," which requires the user to hold down a specific key to talk. This is useful for eliminating background noise.
  • Turn on "test mode" to perform loopback voice testing of volume levels.

See Configuring Voice Chat for more details.

Client voice API

At the lowest level, the voice client provides client API functions to control the underlying Voice Manager. These functions include:

  • Starting, stopping, and on-the-fly reconfiguration of the VoiceManager.
  • Getting the list of microphone and playback devices and getting and setting their volumes.
  • Blacklisting speakers and determining if a speaker is blacklisted.
  • Turning push-to-talk on and off.
  • Getting a list of recent speakers.
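
The blacklist and recent-speaker functions listed above can be pictured with a small sketch. This is not the Multiverse client API: the class `VoiceClientState`, its method names, and the `max_recent` parameter are all hypothetical, chosen only to illustrate the kind of bookkeeping such functions imply.

```python
from collections import deque
from typing import Deque, List, Set


class VoiceClientState:
    """Hypothetical client-side state; NOT the real Voice Manager API."""

    def __init__(self, max_recent: int = 5) -> None:
        self.push_to_talk: bool = False
        self._blacklist: Set[int] = set()
        # Oldest speaker on the left, newest on the right.
        self._recent: Deque[int] = deque(maxlen=max_recent)

    def blacklist(self, speaker_oid: int) -> None:
        self._blacklist.add(speaker_oid)

    def is_blacklisted(self, speaker_oid: int) -> bool:
        return speaker_oid in self._blacklist

    def on_voice_frame(self, speaker_oid: int) -> bool:
        """Record the speaker; return True if the frame should be played."""
        if self.is_blacklisted(speaker_oid):
            return False
        # Move a repeat speaker to the newest position.
        if speaker_oid in self._recent:
            self._recent.remove(speaker_oid)
        self._recent.append(speaker_oid)
        return True

    def recent_speakers(self) -> List[int]:
        """Return speaker OIDs, most recent first."""
        return list(reversed(self._recent))
```

In this sketch a blacklisted speaker is dropped before it reaches the recent-speakers list; whether the real client behaves that way is an assumption.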

The Voice Manager accepts a large number of parameters that control how speech is encoded and decoded. In most cases, you can use the default values. You can reconfigure the Voice Manager on-the-fly with minimal interruption in voice traffic.

Connecting to the Voice Server Plug-in

Each client using voice chat maintains a connection directly to a voice server plug-in instead of going through the proxy server; this minimizes delays. The client starts by sending an Extension Message to the proxy asking for the IP address and port number of the voice plug-in that will service that client. The client then connects to that voice plug-in, sending the OID of the user and the OID of the voice group the client wishes to join.
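
The two-step connection sequence can be sketched as follows. The message field names (`type`, `player_oid`, `group_oid`, `host`, `port`), the `VoiceEndpoint` type, and the stand-in `fake_proxy` function are all assumptions made for illustration; the actual Extension Message format and wire protocol are not described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Message = Dict[str, Any]


@dataclass(frozen=True)
class VoiceEndpoint:
    """Address of the voice plug-in assigned to a client (hypothetical type)."""
    host: str
    port: int


def locate_voice_plugin(send_to_proxy: Callable[[Message], Message],
                        player_oid: int) -> VoiceEndpoint:
    """Step 1: ask the proxy which voice plug-in will service this client."""
    reply = send_to_proxy({"type": "voice_endpoint_request",
                           "player_oid": player_oid})
    return VoiceEndpoint(host=str(reply["host"]), port=int(reply["port"]))


def build_join_message(player_oid: int, group_oid: int) -> Message:
    """Step 2: the first message sent directly to the voice plug-in."""
    return {"type": "voice_join",
            "player_oid": player_oid,
            "group_oid": group_oid}


def fake_proxy(request: Message) -> Message:
    """Stand-in for the proxy server; the address below is made up."""
    assert request["type"] == "voice_endpoint_request"
    return {"host": "127.0.0.1", "port": 5051}


endpoint = locate_voice_plugin(fake_proxy, player_oid=1001)
join = build_join_message(player_oid=1001, group_oid=2002)
```

After step 1 the proxy is no longer on the voice path: the client would open its own connection to `endpoint` and send `join` there.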

The Multiverse voice system performs mixing on the client, not the server, because most of the time only a small number of voice group members are speaking (typically just one), and because mixing on the server would adversely affect scalability. So if several members of the group are speaking (in the case of positional voice, within range of the user), several voice streams are sent to the client.
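
To illustrate what mixing on the client involves, here is a minimal sketch that combines several already-decoded streams. It assumes each stream has been decoded to a frame of signed 16-bit PCM samples, and it uses plain summation with clamping; the function name `mix_frames` is hypothetical and the actual Multiverse client may mix differently.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix decoded 16-bit PCM frames by summing samples and clamping."""
    if not frames:
        return []
    # Frames may differ in length; shorter ones contribute silence at the end.
    length = max(len(frame) for frame in frames)
    mixed = [0] * length
    for frame in frames:
        for i, sample in enumerate(frame):
            mixed[i] += sample
    # Clamp to the 16-bit range so loud overlaps saturate instead of wrapping.
    return [max(INT16_MIN, min(INT16_MAX, s)) for s in mixed]
```

With typically one active speaker, the loop runs over a single frame, which is why doing this per client costs little compared with mixing a separate stream for every listener on the server.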

Server voice API

You can customize the voice server plug-in by extending one of the voice group classes. You create voice groups as required and convey their group OIDs to clients so that the clients can join them. You must also perform access control so that only the clients that are supposed to be in a group are allowed to join it.

At the lowest level, the Multiverse voice server supports a very general interface called VoiceGroup, giving you wide discretion in mapping speakers to listeners. The BasicVoiceGroup class is a basic implementation of the interface. Two subclasses of BasicVoiceGroup provide functionality sufficient for most worlds:

  • NonpositionalVoiceGroup, for a non-positional voice group, in which sound volume is independent of the position of the users. At any instant all users hear the same collection of speakers. Use non-positional voice groups for presentations, raid groups, guild chat, and so on.
  • PositionalVoiceGroup, for a positional voice group, in which group membership for a given user is based on the other users nearby, and sound volume falls off with distance. For positional voice groups, hearing is not necessarily symmetric, in that it's possible for a speaker you can't hear to hear you. For example, use a positional voice group for chatting to nearby players on a dance floor.
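
The positional behavior described above can be sketched as follows. This is not the PositionalVoiceGroup implementation: the linear falloff, the `hearing_radius` and `max_voices` parameters, and the function names are assumptions. The cap on voices per listener is one plausible way for hearing to become asymmetric; the real class may use a different rule.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]


def gain(distance: float, hearing_radius: float) -> float:
    """Linear falloff: 1.0 at distance 0, 0.0 at the radius or beyond."""
    if distance >= hearing_radius:
        return 0.0
    return 1.0 - distance / hearing_radius


def audible_speakers(listener: str,
                     positions: Dict[str, Position],
                     hearing_radius: float,
                     max_voices: int) -> List[Tuple[str, float]]:
    """Return (speaker, gain) for the nearest speakers within range."""
    lx, ly = positions[listener]
    candidates = []
    for other, (x, y) in positions.items():
        if other == listener:
            continue
        d = math.hypot(x - lx, y - ly)
        if d < hearing_radius:
            candidates.append((d, other))
    candidates.sort()  # nearest first
    return [(name, gain(d, hearing_radius))
            for d, name in candidates[:max_voices]]


# Three users on a line; each listener hears only its single nearest speaker.
positions = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (6.0, 0.0)}
heard_by_b = audible_speakers("B", positions, hearing_radius=8.0, max_voices=1)
heard_by_c = audible_speakers("C", positions, hearing_radius=8.0, max_voices=1)
```

Here C hears B, but B hears only A, so B cannot hear a speaker who can hear B. In a non-positional group the distance test and the falloff would both be skipped, and every member would receive the same set of speakers at the same volume.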