FMUSER Wireless — Transmit Video and Audio More Easily!
When we use tools like Skype and QQ to hold smooth voice and video chats with friends, have we ever wondered what powerful technologies lie behind them? This article gives a brief introduction to the technologies used in network voice calls; it offers only a glimpse of a much larger field.
1. Conceptual model
Internet voice calls are usually two-way, so the model is symmetrical. For simplicity, we can discuss the channel in one direction only: one party speaks and the other hears the voice. It seems simple and immediate, but the process behind it is quite complicated.
This is the most basic model, consisting of five important links: acquisition, encoding, transmission, decoding, and playback.
(1) Voice collection
Voice collection refers to capturing audio data from a microphone, that is, converting analog sound into a digital signal. It involves several important parameters: the sampling frequency, the sample size in bits, and the number of channels.
To put it simply: the sampling frequency is the number of samples taken per second; the sample size is the number of bits recorded for each sample.
The size of an audio frame in bytes equals: (sampling frequency × sample size in bits × number of channels × duration in seconds) / 8
Usually the duration of a frame is 10 ms, that is, every 10 ms of data constitutes one audio frame. Assuming a sampling rate of 16 kHz, 16-bit samples, and one channel, the size of a 10 ms audio frame is (16000 × 16 × 1 × 0.01) / 8 = 320 bytes. In the formula, 0.01 is the duration in seconds, i.e., 10 ms.
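The frame-size formula above can be written as a small helper function (the function name is ours, chosen for illustration):

```python
def frame_bytes(sample_rate_hz, bits_per_sample, channels, frame_ms):
    """Size in bytes of one raw PCM audio frame."""
    samples = sample_rate_hz * frame_ms / 1000   # samples captured per frame
    bits = samples * bits_per_sample * channels  # raw bits per frame
    return int(bits / 8)                         # convert bits to bytes

# 16 kHz, 16-bit, mono, 10 ms frame -> 320 bytes, matching the example above.
print(frame_bytes(16000, 16, 1, 10))  # 320
```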
(2) Coding
Suppose we sent the captured audio frames directly, without encoding; we can then calculate the bandwidth required. Continuing the example above: 320 × 100 = 32,000 bytes/s, or 256 kb/s. That is a lot of bandwidth. With a network traffic monitor we can observe that when IM software such as QQ makes a voice call, the traffic is only 3–5 KB/s, an order of magnitude less than the raw stream. This is mainly thanks to audio coding technology, so in a real voice-call application the encoding step is indispensable. Commonly used speech codecs include G.729, iLBC, AAC, Speex, and so on.
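As a quick sanity check on the arithmetic above (the codec bitrate below is illustrative, not a measured value for any particular codec):

```python
raw_bps = 16000 * 16 * 1          # uncompressed PCM: 256,000 bits/s
raw_bytes_per_s = raw_bps // 8    # 32,000 bytes/s = 320 bytes * 100 frames/s
codec_bps = 24000                 # illustrative bitrate for a wideband codec
ratio = raw_bps / codec_bps       # compression factor the codec must deliver
print(raw_bps, raw_bytes_per_s, round(ratio, 1))
```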
(3) Network transmission
Once an audio frame has been encoded, it can be sent to the other party over the network. For real-time applications such as voice conversation, low latency and stability are critical, which requires the network path to be very smooth.
(4) Decoding
When the other party receives an encoded frame, it decodes it back into data that the sound card can play directly.
(5) Voice playback
After decoding, the resulting audio frames can be submitted to the sound card for playback. (For an example, see the MPlayer voice playback component and its demo source code.)
2. Difficulties and solutions in practical applications
If the techniques above were all it took to build a voice dialogue system usable on the wide-area network, there would be little need for this article. In reality, many practical factors pose challenges for the conceptual model above, which makes implementing a network voice system far from simple and involves many specialized techniques. Fortunately, most of these challenges already have mature solutions. First, let us define what a "good" voice dialogue system should achieve. I think it should satisfy the following points:
(1) Low latency. Only with low latency do the two parties experience a real sense of real-time conversation. This mainly depends on network speed and the physical distance between the two parties; from a pure software perspective, there is little room for optimization.
(2) Low background noise.
(3) The sound is smooth, without stutters or pauses.
(4) No echo.
Below we discuss, one by one, the additional technologies used in a real network voice dialogue system.
1. Echo cancellation (AEC) Almost everyone is now used to relying on the PC's or laptop's built-in speakers during voice chat. Few realize that this little habit poses a big challenge for voice technology: when the speaker is used, the sound it plays is picked up again by the microphone and transmitted back, so the other party hears their own echo. In practice, therefore, echo cancellation is necessary. The gap between capturing an audio frame and encoding it is where the echo cancellation module does its work. In principle, the module uses the audio frames that were just played to subtract an estimate of the echo from the captured frame. The process is quite complicated, and it also depends on the size of the room you are in and your position within it, because these determine the paths along which sound waves are reflected. A good echo cancellation module dynamically adjusts its internal parameters to best adapt to the current environment.
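A classic building block for this kind of cancellation is an adaptive filter such as NLMS (normalized least mean squares), which learns the echo path and subtracts the predicted echo from the microphone signal. This is a toy sketch of the idea, not the algorithm used by any particular product; all names and parameters are illustrative:

```python
import numpy as np

def nlms_echo_cancel(far, mic, taps=16, mu=0.5, eps=1e-8):
    """Toy NLMS adaptive filter: estimate the echo of `far` (the signal
    sent to the speaker) inside `mic` (the captured signal) and subtract it."""
    w = np.zeros(taps)            # adaptive weights: estimate of the echo path
    buf = np.zeros(taps)          # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        echo_hat = w @ buf        # predicted echo sample
        e = mic[n] - echo_hat     # residual = mic minus predicted echo
        out[n] = e
        # normalized weight update drives the residual toward zero
        w += mu * e * buf / (buf @ buf + eps)
    return out

# Demo: the "echo" is the far-end signal delayed by 5 samples and attenuated.
rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
mic = 0.6 * np.concatenate([np.zeros(5), far[:-5]])
cleaned = nlms_echo_cancel(far, mic)
```

After the filter converges, the residual energy is a small fraction of the original echo energy; a production canceller additionally handles double-talk, nonlinearity, and changing room acoustics.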
2. Noise suppression (DENOISE) Noise suppression, also known as noise reduction, uses the characteristics of speech to identify the background-noise component of an audio frame and filter it out. Many encoders have this feature built in.
3. JitterBuffer The jitter buffer solves the problem of network jitter: network delay varies, growing larger and smaller over time. Even if the sender sends packets at a fixed interval (for example, one packet every 100 ms), the receiver will not receive them with the same timing; sometimes no packet arrives within an interval, and sometimes several arrive at once. Without correction, the sound the receiver hears stutters. The jitter buffer sits after the decoder and before playback: once a frame is decoded, it is placed in the JitterBuffer, and when the sound card's playback callback fires, the oldest frame is taken from the buffer and played. The buffer depth depends on the degree of network jitter: the greater the jitter, the deeper the buffer and the greater the playback delay. The JitterBuffer thus trades a little extra latency for smooth playback, because a slightly larger delay with smooth sound gives a better subjective experience than stuttering audio. Of course, the buffer depth is not constant; it is adjusted dynamically as the degree of jitter changes. When the network becomes smooth again, the depth shrinks until the extra playback delay introduced by the JitterBuffer is negligible.
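The core mechanism can be sketched in a few lines: packets arrive in any order and are reordered by sequence number, playout does not start until a few frames have accumulated, and an empty buffer means the playout tick gets silence. This is a minimal illustration (class and method names are ours), without the dynamic depth adjustment described above:

```python
import heapq

class JitterBuffer:
    """Minimal sketch: reorder packets by sequence number and hand them
    out at the sound card's steady playout pace."""
    def __init__(self, depth=3):
        self.depth = depth        # frames to accumulate before playout starts
        self.heap = []            # (seq, frame) pairs, ordered by seq
        self.started = False

    def push(self, seq, frame):   # called when a packet arrives (any order)
        heapq.heappush(self.heap, (seq, frame))

    def pop(self):                # called by the playout tick
        if not self.started:
            if len(self.heap) < self.depth:
                return None       # still filling: play silence
            self.started = True
        if not self.heap:
            return None           # underrun: play silence
        return heapq.heappop(self.heap)[1]
```

A real implementation would also adapt `depth` to measured jitter, drop frames that arrive too late, and conceal lost frames instead of playing silence.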
4. Silence detection (VAD) In a voice conversation, if one party is not speaking, no traffic needs to be generated; silence detection (voice activity detection) serves this purpose, and it is usually integrated into the encoding module. Combined with the noise suppression algorithm described earlier, the detector can determine whether there is currently voice input; if not, the encoder can output a special frame (for example, one of length 0). In a multi-party conference in particular, usually only one person is speaking at a time, so the bandwidth saved by silence detection is considerable.
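The simplest form of the idea is an energy threshold: a frame whose RMS level stays below the threshold is treated as silence and need not be transmitted. Real VADs are adaptive and far more robust; this sketch (with an arbitrary threshold) only shows the principle:

```python
def is_speech(frame, threshold=500.0):
    """Toy energy-based VAD: a frame of 16-bit samples counts as 'speech'
    when its RMS energy exceeds a fixed threshold (illustrative value)."""
    rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
    return rms > threshold

silence = [10, -12, 8, -9] * 40            # low-level background noise
voiced = [4000, -3500, 3800, -3900] * 40   # loud, speech-like samples
print(is_speech(silence), is_speech(voiced))  # False True
```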
5. Mixing algorithm In a multi-party voice chat we need to play voice data from several people at the same time, but the sound card plays from a single buffer, so the multiple voices must be mixed into one; this is the job of the mixing algorithm. Even if you found a way to bypass mixing and play several streams simultaneously, echo cancellation would still require them to be mixed into a single playback stream; otherwise the canceller could remove at most one of the several sounds. Mixing can be done on the client or on the server (the latter saves downstream bandwidth); if P2P channels are used, it can only be done on the client. When mixing on the client, it is usually the last step before playback.
This article is a rough summary of our experience implementing the voice part of OMCS. We have given only a brief description of each link described above; any one of them could fill a long paper or even a book. This article is merely an introductory map, with a few leads, for those new to network voice system development.
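The mixing step described above can be sketched very simply: sum the streams sample by sample and clamp the result to the 16-bit PCM range so overflow does not wrap around (real mixers often scale or soft-limit instead of hard-clipping; the function name is ours):

```python
def mix_frames(frames):
    """Toy mixer: sum several 16-bit PCM frames sample-by-sample and
    clamp to the 16-bit range to avoid wrap-around distortion."""
    mixed = []
    for samples in zip(*frames):
        s = sum(samples)
        mixed.append(max(-32768, min(32767, s)))  # hard clipping
    return mixed

a = [1000, -2000, 30000]
b = [500, -500, 10000]
print(mix_frames([a, b]))  # [1500, -2500, 32767]
```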