3. Capture
Capture consists of two parts: video capture and audio capture. Video is captured by the camera, which involves operating the camera and setting its parameters. Because cameras differ between phone manufacturers, there are quite a few pitfalls here; these will be covered in the article dedicated to the camera. Audio is captured through the microphone. Microphones on different phones support different sampling rates, and in some cases the captured audio also needs echo cancellation.
Key points of video capture (a short sketch of these checks follows the list):
Check whether the camera can be used;
The image delivered by the camera sensor is landscape-oriented, so it must be rotated before it is displayed;
The camera offers a set of capture sizes to choose from; when the captured size does not match the phone's screen size, extra handling is required;
The Android camera goes through a series of states, and each camera operation must be performed in the correct state;
Many camera parameters have compatibility issues across devices, and these need to be handled carefully.
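As a rough illustration of these checks, here is a minimal sketch using the legacy android.hardware.Camera API (not our exact implementation; the CAMERA permission is assumed to be granted, and the 1280-wide target size is just an example):

import android.hardware.Camera;
import java.util.List;

public class CameraCapture {
    private Camera mCamera;

    // Returns true if a camera could be opened and configured.
    public boolean open() {
        if (Camera.getNumberOfCameras() == 0) return false;    // no camera available
        try {
            mCamera = Camera.open(0);
        } catch (RuntimeException e) {
            return false;                                       // camera busy or inaccessible
        }
        Camera.Parameters params = mCamera.getParameters();
        // Pick a supported preview size close to the target; supported sizes differ per device.
        List<Camera.Size> sizes = params.getSupportedPreviewSizes();
        Camera.Size best = sizes.get(0);
        for (Camera.Size s : sizes) {
            if (Math.abs(s.width - 1280) < Math.abs(best.width - 1280)) best = s;
        }
        params.setPreviewSize(best.width, best.height);
        mCamera.setParameters(params);
        // The sensor image is landscape; rotate the preview 90° for a portrait UI.
        mCamera.setDisplayOrientation(90);
        return true;
    }
}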
Key points of audio capture (a short sketch follows the list):
Check whether the microphone can be used;
Check whether the phone supports the chosen audio sampling rate;
In some cases, echo cancellation needs to be applied to the audio;
Set a correct buffer size during audio capture.
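A minimal sketch of the sampling-rate check and buffer sizing with AudioRecord (the RECORD_AUDIO permission is assumed, and 16-bit mono is just an example configuration):

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class AudioCapture {
    public AudioRecord create(int sampleRate) {
        // getMinBufferSize() returns ERROR_BAD_VALUE if this rate/format combination is unsupported.
        int minSize = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        if (minSize == AudioRecord.ERROR_BAD_VALUE || minSize == AudioRecord.ERROR) {
            return null;                          // this sampling rate is not supported
        }
        // Use a buffer somewhat larger than the minimum to avoid overruns.
        AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minSize * 2);
        // A state other than STATE_INITIALIZED means the microphone cannot be used.
        // If echo cancellation is needed and AcousticEchoCanceler.isAvailable() is true,
        // it can be attached via AcousticEchoCanceler.create(record.getAudioSessionId()).
        return record.getState() == AudioRecord.STATE_INITIALIZED ? record : null;
    }
}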
Note: A later article will cover capture in detail.
4. Processing
Video processing
Beauty filters are now almost standard in mobile live-streaming apps: after beautification the host looks better and attracts more fans. Some Android live-streaming apps can also recognize the host's face and add fun animated effects, and sometimes we also need to add a watermark to the video.
In fact, both beautification and special effects are processed through OpenGL. Android provides GLSurfaceView, which is similar to SurfaceView but renders through a Renderer. A texture can be generated with OpenGL, a SurfaceTexture can be created from that texture id and handed to the Camera, and the camera preview is thereby connected to OpenGL through the texture, so that a whole series of operations can be performed in OpenGL, as sketched below.
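A minimal sketch of that camera-to-OpenGL connection (assumed to run on the GL thread of a GLSurfaceView.Renderer; error handling is omitted):

import android.graphics.SurfaceTexture;
import android.hardware.Camera;
import android.opengl.GLES11Ext;
import android.opengl.GLES20;
import java.io.IOException;

public class CameraGlBridge {
    public SurfaceTexture attach(Camera camera) throws IOException {
        // 1. Generate an OpenGL texture of the external OES type used for camera frames.
        int[] tex = new int[1];
        GLES20.glGenTextures(1, tex, 0);
        GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, tex[0]);
        GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES,
                GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
        GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES,
                GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
        // 2. Wrap the texture id in a SurfaceTexture and hand it to the camera.
        SurfaceTexture surfaceTexture = new SurfaceTexture(tex[0]);
        camera.setPreviewTexture(surfaceTexture);
        camera.startPreview();
        // 3. In onDrawFrame(), call surfaceTexture.updateTexImage() and draw with the texture.
        return surfaceTexture;
    }
}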
The whole beautification process boils down to taking the texture the Camera previews into, generating a new texture from it with OpenGL's FBO (framebuffer object) technique, and then drawing that new texture in the Renderer's onDrawFrame(). Adding a watermark means first converting an image into a texture and then drawing it with OpenGL. Adding animated sticker effects is more complicated: the current preview frame must first be analyzed to locate the relevant parts of the face, and the corresponding images are then drawn over each part. Implementing the whole pipeline is somewhat difficult.
The figure below is a flowchart of the entire beauty process:
(Figure: beauty process flowchart)
(Figure: examples of the beauty filter, animated effects, and watermark)
Note: A later article will cover OpenGL and the implementation of this whole process.
Audio processing
In some cases the host needs to mix in extra sounds to liven up the broadcast, such as applause. One way to handle this is simply to play the extra sound so the microphone picks it up and records it along with everything else, but that approach fails when the host wears headphones or when echo cancellation has to be applied. Since we have not added this feature to our project, we have no experience to share for now; we may add it later and share it then.
5. Encoding
Through the camera and microphone we can collect the corresponding video and audio, but this is raw data in a fixed format: the camera delivers it frame by frame, and the microphone delivers PCM audio. Sending this data directly would waste a great deal of bandwidth, so video and audio are usually encoded before being sent.
Video encoding
1. Predictive coding
As we all know, an image is made up of pixels. Extensive statistics show that there is strong correlation between pixels within the same image: the shorter the distance between two pixels, the stronger the correlation, that is, the closer their values tend to be. This correlation between pixels can therefore be exploited for compression, a method called intra-frame predictive coding. Moreover, the correlation between adjacent frames is generally even stronger than the correlation between pixels within a frame, and the achievable compression ratio is higher. In short, by exploiting intra-frame correlation between pixels and inter-frame correlation between frames, that is, by finding a reference pixel or reference frame to use as the predicted value, video compression coding can be achieved.
2. Transform coding
Statistics show that most of the energy in a video signal is concentrated in the DC and low-frequency components, which correspond to the flat parts of the image, while only a small amount lies in the high-frequency components, which correspond to image detail. Therefore another approach can be used for video coding: after a certain mathematical transform, the image is represented in the transform domain (as shown in the figure), where u and v are the spatial frequency coordinates.
(Figure: transform coding)
3. Waveform-based coding
Waveform-based coding uses a block-based hybrid method that combines predictive coding and transform coding. To reduce coding complexity and make the video coding operations easier to perform, the hybrid method first divides an image into blocks of fixed size, such as 8×8 (8 rows of 8 pixels per block) or 16×16 (16 rows of 16 pixels per block), and then compresses and encodes each block.
Since ITU-T released the first digital video coding standard, H.261, in 1989, it has gone on to publish video coding standards such as H.263 and multimedia terminal standards such as H.320 and H.323. The Moving Picture Experts Group (MPEG) under ISO has defined MPEG-1, MPEG-2, MPEG-4, and other international compression standards for entertainment and digital TV.
In March 2003, ITU-T published the H.264 video coding standard. It not only improves compression significantly over previous standards but also has good network friendliness, particularly for video transmission over networks that are error-prone, congestion-prone, and unable to guarantee QoS, such as the IP Internet and wireless mobile networks. All of these standards use block-based hybrid coding and are therefore waveform-based.
4. Content-based coding
There is also content-based coding, in which a video frame is first segmented into regions corresponding to different objects, which are then encoded separately: the shape, motion, and texture of each object are encoded. In the simplest case, a two-dimensional contour describes the object's shape, a motion vector describes its motion, and a color waveform describes its texture.
When the types of objects in a video sequence are known, knowledge-based or model-based coding can be used. For human faces, for example, predefined wire-frame models exist for encoding facial features, and the coding efficiency is very high: only a few bits are needed to describe the features. Facial expressions (such as angry or happy) can be encoded semantically as possible behaviors; since the number of possible behaviors of an object is very small, very high coding efficiency can be achieved.
MPEG-4 uses both block-based hybrid coding and content-based coding.
5. Software and hardware encoding
There are two ways to implement video encoding on Android: software encoding and hardware encoding. Software encoding relies on the CPU and uses its computing power to encode; for example, we can build the x264 encoding library, write the corresponding JNI interface, and pass in the image data, and x264 converts the raw images into H.264 video.
Hardware encoding uses Android's own MediaCodec. MediaCodec accepts its input either as YUV image data or through a Surface. Surface is generally recommended because it is more efficient: it uses native video buffers directly, without mapping or copying them into ByteBuffers. When using a Surface you usually cannot access the raw video data directly, but the ImageReader class can be used to access the unencoded (raw) video frames; this may still be more efficient than using ByteBuffers, since some native buffers can be mapped into direct ByteBuffers. In ByteBuffer mode, the raw video frames can be accessed through the Image class and the getInput/OutputImage(int) methods.
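A minimal sketch of configuring a hardware H.264 encoder with MediaCodec in Surface input mode (the resolution, bitrate, and frame rate values are only examples):

import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.view.Surface;
import java.io.IOException;

public class HardwareVideoEncoder {
    private MediaCodec mCodec;
    private Surface mInputSurface;

    public void start() throws IOException {
        MediaFormat format = MediaFormat.createVideoFormat("video/avc", 1280, 720); // H.264
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface); // Surface input mode
        format.setInteger(MediaFormat.KEY_BIT_RATE, 2000 * 1024);      // ~2 Mbps
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 25);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2);        // a key frame every 2 s
        mCodec = MediaCodec.createEncoderByType("video/avc");
        mCodec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        mInputSurface = mCodec.createInputSurface(); // render the OpenGL output into this Surface
        mCodec.start();
        // Encoded H.264 buffers are then drained with dequeueOutputBuffer().
    }
}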
Note: A later article will describe video encoding in detail.
Audio encoding
On Android, AudioRecord can be used to record sound, and what it records is PCM audio. To represent sound in a computer it must be digitized, and the most common way to do so is Pulse Code Modulation (PCM). Sound passes through the microphone and becomes a signal of varying voltage, and converting such a signal into PCM is described by three parameters: the number of channels, the sample size (bit depth), and the sampling frequency.
1. Sampling frequency
The sampling frequency (sampling rate) is the number of sound samples taken per second. The higher the sampling frequency, the better the sound quality and the more faithful the reproduction, but also the more resources it consumes. Because the resolution of the human ear is limited, excessively high rates cannot be distinguished. 16-bit sound cards offer levels such as 22 kHz and 44 kHz; 22 kHz is roughly the quality of FM radio and 44 kHz roughly that of a CD. Common sampling frequencies today do not exceed 48 kHz.
2. Number of sampling bits
The sample size (bit depth) is the quantized amplitude of each sample. It measures the dynamic range of the sound and can be thought of as the resolution of the sound card: the larger the value, the finer the resolution and the greater the dynamic range.
In computers, the sample size is usually 8 or 16 bits. Note that 8 bits does not mean dividing the amplitude axis into 8 parts but into 2 to the 8th power, i.e. 256 parts; likewise, 16 bits divides it into 2 to the 16th power, i.e. 65,536 parts.
3. Number of channels
This one is easy to understand: there is mono and stereo. Mono is reproduced by a single speaker (sometimes the same channel is duplicated to two speakers), while stereo PCM drives two speakers (usually with distinct left and right channels), which gives a much better sense of space.
So now we can write down the formula for the size of a PCM file:
Storage size = (sampling frequency × sample size in bits × number of channels × duration in seconds) ÷ 8 (unit: bytes)
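For example, one second of 44.1 kHz, 16-bit, stereo PCM occupies (44100 × 16 × 2 × 1) ÷ 8 = 176,400 bytes, roughly 172 KB, so a one-minute clip is already about 10 MB.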
If audio were transmitted entirely as PCM, the bandwidth required would be quite large, so the audio needs to be encoded before transmission.
There are already widely used sound formats such as WAV, MIDI, MP3, WMA, AAC, and Ogg. Compared with PCM, most of these formats compress the audio data and thus reduce the transmission bandwidth.
Audio encoding can likewise be done in software or hardware. For software encoding, download the corresponding encoding library, write the corresponding JNI layer, and pass in the data to be encoded. Hardware encoding again uses Android's own MediaCodec, for example as sketched below.
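As an illustration of the hardware path, here is a minimal sketch of an AAC encoder configured through MediaCodec (44.1 kHz stereo at 128 kbps are example values); the PCM from AudioRecord would be queued into its input buffers:

import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import java.io.IOException;

public class HardwareAudioEncoder {
    private MediaCodec mCodec;

    public void start() throws IOException {
        MediaFormat format = MediaFormat.createAudioFormat("audio/mp4a-latm", 44100, 2); // AAC
        format.setInteger(MediaFormat.KEY_AAC_PROFILE,
                MediaCodecInfo.CodecProfileLevel.AACObjectLC);   // AAC-LC, as used in FLV
        format.setInteger(MediaFormat.KEY_BIT_RATE, 128 * 1024); // 128 kbps
        mCodec = MediaCodec.createEncoderByType("audio/mp4a-latm");
        mCodec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        mCodec.start();
        // PCM from AudioRecord is queued into the input buffers; AAC frames come out.
    }
}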
Note: A later article will describe audio encoding in detail.
6. Packaging
During transmission, the video and audio need to be wrapped in a defined container format so that the receiving end can parse them correctly.
1. HTTP-FLV
In the Web 2.0 era, the most popular websites were video sites: YouTube abroad, Youku and Tudou in China. The content these sites offered varied, but all of them, without exception, used Flash as the playback carrier. The technology underpinning them is Flash Video (FLV). FLV is a streaming video format that relies on the Flash Player plug-in that was ubiquitous on web pages, embedding video into Flash content. In other words, as long as visitors could view Flash animations they could watch FLV videos without installing any additional plug-in, which made FLV very convenient for video distribution.
HTTP-FLV encapsulates the audio and video data in FLV and then delivers it to the client over HTTP. On the publishing side, we only need to send FLV-formatted video and audio to the server.
In FLV, the video is generally in H.264 format and the audio is generally AAC-LC.
An FLV stream starts with the FLV header, followed by the metadata (Metadata) carrying the video and audio parameters, then the video and audio parameter information (the sequence headers), and then the video and audio data itself.
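To make that layout concrete, here is a minimal sketch that builds the 9-byte FLV file header plus the first PreviousTagSize field; every tag that follows (script data, audio, video) is likewise followed by its own 4-byte PreviousTagSize:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlvHeader {
    // Writes the FLV header: signature "FLV", version 1, flags (audio + video), offset 9,
    // followed by PreviousTagSize0 = 0.
    public static byte[] build(boolean hasAudio, boolean hasVideo) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[]{'F', 'L', 'V', 0x01});
        int flags = (hasAudio ? 0x04 : 0) | (hasVideo ? 0x01 : 0);
        out.write(flags);
        out.write(new byte[]{0x00, 0x00, 0x00, 0x09});  // header length = 9 bytes
        out.write(new byte[]{0x00, 0x00, 0x00, 0x00});  // PreviousTagSize0
        return out.toByteArray();
    }
}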
Note: A later article will describe FLV in detail.
2. RTMP
RTMP stands for Real-Time Messaging Protocol. The protocol runs over TCP and is a protocol family that includes the basic RTMP protocol plus variants such as RTMPT, RTMPS, and RTMPE. RTMP was designed for real-time data communication and is mainly used for audio, video, and data exchange between the Flash/AIR platform and streaming or interactive servers that support RTMP.
RTMP is a real-time transport protocol from Adobe, mainly used to carry FLV-format audio and video streams in real time. After obtaining the encoded video and audio data, it is first packaged as FLV and then wrapped into RTMP messages for transmission.
To transmit over RTMP, you first connect to the server, then create a stream, then publish the stream, and only then send the video and audio data. The entire exchange is defined in terms of messages: RTMP defines many kinds of messages, and to transmit them efficiently each message is split into chunks, which makes the protocol fairly complex.
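To make the chunking idea concrete, here is a deliberately simplified sketch (nowhere near a complete RTMP implementation) that splits one message payload into chunks of the default 128-byte chunk size: the first chunk carries a full type-0 message header, and the continuation chunks carry only a type-3 basic header:

import java.io.ByteArrayOutputStream;

public class RtmpChunker {
    private static final int CHUNK_SIZE = 128;  // RTMP default chunk size

    // Splits one message payload into chunks on chunk stream id `csid` (2..63).
    public static byte[] chunk(int csid, int timestamp, int msgTypeId,
                               int msgStreamId, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < payload.length; offset += CHUNK_SIZE) {
            if (offset == 0) {
                out.write((0 << 6) | csid);               // fmt 0: full message header follows
                writeUInt24(out, timestamp);
                writeUInt24(out, payload.length);          // message length
                out.write(msgTypeId);                      // e.g. 8 = audio, 9 = video
                writeUInt32LE(out, msgStreamId);           // message stream id, little-endian
            } else {
                out.write((3 << 6) | csid);               // fmt 3: continuation, header reused
            }
            out.write(payload, offset, Math.min(CHUNK_SIZE, payload.length - offset));
        }
        return out.toByteArray();
    }

    private static void writeUInt24(ByteArrayOutputStream out, int v) {
        out.write((v >> 16) & 0xFF); out.write((v >> 8) & 0xFF); out.write(v & 0xFF);
    }

    private static void writeUInt32LE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xFF); out.write((v >> 8) & 0xFF);
        out.write((v >> 16) & 0xFF); out.write((v >> 24) & 0xFF);
    }
}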
Note: A later article will describe RTMP in detail.
There are also other protocols, such as RTP; the general principles are similar, so they are not covered one by one here.
7. Poor-network handling
Under a good network, video and audio can be sent out promptly without piling up locally, the stream stays smooth, and latency is low. Under a poor network, when the audio and video data cannot be sent out fast enough, it has to be dealt with. There are generally four techniques for this: buffer design, network detection, frame dropping, and bitrate reduction.
1. Buffer design
Video and audio data is pushed into a buffer, and the sender takes data out of the buffer and sends it, forming an asynchronous producer-consumer pattern. The producer only needs to push the captured and encoded frames into the buffer; the consumer is responsible for taking them out and sending them.
(Figure: video and audio buffer)
Only video frames are shown in the figure above; in practice there are audio frames as well. Java already provides suitable classes for building an asynchronous producer-consumer model, and since frames will later need to be dropped, inserted, and removed, LinkedBlockingQueue is clearly a very good choice, as sketched below.
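A minimal sketch of such a buffer built on LinkedBlockingQueue (the Frame class is just an illustrative holder for encoded data):

import java.util.concurrent.LinkedBlockingQueue;

public class FrameBuffer {
    // Illustrative frame holder: encoded bytes plus a flag marking video key frames.
    public static class Frame {
        public final byte[] data;
        public final boolean isKeyFrame;
        public Frame(byte[] data, boolean isKeyFrame) { this.data = data; this.isKeyFrame = isKeyFrame; }
    }

    private final LinkedBlockingQueue<Frame> queue = new LinkedBlockingQueue<>();

    // Producer side: the encoder pushes frames in.
    public void push(Frame frame) { queue.offer(frame); }

    // Consumer side: the sender thread blocks until a frame is available.
    public Frame take() throws InterruptedException { return queue.take(); }

    public int size() { return queue.size(); }

    public LinkedBlockingQueue<Frame> raw() { return queue; }
}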
2. Network detection
An important part of handling a poor network is network detection: if degradation can be detected quickly and reacted to, the system becomes much more responsive and the result is noticeably better.
Every second we compare the amount of data pushed into the buffer with the amount actually sent out. If less is sent than is produced, the available bandwidth is insufficient and the buffer will keep growing, at which point the corresponding countermeasures are triggered, as sketched below.
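A minimal sketch of that comparison (the once-per-second scheduling and the 0.8 threshold are arbitrary example choices):

import java.util.concurrent.atomic.AtomicLong;

public class NetworkMonitor {
    private final AtomicLong producedBytes = new AtomicLong();
    private final AtomicLong sentBytes = new AtomicLong();

    public void onProduced(int bytes) { producedBytes.addAndGet(bytes); }
    public void onSent(int bytes)     { sentBytes.addAndGet(bytes); }

    // Called once per second, e.g. from a ScheduledExecutorService.
    public boolean isCongested() {
        long produced = producedBytes.getAndSet(0);
        long sent = sentBytes.getAndSet(0);
        // If we send noticeably less than we produce, the buffer is filling up.
        return sent < produced * 0.8;
    }
}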
3. Frame dropping
When network degradation is detected, dropping frames is a good response mechanism. After encoding, video consists of key frames and non-key frames: a key frame is a complete picture, while a non-key frame only describes changes relative to other frames.
The frame-dropping strategy can be defined however you like, with one caveat: if you drop a P frame (a non-key frame), you must drop all the non-key frames between the two surrounding key frames, otherwise the picture will break up into mosaics (see the sketch below). Beyond that, the design of the strategy depends on your requirements.
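A minimal sketch of that rule, reusing the Frame/FrameBuffer types from the buffer sketch above: frames are removed from the head of the queue until the next key frame, so no remaining non-key frame is left without its reference:

import java.util.concurrent.LinkedBlockingQueue;

public class FrameDropper {
    // Drops queued frames up to (but not including) the next key frame.
    public static void dropUntilNextKeyFrame(LinkedBlockingQueue<FrameBuffer.Frame> queue) {
        FrameBuffer.Frame head = queue.peek();
        while (head != null && !head.isKeyFrame) {
            queue.poll();          // discard this non-key frame
            head = queue.peek();
        }
    }
}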
4. Bitrate reduction
On Android, if hardware encoding is used, the encoder bitrate can be changed in real time to keep the broadcast smooth in a poor network environment. When a poor network is detected, we can lower the video bitrate at the same time as dropping frames. From Android SDK version 19 onward, parameters can be passed to MediaCodec to change the output bitrate of the hardware encoder:
// Request a new bitrate from the hardware encoder on the fly (API level 19+).
Bundle bitrate = new Bundle();
bitrate.putInt(MediaCodec.PARAMETER_KEY_VIDEO_BITRATE, bps * 1024);
mMediaCodec.setParameters(bitrate);
8. Sending
After all this processing, the data finally has to be sent out, and this step is relatively simple. Whether HTTP-FLV or RTMP is used, the connection is established over TCP. Before going live, connect to the server through a Socket to verify that it is reachable; once connected, use that Socket to send data to the server, and close the Socket when the stream ends.
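A minimal sketch of this step over a plain Socket (host, port, timeout, and flush policy are illustrative):

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class StreamSender {
    private Socket socket;
    private OutputStream out;

    public boolean connect(String host, int port) {
        try {
            socket = new Socket();
            socket.connect(new InetSocketAddress(host, port), 5000); // 5 s connect timeout
            out = socket.getOutputStream();
            return true;
        } catch (IOException e) {
            return false;   // server unreachable; do not start the live stream
        }
    }

    public void send(byte[] packet) throws IOException {
        out.write(packet);  // packet is an FLV tag or an RTMP chunk sequence
        out.flush();
    }

    public void close() {
        if (socket == null) return;
        try { socket.close(); } catch (IOException ignored) { }
    }
}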