This design combines Samsung's S3C2440 processor with Icroute's high-performance speech recognition chip LD3320 to implement the hardware and software of a speech recognition system. Under the embedded Linux operating system, a multi-process mechanism controls the speech recognition chip, the ultrasonic ranging module, and the pan-tilt platform, applying speech recognition technology to a multi-angle ultrasonic ranging system. Testing shows that the system can set the measurement direction by recognizing spoken commands, without manual intervention, and finally announces the measurement results through voice playback.
1. Introduction
Language is an important means by which humans convey information, and speech recognition is a key technology for realizing voice control. Embedded speech recognition has the advantages of low power consumption, simplicity, and flexibility; it frees the user from complex buttons and switches and plays an important role in service robots, smart homes, and consumer electronics.
2. System composition and principle
Speech recognition consists of two phases: a training phase and a recognition phase. In both phases, the input speech must first be preprocessed and its features extracted. In the training phase, several training utterances entered by the user are preprocessed and their feature parameters extracted; these parameters are then used to build the reference model library. In the recognition phase, the feature vector of the input speech is compared against the reference model library with a similarity measure, and the entry with the highest similarity is output as the recognition result, thereby achieving speech recognition, as shown in Figure 1.
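As an illustrative sketch only (not the method used by the LD3320, whose recognition runs on-chip), the similarity measurement in the recognition phase can be pictured as nearest-template matching over feature vectors; all dimensions and names below are assumptions.

/* Illustrative sketch: nearest-template matching over extracted feature
 * vectors. Dimensions and data layout are hypothetical. */
#include <float.h>
#include <math.h>

#define FEAT_DIM   12          /* feature vector dimension (assumed)      */
#define NUM_MODELS  8          /* number of reference templates (assumed) */

/* Euclidean distance between an input feature vector and one template. */
static double feat_distance(const double *in, const double *ref)
{
    double sum = 0.0;
    for (int i = 0; i < FEAT_DIM; i++) {
        double d = in[i] - ref[i];
        sum += d * d;
    }
    return sqrt(sum);
}

/* Return the index of the reference model closest to the input features,
 * i.e. the candidate with the highest similarity (lowest distance). */
int recognize(const double input[FEAT_DIM],
              const double models[NUM_MODELS][FEAT_DIM])
{
    int best = -1;
    double best_dist = DBL_MAX;
    for (int m = 0; m < NUM_MODELS; m++) {
        double d = feat_distance(input, models[m]);
        if (d < best_dist) {
            best_dist = d;
            best = m;
        }
    }
    return best;
}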
Speech recognition techniques can be divided into two types: speaker-dependent and speaker-independent recognition. Speaker-dependent recognition requires collecting the voice of a specific user, so the recognition target is that particular person. Speaker-independent recognition targets the general user; it is usually built by recording, training on, and learning from the voices of many people, and can still achieve a high recognition rate.
In practice, embedded speech recognition can be implemented in two ways: porting an embedded speech recognition software package, or adding an external speech recognition chip. The scheme in this paper attaches the speaker-independent speech recognition chip LD3320 to the embedded processor S3C2440, with an ultrasonic ranging module and a pan-tilt platform combined as the mechanical actuator of the system. The measurement process is as follows: first, the rotation of the two-degree-of-freedom pan-tilt platform is controlled according to the speech command so that the ultrasonic detector points in a specific direction; then the ultrasonic detector is activated to measure the distance to the obstacle in front; finally, the measurement result is converted into the corresponding voice data stream and played back through the playback function of the LD3320.
3. Hardware circuit design
The hardware circuit mainly includes a speech recognition section, a main controller section, an ultrasonic ranging section, and a servo control section, as shown in Figure 2. The processor is the S3C2440, with a system clock of up to 533 MHz; it supports SPI, I2C, UART, and other interfaces, meeting the requirements of the control system. The main chip S3C2440 performs read and write operations on the speech recognition module over the SPI bus, while the ultrasonic ranging section and the servo control section are driven directly by the processor's GPIO pins.
3.1 Speech Recognition Circuit Design
To enable the system to recognize the speech commands issued by the operator, the speaker-independent speech recognition chip LD3320 designed by Icroute is used. It integrates the speech recognition processing circuit and a number of peripheral circuits, including AD and DA converters, a microphone interface, and a sound output interface, and requires no external auxiliary chips such as Flash or RAM. Under the control of the main controller, it can recognize any keyword added to its recognition list. The design follows the LD3320 data sheet published by Icroute: the P0, P1, and P2 pins of the LD3320 are connected to the embedded processor, and the control signals WRB, CSB, and RSTB and the interrupt return signal pin INTB are connected directly to the S3C2440, as shown in Figure 3.
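The register access over this connection can be sketched as follows; the SPI helper functions are hypothetical, and the command bytes 0x04 (write) and 0x05 (read) follow commonly published LD3320 example code and should be verified against the Icroute data sheet.

/* Minimal sketch of SPI register access to the LD3320, assuming helper
 * functions provided by the S3C2440 SPI driver (hypothetical names). */
#include <stdint.h>

extern uint8_t spi_transfer_byte(uint8_t out);   /* assumed SPI helper      */
extern void    ld3320_cs(int level);             /* assumed CSB pin control */

static void ld3320_write_reg(uint8_t addr, uint8_t val)
{
    ld3320_cs(0);                  /* assert chip select (CSB low)          */
    spi_transfer_byte(0x04);       /* "write register" command (assumed)    */
    spi_transfer_byte(addr);
    spi_transfer_byte(val);
    ld3320_cs(1);                  /* release chip select                   */
}

static uint8_t ld3320_read_reg(uint8_t addr)
{
    uint8_t val;
    ld3320_cs(0);
    spi_transfer_byte(0x05);       /* "read register" command (assumed)     */
    spi_transfer_byte(addr);
    val = spi_transfer_byte(0x00); /* clock out the register value          */
    ld3320_cs(1);
    return val;
}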
3.2 Ultrasonic ranging and servo control circuit design
The principle of ultrasonic ranging is well established; the system uses the ultrasonic ranging module HC-SR04. This module has two TTL-level communication pins and is compatible with 3.3 V. The control port TRIG is driven high for 10 µs to trigger a measurement, after which the receiving port ECHO outputs a high-level pulse whose duration is proportional to the distance. The processor starts a timer when ECHO goes high and stops it when the ECHO level falls; from the timer value, the distance to the obstacle can be calculated. The control port TRIG and the receiving port ECHO are connected to the processor's GPG9 and GPG6 pins, respectively.
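A simplified measurement cycle, assuming hypothetical GPIO and microsecond-timer helpers (timeout handling omitted for brevity), might look like this:

/* Sketch of one HC-SR04 measurement cycle. Pin numbers follow the text
 * (TRIG on GPG9, ECHO on GPG6); all helper functions are assumptions. */
#include <stdint.h>

extern void     gpio_set(int pin, int level);   /* assumed GPIO write  */
extern int      gpio_get(int pin);              /* assumed GPIO read   */
extern void     udelay(unsigned int us);        /* assumed busy delay  */
extern uint32_t timer_read_us(void);            /* assumed usec timer  */

#define PIN_TRIG  9   /* GPG9 */
#define PIN_ECHO  6   /* GPG6 */

/* Returns the ECHO high-level duration in microseconds. */
uint32_t hcsr04_measure_us(void)
{
    uint32_t start, stop;

    gpio_set(PIN_TRIG, 1);        /* >= 10 us trigger pulse       */
    udelay(10);
    gpio_set(PIN_TRIG, 0);

    while (!gpio_get(PIN_ECHO))   /* wait for ECHO to go high     */
        ;
    start = timer_read_us();

    while (gpio_get(PIN_ECHO))    /* wait for ECHO to go low      */
        ;
    stop = timer_read_us();

    return stop - start;          /* proportional to the distance */
}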
The sensing angle of the ultrasonic ranging module is less than 15°. To extend the angular range of the measurement, the ultrasonic ranging module is mounted on a two-degree-of-freedom pan-tilt platform whose servos are SG90 (9 g) units with a rotation range of 180°. The processor controls the two servos via GPB0 and GPB1, respectively, rotating the platform so that obstacles in different directions can be measured, as shown in Figure 4.
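The angle-to-pulse mapping for the SG90 can be sketched as below; the 0.5 ms to 2.5 ms range within a 20 ms PWM period is typical SG90 timing and should be confirmed for the servos actually used.

/* Sketch of the SG90 angle-to-pulse mapping: within a 20 ms PWM period,
 * a 0.5 ms to 2.5 ms high time corresponds to 0 to 180 degrees (typical
 * SG90 timing, assumed here). */
static unsigned int sg90_pulse_us(unsigned int angle_deg)
{
    if (angle_deg > 180)
        angle_deg = 180;
    /* 500 us at 0 deg, 2500 us at 180 deg, linear in between */
    return 500 + (angle_deg * 2000) / 180;
}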
4. Software design
The system software is based on the embedded Linux operating system and implements tasks such as speech recognition, voice playback, ultrasonic ranging, and servo control. The fork() mechanism is used to assign an independent process to each task, so that the system can handle multiple tasks concurrently. A corresponding low-level driver is written for each functional module to provide a call interface for the upper-layer application.
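A minimal sketch of this multi-process structure, with hypothetical task entry points standing in for the application code described below:

/* Sketch: fork() one child per task (speech recognition, ranging,
 * pan-tilt control). Task functions are hypothetical placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

extern void speech_task(void);   /* assumed task entry points */
extern void ranging_task(void);
extern void pantilt_task(void);

static void spawn(void (*task)(void))
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {              /* child: run the task and exit */
        task();
        _exit(EXIT_SUCCESS);
    }
}

int main(void)
{
    spawn(speech_task);
    spawn(ranging_task);
    spawn(pantilt_task);

    while (wait(NULL) > 0)       /* parent waits for all children */
        ;
    return 0;
}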
The system workflow is as follows: the processor performs a general initialization of the speech recognition chip LD3320 over the SPI bus and puts the chip into cyclic recognition mode, with the processor repeatedly starting the recognition process. When a recognition result is available, the corresponding action is carried out (for example, a reply is played back) and the next recognition pass is then started. The processor reads the recognition result from the C5 register over the SPI bus and converts the voice command into control signals for the ultrasonic ranging module and the servos, completing a multi-directional ranging task, as shown in Figure 5.
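The top-level loop can be pictured as follows; the command indices, device handling, and helper functions are illustrative assumptions, not the original code.

/* Sketch of the top-level recognition loop: each recognized command is
 * mapped to a pan-tilt direction, a measurement is taken, and the result
 * is spoken back. All names below are hypothetical. */
enum { CMD_NONE = -1, CMD_MEASURE = 0, CMD_LEFT = 1, CMD_RIGHT = 2, CMD_FRONT = 3 };

extern int    wait_for_command(int fd);        /* assumed: returns a CMD_* index */
extern void   set_direction(int cmd);          /* assumed: rotates the pan-tilt  */
extern double measure_distance_m(void);        /* assumed: one ultrasonic cycle  */
extern void   speak_distance(double meters);   /* assumed: LD3320 playback       */

void control_loop(int ld3320_fd)
{
    for (;;) {
        int cmd = wait_for_command(ld3320_fd); /* blocks until result or timeout */
        if (cmd == CMD_NONE)
            continue;                          /* timeout: chip was re-inited    */
        if (cmd != CMD_MEASURE)
            set_direction(cmd);                /* "left", "right", "front", ...  */
        else
            speak_distance(measure_distance_m());
    }
}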
4.1 Speech recognition function programming
The speech recognition chip LD3320 provides both speech recognition and MP3 playback functions. Whenever the function is switched, a general initialization must be performed, followed by a series of register settings on the chip.
The driver workflow of the speech recognition function is: general initialization → speech recognition initialization → write the recognition list → start recognition → respond to the recognition interrupt. To improve the recognition success rate, "garbage keywords" are added to the recognition list to absorb erroneous recognitions. The upper-layer application assigns a separate process to the speech recognition function, controls the working state of the LD3320 through the ioctl() function, and reads the recognition result with the read() function. Non-blocking access for read() is implemented with the select mechanism. A select monitoring timeout is also set; after a timeout, the speech recognition chip LD3320 is re-initialized in preparation for the next recognition, as shown in Figure 6.
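A sketch of the select()-based wait is shown below, assuming a hypothetical /dev/ld3320 character device and ioctl command codes; the real codes are defined by the driver described above.

/* Sketch: wait for a recognition result with select(); on timeout the chip
 * is re-initialized for the next pass. Device and ioctl codes are assumed. */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <unistd.h>

#define LD3320_IOCTL_START_ASR  0x01   /* assumed ioctl command codes */
#define LD3320_IOCTL_REINIT     0x02

int wait_for_command(int fd)
{
    unsigned char result;
    fd_set rfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };  /* monitoring timeout */

    ioctl(fd, LD3320_IOCTL_START_ASR, 0);   /* start one recognition pass */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0 &&
        read(fd, &result, 1) == 1)
        return result;                      /* index into the keyword list */

    ioctl(fd, LD3320_IOCTL_REINIT, 0);      /* timeout: re-initialize chip */
    return -1;
}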
4.2 Voice Playback Function Program Design
The LD3320 supports MP3 data playback. The sequence in the program is: general initialization → playback mode initialization → volume adjustment → start playback, with the interrupt response function prepared and the interrupt enable bit set. In the program, the voice MP3 data for the digits 0 to 9 and the words "ten", "hundred", and "point" are first converted into standard C array files, which are added to the project and compiled together. The distance data to be played is then split, and a table lookup retrieves the corresponding voice data. For example, the distance value 12.5 is split into "1", "ten", "2", "point", "5". Finally, the voice data obtained from the table are concatenated from left to right and stored in the LD3320's playback data memory. When a segment finishes playing, the chip issues an interrupt request, and the interrupt response function keeps writing playback data until the voice data are exhausted.
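The splitting and lookup step can be sketched as follows; ld3320_play_clip() and the clip names are hypothetical stand-ins for the MP3 data arrays described above.

/* Sketch: split a distance value into spoken units, e.g. 12.5 ->
 * "1", "ten", "2", "point", "5". Playback helper and clip names assumed.
 * Handling of exact tens (e.g. 20.0) is left out for brevity. */
#include <stdio.h>

extern void ld3320_play_clip(const char *name);  /* assumed playback helper */

void speak_distance(double distance)
{
    int whole = (int)distance;
    int frac  = (int)((distance - whole) * 10 + 0.5);   /* one decimal digit */
    char buf[8];

    if (whole >= 100) {                  /* hundreds digit, then "hundred" */
        snprintf(buf, sizeof(buf), "%d", whole / 100);
        ld3320_play_clip(buf);
        ld3320_play_clip("hundred");
        whole %= 100;
    }
    if (whole >= 10) {                   /* tens digit, then "ten" */
        snprintf(buf, sizeof(buf), "%d", whole / 10);
        ld3320_play_clip(buf);
        ld3320_play_clip("ten");
        whole %= 10;
    }
    snprintf(buf, sizeof(buf), "%d", whole);
    ld3320_play_clip(buf);               /* ones digit */

    if (frac > 0) {
        ld3320_play_clip("point");
        snprintf(buf, sizeof(buf), "%d", frac);
        ld3320_play_clip(buf);
    }
}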
4.3 Ultrasonic ranging and pan-tilt control program design
The driver of the ultrasonic ranging function is a Linux character device driver. The ioctl() function performs the timing control of the corresponding GPIO pins, completing the transmission and reception of the ultrasonic pulse. When the receiving port outputs a high-level pulse, the system takes an interrupt and uses a timer to measure the high-level duration Δt; the distance S is then obtained from formula (1), where v is the propagation speed of ultrasonic waves, which is 340 m/s in air. The application can read the measured distance value via the read() function.
S = v × Δt / 2 (1)
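As a concrete check of formula (1), the conversion from echo time to distance can be written as a small helper; the timer value dt_us comes from the driver described above.

/* Distance computation per formula (1); dt_us is the measured echo time. */
static double distance_m(unsigned int dt_us)
{
    const double v = 340.0;              /* speed of sound in air, m/s     */
    return v * (dt_us / 1e6) / 2.0;      /* divide by 2: out-and-back path */
}
/* Example: dt_us = 1200 (1.2 ms) gives roughly 0.204 m. */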
The two-degree-of-freedom pan-tilt platform consists of two servos, which control the rotation angles of the platform in the horizontal and vertical directions. In the driver, the timer's PWM function is first enabled and the timer interrupt is configured; the interrupt handler is then registered, and finally the timer is enabled to start running. According to the speech command issued by the experimenter, the ioctl() function controls the timer to output two PWM signals, which set the rotation angles of the two servos; their combined motion determines the pose of the pan-tilt platform.
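The application-side control can be sketched as below, assuming a hypothetical /dev/pantilt character device whose ioctl() sets each servo channel; the command codes and angle encoding are assumptions.

/* Sketch of application-side pan-tilt control via a character device;
 * device path and ioctl command codes are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define PANTILT_SET_PAN   0x10   /* assumed ioctl command codes */
#define PANTILT_SET_TILT  0x11

/* Point the ultrasonic detector by rotating the two servos. */
int pantilt_point(const char *dev, unsigned int pan_deg, unsigned int tilt_deg)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0) {
        perror("open pantilt device");
        return -1;
    }
    ioctl(fd, PANTILT_SET_PAN, pan_deg);    /* horizontal servo, PWM on GPB0 */
    ioctl(fd, PANTILT_SET_TILT, tilt_deg);  /* vertical servo, PWM on GPB1   */
    close(fd);
    return 0;
}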
5. Conclusion
This article describes an application and implementation of embedded speech recognition technology. The experimenter controls the system with predefined speech commands (e.g., "start measurement", "left", "front"), and the distance measurement is performed with ultrasonic waves. After the measurement is completed, the system feeds the result back to the experimenter via voice playback, completing the human-computer interaction and improving the user experience. The system is easy to extend and can be applied to other embedded control systems.