epoll, declared in the header file sys/epoll.h, is the I/O event notification facility on Linux. The epoll API is Linux-specific: it was introduced in kernel 2.5.44, and glibc added support in version 2.3.2. Other systems provide similar facilities, such as kqueue on FreeBSD and /dev/poll on Solaris.
The epoll API
The epoll API performs a task similar to poll: it monitors multiple file descriptors to see whether I/O is possible on any of them. It supports both edge-triggered (ET) and level-triggered (LT) operation, and it scales better than poll to large numbers of watched file descriptors.
The following calls are used to create and manage an epoll instance:
epoll_create: creates an epoll instance and returns a file descriptor referring to it. (The newer epoll_create1 extends the functionality of epoll_create.)
epoll_ctl: registers interest in file descriptors. The set of file descriptors registered on an epoll instance is called the epoll set; it can be inspected through the process's /proc/[pid]/fdinfo directory.
epoll_wait: waits for I/O events. If no registered event is currently available, the calling thread blocks.
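The three calls above can be put together as in the following minimal sketch. It assumes a pipe as the monitored object; the function name epoll_basic_demo and the one-second timeout are illustrative choices, not part of the epoll API.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers the read end of a pipe, makes it
   readable, and returns what epoll_wait reports (1 on success),
   or -1 on any error. */
int epoll_basic_demo(void)
{
    int pipefd[2], epfd, ready = -1;
    struct epoll_event ev = {0}, out;

    if (pipe(pipefd) == -1)
        return -1;

    epfd = epoll_create1(0);                 /* create the epoll instance */
    if (epfd == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    ev.events = EPOLLIN;                     /* level-triggered by default */
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
        write(pipefd[1], "x", 1) == 1) {
        ready = epoll_wait(epfd, &out, 1, 1000);   /* wait up to 1 s */
        if (ready == 1 && out.data.fd != pipefd[0])
            ready = -1;                      /* unexpected descriptor */
    }

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return ready;
}
```

The data field of struct epoll_event is returned unchanged by epoll_wait, which is why storing the descriptor in ev.data.fd is enough to identify the source of the event.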
Level-triggered (LT) and edge-triggered (ET)
The epoll event distribution interface can behave as either edge-triggered (ET) or level-triggered (LT). The difference between the two modes is described below.
A typical scenario:
1 The file descriptor (rfd) representing the read end of a pipe is registered on the epoll instance.
2 The pipe writer writes 2 kB of data to the write end of the pipe.
3 A call to epoll_wait returns, reporting rfd as a ready file descriptor.
4 The pipe reader reads 1 kB of data from rfd.
5 epoll_wait is called again.
If rfd was added to the epoll interface with the EPOLLET (edge-triggered) flag, the epoll_wait call in step 5 will probably hang even though data is still available in the file input buffer; meanwhile, the remote peer may be waiting for a response to the data it has already sent. The reason is that edge-triggered mode delivers events only when the state of the monitored file descriptor changes. The caller in step 5 may therefore end up waiting for data that is already sitting in the input buffer. In the scenario above, the write in step 2 generates one event on rfd, and that event is consumed in step 3. Because the read in step 4 does not drain the whole buffer, the epoll_wait call in step 5 may block indefinitely.
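The five-step scenario can be reproduced with a short sketch. The helper name second_wait_result is hypothetical, and a timeout of 0 is used in place of an indefinite wait so that the call in step 5 returns 0 instead of hanging.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: runs steps 1-5 on a pipe and returns the result
   of the epoll_wait call in step 5, or -1 on setup failure.
   Pass 0 for level-triggered mode or EPOLLET for edge-triggered mode. */
int second_wait_result(uint32_t extra_flags)
{
    int pipefd[2], epfd, result = -1;
    char buf[2048];
    struct epoll_event ev = {0}, out;

    if (pipe(pipefd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)   /* step 1 */
        goto close_all;

    memset(buf, 'a', sizeof buf);
    if (write(pipefd[1], buf, sizeof buf) != (ssize_t)sizeof buf) /* step 2 */
        goto close_all;
    if (epoll_wait(epfd, &out, 1, 0) != 1)                       /* step 3 */
        goto close_all;
    if (read(pipefd[0], buf, 1024) != 1024)                      /* step 4 */
        goto close_all;

    result = epoll_wait(epfd, &out, 1, 0);                       /* step 5 */

close_all:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Called with 0 (level-triggered), step 5 reports the descriptor again because 1 kB is still unread; called with EPOLLET, step 5 reports nothing because no new state change has occurred.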
An application that uses the EPOLLET flag should use non-blocking file descriptors, so that a blocking read or write does not starve a task that is handling multiple file descriptors. The suggested way to use epoll as an edge-triggered (EPOLLET) interface is:
1 Use non-blocking file descriptors.
2 Wait for the next event (call epoll_wait) only after read or write has returned EAGAIN.
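The two suggestions above can be sketched as a pair of small helpers. The names set_nonblocking and drain_fd and the 512-byte buffer are assumptions made for illustration; a real application would process the data instead of only counting it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Suggestion 1: switch a descriptor to non-blocking mode.
   Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Suggestion 2: keep reading until read() reports EAGAIN, and only then
   go back to epoll_wait. Returns the number of bytes consumed, or -1 on
   a real error. *eof is set to 1 if the peer closed the stream. */
ssize_t drain_fd(int fd, int *eof)
{
    char buf[512];
    ssize_t total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;              /* data consumed; try again */
            continue;
        }
        if (n == 0) {
            *eof = 1;                /* end of file: nothing more will come */
            return total;
        }
        if (errno == EINTR)
            continue;                /* interrupted by a signal; retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;            /* buffer drained; safe to wait again */
        return -1;
    }
}
```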
In contrast, when used as a level-triggered interface (LT, the default when EPOLLET is not specified), epoll is simply a faster poll and can be used wherever poll is used, because the two share the same semantics.
Even in edge-triggered mode, multiple events can be generated when multiple chunks of data arrive. The caller can therefore specify the EPOLLONESHOT flag, which tells epoll to disable the associated file descriptor after an event for it has been received through epoll_wait. When EPOLLONESHOT is set, it is the caller's responsibility to rearm the file descriptor by calling epoll_ctl with EPOLL_CTL_MOD.
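A minimal sketch of the one-shot behaviour and the rearm step follows. The helper name oneshot_demo and the use of a pipe are assumptions for illustration; a zero timeout is used so that the calls never block.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: stores the return values of three epoll_wait
   calls in results[0..2]: after the write, after the one-shot event
   has fired, and after rearming. Returns 0 on success, -1 on failure. */
int oneshot_demo(int results[3])
{
    int pipefd[2], epfd, rc = -1;
    struct epoll_event ev = {0}, out;

    if (pipe(pipefd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
        goto close_all;
    if (write(pipefd[1], "x", 1) != 1)
        goto close_all;

    results[0] = epoll_wait(epfd, &out, 1, 0);  /* event delivered once */
    results[1] = epoll_wait(epfd, &out, 1, 0);  /* descriptor now disabled */

    /* Rearm with EPOLL_CTL_MOD; the byte is still unread, so the
       descriptor is reported again. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) == -1)
        goto close_all;
    results[2] = epoll_wait(epfd, &out, 1, 0);
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

A disabled descriptor stays in the epoll set, which is why the rearm uses EPOLL_CTL_MOD and not EPOLL_CTL_ADD.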
Usage example
When epoll is used as a level-triggered interface it has the same semantics as poll, but edge-triggered use needs more care to avoid stalls in the application's event loop. In the following example, the listener is a non-blocking socket on which listen has been called. The function do_use_fd() uses the newly ready file descriptor until read or write returns EAGAIN. An event-driven state machine application should record its current state after receiving EAGAIN, so that the next call to do_use_fd() continues reading or writing from where it stopped.
/* Fragment: the statements below belong inside a function.
   setnonblocking() and do_use_fd() are application-defined helpers. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/socket.h>

#define MAX_EVENTS 10

struct epoll_event ev, events[MAX_EVENTS];
struct sockaddr_storage addr;
socklen_t addrlen;
int listen_sock, conn_sock, nfds, epollfd, n;

/* Code to set up listening socket, 'listen_sock',
   (socket(), bind(), listen()) omitted */

epollfd = epoll_create1(0);
if (epollfd == -1) {
    perror("epoll_create1");
    exit(EXIT_FAILURE);
}

/* Watch the listening socket in level-triggered mode. */
ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}

for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    if (nfds == -1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
    }

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            /* New connection: accept it and watch it edge-triggered. */
            addrlen = sizeof(addr);
            conn_sock = accept(listen_sock,
                               (struct sockaddr *) &addr, &addrlen);
            if (conn_sock == -1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                          &ev) == -1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}
When epoll is used as an edge-triggered interface, for performance reasons the file descriptor can be added to the epoll interface once (EPOLL_CTL_ADD) with EPOLLIN | EPOLLOUT specified. This avoids repeatedly calling epoll_ctl with EPOLL_CTL_MOD to switch between EPOLLIN and EPOLLOUT.
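A sketch of the single-registration approach, assuming a UNIX-domain socket pair stands in for a real connection. The helper name in_out_once_demo is hypothetical; it reports the event masks seen before and after the peer sends data.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: registers one end of a socket pair once with
   EPOLLIN | EPOLLOUT | EPOLLET and records the event mask reported
   before (*first) and after (*second) the peer writes a byte.
   Returns 0 on success, -1 on failure. */
int in_out_once_demo(uint32_t *first, uint32_t *second)
{
    int sv[2], epfd, rc = -1;
    struct epoll_event ev = {0}, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_socks;

    /* Single registration covering both directions. */
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1)
        goto close_all;

    /* A fresh socket is writable but has nothing to read. */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto close_all;
    *first = out.events;

    /* Incoming data is a new state change, so a new event is reported
       without any EPOLL_CTL_MOD call. */
    if (write(sv[1], "x", 1) != 1)
        goto close_all;
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto close_all;
    *second = out.events;
    rc = 0;

close_all:
    close(epfd);
close_socks:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```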
Interaction with autosleep
If the system is in autosleep mode (via /sys/power/autosleep) and an event occurs that wakes the device from sleep, the device driver keeps the device awake only until that event has been queued. To keep the device awake until the event has been processed, the EPOLLWAKEUP flag of epoll_ctl must be used.
When the EPOLLWAKEUP flag is set in the events field of a struct epoll_event, the system is kept awake from the moment the event is queued, through the epoll_wait call that returns the event, until the subsequent epoll_wait call.
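Setting the flag is a one-line change at registration time, as in the sketch below. The helper name add_with_wakeup is hypothetical. According to the epoll_ctl manual page, the flag is silently ignored when the caller lacks the CAP_BLOCK_SUSPEND capability, so a successful return does not by itself prove that the wakeup behaviour is active.

```c
#include <sys/epoll.h>

/* Hypothetical helper: registers fd for input events and asks the
   kernel to keep the system awake while such an event is pending or
   being processed. Returns the epoll_ctl result (0 or -1). */
int add_with_wakeup(int epfd, int fd)
{
    struct epoll_event ev = {0};

    ev.events = EPOLLIN | EPOLLWAKEUP;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```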
Limits on the number of watches
The following file can be used to limit the amount of kernel memory consumed by epoll (since Linux 2.6.28):
/proc/sys/fs/epoll/max_user_watches
The max_user_watches file specifies a limit on the total number of file descriptors that a user can register across all epoll instances on the system. The limit applies per user ID. Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel and roughly 160 bytes on a 64-bit kernel. The default value of max_user_watches is 1/25 (4%) of the available low memory, divided by the cost in bytes of a single registration.
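The limit and its approximate memory cost can be inspected with a short sketch like the following. The function names are hypothetical, and the per-registration costs are the rough figures quoted above, not exact values for any particular kernel.

```c
#include <stdio.h>

/* Hypothetical helper: returns the current max_user_watches limit,
   or -1 if the file cannot be read or parsed. */
long long read_max_user_watches(void)
{
    FILE *f = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
    long long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Rough upper bound on kernel memory for a given number of watches,
   using the approximate costs of 160 bytes (64-bit kernel) and
   90 bytes (32-bit kernel) per registered descriptor. */
long long estimate_watch_bytes(long long watches, int kernel_is_64bit)
{
    return watches * (kernel_is_64bit ? 160 : 90);
}
```

For example, estimate_watch_bytes(read_max_user_watches(), 1) gives an approximate ceiling in bytes for one user on a 64-bit kernel, provided the read succeeded.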
Avoiding starvation (edge-triggered)
If a large amount of I/O data is pending on one file, draining it may prevent other files from being processed, causing starvation. The solution is to maintain a ready list and to mark the file descriptor as ready in its associated data structure. The application can then remember which files still need processing while servicing all ready files in round-robin order.
Event cache pitfall
If you use an event cache, or store all the file descriptors returned by epoll_wait, you need a way to mark a descriptor as closed dynamically (for example, when it is closed while another event is being processed). Suppose epoll_wait returns 100 events, and processing event A causes the file descriptor of event B to be closed. If you remove B's data structure and close its file descriptor, the event cache may still say that an event is waiting for that file descriptor, which causes confusion.
One solution is, while processing event A, to call epoll_ctl(EPOLL_CTL_DEL) to remove B's file descriptor and close it, then mark its associated data structure as removed and link it to a cleanup list. If another event for B's file descriptor turns up later in the same batch, checking the mark shows that the descriptor has already been removed, and the confusion is avoided.
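The mark-and-defer approach can be sketched as follows. The struct cached_conn type and both function names are hypothetical, and the sketch assumes each registration stores a pointer to its record in the data.ptr field of struct epoll_event.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical record stored in epoll_event.data.ptr at registration. */
struct cached_conn {
    int fd;
    int removed;                       /* set once the fd has been retired */
    struct cached_conn *next_cleanup;  /* link in the cleanup list */
};

/* Called while handling some other event: deregisters and closes c's
   descriptor, but keeps the record alive so that events already fetched
   in the current batch can still be checked against it.
   Returns the result of epoll_ctl (0 or -1). */
int retire_conn(int epfd, struct cached_conn *c,
                struct cached_conn **cleanup_list)
{
    int rc;

    if (c->removed)
        return 0;
    rc = epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    c->fd = -1;
    c->removed = 1;
    c->next_cleanup = *cleanup_list;   /* free the record after the batch */
    *cleanup_list = c;
    return rc;
}

/* Batch-loop check: nonzero if the event belongs to a retired record. */
int event_is_stale(const struct epoll_event *ev)
{
    const struct cached_conn *c = ev->data.ptr;
    return c->removed;
}
```

Records on the cleanup list would be freed only after the whole batch returned by epoll_wait has been processed, since stale events in the batch still point at them.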