The Attention Mechanism -- Principles and Applications
The attention mechanism brings a huge improvement to sequence-learning tasks. Within an encoder-decoder framework, an attention model can be added on the encoding side to weight the source data, or introduced on the decoding side to weight the target data; in either case, the weighting lets the model focus on the relevant parts of the sequence and effectively improves the performance of the system on sequences in their natural form.
What is Attention?
The basic idea of the Attention model can be understood as follows (this is my personal understanding): when we look at something, our gaze always focuses on a particular part of what we are currently watching, and when our gaze moves, our attention shifts along with it. In other words, when a person observes a target or a scene, attention is distributed unevenly across the spatial positions within that target or scene. The same is true in another situation: when we describe a thing, the words and sentences we are saying at this moment correspond most strongly to one particular part of the thing being described, and their relevance to the other parts keeps changing as the description proceeds.

From these two situations the reader can see that Attention can be classified from two perspectives at the application level: Spatial Attention and Temporal Attention. From the perspective of the method itself, Attention can be divided into Soft Attention and Hard Attention. Hard attention outputs a one-hot distribution that selects a single piece of context, while soft attention outputs a smooth weight distribution over all of the context information, which directly affects how the context information is selected.
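As a quick illustration of the soft vs. hard distinction, here is a tiny numpy sketch of my own (the scores are made up and not tied to any real model): soft attention turns relevance scores into a smooth distribution, while hard attention picks out a single position.

import numpy as np

def soft_attention(scores):
    # Soft attention: a smooth probability distribution via softmax
    exp = np.exp(scores - scores.max())   # shift by the max for numerical stability
    return exp / exp.sum()

def hard_attention(scores):
    # Hard attention: a one-hot distribution that selects a single position
    one_hot = np.zeros_like(scores)
    one_hot[np.argmax(scores)] = 1.0
    return one_hot

scores = np.array([0.1, 2.0, 0.3, 1.2, -0.5])   # toy relevance scores for 5 positions
print(soft_attention(scores))   # smooth weights that sum to 1, peaked at position 1
print(hard_attention(scores))   # [0. 1. 0. 0. 0.]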
Why add Attention?
Having explained what Attention does, let's discuss why the Attention model is needed, that is, the motivation for adding it:
1. As the input sequence grows longer, the performance of the original model at later time steps gets worse and worse. This is because the original structure has an inherent flaw: all of the context information is compressed into a fixed-length representation, so the capability of the whole model is limited by it. We will call this original model the plain encoder-decoder model for now.
2. The internal structure of the encoder-decoder cannot be interpreted, which makes it difficult to design by hand.
What is the principle of Attention?
Let's take a look at the specific principles of Attention:
First, the encoder outputs a set of structured representations; assume these representations can be written as the set {h_1, h_2, ..., h_T}.
The information loss caused by a fixed-length context representation is itself a defect. Because different time slices or spatial locations obviously carry different amounts of information, this loss cannot be handled well by a conventional fixed representation, and Attention solves exactly this problem.
We can even use Attention to explain, to some extent, how the encoder-decoder works, although in my opinion this is a bit of an after-the-fact explanation: after all, Attention is designed according to human priors, so the result of the final training is pushed toward the goal people intend. Concretely, each attention weight expresses the strength of the relationship between the j-th element of the input context representation and the output at the t-th time slice, where the j-th element may index a spatial position or a time step. Adding Attention performs a weight-based screening of the input context representation, and this screening is not a manually specified rule: through this weighting, the network can learn the spatial or temporal structural relationship by itself, under the assumption, of course, that such an otherwise uninterpretable relationship actually exists. Figure 1 above clearly shows the relationship between the learned attention weights and the input and output information in a machine translation problem.
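To make the weighting concrete, here is a minimal numpy sketch of one additive (Bahdanau-style) attention step relating the encoder representations {h_1, ..., h_T} to the output at time slice t. All the sizes and the randomly initialized W_h, W_s, v are placeholders of my own standing in for trained parameters; they are not values from the article.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T_in encoder states of dimension d_h, one decoder state of dimension d_s
T_in, d_h, d_s, d_a = 6, 8, 8, 16

h = rng.normal(size=(T_in, d_h))   # encoder representations {h_1, ..., h_T}
s_t = rng.normal(size=d_s)         # decoder state at time slice t

# Parameters of the additive scoring function (random placeholders for trained weights)
W_h = rng.normal(size=(d_a, d_h))
W_s = rng.normal(size=(d_a, d_s))
v = rng.normal(size=d_a)

# e_{tj}: unnormalized relevance of the j-th input representation to output t
e_t = np.tanh(h @ W_h.T + s_t @ W_s.T) @ v        # shape (T_in,)

# alpha_{tj}: attention weights, a soft distribution over the inputs
alpha_t = np.exp(e_t - e_t.max())
alpha_t /= alpha_t.sum()

# Context vector: the weighted screening of the input representations
c_t = alpha_t @ h                                  # shape (d_h,)

print("attention weights:", np.round(alpha_t, 3))
print("context vector shape:", c_t.shape)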
So what is the role of Attention?
Attention serves two purposes: 1. It reduces the computational burden of processing high-dimensional input data by structurally selecting a subset of the inputs, thereby reducing the data dimension. 2. It "discards the false and keeps the true": it lets the task-processing system focus on finding the information in the input that is most useful for the current output, thereby improving the quality of the output. The ultimate goal of the Attention model is to help a framework such as the encoder-decoder better capture the interrelationships between multiple content modalities, so that it can represent this information better and overcome the design difficulty caused by the model's lack of interpretability. From the problems discussed above, it can be seen that the Attention mechanism is very well suited to inferring the mutual mapping between different modalities of data, a relationship that is hard to explain, hidden, and complicated. This is exactly the advantage of Attention: without any extra supervision signal, it is extremely effective for such problems, which come with few cognitive priors.
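To illustrate purpose 1 concretely, here is a small toy sketch of my own (not from the article) of how attention weights can be used to select a subset of a high-dimensional input and summarize only that subset; the function name attend_to_subset and all of the sizes are made up for illustration.

import numpy as np

def attend_to_subset(inputs, weights, k):
    # Keep only the k most relevant input rows, then combine them.
    # inputs : (N, d) array of input elements; weights: (N,) attention weights; k: subset size
    top = np.argsort(weights)[-k:]                 # indices of the k largest weights
    sub_w = weights[top] / weights[top].sum()      # renormalize over the subset
    return sub_w @ inputs[top]                     # (d,) summary built from the subset only

rng = np.random.default_rng(1)
inputs = rng.normal(size=(100, 32))                # 100 elements, 32 dimensions each
weights = rng.random(100)
weights /= weights.sum()

summary = attend_to_subset(inputs, weights, k=10)  # only 10 of the 100 elements are processed
print(summary.shape)                               # (32,)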
Let us look at a concrete example!
Here is a picture giving a concrete example, which we will then explain step by step:
Let's take a look at how other researchers use the Attention model in their papers: