Will BERT change NLP? Well, yes and no. ImageNet transformed computer vision, whereas BERT is more likely to end up as one more tool in the NLP practitioner's arsenal; still, it is an impressive piece of work. Natural language is riddled with ambiguity and subtlety, and a successful system has to capture those shades of meaning rather than just the surface words.
The BERT framework learns from both the left and the right context of a token, which lets it model meaning better than its left-to-right predecessors. Take a homonym: ‘Jimmy sat down in an armchair reading a magazine’ and ‘Jimmy loaded the magazine into his assault rifle’. Both sentences use the same word, yet BERT can learn both senses and predict the correct token from either surrounding context, as the quick test below shows.
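One way to see this is a fill-in-the-blank experiment. The snippet below is a minimal sketch that assumes the Hugging Face `transformers` library and the publicly hosted `bert-base-uncased` checkpoint are available; the exact scores will vary, but the top predictions for the masked slot should differ sensibly between the two contexts.

```python
# Minimal sketch: mask the ambiguous word and let BERT fill it back in
# from the surrounding (bidirectional) context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The same surface word, two very different contexts.
sentences = [
    "Jimmy sat down in an armchair reading a [MASK].",
    "Jimmy loaded the [MASK] into his assault rifle.",
]

for sentence in sentences:
    print(sentence)
    # Each prediction is a dict with the filled-in token and its probability.
    for pred in fill_mask(sentence, top_k=3):
        print(f"  {pred['token_str']:>10}  (score {pred['score']:.3f})")
```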
While BERT is not the last word in NLP, it has sparked significant interest in the field. Its versatility has prompted many researchers and companies to experiment with Transformers, and some of the resulting systems have outperformed BERT on multiple NLP benchmarks. In addition, Hugging Face has distilled BERT into DistilBERT, a much smaller model that cuts the parameter count substantially while retaining roughly 97% of BERT's performance on language-understanding tasks; swapping it in is a one-line change, as shown below.
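Assuming the same `transformers` setup as above and the hosted `distilbert-base-uncased` checkpoint, the lighter model drops straight into the same pipeline:

```python
# Same fill-mask sketch, with the distilled checkpoint swapped in.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")
print(fill_mask("Jimmy loaded the [MASK] into his assault rifle.", top_k=3))
```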
These newer language models have become strikingly good at predicting missing words in text, in some settings matching or exceeding humans. In bidirectional (masked) training, a random 15% of the tokens in each sequence are hidden from the model, which then learns to predict them from the words on both sides; a simplified sketch of this masking step follows below. But can BERT really change NLP? Let’s find out!
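To make the masking step concrete, here is a simplified sketch with illustrative token ids. The function name and the `-100` "ignore this position" label are assumptions of mine, and BERT's actual recipe is slightly richer: of the selected tokens, most are replaced with the mask token, while a small fraction are swapped for a random token or left unchanged.

```python
import random

def mask_tokens(token_ids, mask_id, mask_prob=0.15, seed=None):
    """Hide roughly `mask_prob` of the tokens; return (masked_ids, labels)."""
    rng = random.Random(seed)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100: position ignored by the training loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok      # the model is trained to recover this original token
            masked[i] = mask_id  # the token the model actually sees
    return masked, labels
```

At training time the model only sees the masked sequence, and it is penalized only on the positions where the labels hold a real token id, so all of its signal comes from reconstructing the hidden words out of their two-sided context.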