Google recently denied reports that BERT, its Natural Language Processing (NLP) system, was trained on data sourced from ChatGPT, the conversational AI system developed by OpenAI.
The reports stemmed from a paper authored by researchers from Google Research, OpenAI, and Stanford University, which benchmarked ChatGPT against other NLP models, including BERT, and found that ChatGPT outperformed them.
After the paper was released, it was widely reported that Google had trained BERT using ChatGPT data, a claim Google has now denied.
In response to the reports, a spokesperson for Google said: “We built BERT using data from a variety of sources, including Common Crawl, Wikipedia, BooksCorpus, as well as our own Google Search query logs. We did not use any data from ChatGPT.”
The spokesperson added that the research paper was not an evaluation of BERT in isolation, but rather a comparison of ChatGPT and other models against one another.
Despite Google's denial, some experts remain skeptical, arguing that the performance gains observed could be the result of knowledge transferred from ChatGPT to BERT, a process commonly known as knowledge distillation (a minimal sketch of the idea follows below).
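For readers unfamiliar with the term, the "transfer of knowledge" the skeptics describe is typically done by training a student model to imitate a teacher model's softened output distribution. The sketch below is purely illustrative: the tiny architectures, the temperature value, and the random data are placeholder assumptions, not details of BERT, ChatGPT, or any actual Google training pipeline.

```python
# Illustrative knowledge distillation: a student learns to match a
# teacher's softened predictions. All models and data are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in networks; in practice these would be large pretrained models.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
student = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's output distribution

for step in range(100):
    x = torch.randn(32, 16)  # placeholder input batch

    with torch.no_grad():
        teacher_logits = teacher(x)  # teacher is frozen
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions,
    # scaled by temperature**2 as is standard in distillation.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```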
Google, however, maintains that no such transfer occurred and that its models were trained entirely on its own data.
Ultimately, it remains unclear whether Google trained BERT on data obtained from ChatGPT. Either way, one thing is certain: natural language processing continues to advance rapidly with tools such as BERT and ChatGPT.