
Can you give some examples of source material used to train ChatGPT?

Updated: May 27, 2023

Yes, several other large language models related to ChatGPT are available to the public. Some examples include:

  1. GPT-2: An earlier, smaller language model also developed by OpenAI, and a predecessor of the models behind ChatGPT. Its capabilities are considerably more limited, but its weights were publicly released, so it can be downloaded and run locally.

  2. BERT: Developed by Google, BERT is a transformer-based model pre-trained on a large corpus of text. Unlike ChatGPT, it is not designed to generate free-form text; it is typically fine-tuned for natural language understanding tasks such as text classification and question answering.

  3. RoBERTa: Developed by Facebook AI, RoBERTa is also a transformer-based model pre-trained on a large corpus of text. It keeps BERT's architecture but improves on BERT's performance across a wide range of natural language understanding tasks by training longer, on more data, with a revised training recipe.

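  4. T5 (Text-to-Text Transfer Transformer): A language model developed by Google that is pre-trained on a massive amount of text and treats every task, such as translation, summarization, or classification, as converting one piece of text into another. It can be fine-tuned for a wide range of natural language understanding and generation tasks.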

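  5. XLNet: Developed by researchers at Carnegie Mellon University and Google, XLNet is a transformer-based model trained on a large corpus of text, similar to BERT and RoBERTa. It is designed to overcome some of BERT's limitations by using a permutation language modeling objective: instead of filling in masked tokens, it predicts each token from the others in a randomly sampled order.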

These models are available through APIs, or can be downloaded and fine-tuned on specific tasks or datasets.
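The models above differ largely in how they turn raw text into training examples. The toy Python sketch below, assuming simple whitespace tokenization, contrasts three pre-training objectives: GPT-style next-token prediction, BERT-style masked language modeling, and XLNet-style permutation language modeling. The function names are illustrative, and this is a simplified illustration rather than any of these projects' actual code: real BERT masking sometimes substitutes a random or unchanged token instead of `[MASK]`, and real XLNet predicts only part of each sequence.

```python
import random

def autoregressive_examples(tokens):
    """GPT-style: predict each token from all of the tokens before it."""
    # Each pair is (context seen so far, next token to predict).
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_examples(tokens, mask_prob=0.15, seed=0):
    """BERT-style: hide a random subset of tokens and predict them from both sides."""
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append("[MASK]")
            targets[i] = tok  # position -> original token the model must recover
        else:
            inputs.append(tok)
    return inputs, targets

def permutation_examples(tokens, seed=0):
    """XLNet-style: predict tokens in a random order, each from those earlier in that order."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    examples = []
    for step, pos in enumerate(order):
        # The model sees the tokens that come before `pos` in the sampled order,
        # which may lie to its left or right in the original sentence.
        visible = {p: tokens[p] for p in order[:step]}
        examples.append((visible, pos, tokens[pos]))
    return examples

tokens = "the cat sat on the mat".split()
print(autoregressive_examples(tokens)[0])  # (['the'], 'cat')
print(masked_lm_examples(tokens, mask_prob=0.3))
print(permutation_examples(tokens)[:2])
```

All three functions read the same sentence; what changes is which tokens the model is asked to predict and which context it may use to predict them.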


Can you give some examples of source material used to train ChatGPT?


ChatGPT was trained on a large and diverse collection of text, including books, articles, websites, and forums. Some examples of the types of sources that may have been used to train ChatGPT include:

  1. Wikipedia: This is a large and diverse source of knowledge that can be used to train language models. It contains articles on a wide range of topics and is available in many languages.

  2. Common Crawl: This is a large dataset of web pages that is collected and made available for research. It can be used to train language models on a diverse set of web content.

  3. Project Gutenberg: This is a collection of over 60,000 free e-books that can be used to train language models on a wide range of literature.

  4. News outlets and articles: A large number of newspapers and news websites provide articles on many topics, which can help train language models on news and events up to the model's training cutoff.

  5. Blogs: There are millions of blogs online, offering a wide variety of perspectives and writing styles for training language models.

It's important to note that these are only examples: OpenAI has not publicly disclosed the full list of sources used to train ChatGPT. The training data is also pre-processed and filtered to reduce unwanted or offensive content, which lowers, but cannot eliminate, the risk that the model reflects biases or perpetuates stereotypes.
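OpenAI has not published the exact filtering pipeline used for ChatGPT. As a rough illustration of what "pre-processed and filtered" can mean in general, here is a minimal Python sketch of two common steps: exact deduplication after normalizing case and whitespace, followed by a simple length and blocklist filter. The blocklist terms, the `min_words` threshold, and the function names are hypothetical placeholders; large-scale pipelines often rely on trained quality and safety classifiers rather than word lists.

```python
import hashlib
import re

# Hypothetical placeholder terms; real pipelines use far richer filters.
BLOCKLIST = {"badword1", "badword2"}

def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())

def clean_corpus(docs, min_words=5):
    """Drop exact duplicates, very short documents, and blocklisted documents."""
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document after normalization
        seen.add(digest)
        words = re.findall(r"\w+", norm)  # word tokens, ignoring punctuation
        if len(words) < min_words:
            continue  # too short to be useful training text
        if BLOCKLIST.intersection(words):
            continue  # contains a blocklisted term
        kept.append(doc)
    return kept

docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the  quick brown fox jumps over the lazy DOG.",  # duplicate after normalizing
    "Too short.",
    "This sentence contains badword1 and should be dropped entirely.",
    "Project Gutenberg offers many public domain books to read.",
]
print(clean_corpus(docs))  # keeps only the first and last documents
```

Hashing the normalized text keeps memory use low even for very large corpora, but it only catches exact copies; near-duplicate detection needs fuzzier techniques.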





