ChatGPT isn’t the first language model; it isn’t even the first GPT model. But it made a significant leap in natural language processing—popularizing large language models and accelerating the adoption of AI.
What factors contributed to ChatGPT’s success?
This article explores the history of ChatGPT, the technology behind it, and its applications, future developments, and impact on society.
Learn about the past, present, and future of ChatGPT.
Table of Contents
- What Is ChatGPT?
- The Technology Behind ChatGPT
- The History of OpenAI
- The History of ChatGPT
- Impact and Implications
- ChatGPT: From History to Future
What Is ChatGPT?
ChatGPT is a powerful AI chatbot capable of generating human-like text and performing tasks based on written commands. It’s an advanced form of narrow artificial intelligence (ANI) and a big step toward artificial general intelligence (AGI). (Our beginner’s guide to learning AI explains the difference between the types of AI in more detail.)
The GPT in ChatGPT stands for generative pre-trained transformer—a large language model that uses deep learning to produce human-like text. In other words, ChatGPT is an AI solution powered by the GPT model. The GPT technology also powers products like OpenAI’s Codex, Copy.ai, and Jasper.
We’ll discuss the history of ChatGPT and the technology and company behind it. But first, let’s define the key terms, starting with large language models.
The Technology Behind ChatGPT
Large language models (LLMs) are neural networks trained on enormous datasets that can understand and generate human-like text. This technology falls under the generative AI category—models explicitly designed to generate output—rather than discriminative AI, which distinguishes and classifies various data types.
Early LLMs were based on recurrent neural networks (RNNs) since these were the first models to handle sequences like text. But their ability to remember previous words was limited, and the training process was slow.
Long short-term memory (LSTM) networks (a type of RNN) were introduced in 1997 as a solution to the limited memory problem. LSTMs demonstrated a significantly improved ability to remember longer sequences and became a popular model for natural language processing tasks. Still, their language capabilities were limited compared to recent solutions.
The transformer architecture behind today’s generation of LLMs was introduced in 2017 by a team of Google researchers. It uses attention mechanisms to track the position, order, and hierarchy of all words in a sentence, making it capable of retaining large amounts of contextual information and generating grammatically and semantically meaningful text.
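The attention idea can be illustrated with a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer architecture. This is a toy illustration, not OpenAI’s implementation; the shapes and random inputs are invented for the example:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: each output row is a weighted average
    of the value vectors V, where the weights reflect how relevant every
    other position in the sequence is to the current one."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over each row, so the weights for one position sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: a "sentence" of 4 tokens, each embedded in 8 dimensions
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)  # (4, 8): one context-aware vector per token
```

Because every token attends to every other token in parallel, the model retains contextual information across the whole sequence instead of passing it step by step as an RNN does.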
OpenAI’s generative pre-trained transformer (GPT) and Google’s Bidirectional Encoder Representations from Transformers (BERT) models are based on the transformer architecture.
Generative pre-trained transformers are transformer-based language models designed to understand language and produce human-like text. ‘Generative’ means they are designed to generate output, typically text or code. ‘Transformer’ implies they are based on the transformer architecture.
And pre-trained refers to GPT’s training process, which you can learn more about in our article ChatGPT: How to Understand and Compete with the AI Bot.
Now, let’s learn how ChatGPT was created.
The History of OpenAI
OpenAI was founded in 2015 by Elon Musk and Sam Altman (co-chairs), Greg Brockman (CTO), Ilya Sutskever (research director), and a group of research engineers and scientists. OpenAI started as a non-profit artificial intelligence research organization with the mission to develop artificial general intelligence (AGI) that benefits humanity.
In 2018, Elon Musk stepped down from the board of OpenAI but remained a significant investor, and Sam Altman became OpenAI’s CEO in 2019. Around the same time, the company restructured to a capped-profit model to attract new investors and accelerate the development of AI. The restructuring created the for-profit entity OpenAI LP, which remained under the control of the non-profit OpenAI Inc.
The new OpenAI CEO didn’t waste time. Shortly after stepping into his new role, Altman attracted Microsoft as an investor and minority owner—providing the resources to train and improve the AI systems behind today’s breakthroughs.
OpenAI’s subsequent exponential growth can be tracked through the development of its GPT models.
The History of ChatGPT
ChatGPT isn’t OpenAI’s only product. Other notable technologies it developed include (among others):
- DALL-E – an AI system that creates realistic images based on text descriptions. The first version was released in January 2021, and the second in April 2022.
- Codex – an AI system that translates natural language into code in multiple coding languages. It was built on top of the GPT-3 model, launched in partnership with GitHub in July 2021, and released to the public in August 2021.
- Whisper – an automatic speech recognition system that can transcribe audio in multiple languages and translate it into English. The model was released in September 2022—one month before the launch of ChatGPT.
OpenAI’s contributions to AI development are numerous, but ChatGPT’s release date will remain a milestone in the history of generative AI.
Introduced in June 2018, GPT-1 was OpenAI’s first transformer-based language model. With 117 million parameters, GPT-1 was among the most prominent language models at the time. The model used books as training data and could perform various tasks, including textual entailment, semantic similarity, reading comprehension, commonsense reasoning, and sentiment analysis.
OpenAI introduced GPT-2 in February 2019. The model had 1.5 billion parameters and was trained with information from the internet. It was able to perform a broader scope of tasks without task-specific training.
OpenAI initially refrained from releasing the complete model due to concerns about misuse. Instead, it gradually released smaller model versions for research purposes.
The company behind ChatGPT released GPT-3 in 2020. As of August 2023, it is the only GPT model that can be fine-tuned. GPT-3 has 175 billion parameters and much more powerful capabilities than the previous models. But concerns about its susceptibility to disinformation and bias continued.
So, instead of releasing an open-source model, OpenAI provided public access to GPT-3 through an API. This allowed third parties to use the underlying technology while OpenAI retained some control over access.
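In practice, API access means developers send prompts over HTTPS to OpenAI’s servers rather than downloading the model weights. The sketch below assembles such a request; the endpoint and field names follow OpenAI’s 2023-era completions API, while `build_request`, the placeholder key, and the example model name are illustrative:

```python
import json

API_URL = "https://api.openai.com/v1/completions"  # OpenAI's hosted endpoint

def build_request(prompt, api_key, model="text-davinci-003", max_tokens=50):
    """Assemble the headers and JSON body for a completion call.

    OpenAI never ships the weights: every generation happens on the
    company's servers, which is how it keeps control over usage.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # access gated by per-account keys
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})
    return headers, body

headers, body = build_request("Say hello.", api_key="sk-...")
# An HTTP client (e.g., requests.post(API_URL, headers=headers, data=body))
# would return JSON with the generated text under choices[0]["text"].
```

Gating access behind per-key authentication is what lets OpenAI rate-limit, monitor, and revoke usage—the “some control” the paragraph above refers to.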
Launched in January 2022, InstructGPT is a fine-tuned version of GPT-3. OpenAI’s primary goal with this model was to reduce offensive language and misinformation and provide answers that humans consider helpful.
GPT-3.5—the model behind ChatGPT—is a fine-tuned version of GPT-3 that can understand and generate natural language and code.
ChatGPT was released to the public in November 2022. The technical capabilities of InstructGPT and ChatGPT are almost identical. Both models were trained using the Reinforcement Learning from Human Feedback (RLHF) method. (Learn more about how ChatGPT was trained in our article, ChatGPT: How to Understand and Compete with the AI Bot.)
What made ChatGPT the internet service with the fastest-growing user base?
The only changes OpenAI made between the January and November releases were adding conversational training data and tuning the training process. But these adjustments made ChatGPT more user-friendly and capable of understanding user preferences.
OpenAI has also addressed malicious content issues, deeming ChatGPT safer for public use than the previous models.
OpenAI released its GPT-4 model to ChatGPT Plus paid subscribers in March 2023. The model significantly improved ChatGPT’s capabilities, especially for complex tasks. It also reflects OpenAI’s efforts to diminish the frequency of undesirable or harmful responses.
The most significant GPT-3.5 vs GPT-4 difference is the context window, which increased from around 3,000 words upon ChatGPT’s release to approximately 25,000 for GPT-4.
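The word counts above can be derived from the models’ token limits. Assuming the common rule of thumb of roughly 0.75 English words per token, and using the 4K-token context of GPT-3.5 at launch versus the 32K-token variant of GPT-4:

```python
# Rough words-per-context estimate. The 0.75 words-per-token figure is a
# widely used rule of thumb, not an exact conversion.
WORDS_PER_TOKEN = 0.75

for name, tokens in [("GPT-3.5 (4K)", 4_096), ("GPT-4 (32K)", 32_768)]:
    print(f"{name}: ~{int(tokens * WORDS_PER_TOKEN):,} words")
```

This yields roughly 3,000 words for GPT-3.5 and roughly 25,000 for GPT-4’s largest context, matching the figures above.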
In addition, the model produces more factually correct information, has fewer hallucinations, and is less likely to respond to sensitive requests or generate disallowed content.
Another notable improvement includes GPT-4’s ability to accept image inputs—although it can only provide text outputs in response. The following OpenAI product, however, takes multimodality to the next level.
Released in July 2023, Code Interpreter is OpenAI’s most recent AI system as of August 2023. It’s based on the GPT-4 model but introduces significant improvements.
Most notably, it can understand inputs and generate outputs in multiple formats (text, image, video, audio, code)—dramatically increasing its ability to comprehend information and produce the desired results.
ChatGPT became a global cultural phenomenon almost overnight, reaching unprecedented mainstream popularity. Using the momentum, OpenAI started releasing fine-tuned ChatGPT versions and new models much faster.
The GPT technology has now reached a peak—not in capability (its limitations are numerous) but in public expectations.
In a conversation with the MIT Technology Review, the OpenAI team revealed how they’re working to improve ChatGPT.
One of the most significant issues is jailbreaking—i.e., tricking ChatGPT into providing restricted information. The OpenAI team is working to teach the AI to ignore such requests via adversarial training: pitting two chatbots against each other, where one tries to make the other bypass its constraints, then using the outputs as training data for ChatGPT.
Another big issue with GPT models is factuality. Each AI tool is only as good as the data it was trained on. And no one has complete control over that, especially with a model of this magnitude. Selecting the training data is a sensitive issue and a determining factor in the model’s performance. Factuality will likely remain an issue, and anyone using ChatGPT and other similar technologies should be mindful of that.
GPT-4 came soon after the launch of ChatGPT, and rumors about GPT-5 have already started. OpenAI even filed a trademark application for GPT-5 in July 2023—under examination by the United States Patent and Trademark Office (USPTO). But OpenAI’s CEO Sam Altman said that the company isn’t working on the next model yet and has no timeline for its release, emphasizing the amount of work needed to address safety issues beforehand.
Impact and Implications
ChatGPT has changed the AI timeline forever. It sparked an increased interest in natural language processing, leading to a wave of research and accelerated technological development. The market is flooded with AI solutions, and many businesses have incorporated ChatGPT in their workflow.
It may be tempting to believe that an omnipotent AI will solve all problems by providing insights hidden from the human eye. But even the most flawless technology is only as good as the data it was trained on and the user’s prompt.
Despite our best efforts, creating genuinely unbiased AI is impossible because it will always hold the biases of its training data. So, we must never blindly trust an AI tool’s output; we must think logically and strategically about how and when to leverage it.
Used the right way, ChatGPT is an effective tool that increases productivity, boosts creativity, and empowers people with capabilities beyond their skills. But to unleash this power, individuals and organizations must learn how to use AI tools effectively.
ChatGPT: From History to Future
The release of ChatGPT spurred transformation not only in the development of AI but in our daily and work lives, too. To be part of this AI revolution, you must learn how to leverage the new technologies.
365 Data Science can equip you with the necessary skills to navigate and succeed in this AI-driven world. Master the technologies of tomorrow with our series of ChatGPT courses.
Start your journey with our Intro to ChatGPT and Generative AI course, then take your skills to the next level with Data Analysis in Power BI with ChatGPT and Data Analysis with ChatGPT Code Interpreter.
Sign up for 365 Data Science and start learning for free.