The Rise of Generative Models
- Joeri Pansaerts
- Oct 1, 2023 · 2 min read
- Updated: Oct 10, 2023
In recent times, the world has witnessed a significant shift in the capabilities of artificial intelligence, particularly with the rise of large language models (LLMs) like ChatGPT. These models have not only showcased their prowess in creative tasks like poetry, but have also demonstrated their potential in practical applications, such as planning vacations.
Diving into the technicalities, LLMs fall under a broader category known as foundation models. The term was coined by researchers at Stanford who noticed a paradigm shift in the AI domain. Previously, AI applications were built by training many separate models, each tailored to a specific task using task-specific data. Their prediction was that the industry would gradually transition to an approach where a single foundational model drives a multitude of use cases and applications. In other words, one model can be adapted to many different tasks, making it a remarkably versatile tool in the AI arsenal.
The magic behind this adaptability lies in the extensive training these models undergo. They are fed vast amounts of unstructured data, which teaches them to predict and generate content from prior inputs. In the language domain, for instance, the model is trained on sentences with the objective of predicting the next word from the preceding context. This predictive, generative capability is what places foundation models under the umbrella of generative AI.
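To make the next-word objective concrete, here is a minimal sketch of what that prediction looks like in code. It assumes the Hugging Face transformers library, with GPT-2 as a small stand-in for a much larger foundation model; the prompt is purely illustrative.

```python
# A minimal sketch of next-word prediction with a small causal language model.
# Assumes the Hugging Face `transformers` library; GPT-2 stands in for a much
# larger foundation model, and the prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Belgium is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence length, vocabulary)

# The model's guess for the word that concludes the sentence is simply the
# highest-scoring token at the final position.
next_token_id = logits[0, -1].argmax()
print(prompt + tokenizer.decode(next_token_id))
```

Generating longer passages is just this step repeated: the predicted token is appended to the input and the model is asked again.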
Interestingly, even though these models are fundamentally designed for generative tasks, they can be adapted to traditional natural language processing (NLP) tasks such as classification or named-entity recognition. One route is fine-tuning, where a small amount of labeled data adjusts the model's parameters for the specific task. In scenarios with little or no labeled data, these models can still be effective through prompt engineering alone, making them invaluable assets in diverse settings.
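As a flavour of the prompt-engineering route, the sketch below frames sentiment classification as a text-completion problem. The prompt template, the label set, and the use of GPT-2 are assumptions made for illustration; in practice a larger, instruction-tuned model would give far more reliable answers.

```python
# A hedged sketch of prompt engineering: coaxing a generative model into a
# classification task with no fine-tuning. Model, labels and template are
# illustrative assumptions, not a prescribed recipe.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def classify_sentiment(review: str) -> str:
    # Frame the classification task as a completion the model should finish.
    prompt = (
        "Decide whether the review is positive or negative.\n"
        f"Review: {review}\n"
        "Sentiment:"
    )
    completion = generator(prompt, max_new_tokens=3)[0]["generated_text"]
    # Keep only the text the model appended after the prompt.
    return completion[len(prompt):].strip()

print(classify_sentiment("The hotel was spotless and the staff were lovely."))
```

When the zero-shot answer is not good enough, the natural next steps are adding a few labeled examples inside the prompt (few-shot prompting) or fine-tuning on a small labeled set.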
The advantages of foundation models are manifold. Their performance is impressive, primarily because of the vast amounts of data they see during training; when adapted to a specific task, they can outperform models trained from scratch on limited task-specific data. The productivity gains are noteworthy too: because of their foundational nature, these models need far less labeled data to be tuned for a specific task, capitalizing on the enormous amount of unlabeled data they were pre-trained on.
However, every coin has two sides. The computational cost of training these behemoths is substantial, making it challenging for smaller enterprises to train a foundation model independently. Furthermore, once these models reach a certain size, the cost of running inference can become prohibitive. Trustworthiness is another concern: because these models are trained on vast amounts of data scraped from the internet, they risk internalizing biases, hate speech, or other toxic content. This lack of transparency and potential for bias poses challenges for trust and reliability.
The world of generative AI and foundation models is vast, brimming with potential, and poised to revolutionize various industries. I see endless possibilities and am excited about the value these advancements can bring to businesses and customers alike.