Why do we have so many GenAI Foundation Models?


Mélony Qin Published on July 29, 2025

Can you believe that there are more than 325,000 models on Hugging Face now? Given the recent hype around AI, new foundation models are appearing every day. But here’s the real question: have we reached the point where too much of a good thing harms us more than it benefits us? In this post, we will explore the world of GenAI foundation models and see whether they have gone beyond innovation into information overload. Is this a paradox of choice that only leads to confusion and makes life more complicated rather than easier? Let’s find out.

Stay tuned because this is something closer to today’s reality than you might think.

What are Foundation Models?

So, what exactly are foundation models? Think of them as the Swiss Army knives of artificial intelligence. They are very large, general-purpose AI systems trained on a broad and varied range of data. You can picture them as the foundation stones for building more specialized AI solutions. These models can understand and generate output in multiple formats, such as text, images, and audio. That is why they are called ‘foundation’ models: they provide a base on which almost anything else can be built.

But here is the interesting part. These models are not single-task machines. They have been trained on such a diverse range of data that they can be fine-tuned for many different purposes. Whether you want to write an article, generate a piece of art, or support an extensive scientific analysis, chances are there is a suitable foundation model waiting for you.

A few of them are better than others

Well-known examples include OpenAI’s GPT-4, Google’s Gemini, and Meta’s LLaMa 2. These are some of the major names in the field, and each has strengths in different areas. For example, GPT-4 is celebrated for its text generation ability, Gemini for its multimodality, and LLaMa 2 as a large language model (LLM) trained on vast quantities of text. All three are remarkably flexible, yet each has its own niche. With so many models fitting different requirements, are we reaching the stage where more choice is actually counterproductive? Well, that’s exactly what we are here to find out.

Parameters and model size

Many people use terms like “parameters” and “size” when discussing foundation models, such as LLaMa 2.
Parameters are the weights on the connections between a model’s “neurons,” which the model adjusts during training to learn.
A model teaches itself and processes data by optimizing these parameters.
More parameters make a model more complex and versatile, much like extra neurons complicate brain activity.

Foundation models such as LLaMa 2 are large-scale machine learning models that act as the foundation for many AI solutions. They should be seen as unprocessed products, or “blanks,” like high-quality steel ready for shaping.
These models are highly customizable and can fit almost any application, including language or image generation.
For example, GPT-4 was trained on a very large body of text to master many language structures.
The scale of these models is described with labels such as “7B,” where “B” stands for billion parameters in the model.

More parameters mean more computation, so larger models consume more computational resources. For example, LLaMa 2 with 70 billion parameters is capable but requires a lot of computational power to run. Knowing these basics helps you assess what a model can do for a given task and follow the growing trends in the field of artificial intelligence.
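
To make the link between parameter count and hardware concrete, here is a minimal sketch that estimates how much memory is needed just to hold a model’s weights. It assumes every parameter is stored at the same precision and ignores everything else a running model needs (activations, caches, framework overhead), so treat the numbers as a lower bound. The function name `weight_memory_gb` and the precision table are illustrative choices, not part of any library.

```python
# Rough memory needed just to hold a model's weights.
# Assumption: every parameter is stored at the same precision, and
# runtime overhead (activations, caches) is ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        print(f"{name}: about {weight_memory_gb(params):.0f} GB at fp16")
```

Under these assumptions, a 7B model needs roughly 14 GB for its weights at 16-bit precision, while a 70B model needs roughly ten times that, which is why the larger variants rarely fit on a single consumer GPU.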

How are those AI models trained?

But what does it take to train these giants? Training a foundation model is not an easy task. It is not something you can do in a few days or even a week, and it demands a lot of computation. For example, Meta released LLaMa 2 in July 2023, but it was not developed overnight. It was trained on a supercluster with about 6,000 GPUs. Yes, you read that right: a whopping 6,000 GPUs!

Myth of GPU hours

“GPU hours” count the time a Graphics Processing Unit spends training a model.
If a model took 184,320 GPU hours and ran on a single GPU, training would take over 21 years.
In practice, many GPUs work together in parallel to speed up the process.
With one thousand GPUs, training would take about 184 hours, or just under eight days.
The LLaMa 2 13B model needed 368,640 GPU hours, which shows how much compute it takes to train big models.
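
The arithmetic behind these figures can be sketched in a few lines. This is a simplified illustration that assumes perfect linear scaling, meaning that doubling the GPUs halves the training time; real clusters lose some efficiency to communication between GPUs. The helper name `wall_clock_hours` is made up for this example.

```python
HOURS_PER_YEAR = 24 * 365


def wall_clock_hours(gpu_hours: float, num_gpus: int) -> float:
    """Convert a GPU-hour budget into elapsed hours.

    Assumption: perfect linear scaling across GPUs, which real
    training runs only approximate.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return gpu_hours / num_gpus


if __name__ == "__main__":
    budget = 184_320  # GPU hours, the figure used in the text
    years_on_one_gpu = wall_clock_hours(budget, 1) / HOURS_PER_YEAR
    hours_on_thousand = wall_clock_hours(budget, 1000)
    print(f"1 GPU: about {years_on_one_gpu:.1f} years")
    print(f"1,000 GPUs: about {hours_on_thousand:.0f} hours")
```

The same function works for any budget, so you can plug in the 368,640 GPU hours quoted for the 13B model and see how the elapsed time changes with cluster size.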

Let’s compare with hypothetical LLaMa 3 models.
A LLaMa 3 8B model, with 8 billion parameters, might take about 200,000 GPU hours for training.
It has more parameters and operations than LLaMa 2 7B, and is closer to a “big model.”
Still, it remains reasonable in terms of required computation.

Now let us take the example of LLaMa 3 70B. Following this trend, this model could require even higher computation, potentially reaching or surpassing 2 million GPU hours. Wow, that’s a lot of resources, a lot more than what was required for LLaMa 2 70B.

New models like LLaMa 3 sharpen AI capabilities, but Microsoft takes a different approach with Phi-3.
Phi-3-mini, a Small Language Model (SLM) with 3.8 billion parameters, prioritizes both performance and efficiency.
You can even run it on smartphones or other low-powered devices, unlike the largest LLaMa 3 models, which need massive computing power.
Phi-3 delivers broadly comparable performance without the high resource requirements.

Phi-3 achieves this balance by using advanced data processing, smart model design, and post-training steps such as RLHF (Reinforcement Learning from Human Feedback).
These techniques help Phi-3 perform efficiently, even when compared to much larger models, and make it ideal for limited-resource scenarios.
LLaMa 3 sits at the top for large-scale AI, while Phi-3 points to a future where AI becomes more scalable, flexible, and efficient.

Open vs Closed AI Models

However, not all foundation models are the same. Some are open source, meaning that anyone can access them, modify them, and even contribute to their development.

A little kicker about open-source AI

However, it is crucial to understand that even free models are released under licenses that define how you may use them. For instance, Mistral 7B is an open model released under the Apache 2.0 license.

While some open-source models are available for anyone to use, other models are accessible only through the company that owns them. These models are often called ‘closed’: access is tightly controlled by the owner, and use is usually restricted to people who have acquired a license or work within the owner’s ecosystem.

Open models, such as those available on the Hugging Face platform, are there for everyone. They’re like a playground for developers and researchers. These models are not set in concrete; you can adapt them to your company’s requirements and even improve them by adding new functions or optimizing how they run. It’s a community effort, and it’s incredibly powerful.

Premium AI models

On the other hand, proprietary models are like the VIP section of the club. They are powerful but restricted: only those with a license or an access key can get in. For instance, GPT-4 belongs to OpenAI and comes with strict usage terms, while Google’s Gemini model is owned and controlled by Google. You can’t freely change or adjust them for your needs; rather, you are often tied into the company’s ecosystem. Although these models may draw on more data and the latest tools and techniques, that privilege comes at a price: you do not have complete control over the process.

Thus, the question arises: is there a need for both open-source and proprietary models in the modern world? State-of-the-art open models like LLaMa 3 are flexible, affordable, and have an active development community. They could also potentially become the standard for the industry in the same way that Linux did, as Mark Zuckerberg mentioned. While proprietary models have the backing of big companies with great resources, the rise of powerful open-source alternatives raises the question: are we making things unnecessarily complicated by clinging to too many proprietary solutions?

Locally vs Cloud-Hosted Models

Here’s another consideration: where would you like to host your AI? In your local environment or in the cloud?

It’s still a judgement call

Running models locally gives you complete autonomy over them. You run them on your own terms and can adjust the settings to your preference. But here’s the catch: running AI models on a local machine requires quite substantial hardware. We’re talking about powerful, high-memory GPUs, and they are not cheap. Another aspect is scalability: the ability to add computing power as you go. Managing this locally can be a massive undertaking, especially once you consider the storage your data needs and the hardware required for model acceleration.

Cloud has an advantage in scaling

In contrast, cloud-hosted models can be more flexible. They do not require you to buy and install costly hardware. Instead, you can run your AI workloads on clouds like AWS, Azure, or Google Cloud, which let you expand your compute capacity as your requirements grow. Sounds convenient, right? But there’s a downside: latency. When your model sits behind APIs in the cloud, there is always some lag because data has to travel back and forth between you and the servers.

Suppose you are talking to a human-like robot powered by an AI in the cloud. You say, ‘Hey, give me some information about Paris,’ and then you wait a few seconds. The response is not immediate because your request has to travel all the way to the server and be processed before the reply is sent back to you. This back-and-forth can cause uncomfortable pauses that make the conversation feel less natural.
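
If you want to see how much of that pause comes from the round trip, a simple approach is to time the call yourself. The sketch below is only an illustration: `fake_cloud_model` is a hypothetical stand-in for a real API call, and its 50 ms delay is a made-up number that imitates network travel plus server-side processing. In practice you would pass your own client function to `timed_call`.

```python
import time
from typing import Any, Callable


def timed_call(fn: Callable[..., Any], *args: Any) -> tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


def fake_cloud_model(prompt: str) -> str:
    # Hypothetical stand-in for a cloud API call: the sleep imitates
    # network travel plus server-side processing (50 ms is made up).
    time.sleep(0.05)
    return f"Here is some information about: {prompt}"


if __name__ == "__main__":
    reply, ms = timed_call(fake_cloud_model, "Paris")
    print(f"{reply} ({ms:.0f} ms)")
```

Timing the same prompt against a local model and a cloud endpoint gives you a like-for-like comparison instead of a gut feeling.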

Customizing Your AI Models

Another advantage of foundation models is their flexibility. You can easily modify these models and customize them to meet your specific needs. For example, if you want to build a customer service chatbot, you can start with a foundation model and fine-tune it on your own dataset. In this way, you create a chatbot tailored for your business, which makes foundation models even more attractive.

When you use AI models, you need to consider two key factors: context length and model format. Context length refers to the amount of text, measured in tokens, that a model can take into account at once during a conversation. Each model has its own context window, which you cannot change. For instance, Claude from Anthropic and Gemini from Google each have different context lengths that affect how they handle long, dense dialogues.
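
Because the context window is fixed, applications usually trim older messages so the conversation still fits. Here is a minimal sketch of that idea. It assumes a crude word count as a stand-in for real tokenization (actual tokenizers split text differently and vary by model), and both `count_tokens` and `trim_history` are illustrative names rather than functions from any library.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: real tokenizers split text differently,
    # so treat this word count as an approximation only.
    return len(text.split())


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the window."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # the next older message would overflow the window
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        "hello there",
        "tell me about Paris",
        "Paris is the capital of France",
    ]
    print(trim_history(history, max_tokens=10))
```

Dropping the oldest messages first is only one possible policy; summarizing older turns is another common choice when early context still matters.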

Model format affects how you package, deliver, and run a model. Today, you can rely on tools like Hugging Face’s TGI (Text Generation Inference) container or the GGUF format to support and deploy your models more easily. These modern solutions allow you to scale between local hardware and cloud services without needing more infrastructure. As a result, you gain more control and efficiency while keeping hardware requirements lower.

AI Orchestration and Tools

Once you have fine-tuned your model, the next step is to make the surrounding processes seamless. That is where AI orchestration tools like LangChain, LlamaIndex, and Semantic Kernel come in. LangChain helps coordinate composite tasks that involve multiple AI models. LlamaIndex improves data queries and helps make sure your model gets the right information. Semantic Kernel makes it easier to build complex systems where multiple AI agents work together on different tasks. Such tools are crucial when you design systems in which several AI models cooperate.

Picture several AI models working in parallel, like a team of employees each with their own responsibilities. AI orchestration tools help every model do its job well and at the right time, much like coordinating a group assignment.
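
To give a feel for what orchestration means in code, here is a deliberately tiny sketch in plain Python. It is not the API of LangChain, LlamaIndex, or Semantic Kernel: the three step functions are hypothetical stand-ins for real models or services, and the steps run one after another rather than in parallel to keep the example short.

```python
from typing import Callable

# A step takes text in and hands text to the next step.
Step = Callable[[str], str]


def run_pipeline(steps: list[tuple[str, Step]], user_input: str) -> tuple[str, list[str]]:
    """Pass the input through each named step in order and record the order."""
    trace: list[str] = []
    data = user_input
    for name, step in steps:
        data = step(data)
        trace.append(name)
    return data, trace


# Hypothetical stand-ins for real models or services.
def retrieve_context(question: str) -> str:
    return question + " | context: Paris is the capital of France."


def draft_answer(prompt: str) -> str:
    return "Draft answer based on: " + prompt


def review_answer(answer: str) -> str:
    return answer.strip() + " (reviewed)"


STEPS: list[tuple[str, Step]] = [
    ("retrieve", retrieve_context),
    ("draft", draft_answer),
    ("review", review_answer),
]

if __name__ == "__main__":
    result, trace = run_pipeline(STEPS, "Tell me about Paris")
    print(" -> ".join(trace))
    print(result)
```

Real orchestration frameworks add the parts this sketch leaves out, such as retries, branching, memory, and running steps concurrently, but the core idea of handing each model’s output to the next one is the same.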

Agentic age is coming

Thus, we again ask ourselves: Do we have too many GenAI foundation models? This question is very much relevant and may not be easily answered. On one side, the increase in the number of models offers more choices, freedom, and scope for creativity. However, the availability of such models can be overwhelming, and thus it becomes difficult to choose the model that best fits a given problem.

The fact of the matter is that foundation models are here to stay. Not only developers but also business owners and researchers can benefit greatly from these models. However, with so much to see and do, it’s important to navigate sensibly.

The real challenge does not end at choosing the right model. It also requires understanding how these different models can complement each other. The agentic age is coming! Envision a world where, rather than having to select a single model to drive all your automations, these models are integrated cleanly and efficiently into one super AI system. It’s a fascinating topic, and I will explore it in a follow-up blog post.

Looking forward

What do you think? Are we in danger of overcomplicating things by having too many different GenAI foundation models or is it necessary to have a variety of options to develop new ideas? So, how does one begin to sort through the massive amount of options available within the AI spectrum? Please feel free to follow me here on Medium and subscribe to my newsletter or my YouTube channel if you’d like to learn more, and share your thoughts in the comment section. I had to take some time off due to personal circumstances, but I will continue to practice my entrepreneurship muscle every week here in 2025! Stay tuned, and see you in the next one!