The State of Open Source Generative AI for Developers
Paolo Mainardi | 28 August 2024
Introduction
The definition of open source emerged during a different era for computer programming. Even though the open source movement had its roots in the 1970s and 1980s, by the end of the 90s, most commercial software and operating systems were ‘closed source.’ At the time, the most significant commercial software was almost entirely based on a closed-source paradigm, and it was battle-tested in the market. However, the invention of the internet, rapid market growth, and the hunger for innovation led to the rise of other models like Free Software and open source.
Initially, people did not fully understand what open source meant beyond simply sharing code 'for free.' They had yet to test many aspects, such as whether the model could succeed in business and how it could remain sustainable when code was considered the most important source of added value, based on general economic theories of scarce resource consumption and copyright laws.
This changed when projects like Linux and the economy built around it proved successful in the market. A similar shift occurred with Kubernetes 10 years ago, to take two examples from recent history.
Open source software is the de facto standard for modern software solutions and cloud technologies. It is so widespread that an average of 90% of organizations worldwide declare they use open source at a moderate, significant, or widespread level.
Open Source is no longer a ‘nice to have’ but a ‘must have.’ Even in the AI market, which gained massive momentum in 2023 and is still growing rapidly, “open source has shown the same impressive innovation and disruption already seen in other fields” (Gabriele Columbro, Europe Spotlight 2023 LF Research).
This of course comes with many new and unprecedented issues such as adapting the definition of open source Software to LLM models (which involve massive datasets, dedicated hardware, and code). However, this article will focus more on the technical aspects of the underlying systems instead of license issues, which deserve a dedicated article.
For more on the license topic, see the immense work by OSI on the OSSAI definition and related articles.
This post will instead explore LLM runtimes and open source models, development frameworks, vector databases, IDEs, and other open source utilities available today, as well as the current state of the GenAI programming ecosystem.
Specifically, the projects consist of four main categories:
- Runtime platforms and open source models.
- Vector databases.
- Development frameworks.
- IDE and development assistants.
They represent everything you need to build AI-based solutions, whether they are Retrieval-Augmented Generation (RAG) applications, chatbots, semi-autonomous agents, or semantic searches. These range from runtime platforms (e.g. Ollama), which are used to run actual LLM models such as Llama 3, Phi 3, Mistral, and Gemma 2, to IDE assistants such as GitHub Copilot, but in the open source sauce.
Before we examine the details in depth, let's discuss how we evaluate them.
Projects evaluation criteria
We can define an open source project as one that uses an open source license; this is the essential starting point for determining it. However, the license alone cannot express a project's health in terms of contributions, best practices, maintenance, support, and other non-functional aspects that are important and sometimes vital for the project's existence.
To gather those non-functional quality and security metrics, Scorecard from OpenSSF will run a set of security predefined checks against the project.Then, we will extract other data such as stars, commits, open issues, and PRs from the GitHub APIs to better understand the community and maintenance aspects
It is important to note that:
- The author did not rank or list selected projects as supported or recommended. He chose them based on his current knowledge of fast-growing projects.
- The unassessed data presented is not a scientifically relevant sample; it is only meant to give an objective metric of the projects presented.
You can find the code and list of projects here; anyone can contribute: https://github.com/paolomainardi/eval-genai-oss-prjs
When code is pushed, the GitHub Action workflow runs the evaluation code, saves the results as a pipeline artifact, and you can find it here.
You can also explore the latest data in this public Google Sheets file.
Runtime platforms and open source models
When we talk about inference engines and runtime platforms, we refer to low-level and high-level software that allows to run, manage and expose LLM models,via API or SDK.
Depending on the use case, we either have to choose how much we want to control the entire stack by fine-tuning every single bit, or we just use a more opinionated platform that already makes decisions for us. The latter option gives us less flexibility, but it allows us to develop our solution quickly.
This diagram shows all the components in action and how they interconnect.
Similar to commercial platforms like OpenAI, Mistral, or Anthropic, running an open source LLM locally today requires several well-coordinated and interconnected components.
This journey starts by choosing the inference engines that will run the LLM model. Depending on the vendor, they can also be very specialised or optimised for different kinds of hardware like CPUs or GPUs; this is a critical choice because a slow or unoptimised inference engine can compromise the overall quality of the project.
They are also in charge of supporting LLM model formats and architecture; even if the GGUF standard is slowly becoming the norm, there are many out there (GPTQ, AWQ, HuggingFace, etc.). The same goes for architectures: we all know that Transformers are the predominant ones, but other promising architectures are emerging (e.g. Mamba and RWKV), and models supporting them are already available, such as the European Codestral Mamba from Mistral.
Once the inference engine and model are up and running, we are still only halfway because to profit from the LLM models, we need ways to interact with it via API, CLI, or programmatically via SDK.
Generally, the inference engines already have the primitive functions to do that, even exposing APIs that are more or less compatible with OpenAI specs to make applications more portable. However, they focus more on developing or debugging than for production use. This is not always the case, but it is common.
This is why higher-level, more opinionated platforms are in place to abstract the underlying complexity of managing, running, and deploying engines, models, and APIs (trading less fine-tuning with higher development velocity and community). It is a similar approach to building an application platform from scratch or implementing it on top of Kubernetes.
Today, the several existing open source platforms (i.e. Ollama, LocalAI, GPT4ALL, and Jan) have more or less the same basic feature set but with precise technical accents for different categories of users, from developers to end users looking for an open source alternative to commercial platforms.
For example, Ollama is more developer-oriented, thanks to its close resemblance to the Docker CLI taxonomy and its ability to create and share models with the Modelfile.
The automated scanner analyzed the projects listed in the following table:
Project |
Category |
Stars |
Scorecard |
Language |
License |
---|---|---|---|---|---|
ollama/ollama |
Platform |
81539 |
4.6 |
Go |
MIT |
mudler/localai |
Platform |
22180 |
6.5 |
C++ |
MIT |
opendevin/opendevin |
Platform |
29794 |
5.3 |
Python |
MIT |
nomic-ai/gpt4all |
Platform |
67935 |
5.1 |
C++ |
MIT |
janhq/jan |
Platform |
21166 |
5.6 |
TypeScript |
AGPL-3.0 |
pytorch/torchchat |
Inference engine |
830 |
5.5 |
Python |
BSD-3-Clause |
ggerganov/llama.cpp |
Inference engine |
62638 |
4.4 |
C++ |
MIT |
vllm-project/vllm |
Inference engine |
23785 |
5.2 |
Python |
Apache-2.0 |
NVIDIA/TensorRT-LLM |
Inference engine |
7699 |
4.6 |
C++ |
Apache-2.0 |
BlinkDL/RWKV-LM |
Inference engine |
12049 |
3.7 |
Python |
Apache-2.0 |
AutoGPTQ/AutoGPTQ |
Inference engine |
4165 |
4.5 |
Python |
MIT |
mistralai/mistral-inference |
Inference engine |
9350 |
6.3 |
Jupyter Notebook (Python) |
Apache-2.0 |
huggingface/text-generation-inference |
Inference engine |
8487 |
4.9 |
Python |
Apache-2.0 |
microsoft/DeepSpeed-MII |
Inference engine |
1787 |
6.1 |
Python |
Apache-2.0 |
mlc-ai/mlc-llm |
Inference engine |
17941 |
5.2 |
Python |
Apache-2.0 |
Vector Databases
LLMs train with large amounts of public datasets, and thanks to this data, they are capable of generating valid and coherent text for the majority of general dialogue contexts.
This general built-in knowledge works well for generic dialogues, but it has its limits: it’s static and has a precise cut-off date based on the training date, which means that it can hallucinate and produce imprecise, false, or outdated information.
To overcome this problem, one technique used is the Retrieval-Augmented Generation (RAG), which connects the LLM to external data sources.
As explained here, RAG architectures work by retrieving data from an external source, processing it into an LLM context, and generating an answer based on the combined sources. This process includes three main stages: data preparation, retrieval, and generation.
Data preparation consists of multiple aspects, but at 10th feet, it is basically gathering data from different sources (PDF, XLS, doc, markdown, audio, video, etc.), converting them to plain text, separating them into meaningful chunks, and saving them in a database with its vectorised representation using specialised AI models specifically trained for this scope. There are many of them, and with different specialisations.
This process is Embedding, and it looks like this:
This is a very powerful process because once the text is stored alongside its vectorised representations, we can perform mathematical operations to search similar documents based on the distance between vector points.
Finally, after the data has been retrieved successfully, it is injected into the prompt as the context and sent to the LLM, which will elaborate the answers based on our data.
Now that we have a general overview of what RAG is and when and why we should use it, it is clear that this process requires a database capable of talking vectors, from the embedding process to the similarity functions.
In the last few years, several new projects have emerged either from scratch or as extensions of traditional projects such as PostgreSQL. They all offer the needed features and cover the simplest to the most advanced use cases, such as being capable of natively embedding media files (i.e. audio, video, and images).
Here is a list of the example scanned projects and some metrics.
Project |
Stars |
Scorecard |
Language |
License |
---|---|---|---|---|
chroma-core/chroma |
13814 |
4.6 |
Rust |
Apache-2.0 |
weaviate/weaviate |
10391 |
6.2 |
Go |
BSD-3-Clause |
pgvector/pgvector |
10738 |
5.6 |
C |
|
milvus-io/milvus |
28667 |
3.9 |
Go |
Apache-2.0 |
qdrant/qdrant |
19321 |
4.8 |
Rust |
Apache-2.0 |
Development frameworks
Whether you are building an agent, a chatbot, a semantic search, or any other use case for integrating an LLM into a development workflow, you’ll need many components. These include working with data, importing from different sources, converting it to text, chunking it into smaller pieces, and saving it in a vector database. You may need to repeat this process multiple times because data frequently changes.
Once the data pipeline is in place, it must connect to the LLMs. These can be varied and specialised, such as embedding, vision, code, and text, each with different APIs, weights, and configurations.
When applications become complex, and there are multiple interactions between data and models (e.g. managing the history of a conversation to always fit the context window), they require specific data pipelines and dedicated functions.
So, as it usually happens when new technologies emerge in the market, we can think about how the invention of Docker and Kubernetes just ten years ago and how big the CNCF landscape is now (even if it initially looked like it was just about packaging applications in containers and deploying them somewhere in the cloud).
Breaking-change technologies like containers in the past and Generative AI (GenAI) with LLMs now have the potential to reshape the entire tech industry in just a few years.
These advancements require an accelerated pace across all sectors, including developing new frameworks and supporting tools within the ecosystem.
New open source frameworks for the largest language ecosystem have surfaced in the last few years. These range from comprehensive frameworks covering all aspects, such as Langchain, to specialised ones such as Microsoft Autogen.
Here is the list of analysed projects ranked by GitHub stars.
Project |
Stars |
Scorecard |
Language |
License |
---|---|---|---|---|
langchain-ai/langchain |
89878 |
6.1 |
Python |
MIT |
run-llama/llama_index |
33992 |
6.1 |
Python |
MIT |
microsoft/semantic-kernel |
20860 |
7.5 |
C# |
MIT |
microsoft/autogen |
28816 |
7.1 |
Jupyter Notebook |
CC-BY-4.0 |
deepset-ai/haystack |
14845 |
5.9 |
Python |
Apache-2.0 |
vercel/ai |
8819 |
6.4 |
TypeScript |
Apache-2.0 |
superagent-ai/superagent |
5008 |
4.3 |
TypeScript |
MIT |
spring-projects/spring-ai |
2628 |
5.6 |
Java |
Apache-2.0 |
cheshire-cat-ai/core |
2185 |
5.4 |
Python |
GPL-3.0 |
theodo-group/LLPhant |
700 |
5.8 |
PHP |
MIT |
IDE and development assistants
We can trace the concept of automated code assistant or autocompletion back to the early 1970s; in fact, the first code editor to incorporate a simple auto-completion feature was the Vi editor (they released the first version in 1976), even though they did not design it explicitly for code.
Between the late 1980s and the early 1990s, Borland’s Turbo Pascal and Turbo C++ IDEs included features that one can consider an early form of code completion, like syntax highlighting and keyword completion.
In the late 1990s, with Visual Studio 97, Microsoft pushed the boundaries of what a code suggestion tool could do. It invented IntelliSense technology and provided more advanced code suggestions, including method and property suggestions, parameter information, and more. Today, IntelliSense is a generic term to describe features across different environments.
These tools are powerful, and they can really help developers' efficiency and productivity. They can inspect the codebase, suggest smart autocompletion, and provide refactoring and coding functionalities. Considering that open source projects today consist of thousands of dependencies, it would be difficult to think of building solutions without any automated assistance.
The landscape of code assistance has evolved dramatically with the introduction of tools like GitHub Copilot, which then sparked interest in creating open source models specifically trained for code assistance, such as Codestral, Codestral Mamba, Google CodeGemma and Alibaba Qwen1.5.
Unlike traditional IntelliSense, which offers deterministic suggestions based on static analysis, the GenAI code models leverage their ability to produce text based on the context; as a negative effect, they can hallucinate precisely as they do with text generation.
They are capable of producing relevant code with autocompletion based on the current context; they do this in the form of entire lines or blocks of code.
Another very powerful thing they can do is explain code, which is a superb way to understand a complex codebase or speed up the onboarding of new team members. They can also write automated tests and automatic documentation based on the context.
Let’s see this in action with Ollama and Codestral from a simple terminal session: asking to explain the famous Fast inverse square root function:
❯ ollama run codestral:latest
>>> ‘‘‘can you explain to me this function ?
...
... float Q_rsqrt( float number )
... {
... long i;
... float x2, y;
... const float threehalfs = 1.5F;
...
... x2 = number * 0.5F;
... y = number;
... i = * ( long * ) &y;
... i = 0x5f3759df - ( i >> 1 );
... y = * ( float * ) &i;
... y = y * ( threehalfs - ( x2 * y * y ) );
...
... #ifndef Q3_VM
... #ifdef __linux__
... assert( !isnan(y) );
... #endif
... #endif
... return y;
... }
... ‘‘‘
This function, named `Q_rsqrt`, is an approximation of the reciprocal square root (1/sqrt(x)) for a floating point number. It's often used in computer graphics to perform fast inverse square roots. The calculation involves some bit manipulation and a simple iterative algorithm for refining the initial guess.
Here's a step-by-step explanation of the function:
- `float x2 = number * 0.5F;` - This line calculates half of the input number,
which is used in later steps.
- `long i = * ( long * ) &y;` - Here, we're treating the floating point number as a long integer by reinterpreting its binary representation. This line gets us the integer bits of the float.
- `i = 0x5f3759df - ( i >> 1 );` - This is the magic number method for generating an initial guess of 1/sqrt(number). The magic number `0x5f3759df` was determined through a specific mathematical process, and subtracting the right-shifted version of `i` from it provides this initial guess.
- `y = * ( float * ) &i;` - This line converts our integer guess back into floating point representation.
- `y = y * ( threehalfs - ( x2 * y * y ) );` - This step performs one iteration of the Newton-Raphson method to refine our initial guess. The Newton-Raphson method is a way to find successively better approximations to a function's roots, and here it's being used to improve our reciprocal square root approximation.
- Finally, we return our refined guess `y` as the approximate reciprocal square root of the input number.
This function is often used in performance-critical applications because it provides a fast and acceptable approximation while using fewer computational resources than more precise methods.
>>> do you know who is the author of this function ?
Unfortunately, I don't have information about the author of this particular `Q_rsqrt` function. The provided implementation looks somewhat similar to a
variant of the ‘Fast Inverse Square Root’ algorithm, which was popularised by Quake III Arena developers (id Software) and is often attributed to John
Carmack. However, it's important to note that the specific version you've shared may have been developed by someone else or modified in some way.
This is a very simple example demonstrating the knowledge and capabilities of models, how easy it is to have something up and running in a matter of seconds (or minutes to download the models), and how they express their full potential when integrated into an IDE. Like other models, code models suffer from the training cut-off date, limiting their knowledge to the time when they received training.
They can overcome this limitation using the same RAG techniques described earlier for text models; this is exactly what Continue allows you to do with their context providers. It is important to note that RAG not only adds updated information, but it also has the capability to specialise the LLM by adding domain-specific information, such as private documents or code. Unlike other categories, the number of open source projects here is smaller (at least in this dataset). Still, it is growing quickly and already looks promising when competing with commercial solutions.
Project |
Stars |
Scorecard |
Language |
License |
---|---|---|---|---|
sourcegraph/cody |
2316 |
4.4 |
TypeScript |
Apache-2.0 |
continuedev/continue |
13838 |
4.5 |
TypeScript |
Apache-2.0 |
Conclusions
In conclusion, the open source ecosystem for GenAI for developers looks promising, and it is rapidly evolving in all sectors, from runtime platforms and inference engines to vector databases and development frameworks.
Even in this space, open source is proving to be the best option, allowing communities and companies to collaborate in unlocking the full potential of technologies that might otherwise be controlled by a few individuals.
However, training models requires a vast amount of costly hardware; Meta said that to train Llama 3.1, their latest flagship open source model needed over 16,000 of Nvidia’s H100, which means spending up to $640 million. This means that there is limited competition in this space. The training techniques will probably see a lot of research and development in the following months and years, pushing forward alternatives to transformers like Mamba or RWKV, which could simplify and make training or fine-tuning of models less expensive and more accessible to anyone.
Another important aspect that we did not discuss in this article but is in some ways crucial regards all the ethical questions and the license aspects of the GenAI technologies, either because new regulations are becoming a fact, like the European AI Act, or because the definition of what really is open source for AI is still changing, thanks to the work of the OSI Foundation.
As the field continues to grow, we can expect even more innovative solutions and improvements in the existing tools, further lowering the barrier to entry for developers looking to leverage AI in their applications. This means that in the following years, most of the codebase will have some AI-generated parts, saving developers time to focus on the business logic. Still, it also means opening the doors to malicious, insecure code and increasing the supply chain risks. OpenSSF has a dedicated working group that deals with AI/ML security.
We have entered a new era for society: these technologies have the potential to impact society the same way the Internet has.
We are lucky to be technologists and scientists in this era when open source is the predominant development model, and everyone has the opportunity to participate in it.
This is the movement's most significant success: a goal that seemed impossible when open source was taking its first steps on the market.
Similar Articles
Browse Categories
Newsletter Blog Announcements My Open Source Insights Blog 2024 LF Europe