Comparing Haystack and LlamaIndex: Two Popular LLM Frameworks

Jul 12, 2024

As large language models (LLMs) become increasingly important for AI applications, developers need robust frameworks to build with these models efficiently. Two popular options are Haystack and LlamaIndex. In a recent technical discussion, our team compared these frameworks based on hands-on experience building a toy chatbot REST API. Here are some key takeaways:

Architecture and Abstraction Level

Haystack provides a more opinionated, "on-rails" experience with clearly defined high-level components. It uses a pipeline architecture in which you connect components together. This can make development more straightforward, but also less flexible. Custom components can be added easily, but they must conform exactly to the input and output contract Haystack expects.
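
As a rough sketch of what that contract looks like, here is a minimal custom component written against the Haystack 2.x component API; the component itself (an uppercasing step) is a made-up example, not part of the project discussed here.

from haystack import component


@component
class TextUppercaser:
    """Toy custom component: inputs are typed run() parameters, outputs are declared up front."""

    @component.output_types(text=str)
    def run(self, text: str) -> dict:
        # The returned dict keys must match the declared output types,
        # otherwise the component cannot be wired into a pipeline.
        return {"text": text.upper()}

A component like this can then be registered with pipeline.add_component and wired to other components with pipeline.connect, exactly like the framework-provided ones.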

LlamaIndex offers a lower-level, more flexible approach. You have more freedom in how you connect different pieces, but this comes with increased complexity. As one developer noted, "LlamaIndex just feels a bit more low level...you're dealing with more of the details of the objects in the code rather than just using these higher level abstractions."

Below is the code for a ChatBot class implemented in both frameworks. On the surface they are very similar. In Haystack, pipelines and components share a uniform API: a run method is always provided to trigger the pipeline or component to perform its work, as seen below. In LlamaIndex, the call signature depends on the specific objects involved. Haystack also ships a component for rendering prompts, whereas LlamaIndex does not. Haystack's PromptBuilder is just a thin wrapper around Jinja2 and does not save many lines of code in practice; LlamaIndex leaves this detail to the developer, reflecting its lower-level, DIY design philosophy (a side-by-side comparison follows the two snippets).

Haystack:

class ChatBot:
    def __init__(self, url: str, model: str = DEFAULT_CHAT_MODEL) -> None:
        self.model = model
        self.url = url
        self.prompt_builder = PromptBuilder(template=DEFAULT_PROMPT_TEMPLATE)
        self.chat_llm = OllamaChatGenerator(model=self.model, url=self.url)

    def run(
        self, message: ChatMessageInput, history: list[ChatMessage], context: dict
    ) -> ChatMessageInput:
        prompt = self.prompt_builder.run(message=message.content, **context)
        prompt_message = _ChatMessage(content=prompt["prompt"], role=ChatRole.USER, name=None)
        _history = [_ChatMessage(content=m.content, role=m.role, name=None) for m in history]
        resp = self.chat_llm.run(_history + [prompt_message])
        reply_msg_input = ChatMessageInput(content=resp["replies"][0].content, role=ChatRole.ASSISTANT)
        return reply_msg_input

LlamaIndex:

class ChatBot:
    def __init__(
        self,
        url: str,
        model: str = DEFAULT_CHAT_MODEL,
        prompt_template: Template = DEFAULT_PROMPT_TEMPLATE,
    ) -> None:
        self.model = model
        self.url = url
        self._prompt_template = prompt_template
        self._chat_llm = Ollama(model=self.model, base_url=self.url, request_timeout=120.0)

    def chat(
        self, message: ChatMessageInput, history: list[ChatMessage], context: dict
    ) -> ChatMessageInput:
        prompt = self._prompt_template.render(message=message.content, **context)
        prompt_msg = _ChatMessage(content=prompt, role=MessageRole.USER)
        _history = [_ChatMessage(content=m.content, role=m.role) for m in history]
        resp = self._chat_llm.chat(messages=_history + [prompt_msg])
        reply_msg_input = ChatMessageInput(content=resp.message.content, role=MessageRole.ASSISTANT)
        return reply_msg_input
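
To make the prompt-rendering point above concrete, the two snippets below are roughly equivalent; the template string and variables are illustrative, not taken from the project.

from jinja2 import Template
from haystack.components.builders import PromptBuilder

TEMPLATE = "Context: {{ context }}\nQuestion: {{ message }}"

# Haystack: PromptBuilder wraps the Jinja2 template; run() returns {"prompt": "..."}
builder = PromptBuilder(template=TEMPLATE)
haystack_prompt = builder.run(message="What is RAG?", context="...")["prompt"]

# LlamaIndex project: the same template is rendered directly with Jinja2
llamaindex_prompt = Template(TEMPLATE).render(message="What is RAG?", context="...")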

The code samples below present a VectorDatabase class implemented in both frameworks. This class orchestrates operations against the underlying vector store. The pipeline-oriented design of Haystack is clearly in evidence here: the code is easy to understand and predictable for anyone familiar with the paradigm. On the other hand, the Haystack version is rather verbose, since many API calls are needed to set up the declarative pipelines. LlamaIndex, by contrast, is not structured around pipelines, so the constructor code is much simpler: it defines a VectorStoreIndex from the WeaviateVectorStore we provide, which is then used for all subsequent operations.

Haystack:

class VectorDatabase:
    def __init__(
        self,
        document_store: WeaviateDocumentStore,
        embedding_model: str = DEFAULT_EMBEDDING_MODEL,
        similarity_top_k: int = 10,
    ) -> None:
        self._document_store = document_store
        self._embedding_model = embedding_model
        self._similarity_top_k = similarity_top_k
        self._indexing_pipeline = self._build_indexing_pipeline()
        self._query_pipeline = self._build_query_pipeline()

    def _build_indexing_pipeline(self) -> Pipeline:
        pipeline = Pipeline()
        pipeline.add_component("loader", PyPDFToDocument())
        pipeline.add_component("cleaner", DocumentCleaner())
        pipeline.add_component(
            "splitter", DocumentSplitter(split_by="sentence", split_length=20, split_overlap=5)
        )
        pipeline.add_component("embedder", OllamaDocumentEmbedder(model=self._embedding_model))
        pipeline.add_component("writer", DocumentWriter(document_store=self._document_store))
        pipeline.connect("loader", "cleaner")
        pipeline.connect("cleaner", "splitter")
        pipeline.connect("splitter", "embedder")
        pipeline.connect("embedder", "writer")
        return pipeline

    def _build_query_pipeline(self) -> Pipeline:
        pipeline = Pipeline()
        text_embedder = OllamaTextEmbedder(model=self._embedding_model)
        pipeline.add_component("text_embedder", text_embedder)
        pipeline.add_component(
            "retriever", WeaviateEmbeddingRetriever(
                document_store=self._document_store,
                top_k=self._similarity_top_k,
            )
        )
        pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
        return pipeline

LlamaIndex:

class VectorDatabase:
    def __init__(
        self,
        vector_store: WeaviateVectorStore,
        embedding_model: str = DEFAULT_EMBEDDING_MODEL,
        similarity_top_k: int = 10,
    ) -> None:
        self._vector_store = vector_store
        self._embedding_model = OllamaEmbedding(
            model_name=embedding_model,
            base_url=os.getenv("OLLAMA_URL", "http://localhost:11434"),
        )
        self._similarity_top_k = similarity_top_k
        self._storage_context = StorageContext.from_defaults(vector_store=vector_store)
        self._index = VectorStoreIndex(
            nodes=[],
            storage_context=self._storage_context,
            embed_model=self._embedding_model,
        )
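
For context, the document_store and vector_store arguments passed to these constructors come from a running Weaviate instance. A minimal sketch of that wiring might look as follows; the URL and index name are assumptions rather than values from the project, and in practice each snippet lives in its own codebase.

import weaviate
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Haystack: the document store talks to Weaviate directly via a URL (assumed local instance)
document_store = WeaviateDocumentStore(url="http://localhost:8080")

# LlamaIndex: a weaviate-client (v4) connection is handed to the vector store wrapper
client = weaviate.connect_to_local()
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Papers")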

Loading data with the VectorDatabase.load_data method looks as follows. In the Haystack version, we simply pass data to the indexing pipeline we set up earlier; the workflow has already been defined, so the calling code is quite simple. In the LlamaIndex version, no indexing workflow was defined ahead of time, so the code has a more imperative flavour. It is still very readable, though, and there is no real difference in lines of code.

Haystack:

    def load_data(self, data_dir: str) -> None:
        pdf_paths = [
            os.path.join(data_dir, file) for file in os.listdir(data_dir) if file.endswith(".pdf")
        ]
        for path in pdf_paths:
            self._indexing_pipeline.run({"sources": [path]})

LlamaIndex:

    def load_data(self, data_dir: str) -> None:
        papers = SimpleDirectoryReader(data_dir).load_data()
        parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
        nodes = parser.get_nodes_from_documents(papers)
        self._index.insert_nodes(nodes)

Querying the data with the VectorDatabase.query method then looks as follows. Again, in Haystack we use the pipeline we already defined, which makes for a more declarative workflow: return the results of the predefined pipeline given these arguments. In LlamaIndex we have to do a bit more work in this method, reproducing the steps that the Haystack pipeline encapsulates: first we embed the query; then we create a query object from the embedding and the other query parameters; and finally we submit the query to the vector store and return the results. A usage sketch follows the two snippets.

Haystack:

    def query(self, query: str, filters: _t.Optional[dict] = None) -> list[Document]:
        _filters = filters or {}
        results = self._query_pipeline.run(
            {"text_embedder": {"text": query}, "retriever": {"filters": _filters}}
        )
        documents = results["retriever"]["documents"]
        return documents

LlamaIndex:

    def query(self, query: str, filters: _t.Optional[MetadataFilters] = None) -> list[BaseNode]:
        query_embedding = self._embedding_model.get_text_embedding(query)
        vdb_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            filters=filters,
        )
        query_result = self._vector_store.query(vdb_query)
        return query_result.nodes
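
Despite those internal differences, the two implementations end up exposing the same surface. A hypothetical usage, identical in either codebase apart from the constructor argument, might look like this:

# Hypothetical usage; the data directory and question are made up
vdb = VectorDatabase(document_store)   # Haystack (pass vector_store in the LlamaIndex version)
vdb.load_data("data/papers")
results = vdb.query("What is retrieval augmented generation?")
print(f"retrieved {len(results)} chunks")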

Ease of Use vs Flexibility

Haystack's opinionated approach makes it easier to get started and ensures consistent patterns across projects. However, it may be limiting for more custom use cases.

LlamaIndex provides more flexibility, allowing developers to structure their code as they see fit. But this flexibility comes at the cost of more decisions to make and potentially inconsistent code across projects.

Ecosystem and Community

LlamaIndex appears to have a larger, more active community. It has more GitHub stars and contributors compared to Haystack. This translates to a wider range of components and integrations available.

However, Haystack's smaller, more curated ecosystem means potentially higher quality and more consistent components. With LlamaIndex, there's a greater variety but also more uncertainty about the quality and maintenance of community contributions.

Maintenance and Longevity

There were some concerns raised about Haystack's long-term sustainability due to its smaller community. As one developer put it, "I have a bit of a concern about Haystack's sustainability. Right now the project is healthy, but we need to keep an eye on it going forward."

LlamaIndex's larger community could mean better long-term maintenance, but it also leads to a more rapidly changing ecosystem that could introduce instability.

Dependencies and Project Structure

An important aspect to consider when choosing a framework is its dependency footprint. This affects not only the initial setup but also ongoing maintenance, Docker image sizes, and potential conflicts with other libraries. Let's examine the differences in dependencies between Haystack and LlamaIndex based on each project's pyproject.toml file:

Haystack:

[tool.poetry.dependencies]
python = "^3.12"
fastapi = "0.110.3"
haystack-ai = "^2.1.2"
ollama-haystack = "^0.0.6"
pypdf = "^4.2.0"
uvicorn = "^0.29.0"
weaviate-client = "^4.5.7"
weaviate-haystack = "^2.0.0"

LlamaIndex:

[tool.poetry.dependencies]
python = "^3.12"
fastapi = "0.110.3"
jinja2 = "^3.1.4"
llama-index-core = "^0.10.38.post2"
llama-index-embeddings-ollama = "^0.1.2"
llama-index-llms-ollama = "^0.1.5"
llama-index-readers-file = "^0.1.23"
llama-index-vector-stores-weaviate = "^1.0.0"
pypdf = "^4.2.0"
uvicorn = "^0.29.0"
weaviate-client = "^4.5.7"

Modular vs Monolithic Approach

LlamaIndex takes a more modular approach to dependencies. Instead of a single, large package, it splits functionality into smaller, focused packages (llama-index-core, llama-index-embeddings-ollama, and so on). This modularity offers several advantages:

  1. Smaller Docker Images: You can include only the components you need, potentially resulting in smaller Docker images.
  2. Flexibility: It's easier to swap out or upgrade individual components without affecting the entire system.
  3. Reduced Conflict Potential: Smaller, focused packages are less likely to conflict with other libraries in your project.

Haystack, on the other hand, uses a more monolithic approach with the haystack-ai package. This can simplify initial setup but may lead to larger Docker images and less flexibility in terms of including only needed components.

Conclusion

Both Haystack and LlamaIndex are capable frameworks for building LLM-powered applications. Haystack offers a more structured, opinionated approach that can be great for teams wanting a consistent, guided experience. LlamaIndex provides more flexibility and a larger ecosystem, which can be beneficial for more custom or cutting-edge use cases.

The choice between them will depend on your specific needs, team expertise, and how much you value structure versus flexibility. As with any rapidly evolving technology, it's important to keep an eye on the health and direction of these projects as you make your decision.

Brainpool AI

Brainpool is an artificial intelligence consultancy specialising in developing bespoke AI solutions for business.