Comparing Haystack and LlamaIndex: Two Popular LLM Frameworks
As large language models (LLMs) become increasingly important for AI applications, developers need robust frameworks to build with these models efficiently. Two popular options are Haystack and LlamaIndex. In a recent technical discussion, our team compared these frameworks based on hands-on experience building a toy chat bot REST API. Here are some key takeaways:
Architecture and Abstraction Level
Haystack provides a more opinionated, "on-rails" experience with clearly defined high-level components. It uses a pipeline architecture where you connect components together. This can make development more straightforward, but also less flexible. Custom components can be added easily but must follow exactly the input and output format expected by Haystack.
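For example, a custom component in Haystack 2.x is declared with the @component decorator, and its run method must return a dict whose keys match the declared output types; the component name and filtering logic below are purely illustrative.

from haystack import Document, component


@component
class KeywordFilter:
    """Hypothetical custom component: keeps only documents mentioning a keyword."""

    @component.output_types(documents=list[Document])
    def run(self, documents: list[Document], keyword: str):
        # The return value must be a dict keyed by the declared output types,
        # otherwise the component cannot be wired into a Pipeline.
        kept = [d for d in documents if d.content and keyword.lower() in d.content.lower()]
        return {"documents": kept}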
LlamaIndex offers a lower-level, more flexible approach. You have more freedom in how you connect different pieces, but this comes with increased complexity. As one developer noted, "LlamaIndex just feels a bit more low level...you're dealing with more of the details of the objects in the code rather than just using these higher level abstractions."
Here is the code for a `ChatBot` class in both frameworks. They are very similar on the surface. In Haystack, pipelines and components have a very uniform API: a `run` method is always provided to trigger the pipeline or component to perform its actions, as seen below. In LlamaIndex, the call signature depends on the specific objects involved. Haystack also provides a component for rendering prompts, whereas LlamaIndex does not. The `PromptBuilder` in Haystack is just a thin wrapper around Jinja2 and does not save many LOC in practice; LlamaIndex leaves this detail to the developer, reflecting its lower-level, DIY design philosophy.
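For context, the prompt template both versions rely on might look something like the sketch below (the template text and variable names are hypothetical): Haystack's `PromptBuilder` renders a template string with Jinja2 internally, while the LlamaIndex version constructs and renders a `jinja2.Template` directly.

from jinja2 import Template

# Hypothetical template text; the real DEFAULT_PROMPT_TEMPLATE lives in the project code.
_TEMPLATE_STR = (
    "You are a helpful assistant.\n"
    "Context: {{ context }}\n"
    "Question: {{ message }}"
)

# Haystack: hand the raw string to PromptBuilder, which renders it via Jinja2,
# e.g. PromptBuilder(template=_TEMPLATE_STR).run(message="Hi", context="...")["prompt"].
# LlamaIndex version: build and render the Jinja2 template ourselves.
DEFAULT_PROMPT_TEMPLATE = Template(_TEMPLATE_STR)
prompt = DEFAULT_PROMPT_TEMPLATE.render(message="Hi", context="...")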
Haystack:
class ChatBot:
    def __init__(self, url: str, model: str = DEFAULT_CHAT_MODEL) -> None:
        self.model = model
        self.url = url
        self.prompt_builder = PromptBuilder(template=DEFAULT_PROMPT_TEMPLATE)
        self.chat_llm = OllamaChatGenerator(model=self.model, url=self.url)

    def run(
        self, message: ChatMessageInput, history: list[ChatMessage], context: dict
    ) -> ChatMessageInput:
        prompt = self.prompt_builder.run(message=message.content, **context)
        prompt_message = _ChatMessage(content=prompt["prompt"], role=ChatRole.USER, name=None)
        _history = [_ChatMessage(content=m.content, role=m.role, name=None) for m in history]
        resp = self.chat_llm.run(_history + [prompt_message])
        reply_msg_input = ChatMessageInput(content=resp["replies"][0].content, role=ChatRole.ASSISTANT)
        return reply_msg_input
LlamaIndex:
class ChatBot:
    def __init__(
        self,
        url: str,
        model: str = DEFAULT_CHAT_MODEL,
        prompt_template: Template = DEFAULT_PROMPT_TEMPLATE,
    ) -> None:
        self.model = model
        self.url = url
        self._prompt_template = prompt_template
        self._chat_llm = Ollama(model=self.model, base_url=self.url, request_timeout=120.0)

    def chat(
        self, message: ChatMessageInput, history: list[ChatMessage], context: dict
    ) -> ChatMessageInput:
        prompt = self._prompt_template.render(message=message.content, **context)
        prompt_msg = _ChatMessage(content=prompt, role=MessageRole.USER)
        _history = [_ChatMessage(content=m.content, role=m.role) for m in history]
        resp = self._chat_llm.chat(messages=_history + [prompt_msg])
        reply_msg_input = ChatMessageInput(content=resp.message.content, role=MessageRole.ASSISTANT)
        return reply_msg_input
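Calling the two classes side by side makes the difference in call signatures concrete: Haystack keeps the uniform run() convention, while the LlamaIndex version exposes a chat() method. A hypothetical usage sketch, assuming the two implementations live in separate modules (module names are made up) and that the prompt template needs no extra context variables:

from haystack_chatbot import ChatBot as HaystackChatBot      # hypothetical module
from llamaindex_chatbot import ChatBot as LlamaIndexChatBot  # hypothetical module

OLLAMA_URL = "http://localhost:11434"  # assumed local Ollama endpoint

# Haystack version: the uniform run() convention.
haystack_bot = HaystackChatBot(url=OLLAMA_URL)
reply = haystack_bot.run(
    message=ChatMessageInput(content="Hello!", role=ChatRole.USER),
    history=[],
    context={},
)

# LlamaIndex version: a framework-flavoured chat() method instead.
llamaindex_bot = LlamaIndexChatBot(url=OLLAMA_URL)
reply = llamaindex_bot.chat(
    message=ChatMessageInput(content="Hello!", role=MessageRole.USER),
    history=[],
    context={},
)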
The code samples below present a `VectorDatabase` class implemented in both frameworks. This class orchestrates operations against the underlying vector store. The pipeline-oriented design of Haystack is clearly in evidence here; it is easily understandable and predictable for anyone reading the code who is familiar with this paradigm. On the other hand, the Haystack code is rather verbose, as many API calls are needed to set up the declarative pipelines. Conversely, LlamaIndex is not structured in terms of pipelines, and as a result the constructor code is much simpler: it defines a `VectorStoreIndex` from the `WeaviateVectorStore` we provide, which is then used for all subsequent operations.
Haystack:
class VectorDatabase:
    def __init__(
        self,
        document_store: WeaviateDocumentStore,
        embedding_model: str = DEFAULT_EMBEDDING_MODEL,
        similarity_top_k: int = 10,
    ) -> None:
        self._document_store = document_store
        self._embedding_model = embedding_model
        self._similarity_top_k = similarity_top_k
        self._indexing_pipeline = self._build_indexing_pipeline()
        self._query_pipeline = self._build_query_pipeline()

    def _build_indexing_pipeline(self) -> Pipeline:
        pipeline = Pipeline()
        pipeline.add_component("loader", PyPDFToDocument())
        pipeline.add_component("cleaner", DocumentCleaner())
        pipeline.add_component(
            "splitter", DocumentSplitter(split_by="sentence", split_length=20, split_overlap=5)
        )
        pipeline.add_component("embedder", OllamaDocumentEmbedder(model=self._embedding_model))
        pipeline.add_component("writer", DocumentWriter(document_store=self._document_store))
        pipeline.connect("loader", "cleaner")
        pipeline.connect("cleaner", "splitter")
        pipeline.connect("splitter", "embedder")
        pipeline.connect("embedder", "writer")
        return pipeline

    def _build_query_pipeline(self) -> Pipeline:
        pipeline = Pipeline()
        text_embedder = OllamaTextEmbedder(model=self._embedding_model)
        pipeline.add_component("text_embedder", text_embedder)
        pipeline.add_component(
            "retriever",
            WeaviateEmbeddingRetriever(
                document_store=self._document_store,
                top_k=self._similarity_top_k,
            ),
        )
        pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
        return pipeline
LlamaIndex:
class VectorDatabase:
    def __init__(
        self,
        vector_store: WeaviateVectorStore,
        embedding_model: str = DEFAULT_EMBEDDING_MODEL,
        similarity_top_k: int = 10,
    ) -> None:
        self._vector_store = vector_store
        self._embedding_model = OllamaEmbedding(
            model_name=embedding_model,
            base_url=os.getenv("OLLAMA_URL", "http://localhost:11434"),
        )
        self._similarity_top_k = similarity_top_k
        self._storage_context = StorageContext.from_defaults(vector_store=vector_store)
        self._index = VectorStoreIndex(
            nodes=[],
            storage_context=self._storage_context,
            embed_model=self._embedding_model,
        )
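For completeness, here is roughly how the underlying store objects might be constructed and handed to the two VectorDatabase classes. The connection details and index name are assumptions, and the exact constructor arguments vary between versions of weaviate-haystack and llama-index-vector-stores-weaviate:

# Haystack: the weaviate-haystack integration provides a DocumentStore.
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore

document_store = WeaviateDocumentStore(url="http://localhost:8080")  # assumed local Weaviate
vector_db = VectorDatabase(document_store=document_store)

# LlamaIndex: wrap a weaviate-client connection in a WeaviateVectorStore.
import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Papers")  # hypothetical index name
vector_db = VectorDatabase(vector_store=vector_store)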
Loading the data with the `VectorDatabase.load_data` method looks as follows. In the Haystack version, we use the indexing pipeline we set up earlier and simply pass data to it; the workflow has already been defined, so this calling code is quite simple. In the LlamaIndex version, we have not defined the indexing workflow ahead of time with a pipeline, and as a result the code has a more imperative flavour. However, it is still very readable, and there isn't any real difference in LOC terms.
Haystack:
def load_data(self, data_dir: str) -> None:
    pdf_paths = [
        os.path.join(data_dir, file) for file in os.listdir(data_dir) if file.endswith(".pdf")
    ]
    for path in pdf_paths:
        self._indexing_pipeline.run({"sources": [path]})
LlamaIndex:
def load_data(self, data_dir: str) -> None:
    papers = SimpleDirectoryReader(data_dir).load_data()
    parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
    nodes = parser.get_nodes_from_documents(papers)
    self._index.insert_nodes(nodes)
Querying the data with the `VectorDatabase.query` method then looks as follows. Again, in Haystack we use the pipeline we already defined, resulting in a more declarative workflow: return the results of the predefined pipeline for these arguments. In LlamaIndex we have to do a bit more work in this method, reproducing the steps that the Haystack pipeline encapsulates: first we embed the query; then we create a query object from the embedding and some other query parameters; and finally we submit the query to the vector store and return the results.
Haystack:
def query(self, query: str, filters: _t.Optional[dict] = None) -> list[Document]:
    _filters = filters or {}
    results = self._query_pipeline.run(
        {"text_embedder": {"text": query}, "retriever": {"filters": _filters}}
    )
    documents = results["retriever"]["documents"]
    return documents
LlamaIndex:
def query(self, query: str, filters: _t.Optional[MetadataFilters] = None) -> list[BaseNode]:
    query_embedding = self._embedding_model.get_text_embedding(query)
    vdb_query = VectorStoreQuery(
        query_embedding=query_embedding,
        similarity_top_k=self._similarity_top_k,
        filters=filters,
    )
    query_result = self._vector_store.query(vdb_query)
    return query_result.nodes
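The filter arguments also illustrate the difference in abstraction level: the Haystack retriever takes a plain dict in Haystack 2.x filter syntax, while LlamaIndex expects a typed MetadataFilters object from llama-index-core. A hypothetical example, assuming the indexed documents carry a file_name metadata field:

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

QUESTION = "What is retrieval augmented generation?"

# Haystack version: filters are a plain dict (field/operator/value).
docs = vector_db.query(
    QUESTION,
    filters={"field": "meta.file_name", "operator": "==", "value": "rag_paper.pdf"},
)

# LlamaIndex version: filters are typed objects.
nodes = vector_db.query(
    QUESTION,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="file_name", value="rag_paper.pdf")]),
)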
Ease of Use vs Flexibility
Haystack's opinionated approach makes it easier to get started and ensures consistent patterns across projects. However, it may be limiting for more custom use cases.
LlamaIndex provides more flexibility, allowing developers to structure their code as they see fit. But this comes at the cost of having to make more decisions and potentially inconsistent code across projects.
Ecosystem and Community
LlamaIndex appears to have a larger, more active community, with more GitHub stars and contributors than Haystack. This translates into a wider range of available components and integrations.
However, Haystack's smaller, more curated ecosystem means potentially higher quality and more consistent components. With LlamaIndex, there's a greater variety but also more uncertainty about the quality and maintenance of community contributions.
Maintenance and Longevity
There were some concerns raised about Haystack's long-term sustainability due to its smaller community. As one developer put it, "I have a bit of a concern about Haystack's sustainability. Right now the project is healthy, but we need to keep an eye on it going forward."
LlamaIndex's larger community could mean better long-term maintenance, but it also leads to a more rapidly changing ecosystem that could introduce instability.
Dependencies and Project Structure
An important aspect to consider when choosing a framework is its dependency footprint. This affects not only the initial setup but also ongoing maintenance, Docker image sizes, and potential conflicts with other libraries. Let's examine the differences in dependencies between Haystack and LlamaIndex based on each project's `pyproject.toml` file:
Haystack:
[tool.poetry.dependencies]
python = "^3.12"
fastapi = "0.110.3"
haystack-ai = "^2.1.2"
ollama-haystack = "^0.0.6"
pypdf = "^4.2.0"
uvicorn = "^0.29.0"
weaviate-client = "^4.5.7"
weaviate-haystack = "^2.0.0"
LlamaIndex:
[tool.poetry.dependencies]
python = "^3.12"
fastapi = "0.110.3"
jinja2 = "^3.1.4"
llama-index-core = "^0.10.38.post2"
llama-index-embeddings-ollama = "^0.1.2"
llama-index-llms-ollama = "^0.1.5"
llama-index-readers-file = "^0.1.23"
llama-index-vector-stores-weaviate = "^1.0.0"
pypdf = "^4.2.0"
uvicorn = "^0.29.0"
weaviate-client = "^4.5.7"
Modular vs Monolithic Approach
LlamaIndex takes a more modular approach to dependencies. Instead of a single, large package, it splits functionality into smaller, focused packages (e.g., `llama-index-core`, `llama-index-embeddings-ollama`, etc.). This modularity offers several advantages:
- Smaller Docker Images: You can include only the components you need, potentially resulting in smaller Docker images.
- Flexibility: It's easier to swap out or upgrade individual components without affecting the entire system.
- Reduced Conflict Potential: Smaller, focused packages are less likely to conflict with other libraries in your project.
Haystack, on the other hand, uses a more monolithic approach with the `haystack-ai` package. This can simplify initial setup, but it may lead to larger Docker images and less flexibility in including only the components you need.
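This packaging difference shows up directly in the import paths. A rough sketch (exact module paths depend on the installed integration versions):

# LlamaIndex: each piece ships as its own pip package with its own namespace.
from llama_index.core import VectorStoreIndex                        # llama-index-core
from llama_index.llms.ollama import Ollama                           # llama-index-llms-ollama
from llama_index.embeddings.ollama import OllamaEmbedding            # llama-index-embeddings-ollama
from llama_index.vector_stores.weaviate import WeaviateVectorStore   # llama-index-vector-stores-weaviate

# Haystack: the core framework comes from the single haystack-ai package,
# with integrations layered on top under the haystack_integrations namespace.
from haystack import Pipeline                                                        # haystack-ai
from haystack_integrations.components.generators.ollama import OllamaChatGenerator   # ollama-haystack
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore     # weaviate-haystack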
Conclusion
Both Haystack and LlamaIndex are capable frameworks for building LLM-powered applications. Haystack offers a more structured, opinionated approach that can be great for teams wanting a consistent, guided experience. LlamaIndex provides more flexibility and a larger ecosystem, which can be beneficial for more custom or cutting-edge use cases.
The choice between them will depend on your specific needs, team expertise, and how much you value structure versus flexibility. As with any rapidly evolving technology, it's important to keep an eye on the health and direction of these projects as you make your decision.