LlamaIndex
An open-source data framework and managed parsing platform used to build the retrieval layer over a firm's proprietary corpus for RAG and agent workflows.
llamaindex.ai
LLM application framework: ingestion, indexing, retrieval, agents.
SSO + scoped API keys
LangChain · Haystack · Custom vector DB (Pinecone, Weaviate, pgvector)
An open-source data framework for building LLM applications — particularly Retrieval-Augmented Generation (RAG) and agent workflows over private/unstructured data — plus a commercial managed platform (LlamaCloud / LlamaParse).
Python and TypeScript libraries providing data connectors, indexing (e.g., VectorStoreIndex), retrieval, query engines, and an event-driven Workflows architecture for agents. Commercial LlamaCloud/LlamaParse handles document parsing, extraction, classification, splitting, and indexing — including complex PDFs, PowerPoints, images, charts, and tables.
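The ingest, index, and retrieve pattern described above can be illustrated with a small pure-Python sketch. This is not LlamaIndex's API: `ToyVectorIndex`, `embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is an assumption standing in for a real embedding model. In the library itself, the analogous steps are typically `VectorStoreIndex.from_documents(...)` followed by a retriever or query engine.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector.

    Assumption: stands in for a real embedding model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Hypothetical stand-in for a vector index: stores (text, vector) pairs."""

    def __init__(self, documents):
        # "Indexing": embed every document once, up front.
        self.entries = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query, top_k=2):
        # "Retrieval": rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]


docs = [
    "Quarterly revenue grew on higher subscription renewals.",
    "The data room contains the signed share purchase agreement.",
    "Headcount fell after the engineering reorganization.",
]
index = ToyVectorIndex(docs)
top = index.retrieve("Where is the share purchase agreement?", top_k=1)
```

In a RAG pipeline the retrieved passages would then be passed to an LLM as context. A production setup would replace the toy embedding with a real model and the in-memory list with a vector database.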
LlamaIndex is itself AI infrastructure. It offers strong document parsing (LlamaParse), RAG pipelines, agentic workflows, hundreds of data connectors, and integrations with major LLMs, embedding models, and vector databases. It is model-agnostic and is used to build knowledge agents that search, synthesize, and generate reports over enterprise data.
Open-source framework is free (you pay underlying LLM/embedding/infra costs). LlamaCloud uses a credit system (1,000 credits ≈ $1.25; some sources cite $1) with parsing tiers from ~1 credit/page (fast) to ~60 credits/page (premium). New users get ~10,000 free credits/month; Starter plan ~$50/month (~40,000 credits). VPC/on-prem available.
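The credit arithmetic above can be made concrete with a rough estimator. This is a sketch under the approximate figures quoted in this entry (about $1.25 per 1,000 credits, about 1 to 60 credits per page); the function name `parsing_cost_usd` and the way free credits are netted off are assumptions for illustration, not LlamaCloud's billing logic.

```python
# Approximate figures taken from the text above; actual pricing may differ.
USD_PER_1000_CREDITS = 1.25   # some sources cite 1.00
FAST_CREDITS_PER_PAGE = 1     # approximate low end of the parsing tiers
PREMIUM_CREDITS_PER_PAGE = 60  # approximate high end of the parsing tiers


def parsing_cost_usd(pages, credits_per_page,
                     usd_per_1000_credits=USD_PER_1000_CREDITS,
                     free_credits=0):
    """Estimate parsing spend in USD for a given page volume.

    Assumption: free credits are simply subtracted from total usage
    before pricing; the real plan mechanics may work differently.
    """
    credits_used = pages * credits_per_page
    billable = max(0, credits_used - free_credits)
    return billable * usd_per_1000_credits / 1000


# 10,000 pages at each end of the tier range.
fast_cost = parsing_cost_usd(10_000, FAST_CREDITS_PER_PAGE)
premium_cost = parsing_cost_usd(10_000, PREMIUM_CREDITS_PER_PAGE)

# Sanity check against the quoted Starter plan (~40,000 credits).
starter_value = parsing_cost_usd(40_000, 1)
```

Under these assumptions, 10,000 pages cost about $12.50 on the fast tier and about $750 on the premium tier, and 40,000 credits price out at $50, which matches the quoted Starter plan. The 60x spread between tiers is why premium parsing becomes expensive at high volume.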
Hundreds of connectors (LLMs, vector DBs, data sources); Python/TypeScript SDKs. Used by Salesforce (Agentforce), Rakuten, Carlyle, KPMG. The framework competes with LangChain; LlamaParse competes with document-parsing services such as Amazon Textract and Google Document AI.
LlamaCloud/LlamaParse is SOC 2 Type 2 certified; data encrypted in transit and at rest; GDPR-compliant EU SaaS option with EU data residency; RBAC and SSO for enterprise; VPC/on-prem deployment for higher security.
Founded 2023; co-founders Jerry Liu (CEO) and Simon Suo (CTO). Raised a $19M Series A (Feb 2025) led by Norwest Venture Partners with participation from Greylock, bringing total funding to $27.5M. 3M+ monthly downloads; 38,000+ GitHub stars.
Developer-oriented — relevant for a firm building custom internal AI tools (e.g., RAG over data rooms, diligence document analysis, portfolio reporting synthesis) and as a reference point for technical diligence on AI-infrastructure investments. Requires engineering resources.
Requires engineering skill (not a turnkey app); costs can be unpredictable at scale (LLM API + parsing credits); premium parsing is expensive at high volume; managing the full production stack adds overhead.