EclipseStore 4 Beta: Build Vector Database Apps with Pure Java

Markus Kett

In the rapidly evolving landscape of GenAI, Java developers have often been forced to look outside their ecosystem for specialized vector databases. With the new EclipseStore 4 (version 4.0.0 Beta 1) release, that gap has now been closed.

By integrating JVector, we’ve transformed EclipseStore into a complete, pure Java vector database. Our goal was to provide the Java community with a high-performance, embedded solution that lets you build GenAI applications without the operational overhead of an external vector database.

Understanding the Tech: RAG and Vectors

Retrieval-Augmented Generation (RAG) is the current standard for building reliable GenAI apps. It works by retrieving relevant facts from a private dataset and providing them to an LLM (such as GPT-4), so that the model’s response is grounded in those facts rather than in “hallucinations.” To make this retrieval possible, we use vector data.

  • What are vectors? Vectors are long lists of numbers (embeddings) that represent the meaning of a piece of data – be it text, an image, or a POJO.
  • Vectorization: This process uses an embedding model to map data into a multi-dimensional space where similar concepts are mathematically close to one another.
  • The Challenge: Storing and searching millions of these high-dimensional points efficiently requires specialized indexing, which is where EclipseStore version 4 comes in.
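To make "mathematically close" concrete, here is a minimal sketch of cosine similarity and an exhaustive top-k scan in plain Java. It is not EclipseStore or JVector API: the class name, the method names, and the three-dimensional toy vectors are assumptions chosen for illustration, and real embeddings typically have hundreds or thousands of dimensions. The exhaustive scan is exactly the part that becomes too slow at scale and that a specialized index replaces.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class, not part of EclipseStore or JVector.
class CosineSearchSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). 1.0 means "same direction". */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Note: a zero vector would produce NaN here; the sketch assumes non-zero input.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Exhaustive scan: indices of the k most similar vectors, best match first. */
    static List<Integer> topK(float[] query, List<float[]> vectors, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ids.add(i);
        }
        // Sort by descending similarity (negated so the highest score comes first).
        ids.sort(Comparator.comparingDouble((Integer i) -> -cosine(query, vectors.get(i))));
        return ids.subList(0, Math.min(k, ids.size()));
    }

    public static void main(String[] args) {
        List<float[]> vectors = List.of(
            new float[] {1.0f, 0.0f, 0.0f},   // index 0
            new float[] {0.9f, 0.1f, 0.0f},   // index 1
            new float[] {0.0f, 1.0f, 0.0f},   // index 2
            new float[] {0.0f, 0.0f, 1.0f});  // index 3
        float[] query = {1.0f, 0.2f, 0.0f};
        System.out.println(topK(query, vectors, 2)); // prints [1, 0]
    }
}
```

In a RAG pipeline, the indices returned by such a search would be mapped back to the stored documents, whose text is then passed to the LLM as context.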

Enter JVector

JVector is a state-of-the-art, high-performance Java library designed specifically for vector similarity search. It uses a graph-based index that builds on the HNSW (Hierarchical Navigable Small World) and DiskANN designs, providing sub-linear search times and high recall. Unlike many other libraries, JVector is optimized for the modern JVM and uses the Panama Vector API for SIMD-accelerated distance calculations. It’s designed to be “disk-aware,” meaning it can handle datasets much larger than your available RAM – a perfect match for the EclipseStore philosophy.
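Why can a graph index answer a query without comparing against every vector? The core idea behind HNSW-style indexes can be sketched as a greedy walk over a neighbor graph. The code below is a deliberately simplified, single-layer illustration and not JVector's implementation: production indexes keep a beam of candidates, build the graph automatically, and add further structure. The class name, the one-dimensional points, and the hand-written neighbor lists are assumptions made for the example.

```java
// Hypothetical illustration class, not part of JVector.
class GreedyGraphSearchSketch {

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Greedy walk: starting at the entry node, move to the neighbor closest to the
     * query, and stop once no neighbor is closer than the current node.
     * Returns the index of the node where the walk stops (a local optimum).
     */
    static int greedySearch(float[] query, float[][] points, int[][] neighbors, int entry) {
        int current = entry;
        double best = squaredDistance(query, points[current]);
        boolean improved = true;
        while (improved) {
            improved = false;
            // The neighbor array is fixed when the loop starts, so updating
            // 'current' inside the loop selects the best neighbor of this node.
            for (int n : neighbors[current]) {
                double d = squaredDistance(query, points[n]);
                if (d < best) {
                    best = d;
                    current = n;
                    improved = true;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Eight points on a line; each node links to its direct neighbors,
        // plus one long-range link between node 0 and node 4 as a shortcut.
        float[][] points = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}};
        int[][] neighbors = {
            {1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 5, 0}, {4, 6}, {5, 7}, {6}};
        float[] query = {5.2f};
        System.out.println(greedySearch(query, points, neighbors, 0)); // prints 5
    }
}
```

The walk from node 0 reaches node 5 in three hops (0, 4, 5) thanks to the shortcut, instead of visiting every point. Long-range links of this kind are what make search time grow more slowly than the dataset.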

The Integration: GigaMap meets JVector

The core of EclipseStore’s massive-scale data handling is the GigaMap. In version 4, we have integrated JVector directly into the GigaMap indexing system, so the GigaMap now supports a specialized vector index. When you store your Java object graph, you can index the associated vector embeddings seamlessly. This combination provides:

  • Java-Native Persistence: Your vectors and your business objects live together in the same native Java heap/storage.
  • Lazy Entity Access: Search results from the vector index provide direct, lazy-loaded access to your entities. No more manual lookups or mapping between a vector ID and your database.
  • On-Disk Indexing: Memory-mapped files keep queries fast even when the index exceeds available memory.
  • PQ Compression: Integrated Product Quantization to reduce the memory footprint of your embeddings by up to 90%.
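To get a feel for where a reduction of this size comes from, the following sketch works through the arithmetic of Product Quantization: each vector is split into subspaces, and each subspace is replaced by a short code that points into a codebook. The numbers used here (768 dimensions, 384 subspaces, 8-bit codes) are illustrative assumptions, not EclipseStore or JVector defaults, and the shared codebook's own size is ignored.

```java
// Hypothetical illustration class; the parameters are example values only.
class PqFootprintSketch {

    /** Uncompressed size: one 4-byte float per dimension. */
    static long rawBytes(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    /** Compressed size: one code of 'bitsPerCode' bits per subspace, rounded up to bytes. */
    static long pqBytes(int subspaces, int bitsPerCode) {
        return ((long) subspaces * bitsPerCode + 7) / 8;
    }

    /** Fraction of per-vector memory saved, between 0.0 and 1.0. */
    static double savings(int dimensions, int subspaces, int bitsPerCode) {
        return 1.0 - (double) pqBytes(subspaces, bitsPerCode) / rawBytes(dimensions);
    }

    public static void main(String[] args) {
        int dimensions = 768;   // assumed embedding size
        int subspaces = 384;    // assumed: 2 dimensions per subspace
        int bitsPerCode = 8;    // assumed: 256 centroids per subspace

        System.out.println(rawBytes(dimensions) + " bytes raw");                 // 3072 bytes raw
        System.out.println(pqBytes(subspaces, bitsPerCode) + " bytes compressed"); // 384 bytes compressed
        System.out.println(savings(dimensions, subspaces, bitsPerCode) * 100 + "% smaller"); // 87.5% smaller
    }
}
```

Using fewer subspaces or fewer bits per code shrinks the footprint further, at the cost of coarser distance estimates, so the achievable saving depends on how much accuracy the application can trade away.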

Distributed GenAI Apps

While EclipseStore 4 is a powerful embedded solution for a single JVM, modern enterprise apps often require distribution. This is where Eclipse Data Grid (EDG) comes in.

EDG is a software-as-code Eclipse open source project. Instead of managing a complex platform, you simply include the EDG library in your single-JVM EclipseStore app. When you deploy this JAR into an Eclipse Data Grid environment (like a Kubernetes cluster), EDG automatically handles:

  1. Scaling: Running your app across multiple nodes.
  2. Data Replication: Ensuring your vector data and object graph are synchronized across the cluster.
  3. High Availability: Managing node failures without data loss.

This allows you to build a sophisticated, distributed GenAI application as if you were writing a simple local Java program.

What Can You Build Now?

With the combination of EclipseStore + JVector + EDG, the possibilities for Java-centric AI are massive:

  • Semantic Document Search: Build a corporate “brain” that indexes millions of PDFs and allows employees to ask questions in natural language.
  • Recommendation Engines: Real-time product recommendations based on user behavior vectors, fully replicated for high-traffic web stores.
  • AI-Powered Fraud Detection: Compare transaction patterns against known fraud vectors in microseconds.
  • Image/Media Similarity: Search through vast media libraries by comparing visual embeddings natively in the JVM.

Your Benefits: You get the performance of a C++ engine with the simplicity of POJOs, zero impedance mismatch, and a deployment model that fits perfectly into your existing CI/CD pipeline.

A Community Effort

Everything mentioned here – EclipseStore, JVector, and Eclipse Data Grid – is open source. These projects are the result of passionate teams working to make the Java ecosystem the best place for modern data processing.

We want to extend a massive “Thank You” to the makers of JVector for their incredible work on the core search algorithms.

If you find value in what we’re building, the best way to support us is to star the projects on GitHub. For an open-source developer, those stars are the fuel that keeps the passion alive.

Check out the release on GitHub:
