Not All Data Is Equal


Data is at the core of every enterprise application. A few applications mainly interact with other services and aggregate or process their results, but the vast majority maintain data: the shopping basket in a webshop, production planning data in a factory, or event data in automotive or health applications.

But not all the data your application uses is the same, and not every request needs all of it. So what does this mean for your implementation?

Different types of data

Your application can hold a lot of data: gigabytes or even terabytes. But not all of it is needed for every request. Because different parts of the data are used at different frequencies, you might need a different strategy for storing and accessing each part. Matching the strategy to the usage gives the best results in terms of performance and user experience, particularly when you have a lot of data.

When you only have a small dataset, you can simply keep all of it in memory, all the time, with MicroStream.

So let’s discuss the different types in more detail. Roughly, you can identify three types of data.

Core data

Core data, or hot data, is needed by almost every request. In a webshop, for example, this is the catalog of products you sell: it is needed whenever people search for what they want. So it should be easily and quickly available.

In traditional environments, this data is often cached so that it doesn’t need to be loaded from the external system (the database or NoSQL solution) every time.
When you use MicroStream, this data should not be lazily loaded, so that it is always available in the heap.

Although this seems similar to using a cache, there are important differences. A cache is a workaround for the slow access to data stored in databases and NoSQL solutions. That slowness comes mainly from latency (the storage is remote) and from converting the data to and from the system’s specific format. With MicroStream, keeping your data as plain Java objects (POJOs) in the JVM heap is the default mode of operation. Once loaded, the data lives alongside your program code and is accessible extremely fast, since neither loading nor conversion is needed.
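A minimal plain-Java sketch of keeping core data eagerly in the heap. `Product` and `ShopRoot` are hypothetical names; in a real MicroStream application, an object graph like `ShopRoot` would serve as the storage root and be persisted by the storage manager, which is left out here so the sketch runs without the library. The point is that searching the catalog is ordinary Java code over ordinary collections, with no query language, mapping layer, or separate cache in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical core data: a product in the webshop catalog.
record Product(String sku, String name, double price) {}

// Hypothetical root object. The product list is a plain, directly
// referenced collection (no lazy reference), so the whole catalog
// stays in the JVM heap once loaded.
class ShopRoot {
    private final List<Product> products = new ArrayList<>();

    List<Product> products() {
        return products;
    }

    // Case-insensitive substring search over the in-heap list:
    // no remote call and no conversion from a storage format.
    List<Product> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return products.stream()
                .filter(p -> p.name().toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }
}
```

Because the list is referenced directly rather than through a lazy reference, it is loaded together with the root and then stays in memory, which is the behavior you want for data that almost every request touches.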

Depending on your situation, this data makes up 10 to 30% of your total dataset.

Request data

Besides the core data, each request needs some specific data to be fulfilled. In our webshop example, this could be the account data used when handling the shopping basket. The account data of all shop users is not needed all the time: only the data of the user making the purchase is needed at that moment, and most of the time any given account is not needed at all.

This type of data is ideal for lazy loading with MicroStream. We identify which data needs to be loaded into memory through an _indexing_ mechanism, such as a simple `HashMap`, or Apache Lucene if we need to search for the required data. Once loaded, the data can stay in memory until the user finishes their interaction with your application. Lazily loaded data in MicroStream is released from memory after a certain time without access, or it can be cleared explicitly at the end of the user session; in both cases it remains in the storage and can be loaded again.
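The pattern can be sketched in plain Java. `LazyRef` is a hypothetical stand-in for MicroStream’s `Lazy` reference type, not its actual API: it loads its value on first `get()`, can be cleared explicitly (as at the end of a user session), and can be cleared when it has been idle longer than a timeout. The `loader` function simulates reading an account from storage; `Account`, `AccountIndex`, and the `HashMap` index layout are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical request data: one shop user's account.
record Account(String userId, String displayName) {}

// Hypothetical stand-in for a lazy reference: the value is loaded on
// first access and can be dropped from the heap again. Clearing only
// discards the in-memory copy; the loader can always reload it.
final class LazyRef<T> {
    private final String key;
    private final Function<String, T> loader;
    private T value;              // null while not loaded
    private long lastAccessMillis;

    LazyRef(String key, Function<String, T> loader) {
        this.key = key;
        this.loader = loader;
    }

    T get(long nowMillis) {
        if (value == null) {
            value = loader.apply(key); // simulated load from storage
        }
        lastAccessMillis = nowMillis;
        return value;
    }

    boolean isLoaded() { return value != null; }

    // Explicit release, e.g. at the end of the user session.
    void clear() { value = null; }

    // Timeout-based release of values that were not accessed recently.
    void clearIfIdle(long nowMillis, long timeoutMillis) {
        if (value != null && nowMillis - lastAccessMillis > timeoutMillis) {
            value = null;
        }
    }
}

// Hypothetical index: a plain HashMap from user id to a lazy reference,
// so only the accounts that are actually requested end up in memory.
final class AccountIndex {
    private final Map<String, LazyRef<Account>> byUserId = new HashMap<>();
    private final Function<String, Account> loader;

    AccountIndex(Function<String, Account> loader) { this.loader = loader; }

    void register(String userId) {
        byUserId.put(userId, new LazyRef<>(userId, loader));
    }

    Account lookup(String userId, long nowMillis) {
        LazyRef<Account> ref = byUserId.get(userId);
        return ref == null ? null : ref.get(nowMillis);
    }

    void endSession(String userId) {
        LazyRef<Account> ref = byUserId.get(userId);
        if (ref != null) ref.clear();
    }

    void evictIdle(long nowMillis, long timeoutMillis) {
        byUserId.values().forEach(r -> r.clearIfIdle(nowMillis, timeoutMillis));
    }

    boolean isLoaded(String userId) {
        LazyRef<Account> ref = byUserId.get(userId);
        return ref != null && ref.isLoaded();
    }
}
```

The current time is passed in as a parameter only to keep the sketch deterministic. In MicroStream itself, timeout-based clearing is handled by the library’s lazy reference management rather than by application code.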

Even with lazy loading, MicroStream remains more efficient than regular database or NoSQL solutions.

Depending on your situation, this data makes up 20 to 50% of your total dataset.

Historical data

The last category is historical data, which is required only in rare cases. Access to it can be initiated by users, for example when they request their order history, or by background jobs such as monthly calculations on your platform. This data doesn’t need to be highly available, and it is fine if it takes ‘a bit of time’ to retrieve it. MicroStream’s lazy loading can still be used for historical data. But even when you use MicroStream for the core and request data, you might choose an alternative solution here. Storing historical data in a database or data warehouse makes sense, because this kind of data serves various purposes, not only online access.
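A minimal plain-Java sketch of that split, assuming a hypothetical `OrderArchive` interface that would be implemented on top of a database or data warehouse (for example with JDBC, not shown). Recent orders are kept in memory and served directly; only the rare full-history request goes to the slower archive. `Order`, `OrderArchive`, and `OrderHistoryService` are illustrative names, not part of MicroStream.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical order record.
record Order(String orderId, String userId, LocalDate date, double total) {}

// Hypothetical archive for historical data, e.g. backed by a relational
// database or data warehouse. Access may be slow, which is acceptable here.
interface OrderArchive {
    List<Order> findByUser(String userId);
}

// Hypothetical service: recent orders live in the heap, older ones are
// fetched from the archive only when someone actually asks for them.
final class OrderHistoryService {
    private final Map<String, List<Order>> recentByUser = new HashMap<>();
    private final OrderArchive archive;

    OrderHistoryService(OrderArchive archive) { this.archive = archive; }

    void addRecent(Order order) {
        recentByUser.computeIfAbsent(order.userId(), k -> new ArrayList<>()).add(order);
    }

    // Fast path: served entirely from memory, never touches the archive.
    List<Order> recentOrders(String userId) {
        return List.copyOf(recentByUser.getOrDefault(userId, List.of()));
    }

    // Rare path ("show my full order history"): combines archive and
    // in-memory orders, newest first.
    List<Order> fullHistory(String userId) {
        List<Order> all = new ArrayList<>(archive.findByUser(userId));
        all.addAll(recentOrders(userId));
        all.sort(Comparator.comparing(Order::date).reversed());
        return all;
    }
}
```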

Again depending on the specific case, this might be anywhere from 30 to 70% of your total dataset.

Conclusion

Not all your data is used for the same purpose or scenario. Some data is essential for every request, while other data is only needed for batch processing with low performance requirements. So it is only natural to make your data available in different ways, according to its access requirements. With its two types of data access, always in memory or lazily loaded on demand, MicroStream is ideal as a single system that serves your data with these different characteristics.
But MicroStream’s focus on easy, high-performance data access does not rule out exploring other solutions, for example for storing historical data. A database or data warehouse is a logical choice for that type of data.
