The optimal Lazy List size

March 10, 2023

As you know by now, MicroStream has the concept of a Lazy object reference. The data within such a lazy object is not loaded into memory when the StorageManager starts. It is a proxy that reads the data only when it is accessed.

The question now is: if we have a large list of data, what is the ideal size of the individual Lazy Lists? This blog tries to answer that question and offers some considerations for your project.
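To make the proxy idea concrete, here is a minimal sketch in plain Java. The class `SimpleLazy` is a hypothetical stand-in written for this illustration; it is not MicroStream's `Lazy` type and does not touch any storage. The `loader` it receives is assumed to represent "read the data from storage".

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazy reference; NOT MicroStream's Lazy class.
final class SimpleLazy<T> {
    private final Supplier<T> loader; // assumed to represent reading from storage
    private T value;
    private boolean loaded;

    SimpleLazy(Supplier<T> loader) {
        this.loader = loader;
    }

    // Loads the value on first access and returns the cached value afterwards.
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    // Tells whether the data has already been pulled into memory.
    boolean isLoaded() {
        return loaded;
    }
}
```

Nothing is read until `get()` is called for the first time. The sketch is not thread-safe; it only illustrates the principle.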

List Size matters

If we have a large list of data, we need to make some segmentation decisions. I think it is obvious that neither loading the list all at once nor loading each item individually is a viable solution.

Loading the list all at once will, of course, pull far too much data into memory. You probably only need part of the list to respond to a user request. And when your list is huge, it might not even fit into the JVM heap unless we make the heap extremely large.

Loading each item separately looks good from a memory usage point of view, but each Lazy reference also takes up some memory. Millions of Lazy references, which are themselves loaded when the StorageManager is created, also add up to a significant amount of memory.

So we need something between 1 and the entire list of several million. But is there an ideal size?
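The trade-off can be put into numbers with a small calculation. The helper below is a hypothetical illustration, not part of MicroStream: it counts how many lists, and therefore how many Lazy references, a given list size produces.

```java
final class LazyCount {
    // Number of lists (and thus Lazy references) needed to hold totalItems
    // when every list contains at most listSize items.
    static long lazyReferences(long totalItems, long listSize) {
        if (totalItems < 0 || listSize <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and listSize > 0");
        }
        return (totalItems + listSize - 1) / listSize; // ceiling division
    }
}
```

For 10 million items, a list size of 1 needs 10,000,000 references, a list size of 500 needs 20,000, and a list size of 1000 needs 10,000.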

In code, we will have something like

Map<EntityDiscrimination, Lazy<List<Entity>>> data

where we segment our data, held in the Entity class, based on some grouping represented by the EntityDiscrimination key of the Map. This allows us to access the subset of our data that corresponds to some criteria in the EntityDiscrimination.
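A segmentation like this could be built roughly as follows. This is a sketch under assumptions: the class `SegmentationSketch` and its `segment` method are hypothetical, and a plain `Supplier<List<E>>` stands in for `Lazy<List<Entity>>` so that the example runs without the MicroStream library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

final class SegmentationSketch {
    // Groups the entities by a discriminator and wraps every group in a Supplier,
    // which is only a stand-in for Lazy<List<E>> in this sketch.
    static <K, E> Map<K, Supplier<List<E>>> segment(List<E> entities,
                                                    Function<E, K> discriminator) {
        Map<K, List<E>> groups = new HashMap<>();
        for (E entity : entities) {
            groups.computeIfAbsent(discriminator.apply(entity), k -> new ArrayList<>())
                  .add(entity);
        }
        Map<K, Supplier<List<E>>> data = new HashMap<>();
        for (Map.Entry<K, List<E>> entry : groups.entrySet()) {
            List<E> group = entry.getValue();
            data.put(entry.getKey(), () -> group);
        }
        return data;
    }
}
```

The discriminator decides how large each group becomes, so it is the main lever for controlling the list size.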

Testing it out

What better way is there than testing a scenario where we vary the number of items in a list? We carried out the following experiment: we have a list of 10 million numbers for which we need to calculate the average value.

In this test, we access all the data, which is probably not the use case that you have. But it gives us some insight into the performance impact of the lazy List size for exactly the same set of data.

We timed the cases with 5 elements per List, 10, 50, and so on, up to 10 million, which puts all items in a single lazy List.

We measured the time required to start the StorageManager and the time to access the data within the Lazy List(s). I'm not showing the actual values, only the graph, as the numbers don't really matter; what matters is the trend we can see in the results.

Having a lot of small lists is not performant, and that is no surprise. When our program accesses a Lazy List, the List must be loaded into memory, which means scanning the data storage for the required data. If we need to do that a million times instead of a thousand times, the performance difference is noticeable.

From the graph, we also see that Lists of 500 items or more are the most efficient. There is a very slight indication that very large lists are again a bit less efficient, but that is difficult to prove with the current setup.
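The shape of this experiment can be sketched in plain Java. The class `ChunkedAverage` is hypothetical and leaves out storage and timing entirely; it only shows how the same numbers are split into lists of a chosen size and how the average is then computed by visiting one list at a time.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedAverage {
    // Splits the values into consecutive chunks of at most chunkSize items.
    static List<List<Integer>> chunk(List<Integer> values, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        for (int from = 0; from < values.size(); from += chunkSize) {
            int to = from + Math.min(chunkSize, values.size() - from);
            chunks.add(new ArrayList<>(values.subList(from, to)));
        }
        return chunks;
    }

    // Visits every chunk (as if loading one lazy list at a time) and averages all values.
    static double average(List<List<Integer>> chunks) {
        long sum = 0;
        long count = 0;
        for (List<Integer> chunk : chunks) {
            for (int value : chunk) {
                sum += value;
            }
            count += chunk.size();
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

The result is the same for every chunk size; only the number of lists that must be visited changes.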

Choose a Large list?

As indicated earlier, larger lists are more memory efficient since we have fewer Lazy instances, which also take up some memory. But we also mentioned that loading a large list into memory might be problematic because it uses a lot of memory and you probably don't need all the data for a user request.

So there is some kind of optimal value that will be application dependent.

I can also show the following graph, where we tested a similar scenario but processed just a single List with 5, 10, and so on, up to 10 million items in a Lazy reference.

The results show a nice linear relationship between the list size and the time required to handle it. So our algorithm within MicroStream is of order O(n), which is not too bad.

Therefore, it is no surprise that loading a lazy reference with fewer items is more efficient.

Choose the optimal size

So what are the criteria to choose the optimal size?

First, you must make a segmentation based on an application requirement. That is, make a grouping that makes sense for your scenario. For example, group all the 'active' orders per customer so that you do not need to load all orders of that customer.

This might mean that you need different ‘indexes’ for the same data set. You can have different maps that are holding the orders, and you access the one that gives you the data in the most efficient way. And remember, MicroStream works with references so that the same order is only loaded once if accessed through different indexes.
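The idea of several indexes over the same data can be sketched as follows. The classes `Order` and `OrderIndexes` are hypothetical, and the Map values are plain Lists here; in a MicroStream application they would be wrapped in Lazy references as described earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity used only for this illustration.
final class Order {
    final long customerId;
    final String status;

    Order(long customerId, String status) {
        this.customerId = customerId;
        this.status = status;
    }
}

final class OrderIndexes {
    final Map<Long, List<Order>> byCustomer = new HashMap<>();
    final Map<String, List<Order>> byStatus = new HashMap<>();

    // Registers the same Order instance in both indexes; no copy is made.
    void add(Order order) {
        byCustomer.computeIfAbsent(order.customerId, k -> new ArrayList<>()).add(order);
        byStatus.computeIfAbsent(order.status, k -> new ArrayList<>()).add(order);
    }
}
```

Both maps refer to the same `Order` object, so whichever index you go through, you end up at the same instance.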

Making lists smaller makes loading more efficient, but do not make them too small, as the memory consumed by the Lazy references then grows. An average size of 500 to 1000 is probably the most efficient in a wide range of scenarios.

You can make use of Apache Lucene, for example, to efficiently define the Map value for the index you are using if the Map value is not based on simple values like Customer Id and Order status.

Conclusion

The Lazy option of MicroStream allows you to load only the data that is needed, and not everything when the StorageManager starts. However, a large number of Lazy instances takes up memory and should also be avoided. Larger blocks have the drawback that they load more slowly and probably read more data than needed. Our tests indicate that a List of 500 to 1000 values is ideal, but this might be different for your data.
