Memory optimization for feeds on Android



In Java, a HashSet can only store objects derived from Object. Primitive values are therefore converted to wrapper objects such as Integer and Long; many intermediate objects are generated and occupy memory, which triggers garbage collection and causes the app to stutter.

Below is the original text.

Millions of people use Facebook on Android devices every day, scrolling through News Feed, Profile, Events, Pages, and Groups to interact with the people and information they care about. All of these different feed types are powered by a platform created by the Android Feed Platform team, so any optimization we make to the Feed platform could potentially improve performance across our app. We focus on scroll performance, as we want people to have a smooth experience when scrolling through their feeds.

To help us achieve that, we have several automatic tools that run performance tests on the Feed platform, across different scenarios and on different devices, measuring how our code performs in runtime, memory use, frame rate, and more. One of those tools, Traceview, showed a relatively high number of calls to the Long.valueOf() function, which led to objects accumulating in memory and causing the app to stall. This post describes the problem, the potential solutions we weighed, and the set of optimizations we made to improve the Feed platform.

The downside of convenience

After noticing the high number of calls to the Long.valueOf() function in one of our method profiling reports from Traceview, we ran further tests that confirmed that as we scrolled through News Feed, there were an unexpectedly high number of invocations for this method.

When we looked at the stack trace, we found that the function was not being called explicitly from Facebook’s code but implicitly by code inserted by the compiler. This function is called when assigning a primitive long value where a Long object is expected. Java supports both object and primitive representations of the simple types (e.g., integer, long, character) and provides a way to seamlessly convert between them. This feature is called autoboxing, because it “boxes” a primitive type into a corresponding object type. While that’s a convenient development feature, it creates new objects unbeknownst to the developer.
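The boxing the compiler inserts can be reproduced in a few lines. The sketch below is illustrative only: storyHashes and addHash are hypothetical names, standing in for the story-hash set in Facebook’s code.

```java
import java.util.HashSet;
import java.util.Set;

public class AutoboxingDemo {
    static Set<Long> storyHashes = new HashSet<>();

    static boolean addHash(long hash) {
        // The compiler rewrites this call to storyHashes.add(Long.valueOf(hash)).
        // Each call with a value outside the cached -128..127 range
        // allocates a brand-new Long object.
        return storyHashes.add(hash);
    }

    public static void main(String[] args) {
        System.out.println(addHash(1_234_567L)); // true: newly added
        System.out.println(addHash(1_234_567L)); // false: already present
    }
}
```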

And these objects add up.

In this heap dump taken from a sample app, Long objects have a noticeable presence; while each object is not big by itself, there are so many that they occupy a large part of the app’s memory in the heap. This is particularly problematic for devices running the Dalvik runtime. Unlike ART, the newer Android runtime environment, Dalvik doesn’t have a generational garbage collection, known to be more optimal for handling many small objects. As we scroll through News Feed and the number of objects grows, the garbage collector will cause the app to pause and sweep unused objects from the memory. The more objects that accumulate, the more frequently the garbage collector will have to pause the app, causing it to stutter and stall and making for a poor user experience.

Luckily, tools such as Traceview and Allocation Tracker can help find where these calls are made. After reviewing the origins of these autoboxing occurrences, we found that the majority of them were done while inserting long values into a HashSet data structure. (We use this data structure to store hash values of News Feed stories, and later to check if a certain story’s hash is already in the set.) HashSet provides quick access to its items, an important feature allowing interaction with it as the user is scrolling through News Feed. Since the hash is calculated and stored into a primitive long variable, and our HashSet works only with objects, we get the inevitable autoboxing when calling setStories.put(lStoryHash).

As a solution, a Set implementation for primitive types could be used, but that turned out not to be as straightforward as we expected.

Existing solutions

There are a few existing Java libraries that provide a Set implementation for primitive types. Almost all of these libraries were created more than 10 years ago, when the only Java running on mobile devices was J2ME. So, to determine viability, we needed to test them under Dalvik/ART and ensure they could perform well on more constrained mobile devices. We created a small testing framework to help compare these libraries with the existing HashSet and with one another. The results showed that a few of these libraries had a faster runtime than HashSet, and with fewer Long objects, but they still internally allocated a lot of objects. As an example, TLongHashSet, part of the Trove library, allocated about 2 MB worth of objects when tested with 1,000 items:

Testing other libraries, such as PCJ and Colt, showed similar results.

It was possible that the existing solutions didn’t fit our needs. We considered whether we could instead create a new Set implementation and optimize it for Android. Looking inside Java’s HashSet, there’s a relatively simple implementation using a single HashMap to do the heavy lifting.

public class HashSet<E> extends AbstractSet<E> implements Set<E>, ... {
    transient HashMap<E, HashSet<E>> backingMap;

    @Override public boolean add(E object) {
        return backingMap.put(object, this) == null;
    }

    @Override public boolean contains(Object object) {
        return backingMap.containsKey(object);
    }
    ...
}

Adding a new item to the HashSet means adding it to the internal HashMap where the object is the key and the HashSet‘s instance is the value. To check object membership, HashSet checks whether its internal HashMap contains the object as a key. An alternative to HashSet could be implemented using an Android-optimized map and the same principles.

You may already be familiar with LongSparseArray, a class in the Android Support Library that acts as a map using a primitive long as a key. Example usage:

LongSparseArray<String> longSparseArray = new LongSparseArray<>();
longSparseArray.put(3L, "Data");
String data = longSparseArray.get(3L); // the value of data is "Data"

LongSparseArray works differently than HashMap, though. When calling mapHashmap.get(KEY5), this is how the value is found in the HashMap:

When retrieving a value using a key on a HashMap, it’s accessing the value in the array using a hash of the key as the index, a direct access in O(1). Making the same call on a LongSparseArray looks like this:
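That direct-index lookup can be sketched as follows. This is a deliberately simplified illustration (a hypothetical fixed-size table with no collision handling or resizing), not HashMap’s actual implementation:

```java
// Simplified sketch of hash-based direct indexing: the key's hash
// selects a slot in the backing array in O(1).
public class HashIndexDemo {
    static String[] table = new String[16];

    static int indexFor(long key) {
        int h = Long.hashCode(key);
        return h & (table.length - 1); // hash -> array index, O(1)
    }

    public static void main(String[] args) {
        table[indexFor(5L)] = "value5";
        System.out.println(table[indexFor(5L)]); // value5
    }
}
```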

LongSparseArray searches a sorted keys array for the key’s value using a binary search, an operation with runtime of O(log N). The index of the key in the array is later used to find the value in the values array.
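The binary-search lookup can be sketched in plain Java. The arrays and data below are hypothetical; the sketch shows only the idea behind LongSparseArray’s get(), not its real code:

```java
import java.util.Arrays;

// Sketch of a sparse-array map lookup: binary-search the sorted key
// array in O(log N), then use that index into the parallel value array.
public class SparseLookupDemo {
    static long[] keys = {1L, 3L, 7L, 9L};
    static String[] values = {"a", "b", "c", "d"};

    static String get(long key) {
        int i = Arrays.binarySearch(keys, key); // O(log N)
        return i >= 0 ? values[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(get(7L)); // c
        System.out.println(get(5L)); // null
    }
}
```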

HashMap allocates one big array in an effort to avoid collisions, which are causing the search to be slower. LongSparseArray allocates two small arrays, making its memory footprint smaller. But to support its search algorithm, LongSparseArray needs to allocate its internal arrays in consecutive memory blocks. Adding more items will require allocating new arrays when there’s no more space in the current ones. The way LongSparseArray works makes it less ideal when holding more than 1,000 items, where these differences have a more significant impact on performance. (You can learn more about LongSparseArray in the official documentation and by watching this short video by Google.)

Since the LongSparseArray's keys are of a primitive long type, we can create a data structure with the same approach as HashSet but with a LongSparseArray as the internal map instead of HashMap.

And so LongArraySet was created.
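A minimal sketch of the idea follows. This is not Facebook’s actual implementation (their source is linked at the end of the post); it only illustrates the principle: a set of primitive longs backed by a sorted array and binary search, so no boxing ever occurs.

```java
import java.util.Arrays;

// Illustrative set of primitive longs: sorted backing array,
// binary search for membership, shift-on-insert/remove.
public class LongArraySet {
    private long[] keys = new long[8];
    private int size = 0;

    public boolean add(long value) {
        int i = Arrays.binarySearch(keys, 0, size, value);
        if (i >= 0) return false;                 // already present
        int insertAt = ~i;                        // insertion point
        if (size == keys.length)
            keys = Arrays.copyOf(keys, size * 2); // grow when full
        System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt);
        keys[insertAt] = value;
        size++;
        return true;
    }

    public boolean contains(long value) {
        return Arrays.binarySearch(keys, 0, size, value) >= 0;
    }

    public boolean remove(long value) {
        int i = Arrays.binarySearch(keys, 0, size, value);
        if (i < 0) return false;
        System.arraycopy(keys, i + 1, keys, i, size - i - 1);
        size--;
        return true;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        LongArraySet set = new LongArraySet();
        set.add(42L);                             // no Long allocated anywhere
        System.out.println(set.contains(42L));    // true
    }
}
```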

The new data structure looked promising, but the first rule of optimization is “always measure.” By using the same testing framework from earlier, we compared the new data structure with HashSet. Each data structure was tested by adding X number of items, checking the existence of each item, and later removing all of them. We ran tests using different numbers of items (X=10, X=100, X=1,000 …) and averaged the time it took to complete each operation per item.
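For the plain HashSet case, a hypothetical re-creation of that test loop might look like this (the real framework is not public, so the names and structure here are assumptions):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the benchmark described above: add X items, check the
// existence of each, remove them all, and average the time per operation.
public class SetBenchmark {
    static long avgNanosPerOp(int x) {
        Set<Long> set = new HashSet<>();
        long start = System.nanoTime();
        for (long i = 0; i < x; i++) set.add(i);      // boxes every long
        for (long i = 0; i < x; i++) set.contains(i);
        for (long i = 0; i < x; i++) set.remove(i);
        return (System.nanoTime() - start) / (3L * x);
    }

    public static void main(String[] args) {
        for (int x : new int[] {10, 100, 1000}) {
            System.out.println("X=" + x + ": ~" + avgNanosPerOp(x) + " ns/op");
        }
    }
}
```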

The runtime results (time shown is in nanoseconds):

We saw a runtime improvement for the contains and remove methods using the new data structure. Also, as the number of items increased in the array set, it took more time to add new items. That’s consistent with what we already knew about LongSparseArray — it doesn’t perform as well as HashMap when the number of items is more than 1,000. In our use cases, we’re dealing with only hundreds of items, so this is a trade-off we’re willing to make.

We also saw a big improvement related to memory. In reviewing the heap dumps and Allocation Tracker reports, we noticed a decrease in object allocations. Here’s a side-by-side allocation report for the HashSet and LongArraySet implementations, when adding 1,000 items for 20 iterations:

In addition to avoiding all the Long object allocations, LongSparseArray was more memory-efficient internally, with about 30 percent fewer allocations in this scenario.

By understanding how other data structures work, we were able to create a more optimized data structure for our needs. The less the garbage collector has to work, the lower the likelihood of dropped frames. Using the new LongArraySet class, and a similar IntArraySet for the primitive int data type, we were able to cut down a significant number of allocations in our entire app.

This case study demonstrates the importance of measuring our options to ensure that we were introducing an improvement. We also accepted that while this solution is not perfect for all use cases, since this implementation is slower for very large data sets, it was an acceptable trade-off for us to optimize our code.

You can find the source code for the two data structures here. We are excited to continue working on challenges and optimizing our Feed platform and to share our solutions with the community.

My English is not great. If there are translation mistakes, please point them out and I will correct them promptly. Thank you.

========== THE END ==========

