1338. Halve array size
Hi, everyone. I’m a pig. Welcome to the weekly LeetCode contest installment of the “baby can understand” series.
This is question 2 of weekly contest 174, number 1338 in the problem list: “Halve array size”.
Problem Description
Given an integer array arr, you can select a set of integers and delete every occurrence of these integers from the array.
Return the minimum size of a set that deletes at least half of the integers in the array.
Example 1:
Input: arr = [3,3,3,3,5,5,5,2,2,7]
Output: 2
Explanation: selecting {3,7} leaves the array [5,5,5,2,2], which has length 5 (half of the original array length).
The feasible sets of size 2 are {3,5}, {3,2}, {5,2}.
Selecting {2,7} is not feasible: the resulting array is [3,3,3,3,5,5,5], which is longer than half of the original array.
Example 2:
Input: arr = [7,7,7,7,7,7]
Output: 1
Explanation: we can only select the set {7}, and the result array is empty.
Example 3:
Input: arr = [1,9]
Output: 1
Example 4:
Input: arr = [1000,1000,3,7]
Output: 1
Example 5:
Input: arr = [1,2,3,4,5,6,7,8,9,10]
Output: 5
Tips:
1 <= arr.length <= 10^5
arr.length is even
1 <= arr[i] <= 10^5
Official difficulty
MEDIUM
Solutions
Well, this problem is very straightforward, with no story wrapped around it, so the piglet doesn’t know what to write. The cute little pig hates you! Hum ><
Since the problem asks us to delete at least half of the data with the fewest selections, the pig’s first reaction after reading it was: should each selection be the value with the most occurrences still remaining? The values are independent of each other, and one selection deletes every occurrence of a single value, so this idea leads directly to the optimal solution.
In fact, this is the greedy algorithm: taking the locally optimal choice at every step gives the globally optimal solution here.
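To make the greedy idea concrete, here is a minimal sketch that lists which values would be selected, most frequent first. The helper name greedyPicks is made up for illustration and is not part of the solutions in this post.

```javascript
// Hypothetical helper (not from the original post): return the values the
// greedy strategy selects, most frequent first.
const greedyPicks = arr => {
  // Count how many times each value occurs.
  const counts = new Map();
  for (const val of arr) counts.set(val, (counts.get(val) || 0) + 1);

  // Order the [value, count] pairs by count, largest first.
  const byCount = [...counts.entries()].sort((a, b) => b[1] - a[1]);

  const picks = [];
  let removed = 0;
  for (const [val, count] of byCount) {
    // Stop once at least half of the array has been deleted.
    if (removed * 2 >= arr.length) break;
    picks.push(val);
    removed += count;
  }
  return picks;
};

console.log(greedyPicks([3, 3, 3, 3, 5, 5, 5, 2, 2, 7])); // [3, 5]
```

For Example 1 the counts are 4, 3, 2 and 1, so selecting 3 and then 5 deletes 7 of the 10 elements, which matches the expected answer of 2.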
Direct scheme
Based on the above ideas, we can get the following process:
 Count the occurrences of each value in the original data.
 Sort the counts in descending order.
 Add up the sorted counts one by one until the sum is at least half of the original data length.
Based on this process, we can implement code similar to the following:
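const minSetSize = arr => {
  const LEN = arr.length;
  // With one or two elements, deleting any single value is enough.
  if (LEN < 3) return 1;
  const max = Math.max(...arr);
  // freq[v] = number of occurrences of v. A count can reach 10^5, which
  // does not fit in a 16-bit element, so use 32-bit counters.
  const freq = new Uint32Array(max + 1);
  for (const val of arr) ++freq[val];
  let step = 0;
  let sum = 0;
  // Sort the counts in descending order.
  freq.sort((a, b) => b - a);
  for (const val of freq) {
    sum += val;
    ++step;
    if (sum >= LEN / 2) return step;
  }
};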
This code runs in 96 ms and, for now, beats 100% of submissions.
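One detail worth checking when counting with typed arrays is the element width. Typed-array elements wrap silently on overflow, and arr.length can be as large as 10^5, which is above the 65535 limit of a 16-bit element. The small sketch below (variable names are illustrative) shows the difference between a 16-bit and a 32-bit counter:

```javascript
// A 16-bit counter restarts at 0 after 65535, while a 32-bit counter
// can hold any count up to 10^5.
const TOTAL = 100000; // largest arr.length allowed by the constraints

const narrow = new Uint16Array(1);
const wide = new Uint32Array(1);
for (let i = 0; i < TOTAL; ++i) {
  ++narrow[0];
  ++wide[0];
}

console.log(narrow[0]); // 34464 (100000 mod 65536)
console.log(wide[0]);   // 100000
```

So an input where a single value appears more than 65535 times needs 32-bit counters to be counted correctly.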
Optimization
In the above code, we sort the counts with a traditional comparison sort, so the complexity is O(n log n). Is there a way to reduce this complexity?
Here we introduce a less traditional sorting method: bucket sort. Let’s start with an example.
Assume that 2000 students have just finished an exam, and their scores are integers in the range [1, 100]. Now we need to sort their scores in ascending order.
We prepare 100 buckets, one for each possible score. Then we put every 1-point test paper into bucket No. 1, every 2-point test paper into bucket No. 2, and so on, until all the papers are in the 100 buckets. I don’t know if my friends have noticed, but at this point the 2000 papers are already sorted. We just need to read off the number of papers in each bucket from low to high.
This sorting method has a great advantage: its time complexity is only O(n) when the number of buckets is fixed (O(n + k) for k buckets), which is better than traditional sorting algorithms based on comparison and exchange. But it also has clear limitations: we need to be able to enumerate every possible value. And if the range is very large while the amount of data to sort is relatively small, the cost of the buckets outweighs the gain.
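The exam example can be sketched in a few lines. This is only an illustration under the assumptions stated above (integer scores in a known range); the function name bucketSortScores and its parameters are made up for this sketch.

```javascript
// Sort integer scores known to lie in [minScore, maxScore] by counting
// how many papers fall into each bucket.
const bucketSortScores = (scores, minScore = 1, maxScore = 100) => {
  // One bucket per possible score; buckets[i] counts papers scoring minScore + i.
  const buckets = new Uint32Array(maxScore - minScore + 1);
  for (const score of scores) ++buckets[score - minScore];

  // Read the buckets from low to high to emit the scores in ascending order.
  const sorted = [];
  for (let i = 0; i < buckets.length; ++i) {
    for (let k = 0; k < buckets[i]; ++k) sorted.push(minScore + i);
  }
  return sorted;
};

console.log(bucketSortScores([88, 100, 59, 88, 1, 73])); // [1, 59, 73, 88, 88, 100]
```

No two scores are ever compared with each other; the work is one pass over the papers plus one pass over the buckets.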
Based on the bucket sorting described above, we can get the following flow:
 Count the occurrences of each value in the original data.
 Bucket-sort the counts: for each frequency, record how many distinct values occur exactly that many times.
 Traverse the frequencies from large to small and add them up until the sum is greater than or equal to half of the original data length.
Based on this process, we can implement code similar to the following:
const minSetSize = arr => {
  const LEN = arr.length;
  if (LEN < 3) return 1;
  const max = Math.max(...arr);
  // freq[v] = number of occurrences of v (32-bit: a count can reach 10^5).
  const freq = new Uint32Array(max + 1);
  let maxFreq = 0;
  for (const val of arr) ++freq[val] > maxFreq && (maxFreq = freq[val]);
  // freqBased[f] = number of distinct values that occur exactly f times.
  const freqBased = new Uint32Array(maxFreq + 1);
  for (const val of freq) ++freqBased[val];
  let step = 0;
  let sum = 0;
  // Walk the frequencies from high to low, taking one value at a time.
  for (let i = maxFreq; i >= 1; --i) {
    while (freqBased[i]--) {
      sum += i;
      ++step;
      if (sum >= LEN / 2) return step;
    }
  }
};
This code runs in 80 ms and, for now, takes over from the code above as the one that beats 100% of submissions.
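To see what the second counting pass produces, here is a small sketch that builds only the frequency buckets. The helper name frequencyBuckets is made up for illustration, and it uses a Map for the first pass instead of a typed array, which is a simplification rather than the code above.

```javascript
// Hypothetical helper: buckets[f] = number of distinct values that occur
// exactly f times in arr.
const frequencyBuckets = arr => {
  const counts = new Map();
  let maxFreq = 0;
  for (const val of arr) {
    const count = (counts.get(val) || 0) + 1;
    counts.set(val, count);
    if (count > maxFreq) maxFreq = count;
  }

  // Index 0 is unused because every value present occurs at least once.
  const buckets = new Array(maxFreq + 1).fill(0);
  for (const count of counts.values()) ++buckets[count];
  return buckets;
};

console.log(frequencyBuckets([3, 3, 3, 3, 5, 5, 5, 2, 2, 7]));  // [0, 1, 1, 1, 1]
console.log(frequencyBuckets([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])); // [0, 10]
```

Reading the buckets from the highest index down, Example 1 takes one value of frequency 4 and one of frequency 3 (2 selections), while Example 5 has to take five values from bucket 1.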
Summary
The problem itself is very straightforward, and the line of thought is just as direct. So for the optimization step, this post introduced and explained a sorting method that is not very common. I hope those who have not come across it before can gain something.
Related links
 Weekly contest 174 topic list
 Repo
 My SegmentFault column
 My Zhihu column