在将重复项移动到末尾时对数组进行排序?
这是我朋友的一个编程课上的一个问题.
This was a question in one my friend's programming class.
问.如何对 int
数组进行排序,然后排列它们,使所有重复的元素都出现在数组的末尾?
Q. How do you sort an array of int
s and then arrange them such that all duplicate elements appear at the end of the array?
例如,给定输入
{5, 2, 7, 6, 1, 1, 5, 6, 2}
输出将是
{1, 2, 5, 6, 7, 1, 2, 5, 6}
注意,数字是排序的,重复的数字在7之后,这是数组中的最大值.
Note that the numbers are sorted and duplicate numbers are after 7, which is the maximum in the array.
这必须在不使用任何 Java 库包/工具的情况下实现.
我建议先使用插入或冒泡排序对数组进行排序,然后遍历数组,执行如下操作:
I suggested to sort the array first using insertion or bubble sort, and then go over the array, perform something like the following :
for (int i = 0; i < nums.length - 2; i++) {
for (int j = i + 1; j < nums.length; j++) {
//current and next are same, move elements up
//and place the next number at the end.
if (nums[i] == nums[j]) {
int temp = nums[j];
for (int k = j; k < nums.length - 1; k++) {
nums[k] = nums[k + 1];
}
nums[nums.length - 1] = temp;
break;
}
}
}
我后来自己尝试了这个(上面的代码就是这样) - 当我尝试这个时,我认为这可以通过使用更少的代码来实现,效率更高.可能是我给出了错误的建议.
I tried this myself later (and that is how the code above) - As I try this out, I think this could be achieved by using less code, be more efficiently. And may be I gave a wrong advice.
有什么想法吗?
推荐答案
根据你的问题的参数,有很多方法可以解决这个问题.
Depending on the parameters of your problem, there are many approaches to solving this.
如果不允许使用 O(n) 外部存储器,那么一种选择是使用标准排序算法在 O(n log n) 时间内对数组进行就地排序,然后在它上面运行第二遍以将重复项移动到最后(如您所建议的那样).您在上面发布的代码需要 O(n2) 时间,但我认为这一步可以使用稍微复杂的算法在 O(n log n) 时间内完成.这个想法分两步起作用.在第一步中,在 O(n log n) 时间内,您将所有非重复元素按排序顺序放在前面,并将所有重复元素按非排序顺序放在后面.完成后,您可以使用第一步中的排序算法在 O(n log n) 时间内对数组的后半部分进行排序.
If you are not allowed to use O(n) external memory, then one option would be to use a standard sorting algorithm to sort the array in-place in O(n log n) time, then to run a second pass over it to move the duplicates to the end (as you've suggested). The code you posted above takes O(n2) time, but I think that this step can be done in O(n log n) time using a slightly more complicated algorithm. The idea works in two steps. In the first step, in O(n log n) time you bring all non-duplicated elements to the front in sorted order and bring all the duplicates to the back in non-sorted order. Once you've done that, you then sort the back half of the array in O(n log n) time using the sorting algorithm from the first step.
我不打算进入对数组进行排序的代码.我真的很喜欢排序,但是关于如何对数组进行就地排序还有很多其他好的资源,所以我在这里没有很好地利用我的时间/空间来进入它们.如果有帮助,这里是 堆排序、快速排序和smoothsort,所有这些都在 O(n log n) 时间内运行.堆排序和平滑排序仅使用 O(1) 外部内存,而快速排序在最坏的情况下可以使用 O(n)(尽管好的实现可以使用可爱的技巧将其限制为 O(log n)).
I'm not going to go into the code to sort the array. I really love sorting, but there are so many other good resources on how to sort arrays in-place that it's not a good use of my time/space here to go into them. If it helps, here's links to Java implementations of heapsort, quicksort, and smoothsort, all of which runs in O(n log n) time. Heapsort and smoothsort use only O(1) external memory, while quicksort can use O(n) in the worst case (though good implementations can limit this to O(log n) using cute tricks).
有趣的代码是将所有非重复元素带到范围前面的逻辑.直观地说,代码通过存储两个指针来工作——一个读指针和一个写指针.读指针指向下一个要读取的元素,而写指针指向应该放置下一个唯一元素的位置.例如,给定这个数组:
The interesting code is the logic to bring all the non-duplicated elements to the front of the range. Intuitively, the code works by storing two pointers - a read pointer and a write pointer. The read pointer points to the next element to read, while the write pointer points to the location where the next unique element should be placed. For example, given this array:
1 1 1 1 2 2 3 4 5 5
我们从最初指向 1 的读写指针开始:
We start with the read and write pointers initially pointing at 1:
write v
1 1 1 1 2 2 3 4 5 5
read ^
接下来,我们将读取指针向前跳过到下一个不是 1 的元素.这会找到 2:
Next, we skip the read pointer ahead to the next element that isn't 1. This finds 2:
write v
1 1 1 1 2 2 3 4 5 5
read ^
然后,我们将写指针撞到下一个位置:
Then, we bump the write pointer to the next location:
write v
1 1 1 1 2 2 3 4 5 5
read ^
现在,我们将 2 交换到写指针所持有的位置:
Now, we swap the 2 into the spot held by the write pointer:
write v
1 2 1 1 1 2 3 4 5 5
read ^
将读取指针前进到下一个不是 2 的值:
advance the read pointer to the next value that isn't 2:
write v
1 2 1 1 1 2 3 4 5 5
read ^
然后推进写指针:
write v
1 2 1 1 1 2 3 4 5 5
read ^
再次,我们交换'read'和'write'指向的值,并将写指针向前移动,然后将读指针移动到下一个唯一值:
Again, we exchange the values pointed at by 'read' and 'write' and move the write pointer forward, then move the read pointer to the next unique value:
write v
1 2 3 1 1 2 1 4 5 5
read ^
再次收获
write v
1 2 3 4 1 2 1 1 5 5
read ^
最后的迭代给出了
write v
1 2 3 4 5 2 1 1 1 5
read ^
如果我们现在从写指针到读指针排序,我们得到
If we now sort from the write pointer to the read pointer, we get
write v
1 2 3 4 5 1 1 1 2 5
read ^
还有宾果游戏!我们已经得到了我们正在寻找的答案.
and bingo! We've got the answer we're looking for.
在(未经测试,抱歉...)Java 代码中,此修复步骤可能如下所示:
In (untested, sorry...) Java code, this fixup step might look like this:
int read = 0;
int write = 0;
while (read < array.length) {
/* Swap the values pointed at by read and write. */
int temp = array[write];
array[write] = array[read];
array[read] = temp;
/* Advance the read pointer forward to the next unique value. Since we
* moved the unique value to the write location, we compare values
* against array[write] instead of array[read].
*/
while (read < array.length && array[write] == array[read])
++ read;
/* Advance the write pointer. */
++ write;
}
该算法在 O(n) 时间内运行,这导致该问题的整体算法为 O(n log n).由于重新排序步骤使用 O(1) 内存,因此总体内存使用量将是 O(1)(对于平滑排序或堆排序等)或 O(log n)(对于快速排序等).
This algorithm runs in O(n) time, which leads to an overall O(n log n) algorithm for the problem. Since the reordering step uses O(1) memory, the overall memory usage would be either O(1) (for something like smoothsort or heapsort) or O(log n) (for something like quicksort).
在与朋友讨论后,我认为基于快速排序的修改有一个更优雅的问题解决方案.通常,当您运行快速排序时,您最终会将数组划分为三个区域:
After talking this over with a friend, I think that there is a much more elegant solution to the problem based on a modification of quicksort. Typically, when you run quicksort, you end up partitioning the array into three regions:
+----------------+----------------+----------------+
| values < pivot | values = pivot | values > pivot |
+----------------+----------------+----------------+
然后递归对第一个和最后一个区域进行排序,以将它们排序.但是,我们可以针对我们的问题版本进行修改.我们需要 rotation 算法作为原始算法,它在一个数组中获取两个相邻的值块,并在 O(n) 时间内交换它们.它不会改变这些块中元素的相对顺序.例如,我们可以使用旋转来转换数组
The recursion then sorts the first and last regions to put them into sorted order. However, we can modify this for our version of the problem. We'll need as a primitive the rotation algorithm, which takes two adjacent blocks of values in an array and exchanges them in O(n) time. It does not change the relative order of the elements in those blocks. For example, we could use rotation to convert the array
1 2 3 4 5 6 7 8
进入
3 4 5 6 7 8 1 2
并且可以在 O(n) 时间内完成.
and can do so in O(n) time.
快速排序的修改版本可以通过使用 Bentley-McIlroy 三路分区算法(描述 here) 使用 O(1) 额外空间,将数组元素重新排列为上面显示的配置.接下来,我们应用旋转来重新排序元素,使它们看起来像这样:
The modified version of quicksort would work by using the Bentley-McIlroy three-way partition algortihm (described here) to, using O(1) extra space, rearrange the array elements into the configuration shown above. Next, we apply a rotation to reorder the elements so that they look like this:
+----------------+----------------+----------------+
| values < pivot | values > pivot | values = pivot |
+----------------+----------------+----------------+
接下来,我们执行交换,以便将枢轴元素的一个副本准确地移动到至少与枢轴一样大的元素集合中.这可能在后面有额外的枢轴副本.然后我们递归地将排序算法应用于 <和 > 范围.当我们这样做时,结果数组将如下所示:
Next, we perform a swap so that we move exactly one copy of the pivot element into the set of elements at least as large as the pivot. This may have extra copies of the pivot behind. We then recursively apply the sorting algorithm to the < and > ranges. When we do this, the resulting array will look like this:
+---------+-------------+---------+-------------+---------+
| < pivot | dup < pivot | > pivot | dup > pivot | = pivot |
+---------+-------------+---------+-------------+---------+
然后我们对范围应用两次旋转以将其放入最终顺序.首先,使用大于枢轴的值旋转小于枢轴的重复值.这给了
We then apply two rotations to the range to put it into the final order. First, rotate the duplicate values less than the pivot with the values greater than the pivot. This gives
+---------+---------+-------------+-------------+---------+
| < pivot | > pivot | dup < pivot | dup > pivot | = pivot |
+---------+---------+-------------+-------------+---------+
此时,第一个范围是升序排列的唯一元素:
At this point, this first range is the unique elements in ascending order:
+---------------------+-------------+-------------+---------+
| sorted unique elems | dup < pivot | dup > pivot | = pivot |
+---------------------+-------------+-------------+---------+
最后,对大于枢轴的重复元素和等于枢轴的元素进行最后一次旋转,得到这个:
Finally, do one last rotation of the duplicate elements greater than the pivot and the elements equal to the pivot to yield this:
+---------------------+-------------+---------+-------------+
| sorted unique elems | dup < pivot | = pivot | dup > pivot |
+---------------------+-------------+---------+-------------+
请注意,最后三个块只是排序后的重复值:
Notice that these last three blocks are just the sorted duplicate values:
+---------------------+-------------------------------------+
| sorted unique elems | sorted duplicate elements |
+---------------------+-------------------------------------+
瞧!我们已经按照我们想要的顺序得到了一切.使用与普通快速排序相同的分析,再加上我们在每个级别只做 O(n) 的工作(三个额外的旋转),这在最好的情况下可以达到 O(n log n)O(log n) 内存使用量.在 O(log n) 内存的最坏情况下仍然是 O(n2),但这种情况发生的概率极低.
and voila! We've got everything in the order we want. Using the same analysis that you'd do for normal quicksort, plus the fact that we're only doing O(n) work at each level (three extra rotations), this works out to O(n log n) in the best case with O(log n) memory usage. It's still O(n2) in the worst case with O(log n) memory, but that happens with extremely low probability.
如果允许您使用 O(n) 内存,一种选择是从所有存储键/值对的元素中构建一个平衡的二叉搜索树,其中每个键是数组的一个元素,值是它出现的次数.然后,您可以按以下格式对数组进行排序:
If you are allowed to use O(n) memory, one option would be to build a balanced binary search tree out of all of the elements that stores key/value pairs, where each key is an element of the array and the value is the number of times it appears. You could then sort the array in your format as follows:
- 对于数组中的每个元素:
- 如果该元素已存在于 BST 中,则增加其计数.
- 否则,向 BST 添加一个新节点,该节点的计数为 1.
这个算法的运行时间是 O(n log n),但是从头开始编写 BST 会非常棘手.它还需要外部空间,我不确定你是否可以这样做.
The runtime of this algorithm is O(n log n), but it would be pretty tricky to code up a BST from scratch. It also requires external space, which I'm not sure you're allowed to do.
但是,如果您被允许使用外部空间并且您正在排序的数组很小并且包含小整数,您可以通过使用修改后的 计数排序.只需将 BST 替换为一个足够大的数组,以使原始数组中的每个整数都成为键.这将运行时间减少到 O(n + k),内存使用 O(k),其中 k 是数组中的最大元素.
However, if you are allowed external space and the arrays you are sorting are small and contain small integers, you could modify the above approach by using a modified counting sort. Just replace the BST with an array large enough for each integer in the original array to be a key. This reduces the runtime to O(n + k), with memory usage O(k), where k is the largest element in the array.
希望这会有所帮助!
相关文章