Top K Frequent Words II
Data Stream
, HashMap
, TreeSet
Description
Find top_k _frequent words in realtime data stream.
Implement three methods for Topk Class:
TopK(k)
. The constructor.add(word)
. Add a new word.topk()
. Get the current top _k _frequent words.
If two words have the same frequency, rank them by alphabet.
Example
TopK(2)
add("lint")
add("code")
add("code")
topk()
>> ["code", "lint"]
Solution
Jiuzhang's Solution: HashMap + TreeSet
注意:
这里add()时,如果TreeSet中存在这个word,要先删去该word,更新HashMap中word对应的count,再将其加入TreeSet,这样就可以利用更新过的count来排序了。否则TreeSet中已经存在的word没法改变其排序。
这里的TreeSet用的是根据words count来排序,并且count越高排序越靠前,因此可以想成(用PriorityQueue)的Max-Heap,与PriorityQueue相对应的,这里TreeSet因为可以直接用
pollLast()
删去last(highest) element,也就是移除最小的word count,因此这里可以使用Max-Heap的思想而不是Min-Heap。
/**
* This reference program is provided by @jiuzhang.com
* Copyright is reserved. Please indicate the source for forwarding
*/
import java.util.NavigableSet;
public class TopK {
private Map<String, Integer> words = null;
private NavigableSet<String> topk = null;
private int k;
private Comparator<String> myComparator = new Comparator<String>() {
public int compare(String left, String right) {
if (left.equals(right))
return 0;
int left_count = words.get(left);
int right_count = words.get(right);
if (left_count != right_count) {
return right_count - left_count;
}
return left.compareTo(right);
}
};
public TopK(int k) {
// initialize your data structure here
this.k = k;
words = new HashMap<String, Integer>();
topk = new TreeSet<String>(myComparator);
}
public void add(String word) {
// Write your code here
if (words.containsKey(word)) {
if (topk.contains(word))
topk.remove(word);
words.put(word, words.get(word) + 1);
} else {
words.put(word, 1);
}
topk.add(word);
if (topk.size() > k) {
topk.pollLast();
}
}
public List<String> topk() {
// Write your code here
List<String> results = new ArrayList<String>();
Iterator it = topk.iterator();
while(it.hasNext()) {
String str = (String)it.next();
results.add(str);
}
return results;
}
}
Last updated
Was this helpful?