Top K Frequent Words II

Data Stream, HashMap, TreeSet

Description

Find top_k _frequent words in realtime data stream.

Implement three methods for Topk Class:

TopK(k). The constructor.
add(word). Add a new word.
topk(). Get the current top _k _frequent words.

If two words have the same frequency, rank them by alphabet.

Example

TopK(2)
add("lint")
add("code")
add("code")
topk()
>> ["code", "lint"]

Solution

Jiuzhang's Solution: HashMap + TreeSet

注意：

这里add()时，如果TreeSet中存在这个word，要先删去该word，更新HashMap中word对应的count，再将其加入TreeSet，这样就可以利用更新过的count来排序了。否则TreeSet中已经存在的word没法改变其排序。
这里的TreeSet用的是根据words count来排序，并且count越高排序越靠前，因此可以想成（用PriorityQueue）的Max-Heap，与PriorityQueue相对应的，这里TreeSet因为可以直接用pollLast()删去last(highest) element，也就是移除最小的word count，因此这里可以使用Max-Heap的思想而不是Min-Heap。

/**
* This reference program is provided by @jiuzhang.com
* Copyright is reserved. Please indicate the source for forwarding
*/

import java.util.NavigableSet;

public class TopK {
    private Map<String, Integer> words = null;
    private NavigableSet<String> topk = null;
    private int k;

    private Comparator<String> myComparator = new Comparator<String>() {
        public int compare(String left, String right) {
            if (left.equals(right))
                return 0;

            int left_count = words.get(left);
            int right_count = words.get(right);
            if (left_count != right_count) {
                return right_count - left_count;
            }
            return left.compareTo(right);
        }
    };

    public TopK(int k) {
        // initialize your data structure here
        this.k = k;
        words = new HashMap<String, Integer>();
        topk = new TreeSet<String>(myComparator);
    }

    public void add(String word) {
        // Write your code here
        if (words.containsKey(word)) {
            if (topk.contains(word))
                topk.remove(word);
            words.put(word, words.get(word) + 1);
        } else {
            words.put(word, 1);
        }

        topk.add(word);
        if (topk.size() > k) {
            topk.pollLast();
        }
    }

    public List<String> topk() {
        // Write your code here
        List<String> results = new ArrayList<String>();
        Iterator it = topk.iterator();
        while(it.hasNext()) {
             String str = (String)it.next();
             results.add(str);
        }
        return results;
    }
}

PreviousTop K Frequent Words NextTop K Frequent Elements

Last updated 5 years ago

Was this helpful?