Display Pages (Pagination)

Problem

Host Crowding

Problem

You’re given an array of CSV strings representing search results. 
Results are sorted by a score initially. 
A given host may have several listings that show up in these results.

a) Suppose we want to show 12 results per page,
b) We don’t want the same host to dominate the results. 
    Write a function that will reorder the list so that 
    a host shows up at most once on a page if possible, but otherwise
c) preserves the ordering.

Your program should return the new array and print out the results in blocks representing the pages.

Test Data

Test Data

[
"host_id,listing_id,score,city",
"1,28,300.1,San Francisco",
"4,5,209.1,San Francisco",
"20,7,208.1,San Francisco",
"23,8,207.1,San Francisco",
"16,10,206.1,Oakland",
"1,16,205.1,San Francisco",
"1,31,204.6,San Francisco",
"6,29,204.1,San Francisco",
"7,20,203.1,San Francisco",
"8,21,202.1,San Francisco",
"2,18,201.1,San Francisco",
"2,30,200.1,San Francisco",
"15,27,109.1,Oakland",
"10,13,108.1,Oakland",
"11,26,107.1,Oakland",
"12,9,106.1,Oakland",
"13,1,105.1,Oakland",
"22,17,104.1,Oakland",
"1,2,103.1,Oakland",
"28,24,102.1,Oakland",
"18,14,11.1,San Jose",
"6,25,10.1,Oakland",
"19,15,9.1,San Jose",
"3,19,8.1,San Jose",
"3,11,7.1,Oakland",
"27,12,6.1,Oakland",
"1,3,5.1,Oakland",
"25,4,4.1,San Jose",
"5,6,3.1,San Jose",
"29,22,2.1,San Jose",
"30,23,1.1,San Jose"
]

Also available in this gist (https://gist.git.musta.ch/martin-nguyen/3d759317e793bbd01ea5). 
You can copy and paste it into a string for convenience.

中文翻译:

给一个array of string, 每个string由"host_id,listing_id,score,city" 组成,并以score从大到小排列。给定一个target值N,按照每一页N行分页,并且host_id不能在一页里有重复。但是如果有哪一页(除最后一页外)没有放满N行,就要把原本应该放在后面几页的string拿过来填满(打破host_id不能重复的规则)。

Solution

Use iterator of the input LinkedList to remove element once they are added to page.

Use visited hashset to store host ids in current page. And clear visited set for new page.

在去重的同时保证排名的相对次序,每页用hashset记录已出现的,erase(iter.remove())已经列出在page上的id,为了保证erase为O(1),所以要用linked list来转换原始记录。

There is a trick in this question. When do we need to get to a new page? There are two cases need to consider:

  1. When the current page has pageSize (12) entries.

  2. When the current page has less than pageSize (12) but the iterator has reached to the end. In this case, we need wrap back and iterator the list again.

High Level过程:

  1. 每次填当前页面,循环的是iterator;删除用过的element;hashset记录当前页面所用id;如果reachEnd = true,说明已经读完全部input list,但是当前仍需要id来填充当前页,因此允许加入重复元素

  2. 用一个boolean flag reachEnd来决定是否加入重复元素

  3. 填满当前页面后,重置iterator,重置reachEnd = false

  4. 当读到input的末尾时,也要重置iterator,设reachEnd = true

// "static void main" must be defined in a public class.
public class Main {
    public static List < String > displayPages(List < String > input, int pageSize) {
        List < String > res = new ArrayList < > ();
        if (input == null || input.size() == 0) {
            return res;
        }
        HashSet < String > visited = new HashSet < > ();
        Iterator < String > iter = input.iterator();
        boolean reachEnd = false;
        while (iter.hasNext()) {
            String curr = iter.next();
            String hostId = curr.split(",")[0];
            if (!visited.contains(hostId) || reachEnd) {
                res.add(curr);
                visited.add(hostId);
                iter.remove();
            }
            if (visited.size() == pageSize) {
                visited.clear();
                reachEnd = false;
                if (!input.isEmpty()) {
                    res.add(" ");
                }
                iter = input.iterator();
            }
            if (!iter.hasNext()) {
                iter = input.iterator();
                reachEnd = true;
            }
        }
        return res;
    }

    public static void main(String[] args) {
        String[] data = {
            "host_id,listing_id,score,city",
            "1,28,300.1,San Francisco",
            "4,5,209.1,San Francisco",
            "20,7,208.1,San Francisco",
            "23,8,207.1,San Francisco",
            "16,10,206.1,Oakland",
            "1,16,205.1,San Francisco",
            "1,31,204.6,San Francisco",
            "6,29,204.1,San Francisco",
            "7,20,203.1,San Francisco",
            "8,21,202.1,San Francisco",
            "2,18,201.1,San Francisco",
            "2,30,200.1,San Francisco",
            "15,27,109.1,Oakland",
            "10,13,108.1,Oakland",
            "11,26,107.1,Oakland",
            "12,9,106.1,Oakland",
            "13,1,105.1,Oakland",
            "22,17,104.1,Oakland",
            "1,2,103.1,Oakland",
            "28,24,102.1,Oakland",
            "18,14,11.1,San Jose",
            "6,25,10.1,Oakland",
            "19,15,9.1,San Jose",
            "3,19,8.1,San Jose",
            "3,11,7.1,Oakland",
            "27,12,6.1,Oakland",
            "1,3,5.1,Oakland",
            "25,4,4.1,San Jose",
            "5,6,3.1,San Jose",
            "29,22,2.1,San Jose",
            "30,23,1.1,San Jose"
        };
        List<String> input = new LinkedList<String>(Arrays.asList(data));
        List<String> result = displayPages(input, 12);
        for (String r: result) {
            System.out.println(r);
        }
    }
}

Another implementation:

https://github.com/gabhi/leetcode-1/blob/master/b/DividePage.java

import java.util.*;

public class DividePage {

    // 10 : 35
    private static final int CAPACITY = 12;
    public static void displayPages(List<String> input) {
        if(input == null || input.size() == 0) return;
        Iterator<String> iter = input.iterator();
        Set<String> set = new HashSet<>();        
        StringBuilder sb = new StringBuilder();
        int pageNum = 1;
        sb.append("page " + pageNum + "\n\n");
        while(iter.hasNext()) {
            String s = iter.next();
            String pageId = s.split(",")[0];
            if(!set.contains(pageId)) {
                sb.append(s).append("\n");
                set.add(pageId);
                iter.remove();
            }

            if(set.size() == CAPACITY || !iter.hasNext()) {
                set.clear();
                iter = input.iterator();
                if(iter.hasNext()) {
                    pageNum++;
                    sb.append("\npage " + pageNum + "\n\n");
                }
            }    
        }
        System.out.println(sb.toString());
    }

    public static void main(String[] args) {
        // "host_id,listing_id,score,city"
        // every page 12 lines
        String[] strs = new String[] {
            "1,28,300.1,SanFrancisco",   
            "4,5,209.1,SanFrancisco",
            "20,7,208.1,SanFrancisco",
            "23,8,207.1,SanFrancisco",
            "16,10,206.1,Oakland",
            "1,16,205.1,SanFrancisco",
            "6,29,204.1,SanFrancisco",
            "7,20,203.1,SanFrancisco",
            "8,21,202.1,SanFrancisco",
            "2,18,201.1,SanFrancisco",
            "2,30,200.1,SanFrancisco",
            "15,27,109.1,Oakland",
            "10,13,108.1,Oakland",
            "11,26,107.1,Oakland",
            "12,9,106.1,Oakland",
            "13,1,105.1,Oakland",
            "22,17,104.1,Oakland",
            "1,2,103.1,Oakland",
            "28,24,102.1,Oakland",
            "18,14,11.1,SanJose",
            "6,25,10.1,Oakland",
            "19,15,9.1,SanJose",
            "3,19,8.1,SanJose",
            "3,11,7.1,Oakland",
            "27,12,6.1,Oakland",
            "1,3,5.1,Oakland",
            "25,4,4.1,SanJose",
            "5,6,3.1,SanJose",
            "29,22,2.1,SanJose",
            "30,23,1.1,SanJose"
            };


        List<String> input = new ArrayList<>(Arrays.asList(strs));
        displayPages(input);
    }        
}

Python Version

https://repl.it/@absolute100/airbnb-display-page

def pagedisplay(input_csv_array, k):
    ids = [line.split(',')[0] for line in input_csv_array]
    hmap = {}
    pages = []
    start = 0

    for i, id in enumerate(ids):
        if id not in hmap or hmap[id]<start:
            hmap[id]=start
        if hmap[id]==len(pages):
            pages.append([])
        pages[hmap[id]].append(input_csv_array[i])
        hmap[id]+=1
        if len(pages[start])==k:
            start+=1

    # if you need to print exact k lines in a page (i.e., tolerate some dup)
    # then have a third loop to print the page
    for page in pages:
        print '---- page ----'
        for line in page:
            print line

input_csv_array = [
  "1,28,300.1,SanFrancisco",
  "4,5,209.1,SanFrancisco",
  "20,7,208.1,SanFrancisco",
  "23,8,207.1,SanFrancisco",
  "16,10,206.1,Oakland",
  "1,16,205.1,SanFrancisco",
  "6,29,204.1,SanFrancisco",
  "7,20,203.1,SanFrancisco",
  "8,21,202.1,SanFrancisco",
  "2,18,201.1,SanFrancisco",
  "2,30,200.1,SanFrancisco",
  "15,27,109.1,Oakland",
  "10,13,108.1,Oakland",
  "11,26,107.1,Oakland",
  "12,9,106.1,Oakland",
  "13,1,105.1,Oakland",
  "22,17,104.1,Oakland",
  "1,2,103.1,Oakland",
  "28,24,102.1,Oakland",
  "18,14,11.1,SanJose",
  "6,25,10.1,Oakland",
  "19,15,9.1,SanJose",
  "3,19,8.1,SanJose",
  "3,11,7.1,Oakland",
  "27,12,6.1,Oakland",
  "1,3,5.1,Oakland",
  "25,4,4.1,SanJose",
  "5,6,3.1,SanJose",
  "29,22,2.1,SanJose",
  "30,23,1.1,SanJose"
]

input_csv_array2 = [
  "1,28,300.1,SanFrancisco",
  "29,22,2.1,SanJose",
  "28,22,2.1,SanJose",
  "29,22,2.1,SanJose",
  "1,5,209.1,SanFrancisco",
  "1,7,208.1,SanFrancisco"
]

input_csv_array3 = [
  "1,28,300.1,SanFrancisco",
  "1,5,209.1,SanFrancisco",
  "1,7,208.1,SanFrancisco",
  "28,22,2.1,SanJose",
  "29,22,2.1,SanJose",
]

print "\ninput_csv_array"
pagedisplay(input_csv_array, 12)
print "\ninput_csv_array2"
pagedisplay(input_csv_array2, 2) # same id skip a page
print "\ninput_csv_array3"
pagedisplay(input_csv_array3, 2) # page grows faster than ids

Reference

相关参考:

http://massivealgorithms.blogspot.com/2015/11/buttercola-airbnb-page-display.html

https://www.1point3acres.com/bbs/forum.php?mod=viewthread&tid=231042

https://www.1point3acres.com/bbs/forum.php?mod=viewthread&tid=300128&extra=&page=1

https://github.com/jxr041100/system_design/blob/master/Airbnb: Page Display

Last updated