Host Crowding
Problem
You’re given an array of CSV strings representing search results.
Results are sorted by a score initially.
A given host may have several listings that show up in these results.
a) Suppose we want to show 12 results per page,
b) We don’t want the same host to dominate the results.
Write a function that will reorder the list so that
a host shows up at most once on a page if possible, but otherwise
c) preserves the ordering.
Your program should return the new array and print out the results in blocks representing the pages.
Test Data
Test Data
[
"host_id,listing_id,score,city",
"1,28,300.1,San Francisco",
"4,5,209.1,San Francisco",
"20,7,208.1,San Francisco",
"23,8,207.1,San Francisco",
"16,10,206.1,Oakland",
"1,16,205.1,San Francisco",
"1,31,204.6,San Francisco",
"6,29,204.1,San Francisco",
"7,20,203.1,San Francisco",
"8,21,202.1,San Francisco",
"2,18,201.1,San Francisco",
"2,30,200.1,San Francisco",
"15,27,109.1,Oakland",
"10,13,108.1,Oakland",
"11,26,107.1,Oakland",
"12,9,106.1,Oakland",
"13,1,105.1,Oakland",
"22,17,104.1,Oakland",
"1,2,103.1,Oakland",
"28,24,102.1,Oakland",
"18,14,11.1,San Jose",
"6,25,10.1,Oakland",
"19,15,9.1,San Jose",
"3,19,8.1,San Jose",
"3,11,7.1,Oakland",
"27,12,6.1,Oakland",
"1,3,5.1,Oakland",
"25,4,4.1,San Jose",
"5,6,3.1,San Jose",
"29,22,2.1,San Jose",
"30,23,1.1,San Jose"
]
Also available in this gist (https://gist.git.musta.ch/martin-nguyen/3d759317e793bbd01ea5).
You can copy and paste it into a string for convenience.
中文翻译:
给一个array of string, 每个string由"host_id,listing_id,score,city" 组成,并以score从大到小排列。给定一个target值N,按照每一页N行分页,并且host_id不能在一页里有重复。但是如果有哪一页(除最后一页外)没有放满N行,就要把原本应该放在后面几页的string拿过来填满(打破host_id不能重复的规则)。
Solution
Use iterator of the input LinkedList to remove element once they are added to page.
Use visited hashset to store host ids in current page. And clear visited set for new page.
There is a trick in this question. When do we need to get to a new page? There are two cases need to consider:
When the current page has pageSize (12) entries.
When the current page has less than pageSize (12) but the iterator has reached to the end. In this case, we need wrap back and iterator the list again.
defpagedisplay(input_csv_array,k): ids = [line.split(',')[0] for line in input_csv_array] hmap ={} pages = [] start =0for i,idinenumerate(ids):ifidnotin hmap or hmap[id]<start: hmap[id]=startif hmap[id]==len(pages): pages.append([]) pages[hmap[id]].append(input_csv_array[i]) hmap[id]+=1iflen(pages[start])==k: start+=1# if you need to print exact k lines in a page (i.e., tolerate some dup)# then have a third loop to print the pagefor page in pages:print'---- page ----'for line in page:print lineinput_csv_array = ["1,28,300.1,SanFrancisco","4,5,209.1,SanFrancisco","20,7,208.1,SanFrancisco","23,8,207.1,SanFrancisco","16,10,206.1,Oakland","1,16,205.1,SanFrancisco","6,29,204.1,SanFrancisco","7,20,203.1,SanFrancisco","8,21,202.1,SanFrancisco","2,18,201.1,SanFrancisco","2,30,200.1,SanFrancisco","15,27,109.1,Oakland","10,13,108.1,Oakland","11,26,107.1,Oakland","12,9,106.1,Oakland","13,1,105.1,Oakland","22,17,104.1,Oakland","1,2,103.1,Oakland","28,24,102.1,Oakland","18,14,11.1,SanJose","6,25,10.1,Oakland","19,15,9.1,SanJose","3,19,8.1,SanJose","3,11,7.1,Oakland","27,12,6.1,Oakland","1,3,5.1,Oakland","25,4,4.1,SanJose","5,6,3.1,SanJose","29,22,2.1,SanJose","30,23,1.1,SanJose"]input_csv_array2 = ["1,28,300.1,SanFrancisco","29,22,2.1,SanJose","28,22,2.1,SanJose","29,22,2.1,SanJose","1,5,209.1,SanFrancisco","1,7,208.1,SanFrancisco"]input_csv_array3 = ["1,28,300.1,SanFrancisco","1,5,209.1,SanFrancisco","1,7,208.1,SanFrancisco","28,22,2.1,SanJose","29,22,2.1,SanJose",]print"\ninput_csv_array"pagedisplay(input_csv_array, 12)print"\ninput_csv_array2"pagedisplay(input_csv_array2, 2)# same id skip a pageprint"\ninput_csv_array3"pagedisplay(input_csv_array3, 2)# page grows faster than ids