The document discusses finding the longest common subsequence of two sequences and gives a dynamic programming algorithm for it. It explains how a matrix stores the intermediate alignment results, where each cell $A_{i,j}$ is calculated from its adjacent cells with the scores/penalties taken into account. There are two steps: find the length of the LCS by filling the matrix, then trace back through it to find the exact alignment. It also discusses the knapsack problem and how dynamic programming can be applied to optimize combination problems.
SureInterview PREPARATION ON ALGORITHM
http://www.sureinterview.com
Mar 6, 2011
The latest version can be found at http://code.google.com/p/sureinterview/downloads
Application
Design a data structure and an algorithm for an interactive spell checker that provides correct candidates.
Check how to write a spelling corrector, and how to improve the performance of generating potential candidates.
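The candidate-generation step can be sketched as follows. This is a minimal sketch of the edit-distance-1 approach described in Peter Norvig's essay "How to Write a Spelling Corrector", assuming a lowercase a-z alphabet and an in-memory `Set<String>` dictionary; `SpellCandidates`, `edits1` and `candidates` are hypothetical names, and ranking the candidates (e.g. by word frequency) is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SpellCandidates {
    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Every string one edit away from word: deletions, adjacent transpositions,
    // replacements and insertions. The set drops duplicates.
    static Set<String> edits1(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (!right.isEmpty()) {
                out.add(left + right.substring(1)); // delete right[0]
            }
            if (right.length() > 1) {
                out.add(left + right.charAt(1) + right.charAt(0) + right.substring(2)); // transpose
            }
            for (char c : ALPHABET) {
                if (!right.isEmpty()) {
                    out.add(left + c + right.substring(1)); // replace right[0] with c
                }
                out.add(left + c + right); // insert c
            }
        }
        return out;
    }

    // The word itself if it is known; otherwise every known word one edit away.
    static Set<String> candidates(String word, Set<String> dictionary) {
        Set<String> result = new LinkedHashSet<>();
        if (dictionary.contains(word)) {
            result.add(word);
            return result;
        }
        for (String e : edits1(word)) {
            if (dictionary.contains(e)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("spelling", "correct", "spell");
        System.out.println(candidates("speling", dictionary)); // [spelling]
    }
}
```

For a word of length n this generates 54n + 25 strings before de-duplication, which is why the speed of candidate generation matters for an interactive checker.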
Given millions of books, how do you find the duplicates? (A book title may contain errors.)
In which cases is an O(n^2) algorithm better than an O(n lg n) one?
Consider the following aspects:
saving space
more functionality
simpler code
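As a hedged illustration of the space and simplicity points, insertion sort is O(n^2) in the worst case, yet it sorts in place with O(1) extra space, fits in a few lines, and is stable (equal elements keep their relative order). A minimal sketch; the class name is illustrative:

```java
import java.util.Arrays;

class InsertionSort {
    // O(n^2) comparisons in the worst case, but in place (O(1) extra space) and stable.
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements one slot to the right to open a gap for key.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 5, 6, 9]
    }
}
```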
Dynamic programming
Longest Common Subsequence
Find the longest common subsequence of two sequences of items.
For example, the longest common subsequence of the following two sequences is ACAAA.
A b C d A A A
A C e A f A A
Check longest common subsequence (@algorithmist) for a general description, and Dynamic programming and sequence alignment (@IBM) for a detailed explanation and examples.
As a quick recap, an m*n matrix is used to record the current alignment results. $A_{i,j}$ is calculated from its three previously computed adjacent cells, $A_{i-1,j-1}$, $A_{i-1,j}$ and $A_{i,j-1}$, with the different scores/penalties taken into account.
There are two steps in the longest common subsequence (or alignment) algorithm.
1. Find the length or score of the longest common subsequence. That is, calculate from $A_{0,0}$ to $A_{m-1,n-1}$.
2. Trace back from $A_{m-1,n-1}$ to $A_{0,0}$ to find the exact alignment.
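For the plain LCS (score +1 per match, no mismatch or gap penalties), the two steps can be sketched as below. This sketch assumes an (m+1) x (n+1) matrix whose first row and column are zero, instead of the m*n matrix described above, which removes the boundary special cases; the names are illustrative.

```java
class Lcs {
    static String lcs(String x, String y) {
        int m = x.length(), n = y.length();
        // Step 1: a[i][j] = length of the LCS of x[0..i) and y[0..j); row 0 and column 0 stay 0.
        int[][] a = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    a[i][j] = a[i - 1][j - 1] + 1;                // extend a match diagonally
                } else {
                    a[i][j] = Math.max(a[i - 1][j], a[i][j - 1]); // skip an item of x or of y
                }
            }
        }
        // Step 2: trace back from a[m][n] toward a[0][0] to recover one LCS.
        StringBuilder sb = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                sb.append(x.charAt(i - 1));
                i--;
                j--;
            } else if (a[i - 1][j] >= a[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("AbCdAAA", "ACeAfAA")); // ACAAA
    }
}
```

In general the trace back returns one of possibly several longest common subsequences; for the example above the LCS happens to be unique.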
Given a staircase of n levels, you can step up one or two levels at a time.
1. How many ways are there to get to the 10th stair?
2. Print all the possible paths.
3. What if only steps of {1, 3, 5} levels are allowed?
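A sketch for all three questions, assuming the allowed step sizes are passed in as an array (so question 3 just uses `{1, 3, 5}`); the class and method names are hypothetical. The count is the Fibonacci-style recurrence ways(k) = ways(k-1) + ways(k-2), generalized to any set of step sizes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Stairs {
    // Questions 1 and 3: ways[k] = number of step sequences that end exactly on stair k,
    // ways[k] = sum of ways[k - s] over every allowed step size s <= k, with ways[0] = 1.
    static long countWays(int n, int[] steps) {
        long[] ways = new long[n + 1];
        ways[0] = 1; // one way to stay at the bottom: take no steps
        for (int k = 1; k <= n; k++) {
            for (int s : steps) {
                if (s <= k) {
                    ways[k] += ways[k - s];
                }
            }
        }
        return ways[n];
    }

    // Question 2: enumerate every path as a list of step sizes (the output itself is exponential).
    static void allPaths(int remaining, int[] steps, Deque<Integer> path, List<List<Integer>> out) {
        if (remaining == 0) {
            out.add(new ArrayList<>(path));
            return;
        }
        for (int s : steps) {
            if (s <= remaining) {
                path.addLast(s);
                allPaths(remaining - s, steps, path, out);
                path.removeLast();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countWays(10, new int[] {1, 2}));    // 89
        System.out.println(countWays(10, new int[] {1, 3, 5})); // 47
        List<List<Integer>> paths = new ArrayList<>();
        allPaths(4, new int[] {1, 2}, new ArrayDeque<>(), paths);
        System.out.println(paths);
    }
}
```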
Knapsack problem
Given some items, each with a weight and a value, pack the knapsack to get the maximum total value. The total weight that we can carry is no more than some fixed number W.
This slide [1] illustrates the idea of dynamic programming using the knapsack problem as an example.
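A minimal sketch of the 0/1 variant (each item can be taken at most once), assuming non-negative integer weights; the referenced slides may lay out the table differently. The one-dimensional table below is the usual space-saving form of the two-dimensional items-by-weight table, and the names are illustrative.

```java
class Knapsack {
    // best[w] = maximum value achievable with total weight <= w using the items seen so far.
    // Iterating w downwards makes sure each item is used at most once.
    // O(items * W) time, O(W) space.
    static int maxValue(int[] weight, int[] value, int capacity) {
        int[] best = new int[capacity + 1];
        for (int item = 0; item < weight.length; item++) {
            for (int w = capacity; w >= weight[item]; w--) {
                best[w] = Math.max(best[w], best[w - weight[item]] + value[item]);
            }
        }
        return best[capacity];
    }

    public static void main(String[] args) {
        int[] weight = {10, 20, 30};
        int[] value = {60, 100, 120};
        System.out.println(maxValue(weight, value, 50)); // 220
    }
}
```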
A similar idea applies to more general combination problems. See how combinations are calculated progressively in Yang Hui's triangle (Pascal's triangle).
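A small sketch of that progressive calculation, building each row of the triangle from the previous one with C(n, k) = C(n-1, k-1) + C(n-1, k); the class name is illustrative, and `long` overflows for large n.

```java
class Combinations {
    // Row n of Yang Hui's (Pascal's) triangle holds C(n, 0) .. C(n, n).
    static long[][] table(int maxN) {
        long[][] c = new long[maxN + 1][];
        for (int n = 0; n <= maxN; n++) {
            c[n] = new long[n + 1];
            c[n][0] = c[n][n] = 1;
            for (int k = 1; k < n; k++) {
                c[n][k] = c[n - 1][k - 1] + c[n - 1][k]; // sum of the two entries above
            }
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(table(5)[5][2]); // 10
    }
}
```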
References
1. Dynamic Programming
How many different binary trees with n nodes are there?
Trees with different topologies are counted as different. For example, the following two trees are treated as different.
```
Tree 1:     Tree 2:
  o         o
 /           \
o             o
```
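A common approach is to fix the root and split the remaining n-1 nodes between the left and right subtrees, which gives the Catalan numbers. A minimal sketch under that approach (the class name is illustrative; `long` overflows for large n):

```java
class BinaryTreeCount {
    // count[k] = number of structurally different binary trees with k nodes.
    // With i nodes in the left subtree, the right subtree has k-1-i nodes:
    //   count[k] = sum_{i=0}^{k-1} count[i] * count[k-1-i], with count[0] = 1 (empty tree).
    static long count(int n) {
        long[] count = new long[n + 1];
        count[0] = 1;
        for (int k = 1; k <= n; k++) {
            for (int i = 0; i < k; i++) {
                count[k] += count[i] * count[k - 1 - i];
            }
        }
        return count[n];
    }

    public static void main(String[] args) {
        System.out.println(count(2)); // 2, the two trees above
    }
}
```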
sub-matrix sum
Given a matrix of integers, calculate the sum of a sub-matrix. A sub-matrix is represented by x, y, h, w, where (x, y) is the position of the upper-left corner, h is the height and w is the width.
1. Implement int sum(int x, int y, int h, int w){...} for a given int[][] matrix.
2. If this function is called frequently on the same matrix, how would you optimize it?
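A sketch of both parts, assuming x is the column and y the row of the upper-left corner (the question does not say which is which). The direct sum costs O(h*w) per call; a 2D prefix-sum table built once in O(rows*cols) answers every later call in O(1). Names are illustrative.

```java
class SubMatrixSum {
    // prefix[r][c] = sum of matrix[0..r-1][0..c-1]; the extra zero row/column avoids special cases.
    private final long[][] prefix;

    SubMatrixSum(int[][] matrix) {
        int rows = matrix.length, cols = rows == 0 ? 0 : matrix[0].length;
        prefix = new long[rows + 1][cols + 1];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                prefix[r + 1][c + 1] = matrix[r][c] + prefix[r][c + 1] + prefix[r + 1][c] - prefix[r][c];
            }
        }
    }

    // Question 2: O(1) per call by inclusion-exclusion on four prefix sums.
    long sum(int x, int y, int h, int w) {
        return prefix[y + h][x + w] - prefix[y][x + w] - prefix[y + h][x] + prefix[y][x];
    }

    // Question 1: the direct O(h * w) version, for comparison.
    static long naiveSum(int[][] matrix, int x, int y, int h, int w) {
        long total = 0;
        for (int r = y; r < y + h; r++) {
            for (int c = x; c < x + w; c++) {
                total += matrix[r][c];
            }
        }
        return total;
    }
}
```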
viewable blocks
You are given N blocks of height 1..N. In how many ways can you arrange these blocks in a row such that when viewed from
left you see only L blocks (rest are hidden by taller blocks) and when seen from right you see only R blocks?
For example, given N=3, L=2, R=1, there is only one arrangement, {2, 1, 3}, while for N=3, L=2, R=2 there are two: {1, 3, 2} and {2, 3, 1}.
General Idea
Reduce the size of the problem
By examining the test cases, we see that the size of the problem can be reduced by fixing the shortest block.
Suppose the number of arrangements is F(N, L, R). If the shortest block is in the leftmost position, it is seen from the left but hidden from the right, so there are F(N-1, L-1, R) arrangements. Similarly, the rightmost position gives F(N-1, L, R-1) arrangements. Anywhere else, the shortest block is hidden from both sides: taking it out leaves F(N-1, L, R) arrangements, and there are N-2 positions to put it back. So, we have
F(N, L, R) = F(N-1, L-1, R) + F(N-1, L, R-1) + (N-2)*F(N-1, L, R)
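The recurrence translates directly into a memoized function. A minimal sketch, assuming the base case F(1, 1, 1) = 1 (implied but not stated above); the class name is hypothetical and n must not exceed `maxN`.

```java
import java.util.Arrays;

class ViewableBlocks {
    // memo[n][l][r] caches F(n, l, r); -1 means "not computed yet".
    private final long[][][] memo;

    ViewableBlocks(int maxN) {
        memo = new long[maxN + 1][maxN + 1][maxN + 1];
        for (long[][] plane : memo) {
            for (long[] row : plane) {
                Arrays.fill(row, -1);
            }
        }
    }

    long f(int n, int l, int r) {
        if (l < 1 || r < 1 || l > n || r > n) return 0;
        if (n == 1) return 1; // here l == r == 1
        if (memo[n][l][r] >= 0) return memo[n][l][r];
        long ways = f(n - 1, l - 1, r)               // shortest block at the left end
                  + f(n - 1, l, r - 1)               // shortest block at the right end
                  + (long) (n - 2) * f(n - 1, l, r); // shortest block hidden in the middle
        memo[n][l][r] = ways;
        return ways;
    }

    public static void main(String[] args) {
        ViewableBlocks vb = new ViewableBlocks(3);
        System.out.println(vb.f(3, 2, 1)); // 1: {2, 1, 3}
        System.out.println(vb.f(3, 2, 2)); // 2: {1, 3, 2} and {2, 3, 1}
    }
}
```

Summing F(N, L, R) over all L and R gives N!, since every arrangement has exactly one (L, R) pair; that makes a handy sanity check.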
Divide the problem
The tallest block divides the blocks into two parts. The left part and the right part can be counted independently in a similar fashion.
Now consider a simplified problem in which we only look at the blocks from the left. Let G(N, L) be the number of arrangements of N blocks with L blocks visible from the left. Using the same logic as in the two-sided version, we have:
G(N, L) = G(N-1, L-1) + (N-1)*G(N-1, L)
Back to the original problem: if the tallest block is in the leftmost position, only it is visible from the left, so we have:
F(N, 1, R) = G(N-1, R-1)
If it is in position i ( 2 <= i <= N-1 ), the i-1 blocks on its left and the N-i blocks on its right are independent, so the number of such arrangements is:
C(N-1, i-1) * G(i-1, L-1) * G(N-i, R-1)
C(N, K) is the number of ways to choose K out of N elements. (Note that C(N, K) = C(N-1, K-1) + C(N-1, K).)
Finally, F(N, L, R) is obtained by summing over all possible positions of the tallest block.
Calculating F through G is better in that G has one fewer parameter than F, which saves both time and space. F can be calculated on the fly.
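A sketch of the same count computed through G and C by summing over the position of the tallest block. Positions 1 and N fit the same product, because G(0, 0) = 1 and G(0, v) = 0 for v > 0, so the leftmost case reduces to G(N-1, R-1). Only two-dimensional tables are needed; names are illustrative.

```java
class ViewableBlocksByG {
    // g[k][v] = G(k, v) = G(k-1, v-1) + (k-1) * G(k-1, v), with G(0, 0) = 1.
    // c[k][j] = C(k, j) = C(k-1, j-1) + C(k-1, j).
    // F(n, l, r) = sum over i = 1..n of C(n-1, i-1) * G(i-1, l-1) * G(n-i, r-1).
    static long count(int n, int l, int r) {
        if (n < 1 || l < 1 || r < 1 || l > n || r > n) return 0;
        long[][] g = new long[n][n]; // 0 <= k, v <= n-1
        long[][] c = new long[n][n]; // 0 <= j <= k <= n-1
        g[0][0] = 1;
        c[0][0] = 1;
        for (int k = 1; k < n; k++) {
            c[k][0] = 1;
            for (int v = 1; v <= k; v++) {
                g[k][v] = g[k - 1][v - 1] + (k - 1) * g[k - 1][v];
                c[k][v] = c[k - 1][v - 1] + c[k - 1][v];
            }
        }
        long total = 0;
        for (int i = 1; i <= n; i++) {
            // i-1 blocks to the left of the tallest one, n-i blocks to its right
            total += c[n - 1][i - 1] * g[i - 1][l - 1] * g[n - i][r - 1];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(3, 2, 2)); // 2
    }
}
```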
Fibonacci number
The Fibonacci numbers [1] are defined as $F_n = F_{n-1} + F_{n-2}$, with seed values $F_0 = 0$ and $F_1 = 1$.
1. Write code to calculate $F_n$ recursively.
2. How would you optimize the recursive solution?
References
1. Wiki
The recursive Version
Translating the recursive description of an algorithm into code is a fundamental skill.
```java
/**
 * Fibonacci number. Recursive version.
 *
 * @param n
 * @return Fn
 */
long Fn_recursive(int n) {
    // base
    if (n <= 0)
        return 0;
    if (n == 1)
        return 1L;

    // induction: exponential time, because the same subproblems are recomputed
    return Fn_recursive(n - 1) + Fn_recursive(n - 2);
}
```
Dynamic Programming Version
Dynamic programming [1] is essentially a technique for optimizing recursive computations by caching the solutions of smaller subproblems. Wikipedia uses the Fibonacci numbers as an example to explain how DP works [2].
As long as the subproblems overlap, caching their results speeds up the solution by avoiding duplicated calculations. There are two ways to do this: top down and bottom up. The top-down version caches each result, keyed by all the parameters that affect it. The bottom-up version starts from the smallest subproblems and gradually works up to the final solution.
Not every subproblem is necessarily needed to solve the bigger problem, so the top-down version can be more efficient because it calculates subproblems on demand. But its time complexity is harder to analyze. Use the bottom-up version instead when possible.
This is the bottom-up way of calculating Fibonacci numbers.
```java
/**
 * Fibonacci number -- DP version.
 *
 * @param n
 * @return Fn
 */
long Fn_non_recursive1(int n) {
    if (n <= 0)
        return 0;
    if (n == 1)
        return 1L;

    long[] f = new long[n + 1];

    f[0] = 0;
    f[1] = 1;
    for (int i = 2; i <= n; i++) {
        f[i] = f[i - 1] + f[i - 2];
    }
    return f[n];
}
```
DP algorithms are often very space-demanding, so it is worth going back to identify which items are no longer used and reclaim their space. In this case, only the last two items are needed for further calculation, so we can easily rewrite the DP to use O(1) space.
```java
/**
 * Fibonacci number. DP version with space reduced.
 *
 * @param n
 * @return Fn
 */
long Fn_non_recursive2(int n) {
    if (n <= 0)
        return 0;
    if (n == 1)
        return 1L;

    long fn2 = 0; // F(i-2)
    long fn1 = 1; // F(i-1)
    long fn = fn1 + fn2;

    for (int i = 2; i <= n; i++) {
        fn = fn1 + fn2;
        fn2 = fn1;
        fn1 = fn;
    }

    return fn;
}
```
The top-down version fills the same array recursively, computing each entry only when it is first needed.
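A minimal sketch of the top-down version, assuming a fresh cache array per call (the names are illustrative). It follows the recursive code above but computes each F(k) only once, so it runs in O(n) time.

```java
class FibMemo {
    static long fib(int n) {
        if (n <= 0) return 0;
        return fib(n, new long[n + 1]);
    }

    private static long fib(int n, long[] memo) {
        if (n <= 0) return 0;
        if (n == 1) return 1;
        if (memo[n] != 0) return memo[n]; // F(n) > 0 for n >= 1, so 0 means "not computed yet"
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo);
        return memo[n];
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
    }
}
```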
Other Versions
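There are also other methods to calculate Fibonacci numbers, but they are little more than tricks specific to the Fibonacci sequence. For example, the closed form is [3]:
$$F_n = \frac{\varphi^n - \psi^n}{\sqrt{5}}, \qquad \varphi = \frac{1 + \sqrt{5}}{2}, \qquad \psi = \frac{1 - \sqrt{5}}{2}$$
Because the powers can be computed by repeated squaring, it can be calculated in O(lg n) time (ignoring floating-point precision for large n).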
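The closed form needs floating point. As an alternative (a different technique, named plainly here), the fast-doubling identities, which follow from the matrix form of the recurrence, give an exact O(lg n) computation with integers: F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2. A sketch, valid for 0 <= n <= 92 where F(n) fits in a `long`; names are illustrative.

```java
class FibFastDoubling {
    // Returns {F(n), F(n+1)}. Each level halves n, so O(lg n) multiplications.
    static long[] pair(int n) {
        if (n == 0) return new long[] {0, 1};
        long[] half = pair(n / 2);
        long a = half[0], b = half[1]; // F(k), F(k+1) with k = n / 2
        long even = a * (2 * b - a);   // F(2k)
        long odd = a * a + b * b;      // F(2k+1)
        return (n % 2 == 0) ? new long[] {even, odd} : new long[] {odd, even + odd};
    }

    static long fib(int n) {
        return pair(n)[0];
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
    }
}
```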
Reverse print the sequence
Because f(n) = f(n-1) + f(n-2), we have f(n-2) = f(n) - f(n-1). So we only need two numbers to trace back to f(0). In the process, no new variables are needed.
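A sketch of this idea: run forward to (F(n), F(n+1)) with two variables, then walk back using F(k-1) = F(k+1) - F(k). Both passes update the pair in place without a temporary. The values are collected into a list only so the result can be checked; printing them instead works the same way. Assumes 0 <= n <= 91 so that F(n+1) fits in a `long`; names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class FibReverse {
    static List<Long> reverseSequence(int n) {
        List<Long> out = new ArrayList<>();
        if (n < 0) return out;
        long lo = 0, hi = 1;           // F(0), F(1)
        for (int i = 0; i < n; i++) {  // invariant: lo = F(i), hi = F(i+1)
            hi = hi + lo;              // F(i+2)
            lo = hi - lo;              // F(i+1)
        }
        for (int k = n; k >= 0; k--) { // invariant: lo = F(k), hi = F(k+1)
            out.add(lo);               // or System.out.println(lo)
            lo = hi - lo;              // F(k-1)
            hi = hi - lo;              // F(k+1) - F(k-1) = F(k)
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reverseSequence(6)); // [8, 5, 3, 2, 1, 1, 0]
    }
}
```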
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/test/solution/dp/FibonacciNumber.java
Reference
1. Dynamic programming wiki
2. Fibonacci number and DP
3. The closed form of Fibonacci number
The Maximal Rectangle Problem
Given: A two-dimensional array b (M rows, N columns) of Boolean values ("0" and "1").
Required: Find the largest (most elements) rectangular subarray containing all ones.
nuts in an oasis
A pile of nuts is in an oasis, across a desert from a town. The pile contains 'N' kg of nuts, and the town is 'D' kilometers away
from the pile. The goal of this problem is to write a program that will compute 'X', the maximum amount of nuts that can be
transported to the town.
The nuts are transported by a horse drawn cart that is initially next to the pile of nuts. The cart can carry at most 'C' kilograms
of nuts at any one time. The horse uses the nuts that it is carrying as fuel. It consumes 'F' kilograms of nuts per kilometer
traveled regardless of how much weight it is carrying in the cart. The horse can load and unload the cart without using up any
nuts.
Your program should have a function that takes as input 4 real numbers D, N, F, C and returns one real number: 'X'.
Suppose the pile of nuts allows the cart to make R fully loaded round trips. We have
C-D*F + R*(C-2*D*F) <= N
R has to be an integer, so
R = floor((N - (C-D*F))/(C-2*D*F))
Let f(R)=C-D*F + R*(C-2*D*F)
Then X = max(f(R), f(R+1))
Permutation and Combination
Given a telephone number, output all valid words.
Each key on the telephone represents a list of letters. Given a telephone number and a dictionary, please write a program to
output all the valid words the telephone number represents.
output all combinations
output all permutations
generate all permutations
Swap and order
calculate the number of inversions.
Each user ranks N songs in order of preference. Given a preference list, find the user with the closest preferences. Measure
"closest" according to the number of inversions. Devise an N log N algorithm for the problem.
Transpose a two-row matrix in O(n) time and O(1) space.
Reverse the bits in a byte
http://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel
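A byte-sized sketch of the mask-and-shift idea on that page (the page's version works on 32-bit words): swap the two nibbles, then the bit pairs, then adjacent bits. The class name is illustrative.

```java
class ReverseByte {
    static int reverse(int b) {
        b &= 0xFF;                                  // treat the input as an unsigned byte
        b = ((b & 0xF0) >>> 4) | ((b & 0x0F) << 4); // swap the two 4-bit halves
        b = ((b & 0xCC) >>> 2) | ((b & 0x33) << 2); // swap bit pairs within each half
        b = ((b & 0xAA) >>> 1) | ((b & 0x55) << 1); // swap adjacent bits
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(reverse(0b00010110))); // 1101000
    }
}
```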
Sort and Search
Iterative Mergesort (not recursive)
Implement merge sort iteratively (not recursively).
Analyze its time complexity.
General idea
Merge sort has two parts, as shown in the following pseudocode.
```
function merge_sort(m)
    if length(m) <= 1
        return m
    var list left, right, result
    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result
```
The recursive part, merge_sort(m), keeps track of the segments to merge. The function merge(left, right) merges two sorted segments into a larger one.
Merge sort can easily be done iteratively in a bottom-up fashion. For example, for a list of 9 elements:
```
0 1 2 3 4 5 6 7 8   <- merge xxx0 <-> xxx1 (step 1)
0   2   4   6   8   <- merge xx00 <-> xx10, i.e. 0000<->0010, 0100<->0110, etc. (step 2)
0       4       8   <- merge x000 <-> x100 (step 4)
0               8   <- merge 0000 <-> 1000 (step 8)
0                   <- done
```
So, it is just a matter of counting the indexes with corresponding steps.
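A minimal bottom-up sketch of that index counting on an `int[]` (names are illustrative). On the complexity question: each pass merges every element once, O(n), and the run width doubles each pass, so there are about lg n passes and O(n lg n) time overall, plus an O(n) buffer.

```java
class BottomUpMergeSort {
    static void sort(int[] a) {
        int n = a.length;
        int[] buf = new int[n];
        for (int width = 1; width < n; width *= 2) {
            // merge runs [lo, lo+width) and [lo+width, lo+2*width); a trailing run
            // without a partner (e.g. element 8 when n = 9) is left as it is
            for (int lo = 0; lo < n - width; lo += 2 * width) {
                int mid = lo + width;
                int hi = Math.min(lo + 2 * width, n);
                merge(a, buf, lo, mid, hi);
            }
        }
    }

    // Merge sorted a[lo..mid) and a[mid..hi) back into a[lo..hi).
    private static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) buf[k++] = a[i++];
        while (j < hi) buf[k++] = a[j++];
        System.arraycopy(buf, lo, a, lo, hi - lo);
    }
}
```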
compare different sort methods
when to use merge sort and when to use quick sort
Find a number in a rotated sorted array.
Given a sorted array that has been rotated, for example {4, 5, 0, 1, 2, 3}, find an element in it and analyze the complexity.
General Idea
Two observations on the rotated array:
1. There is a breakpoint in the rotated array, which is the single point that keeps the array from being fully sorted. By appending the first sorted subarray {4, 5} to the end, this search problem is transformed into a normal binary search.
2. In a normal binary search, for a given mid index, we know in constant time whether the value being searched for falls on the left or the right. This is still true for the rotated sorted array.
Each of these observations leads to a different solution.
1. Find the breakpoint
For a fully sorted array, if i < j, we have a[i] <= a[j]. In the rotated array, if there is a breakpoint between i and j, we have a[i] >= a[j] instead. For example, in the array {4,5,0,1}, we have a[0] > a[3]. So what we need is to find the breakpoint in O(log n) time.
The following code finds the breakpoint in O(log n) time unless there are many duplicates. The complete source code can be found here.
```java
// binary search for the break point
//
// The loop stops when the subarray has 2 elements left. So, when this loop
// terminates, rotArr[lo] > rotArr[hi] and lo + 1 == hi.
// Of course, this loop can be modified to terminate when the final subarray is empty.
//
while (lo + 1 < hi) {
    int mid = (lo + hi) / 2;
    if (rotArr[lo] > rotArr[mid]) {
        // The lower part has the break point.
        hi = mid;
    } else if (rotArr[lo] < rotArr[mid]) {
        // The higher part has the break point.
        lo = mid;
    } else {
        /*
         * When rotArr[lo] == rotArr[mid], we cannot tell which part has
         * the break point, so check the elements one by one instead.
         */
        if (rotArr[lo] > rotArr[lo + 1]) {
            start.setValue(lo + 1); // lo is the break point.
            end.setValue(lo);
            return;
        }
        lo++;
    }
}
```
Note the special case {1,0,1,1,1,1}, in which the duplicates drag the performance down to O(n) in the worst case.
Given the position of the breakpoint, it is straightforward to do a binary search for the key. The only point worth noting is to extend the array to the right so that the normal binary search can kick in. The code below illustrates how to use binary search in the rotated array given the breakpoint.
```java
// lo, hi: positions of the smallest and the largest element, e.g. the start and
// end found by the break point search.
int find(Integer[] rotArr, int key, int lo, int hi) {
    // if the array has a break point, map the hi index to the right.
    // for example, if the rotated array is {3,4,1,2}, imagine the array is
    // extended as {3,4,1,2, 3,4,1,2}
    int length = rotArr.length;
    if (hi < lo) {
        hi += length;
    }

    // binary search.
    while (hi >= lo) { // stops when the subarray is empty.
        int mid = (lo + hi) / 2;
        // unbox to int: comparing two Integer objects with == compares references, not values
        int midVal = rotArr[mid % length];

        // found the key
        if (midVal == key)
            return mid % length;

        if (midVal > key) {
            // lower the high boundary.
            hi = mid - 1;
        } else {
            // raise the low boundary.
            lo = mid + 1;
        }
    }
    // when we get here, the subarray rotArr[lo...hi] is empty.
    return -1;
}
```
2. Direct binary search
Suppose there is a rotated sorted array {4, 5, 0, 1, 2, 3, 4}, and we want to find the key 0.
```
4 5 0 1 2 3 4
^     ^     ^
lo   mid    hi
```
The first mid value picked is 1. It is clear that values in [4, +inf) or (-inf, 1) should be searched in the left part, and values in [1, 4] in the right part. So the next step is to search for 0 in the subarray {4, 5, 0}, and so on.
The direct binary search is based on this observation. The implementation has been discussed several times [1][2][3], and the complete code can be found here.
Note the special case of {1,0,1,1,1,1}.
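A minimal sketch of the direct binary search on an `int[]`, not necessarily identical to the linked code. At each step at least one of a[lo..mid] and a[mid..hi] is sorted, and a range check on that half decides where to continue. When a[lo], a[mid] and a[hi] are all equal, as in {1,0,1,1,1,1}, the sketch shrinks the range by one at each end, which is what makes the worst case O(n). The class name is illustrative.

```java
class RotatedSearch {
    // Returns an index of key in the rotated sorted array a, or -1 if it is absent.
    static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key) return mid;
            if (a[lo] == a[mid] && a[mid] == a[hi]) {
                // cannot tell which half is sorted; neither end is the key, so drop both
                lo++;
                hi--;
            } else if (a[lo] <= a[mid]) {
                // a[lo..mid] is sorted
                if (a[lo] <= key && key < a[mid]) hi = mid - 1;
                else lo = mid + 1;
            } else {
                // a[mid..hi] is sorted
                if (a[mid] < key && key <= a[hi]) lo = mid + 1;
                else hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search(new int[] {4, 5, 0, 1, 2, 3, 4}, 0)); // 2
    }
}
```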
Important
This problem is a good example of binary search. Make sure the code looks clean and bug free.
References
1. http://www.ihas1337code.com/2010/04/searching-element-in-rotated-array.html
2. http://talk.interviewstreet.com/questions/32/Search-in-rotated-sorted-array
3. http://stackoverflow.com/questions/1878769/searching-a-number-in-a-rotated-sorted-array
Given an array, find the minimum number of elements to delete so that the remaining array is sorted.
General Idea
This problem is equivalent to finding the longest increasing subsequence (non-decreasing, if equal elements may remain).
We can
1. find the longest increasing subsequence
2. delete the elements that are not in the subsequence.
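A sketch of step 1 with the O(n lg n) "tails" technique for the longest subsequence, assuming "sorted" means non-decreasing, so equal elements may stay; for a strictly increasing result, change `<=` to `<` in the search. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class MinDeletionsToSort {
    // tails.get(k) = smallest possible last value of a non-decreasing subsequence of length k+1.
    static int longestNonDecreasing(int[] a) {
        List<Integer> tails = new ArrayList<>();
        for (int x : a) {
            // upper bound: first position whose tail is strictly greater than x
            int lo = 0, hi = tails.size();
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (tails.get(mid) <= x) lo = mid + 1;
                else hi = mid;
            }
            if (lo == tails.size()) tails.add(x); // x extends the longest subsequence
            else tails.set(lo, x);                // x gives a smaller tail for length lo+1
        }
        return tails.size();
    }

    static int minDeletions(int[] a) {
        return a.length - longestNonDecreasing(a);
    }

    public static void main(String[] args) {
        System.out.println(minDeletions(new int[] {3, 1, 2, 1, 3})); // 2
    }
}
```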
Find the median in a large unsorted array, where each number is between 0 and 255.
Scan the array once and count the frequency of each value (there are only 256 possible values).
Then find the median from the cumulative frequencies.
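A minimal sketch, assuming the values are `int`s in 0..255 and that for an even count the lower of the two middle values is wanted (the question does not say). It needs O(n) time and only 256 counters; the class name is illustrative.

```java
class MedianOfBytes {
    static int median(int[] data) {
        if (data.length == 0) throw new IllegalArgumentException("empty input");
        long[] freq = new long[256];
        for (int v : data) freq[v]++;          // one scan: histogram of the values
        long target = (data.length - 1) / 2;   // 0-based rank of the (lower) median
        long seen = 0;
        for (int v = 0; v < 256; v++) {
            seen += freq[v];                   // number of elements <= v
            if (seen > target) return v;
        }
        throw new IllegalStateException("unreachable for values in 0..255");
    }

    public static void main(String[] args) {
        System.out.println(median(new int[] {5, 5, 0, 255})); // 5
    }
}
```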
petrol bunks in circle.
There are N petrol bunks arranged in a circle, each separated from the next by a certain distance. You choose a mode of travel that needs 1 litre of petrol to cover 1 km. You cannot draw an unlimited amount of petrol from a bunk, as each bunk has only a limited amount. But you know that the total amount of petrol in all the bunks equals the total distance to be covered.
That is, let P1, P2, ..., Pn be n bunks arranged in a circle, where d1 is the distance between P1 and P2, d2 is the distance between P2 and P3, and dn is the distance between Pn and P1. Find the bunk from which the trip can be started so that you never run out of fuel.
General idea
We want to find a starting bunk from which the petrol never runs out during the trip. Because the total petrol equals the total distance, such a starting position always exists.
Consider the following example with 5 bunks, where we start from p3. At p5 we run out of petrol, which means starting from p3 won't work. So we borrow fuel by moving the starting point back to p2, and then to p1. Then the trip from p5 can reach p1 and close the circle. So we know p1 is the starting position.
```
#  1 2 3 4 5   <- bunk number
p  3 0 2 0 0   <- amount of petrol
d  1 1 1 1 1   <- distance to the next bunk

start  bunks covered   petrol  distance
#3     3               2       1
#3     3, 4            2       2
#3     3, 4, 5         2       3   <- run out of gas; need to borrow more from #2
#2     2, 3, 4, 5      2       4   <- still short; borrow from #1
#1     1, 2, 3, 4, 5   5       5   <- enough fuel to get back to the starting position; done
```
Implementation
Check the Java implementation below:
```java
// travleInfo[i][0]: petrol at bunk i; travleInfo[i][1]: distance from bunk i to the next bunk.
int getStartPos(int[][] travleInfo) {
    int startPos = 0;
    int ttlLegs = travleInfo.length;

    int ttlPetrol = 0; // accumulated fuel.
    int distanceToTravle = 0; // total distance.
    int curPos = startPos;
    // each iteration covers one more bunk, so this is equivalent to
    // looping ttlLegs times: for (int i = 0; i < ttlLegs; i++)
    do {
        if (ttlPetrol >= distanceToTravle) {
            // can move to the next bunk
            ttlPetrol += travleInfo[curPos][0];
            distanceToTravle += travleInfo[curPos][1];
            curPos = (curPos + 1) % ttlLegs;
        } else {
            // cannot move any more. need to borrow some fuel by moving the
            // starting point backwards.
            startPos = (startPos + ttlLegs - 1) % ttlLegs;
            ttlPetrol += travleInfo[startPos][0];
            distanceToTravle += travleInfo[startPos][1];
        }
    } while (curPos != startPos);

    return startPos;
}
```
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/list/BunksInCircle.java#19
An alternative solution
General Idea
Follow the example above. If we cannot get past some station, none of the stations passed so far can be the starting point, so we can start over from the next station.
Implementation
```java
int getStartPos2(int[][] travleInfo) {
    int startPos = 0;
    int ttlLegs = travleInfo.length;

    int ttlPetrol = 0;        // petrol collected since startPos
    int distanceToTravle = 0; // distance covered since startPos
    for (int curPos = 0; curPos < ttlLegs; curPos++) {
        ttlPetrol += travleInfo[curPos][0];
        distanceToTravle += travleInfo[curPos][1];
        if (ttlPetrol < distanceToTravle) {
            // cannot get past curPos: the stations passed do not contain the
            // starting point, so set the starting point to the next station.
            startPos = curPos + 1;
            ttlPetrol = 0;
            distanceToTravle = 0;
        }
    }

    return startPos;
}
```
Merge k sorted arrays
Given k sorted arrays, merge them into one sorted array
1. What are the time and space complexities?
2. Optimize for the case of only two arrays.
Merge k sorted arrays into one larger sorted array
This problem can be solved with the merge step of the standard merge sort algorithm. We can maintain a small data structure, preferably a min-heap, that holds the first element of each of the k data streams. Each time, we extract the smallest element from the min-heap. Because the streams are sorted, this minimum is also the smallest among all remaining elements of all streams. In this way, we merge the sorted streams into one larger stream.
In the implementation, each input stream needs an end marker that is treated as larger than any meaningful number. When this marker appears at the top of the min-heap, we know all elements have been merged into the output stream.
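A compact in-memory sketch using `java.util.PriorityQueue` over `int` arrays. Instead of the end marker described above, an exhausted array is simply not pushed back into the heap. The class name and the {value, array, index} entry layout are illustrative. With N elements in total, this takes O(N lg k) time and O(k) extra space for the heap.

```java
import java.util.PriorityQueue;

class MergeKSorted {
    static int[] merge(int[][] arrays) {
        int total = 0;
        for (int[] arr : arrays) total += arr.length;
        int[] out = new int[total];

        // heap entry: {value, which array, index within that array}
        PriorityQueue<int[]> heap = new PriorityQueue<>((p, q) -> Integer.compare(p[0], q[0]));
        for (int s = 0; s < arrays.length; s++) {
            if (arrays[s].length > 0) heap.add(new int[] {arrays[s][0], s, 0});
        }
        int m = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();       // smallest head among all streams
            out[m++] = top[0];
            int s = top[1], next = top[2] + 1;
            if (next < arrays[s].length) { // advance that stream; an exhausted one drops out
                heap.add(new int[] {arrays[s][next], s, next});
            }
        }
        return out;
    }
}
```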
Some noteworthy points
1. The sorted arrays are usually files on disk, which can be represented generically by an interface as follows.
```java
public interface AStream {
    Integer curData();  // the current data in the stream.
    Integer readNext(); // advance one value.
}
```
2. Marker for the end of the stream.
```java
/**
 * Marker of the end, which is defined as larger than any number.
 */
final Integer SUPER_LARGE = null;
```
In Java, this marker can be implemented through the Comparable interface, or simply by appending Integer.MAX_VALUE to the end of each stream.
```java
// compareTo of a heap element (QElem) that wraps one AStream.
public int compareTo(QElem qElem) {
    Integer curData = stream.curData();
    Integer qElmData = qElem.stream.curData();
    if (qElmData == SUPER_LARGE && curData == SUPER_LARGE)
        return 0;
    if (curData == SUPER_LARGE)
        return 1;
    if (qElmData == SUPER_LARGE)
        return -1;
    return curData.compareTo(qElmData);
}
```
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/sort/MergeNSorted.java#102
Merge 2 sorted arrays into one sorted array
The following code is quite self-explanatory.
```java
Integer[] merge2(Integer[] data1, Integer[] data2) {
    // 1. no need to merge when one queue is empty
    if (data1 == null)
        return data2;
    if (data2 == null)
        return data1;

    // 2. merge
    int p1 = 0, p2 = 0, m = 0;
    Integer[] mrgData = new Integer[data1.length + data2.length];
    while (p1 < data1.length && p2 < data2.length) {
        if (data1[p1] < data2[p2]) {
            mrgData[m++] = data1[p1++];
        } else {
            mrgData[m++] = data2[p2++];
        }
    }

    // 3. handle the remaining data still in either queue.
    while (p1 < data1.length) {
        mrgData[m++] = data1[p1++];
    }
    while (p2 < data2.length) {
        mrgData[m++] = data2[p2++];
    }
    return mrgData;
}
```
Merge 2 sorted arrays into one sorted array in place
There are algorithms that merge an array like [1,3,5,7,2,4,6,8] into [1,2,3,4,5,6,7,8] with O(1) space and O(n) time.
This problem sounds simple, but the solution is not trivial at all. StackOverflow has some discussion.
A problem that starts simple may be expanded into a more complex one, so:
1. Check with the interviewer and understand what he wants.
2. Go through some examples to get some clues. Do not put the code on the whiteboard until you have it worked out in your mind.
3. Give a simple answer that is easy to explain and implement.
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/sort/MergeNSorted.java#142
25th fastest car in 49 cars
There are 49 race cars, and no two have the same speed. Given 7 tracks of equal length (so at most 7 cars per race), find the 25th fastest car. What is the minimum number of races needed? (There is no timer.)
(Or with 25 horses.)
General idea
Check the median of medians algorithm. We can divide and conquer the problem using the median of medians as the pivot.
Solution 1
Round one
1. (7 races) Divide the cars into 7 groups and get the order within each group.
2. (1 race) Take the 7 medians and get their order. Find the median of medians (denoted o). In the following example, it is 34.
3. (3 races) Find the rank of the median of medians. Take the cars in the lower-left corner (25 ~ 33) and the upper-right corner (13 ~ 21), and race them 6 at a time against o (34). After these 3 races, we know the rank of the median of medians within the whole set. The best case is that o is the global median (25th fastest). The worst case is that o is the 16th or 34th fastest.
This example shows one possible worst case.
```
 1  2  3  4 13 14 15   <- group 1
 5  6  7  8 16 17 18
 9 10 11 12 19 20 21   ...
22 23 24 34 35 36 37
25 26 27 38 39 40 41   <- group 5
28 29 30 42 43 44 45   <- group 6
31 32 33 46 47 48 49   <- group 7
```
Round two
We want to find the rank of other medians in a binary search fashion.
1. (3 races) Pick the median less than 34, which is 12. Race it against the lower-left and upper-right corner cars. After 3
races, we know its rank is 12.
Now the gap between those two medians is at most 21, as shown in this example.
Round three
Rearrange the 21 cars (>12 and <34) as follows.
```
13 14 15   <- group 1
16 17 18
19 20 21   ...
22 23 24
25 26 27   <- group 5
28 29 30   <- group 6
31 32 33   <- group 7
```
Each row is still sorted.
1. (1 race) Find the median of medians again, which is 23.
2. (1 race) Find its rank. After this step, we know the car in previous step is ranked 23 for sure.
3. (1 race) Similar to a binary search, check the rank of another median, 29.
4. (1 race) Sort all cars between 23 ~ 29 (exclusive). The 25th fastest car is found.
So at most 18 races are needed to get the 25th fastest car.
! An incorrect solution !
This question is not as easy as it seems.
One common error is to exclude elements that are not globally larger or smaller. For example, if we race 7 of the 49 cars, we obviously cannot exclude any of these 7 afterwards, because an excluded car might be the median of all 49. Similarly, the following solution is not correct, because the median can be within the s or l region.
It claims that 7 + 1 + 1 + 1 = 10 races are enough to get the 25th fastest car.
Steps
1. Divide the cars into 7 groups and get the order within each group.
2. Race the 7 medians to get their order.
Now we have:
```
s s s s + + +
s s s s + + +
s s s s + + +
s s s o l l l
- - - l l l l
- - - l l l l
- - - l l l l
```
The solution argues that s < {+, -, o} < l, so all the s and l cars can safely be excluded and the 25th car must remain among +, -, or o. This is wrong!
now we have
. . .
. . .
. . .
o
. . .
. . .
. . .
Note that each line is still ordered.
3. Race the medians of the lines once more and exclude 2 * 6 = 12 cars.
4. Race the remaining 7 cars once more. Pick the median, which is claimed to be the 25th fastest car.
Find the intersection of two sets represented by sorted arrays.
1. How to find the common elements in two sorted arrays?
2. What if the sizes of arrays are quite different?
General Idea
Follow the same steps as when merging two sorted arrays, and output the elements that appear at the head of both arrays.
If the sizes are quite different
case 1) If the larger array can be accessed randomly
For each element in the smaller array, binary search for it in the larger one.
case 2) If the larger array is too large to store on one computer
Split the larger array and distribute it across multiple computers. Also split the smaller array according to the lower and upper bounds of the sub-array on each computer. Query the intersection on each computer and combine the results.
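A sketch of the two single-machine approaches, assuming each array holds distinct values since they represent sets; the class name is illustrative. The merge-style walk is O(m + n); binary searching each element of the small array in the large one is O(m lg n), which wins when m is much smaller than n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedIntersection {
    // Walk both arrays like the merge step; emit a value when both heads are equal.
    static List<Integer> intersectMerge(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    // Case 1: binary search each element of the small array in the large (random-access) one.
    static List<Integer> intersectSearch(int[] small, int[] large) {
        List<Integer> out = new ArrayList<>();
        for (int x : small) {
            if (Arrays.binarySearch(large, x) >= 0) out.add(x);
        }
        return out;
    }
}
```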
Young tableau
Given two sorted positive integer arrays A[n] and B[n] (W.L.O.G., let's say they are sorted in decreasing order), we define a set $S = \{(a, b) \mid a \in A, b \in B\}$. Obviously there are $n^2$ elements in S. The value of such a pair is defined as Val(a, b) = a + b. Now we want to get the n pairs from S with the largest values. The tricky part is that we need an O(n) algorithm.
Young tableau.
An m*n matrix of integers has all its rows and columns sorted in ascending order. Find the most efficient way to print out all the numbers in ascending order.
See Young tableaus, CLRS Problem 6-3.
find the k-th largest number in two sorted lists
binary search for a range.
Given a sorted array of float numbers, find the start and end position of a range.
For example,
input
array : {0, 0.1, 0.1, 0.2, 0.3, 0.3, 0.4}
range : 0.1 <= x <= 0.3
output:
1 5
General idea
The key point here is to reuse the result of the previous search.
First, find roughly where the data is, using a binary search for any element in the range. During this search, the search region is narrowed down to [posStart, posEnd]. The element found in the middle further divides this region into [posStart, mid] and [mid, posEnd].
Then, search these two separate regions for the real starting and ending positions.
Be careful not to create an infinite loop in the binary search.
```java
public void findRange(double[] data, double rangeStart, double rangeEnd,
        Mutable<Integer> pStart, Mutable<Integer> pEnd) {
    pStart.setValue(-1);
    pEnd.setValue(-1);

    if (data == null || data.length == 0)
        return;

    int posStart = 0, posEnd = data.length - 1;

    // find where the data roughly is.
    int inRange = 0;
    while (posStart <= posEnd) {
        inRange = (posStart + posEnd) / 2;
        if (data[inRange] < rangeStart) {
            posStart = inRange + 1;
        } else if (data[inRange] > rangeEnd) {
            posEnd = inRange - 1;
        } else {
            // found: rangeStart <= data[inRange] <= rangeEnd;
            break;
        }
    }
    // not found
    if (posStart > posEnd)
        return;

    // Now, data[inRange] is in the range.
    // We need to find the index that points to rangeStart.
    int pEnd2 = inRange;
    while (posStart <= pEnd2) {
        int n = (posStart + pEnd2) / 2;
        if (data[n] < rangeStart) {
            posStart = n + 1;
        } else {
            pEnd2 = n - 1;
        }
        // note: there is no break when rangeStart was found.
    }

    // and find the end position in [inRange, posEnd]
    int pStart2 = inRange;
    while (pStart2 <= posEnd) {
        int n = (pStart2 + posEnd) / 2;
        if (data[n] > rangeEnd) {
            posEnd = n - 1;
        } else {
            pStart2 = n + 1;
        }
        // note: there is no break;
    }

    if (posStart <= posEnd) {
        pStart.setValue(posStart);
        pEnd.setValue(posEnd);
    }
}
```
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/search/BinarySearch.java#30
Just a reminder of how it works
Binary search reduces the search range by divide and conquer. For example, suppose we want to find 5 in the sorted array {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Using (0+9)/2 = 4 as the mid index, the target value must be on the right side, so the search range becomes {5, ..., 9}.
```
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
 ^           ^              ^
 lo         mid             hi

0, 1, 2, 3, 4, {5, 6, 7, 8, 9}
                ^     ^     ^
                lo   mid    hi

0, 1, 2, 3, 4, 5,{} 6, 7, 8, 9
               ^    ^
               hi   lo   // without a match (e.g. searching 5.5), the search ends when the range is empty
```
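The code:
```java
// Returns the index of result in the sorted array arr, or -1 if it is absent.
int binarySearch(int[] arr, int result) {
    int lo = 0, hi = arr.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2; // same as (lo + hi) / 2, but avoids overflow
        if (arr[mid] == result)
            return mid; // return when there is a match.
        if (arr[mid] < result) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return -1; // the search range [lo, hi] is empty
}
```
Note that the loop can also terminate when only one element is left. The code looks like:
```java
int binarySearch2(int[] arr, int result) {
    if (arr.length == 0)
        return -1;
    int lo = 0, hi = arr.length - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2; // rounds down, so lo <= mid < hi
        if (arr[mid] < result) {
            lo = mid + 1; // "lo = mid" would loop forever once hi == lo + 1
        } else {
            hi = mid;     // arr[mid] >= result, so mid stays in the range
        }
    }
    return arr[lo] == result ? lo : -1; // only one element is left
}
```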
Sort a partially sorted array.
1. An array is preprocessed so that A[i] < A[i+N]. Sort this array.
2. An array is partially sorted so that A[i] < A[j] whenever i < j - N. Sort this array.
1. This is a shell sort that is halfway done: the array is already N-sorted. So continue the shell sort with smaller gaps until the gap (N) becomes 1.
2. Use a min-heap of size N as a buffer. After the stream passes through this buffer, the data will be sorted.
That is,
... sorted ... [min-heap of size N] ... partially sorted ...
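A minimal sketch of answer 2 with `java.util.PriorityQueue`. Why it works: when the buffer holds N+1 elements, at least one of them comes from an index more than N positions before every element not yet read, so by the assumed property it is smaller than all of them; hence the heap's minimum is safe to output. The names are illustrative.

```java
import java.util.PriorityQueue;

class NearlySortedSort {
    // Assumes a[i] < a[j] whenever j - i > n. O(len * lg n) time, O(n) extra space.
    static int[] sort(int[] a, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        int[] out = new int[a.length];
        int k = 0;
        for (int x : a) {
            heap.add(x);
            if (heap.size() > n) out[k++] = heap.poll(); // buffer full: emit the minimum
        }
        while (!heap.isEmpty()) out[k++] = heap.poll();  // drain the buffer
        return out;
    }
}
```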
Axis Aligned Rectangles
Describe an algorithm that takes an unsorted array of axis-aligned rectangles and returns any pair of rectangles that overlap, if there is such a pair.
Axis-aligned means that all the rectangle sides are either parallel or perpendicular to the x and y axes. You can assume that each rectangle object has two variables in it: the x/y coordinates of the upper-left corner and of the bottom-right corner.
references
1. Hacking a Google Interview, Practice Questions: Person A (http://courses.csail.mit.edu/iap/interview/materials.php)
2. Interval tree
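The hint above points to an interval tree. The sketch below is a sweep line along x, and it swaps in a plain balanced BST (java.util.TreeMap) for the full interval tree. That works because the sweep stops at the first overlap: until then, the active y-intervals are pairwise disjoint, so one lookup per rectangle is enough, for O(n log n) in total. The Rect class and all names are hypothetical. The sketch assumes every rectangle has positive width and height, and that "overlap" means sharing a positive area, so rectangles that only touch along an edge do not count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RectOverlapSketch {
    /** Hypothetical rectangle: upper-left corner (x1, y1) and bottom-right corner (x2, y2). */
    static final class Rect {
        final int x1, y1, x2, y2;
        Rect(int x1, int y1, int x2, int y2) { this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2; }
        int left()  { return Math.min(x1, x2); }
        int right() { return Math.max(x1, x2); }
        int low()   { return Math.min(y1, y2); }
        int high()  { return Math.max(y1, y2); }
    }

    /** Returns the indices of one pair sharing a positive area, or null if there is none. */
    static int[] findOverlap(Rect[] rects) {
        // Events {x, type, index}: type 0 = leave, 1 = enter. At equal x, leaving comes
        // first, so rectangles that only touch along a vertical edge do not count.
        List<int[]> events = new ArrayList<>();
        for (int i = 0; i < rects.length; i++) {
            events.add(new int[] {rects[i].left(), 1, i});
            events.add(new int[] {rects[i].right(), 0, i});
        }
        events.sort((p, q) -> p[0] != q[0] ? Integer.compare(p[0], q[0]) : Integer.compare(p[1], q[1]));

        // Active y-intervals keyed by their low end. Until an overlap is found they are
        // pairwise disjoint, so ordering by low end also orders them by high end.
        TreeMap<Integer, Integer> active = new TreeMap<>(); // low end -> rectangle index
        for (int[] e : events) {
            Rect r = rects[e[2]];
            if (e[1] == 0) {
                active.remove(r.low());
                continue;
            }
            // It is enough to check the active interval with the largest low end below r.high().
            Map.Entry<Integer, Integer> cand = active.lowerEntry(r.high());
            if (cand != null && rects[cand.getValue()].high() > r.low()) {
                return new int[] {cand.getValue(), e[2]};
            }
            active.put(r.low(), e[2]);
        }
        return null;
    }
}
```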
Manipulate data in a stream
Find a number that shows up more than half the time
Given a stream of integers in which, at any given time, one number has appeared more than half of the time, find this number.
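The original gives no answer here. A common one is the Boyer-Moore majority vote algorithm, sketched below under the assumption that the stream delivers ints; the class name is hypothetical. It keeps one candidate and one counter, so each number costs O(1) time and space. Each number that differs from the candidate cancels one copy of the candidate. A true majority can never be fully cancelled, because it outnumbers all other numbers combined.

```java
class MajorityVoteSketch {
    private int candidate;
    private int count = 0;

    /** Feeds the next number of the stream. */
    void add(int x) {
        if (count == 0) {
            candidate = x; // start a new candidate
            count = 1;
        } else if (x == candidate) {
            count++;
        } else {
            count--;       // cancel x against one copy of the candidate
        }
    }

    /** Only meaningful if some number really has appeared more than half the time so far. */
    int majority() {
        return candidate;
    }
}
```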
find most frequently visited pages (url)
Find the k most frequently visited (clicked) pages from a big log file that contains a time stamp, session ID, and page ID on each line. The file is too big to fit into memory.
1. Find the k most frequently visited pages within a month.
2. Find the k most frequently visited pages within a couple of months. Each month should be reported separately.
3. Find the k most frequently visited pages with some pattern. For example, the pattern a>b>a means a user visits page a, then page b, then page a again. The lines are not strictly sorted by time.
find the kth largest number in a list
1. The list is sorted.
2. The list is unsorted.
Space is limited.
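sorted
If the list is sorted in ascending order, the problem reduces to finding the k-th number from the end. When the list can only be scanned forward, this can be solved with a queue of size k. After the last element is scanned, return the element at the front of the queue, which is the oldest of the last k numbers.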
unsorted
Maintain a min-heap of size k. Fill it with the first k numbers. After that, if the current number is larger than the top (the heap's minimum), replace the top with this number. After the scan, the top of the heap is the k-th largest number of the list. The time complexity is O(n log k), and the space is O(k).
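Here are both cases in Java. This is a sketch that assumes the numbers arrive as an Iterable<Integer> scanned once; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;

class KthLargestSketch {
    /** Unsorted input: keep the k largest numbers seen so far in a min-heap. O(n log k). */
    static int kthLargest(Iterable<Integer> numbers, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int x : numbers) {
            if (heap.size() < k) {
                heap.add(x);           // fill the heap with the first k numbers
            } else if (x > heap.peek()) {
                heap.poll();           // drop the smallest of the current top k
                heap.add(x);
            }
        }
        if (heap.size() < k) throw new IllegalArgumentException("fewer than k numbers");
        return heap.peek();            // the smallest of the k largest
    }

    /** Ascending input that can only be scanned forward: keep the last k numbers in a queue. */
    static int kthLargestSorted(Iterable<Integer> ascending, int k) {
        ArrayDeque<Integer> window = new ArrayDeque<>();
        for (int x : ascending) {
            window.addLast(x);
            if (window.size() > k) window.pollFirst(); // keep only the last k
        }
        if (window.size() < k) throw new IllegalArgumentException("fewer than k numbers");
        return window.peekFirst();     // the k-th number from the end
    }
}
```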
In a linked list, find the n-th node from the end of the list.
You can only scan once.
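The original leaves this open. One common single-pass approach, sketched here with a hypothetical Node class, moves a lead pointer n nodes ahead. It then advances a trailing pointer in step with the lead until the lead falls off the end.

```java
class NthFromEndSketch {
    static final class Node {
        final int val;
        Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    /** Returns the n-th node from the end (n = 1 is the last node), or null if the list is shorter. */
    static Node nthFromEnd(Node head, int n) {
        Node lead = head;
        for (int i = 0; i < n; i++) {  // move lead n nodes ahead
            if (lead == null) return null;
            lead = lead.next;
        }
        Node trail = head;
        while (lead != null) {         // advance both until lead falls off the end
            lead = lead.next;
            trail = trail.next;
        }
        return trail;
    }
}
```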
pick an object from a stream with equal chance
There is a stream of objects of unknown length, and there is space to hold only one object. Scan the stream once. At the end of the stream, the held object should be a random object from the stream, with every object equally likely.
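A standard answer, not given in the original, is reservoir sampling with a reservoir of size one. When the i-th object arrives, keep it with probability 1/i. It then survives each later step j with probability (j-1)/j, so the product telescopes to 1/n for every object. The sketch below assumes the stream is a java.util.Iterator, and the class name is hypothetical.

```java
import java.util.Iterator;
import java.util.Random;

class PickOneSketch {
    /** Scans the stream once holding one object; each of the n objects is returned with probability 1/n. */
    static <T> T pickOne(Iterator<T> stream, Random rnd) {
        T held = null;
        int seen = 0;
        while (stream.hasNext()) {
            T x = stream.next();
            seen++;
            if (rnd.nextInt(seen) == 0) {
                held = x; // replace the held object with probability 1/seen
            }
        }
        return held; // null for an empty stream
    }
}
```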
find a subarray with maximum sum in a given array.
Variants:
1. The array is circular.
2. The subarray has at most n elements.
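The original leaves this open. Kadane's algorithm, which the original does not name, is a standard O(n) answer for the basic problem and for variant 1. The best subarray ending at index i either extends the best one ending at i - 1 or restarts at i. In a circular array, the best window either does not wrap, or it wraps and leaves out a minimum-sum subarray in the middle. Variant 2 is not covered. The sketch assumes a non-empty int array, and the names are hypothetical.

```java
class MaxSubarraySketch {
    /** Kadane's algorithm: the best sum of a non-empty contiguous subarray. */
    static long maxSubarray(int[] a) {
        long best = a[0], endingHere = a[0];
        for (int i = 1; i < a.length; i++) {
            endingHere = Math.max(a[i], endingHere + a[i]); // extend, or restart at i
            best = Math.max(best, endingHere);
        }
        return best;
    }

    /** Circular variant: the best window either does not wrap, or wraps around a minimum-sum middle part. */
    static long maxCircular(int[] a) {
        long total = 0;
        for (int x : a) total += x;
        long minEnding = a[0], minSum = a[0];
        for (int i = 1; i < a.length; i++) {
            minEnding = Math.min(a[i], minEnding + a[i]);
            minSum = Math.min(minSum, minEnding);
        }
        long straight = maxSubarray(a);
        if (straight < 0) return straight; // all negative: wrapping would leave an empty window
        return Math.max(straight, total - minSum);
    }
}
```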
Find the k most frequently visited pages with some pattern
Find the k most frequently visited (clicked) pages from a big log file that contains a time stamp, session ID, and page ID on each line. The file is too big to fit into memory.
Find the k most frequently visited pages with some pattern. For example, the pattern a>b>a means a user visits page a, then page b, then page a again.
String manipulation
Implement atoi; convert a string into an integer
Implement the following C/C++ function:
int atoi(const char *str);
Write your test cases.
find the longest palindrome in a given string
URL match
1. Match an input URL against the ones in a list.
For example, given the list:
http://www.example.com
example.org/test
test/test2
test2/test3
/root/test/test.html
/root/test/test/t.html
and the input "/test/test/", print the URL "/root/test/test/t.html".
If this routine is used frequently, how can its performance be improved?
How to detect and remove near-duplicate files among a large number of files.
For example, web pages may differ only in their advertisement part.
Ransom Notes
A ransom note is a note whose words are each cut out of a magazine and pasted together. [1]
Given a paragraph and a sentence, check if the words of the sentence are all in the paragraph. Each word in the paragraph can
be used only once.
references
1. Hacking a Google Interview Handout
String manipulation is very tricky. Before writing any code, make sure you
1. understand the question well, and
2. discuss some test cases with the interviewer.
Analysis
The ransom note problem checks whether one collection of words contains another, counting duplicates, since each word in the paragraph can be used only once. A BST or a hash table/map reduces the time complexity of the lookups.
Because it is a string manipulation problem, the tricky part actually comes from how to collect all words in the paragraph rather
than how to use the HashMap/BST.
For example, given "AA BB CC", the words are delimited not only by the blanks but also, implicitly, by the beginning and end of the string. One way to handle this is to pad the string with blanks so that every word is delimited by blanks.
"AA BB CC" --> " AA BB CC "
Another way is to extend the isAlpha function and treat all out-of-bound positions as non-alphabetical.
boolean isAlpha(int i) {
    // Note: the code takes advantage of this definition.
    if (i < 0 || i >= sbuf.length) // all chars out of boundary are considered to be non alphabetical.
        return false;

    int c = sbuf[i];
    if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
        return true;
    return false;
}
The next step is to extract the words from the paragraph. A word starts at a non-alphabetical --> alphabetical transition and ends at an alphabetical --> non-alphabetical transition.
So, we have the code:
boolean checkRansomNote(String paragraph, String notes) {
    if (paragraph == null)
        return false;

    if (notes == null)
        return true;

    // get the needed words for ransom.
    Map<String, Integer> wordCol = getWords(notes);
    if (wordCol.isEmpty())
        return true; // a note without words needs nothing from the paragraph

    sbuf = paragraph.toCharArray();
    int convStart = -1, convEnd = -1;
    for (int i = 0; i <= sbuf.length; i++) {
        if (!isAlpha(i - 1) && isAlpha(i)) {
            // find ...@A... (a word starts at i)
            convStart = i;
        } else if (!isAlpha(i) && isAlpha(i - 1)) {
            // find ...A@... (a word ends at i - 1)
            convEnd = i - 1;
            String wd = String.valueOf(sbuf, convStart, convEnd - convStart + 1);

            // check the current word.
            if (!wordCol.containsKey(wd)) {
                continue;
            }

            // the current word is useful, update the collection
            int count = wordCol.get(wd) - 1;
            if (count == 0) {
                wordCol.remove(wd);
                if (wordCol.isEmpty())
                    return true;
            } else {
                wordCol.put(wd, count);
            }
        }
    }

    // NOTE: make sure all words are handled, especially the first word
    // and the last word.
    return false;
}
One last thing: since the note tends to be smaller than the paragraph, only the words of the ransom note (with their counts) are saved in a HashMap. The paragraph is then scanned against that map, which gives better space/time efficiency.
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/test/solution/string/WordsSentence.java#24
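The text does not show getWords. The sketch below is self-contained: a hypothetical helper, forEachWord, applies the same start/end transition rule to both the note and the paragraph. As in the text, only the note's words are kept in a HashMap, and the paragraph is streamed against it. Inputs are assumed to be non-null, and the class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class RansomNoteSketch {
    /** Out-of-bound positions count as non-alphabetical, as in isAlpha above. */
    static boolean isAlpha(String s, int i) {
        if (i < 0 || i >= s.length()) return false;
        char c = s.charAt(i);
        return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z';
    }

    /** Passes each word of s to action; stops early when action returns false. */
    static void forEachWord(String s, Predicate<String> action) {
        int start = -1;
        for (int i = 0; i <= s.length(); i++) {
            if (!isAlpha(s, i - 1) && isAlpha(s, i)) {
                start = i;                                 // ...@A... a word starts at i
            } else if (isAlpha(s, i - 1) && !isAlpha(s, i)) {
                if (!action.test(s.substring(start, i))) { // ...A@... a word ends at i - 1
                    return;
                }
            }
        }
    }

    static boolean checkRansomNote(String paragraph, String note) {
        Map<String, Integer> needed = new HashMap<>();     // note word -> copies still needed
        forEachWord(note, w -> {
            needed.merge(w, 1, Integer::sum);
            return true;
        });
        forEachWord(paragraph, w -> {
            needed.computeIfPresent(w, (k, c) -> c == 1 ? null : c - 1); // null removes the entry
            return !needed.isEmpty();                      // stop once every word is covered
        });
        return needed.isEmpty();
    }
}
```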
min cover window
Given some chars and a string, find the shortest substring that contains all the given chars.
For example, given {'a', 'b', 'c'} and the string "aabbcc", the shortest substring is "abbc".
The key to this problem is to maintain a window [startPos .. endPos] that contains all keywords.
The following pseudocode sketches the process, moving endPos and startPos alternately to find the min cover window.
do {
    endPos = move endPos to the right until all keywords are found;
    startPos = move startPos to the right so that the substring still has all keywords but cannot be shorter;
    // para[startPos] and para[endPos] are keywords that show up only once in the substring.

    move startPos right to skip this left-most keyword for a new window;
} while (more words to scan);
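In this Java implementation, keyCount records the number of occurrences of each keyword in the substring para[startPos..endPos]. It must start with a count of 0 for every keyword (keys must not contain duplicates), and start/end receive the boundaries of the best window.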
int numMissingKey = keys.length, minLen = para.length + 1;
for (int startPos = 0, endPos = 0; endPos < para.length; endPos++) {
    // move endPos to include all keywords
    if (keyCount.containsKey(para[endPos])) {
        int cnt = keyCount.get(para[endPos]);
        keyCount.put(para[endPos], cnt + 1);
        if (cnt == 0) {
            numMissingKey--; // find one missing keyword.
        }
    }
    if (numMissingKey > 0)
        continue;

    // move startPos to find a min cover window, which has all keywords but
    // cannot be shorter
    for (; numMissingKey == 0 && startPos <= endPos; startPos++) {
        if (!keyCount.containsKey(para[startPos])) {
            continue;
        }
        int cnt = keyCount.get(para[startPos]);
        keyCount.put(para[startPos], cnt - 1);
        if (cnt > 1)
            continue;
        // this keyword is the only one in the substring. This
        // keyword will be missing by moving startPos.

        // so, [startPos..endPos] is a candidate for min cover.
        numMissingKey++;
        if (endPos - startPos < minLen) {
            minLen = endPos - startPos;
            start.setValue(startPos);
            end.setValue(endPos);
        }
    }
}
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/string/MinCoverWindow.java
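The fragment above depends on keyCount, keys, start, and end being set up elsewhere. Below is a self-contained wrapper around the same two-pointer loop, offered as a sketch with a hypothetical signature. It initializes keyCount itself, takes the number of missing keywords from the map so duplicate keys are harmless, and returns the window as a string (or null) instead of filling start/end holders.

```java
import java.util.HashMap;
import java.util.Map;

class MinCoverWindowSketch {
    /** Shortest substring of para containing every char in keys, or null if there is none. */
    static String minCoverWindow(char[] keys, String para) {
        Map<Character, Integer> keyCount = new HashMap<>();
        for (char k : keys) keyCount.put(k, 0);  // every keyword starts with count 0
        int numMissingKey = keyCount.size();     // distinct keywords not yet in the window
        int bestStart = -1, bestEnd = -1, minLen = para.length() + 1;

        for (int startPos = 0, endPos = 0; endPos < para.length(); endPos++) {
            char c = para.charAt(endPos);
            if (keyCount.containsKey(c) && keyCount.merge(c, 1, Integer::sum) == 1) {
                numMissingKey--;                 // c just entered the window
            }
            // shrink from the left while the window still covers every keyword
            for (; numMissingKey == 0 && startPos <= endPos; startPos++) {
                char left = para.charAt(startPos);
                if (!keyCount.containsKey(left)) continue;
                int cnt = keyCount.merge(left, -1, Integer::sum) + 1; // count before removal
                if (cnt > 1) continue;
                // left occurs once in [startPos..endPos]: this window is a candidate
                numMissingKey++;
                if (endPos - startPos + 1 < minLen) {
                    minLen = endPos - startPos + 1;
                    bestStart = startPos;
                    bestEnd = endPos;
                }
            }
        }
        return bestStart < 0 ? null : para.substring(bestStart, bestEnd + 1);
    }
}
```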
Design the data structure for storing a dictionary
cases:
1. the dictionary size is small.
2. the size is too large to fit in memory.
3. the size is too large to fit on one computer.
4. optimize the performance. What is the bottleneck of your system?
implement readline using read
Implement readline using the API read, whose signature is defined as:
int read(char* buffer, int size); // returns the number of chars read into buffer
For example, given the input stream "abcd\nefgh" (each call below starts at the beginning of the stream),
read(buffer,3) returns 3 and the next char is d.
read(buffer,7) returns 7 and the next char is g.
readline(buffer,3) returns 3 and the next char is d.
readline(buffer,7) returns 4 and the next char is e.
reverse a long list of words
There is a very long list of words delimited by blanks. Output the words in reverse order.
split a string into words
A dictionary has n words.
Given a string, find how many ways to split the string into words so that all words are in the dictionary.
Code
Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/dp/SplitStringToWords.java#67
public int splitWords(String longString) {
    wdArr = longString.toCharArray();

    // cache the result. the top-down DP comes into play
    cache = new int[wdArr.length];
    Arrays.fill(cache, -1); // initialize the cache.

    // for words being split out
    q = new LinkedList<String>();
    return splitWords_rec(0);
}

/**
 * split string into words starting from 'pos'
 *
 * @param pos the start position in wdArr
 * @return the number of ways to split wdArr[pos..] into dictionary words
 */
int splitWords_rec(int pos) {
    if (pos >= wdArr.length)
        return 0;

    // if it is already cached, don't bother calculating it again.
    if (cache[pos] >= 0)
        return cache[pos];

    int splits = 0;

    for (int len = 1; len <= dict.getMaxLen() && len + pos <= wdArr.length; len++) {
        // check if current string starts with a word in the dictionary
        String wd = String.valueOf(wdArr, pos, len);
        if (!dict.hasWord(wd)) {
            continue;
        }

        // if this word ends the whole string, we have found one split
        if (pos + len == wdArr.length) {
            q.add(wd);
            // System.out.println(StringUtils.join(q, ","));
            q.remove(q.size() - 1);
            splits++;
            continue;
        }

        // go ahead and split the string to the end.
        q.add(wd);
        splits += splitWords_rec(pos + len);
        q.remove(q.size() - 1);
    }
    cache[pos] = splits;
    return splits;
}
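The recursion above fills cache top-down. The same count can also be computed bottom-up, from the end of the string toward the front. The sketch below assumes the dictionary is a plain Set<String>, in place of the dict object with getMaxLen/hasWord used above, and it uses long counts because the number of splits can grow quickly. The class name is hypothetical.

```java
import java.util.Set;

class SplitWordsSketch {
    /** Number of ways to split s into dictionary words, computed bottom-up. */
    static long countSplits(String s, Set<String> dict) {
        int n = s.length();
        if (n == 0) return 0; // matches splitWords_rec, which returns 0 for an empty string
        int maxLen = 0;
        for (String w : dict) maxLen = Math.max(maxLen, w.length());

        long[] ways = new long[n + 1]; // ways[i]: number of splits of the suffix s[i..]
        ways[n] = 1;                   // empty suffix: exactly one way, nothing left to split
        for (int pos = n - 1; pos >= 0; pos--) {
            for (int len = 1; len <= maxLen && pos + len <= n; len++) {
                if (dict.contains(s.substring(pos, pos + len))) {
                    ways[pos] += ways[pos + len]; // word s[pos..pos+len) then any split of the rest
                }
            }
        }
        return ways[0];
    }
}
```

For "catsanddog" with {cat, cats, and, sand, dog}, it counts 2 splits: cat|sand|dog and cats|and|dog.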
Remove all consecutive duplicated chars in a string
Implement an algorithm
int removeDuplicate(char[] s)
For instance, change "abbcccdda" to "abcda" and return 4 (the number of characters deleted).
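Count the number of breakpoints, i.e., the boundaries between two different adjacent chars, plus the end of the string. Each breakpoint closes one run of equal chars, and one char is kept per run:
a|bb|ccc|dd|a|
 1  2   3  4 5
So 5 chars are kept and 9 - 5 = 4 are deleted. Pay attention to the last char (or just add 1 after counting the internal breakpoints).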
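Here is an in-place Java sketch of the breakpoint count described above; the class name is hypothetical. Each time s[i] ends a run, meaning it is the last char or differs from the next one, that char is copied to the front. The kept chars end up in s[0 .. s.length - deleted - 1].

```java
class RemoveDuplicateSketch {
    /** Collapses runs of equal chars in place and returns the number of chars deleted. */
    static int removeDuplicate(char[] s) {
        int kept = 0; // breakpoints (runs) seen so far; never passes i
        for (int i = 0; i < s.length; i++) {
            if (i == s.length - 1 || s[i] != s[i + 1]) { // s[i] ends a run: a breakpoint
                s[kept++] = s[i];
            }
        }
        return s.length - kept;
    }
}
```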
log file processing
Given a large time-stamped log file, how do you find the log entries within a time range?
Find whether one string is a subset of another string
Find whether one string is a subsequence of another string: the chars do not need to be contiguous, but their order must match.
General Idea
By examining the following test cases, we can see that the question above is just a special case of string matching.
a b b c d d e f   //<- target
a b   c           //<- pattern = a b c

a b b c d d e f
a       d ~ c     //<- pattern = a d c

a b b c d d e f
a b b         f   //<- pattern = a b b f
The solution is to scan the target once and consume one char of the pattern whenever it matches the current target char. If any pattern char is left unmatched at the end of the target, the pattern is not a subsequence of the target.
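The greedy scan above in Java, as a minimal sketch with hypothetical names. Matching each pattern char at its earliest possible position never hurts, so one pass over the target is enough: O(|target|) time and O(1) space.

```java
class SubsequenceSketch {
    /** True if pattern's chars appear in target in the same order, not necessarily contiguously. */
    static boolean isSubsequence(String pattern, String target) {
        int p = 0; // next pattern char to match
        for (int t = 0; t < target.length() && p < pattern.length(); t++) {
            if (target.charAt(t) == pattern.charAt(p)) {
                p++; // consume one pattern char
            }
        }
        return p == pattern.length(); // every pattern char was consumed
    }
}
```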
given a string, check if the string is a repetition of some pattern. O(n lg n)
For example, "abcabcabc" is "abc" repeated three times, or "(abc){3}".
How to remove duplicate URLs in a search engine crawler
find phone numbers in files
A large file contains phone numbers, with at most one phone number on each line. How do you process the file and return the total number of phone numbers?
Check the commands grep, sed, and wc.
implement the command "cd" or "dir"; simplify directory path.
Implement the command line "cd", or "dir".
Given a path, output the equivalent path with "." or ".." removed.
1. ".." is for the parent directory
2. "." is for the current directory
For example, given a path like: dir1/dir2/../dir3/./dir4, the result is dir1/dir3/dir4.
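The original leaves this one open. A common approach, sketched here as an assumption rather than the author's answer, keeps the path components on a stack. A normal name is pushed, "." is skipped, and ".." pops the previous name. In a relative path, a ".." with nothing left to cancel is kept. In an absolute path (leading "/"), ".." at the root stays at the root. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PathSketch {
    static String simplify(String path) {
        boolean absolute = path.startsWith("/");
        Deque<String> parts = new ArrayDeque<>(); // components kept so far, first to last
        for (String p : path.split("/")) {
            if (p.isEmpty() || p.equals(".")) continue; // "//" or ".": nothing to do
            if (p.equals("..")) {
                if (!parts.isEmpty() && !parts.peekLast().equals("..")) {
                    parts.pollLast();       // cancel the previous name
                } else if (!absolute) {
                    parts.addLast("..");    // relative path climbing above its start
                }                           // absolute path: ".." at the root is dropped
            } else {
                parts.addLast(p);
            }
        }
        String joined = String.join("/", parts);
        if (absolute) return "/" + joined;
        return joined.isEmpty() ? "." : joined;
    }
}
```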
design a data structure for wildcard match.
1. The pattern contains only one *, e.g., a*b, *a, b*, or *.
2. The pattern is not restricted and may include ? and *.
3. The targets are zillions of URLs stored on multiple computers.
given a large text file, find all the anagrams.
Implement putlong, itoa, atoi
Implement putlong, itoa, and atoi.
test cases:
1. 0
2. positive/negative numbers
3. overflow
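Below is a Java sketch of atoi and itoa that covers the three test cases above; the class name is hypothetical. C's atoi leaves overflow undefined. This sketch assumes clamping to Integer.MAX_VALUE / Integer.MIN_VALUE, the way C's strtol clamps to LONG_MAX / LONG_MIN. putlong is not covered because its signature is not given.

```java
class AtoiSketch {
    /** Skips leading whitespace, reads an optional sign and digits, and clamps on overflow. */
    static int atoi(String s) {
        int i = 0, n = s.length();
        while (i < n && Character.isWhitespace(s.charAt(i))) i++;
        boolean negative = false;
        if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            negative = s.charAt(i) == '-';
            i++;
        }
        long value = 0; // magnitude, accumulated in a wider type
        while (i < n && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            value = value * 10 + (s.charAt(i) - '0');
            if (value > (long) Integer.MAX_VALUE + 1) break; // already out of int range
            i++;
        }
        if (negative) value = -value;
        if (value > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (value < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) value;
    }

    static String itoa(int v) {
        if (v == 0) return "0";
        long m = Math.abs((long) v); // widen first so Integer.MIN_VALUE does not overflow
        StringBuilder sb = new StringBuilder();
        while (m > 0) {
            sb.append((char) ('0' + m % 10)); // digits come out least significant first
            m /= 10;
        }
        if (v < 0) sb.append('-');
        return sb.reverse().toString();
    }
}
```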
JSON prettier
Format JSON data with proper indentation and new lines.
For example, given
{"id":"id-123","woe_id":[123,456,789],"attribute":{"title":"a","desc":"b"}}
output:
{
  "id":"id-123",
  "woe_id":[123,456,789],
  "attribute":{
    "title":"a",
    "desc":"b"
  }
}
General Idea
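Scan the stream and take an action on the corresponding char:
current char    action
{               print char; indent += 2; insert("\n"); insertIndent();
,               print char; insert("\n"); insertIndent();
}               indent -= 2; insert("\n"); insertIndent(); print char;
Any other char is printed as-is. As the example output shows, a comma inside [...] and any char inside a quoted string must not trigger these actions.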
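The table above, turned into a Java sketch. It assumes compact, well-formed input like the example, and the class name is hypothetical. Two things the example output implies are handled explicitly: quoted strings (including escaped chars) are copied verbatim, and a comma starts a new line only when the innermost open bracket is '{', so arrays stay on one line. Empty objects are not special-cased.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class JsonPrettySketch {
    static String prettify(String json) {
        StringBuilder out = new StringBuilder();
        Deque<Character> open = new ArrayDeque<>(); // innermost open '{' or '[' on top
        int indent = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {                           // copy string contents verbatim
                out.append(c);
                if (c == '\\' && i + 1 < json.length()) out.append(json.charAt(++i)); // escaped char
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c == '{') {                    // print char; indent += 2; new line
                open.push(c);
                out.append(c);
                indent += 2;
                newLine(out, indent);
            } else if (c == '}') {                    // indent -= 2; new line; print char
                open.pop();
                indent -= 2;
                newLine(out, indent);
                out.append(c);
            } else if (c == '[' || c == ']') {
                if (c == '[') open.push(c); else open.pop();
                out.append(c);                        // arrays stay on one line
            } else if (c == ',') {                    // print char; new line only between object members
                out.append(c);
                if (!open.isEmpty() && open.peek() == '{') newLine(out, indent);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    private static void newLine(StringBuilder out, int indent) {
        out.append('\n');
        for (int k = 0; k < indent; k++) out.append(' ');
    }
}
```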
Find links/urls from one html page
Find the links/URLs in an HTML page using C++.
How do you store those links?
Given an arbitrarily long string, design an algorithm to find the longest repetitive substring.
For example, the longest repetitive substring of "abcabcabc" is "abcabc"; its two occurrences (starting at positions 0 and 3) overlap.