Boyer-Moore

Features

  • Time: O(m+n), where m is the length of pattern, and n is the length of input string.
  • Space: O(N), right array holds, if finite number of different chars as pattern.
  • Times of compare: It will take O(m/n) times compares. However, it is normally less than that. You may apply some rules to define a better right array in some case.

Code

/**
 * Created by Readman on 6/23/14.
 */
public class BoyerMoore {
    private static int[] right; // The array holds every single char 's rightmost position in pattern string.
    private static String pat = "ee"; // the pattern string

    /**
     *  Time (M+N), where m is the length of pattern string, and n is the length of input string.
     * @param s input string
     * @return the index of match's beginning
     */

    public static int BoyerMoore(String s) {
        calculateRight();   //return right array
        int N = s.length(); // the length of input string
        int M = pat.length(); // the length of input pattern string
        int skip; // the number of skip
        for (int i = 0; i < N - M; i += skip) { // i go through from 0 to N-M, and jump skip's char
            skip = 0; // set skip to 0
            for (int j = M - 1; j >= 0; j--) // j go through back from M -1 (the last char in pattern string) to 0(inclusive)
            {
                if (pat.charAt(j) != s.charAt(i + j)) {
                    // if char in j is not same to the char in i+j
                    skip = j - right[s.charAt(i + j)]; // find input string 's i+j char in right array, and use j minus to find the jump position.
                    if (skip < 1) // if the skip is less than 1 (not found)
                        skip = 1; // jump to next
                    break; // from start again
                }
            }
                if (skip == 0) // if not skip needed
                    return i; // found it
            }
        return N; // not found
    }

    /**
     * we use this method to calculate the char's most-right position in pattern string.
     */

    public static void calculateRight() {
        int M = pat.length(); // the length of pattern string
        int R = 256; // the number of all possible char, 256 is for ascii
        right = new int[R];
        for (int c = 0; c < R; c++) // mark all possible char to -1
            right[c] = -1;
        for (int i = 0; i < M; i++) { // continues to increase the i from to M(the length of pattern string
            int q = pat.charAt(i);
            right[q] = i; // to mark all same chars in string to their most-right index.
        }
    }

    public static void main(String[] args) {
        String s = "findinahaystackneedle";
        System.out.println(BoyerMoore(s));
    }
}

Code in English

  1. Build a right array to hold all chars.
  2. For each different char in pattern string, set each char in right array to -1.
  3. For each different char in pattern string, loop every char and put each one in the right array to find the most-right one.
  4. We have done pre-caculate part.
  5. A variable named skip is initialised. It use to hold the number of char we skip in input string. So it will be initialised to 0, which means there is no need to skip (found the positiion of substring).
  6. Loop input String with i, from 0 to N-M, where N is the length of input string, and M is the length of parttern string. Simply loop from
  7. Loop pattren string from back to start with j, from M-1(M is the length of pattern string) to 0
  8. we compare j to i+j (i+j is match position in input string), if they are not same, we find the char in our right array, and use j minus it to get our number of skips.
  9. if skips is less than 1, it means that we have to go back to find again. At this point, we don’t want this happen. we have to set it to 1, to make sure our algorithm will not loop back.
  10. Finnally, becase we don’t get match, we break the inner loop.
  11. After we jump out inner loop, we need to check variable skip, if it equals to 0, which means, after pass through all chars in pattern, we have all chars matchs with input string, we simply return i.
  12. if our loops didn’t find anything, return N.