algorithm-reading
leetcode/lintcode solutions / algorithm study notes

Word Break

category: [DP_Sequence]

Source

leetcode: Word Break | LeetCode OJ
lintcode: (107) Word Break

Given a string s and a dictionary of words dict, determine if s can be
segmented into a space-separated sequence of one or more dictionary words.

For example, given
s = "leetcode",
dict = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

Solution

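This is a single-sequence (DP_Sequence) DP problem. Following the four elements of single-sequence dynamic programming, the solution can be outlined as:

State: f[i] indicates whether the first i characters of s can be segmented into words from the dictionary.
Function: f[i] = OR{ f[j] : 0 <= j < i and the characters j+1..i (counting from 1) form a word in dict }. That is, f[i] is true if there is at least one index j < i such that f[j] is true and the characters from j+1 to i form a dictionary word; otherwise f[i] is false. This can be implemented either top-down or bottom-up.
Initialization: f[0] = true (the empty prefix); the array has length len(s) + 1 to keep the indexing simple.
Answer: f[s.length]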
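The recurrence can also be evaluated top-down. The following is a minimal Python sketch, assuming the dictionary is given as any iterable of strings; the names `word_break_top_down` and `f` are hypothetical, memoization uses the standard `functools.lru_cache`, and unlike the solutions below it does not apply the max-word-length pruning.

```python
from functools import lru_cache


def word_break_top_down(s, word_dict):
    """Top-down (memoized) evaluation of f[i] = OR over j < i of (f[j] and s[j:i] in dict)."""
    words = set(word_dict)

    @lru_cache(maxsize=None)
    def f(i):
        # f(i): can the first i characters s[:i] be segmented?
        if i == 0:
            return True  # initialization: the empty prefix is always segmentable
        # try every split point j < i: s[:j] segmentable and s[j:i] is a dictionary word
        return any(f(j) and s[j:i] in words for j in range(i))

    return f(len(s))  # answer: f[len(s)]


print(word_break_top_down("leetcode", ["leet", "code"]))  # True
```

Because `any` stops at the first successful split point, this mirrors the `break` in the bottom-up loops.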

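Since dictionary words are usually short, the inner loop only needs to look back as far as the longest word in the dictionary; with this bound, the bottom-up approach stays efficient even when s is long.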

Python

class Solution:
    # @param s, a string
    # @param wordDict, a set<string>
    # @return a boolean
    def wordBreak(self, s, wordDict):
        if not s:
            return True
        if not wordDict:
            return False

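        # the longest word bounds how far back the inner loop must look
        max_word_len = max(len(w) for w in wordDict)
        # can_break[k] is True iff the prefix s[:k] can be segmented
        can_break = [True]
        for i in range(len(s)):
            can_break.append(False)
            for j in range(i, -1, -1):
                # skip substrings longer than the longest dictionary word
                if i - j + 1 > max_word_len:
                    break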
                if can_break[j] and s[j:i + 1] in wordDict:
                    can_break[i + 1] = True
                    break
        return can_break[-1]

C++

#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        if (s.empty()) return true;
        if (wordDict.empty()) return false;

        // get the max word length of wordDict
        int max_word_len = 0;
        for (const string& word : wordDict) {
            // std::max needs both arguments to have the same type
            max_word_len = max(max_word_len, static_cast<int>(word.size()));
        }

        // can_break[i] is true iff the first i characters can be segmented
        vector<bool> can_break(s.size() + 1, false);
        can_break[0] = true;
        for (int i = 1; i <= static_cast<int>(s.size()); ++i) {
            for (int j = i - 1; j >= 0; --j) {
                // skip substrings longer than the longest dictionary word
                if (i - j > max_word_len) break;

                if (can_break[j] &&
                    wordDict.find(s.substr(j, i - j)) != wordDict.end()) {
                    can_break[i] = true;
                    break;
                }
            }
        }

        return can_break[s.size()];
    }
};

Java

import java.util.Set;

public class Solution {
    public boolean wordBreak(String s, Set<String> wordDict) {
        if (s == null || s.length() == 0) return true;
        if (wordDict == null || wordDict.isEmpty()) return false;

        // get the max word length of wordDict
        int max_word_len = 0;
        for (String word : wordDict) {
            max_word_len = Math.max(max_word_len, word.length());
        }

        boolean[] can_break = new boolean[s.length() + 1];
        can_break[0] = true;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = i - 1; j >= 0; j--) {
                // skip substrings longer than the longest dictionary word
                if (i - j > max_word_len) break;

                String word = s.substring(j, i);
                if (can_break[j] && wordDict.contains(word)) {
                    can_break[i] = true;
                    break;
                }
            }
        }

        return can_break[s.length()];
    }
}

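Code Analysis

Dynamic languages such as Python do not require preallocating an array of a fixed size, so in the Python version the loop index i is one less than in the C++ and Java versions (Python's can_break[i + 1] corresponds to can_break[i] in C++ and Java). All three versions solve the state transitions bottom-up, first traversing the dictionary once to find the maximum word length used by the pruning optimization.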
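To make the index shift concrete, here is a sketch that uses the C++/Java indexing in Python (a preallocated list and a loop starting at 1) and returns the whole table instead of only its last entry. `word_break_table` is a hypothetical helper name, not part of the solutions above.

```python
def word_break_table(s, word_dict):
    """Bottom-up Word Break with C++/Java indexing: can_break[i] covers the prefix s[:i]."""
    words = set(word_dict)
    if not words:
        # only the empty prefix is segmentable with an empty dictionary
        return [True] + [False] * len(s)
    max_word_len = max(len(w) for w in words)
    can_break = [False] * (len(s) + 1)
    can_break[0] = True  # the empty prefix
    for i in range(1, len(s) + 1):
        # j runs from i - 1 down to max(i - max_word_len, 0), i.e. i - j <= max_word_len
        for j in range(i - 1, max(i - max_word_len, 0) - 1, -1):
            if can_break[j] and s[j:i] in words:
                can_break[i] = True
                break
    return can_break


print(word_break_table("leetcode", ["leet", "code"]))
# [True, False, False, False, True, False, False, False, True]
```

For "leetcode", only the positions right after "leet" (index 4) and after "code" (index 8) are reachable, and the last entry is the answer.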

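Complexity Analysis

Finding the maximum word length takes one pass over the dictionary; since string length is an O(1) query in all three languages, this costs $O(L_D)$, where $L_D$ is the number of words in the dictionary.
Looking up a word in the dictionary takes expected $O(1)$ hash probes (hash table), but building the substring and hashing it costs $O(L_w)$, where $L_w$ is the maximum word length.
In the two nested for loops, the inner loop exits once the interval exceeds the maximum word length, so in the worst case the loops run $O(n L_w)$ iterations, where $n$ is the length of s.
Hence the total time complexity is $O(L_D + n L_w^2)$, which is effectively linear in $n$ when $L_w$ is a small constant.
The boolean array has length $n + 1$ and the temporary word has length at most $L_w$, so the space complexity is $O(n + L_w)$, approximately $O(n)$.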
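The effect of the max-word-length bound on the number of inner-loop iterations can be checked with an instrumented sketch of the Python solution. `word_break_count` and its `use_bound` flag are hypothetical names introduced only for this measurement. The input below is chosen so that no dictionary word ever matches, so every inner loop runs until it stops on its own.

```python
def word_break_count(s, word_dict, use_bound=True):
    """Bottom-up Word Break that also counts how many (i, j) pairs reach the dictionary test."""
    words = set(word_dict)
    max_word_len = max(len(w) for w in words)
    can_break = [True] + [False] * len(s)
    checks = 0
    for i in range(1, len(s) + 1):
        for j in range(i - 1, -1, -1):
            # the pruning: stop once s[j:i] is longer than any dictionary word
            if use_bound and i - j > max_word_len:
                break
            checks += 1
            if can_break[j] and s[j:i] in words:
                can_break[i] = True
                break
    return can_break[len(s)], checks


print(word_break_count("b" * 10, {"a", "aa"}))                   # (False, 19)
print(word_break_count("b" * 10, {"a", "aa"}, use_bound=False))  # (False, 55)
```

With the bound, each position examines at most $L_w = 2$ split points, giving 19 checks for $n = 10$ (within $n L_w = 20$); without it, position i examines all i split points, giving $n(n+1)/2 = 55$.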