Find Words That Follow a Bigram Pattern
Find Words That Follow a Bigram Pattern
You have a sentence where words are separated by spaces. Given two words in sequence, you want to find all the third words that come right after them.
The approach is simple. You look for all instances of “first word + space + second word + space”. Once you find such a pattern, you grab the word that follows.
Here’s how to do it:
// time: O(N)
// space: O(k) k is the number of results
vector<string> findOcurrences(string str, string first, string second) {
string sub = first + " " + second + " ";
vector<string> res;
int p = str.find(sub, 0);
while(p != string::npos)
{
int st=p+sub.size();
string s;
for(int i=st;i<str.size()&&str[i]!=' ';++i) {
s.push_back(str[i]);
}
if (!s.empty()) res.push_back(s);
p = str.find(sub,st);
}
return res;
}The code works like this:
- Build the search pattern: first word, space, second word, space
- Find each occurrence of this pattern in the sentence
- For each match, extract the word that comes after
- Skip ahead and look for the next occurrence
This runs in linear time relative to the input size. The space used depends only on how many matches we find.
#string