Tokenizing a string means splitting it with respect to one or more delimiters. There are many ways to tokenize a string; this article explains four of them:
Using stringstream
A stringstream associates a string object with a stream allowing you to read from the string as if it were a stream.
Below is the C++ implementation:
C++
// Tokenizing a string using stringstream
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    string line = "GeeksForGeeks is a must try";

    // Vector of strings to save tokens
    vector<string> tokens;

    // Stream that reads from the string
    stringstream check1(line);

    string intermediate;

    // Tokenizing w.r.t. space ' '
    while (getline(check1, intermediate, ' '))
    {
        tokens.push_back(intermediate);
    }

    // Printing the token vector
    for (size_t i = 0; i < tokens.size(); i++)
        cout << tokens[i] << '\n';
}

Output:
GeeksForGeeks
is
a
must
try
Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(n - d), where n is the length of the string and d is the number of delimiter characters, since the token vector stores every non-delimiter character.
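One caveat of the getline() loop above is that two delimiters in a row produce an empty token. The following is a minimal sketch, assuming the input is separated by whitespace, that compares getline() with the stream extraction operator >>, which skips runs of whitespace. The helper names split_whitespace and split_on are hypothetical and chosen only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on runs of whitespace using operator>>.
std::vector<std::string> split_whitespace(const std::string& line)
{
    std::istringstream stream(line);
    std::vector<std::string> tokens;
    std::string word;
    // operator>> skips leading whitespace, so no empty tokens are produced.
    while (stream >> word)
        tokens.push_back(word);
    return tokens;
}

// Hypothetical helper: the same getline() approach as in the article,
// kept here for comparison. Adjacent delimiters yield empty tokens.
std::vector<std::string> split_on(const std::string& line, char delim)
{
    std::istringstream stream(line);
    std::vector<std::string> tokens;
    std::string piece;
    while (std::getline(stream, piece, delim))
        tokens.push_back(piece);
    return tokens;
}
```

For an input with two spaces between words, split_whitespace() returns only the words, while split_on() with ' ' also returns an empty string for the gap. Which behavior is wanted depends on whether empty fields carry meaning, as they do in CSV-like data.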
Using strtok()
// Splits str[] according to the given delimiters
// and returns the next token. It needs to be called
// in a loop to get all tokens. It returns NULL
// when there are no more tokens.
char *strtok(char str[], const char *delims);

Below is the C++ implementation:
C++
// C/C++ program for splitting a string
// using strtok()
#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Geeks-for-Geeks";

    // Returns first token
    char *token = strtok(str, "-");

    // Keep printing tokens until strtok() returns NULL,
    // which means no tokens are left in str[].
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, "-");
    }

    return 0;
}

Output:
Geeks
for
Geeks

Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(1).
Another example of strtok():
C
// C code to demonstrate working of
// strtok
#include <string.h>
#include <stdio.h>

// Driver function
int main()
{
    // Declaration of string
    char gfg[100] = " Geeks - for - neveropen - Contribute";

    // Declaration of delimiter
    const char s[4] = "-";
    char* tok;

    // Use of strtok
    // get first token
    tok = strtok(gfg, s);

    // Loop until strtok() returns NULL (no tokens left)
    while (tok != NULL) {
        printf(" %s\n", tok);

        // Use of strtok
        // go through other tokens
        tok = strtok(NULL, s);
    }

    return 0;
}

Output (only '-' is a delimiter, so each token keeps the spaces around it):
  Geeks 
  for 
  neveropen 
  Contribute

Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(1).
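Two practical points follow from the examples above: strtok() overwrites delimiters in the buffer it is given, and the delimiter argument is a set of characters, not a single separator. The sketch below, under the assumption that the input arrives as a std::string, tokenizes a private copy and passes " -" so that spaces and hyphens are both treated as delimiters. The helper name split_with_strtok is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `text`, so the caller's
// string is left untouched.
std::vector<std::string> split_with_strtok(const std::string& text,
                                           const char* delims)
{
    // strtok() writes '\0' over delimiters, so it needs a mutable,
    // null-terminated buffer of its own.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.push_back(tok);  // copies the token out of the buffer
    return tokens;
}
```

Calling split_with_strtok(" Geeks - for - Geeks", " -") yields the three words without surrounding spaces, because every character in the delimiter string separates tokens. The copy costs O(n) extra space, which is the price of leaving the original string intact.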
Using strtok_r()
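Just like the strtok() function in C, strtok_r() parses a string into a sequence of tokens. strtok_r() is a reentrant version of strtok(): it keeps its parsing state in a pointer supplied by the caller instead of in hidden static storage.
strtok_r() is called in two ways: the first call passes the string to parse, and each later call for the same string passes NULL as str together with the same saveptr.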
// The third argument saveptr is a pointer to a char *
// variable that is used internally by strtok_r() in
// order to maintain context between successive calls
// that parse the same string.
char *strtok_r(char *str, const char *delim, char **saveptr);

Below is a simple C++ program to show the use of strtok_r():
C++
// C/C++ program to demonstrate working of strtok_r()
// by splitting string based on space character.
#include <stdio.h>
#include <string.h>

int main()
{
    char str[] = "Geeks for Geeks";
    char *token;
    char *rest = NULL;  // context saved between calls

    // The first call passes the string; later calls pass NULL
    for (token = strtok_r(str, " ", &rest);
         token != NULL;
         token = strtok_r(NULL, " ", &rest))
        printf("%s\n", token);

    return 0;
}

Output:
Geeks
for
Geeks

Time Complexity: O(n), where n is the length of the string.
Auxiliary Space: O(1).
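Reentrancy matters most when one tokenization runs inside another, which strtok() cannot do because it has a single hidden state. The following is a minimal sketch, assuming a POSIX system where strtok_r() is available and a made-up input format of records separated by ';' with fields separated by ','. The helper name split_records is hypothetical.

```cpp
#include <string.h>  // strtok_r() is POSIX and declared in <string.h>
#include <string>
#include <vector>

// Hypothetical helper: parses "a,b;c,d" into {{"a","b"},{"c","d"}}.
std::vector<std::vector<std::string>> split_records(const std::string& text)
{
    // strtok_r() modifies its input, so work on a null-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::vector<std::string>> records;
    char* outer_state = nullptr;  // context for the ';' split
    for (char* record = strtok_r(buffer.data(), ";", &outer_state);
         record != nullptr;
         record = strtok_r(nullptr, ";", &outer_state)) {
        std::vector<std::string> fields;
        char* inner_state = nullptr;  // separate context for the ',' split
        for (char* field = strtok_r(record, ",", &inner_state);
             field != nullptr;
             field = strtok_r(nullptr, ",", &inner_state))
            fields.push_back(field);
        records.push_back(fields);
    }
    return records;
}
```

Each loop owns its own save pointer, so the inner split does not disturb the outer one. The same nesting written with strtok() would lose the outer position as soon as the inner loop started.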
Using std::sregex_token_iterator
In this method, tokenization is based on regex matches. It is a better fit when several different delimiters are needed.
Below is a simple C++ program to show the use of std::sregex_token_iterator:
C++
// C++ program for the above approach
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

/**
 * @brief Tokenize the given string according to the regex
 * and remove the empty tokens.
 *
 * @param str string to split
 * @param re regex that matches the delimiters
 * @return std::vector<std::string>
 */
std::vector<std::string> tokenize(const std::string& str,
                                  const std::regex& re)
{
    // Submatch index -1 selects the text between delimiter matches
    std::sregex_token_iterator it{ str.begin(), str.end(), re, -1 };
    std::vector<std::string> tokenized{ it, {} };

    // Additional check to remove empty strings
    tokenized.erase(
        std::remove_if(tokenized.begin(), tokenized.end(),
                       [](std::string const& s) { return s.empty(); }),
        tokenized.end());

    return tokenized;
}

// Driver Code
int main()
{
    const std::string str = "Break string   a,spaces,and,commas";
    // One or more whitespace characters or commas.
    // A '|' inside [...] would be a literal pipe, so it is omitted.
    const std::regex re(R"([\s,]+)");

    // Function Call
    const std::vector<std::string> tokenized = tokenize(str, re);

    for (const std::string& token : tokenized)
        std::cout << token << '\n';
    return 0;
}

Output:
Break
string
a
spaces
and
commas

Time Complexity: O(n * d), where n is the length of the string and d is the number of delimiters.
Auxiliary Space: O(n)
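The iterator can also be used the other way around: instead of describing the delimiters and asking for the gaps (submatch index -1), describe the tokens and ask for the matches themselves (submatch index 0). A minimal sketch, assuming the same whitespace-and-comma delimiters as above; the helper name extract_tokens is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every maximal run of characters that is
// neither whitespace nor a comma.
std::vector<std::string> extract_tokens(const std::string& text)
{
    static const std::regex token_re(R"([^\s,]+)");

    // Submatch index 0 selects the matches themselves rather than the
    // text between them.
    std::sregex_token_iterator first{ text.begin(), text.end(), token_re, 0 };
    std::sregex_token_iterator last;  // default-constructed end iterator

    return std::vector<std::string>(first, last);
}
```

Because the pattern requires at least one character per match, this form never yields empty tokens, so the remove_if() cleanup step is not needed.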
