Abhinav Kaushik

Remove Duplicates from String

Problem Statement

Given a string, we have to remove duplicate characters from the string such that each character appears only once (all the characters in the string should become unique). We can print the resultant string in any order.

Example

Input : “aaabbccd”
Output : “abcd”

Explanation

As we can see the frequency of all the characters

CharacterFrequency
a3
b2
c2
d1

After removing the duplicates, the frequency of all the characters became 1, so all the duplicate characters have been removed.

CharacterFrequency
a1
b1
c1
d1

So the output is abcd

Constraints

$1 <= \text{|S|} <= 10^6$ ($1 <= \text{Length of S} <= 10^6$)
$\text{a} <= S[i] <= \text{z}$ (all the characters in the string will be in lowercase)

Approach 1 (Using the Simple for loop)

This is the Brute-Force method to remove duplicates from string. In this method, we will use simple for loops to remove duplicates from string. In this algorithm, we will create an empty string and we will add the first occurrence of every character in our answer i.e. we will traverse our answer string and if the current character of the given string is present in our answer string, it means this is not the first occurrence of this character and it is already added to our answer. The algorithm and implementation of the approach are given below.

Algorithm:

  • Given a string s of length n
  • Create an empty string named ans = "", which will be our final answer
  • Run a for loop from i=0 to i=n-1
  • Run a nested for loop from j=0 to j=i-1
  • If at any point we found s[i]==s[j], we will break out of the loop (it means that this is not the first occurrence of this character and it is already added to our answer)
  • else if the inner loop ends without breaking, it means that we are visiting the character s[i] for the first time, so we will add it into our answer by ans=ans+s[i]

C++ Implementation

#include<bits/stdc++.h>
using namespace std;

void removeDuplicates(string &s,int n){
    string ans="";           
        int i,j;
        for(i=0;i<n;i++){
            for(j=0;j<i;j++){
                if(s[i]==s[j]){   
                    break;   // We are breaking here because this caharacter is already 
                }           // added to our answer because it was found earlier

            }
            if(j==i){       // The loop ends without breaking, it means this 
                ans+=s[i];  // is the first occurence of this character in the string
            }               // so we add this character to our answer
        }
    cout<<ans;
}

int main(){
    string s="aaabaababbccbccd";
    removeDuplicates(s,s.size());
}

Java Implementation

public class Main {

    public static void removeDuplicates(char s[], int n)
    {
        String ans="";
        int i,j;
        for(i=0;i<n;i++){
            for(j=0;j<i;j++){
                if(s[i]==s[j]){
                    break;     // We are breaking here because this caharacter is already
                }              // added to our answer because it was found earlier

            }
            if(j==i){          // The loop ends without breaking, it means this 
                ans+=s[i];     // is the first occurence of this character in the string 
            }                  // so we add this character to our answer
        }
        System.out.print(ans);
    }
    public static void main(String[] args)
    {
        char s[] = "aaabaababbccbccd".toCharArray();
        int n = s.length;
        removeDuplicates(s, n);
    }
}

Python Implementation

def removeDuplicates(s, n):
    ans = ""
    for i in range (0,n):
        k=0
        for j in range (0,i):
            if(s[i]==s[j]):
                k=1
                break       # We are breaking here because this caharacter is already 
                            # added to our answer because it was found earlier

        if(k==0):           # The loop ends without breaking, which means this is the first occurrence 
            ans=ans+s[i]    # of this character in the string so we add this character to our answer
    print(ans)

s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)

Output

abcd

Complexity Analysis

Time Complexity Analysis

  • In this algorithm, we are running a for loop from i=0 to i=n-1, which will perform i operations every time because we are running a loop from j=0 to j=i-1 every time. So in the worst case, the inner loop will take O(N) time complexity, and the outer loop will also take O(N) time complexity.

So the total worst-case time complexity for this approach to remove duplicates from string is O(N) * O(N) = O(N*N)

Space Complexity Analysis

In this approach we are using one array of characters to store the result i.e. ans, so we are using O(N) extra space.

Time Complexity: O(N*N)
Space Complexity: O(N)

Approach 2 (Using a Set)

In this method, we are going to use the Set Data structure to remove duplicates from string. Finally, we will modify the given string such that the k elements from the starting are the unique elements, and we will return k which is the total number of unique elements in the string. (k is a positive integer which is the total number of unique characters in the string). Then we will print all the unique characters.

Note: Set is a data structure that stores only one occurrence of each element inserted into it.

Algorithm:

  • Given a string s of length n
  • Create a set named ans
  • Insert all the characters of the string into the set
  • Maintain a pointer named k and initialize it with 0
  • Start traversing the set, let x be the current element of the set and modify the string by performing the following operations.
    • s[k] = x
    • k++
  • Finally return the value of k, which is now the total number of unique characters in the string, so we will print the characters present at the index from i=0 to i=k-1

C++ Implementation

#include<bits/stdc++.h>
using namespace std;

int removeDuplicates(string &s,int n){
    int k=0;
    set<int> ans;
    for(int i=0;i<n;i++){
        ans.insert(s[i]);   // Inserting every element into the set
    }
    for(auto x:ans){
        s[k]=x;             // Modifying the first k characters of the string 
        k++;                // beacuse every character x in the set is unique
    }
    return k;               // Return k, the total number of unique characters in the string
}

int main(){
    string s="aaabaababbccbccd";
    int k = removeDuplicates(s,s.size());
    for(int i=0;i<k;i++)
        cout<<s[i];
}

Java Implementation

import java.util.*;
public class Main {

    public static int removeDuplicates(char s[], int n)
    {
        int k=0;
        Set<Character> hash_set = new HashSet<Character>();
        for (int i = 0; i<n; i++) {
            hash_set.add(s[i]);   // Inserting every element into the set
        }  
        for(Character x:hash_set){
            s[k]=x;              // Modifying the first k characters of the string
            k++;                 // beacuse every character x in the set is unique
        } 
        return k;                // Return k, the total number of unique characters in the string
    }
    public static void main(String[] args)
    {
        char s[] = "aaabaababbccbccd".toCharArray();
        int n = s.length;
        int k = removeDuplicates(s, n);
        for(int i=0;i<k;i++)
            System.out.print(s[i]);
    }
}

Python Implementation

def removeDuplicates(s, n):
    ans=set()
    k=0
    for i in range(0, n):
        ans.add(s[i])   # Inserting every element into the set
    for x in ans:
        print(x,end="")

s = "aaabaababbccbccd"
n = len(s)
k = removeDuplicates(s, n)

Output

abcd

Complexity Analysis

Time Complexity Analysis

  • In the first step, we are performing n operations by inserting all the characters of the string into the set which will take O(N) time.
  • Internally, the set uses hashing and takes O(NlogN) time for sorting the characters.
  • Finally, we are traversing the set which will take O(N) time in the worst case (if every element is unique)

So the total worst-case time complexity for this approach to remove duplicates from string is O(N)+O(NlogN)+O(N) = O(NlogN)

Space Complexity Analysis

In this approach, we are using a set and we are inserting all the characters of the string into the set. So we are using O(N) extra space in this approach.

Time Complexity: O(NlogN)
Space Complexity: O(N)

Approach 3 (Using the Sorting Algorithm)

In this method, we will use sorting to remove duplicates from string. We will use the concept that in a sorted string all the characters which are equal to each other will be adjacent to each other, so we can easily check the adjacent elements of the string and store only the unique characters in our result. (The intuition behind this approach is to reduce the time complexity from O(N*N) to O(NlogN))

Algorithm:

  • Sort the string
  • Create an empty string named ans = "", which will be our final answer, and add the first character in the answer string
  • We know that the first element will always be unique and we have already added it to our answer string, so we maintain a pointer named k, which will start from 1 and keep track of the unique elements.
  • Run a loop from i=1 to i=n-1, if at any point s[i]!=s[i-1], then add s[i] into the answer string, else continue

C++ Implementation

#include<bits/stdc++.h>
using namespace std;

void removeDuplicates(string s,int n){
    sort(s.begin(),s.end());  // Sort the string
    string ans="";
        int i,j;
    ans+=s[0];
    for(i=1;i<n;i++){
        if(s[i]!=s[i-1]){     // If the previous element is not same as current, then add it to the
            ans+=s[i];        // answer because it the the first occurence of this character
        }
    }
    cout<<ans;
}

int main(){
    string s="aaabaababbccbccd";
    removeDuplicates(s,s.size());
}

Java Implementation

import java.util.Arrays;

public class Main {

    public static void removeDuplicates(char s[], int n)
    {
        String ans="";
        Arrays.sort(s);           // Sort the string
        int i,j;
        ans+=s[0];
        for(i=1;i<n;i++){
            if(s[i]!=s[i-1]){     // If the previous element is not same as current, then add it to the    
                ans+=s[i];        // answer because it the the first occurence of this character
            }
        }
        System.out.print(ans);
    }
    public static void main(String[] args)
    {
        char s[] = "aaabaababbccbccd".toCharArray();
        int n = s.length;
        removeDuplicates(s, n);
    }
}

Python Implementation

def removeDuplicates(s, n):
    s = ''.join(sorted(s))    # Sort the string
    ans = ""
    ans=ans+s[0]
    for i in range (1,n):
        if(s[i]!=s[i-1]):     # If the previous element is not same as current, then add it to the
            ans=ans+s[i]      # answer because it the the first occurence of this character
    print(ans)

s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)

Output

abcd

Complexity Analysis

Time Complexity Analysis

  • In the first step, we are sorting the string which takes O(NlogN) time
  • Then we are running the loop from 1 to n-1 which takes O(N) time

So the total worst-case time complexity for this approach to remove duplicates from string is O(NlogN)+O(N) = O(NlogN)

Space Complexity Analysis

In this approach we are using one array of characters to store the result i.e. ans, so we are using O(N) extra space.

Time Complexity: O(NlogN)
Space Complexity: O(N)

Approach 4 (Using Hashing)

In this method, we are going to use the Hashing to remove duplicates from string. In this approach, we will create a map that will have a maximum size of 26 (because the given string only contains lower case characters specified in the problem statement). Then we will initialize the frequency 1 for each character present in the string. Then we will again traverse the string and if the count of any character is greater than 0, we will add this character to our answer and update its frequency to 0 so that next time we should not add it to our answer.

Algorithm:

  • Given a string s of length n
  • Create a map of named m
  • Create an empty string named ans = "", which will be our final answer
  • Run a for loop from i=0 to i=n-1 and make the frequency of s[i] equal to 1
  • Now we will traverse the string again, and if m[s[i]] > 0 (frequency of s[i] is more than 0), we will add this character to our answer and make m[s[i]]=0
  • Finally print the ans string

C++ Implementation

#include<bits/stdc++.h>
using namespace std;

void removeDuplicates(string s,int n){
    map<char,int> m;  
    string ans="";
    int i;
    for(i=0;i<n;i++){
        m[s[i]]=1;       // Initializing frequency of every character equal to 1
    }
    for(i=1;i<n;i++){
        if(m[s[i]]>0){    // If the frequency is not 0, then add it to the answer,
            ans+=s[i];    // beacause it is the first occurence of this character
        }
        m[s[i]]=0;        // Updating the frequency of every character to 0
    }
    cout<<ans;
}

int main(){
    string s="aaabaababbccbccd";
    removeDuplicates(s,s.size());
}

Java Implementation

import java.util.*;
public class Main {

    public static void removeDuplicates(char s[], int n)
    {
        int i=0;
        String ans="";
        HashMap<Character, Integer> m = new HashMap<>();
        for(i=0;i<n;i++){
            Integer a = m.get(s[i]);
            if(a==null){
                m.put(s[i],1);       // Initializing frequency of every character equal to 1
            }
        }
        for(i=0;i<n;i++){
            Integer a = m.get(s[i]);
            if(a>0){                // If the frequency is not 0, then add it to the answer
                ans+=s[i];          // beacause it is the first occurence of this character
            }

            m.put(s[i],0);          // Updating the frequency of every character to 0
        }
        System.out.print(ans);
    }
    public static void main(String[] args)
    {
        char s[] = "aaabaababbccbccd".toCharArray();
        int n = s.length;
        removeDuplicates(s, n);
    }
}

Python Implementation

def removeDuplicates(s, n):
    m = {}
    ans = ""
    for x in s:
        m[x]=1           # Initializing frequency of every character equal to 1
    for x in s:
        if(m[x]>0):      # If the frequency is not 0, then add it to the answer
            ans=ans+x    # because it is the first occurrence of this character

        m[x]=0           # Updating the frequency of every character to 0
    print(ans)

s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)

Output

abcd

Complexity Analysis

Time Complexity Analysis

  • In the first step, we are performing n operations to initialize the frequency of every character with 1, which will take O(N) time.
  • In the second step, we are again performing n operations which will take O(N) time.

So the total worst-case time complexity for this approach to remove duplicates from string is O(N)+O(N) = O(N)

Space Complexity Analysis

In this approach, we are using a map and we are inserting 1 for all the characters of the string. So we are using O(N) extra space in this approach.

Time Complexity: O(N)
Space Complexity: O(N)

Approach 5 (Using indexOf() Method)

In this approach, we will use the inbuilt methods already implemented in C++, Java, and Python languages. Just like in the first approach, we will find the first occurrence of each character and add it to our answer. But in this approach, we will find the first occurrence using the inbuilt methods.

Algorithm:

  • Given a string s of length n
  • Create an empty string named ans = "", which will be our final answer
  • Run a for loop from i=0 to i=n-1
  • Now for each character s[i], we will check if this character (s[i]) is already added to our answer or not.
  • If we are unable to find this character in our answer string, we will add this character to the answer string
  • Finally print the ans string

C++ Implementation

#include<bits/stdc++.h>
using namespace std;

void removeDuplicates(string s,int n){
    string ans="";
    int i;
    for(i=0;i<n;i++){
        if(ans.find(s[i])==string::npos){    // Using the inbuilt  method to check if this character is already present in our answer or not
            ans+=s[i];                       // If not present, add it to the answer
        }
    }
    cout<<ans;
}

int main(){
    string s="aaabaababbccbccd";
    removeDuplicates(s,s.size());
}

Java Implementation

import java.util.*;
public class Main {

    public static void removeDuplicates(char s[], int n)
    {
        int i=0;
        String ans="";
        for(i=0;i<n;i++){
            if (ans.indexOf(s[i])<0){    // Using the inbuilt method to check if this character is already present in our answer or not
                ans+=s[i];               // If not present, add it to the answer
            }
        }
        System.out.print(ans);
    }
    public static void main(String[] args)
    {
        char s[] = "aaabaababbccbccd".toCharArray();
        int n = s.length;
        removeDuplicates(s, n);
    }
}

Python Implementation

def removeDuplicates(s, n):
    m = {}
    ans = ""
    for x in s:
        if x not in ans:      # Using the inbuilt method to check if this character is already present in our answer or not
            ans = ans+x       # If not present, add it to the answer

    print(ans)

s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)

Output

abcd

Complexity Analysis

Time Complexity Analysis

  • In the outer loop, we are performing n operations (by checking the occurrence of each character in the answer string) which takes O(N) time
  • Inside the loop, we are checking if s[i] is already present in ans or not. The built-in methods will take the worst-case time complexity of O(N)

So the total worst-case time complexity for this approach to remove duplicates from string is O(N) * O(N) = $O(N^2)$

Space Complexity Analysis

In this approach, we are using one array of characters to store the result i.e. ans, so we are using O(N) extra space.

Time Complexity: $O(N^2)$

Space Complexity: O(N)

Conclusion

In this quick tutorial, we have discussed 5 different approaches to remove duplicates from string

  • In Approach 1, we used simple for loops that took O(N*N) time complexity
  • In Approach 2, we used the Set data structure that took O(NLogN) time complexity
  • In Approach 3, we sorted the string which took O(NLogN) time complexity
  • In approach 4, we used hashing by using the map data structure that took O(N) time complexity
  • In approach 5, we used the built-in methods in C++, Java, and Python that took O(N*N) time complexity

FAQ

Q.What can be the best time complexity for removing the duplicates?

A. By using an unordered set, we can remove duplicates from the string in only O(N) time complexity and O(N) auxiliary space. This is because the unordered_set internally uses hashing which will take an average of O(1) time complexity for each element, so the total time complexity to remove the duplicates from the string comes out to be O(N).

Author