Problem Statement
Given a string, we have to remove duplicate characters from the string such that each character appears only once (all the characters in the string should become unique). We can print the resultant string in any order.
Example
Input : “aaabbccd”
Output : “abcd”
Explanation
As we can see the frequency of all the characters
Character | Frequency |
---|---|
a | 3 |
b | 2 |
c | 2 |
d | 1 |
After removing the duplicates, the frequency of all the characters became 1, so all the duplicate characters have been removed.
Character | Frequency |
---|---|
a | 1 |
b | 1 |
c | 1 |
d | 1 |
So the output is abcd
Constraints
$1 <= \text{|S|} <= 10^6$ ($1 <= \text{Length of S} <= 10^6$)
$\text{a} <= S[i] <= \text{z}$ (all the characters in the string will be in lowercase)
Approach 1 (Using the Simple for loop)
This is the Brute-Force method to remove duplicates from string. In this method, we will use simple for loops to remove duplicates from string. In this algorithm, we will create an empty string and we will add the first occurrence of every character in our answer i.e. we will traverse our answer string and if the current character of the given string is present in our answer string, it means this is not the first occurrence of this character and it is already added to our answer. The algorithm and implementation of the approach are given below.
Algorithm:
- Given a string
s
of lengthn
- Create an empty string named
ans = ""
, which will be our final answer - Run a for loop from
i=0
toi=n-1
- Run a nested for loop from
j=0
toj=i-1
- If at any point we found
s[i]==s[j]
, we will break out of the loop (it means that this is not the first occurrence of this character and it is already added to our answer) - else if the inner loop ends without breaking, it means that we are visiting the character
s[i]
for the first time, so we will add it into our answer byans=ans+s[i]
C++ Implementation
#include<bits/stdc++.h>
using namespace std;
void removeDuplicates(string &s,int n){
string ans="";
int i,j;
for(i=0;i<n;i++){
for(j=0;j<i;j++){
if(s[i]==s[j]){
break; // We are breaking here because this caharacter is already
} // added to our answer because it was found earlier
}
if(j==i){ // The loop ends without breaking, it means this
ans+=s[i]; // is the first occurence of this character in the string
} // so we add this character to our answer
}
cout<<ans;
}
int main(){
string s="aaabaababbccbccd";
removeDuplicates(s,s.size());
}
Java Implementation
public class Main {
public static void removeDuplicates(char s[], int n)
{
String ans="";
int i,j;
for(i=0;i<n;i++){
for(j=0;j<i;j++){
if(s[i]==s[j]){
break; // We are breaking here because this caharacter is already
} // added to our answer because it was found earlier
}
if(j==i){ // The loop ends without breaking, it means this
ans+=s[i]; // is the first occurence of this character in the string
} // so we add this character to our answer
}
System.out.print(ans);
}
public static void main(String[] args)
{
char s[] = "aaabaababbccbccd".toCharArray();
int n = s.length;
removeDuplicates(s, n);
}
}
Python Implementation
def removeDuplicates(s, n):
ans = ""
for i in range (0,n):
k=0
for j in range (0,i):
if(s[i]==s[j]):
k=1
break # We are breaking here because this caharacter is already
# added to our answer because it was found earlier
if(k==0): # The loop ends without breaking, which means this is the first occurrence
ans=ans+s[i] # of this character in the string so we add this character to our answer
print(ans)
s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)
Output
abcd
Complexity Analysis
Time Complexity Analysis
- In this algorithm, we are running a for loop from i=0 to i=n-1, which will perform
i operations
every time because we are running a loop from j=0 to j=i-1 every time. So in the worst case, the inner loop will take O(N) time complexity, and the outer loop will also take O(N) time complexity.
So the total worst-case time complexity for this approach to remove duplicates from string is O(N) * O(N) = O(N*N)
Space Complexity Analysis
In this approach we are using one array of characters to store the result i.e. ans
, so we are using O(N) extra space.
Time Complexity: O(N*N)
Space Complexity: O(N)
Approach 2 (Using a Set)
In this method, we are going to use the Set
Data structure to remove duplicates from string. Finally, we will modify the given string such that the k elements from the starting are the unique elements, and we will return k which is the total number of unique elements in the string. (k is a positive integer which is the total number of unique characters in the string). Then we will print all the unique characters.
Note: Set is a data structure that stores only one occurrence of each element inserted into it.
Algorithm:
- Given a string
s
of lengthn
- Create a set named
ans
- Insert all the characters of the string into the set
- Maintain a pointer named
k
and initialize it with0
- Start traversing the set, let
x
be the current element of the set and modify the string by performing the following operations.s[k] = x
k++
- Finally return the value of
k
, which is now the total number of unique characters in the string, so we will print the characters present at the index fromi=0 to i=k-1
C++ Implementation
#include<bits/stdc++.h>
using namespace std;
int removeDuplicates(string &s,int n){
int k=0;
set<int> ans;
for(int i=0;i<n;i++){
ans.insert(s[i]); // Inserting every element into the set
}
for(auto x:ans){
s[k]=x; // Modifying the first k characters of the string
k++; // beacuse every character x in the set is unique
}
return k; // Return k, the total number of unique characters in the string
}
int main(){
string s="aaabaababbccbccd";
int k = removeDuplicates(s,s.size());
for(int i=0;i<k;i++)
cout<<s[i];
}
Java Implementation
import java.util.*;
public class Main {
public static int removeDuplicates(char s[], int n)
{
int k=0;
Set<Character> hash_set = new HashSet<Character>();
for (int i = 0; i<n; i++) {
hash_set.add(s[i]); // Inserting every element into the set
}
for(Character x:hash_set){
s[k]=x; // Modifying the first k characters of the string
k++; // beacuse every character x in the set is unique
}
return k; // Return k, the total number of unique characters in the string
}
public static void main(String[] args)
{
char s[] = "aaabaababbccbccd".toCharArray();
int n = s.length;
int k = removeDuplicates(s, n);
for(int i=0;i<k;i++)
System.out.print(s[i]);
}
}
Python Implementation
def removeDuplicates(s, n):
ans=set()
k=0
for i in range(0, n):
ans.add(s[i]) # Inserting every element into the set
for x in ans:
print(x,end="")
s = "aaabaababbccbccd"
n = len(s)
k = removeDuplicates(s, n)
Output
abcd
Complexity Analysis
Time Complexity Analysis
- In the first step, we are performing n operations by inserting all the characters of the string into the set which will take O(N) time.
- Internally, the set uses hashing and takes O(NlogN) time for sorting the characters.
- Finally, we are traversing the set which will take O(N) time in the worst case (if every element is unique)
So the total worst-case time complexity for this approach to remove duplicates from string is O(N)+O(NlogN)+O(N) = O(NlogN)
Space Complexity Analysis
In this approach, we are using a set and we are inserting all the characters of the string into the set. So we are using O(N) extra space in this approach.
Time Complexity: O(NlogN)
Space Complexity: O(N)
Approach 3 (Using the Sorting Algorithm)
In this method, we will use sorting to remove duplicates from string. We will use the concept that in a sorted string all the characters which are equal to each other will be adjacent to each other, so we can easily check the adjacent elements of the string and store only the unique characters in our result. (The intuition behind this approach is to reduce the time complexity from O(N*N) to O(NlogN))
Algorithm:
- Sort the string
- Create an empty string named
ans = ""
, which will be our final answer, and add the first character in the answer string - We know that the first element will always be unique and we have already added it to our answer string, so we maintain a pointer named
k
, which will start from 1 and keep track of the unique elements. - Run a loop from
i=1 to i=n-1
, if at any points[i]!=s[i-1]
, then adds[i]
into the answer string, else continue
C++ Implementation
#include<bits/stdc++.h>
using namespace std;
void removeDuplicates(string s,int n){
sort(s.begin(),s.end()); // Sort the string
string ans="";
int i,j;
ans+=s[0];
for(i=1;i<n;i++){
if(s[i]!=s[i-1]){ // If the previous element is not same as current, then add it to the
ans+=s[i]; // answer because it the the first occurence of this character
}
}
cout<<ans;
}
int main(){
string s="aaabaababbccbccd";
removeDuplicates(s,s.size());
}
Java Implementation
import java.util.Arrays;
public class Main {
public static void removeDuplicates(char s[], int n)
{
String ans="";
Arrays.sort(s); // Sort the string
int i,j;
ans+=s[0];
for(i=1;i<n;i++){
if(s[i]!=s[i-1]){ // If the previous element is not same as current, then add it to the
ans+=s[i]; // answer because it the the first occurence of this character
}
}
System.out.print(ans);
}
public static void main(String[] args)
{
char s[] = "aaabaababbccbccd".toCharArray();
int n = s.length;
removeDuplicates(s, n);
}
}
Python Implementation
def removeDuplicates(s, n):
s = ''.join(sorted(s)) # Sort the string
ans = ""
ans=ans+s[0]
for i in range (1,n):
if(s[i]!=s[i-1]): # If the previous element is not same as current, then add it to the
ans=ans+s[i] # answer because it the the first occurence of this character
print(ans)
s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)
Output
abcd
Complexity Analysis
Time Complexity Analysis
- In the first step, we are sorting the string which takes O(NlogN) time
- Then we are running the loop from 1 to n-1 which takes O(N) time
So the total worst-case time complexity for this approach to remove duplicates from string is O(NlogN)+O(N) = O(NlogN)
Space Complexity Analysis
In this approach we are using one array of characters to store the result i.e. ans, so we are using O(N) extra space.
Time Complexity: O(NlogN)
Space Complexity: O(N)
Approach 4 (Using Hashing)
In this method, we are going to use the Hashing
to remove duplicates from string. In this approach, we will create a map
that will have a maximum size of 26 (because the given string only contains lower case characters specified in the problem statement). Then we will initialize the frequency 1 for each character present in the string. Then we will again traverse the string and if the count of any character is greater than 0, we will add this character to our answer and update its frequency to 0 so that next time we should not add it to our answer.
Algorithm:
- Given a string
s
of lengthn
- Create a map of named
m
- Create an empty string named
ans = ""
, which will be our final answer - Run a for loop from
i=0
toi=n-1
and make the frequency ofs[i]
equal to1
- Now we will traverse the string again, and if
m[s[i]] > 0
(frequency ofs[i]
is more than0
), we will add this character to our answer and makem[s[i]]=0
- Finally print the
ans
string
C++ Implementation
#include<bits/stdc++.h>
using namespace std;
void removeDuplicates(string s,int n){
map<char,int> m;
string ans="";
int i;
for(i=0;i<n;i++){
m[s[i]]=1; // Initializing frequency of every character equal to 1
}
for(i=1;i<n;i++){
if(m[s[i]]>0){ // If the frequency is not 0, then add it to the answer,
ans+=s[i]; // beacause it is the first occurence of this character
}
m[s[i]]=0; // Updating the frequency of every character to 0
}
cout<<ans;
}
int main(){
string s="aaabaababbccbccd";
removeDuplicates(s,s.size());
}
Java Implementation
import java.util.*;
public class Main {
public static void removeDuplicates(char s[], int n)
{
int i=0;
String ans="";
HashMap<Character, Integer> m = new HashMap<>();
for(i=0;i<n;i++){
Integer a = m.get(s[i]);
if(a==null){
m.put(s[i],1); // Initializing frequency of every character equal to 1
}
}
for(i=0;i<n;i++){
Integer a = m.get(s[i]);
if(a>0){ // If the frequency is not 0, then add it to the answer
ans+=s[i]; // beacause it is the first occurence of this character
}
m.put(s[i],0); // Updating the frequency of every character to 0
}
System.out.print(ans);
}
public static void main(String[] args)
{
char s[] = "aaabaababbccbccd".toCharArray();
int n = s.length;
removeDuplicates(s, n);
}
}
Python Implementation
def removeDuplicates(s, n):
m = {}
ans = ""
for x in s:
m[x]=1 # Initializing frequency of every character equal to 1
for x in s:
if(m[x]>0): # If the frequency is not 0, then add it to the answer
ans=ans+x # because it is the first occurrence of this character
m[x]=0 # Updating the frequency of every character to 0
print(ans)
s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)
Output
abcd
Complexity Analysis
Time Complexity Analysis
- In the first step, we are performing n operations to initialize the frequency of every character with 1, which will take O(N) time.
- In the second step, we are again performing n operations which will take O(N) time.
So the total worst-case time complexity for this approach to remove duplicates from string is O(N)+O(N) = O(N)
Space Complexity Analysis
In this approach, we are using a map and we are inserting 1 for all the characters of the string. So we are using O(N) extra space in this approach.
Time Complexity: O(N)
Space Complexity: O(N)
Approach 5 (Using indexOf() Method)
In this approach, we will use the inbuilt methods already implemented in C++, Java, and Python languages. Just like in the first approach, we will find the first occurrence of each character and add it to our answer. But in this approach, we will find the first occurrence using the inbuilt methods.
Algorithm:
- Given a string
s
of lengthn
- Create an empty string named
ans = ""
, which will be our final answer - Run a for loop from
i=0
toi=n-1
- Now for each character
s[i]
, we will check if this character (s[i]
) is already added to our answer or not. - If we are unable to find this character in our answer string, we will add this character to the answer string
- Finally print the
ans
string
C++ Implementation
#include<bits/stdc++.h>
using namespace std;
void removeDuplicates(string s,int n){
string ans="";
int i;
for(i=0;i<n;i++){
if(ans.find(s[i])==string::npos){ // Using the inbuilt method to check if this character is already present in our answer or not
ans+=s[i]; // If not present, add it to the answer
}
}
cout<<ans;
}
int main(){
string s="aaabaababbccbccd";
removeDuplicates(s,s.size());
}
Java Implementation
import java.util.*;
public class Main {
public static void removeDuplicates(char s[], int n)
{
int i=0;
String ans="";
for(i=0;i<n;i++){
if (ans.indexOf(s[i])<0){ // Using the inbuilt method to check if this character is already present in our answer or not
ans+=s[i]; // If not present, add it to the answer
}
}
System.out.print(ans);
}
public static void main(String[] args)
{
char s[] = "aaabaababbccbccd".toCharArray();
int n = s.length;
removeDuplicates(s, n);
}
}
Python Implementation
def removeDuplicates(s, n):
m = {}
ans = ""
for x in s:
if x not in ans: # Using the inbuilt method to check if this character is already present in our answer or not
ans = ans+x # If not present, add it to the answer
print(ans)
s = "aaabaababbccbccd"
n = len(s)
removeDuplicates(s, n)
Output
abcd
Complexity Analysis
Time Complexity Analysis
- In the outer loop, we are performing n operations (by checking the occurrence of each character in the answer string) which takes O(N) time
- Inside the loop, we are checking if s[i] is already present in ans or not. The built-in methods will take the worst-case time complexity of O(N)
So the total worst-case time complexity for this approach to remove duplicates from string is O(N) * O(N) = $O(N^2)$
Space Complexity Analysis
In this approach, we are using one array of characters to store the result i.e. ans, so we are using O(N) extra space.
Time Complexity: $O(N^2)$
Space Complexity: O(N)
Conclusion
In this quick tutorial, we have discussed 5 different approaches to remove duplicates from string
- In Approach 1, we used simple for loops that took O(N*N) time complexity
- In Approach 2, we used the Set data structure that took O(NLogN) time complexity
- In Approach 3, we sorted the string which took O(NLogN) time complexity
- In approach 4, we used hashing by using the map data structure that took O(N) time complexity
- In approach 5, we used the built-in methods in C++, Java, and Python that took O(N*N) time complexity
FAQ
Q.What can be the best time complexity for removing the duplicates?
A. By using an unordered set, we can remove duplicates from the string in only O(N) time complexity
and O(N) auxiliary space.
This is because the unordered_set internally uses hashing which will take an average of O(1) time complexity for each element, so the total time complexity to remove the duplicates from the string comes out to be O(N)
.