Most Frequent Substring

easy

Software engineer

A text analysis program needs to identify the most frequently occurring substring of a specific length within a given lowercase string.

Given a string str and an integer n, return the substring of length n that appears the most times in str. If more than one substring has the highest frequency, return the lexicographically smallest one. Return an empty string if no such valid substring is found.

A substring is defined as a sequence of contiguous characters in str. Overlapping substrings are counted separately; for example, in "aaaa" with n = 2, the substring "aa" appears three times.

Example 1

Input

str = "inengineering", n = 2

Output

"in"

Explanation

Substrings of length 2 are: 'in', 'ne', 'en', 'ng', 'gi', 'in', 'ne', 'ee', 'er', 'ri', 'in', 'ng'. 
The substring 'in' appears 3 times. 
'ne', 'en', and 'ng' appear 2 times each. 
Since 'in' has the highest frequency (3), it is the answer.

Example 2

Input

str = "zabzab", n = 2

Output

"ab"

Explanation

Both 'ab' and 'za' appear twice. 'ab' is lexicographically smaller than 'za'.

Example 3

Input

str = "aaaaa", n = 1

Output

"a"

Constraints

1 <= str.length <= 10⁵
1 <= n <= str.length
str consists only of lowercase English letters

Screening

CountingHash TableSliding WindowSortingString

Most Frequent Substring

Overview

The goal is to find the most frequent substring of a fixed length n within a string. This problem is a classic application of the Sliding Window technique combined with a Frequency Map (hash map).

We traverse the string, extracting every possible substring of length n, and count their occurrences. If multiple substrings share the same maximum frequency, we apply a secondary rule to select the lexicographically smallest one (e.g., "apple" comes before "apply").

Brute Force

A brute force approach involves generating all possible substrings of length n and, for each one, scanning the rest of the string to count how many times it appears. This would involve nested loops: an outer loop to pick a starting position and an inner loop (or string search method) to count occurrences.

def most_frequent_substring(str: str, n: int) -> str:
    if not str or n > len(str):
        return ""
    
    max_freq = 0
    best_sub = ""
    
    # Iterate through every possible starting index for a substring of length n
    for i in range(len(str) - n + 1):
        current_sub = str[i : i + n]
        current_freq = 0
        
        # Count occurrences of current_sub by scanning the string again
        for j in range(len(str) - n + 1):
            if str[j : j + n] == current_sub:
                current_freq += 1
        
        # Update best_sub based on frequency and lexicographical order
        if current_freq > max_freq:
            max_freq = current_freq
            best_sub = current_sub
        elif current_freq == max_freq:
            if best_sub == "" or current_sub < best_sub:
                best_sub = current_sub
                
    return best_sub

Time complexity: O(L^2 * n) — where L is the length of the string; we iterate L times, and for each iteration, we perform a count/search operation that takes O(L * n). Space complexity: O(n) — to store the current substring and the best substring found so far.

Optimal / Best Approach

The optimal approach uses a Hash Map (or collections.Counter in Python) to store the frequencies of all substrings in a single pass. By using a sliding window of size n, we can extract each substring and update its count in constant time on average. After populating the map, we iterate through the unique substrings to find the one that meets our criteria.

from collections import Counter

def most_frequent_substring(str: str, n: int) -> str:
    if not str or n <= 0 or n > len(str):
        return ""
    
    # counts will store {substring: frequency}
    counts = Counter()
    
    # Slide a window of size n across the string
    for i in range(len(str) - n + 1):
        substring = str[i : i + n]
        counts[substring] += 1
    
    max_freq = 0
    result = ""
    
    # Find the substring with the highest frequency
    # Use lexicographical comparison as a tie-breaker
    for sub, freq in counts.items():
        if freq > max_freq:
            max_freq = freq
            result = sub
        elif freq == max_freq:
            if sub < result:
                result = sub
                
    return result

Time complexity: O(L * n) — we iterate through the string of length L once, and slicing/hashing a substring of length n takes O(n) time. Space complexity: O(L * n) — in the worst case (all substrings unique), the hash map stores L - n + 1 substrings, each of length n.

Edge Cases

n = length of string: The function should correctly return the entire string.
n = 1: The function should find the most frequent single character.
All characters unique: The function should return the lexicographically smallest substring of length n.
Overlapping substrings: The problem explicitly states overlaps count (e.g., "aaaa", n=2, "aa" appears 3 times). The sliding window naturally handles this.

Fast

Comments on the solution

No comments on the solution yet. Start a thread about the editorial or your own approach.

Language

Code editor loads in the browser.

Output

Input

str"inengineering"

Expected

"in"

Your output

Run to see your output.

Example 1

Example 2

Example 3

Constraints

Interview stages

Topics

Most Frequent Substring

Overview

Brute Force

Optimal / Best Approach

Edge Cases

Comments on the solution

Output