Most Frequent Substring
A text analysis program needs to identify the most frequently occurring substring of a specific length within a given lowercase string.
Given a string str and an integer n, return the substring of length n that appears the most times in str. If more than one substring has the highest frequency, return the lexicographically smallest one. Return an empty string if no such valid substring is found.
A substring is defined as a sequence of contiguous characters in str. Overlapping substrings are counted separately; for example, in "aaaa" with n = 2, the substring "aa" appears three times.
Example 1
Input
str = "inengineering", n = 2Output
"in"Explanation
Substrings of length 2 are: 'in', 'ne', 'en', 'ng', 'gi', 'in', 'ne', 'ee', 'er', 'ri', 'in', 'ng'. The substring 'in' appears 3 times. 'ne', 'en', and 'ng' appear 2 times each. Since 'in' has the highest frequency (3), it is the answer.
Example 2
Input
str = "zabzab", n = 2Output
"ab"Explanation
Both 'ab' and 'za' appear twice. 'ab' is lexicographically smaller than 'za'.
Example 3
Input
str = "aaaaa", n = 1Output
"a"
Constraints
1 <= str.length <= 10⁵1 <= n <= str.lengthstrconsists only of lowercase English letters
Most Frequent Substring
Overview
The goal is to find the most frequent substring of a fixed length n within a string. This problem is a classic application of the Sliding Window technique combined with a Frequency Map (hash map).
We traverse the string, extracting every possible substring of length n, and count their occurrences. If multiple substrings share the same maximum frequency, we apply a secondary rule to select the lexicographically smallest one (e.g., "apple" comes before "apply").
Brute Force
A brute force approach involves generating all possible substrings of length n and, for each one, scanning the rest of the string to count how many times it appears. This would involve nested loops: an outer loop to pick a starting position and an inner loop (or string search method) to count occurrences.
def most_frequent_substring(str: str, n: int) -> str:
if not str or n > len(str):
return ""
max_freq = 0
best_sub = ""
# Iterate through every possible starting index for a substring of length n
for i in range(len(str) - n + 1):
current_sub = str[i : i + n]
current_freq = 0
# Count occurrences of current_sub by scanning the string again
for j in range(len(str) - n + 1):
if str[j : j + n] == current_sub:
current_freq += 1
# Update best_sub based on frequency and lexicographical order
if current_freq > max_freq:
max_freq = current_freq
best_sub = current_sub
elif current_freq == max_freq:
if best_sub == "" or current_sub < best_sub:
best_sub = current_sub
return best_sub
Time complexity: O(L^2 * n) — where L is the length of the string; we iterate L times, and for each iteration, we perform a count/search operation that takes O(L * n). Space complexity: O(n) — to store the current substring and the best substring found so far.
Optimal / Best Approach
The optimal approach uses a Hash Map (or collections.Counter in Python) to store the frequencies of all substrings in a single pass. By using a sliding window of size n, we can extract each substring and update its count in constant time on average. After populating the map, we iterate through the unique substrings to find the one that meets our criteria.
from collections import Counter
def most_frequent_substring(str: str, n: int) -> str:
if not str or n <= 0 or n > len(str):
return ""
# counts will store {substring: frequency}
counts = Counter()
# Slide a window of size n across the string
for i in range(len(str) - n + 1):
substring = str[i : i + n]
counts[substring] += 1
max_freq = 0
result = ""
# Find the substring with the highest frequency
# Use lexicographical comparison as a tie-breaker
for sub, freq in counts.items():
if freq > max_freq:
max_freq = freq
result = sub
elif freq == max_freq:
if sub < result:
result = sub
return result
Time complexity: O(L * n) — we iterate through the string of length L once, and slicing/hashing a substring of length n takes O(n) time. Space complexity: O(L * n) — in the worst case (all substrings unique), the hash map stores L - n + 1 substrings, each of length n.
Edge Cases
- n = length of string: The function should correctly return the entire string.
- n = 1: The function should find the most frequent single character.
- All characters unique: The function should return the lexicographically smallest substring of length
n. - Overlapping substrings: The problem explicitly states overlaps count (e.g., "aaaa", n=2, "aa" appears 3 times). The sliding window naturally handles this.
Fast
Comments on the solution
No comments on the solution yet. Start a thread about the editorial or your own approach.
No comments yet for Most Frequent Substring. Be the first to start the thread.
No submissions recorded for this problem yet. Use Submit in the editor — each judge run is stored here.
Output
Input
Expected
Your output
Run to see your output.