[LeetCode] C++ 819. Most Common Word

Problem Solving/LeetCode

[LeetCode] C++ 819. Most Common Word

LeeJaeJun 2024. 8. 2. 16:45

728x90

문제

문제 분석

- 주어진 문자열 중에서 가장 많이 나온 단어를 찾는다. (banned에 있는 단어를 제외하고)

- unordered_map으로 관리하는 것이 제일 효율적이라고 생각하였다. <- 단어를 key, 값을 빈도수로 저장해놓고 빠르게 접근하는 것이 필요했기 때문에

- 정규표현식을 사용하고 소문자로 통일시켜서 전처리를 한 다음에 단어 개수를 세면된다고 생각하였다.

- banned에 있는 단어인지 하나씩 검사하고 처음부터 아예 안넣는 방식과 나중에 가장 많은 단어를 구할 때만 제외하는 방식 중 전자가 더 효율적이라고 생각하였다. 주어진 입력값에 따라 달라지겠지만 주어진 paragraph의 단어 수가 많아질 수록 매번 banned에 있는지 검사하는 것은 비효율적이라고 생각했기 때문이다. 하지만 전자의 방식을 unordered_set을 사용하면 일반적으로 O(1)에 find를 할 수 있기 때문에 시도해볼만하다고는 생각하였다.

풀이 1 (banned에 있는 단어인지 하나씩 검사하고 처음부터 아예 안넣는 방식)

#include <string>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <sstream>
#include <algorithm>
#include <regex>

using namespace std;

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        regex pattern("[!?',;.]");
        string processedParagraph = regex_replace(paragraph, pattern, " "); // ""으로 대치하면 b,b,b 같은거를 bbb라는 한 단어로 판단하기 때문에 " "로 대치해야 함

        transform(processedParagraph.begin(), processedParagraph.end(), processedParagraph.begin(), ::tolower);

        unordered_set<string> bannedWords(banned.begin(), banned.end());

        unordered_map<string, int> wordCount;

        istringstream iss(processedParagraph);
        string word;

        while(iss >> word){
            if (bannedWords.find(word) == bannedWords.end()){
                wordCount[word]++;
            }
        }
        
        auto pr = max_element(wordCount.begin(), wordCount.end(), [](const auto &x, const auto &y){
            return x.second < y.second;
        });

        return pr->first;
    }
};

stringstream은 입력과 출력을 모두 지원하며, 문자열을 스트림으로 취급하여 데이터를 읽고 쓸 수 있습니다.
istringstream은 입력 전용으로, 주어진 문자열로부터 데이터를 읽어들입니다.
std::transform는 범위의 각 요소를 변환하여 다른 범위에 저장하는 데 사용
- std::transform(first1, last1, first2, unary_op);
- first1, last1: 입력 범위의 시작과 끝을 나타내는 반복자.
- first2: 출력 범위의 시작을 나타내는 반복자.
- unary_op: 각 요소에 적용할 단항 함수 또는 함수 객체.

풀이 2 (나중에 가장 많은 단어를 구할 때만 제외하는 방식)

#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <sstream>
#include <algorithm>
#include <regex>

using namespace std;

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        regex pattern("[!?',;.]");
        string processedParagraph = regex_replace(paragraph, pattern, " ");

        transform(processedParagraph.begin(), processedParagraph.end(), processedParagraph.begin(), ::tolower);

        unordered_set<string> bannedWords(banned.begin(), banned.end());

        unordered_map<string, int> wordCount;

        istringstream iss(processedParagraph);
        string word;

        while (iss >> word) {
            wordCount[word]++;
        }

        // unordered_map을 벡터로 변환 (unordered_map은 정렬되지 않기 때문에)
        vector<pair<string, int>> wordCountVector(wordCount.begin(), wordCount.end());
        sort(wordCountVector.begin(), wordCountVector.end(), [](const pair<string, int>& a, const pair<string, int>& b) {
            return a.second > b.second;
        });

        for (int i = 0; i < wordCountVector.size(); ++i) {
            if (bannedWords.find(wordCountVector[i].first) == bannedWords.end()) {
                return wordCountVector[i].first;
            }
        }

        return "";
    }
};

풀이 3 ( banned에 속한 요소들의 count를 단순히 0으로 만들기)

using namespace std;

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        regex pattern("[!?',;.]");
        string processedParagraph = regex_replace(paragraph, pattern, " ");

        transform(processedParagraph.begin(), processedParagraph.end(), processedParagraph.begin(), ::tolower);

        unordered_map<string, int> wordCount;

        istringstream iss(processedParagraph);
        string word;

        while(iss >> word){
            wordCount[word]++;
        }
        
        for(auto it = banned.begin(); it != banned.end(); it++){
            wordCount[*it] = 0;
        }
        
        auto pr = max_element(wordCount.begin(),wordCount.end(), [](const auto &x, const auto &y){
            return x.second < y.second;
        });

        return pr->first;
    }
};

빠른 풀이

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        string s=paragraph;
        int n=s.size();
        map<string,int> mp;
        for(auto &ch:s){
            if(!isalpha(ch)){
                ch=' ';
            }
            ch=tolower(ch);
        }
        for(int i=0;i<n;i++){
            string t="";
            int j=i;
            while(s[j]!=' '&&j<n){
                t+=s[j];
                j++;
            }
            if(t!=""){mp[t]++;}
            i=j;
        }
        for(auto t:banned){
            mp[t]=0;
        }
        string ans;
        int mx=0;
        for(auto it:mp){
            //cout<<it.first<<" "<<it.second<<" ";
            if(it.second>mx){
                mx=it.second;
                ans=it.first;
            }
        }
        return ans;
    }
};

풀이 방식은 기존과 다르지 않지만 정규표현식을 사용하지않고 직접 수정하는 방식과 정렬하지 않고 단순 비교로 구하는 등 무거운 함수를 사용하지 않아서 더 빠르게 실행되는 것 같다.

728x90

저작자표시 비영리 동일조건 (새창열림)

'Problem Solving > LeetCode' 카테고리의 다른 글

[LeetCode] C++ 5. Longest Palindromic Substring (0)	2024.08.02
[LeetCode] C++ 49. Group Anagrams (0)	2024.08.02
[LeetCode] C++ 937. Reorder Data in Log Files (0)	2024.08.02
[LeetCode] C++ 344. Reverse String (0)	2024.08.02
[LeetCode] C++ 125. Valid Palindrome (0)	2024.08.02

현재글[LeetCode] C++ 819. Most Common Word

250x250

Study Log

정리하는 공간

MySQL, 프로그래머스, leetcode, Python, Network, FastAPI, SWEA, snowflake, Linux, SQLAlchemy, cryptography, Java, DP, JOIN, 투자자산운용사, Algorithm, 비전공자 투자자산운용사, 백준, Computer Architecture, group by,

Today :
Yesterday :

일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Study Log