Implementing a Review-Rating Feature Using a Dataset / TIL_221108

Dong_Devlog 2022. 11. 9. 09:22

Trying out code that extracts positive and negative scores from reviews, using big data from the National Library of Korea

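Analyze a book's reviews to judge whether the book draws mostly positive or mostly negative reactions.
When a review is written, extract the words it contains, check whether each word is positive or negative, and assign a score.
The dataset consists of tens of thousands of words, each assigned a positive score and a negative score.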
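The scoring idea above fits in a few lines of plain Python. This is a minimal sketch under assumptions: `TOY_LEXICON` is a hypothetical dict mapping each word to a (positive, negative) score pair, with invented words and values rather than entries from the real dataset, and the review is assumed to be already split into words. Words missing from the lexicon are ignored, and a review with no matches returns (0, 0) instead of dividing by zero.

```python
# Hypothetical toy lexicon: word -> (positive score, negative score).
# The words and numbers are invented for illustration only.
TOY_LEXICON = {
    "재미": (2.0, 0.0),
    "감동": (3.0, 1.0),
    "지루": (0.0, 4.0),
}


def sentiment_percent(words, lexicon):
    """Sum the scores of words found in the lexicon and convert them to percentages."""
    positive = negative = 0.0
    for word in words:
        if word in lexicon:  # words missing from the lexicon are skipped
            pos, neg = lexicon[word]
            positive += pos
            negative += neg
    total = positive + negative
    if total == 0:  # no matched words: avoid dividing by zero
        return 0.0, 0.0
    return positive / total * 100, negative / total * 100


print(sentiment_percent(["감동", "지루"], TOY_LEXICON))  # → (37.5, 62.5)
```

Here 감동 contributes 3 positive and 1 negative, 지루 contributes 4 negative, so the totals are 3 and 5 out of 8, i.e. 37.5% and 62.5%.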

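```python
# Run inside the Django project (e.g. via `python manage.py shell`)
import pandas as pd
from books.models import Review

# Load the sentiment lexicon dataset (columns: term, positive_score, nagative_score)
book_sense = pd.read_csv('books/csv/sense.csv')

pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 300)


# Turn a review into a list of words by stripping trailing punctuation
# and the particles 을/를/은/는 so that only the word itself is left
def extract_words(sentence):
    sentence_words_strip = []
    for sentence_words in sentence.split():
        sentence_words_strip.append(sentence_words.rstrip(".,!?을를은는"))
    return sentence_words_strip


# Compare the extracted words with the dataset terms; for every match,
# add that word's positive and negative scores, then convert the totals
# to percentages
def score_review(sentence_words_strip):
    positive_score = 0
    negative_score = 0
    for i in sentence_words_strip:
        words = book_sense[book_sense['term'] == i]
        if words.empty:  # word not in the dataset
            continue
        print(words, words['positive_score'].values)
        positive_score += words['positive_score'].values[0]
        # 'nagative_score' is the column name used in the original code;
        # adjust it if the CSV header spells it differently
        negative_score += words['nagative_score'].values[0]

    print(positive_score)
    print(negative_score)

    total = positive_score + negative_score
    if total == 0:  # no word matched the dataset: avoid dividing by zero
        return 0, 0
    positive_per = positive_score / total * 100
    negative_per = negative_score / total * 100
    return positive_per, negative_per


# Load the reviews stored in the database and score each one
# ('content' is an assumed name for the Review text field)
for review in Review.objects.exclude(content=None):
    sentence_words_strip = extract_words(review.content)
    print(sentence_words_strip)

    positive_per, negative_per = score_review(sentence_words_strip)
    print(positive_per)
    print(negative_per)
```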
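One subtlety in the particle-stripping step: `str.rstrip(chars)` treats its argument as a set of characters, not as a suffix, so it keeps removing any trailing character from that set. A word that itself ends in 을/를/은/는 gets over-stripped, e.g. "가을은" (autumn + topic particle) becomes "가" instead of "가을". The sketch below is one possible alternative, not the method used above: it strips punctuation as a character set but removes at most one particle as an exact suffix. `strip_particle` is a hypothetical helper name. It is still a heuristic ("마을" on its own would still lose its last syllable); a Korean morphological analyzer, such as one of the taggers in KoNLPy, handles this properly.

```python
PUNCTUATION = ".,!?"
PARTICLES = ("을", "를", "은", "는")


def strip_particle(token):
    """Drop trailing punctuation, then at most one particle as an exact suffix."""
    token = token.rstrip(PUNCTUATION)  # any trailing run of punctuation is removed
    for particle in PARTICLES:
        # keep at least one character so a bare particle is not erased
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)]
    return token


print("가을은".rstrip(".,!?을를은는"))  # → 가
print(strip_particle("가을은"))          # → 가을
print(strip_particle("책을!"))           # → 책
```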

This is a feature I implemented during this project.

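I implemented it by downloading the dataset as a CSV file and reading it directly, without storing it in the database.

Along the way, I realized how widely datasets are used, across many different fields and in many different ways.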
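Since the CSV is used directly rather than stored in the database, one option is to parse it once and keep it in a dict, so each word lookup is a hash lookup instead of a filter like `book_sense[book_sense['term'] == i]`, which compares every row of the DataFrame. This is a minimal standard-library sketch under assumptions: the file is UTF-8 (with or without a BOM), its header uses the column names from the code above (`term`, `positive_score`, `nagative_score`), and `parse_lexicon` / `load_lexicon` are hypothetical helper names.

```python
import csv
from functools import lru_cache


def parse_lexicon(lines):
    """Build {term: (positive, negative)} from CSV lines that start with a header row.

    Column names are assumed to match the original code:
    term, positive_score, nagative_score.
    """
    lexicon = {}
    for row in csv.DictReader(lines):
        # setdefault keeps the first row for a duplicated term,
        # like .values[0] in the pandas version
        lexicon.setdefault(
            row["term"],
            (float(row["positive_score"]), float(row["nagative_score"])),
        )
    return lexicon


@lru_cache(maxsize=None)
def load_lexicon(path):
    """Read the CSV file once; later calls with the same path reuse the dict."""
    # 'utf-8-sig' is an assumption: it reads plain UTF-8 and UTF-8 with a BOM
    with open(path, newline="", encoding="utf-8-sig") as f:
        return parse_lexicon(f)
```

For example, `load_lexicon('books/csv/sense.csv').get(word)` returns the (positive, negative) pair if the word is in the dataset and `None` otherwise, and repeated calls do not reread the file.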