Implementing a Review Evaluation Feature with a Dataset / TIL

Implementing code that extracts positive and negative scores from reviews, using big data from the National Library of Korea

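Analyze a book's reviews to evaluate whether reactions to the book are mostly positive or mostly negative.
When a review is written, extract the words it contains, judge whether each word is positive or negative, and assign scores accordingly.
The dataset consists of tens of thousands of words, each assigned a positive score and a negative score.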

import pandas as pd
from books.models import Review

# Read the sentiment review dataset
book_sense = pd.read_csv(r'books\csv\sense.csv')

pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 300)


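# Load the reviews stored in the database and strip trailing punctuation and
# particles so that only the bare words remain.
# NOTE: 'content' is an assumed name for the Review text field; adjust it to your model.
sentence_words_strip = []
for sentence in Review.objects.exclude(content__isnull=True).values_list('content', flat=True):
    for sentence_words in sentence.split():
        # rstrip removes any trailing run of these characters (. , ! ? 을 를 은 는)
        sentence_words_strip.append(sentence_words.rstrip(".,!?을를은는"))

print(sentence_words_strip)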
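
One caveat about the rstrip call above: its argument is treated as a set of characters, so it keeps stripping as long as the word ends in any of them. A word such as "가을을" therefore comes out as "가" instead of "가을". Below is a minimal sketch of a gentler alternative that removes trailing punctuation and then at most one particle. The names strip_word and extract_words are hypothetical helpers, not part of the project, and this is still a rough heuristic: it cannot tell a particle from an adjective ending, which a morphological analyzer would handle more reliably.

```python
from typing import List

# Hypothetical helper names; the particle list mirrors the one used in the post.
PUNCTUATION = ".,!?"
PARTICLES = ("을", "를", "은", "는")


def strip_word(word: str) -> str:
    """Remove trailing punctuation, then at most one trailing particle."""
    word = word.rstrip(PUNCTUATION)
    for particle in PARTICLES:
        # Only strip when something remains, so a one-character word is kept whole.
        if word.endswith(particle) and len(word) > len(particle):
            return word[: -len(particle)]
    return word


def extract_words(sentence: str) -> List[str]:
    """Split a review on whitespace and clean every word."""
    return [strip_word(word) for word in sentence.split()]


print(extract_words("가을을 좋아하는 책을 읽었다."))
```

With this version "가을을" becomes "가을", while a single-character word such as "은" is left untouched.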


# Compare the extracted words with the dataset terms; whenever a word matches,
# add up its positive and negative scores, then print each total as a percentage.
positive_score = 0
negative_score = 0
for word in sentence_words_strip:
    words = book_sense[book_sense['term'] == word]
    if words.empty:
        continue

    print(words, words['positive_score'].values)
    positive_score += words['positive_score'].values[0]
    # The column key keeps the spelling used in the original code ('nagative_score');
    # change it if the CSV header is spelled differently.
    negative_score += words['nagative_score'].values[0]

print(positive_score)
print(negative_score)

total_score = positive_score + negative_score
if total_score > 0:  # avoid dividing by zero when no word matched
    positive_per = positive_score / total_score * 100
    negative_per = negative_score / total_score * 100

    print(positive_per)
    print(negative_per)
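
The scoring loop above filters the whole DataFrame once per word, which gets slow with tens of thousands of terms. As a rough sketch of one alternative, the lexicon can be turned into a dictionary once and reused for every review. Everything here is illustrative: build_lookup and score_words are hypothetical names, the three sample rows are made-up values rather than real entries from sense.csv, and the 'nagative_score' key simply follows the column spelling used in the code above. With pandas, rows of this shape could be obtained from book_sense.to_dict('records').

```python
# Made-up sample rows standing in for the real lexicon (assumption, not real data).
sample_rows = [
    {"term": "재미", "positive_score": 0.8, "nagative_score": 0.1},
    {"term": "감동", "positive_score": 0.9, "nagative_score": 0.05},
    {"term": "지루", "positive_score": 0.1, "nagative_score": 0.7},
]


def build_lookup(rows):
    """Map each term to its (positive, negative) score pair."""
    lookup = {}
    for row in rows:
        # setdefault keeps the first entry for a repeated term,
        # matching the .values[0] behaviour of the loop in the post.
        lookup.setdefault(row["term"], (row["positive_score"], row["nagative_score"]))
    return lookup


def score_words(words, lookup):
    """Return (positive %, negative %) over matched words, or None if nothing matched."""
    positive = 0.0
    negative = 0.0
    for word in words:
        if word in lookup:
            pos, neg = lookup[word]
            positive += pos
            negative += neg
    total = positive + negative
    if total == 0:
        return None  # no word matched, so a percentage would be undefined
    return positive / total * 100, negative / total * 100


lookup = build_lookup(sample_rows)
result = score_words(["재미", "지루", "책"], lookup)
print(result)
```

Returning None when no word matches leaves it to the caller to decide how to display a review that has no scorable words.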

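This is a feature I tried implementing during the current project.

I wrote the code so that it reads the dataset CSV file directly instead of storing it in the database first,

and I came to appreciate how widely datasets are used, across many fields and in many different ways.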


Dong_Devlog

Implementing a Review Evaluation Feature with a Dataset / TIL_221108
