How to scrape Ctrip reviews with Python

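Scraping reviews from Ctrip takes a few Python libraries, such as requests, BeautifulSoup, and re. The steps below walk through how to use them to collect Ctrip reviews. Before starting, make sure Python and these libraries are installed.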

Analyzing the target site

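First, analyze the structure of the Ctrip review page. Viewing the page source shows that the review text sits inside specific tags and class names. Identifying those tags and class names is the key to extracting the data.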
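As a rough sketch of this inspection step, the snippet below counts the class names that appear on `div` elements, which can help narrow down which container holds the review text. The HTML fragment and the helper name `count_div_classes` are made up for illustration; only the class `conttxt` comes from the code later in this article, and the real Ctrip markup may differ.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a saved review page.
SAMPLE_HTML = """
<div class="comment-item"><div class="conttxt">Great location.</div></div>
<div class="comment-item"><div class="conttxt">Room was clean.</div></div>
<div class="footer">About us</div>
"""


def count_div_classes(html):
    """Return a Counter mapping each class name on a <div> to its frequency."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for div in soup.find_all("div"):
        # get("class", []) yields a list of class names, or [] if none is set
        counts.update(div.get("class", []))
    return counts


if __name__ == "__main__":
    for name, n in count_div_classes(SAMPLE_HTML).most_common():
        print(name, n)
```

A class that repeats once per review is usually a good candidate for the `find_all` call used in the extraction step.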

Writing the scraper code

1. Import the required libraries

Start by importing the libraries used in the following steps:

Python

import requests
from bs4 import BeautifulSoup
import re
import time

2. Send the request

Use the requests library to send a request to the target URL and fetch the page content:

Python

url = 'https://club.ctrip.com/review/XXX'  # XXX stands for the specific review page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/XX.XX.XX.XX Safari/537.36'
}
# A timeout keeps the script from hanging if the server never responds
response = requests.get(url, headers=headers, timeout=10)
response.encoding = 'utf-8'
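The request above assumes the server answers normally. A slightly more defensive sketch is shown below. The helper name `fetch_html`, the shortened User-Agent string, and the 10-second timeout are illustrative choices, not part of the original code.

```python
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # placeholder UA
}


def fetch_html(url, session=None, timeout=10):
    """Return the page text, or None if the request fails."""
    session = session or requests.Session()
    try:
        response = session.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
        return None
    response.encoding = "utf-8"
    return response.text
```

Returning `None` on failure lets the caller decide whether to retry, skip the page, or stop.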

3. Parse the page

Use BeautifulSoup to parse the page content:

Python

soup = BeautifulSoup(response.text, 'html.parser')

4. Extract the reviews

Using the tags and class names identified during the analysis, extract the review text:

Python

comments = soup.find_all('div', class_='conttxt')
for comment in comments:
    content = comment.get_text().strip()
    print(content)
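Review text often contains nested tags such as `<br>`, in which case `get_text().strip()` only trims the outer whitespace. The sketch below shows one way to normalize the text with the `separator` and `strip` arguments of `get_text`. The HTML fragment and the helper name `extract_comments` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page markup may differ.
SAMPLE_HTML = """
<div class="conttxt">  Nice view.<br/>Would stay again. </div>
<div class="conttxt"></div>
"""


def extract_comments(html, class_name="conttxt"):
    """Return the non-empty review texts found in divs with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", class_=class_name):
        # Join the text pieces with a space and strip each piece
        text = div.get_text(" ", strip=True)
        if text:  # skip empty containers
            texts.append(text)
    return texts


if __name__ == "__main__":
    print(extract_comments(SAMPLE_HTML))
```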

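5. Handle pagination

Reviews may be spread across several pages, so a loop is needed to walk through them. Find the pagination parameter in the URL and iterate over it: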

Python

page = 1
while True:
    url = f'https://club.ctrip.com/review/XXX/p{page}/'  # XXX stands for the specific review page, page is the page number
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'html.parser')
    comments = soup.find_all('div', class_='conttxt')

    if not comments:
        break  # No reviews on this page, so exit the loop
    for comment in comments:
        content = comment.get_text().strip()
        print(content)
    page += 1
    time.sleep(1)  # Sleep for 1 second to avoid sending requests too frequently
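The loop above stops only when a page comes back empty. If a site instead keeps returning its last page for out-of-range page numbers, the loop would never end. The sketch below adds a page cap and a repeated-page check. The function `crawl_pages` and its two callables, `fetch_page` and `parse_comments`, are hypothetical wrappers around the request and parsing steps shown earlier.

```python
import time


def crawl_pages(fetch_page, parse_comments, max_pages=50, delay=1.0):
    """Collect comments page by page until a page is empty or repeated.

    fetch_page(page) -> HTML text or None
    parse_comments(html) -> list of str
    Both are hypothetical callables supplied by the caller.
    """
    collected = []
    previous = None
    for page in range(1, max_pages + 1):
        html = fetch_page(page)
        if html is None:
            break  # request failed; stop instead of looping forever
        comments = parse_comments(html)
        if not comments or comments == previous:
            break  # empty page, or the site repeated the previous page
        collected.extend(comments)
        previous = comments
        time.sleep(delay)  # pause between requests
    return collected
```

Collecting everything into one list also keeps the results available after the loop ends, which makes the saving step simpler.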

6. Save the data

Save the scraped reviews to a file:

Python

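# Run this for each page's `comments` inside the pagination loop;
# mode 'a' appends, so reviews from earlier pages are kept.
with open('comments.txt', 'a', encoding='utf-8') as f:
    for comment in comments:
        content = comment.get_text().strip()
        f.write(content + '\n')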
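A plain text file with one review per line breaks down when a review itself contains line breaks. As an alternative sketch, the reviews can be written to CSV with the standard `csv` module, which quotes such fields. The function name `save_comments`, the column names, and the default file name are illustrative assumptions.

```python
import csv


def save_comments(comments, path="comments.csv"):
    """Write one review per row with a running index; return the row count."""
    # utf-8-sig adds a BOM so spreadsheet tools detect the encoding
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "comment"])
        for i, text in enumerate(comments, start=1):
            writer.writerow([i, text])
    return len(comments)


if __name__ == "__main__":
    save_comments(["Great location.", "Room was clean."])
```

Here `comments` is a list of plain strings, so the text should be extracted from the parsed elements before calling the function.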