Collecting Twitter Data
For a recent research project, I needed to collect Twitter data for analysis.
# The initial plan was to use a premium account and collect data through API v1.1, but applying for premium access is slow and tedious, so I looked for another route.
# I then tried collecting with an academic account instead; the developer community recommends twarc for this.
pip install twarc twarc-csv   # twarc-csv provides the `twarc2 csv` subcommand used below
twarc2 configure              # store your API keys / bearer token
twarc2 search --archive --start-time 2018-05-05 --end-time 2018-05-10 '#alphago' > tweets4.jsonl
twarc2 csv tweets4.jsonl tweets4.csv   # convert the JSONL output to CSV
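# To sanity-check the twarc2 output before going further, you can stream the JSONL into R; a minimal sketch (jsonlite and the flatten step are my additions, not part of the original workflow):
# twarc2 search writes one full API response page per line; run
# `twarc2 flatten tweets4.jsonl tweets4-flat.jsonl` first if you want one tweet per line.
library(jsonlite)
pages <- stream_in(file("tweets4.jsonl"))  # data frame with one row per response page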
# Using R makes it more convenient to export well-formed JSON.
install.packages("academictwitteR")
# This will load the academictwitteR package
library(academictwitteR)
# Set your own bearer token (replace the XXXXX with your own bearer token)
bearer_token <- "XXXXX"
tweets <- get_all_tweets(
  query = "from:twitterdev",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-05-31T00:00:00Z",
  n = 9000000,
  page_n = 500,
  bearer_token = bearer_token,
  data_path = "data/",
  bind_tweets = FALSE
)
# from:twitterdev can be replaced with any query; in this project the queries combined alphago with the -is:retweet and lang:en operators (see the sketch after these notes)
# For example: "alphago", "alphago lang:en", "alphago lang:en -is:retweet"
# -is:retweet excludes retweets; lang:en restricts results to English tweets; from: collects tweets from a specific user
# n is the maximum number of tweets to collect; page_n is the number of tweets per page in each returned JSON file
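# Putting it together for this project: a minimal sketch (the time window and n below are illustrative, not the exact values used):
tweets <- get_all_tweets(
  query = "alphago lang:en -is:retweet",
  start_tweets = "2018-05-05T00:00:00Z",
  end_tweets = "2018-05-10T00:00:00Z",
  bearer_token = bearer_token,
  n = 100000,
  page_n = 500,
  data_path = "data/",
  bind_tweets = FALSE
)
# With bind_tweets = FALSE the raw JSON pages are stored under data/; bind them
# later into one row-per-tweet data frame (output_format = "tidy" requires a
# recent academictwitteR version):
tweets_df <- bind_tweets(data_path = "data/", output_format = "tidy")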
Reference links:
https://twarc-project.readthedocs.io/en/latest/
https://twarc-project.readthedocs.io/en/latest/twarc2/
https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research
https://twittercommunity.com/t/academic-research-track-limit-problem/151835/4
https://osf.io/mnaud/wiki/home/
https://blog.f-secure.com/searching-twitter-with-twarc/
https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet