Getting Started

This entire document is based on code written by Michael Kearney demonstrating how to use rtweet.

Authentication/authorization of API keys

Learn more about authentication and loading your API keys (with images) in the rtweet auth vignette on CRAN.

There are multiple ways of doing this, but I get errors when I use the browser-based method, so this document uses the token-based authentication method.

  1. Navigate to developer.twitter.com/en/apps and select your Twitter app
  2. Click the tab labeled Keys and tokens to retrieve your keys.
  3. Locate the Consumer API keys (the API key and API secret key) and the access token and access token secret.

store API keys (use this code, but replace the placeholders with your own keys)

api_key <- "texthere"
api_secret_key <- "texthere"
access_token <- "texthere"
access_token_secret <- "texthere"
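If you would rather not paste keys directly into a script, one option is to read them from environment variables. This is only a minimal sketch: the helper read_key() and the variable name TWITTER_API_KEY are made up for illustration and are not part of rtweet, and the Sys.setenv() call just stands in for a value you would normally set outside the script (for example in .Renviron).

```r
# Hypothetical helper: fetch a key from an environment variable and
# stop with a clear message if it is missing or empty.
read_key <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Stand-in for demonstration only; normally set this outside the script.
Sys.setenv(TWITTER_API_KEY = "texthere")

api_key <- read_key("TWITTER_API_KEY")
```

The same pattern would apply to api_secret_key, access_token, and access_token_secret.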

access token/secret method

Replace the app name (New2021proj) with whatever you called your app when you created it for your API keys.

## load rtweet first so that create_token() is available
library(rtweet)

token <- create_token(
  app = "New2021proj",
  consumer_key = api_key,
  consumer_secret = api_secret_key,
  access_token = access_token,
  access_secret = access_token_secret)

Installing/loading packages & auth

If you haven’t already installed the rtweet package, install {rtweet} from CRAN now.

install.packages("rtweet")

Then load {rtweet}, along with any other packages you need.

library(rtweet)
## load any other packages you may need
library(dplyr)
library(maps)
library(ggplot2)

Make sure that your authentication with the API keys is loaded. If you ran the authentication code above, call get_token() to confirm that your token is available.

get_token()
<Token>
<oauth_endpoint>
 request:   https://api.twitter.com/oauth/request_token
 authorize: https://api.twitter.com/oauth/authenticate
 access:    https://api.twitter.com/oauth/access_token
<oauth_app> New2021proj
  key:    rsSoV8bRT29xvOJR2k95gJ50t
  secret: <hidden>
<credentials> oauth_token, oauth_token_secret
---

Searching for tweets with search_tweets

search_tweets()

Search for one or more keyword(s)

tacos <- search_tweets("tacos")
tacos
# A tibble: 100 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 436134… 13535202… 2021-01-25 01:49:06 kevingut20… "Vte… Twitt…
 2 109291… 13535202… 2021-01-25 01:49:00 abbysvol6   "nah… Twitt…
 3 128039… 13535202… 2021-01-25 01:48:59 nicolee2442 "Tal… Twitt…
 4 128755… 13535201… 2021-01-25 01:48:56 MontoyaEme… "Shr… Twitt…
 5 132776… 13535201… 2021-01-25 01:48:50 ScillaMich… "ste… Twitt…
 6 334304… 13535201… 2021-01-25 01:48:44 shouisdrea… "My … Twitt…
 7 177389… 13535201… 2021-01-25 01:48:37 Akaito      "@ne… Twitt…
 8 718165… 13535201… 2021-01-25 01:48:34 manuu_davi… "hj … Twitt…
 9 116896… 13535200… 2021-01-25 01:48:32 ElGordoUri… "Cha… Twitt…
10 601371… 13535200… 2021-01-25 01:48:31 IrishCorin… "@Ro… Twitt…
# … with 90 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>


When you search for multiple words, there is an implicit AND between them

cb <- search_tweets("cheap beer")
cb
# A tibble: 100 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 422908… 13535169… 2021-01-25 01:35:52 BasiliskOn… "\"r… Twitt…
 2 275843… 13535168… 2021-01-25 01:35:28 krveale     "\"r… Tweet…
 3 326330… 13535165… 2021-01-25 01:34:31 MetaAdamJo… "\"r… Twitt…
 4 125510… 13535152… 2021-01-25 01:29:10 JeremyKoene "Wis… Twitt…
 5 391597… 13535150… 2021-01-25 01:28:40 RPete1103   "@ma… Twitt…
 6 771076… 13535148… 2021-01-25 01:27:50 Roganzar    "\"r… Twitt…
 7 170620… 13535124… 2021-01-25 01:18:02 gmarie55    "@Pa… Twitt…
 8 364198… 13535118… 2021-01-25 01:15:52 pdog11234   "\"r… Twitt…
 9 167786… 13535108… 2021-01-25 01:11:45 withoutapl… "\"r… Twitt…
10 490158… 13535091… 2021-01-25 01:04:52 isaac_urner "@Ki… Twitt…
# … with 90 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>

search for exact phrase

## single quotes around doubles
ds <- search_tweets('"data science"')

## or escape the quotes
ds <- search_tweets("\"data science\"")
ds
# A tibble: 100 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 127505… 13535201… 2021-01-25 01:48:41 Women_who_… "#Py… Nlogi…
 2 127505… 13535134… 2021-01-25 01:22:00 Women_who_… "#FE… Nlogi…
 3 127505… 13535153… 2021-01-25 01:29:40 Women_who_… "#FE… Nlogi…
 4 112782… 13535134… 2021-01-25 01:22:15 xaelbot     "Dat… xael …
 5 112782… 13535182… 2021-01-25 01:41:16 xaelbot     "#FE… xael …
 6 112782… 13535201… 2021-01-25 01:48:36 xaelbot     "#Py… xael …
 7 128125… 13535201… 2021-01-25 01:48:36 100DaysOf2… "Tab… #100D…
 8 128125… 13535198… 2021-01-25 01:47:25 100DaysOf2… "#Py… #100D…
 9 128125… 13535137… 2021-01-25 01:23:12 100DaysOf2… "Dat… #100D…
10 128125… 13535138… 2021-01-25 01:23:43 100DaysOf2… "#FE… #100D…
# … with 90 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>

keywords and phrases

Search for keyword(s) and phrases

rpds <- search_tweets("rstats python \"data science\"")
rpds
# A tibble: 98 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 955289… 13535096… 2021-01-25 01:07:13 IndieDevPr… "#Da… ""    
 2 955289… 13534207… 2021-01-24 19:13:53 IndieDevPr… "RT … ""    
 3 955289… 13534568… 2021-01-24 21:37:07 IndieDevPr… "“Th… ""    
 4 955289… 13534207… 2021-01-24 19:13:45 IndieDevPr… "Ful… ""    
 5 955289… 13534185… 2021-01-24 19:05:00 IndieDevPr… "Che… ""    
 6 108248… 13534793… 2021-01-24 23:06:27 epuujee     "Dat… "Puuj…
 7 108248… 13534207… 2021-01-24 19:13:52 epuujee     "Ful… "Puuj…
 8 108248… 13534713… 2021-01-24 22:34:40 epuujee     "#R … "Puuj…
 9 108248… 13534208… 2021-01-24 19:14:02 epuujee     "RT … "Puuj…
10 108248… 13534750… 2021-01-24 22:49:33 epuujee     "If … "Puuj…
# … with 88 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <lgl>, reply_to_user_id <lgl>,
#   reply_to_screen_name <lgl>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>
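The quoting rules above (bare keywords are ANDed together, exact phrases need their own double quotes) can be wrapped in a small helper so you don't have to escape quotes by hand. This is a sketch only; build_query() is a name I made up, not an rtweet function.

```r
# Hypothetical helper: combine bare keywords and exact phrases into one
# query string. Phrases are wrapped in double quotes so they are matched
# exactly; everything is joined with spaces (an implicit AND).
build_query <- function(keywords = character(), phrases = character()) {
  quoted <- if (length(phrases)) paste0('"', phrases, '"') else character()
  paste(c(keywords, quoted), collapse = " ")
}

q <- build_query(keywords = c("rstats", "python"), phrases = "data science")
q
```

The resulting string can then be passed as the first argument to search_tweets().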

increasing number of results

  • search_tweets() returns the 100 most recent matching tweets by default

  • Increase n to return more (tip: use multiples of 100)

rbeer <- search_tweets("beer", n = 500)
#can be up to n = 18000
rbeer
# A tibble: 500 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 119026… 13535202… 2021-01-25 01:49:09 kiyomiyaaa… "@be… Twitt…
 2 119026… 13535188… 2021-01-25 01:43:35 kiyomiyaaa… "@be… Twitt…
 3 132921… 13535202… 2021-01-25 01:49:07 DsMpz2PU3a… "フォロ… Twitt…
 4 130048… 13535188… 2021-01-25 01:43:25 fooI94      "bee… Twitt…
 5 130048… 13535202… 2021-01-25 01:49:06 fooI94      "@co… Twitt…
 6 517559… 13535202… 2021-01-25 01:49:06 Kirin_Brew… "@sd… Belug…
 7 517559… 13535202… 2021-01-25 01:49:06 Kirin_Brew… "@hi… Belug…
 8 517559… 13535188… 2021-01-25 01:43:38 Kirin_Brew… "@KA… Belug…
 9 517559… 13535191… 2021-01-25 01:44:57 Kirin_Brew… "@ya… Belug…
10 517559… 13535202… 2021-01-25 01:49:06 Kirin_Brew… "@ur… Belug…
# … with 490 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>

Please be mindful of the rate limit: a search can return at most 18,000 tweets per 15-minute window. Once you reach that limit, further requests return errors until the 15 minutes are up.
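To make the rate-limit arithmetic concrete, here is a rough sketch of how many requests and 15-minute windows a given n would need. It assumes each request returns up to 100 tweets (which is why multiples of 100 are suggested) and that 18,000 tweets per window therefore means 180 requests; plan_search() is a hypothetical helper, not part of rtweet.

```r
# Assumptions for this sketch (not taken from rtweet itself):
tweets_per_request  <- 100   # one request returns up to 100 tweets
requests_per_window <- 180   # 18,000 tweets / 100 tweets per request
window_minutes      <- 15

# Hypothetical helper: estimate requests, rate-limit windows, and the
# minimum waiting time needed to collect n tweets.
plan_search <- function(n) {
  requests <- ceiling(n / tweets_per_request)
  windows  <- ceiling(requests / requests_per_window)
  list(
    requests         = requests,
    windows          = windows,
    min_wait_minutes = (windows - 1) * window_minutes
  )
}

small <- plan_search(500)     # fits in a single window
large <- plan_search(50000)   # needs more than one window
```

For n = 500 this gives 5 requests in one window with no waiting; for n = 50000 it gives 500 requests spread over 3 windows, so at least 30 minutes of waiting.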

getting a lot more tweets

PRO TIP #1: Get the firehose for free by searching for tweets from verified or non-verified accounts

fff <- search_tweets("filter:verified OR -filter:verified", n = 3000) #could be n = 18000
fff
# A tibble: 2,920 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 717045… 13535202… 2021-01-25 01:49:15 paz_edgardo "🚫🚫 … Twitt…
 2 110842… 13535202… 2021-01-25 01:49:15 Asa_no_oto  "存在し… Twitt…
 3 269613… 13535202… 2021-01-25 01:49:15 salmonmycat "#แจ… Twitt…
 4 117322… 13535202… 2021-01-25 01:49:15 GGranblu    "E0D… グランブル…
 5 135313… 13535202… 2021-01-25 01:49:15 JohanaPDHe… "@Mi… Twitt…
 6 132960… 13535202… 2021-01-25 01:49:15 ujesquadar… "@Ab… Twitt…
 7 134250… 13535202… 2021-01-25 01:49:15 sore_nammae "@FU… Twitt…
 8 134487… 13535202… 2021-01-25 01:49:15 TGQdogoLxn… "やっと… Twitt…
 9 259684… 13535202… 2021-01-25 01:49:15 ag_gerardo  "#LI… Twitt…
10 283978… 13535202… 2021-01-25 01:49:15 dokidokima… "渡し方… twitt…
# … with 2,910 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>

plotting tweets

Visualize second-by-second frequency

ts_plot(fff, "secs")

ts_plot(dplyr::group_by(fff, is_retweet), "secs")
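If you are wondering what "second-by-second frequency" means in practice, the sketch below counts tweets per second in base R. The timestamps are made up for illustration, and this is not how ts_plot() is implemented; it only shows the kind of aggregation such a plot is based on.

```r
# Made-up timestamps standing in for the created_at column
created_at <- as.POSIXct("2021-01-25 01:49:00", tz = "UTC") + c(0, 0, 1, 1, 1, 3)

# Count how many tweets fall into each second
per_sec <- as.data.frame(
  table(time = format(created_at, "%H:%M:%S")),
  stringsAsFactors = FALSE
)
per_sec
```

Here two tweets fall in 01:49:00, three in 01:49:01, and one in 01:49:03; seconds with no tweets simply don't appear in the table.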

twitter search operators

You can combine any of the above commands to extract what you are searching for.

PRO TIP #2: Use search operators provided by Twitter, e.g.,

  • filter by language and exclude retweets and replies
rt <- search_tweets("tacos", lang = "en", 
  include_rts = FALSE, `-filter` = "replies")
  • filter only tweets linking to news articles
nws <- search_tweets("filter:news")
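Search operators can also be written directly into the query string instead of being passed as separate arguments. The sketch below builds such a string; with_filters() is a hypothetical helper of mine, and it assumes the filter:/-filter: operator syntax used in the examples above.

```r
# Hypothetical helper: append "filter:" and "-filter:" operators to a query.
# `include` keeps only tweets matching a filter; `exclude` drops them.
with_filters <- function(query, include = character(), exclude = character()) {
  parts <- c(
    query,
    if (length(include)) paste0("filter:", include),
    if (length(exclude)) paste0("-filter:", exclude)
  )
  paste(parts, collapse = " ")
}

q_tacos <- with_filters("tacos lang:en", exclude = c("retweets", "replies"))
q_news  <- with_filters("tacos", include = "news")
```

Either string could then be used as the query in search_tweets().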

filtering in search_tweets

  • filter only tweets that contain links
links <- search_tweets("filter:links")
links
# A tibble: 100 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 328337… 13535203… 2021-01-25 01:49:39 SRiitu      "नौट… Twitt…
 2 119261… 13535203… 2021-01-25 01:49:39 Looirinhaa… "é s… Twitt…
 3 125695… 13535203… 2021-01-25 01:49:39 agAryaminx  "Alb… Twitt…
 4 906150… 13535203… 2021-01-25 01:49:39 non835012   "晴れ着… Twitt…
 5 125700… 13535203… 2021-01-25 01:49:39 lemmedrivd… "Dan… Twitt…
 6 286798… 13535203… 2021-01-25 01:49:39 jensoogasm  "htt… Twitt…
 7 124783… 13535203… 2021-01-25 01:49:39 reallymorg_ "Mah… Twitt…
 8 301663… 13535203… 2021-01-25 01:49:39 sabamisoka… "メカ少… Twitt…
 9 951444… 13535203… 2021-01-25 01:49:39 vmac515     "Jes… Twitt…
10 149149… 13535203… 2021-01-25 01:49:39 EgemenSelv… "ben… Twitt…
# … with 90 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <lgl>, reply_to_user_id <lgl>,
#   reply_to_screen_name <lgl>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>
  • filter only tweets that contain video
vids <- search_tweets("filter:video")
vids
# A tibble: 71 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 130265… 13535148… 2021-01-25 01:27:36 93WDBHG     "@st… Twitt…
 2 133435… 13534043… 2021-01-24 18:08:39 yaboibugzyt "Top… Twitt…
 3 417289… 13533528… 2021-01-24 14:44:08 VW_SetApar… "No … Twitt…
 4 120886… 13533397… 2021-01-24 13:52:03 formerbada… "For… Twitt…
 5 129654… 13532265… 2021-01-24 06:22:06 bintanglan… "@_c… Twitt…
 6 131211… 13532055… 2021-01-24 04:58:27 ygcarchive  "jim… Twitt…
 7 145553… 13531045… 2021-01-23 22:17:27 hntgrl2     "The… Twitt…
 8 120401… 13530139… 2021-01-23 16:17:13 dawnsavor   "pon… Twitt…
 9 123885… 13526665… 2021-01-22 17:16:43 HikmahBegum "LMA… Twitt…
10 122931… 13525808… 2021-01-22 11:36:08 jdiaz900817 "Guy… Twitt…
# … with 61 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>

tweets sent by screennames

  • filter only tweets sent from certain users (from:{screen_name}) or to certain users (to:{screen_name})
## vector of screen names
users <- c("cnnbrk", "AP", "nytimes", 
  "foxnews", "msnbc", "seanhannity", "maddow")
## then use search_tweets
tousers <- search_tweets(paste0("from:", users, collapse = " OR "))
tousers
# A tibble: 100 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 428333  13535192… 2021-01-25 01:45:15 cnnbrk      "Mex… Socia…
 2 428333  13535180… 2021-01-25 01:40:25 cnnbrk      "Pre… Socia…
 3 428333  13533898… 2021-01-24 17:10:55 cnnbrk      "The… Socia…
 4 428333  13534533… 2021-01-24 21:23:25 cnnbrk      "Pre… Socia…
 5 2836421 13535172… 2021-01-25 01:37:20 MSNBC       "Dr.… Tweet…
 6 2836421 13533852… 2021-01-24 16:52:39 MSNBC       ".@g… Tweet…
 7 2836421 13534555… 2021-01-24 21:32:03 MSNBC       "JUS… Wildm…
 8 2836421 13534706… 2021-01-24 22:32:04 MSNBC       "\"W… Socia…
 9 2836421 13534175… 2021-01-24 19:01:10 MSNBC       "The… Socia…
10 2836421 13535154… 2021-01-25 01:29:54 MSNBC       ".@e… Tweet…
# … with 90 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>
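It helps to look at the query string that paste0() builds before sending it to Twitter. The sketch below uses a shorter vector of screen names and also shows the to: variant; it only builds strings and makes no API call.

```r
# A shorter vector of screen names, for illustration
users <- c("cnnbrk", "AP", "nytimes")

# paste0() prefixes each name, and collapse joins them with OR
from_query <- paste0("from:", users, collapse = " OR ")
to_query   <- paste0("to:", users, collapse = " OR ")

from_query
to_query
```

from_query is "from:cnnbrk OR from:AP OR from:nytimes", so a tweet matches if it was sent by any one of the listed accounts.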

filtering by popularity or source

  • filter only tweets with at least 100 favorites or 100 retweets
pop <- search_tweets(
  "(filter:verified OR -filter:verified) (min_faves:100 OR min_retweets:100)")
  • filter by the type of device that posted the tweet.
rt <- search_tweets("lang:en", source = '"Twitter for iPhone"')
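If you want to try different popularity thresholds, the min_faves/min_retweets part of the query above can be generated rather than typed. This is a sketch; popular_query() is a hypothetical helper, and it only reproduces the operator syntax already shown.

```r
# Hypothetical helper: build the popularity clause for a given threshold.
# sprintf("%d") avoids scientific notation for large numbers.
popular_query <- function(min_faves = 100, min_retweets = 100) {
  sprintf(
    "(min_faves:%d OR min_retweets:%d)",
    as.integer(min_faves), as.integer(min_retweets)
  )
}

default_clause <- popular_query()
strict_clause  <- popular_query(min_faves = 100000, min_retweets = 5000)

# Combine with the verified/non-verified clause used earlier
full_query <- paste("(filter:verified OR -filter:verified)", default_clause)
```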

search_tweets() with location

Search by geolocation (ex: tweets within 25 miles of Durham University)

durham25 <- search_tweets(
  geocode = "54.7649859,-1.5803916,25mi", n = 500
)
durham25
# A tibble: 500 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 133603… 13535203… 2021-01-25 01:49:41 HarryBo2021 "@Pr… Twitt…
 2 133603… 13535173… 2021-01-25 01:37:44 HarryBo2021 "@Lo… Twitt…
 3 133603… 13535128… 2021-01-25 01:19:54 HarryBo2021 "@Lo… Twitt…
 4 356466… 13535203… 2021-01-25 01:49:37 nadenesorby "Fee… Twitt…
 5 400157… 13535203… 2021-01-25 01:49:33 milkydisco… "I d… Twitt…
 6 400157… 13535197… 2021-01-25 01:47:21 milkydisco… "@ia… Twitt…
 7 460392… 13535203… 2021-01-25 01:49:29 HistoryMick "@Da… Twitt…
 8 367850… 13535203… 2021-01-25 01:49:25 AcapIqmal   "Cav… Twitt…
 9 531284… 13535202… 2021-01-25 01:49:03 PaulNic_Jo… "@Or… Twitt…
10 531284… 13535181… 2021-01-25 01:40:48 PaulNic_Jo… "@Or… Twitt…
# … with 490 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
#   favorite_count <int>, retweet_count <int>, quote_count <int>,
#   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
#   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
#   media_t.co <list>, media_expanded_url <list>, media_type <list>,
#   ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
#   ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
#   lang <chr>, quoted_status_id <chr>, quoted_text <chr>,
#   quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
#   quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
#   quoted_name <chr>, quoted_followers_count <int>,
#   quoted_friends_count <int>, quoted_statuses_count <int>,
#   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>,
#   retweet_status_id <chr>, retweet_text <chr>, retweet_created_at <dttm>,
#   retweet_source <chr>, retweet_favorite_count <int>,
#   retweet_retweet_count <int>, retweet_user_id <chr>,
#   retweet_screen_name <chr>, retweet_name <chr>,
#   retweet_followers_count <int>, retweet_friends_count <int>,
#   retweet_statuses_count <int>, retweet_location <chr>,
#   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
#   place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
#   country_code <chr>, geo_coords <list>, coords_coords <list>,
#   bbox_coords <list>, status_url <chr>, name <chr>, location <chr>,
#   description <chr>, url <chr>, protected <lgl>, followers_count <int>,
#   friends_count <int>, listed_count <int>, statuses_count <int>,
#   favourites_count <int>, account_created_at <dttm>, verified <lgl>,
#   profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>,
#   profile_banner_url <chr>, profile_background_url <chr>,
#   profile_image_url <chr>
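The geocode argument is just a "latitude,longitude,radius" string, so it can be assembled from variables rather than typed by hand. A minimal sketch follows; make_geocode() is a hypothetical helper, and the "mi"/"km" units are the two I assume the search accepts.

```r
# Hypothetical helper: build a geocode string of the form "lat,lng,radius"
make_geocode <- function(lat, lng, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)
  paste0(lat, ",", lng, ",", radius, unit)
}

durham_geocode <- make_geocode(54.7649859, -1.5803916, 25)
durham_geocode
```

This reproduces the string used above, and changing the radius or unit only means changing an argument.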

mapping geotagged tweets

Use lat_lng() to convert geographical data into lat and lng variables (single point)

setting up some basic parameters

I used Google Maps to get the latitude and longitude of Durham, then stored them as variables so that I could reuse them later.

# lat and long of Durham
xlong <- -1.5803916
ylat <- 54.7649859

# Where in the maps database is this lat and long? (create a variable for this)
region <- map.where(database = "world", xlong, ylat)

Mapping the geotagged tweets

#create lat/lng variables using all available tweet and profile geo-location data
durham25 <- lat_lng(durham25)

# note how the region variable created above selects the country, and xlong/ylat plus or minus 5 degrees set the map extents

maps::map("world", regions = region, fill = TRUE, col = "#ffffff", lwd = .25, mar = c(0, 0, 0, 0), xlim = c(xlong - 5, xlong + 5), ylim = c(ylat - 5, ylat + 5))
with(durham25, points(lng, lat, pch = 20, col = "red"))

This code plots geotagged tweets within 25 miles of Durham on a map of the UK
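The map extents are simply the centre point plus or minus a padding in degrees. If you plan to map several places, a small helper keeps that arithmetic in one spot. This is a sketch; map_extent() and its pad argument are names I made up.

```r
# Hypothetical helper: compute x/y limits around a centre point,
# padded by `pad` degrees in each direction.
map_extent <- function(lng, lat, pad = 5) {
  list(
    xlim = c(lng - pad, lng + pad),
    ylim = c(lat - pad, lat + pad)
  )
}

ext <- map_extent(lng = -1.5803916, lat = 54.7649859)
ext
```

ext$xlim and ext$ylim can then be passed as the xlim and ylim arguments of maps::map().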

Please note that if you were making a map of the United States, the maps package has three databases for the USA ("usa", "state", and "county"), whereas the example above uses the "world" database. See help(package='maps') for more details.

searching in an entire country

Search by geo-location: for example, find 5,000 tweets in the English language sent from the United States. Note: some countries and cities are built into lookup_coords(), while other locations require a Google API key.

search for 5,000 tweets in English, sent from the US

usa <- search_tweets(
  "lang:en", geocode = lookup_coords("usa"), n = 5000
)

These tweets are all geotagged
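One way to check how many rows actually carry a usable point is to look at the lat and lng columns that lat_lng() adds. The sketch below uses a tiny made-up data frame in place of real search results, so the values are purely illustrative.

```r
# Made-up stand-in for a tweets data frame after lat_lng() has been applied
tweets <- data.frame(
  status_id = c("a", "b", "c"),
  lat = c(54.77, NA, 54.90),
  lng = c(-1.58, NA, -1.40),
  stringsAsFactors = FALSE
)

# Keep only rows where both coordinates are present
has_point <- !is.na(tweets$lat) & !is.na(tweets$lng)
geotagged <- tweets[has_point, ]

nrow(geotagged)      # rows with a usable point
mean(has_point)      # share of rows with a usable point
```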