Instagram - Fountain Pen (2)

This is an update to my previous post. I played a little more with the Instagram data. As in the previous post, we first need the data.

rm(list=ls())

# Load devtools library
library(devtools)

# Load instaR package
library(instaR)

# Load authorization file
load("~/my_oauth/my_oauth_instagram")

# Get posts with 'fountainpen' tag
data <- searchInstagram(tag = "fountainpen",
                        token = my_oauth,
                        n = 5000)

I downloaded the newest 5000 posts tagged with fountainpen. The following are the data column names.

# These are the column names of the data
colnames(data)
##  [1] "type"           "longitude"      "latitude"       "location_name" 
##  [5] "location_id"    "comments_count" "filter"         "created_time"  
##  [9] "link"           "likes_count"    "image_url"      "caption"       
## [13] "username"       "user_id"        "user_fullname"  "id"

Since the download takes some time, it is worth copying the data frame. Working on the copy keeps the original intact, so a mistake doesn’t force you to download everything again.

# Create another data from original
fountainpen <- data
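
The copy above only protects the data within the current R session. One option for keeping it across sessions is to cache it on disk with base R's saveRDS() and readRDS(). This is a minimal sketch: the toy data frame and the temporary file path are placeholders, not part of the original analysis.

```r
# Toy stand-in for the downloaded data (assumption: the real object is a data frame)
data <- data.frame(type = c("image", "video", "image"),
                   likes_count = c(10, 3, 25),
                   stringsAsFactors = FALSE)

# Hypothetical cache location; a temporary file is used so the sketch runs anywhere
cache_file <- tempfile(fileext = ".rds")

# Save once, right after downloading
saveRDS(data, cache_file)

# In a later session, reload the file instead of calling the API again
fountainpen <- readRDS(cache_file)
```

In practice you would probably point cache_file at a fixed path in your project folder.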

As the column names above show, the data set holds lots of useful information. Let’s count the images and videos in the latest 5000 posts.

library(plyr)
# Count how many images or videos
fountainpen_image  <- count(fountainpen, vars = "type")

Apparently, fpgeeks like pictures more than videos. There are 4940 images and 60 videos posted.
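
With those counts, 60 of 5000 posts is a 1.2% share of videos. Here is a small sketch of getting such shares directly, using base R's table() and prop.table() instead of plyr; the type vector is made up for illustration.

```r
# Toy stand-in for the `type` column (made-up values, not the real data)
type <- c("image", "image", "video", "image", "image")

# Absolute counts per media type
type_counts <- table(type)

# Share of each type as a percentage, rounded to one decimal
type_share <- round(100 * prop.table(type_counts), 1)
print(type_share)
```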

We can also count the filters used and find the favorite filter of fpgeeks.

# Count filters used
fountainpen_filter <- arrange(count(fountainpen, vars = "filter"), freq)

Results show that the favorite filter is Normal. We can create a plot of filters.

library(ggplot2)
# Remove Normal and plot
fountainpen_filter_n <- subset(fountainpen_filter, filter != "Normal")

fountainpen_filter_n$ffilter <- factor(fountainpen_filter_n$filter, 
                                       levels=fountainpen_filter_n$filter)
# qplot() no longer accepts a `stat` argument in current ggplot2,
# so build the bar chart with ggplot() and geom_bar() instead
filter.plot <- ggplot(fountainpen_filter_n, aes(x = ffilter, y = freq)) +
  geom_bar(stat = "identity") +
  labs(x = "", y = "Frequency") +
  theme(axis.text.x = element_text(angle = 0, size = 10)) +
  coord_flip()

Here is the plot of filters used. I removed Normal since it was huge compared to others.

[Figure: bar chart of filter frequencies, with Normal excluded]

We can also check the most liked post and the user that posted it.

# Find the most liked post
fountainpen_liked           <- arrange(fountainpen, desc(likes_count))
most_liked_index            <- which.max(fountainpen$likes_count)
most_liked_image            <- fountainpen$link[most_liked_index]
most_liked_user             <- fountainpen$username[most_liked_index]

Our most liked post is from @serrose and it is this one.

Similarly, we can find the most commented post and the user who posted it.

# Find the most commented post
fountainpen_commented       <- arrange(fountainpen, desc(comments_count))
most_commented_index        <- which.max(fountainpen$comments_count)
most_commented_image        <- fountainpen$link[most_commented_index]
most_commented_user         <- fountainpen$username[most_commented_index]

Our most commented post is from @gouletpens and it is this one.
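
One caveat with both lookups: which.max() returns only the first position of the maximum, so a tie for first place would go unnoticed. A minimal sketch of listing every top post, on a made-up data frame with hypothetical usernames:

```r
# Toy stand-in with a tie for the highest like count (made-up values)
posts <- data.frame(username    = c("user_a", "user_b", "user_c"),
                    likes_count = c(120, 95, 120),
                    stringsAsFactors = FALSE)

# which.max() returns only the first position of the maximum
first_max <- which.max(posts$likes_count)

# Comparing against max() keeps every row that shares the top value
all_max <- which(posts$likes_count == max(posts$likes_count, na.rm = TRUE))

# Usernames of all posts tied for first place
posts$username[all_max]
```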

In my previous post, I was searching for brand tags individually and this was creating some problems for brand comparisons. This time I only used the hashtags that fpgeeks used. I extracted the tags from the picture captions, combined, and ordered them.

# Since the previous post was choosing non-fpgeeks, calculate popular brand with this data
# See http://goo.gl/MqM5hj
library(stringr)                             # Load library to manipulate strings
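# perl() is deprecated since stringr 1.0; a plain pattern string is used as a regex.
# Match '#' only at the start of a caption or right after whitespace.
hashtag.regex <- "(?<!\\S)#\\S+"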

# Extract hashtags from captions
hashtags <- str_extract_all(fountainpen$caption, hashtag.regex)

# Convert to a one-column data frame named "tag"; unlist() flattens the
# per-caption lists, so the number of rows does not need to be hard-coded
hashtags_out <- data.frame(tag = unlist(hashtags), stringsAsFactors = FALSE)

# Convert all to lower case and remove "#" from hashtags
hashtags_out$tag <- tolower(sub("[^[:alnum:]]", "", hashtags_out$tag))

# Count different tags
hashtag_count <- arrange(count(hashtags_out), desc(freq))

# Remove data with frequency less than 100
hashtag_count_n <- subset(hashtag_count, freq >=100)
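
To see what the extraction step does without the real data, here is a small sketch on made-up captions. It swaps in base R's regmatches() and gregexpr() for stringr and table() for plyr's count(), so it is a stand-in for the code above rather than the same code.

```r
# Made-up captions, for illustration only
captions <- c("New ink day #Ink #fountainpen",
              "Writing sample #ink #lamy",
              "email me at a#b.com #pen")

# Match '#' only at the start of the caption or right after whitespace
matches <- regmatches(captions,
                      gregexpr("(?<!\\S)#\\S+", captions, perl = TRUE))

# Flatten the list, drop the leading '#', and lower-case the tags
tags <- tolower(sub("^#", "", unlist(matches)))

# Count the tags and sort by frequency, most frequent first
tag_count <- sort(table(tags), decreasing = TRUE)
print(tag_count)
```

Note that the '#' inside a#b.com is skipped because it does not follow whitespace, and that #Ink and #ink are counted together after lower-casing.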

This data shows us the tags most widely used together with the fountainpen tag. We can plot it with the following commands.

# Plot tag histogram
hashtag_count_n$ttag <- factor(hashtag_count_n$tag, levels=hashtag_count_n$tag)
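# qplot() no longer accepts a `stat` argument in current ggplot2,
# so build the bar chart with ggplot() and geom_bar() instead
bar.plot <- ggplot(hashtag_count_n, aes(x = ttag, y = freq)) +
  geom_bar(stat = "identity") +
  labs(x = "", y = "Frequency") +
  theme(axis.text.x = element_text(angle = 0, size = 10)) +
  coord_flip()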

Here is the graph. There are some Chinese and Japanese tags which don’t show up in the graph. I didn’t try to fix those.

[Figure: bar chart of hashtags used at least 100 times]

As seen from the graph, fpgeeks mostly use the ink, pen, and lamy tags. So, probably, if I didn’t make any coding errors, the favorite brand of the fountain pen community on Instagram is LAMY.
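
Since generic tags such as ink and pen are mixed in with brand names, one way to make the brand comparison explicit is to keep only tags from a list of brands. This is a rough sketch: the counts are made up, and the brands vector is a hypothetical list you would need to fill in yourself.

```r
# Toy tag counts (made-up numbers) in the same shape as hashtag_count: tag, freq
hashtag_count <- data.frame(tag  = c("ink", "pen", "lamy", "pilot", "handwriting"),
                            freq = c(900, 700, 650, 400, 300),
                            stringsAsFactors = FALSE)

# Hypothetical list of brand tags; extend it as needed
brands <- c("lamy", "pilot", "twsbi", "pelikan")

# Keep only rows whose tag is a known brand, highest frequency first
brand_count <- hashtag_count[hashtag_count$tag %in% brands, ]
brand_count <- brand_count[order(-brand_count$freq), ]
print(brand_count)
```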