Tourism Data and Pope

UPDATE: I created a Shiny app in which you can see the data interactively. See the app here.

Recently Pope Francis visited Turkey and in Turkish media people talked about how this visit will increase the number of foreign tourists coming to Turkey as it did happen after XVI. Benedictus’s visit in 2006.

I wanted to see the effects of XVI. Benedictus’s visit and created a couple of plots. I downloaded the data from Turkish Statistical Institute. The data I downloaded is Monthly Foreigner Arrivals at Border Gate Selection. Here are some codes and plots.

Before merging data I did some cleaning and correction. If you are interested you can find R files here. The following code loads libraries required and reads the merged data.

setwd("~/Dropbox/Tourism/Rdata")

# Libraries ====================================================================
library("plyr")
library("reshape2")
library("ggplot2")
library("chron")
library("magrittr")
library("zoo")
library("ggmap")
library("googleVis")
library("geocodeHERE") 

# Read Data ====================================================================
dataset <- readRDS("Merged.rds")

# Summary ======================================================================
summary(dataset)
##         Mode             City             Gate           Value        
##  Airway   : 7080   Muğla   : 4320   Center  : 8640   Min.   :      0  
##  Excursion: 5280   Antalya : 2796   Kapıkule:  912   1st Qu.:     14  
##  Railway  : 1764   İzmir   : 2532   Bodrum  :  660   Median :    347  
##  Roadway  : 6816   Edirne  : 2244   Dikili  :  660   Mean   :  11522  
##  Seaway   :14940   İstanbul: 1584   Marmaris:  660   3rd Qu.:   2559  
##                    Hatay   : 1548   Fethiye :  648   Max.   :1782165  
##                    (Other) :20856   (Other) :23700                    
##       Date           
##  Min.   :1977-01-01  
##  1st Qu.:1988-11-01  
##  Median :1999-06-01  
##  Mean   :1997-11-17  
##  3rd Qu.:2007-09-08  
##  Max.   :2013-12-01  
## 
# Create Month and Year variables ==============================================
dataset$Month <- format(dataset$Date, "%b")
dataset$Year  <- format(dataset$Date, "%Y")
dataset$Date  <- as.yearmon(dataset$Date)

Following show a couple of results by year, city, etc.

# Some practice with magrittr

# Total Tourist Arrivals by City since 1977
dataset$Value %>% 
    aggregate(list(City = dataset$City), sum) %>% 
    head(n=10)
##          City         x
## 1       Adana   2173130
## 2        Ağrı   7571506
## 3      Ankara   5956878
## 4     Antalya 119562767
## 5       Aydın  11764206
## 6   Balıkesir    550485
## 7       Bursa     73702
## 8   Çanakkale    365475
## 9  Diyarbakır     10314
## 10     Edirne  46893500
# Total Tourist Arrivals by Date
dataset$Value %>% 
    aggregate(list(Date = dataset$Date), sum) %>% 
    head(n=10)
##        Date      x
## 1  Jan 1977  77950
## 2  Feb 1977  82870
## 3  Mar 1977 155000
## 4  Apr 1977 245892
## 5  May 1977 277634
## 6  Jun 1977 250834
## 7  Jul 1977 335602
## 8  Aug 1977 390798
## 9  Sep 1977 313900
## 10 Oct 1977 256214
# Total Tourist Arrivals by Month and Year (Same as above)
#head(aggregate(dataset$Value, list(Month = dataset$Month, Year = dataset$Year), sum), n=10)

# Total Tourist Arrivals by City and Year
dataset$Value %>% 
    aggregate(list(City = dataset$City, Year = dataset$Year), sum) %>% 
    head(n=10)
##          City Year      x
## 1       Adana 1977  21192
## 2        Ağrı 1977  56672
## 3      Ankara 1977  59104
## 4     Antalya 1977  28870
## 5       Aydın 1977 274302
## 6   Balıkesir 1977   1520
## 7       Bursa 1977   4630
## 8   Çanakkale 1977   6928
## 9  Diyarbakır 1977     52
## 10     Edirne 1977 553106
# Total Tourist Arrivals by City and Year (2013) and Mode
arrival_2013 <- dataset %>% subset(Year==2013)

arrival_2013$Value %>%
  aggregate(list(City = arrival_2013$City, Mode = arrival_2013$Mode), sum) %>%
  arrange(desc(x)) %>%
  head(n=10)
##        City      Mode        x
## 1   Antalya    Airway 10905129
## 2  İstanbul    Airway  9869794
## 3    Edirne   Roadway  2828930
## 4     Muğla    Airway  2566477
## 5    Artvin   Roadway  1879695
## 6     İzmir    Airway   865796
## 7     Aydın Excursion   554484
## 8  İstanbul Excursion   517977
## 9    Şırnak   Roadway   504030
## 10    İzmir Excursion   494960
# Total Tourist Arrivals by City (Adana) and Year (2013) and Mode
ada_arrival_2013 <- dataset %>% subset(Year==2013 & City=="Adana")

ada_arrival_2013$Value %>%
  aggregate(list(City = ada_arrival_2013$City, Mode = ada_arrival_2013$Mode, Month = ada_arrival_2013$Month), sum) %>%
  arrange(desc(Month)) %>%
  head(n=10)
##     City   Mode Month    x
## 1  Adana Airway   Sep 7299
## 2  Adana Seaway   Sep 1264
## 3  Adana Airway   Oct 7665
## 4  Adana Seaway   Oct   79
## 5  Adana Airway   Nov 3678
## 6  Adana Seaway   Nov  125
## 7  Adana Airway   May 6257
## 8  Adana Seaway   May  118
## 9  Adana Airway   Mar 4788
## 10 Adana Seaway   Mar   88

Here are some plots. This one shows the number of yearly visits to Turkey. Blue highlighted areas are summer months (May to August).

# PLOTS ========================================================================
# The code below is messy!

# Yearly plot - summer months highlighted

yearly <- aggregate(dataset$Value, list(Date = dataset$Date), sum)
summer_start <- as.Date(unique(subset(dataset$Date, dataset$Month == "May")))
summer_end   <- as.Date(unique(subset(dataset$Date, dataset$Month == "Aug")))
summer       <- data.frame(summer_start, summer_end)
colnames(summer) <- c("start", "end")

yearly_plot <- ggplot(yearly, aes(x=as.Date(Date), y=x)) + geom_line() + theme_bw() +
  scale_y_continuous(name="")+
  xlab("") +
  ylab("") +
  theme(legend.position="top") +
  theme(legend.title=element_blank())+
  theme(legend.key = element_rect(fill = NA, colour = NA, size = 1.25))+
  geom_rect(data=summer, aes(xmin=start, xmax=end, ymin=-Inf, ymax=Inf), color=NA, fill="blue", alpha=0.1, inherit.aes = FALSE)

yearly_plot

plot of chunk unnamed-chunk-3

This one is not easy to read. So I plotted a couple of recent years.

# Too much data not easy to see

# Create above plot for the last 3 years

last_three <- subset(yearly, format(Date, "%Y") >= 2010)
last_three_summer <- subset(summer, format(start, "%Y") >= 2010)

yearly_plot_last_three <- ggplot(last_three, aes(x=as.Date(Date), y=x)) + geom_line() + theme_bw() +
  scale_y_continuous(name="")+
  xlab("") +
  ylab("") +
  theme(legend.position="top") +
  theme(legend.title=element_blank())+
  theme(legend.key = element_rect(fill = NA, colour = NA, size = 1.25))+
  geom_rect(data=last_three_summer, aes(xmin=start, xmax=end, ymin=-Inf, ymax=Inf), color=NA, fill="blue", alpha=0.1, inherit.aes = FALSE)

yearly_plot_last_three

plot of chunk unnamed-chunk-4

It seems that the visits to Turkey are highly seasonal.

The following plot shows XVI. Benedictus’s visit to Istanbul in 2006. Blue highlighted are is the time he visited. I highlighted the two months after his visit as red. Other red highlighted areas are corresponding months in different years.

# Check the effect of XVI. Benedictus's visit to Istanbul in 2006

istanbul_2006 <- subset(dataset, Date >= "Oct 2005" & Date <= "Mar 2008" & City == "İstanbul")
istanbul_2006 <- aggregate(istanbul_2006$Value, list(Date = istanbul_2006$Date), sum)

pope_min <- as.Date("2006-11-01")
pope_max <- as.Date("2006-12-01")
one_month_pope_min <- as.Date("2006-12-01")
one_month_pope_max <- as.Date("2007-02-01")
# Highlight same month for different years to see effect
pope_min_2006 <- as.Date("2005-12-01")
pope_max_2006 <- as.Date("2006-02-01")
pope_min_2008 <- as.Date("2007-12-01")
pope_max_2008 <- as.Date("2008-02-01")

pope <- data.frame(pope_min, pope_max, one_month_pope_min, one_month_pope_max, pope_min_2006, pope_max_2006, pope_min_2008, pope_max_2008)

istanbul_2006_plot <- ggplot(istanbul_2006, aes(x=as.Date(Date), y=x)) + geom_line() + theme_bw() +
  scale_y_continuous(name="")+
  xlab("") +
  ylab("") +
  theme(legend.position="top") +
  theme(legend.title=element_blank()) +
  geom_rect(data=pope, aes(xmin=pope_min, xmax=pope_max, ymin=-Inf, ymax=Inf), color=NA, fill="blue", alpha=0.1, inherit.aes = FALSE)+
  geom_rect(data=pope, aes(xmin=one_month_pope_min, xmax=one_month_pope_max, ymin=-Inf, ymax=Inf), color=NA, fill="red", alpha=0.1, inherit.aes = FALSE)+
  geom_rect(data=pope, aes(xmin=pope_min_2006, xmax=pope_max_2006, ymin=-Inf, ymax=Inf), color=NA, fill="red", alpha=0.1, inherit.aes = FALSE)+
  geom_rect(data=pope, aes(xmin=pope_min_2008, xmax=pope_max_2008, ymin=-Inf, ymax=Inf), color=NA, fill="red", alpha=0.1, inherit.aes = FALSE)+
  annotate(geom="text", x=as.Date("2006-11-15"), y=450000, label="XVI. Benedictus's visit to Istanbul", colour="blue",alpha=0.3, size=7, angle=90)


istanbul_2006_plot

plot of chunk unnamed-chunk-5

From the plot it seems that his visit didn’t have too much effect. Let’s check country wide visits.

# From the graph it seems Pope's visit had no effect. Let's check Turkey.

turkey_2006 <- subset(dataset, Date >= "Oct 2005" & Date <= "Mar 2008")
turkey_2006 <- aggregate(turkey_2006$Value, list(Date = turkey_2006$Date), sum)

turkey_2006_plot <- ggplot(turkey_2006, aes(x=as.Date(Date), y=x)) + geom_line() + theme_bw() +
  scale_y_continuous(name="")+
  xlab("") +
  ylab("") +
  theme(legend.position="top") +
  theme(legend.title=element_blank()) +
  geom_rect(data=pope, aes(xmin=pope_min, xmax=pope_max, ymin=-Inf, ymax=Inf), color=NA, fill="blue", alpha=0.1, inherit.aes = FALSE)+
  geom_rect(data=pope, aes(xmin=one_month_pope_min, xmax=one_month_pope_max, ymin=-Inf, ymax=Inf), color=NA, fill="red", alpha=0.1, inherit.aes = FALSE)+
  geom_rect(data=pope, aes(xmin=pope_min_2006, xmax=pope_max_2006, ymin=-Inf, ymax=Inf), color=NA, fill="red", alpha=0.1, inherit.aes = FALSE)+
  geom_rect(data=pope, aes(xmin=pope_min_2008, xmax=pope_max_2008, ymin=-Inf, ymax=Inf), color=NA, fill="red", alpha=0.1, inherit.aes = FALSE)+
  annotate(geom="text", x=as.Date("2006-11-15"), y=2000000, label="XVI. Benedictus's visit to Istanbul", colour="blue",alpha=0.3, size=7, angle=90)

turkey_2006_plot

plot of chunk unnamed-chunk-6

Similarly this one also shows no sign of increase after his visit. Increase in visits in those months are similar in other years.

Since I downloaded data I played a little bit. Dataset includes the modes of visits. So I plotted them. It’s interesting to see how Airways became the dominant mode of visits.

# Plot different modes =========================================================
mode_data <- aggregate(dataset$Value, list(Date = dataset$Date, Mode = dataset$Mode, Month= dataset$Month), sum)
# Drop Excursion
#mode_data <- subset(mode_data, Mode!="Excursion")

# Plot
mode_plot <- ggplot(mode_data, aes(x=as.Date(Date), y=x))+ theme_bw() +
  geom_line() +
  facet_grid(Mode ~ ., scales="free_y") + theme_bw() +
  scale_y_continuous(name="")+
  xlab("") +
  ylab("") +
  theme(legend.position="top") +
  theme(legend.title=element_blank())+
  theme(legend.key = element_rect(fill = NA, colour = NA, size = 1.25))+
  geom_rect(data=summer, aes(xmin=start, xmax=end, ymin=-Inf, ymax=Inf), color=NA, fill="blue", alpha=0.1, inherit.aes = FALSE)

mode_plot

plot of chunk unnamed-chunk-7

Excursion had some missing data so we have that flat region in the plot. I also played with Google Maps. There are nice packages make this very easy. Here’s the code for Google Maps.

# Try mapping ==================================================================
# Aggregate data by city and year
map_data <- aggregate(dataset$Value, list(City = dataset$City, Year = dataset$Year), sum)
# Get subset 2013
map_data <- subset(map_data, Year == 2013)
# Create address like City names
map_data$City <- paste(unique(map_data$City), " Türkiye", sep=",")
# Get coordinates from Google Maps
coord = cbind.data.frame(City = unique(map_data$City), lat = NA, lon = NA)
# Convert factor to characters since google api requires characters
# See http://stackoverflow.com/a/2853231
i <- sapply(coord, is.factor)
coord[i] <- lapply(coord[i], as.character)
# Get coordinates from google maps API
# See for Google API limit:
# http://shanelynn.ie/index.php/massive-geocoding-with-r-and-google-maps/
#coord <- with(coord, data.frame(City, t(sapply(coord$City, geocode))))
#save(coord, file = "~/Dropbox/Tourism/coord.RData")
#load("~/Dropbox/Tourism/coord.RData")

# See for alternative to Google
# http://blog.corynissen.com/2014/11/geocodehere-01-is-on-cran.html
#coord <- with(coord, data.frame(City, t(sapply(coord$City, geocodeHERE_simple))))
#save(coord, file = "~/Dropbox/Tourism/coord.RData")
load("~/Dropbox/Tourism/coord.RData")


# Merge coordinates with data
map <- merge(map_data, coord, by="City")
# Convert lat-lon to location for googlevis
# For Google use
# map$loc=paste(map$lat, map$lon, sep=":")
map$loc=paste(map$Latitude, map$Longitude, sep=":")

# Change column name for hover text
colnames(map)[3] <- "Number of Tourist Arrivals"

# See https://developers.google.com/chart/interactive/docs/gallery/geomap
# Plot map chart
G1 <- gvisGeoChart(map, 
                   locationvar="loc", 
                   colorvar="Number of Tourist Arrivals", 
                   hovervar = "City",
                   options=list(displayMode="Markers",
                                region="TR",
                                height= 443,
                                width= 715,
                                colorAxis="{colors:['purple', 'red', 'orange', 'green', 'blue']}",
                                backgroundColor="lightblue"), 
                   chartid="Tourism")
plot(G1, tag='chart')

This map shows the total visits in 2013 by cities. As expected cities like Istanbul and Antalya have the most visitors.

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] graphics  grDevices utils     datasets  stats     methods   base     
## 
## other attached packages:
##  [1] geocodeHERE_0.1.1   googleVis_0.5.6     ggmap_2.3          
##  [4] zoo_1.7-11          magrittr_1.5        chron_2.3-45       
##  [7] stargazer_5.1       xlsx_0.5.7          xlsxjars_0.6.1     
## [10] rJava_0.9-6         reshape2_1.4        plyr_1.8.1         
## [13] ProjectTemplate_0.6 markdown_0.7.4      lessR_3.1.1        
## [16] knitr_1.8           ggplot2_1.0.0       devtools_1.6.1     
## 
## loaded via a namespace (and not attached):
##  [1] car_2.0-22          colorspace_1.2-4    digest_0.6.4       
##  [4] evaluate_0.5.5      foreign_0.8-61      formatR_1.0        
##  [7] gdata_2.13.3        grid_3.1.1          gtable_0.1.2       
## [10] gtools_3.4.1        labeling_0.3        lattice_0.20-29    
## [13] leaps_2.9           mapproj_1.2-2       maps_2.3-9         
## [16] MASS_7.3-35         MBESS_3.3.3         munsell_0.4.2      
## [19] nnet_7.3-8          png_0.1-7           proto_0.3-10       
## [22] Rcpp_0.11.3         RgoogleMaps_1.2.0.6 rjson_0.2.15       
## [25] RJSONIO_1.3-0       scales_0.2.4        stringr_0.6.2      
## [28] tools_3.1.1         triangle_0.8