Tourism in Nepal
Introduction
Nepal is celebrating 2020 as “Tourism Year”, targeting 2 million international tourist arrivals. You can learn more under #VisitNepal2020. I wanted to see the trend and history of tourism in Nepal, so I extracted the data from Wikipedia in the Scraping Data with R blog post.
In this post, we will work with the scraped data from the Scraping Data with R blog post, perform some analysis, create visualizations, and try to understand the trend of tourism in Nepal.
Let's Start
I will load the required packages for this blog post.
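The original code for this step is not shown here, so the following is only a sketch of what loading the packages might look like. ggplot2, plotly and gganimate are named in this post, and dplyr is implied by the use of mutate(); tidyr is my assumption for the reshaping step.

```r
# Packages assumed from the steps described in this post
library(dplyr)     # data manipulation (mutate, group_by, ...)
library(tidyr)     # reshaping between wide and long formats (assumed)
library(ggplot2)   # static plots
library(plotly)    # interactive versions of ggplot2 plots
library(gganimate) # animations built on top of ggplot2
```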
year | tourist_number | per_change |
---|---|---|
1993 | 293567 | -12.2 |
1994 | 326531 | 11.2 |
1995 | 363395 | 11.3 |
1996 | 393613 | 8.3 |
1997 | 421857 | 7.2 |
1998 | 463684 | 9.9 |
1999 | 491504 | 6.0 |
2000 | 463646 | -5.7 |
2001 | 361237 | -22.1 |
2002 | 275468 | -23.7 |
This is the data frame I got after the extraction and cleaning process in the Scraping Data with R blog post (only the first ten rows are shown). Now I will create visualizations from this data using ggplot2 and plotly.
Visuals
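As a rough sketch of how the yearly data could be turned into an interactive bar plot: the data frame name `tourist_df`, the colours and the labels are my own choices, and only the ten rows shown in the table above are included.

```r
library(ggplot2)
library(plotly)

# Ten rows copied from the table above; the full data runs to 2018
tourist_df <- data.frame(
  year = 1993:2002,
  tourist_number = c(293567, 326531, 363395, 393613, 421857,
                     463684, 491504, 463646, 361237, 275468)
)

# Static bar plot of arrivals per year
p <- ggplot(tourist_df, aes(x = year, y = tourist_number)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Year", y = "International tourist arrivals",
       title = "Tourist arrivals in Nepal")

# Wrapping the ggplot object adds hover tooltips and zooming;
# printing p_interactive displays the widget
p_interactive <- ggplotly(p)
```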
This bar plot shows the number of international tourist arrivals from 1993 to 2018. Arrivals grew every year from 1994 to 1999 and dipped in 2000. From 2001 the flow of international tourists fell sharply (by 22.1% in 2001 and 23.7% in 2002), as this was the period when the civil war was at its height and the country was under a state of emergency.
In 2015 an earthquake destroyed most of the historic sites in the Kathmandu Valley and caused heavy human and physical losses, which reduced the number of tourists visiting Nepal. From 2016 onward, arrivals increased every year, and in 2018 they crossed 1 million for the first time.
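To look at the year-over-year change rather than the raw counts, the per_change column can be plotted directly. This is a minimal sketch using the ten rows shown in the table earlier; the name `tourist_df` and the styling are assumptions, not the original code.

```r
library(ggplot2)

# per_change values copied from the table earlier in the post
tourist_df <- data.frame(
  year = 1993:2002,
  per_change = c(-12.2, 11.2, 11.3, 8.3, 7.2, 9.9, 6.0, -5.7, -22.1, -23.7)
)

# Bars above zero are growth years, bars below zero are decline years
p_change <- ggplot(tourist_df,
                   aes(x = year, y = per_change, fill = per_change >= 0)) +
  geom_col(show.legend = FALSE) +
  geom_hline(yintercept = 0) +
  labs(x = "Year", y = "Change in arrivals (%)",
       title = "Yearly percentage change in tourist arrivals")
```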
This plot shows the year-over-year percentage change in tourist arrivals.
Top 10 Countries
In this section, I will find the top 10 countries that send the most tourists to Nepal.
Rank | Country | 2013 | 2014 | 2015 | 2016 | 2017 |
---|---|---|---|---|---|---|
chr | chr | chr | chr | chr | chr | chr |
1 | India | 160832 | 118249 | 75124 | 135343 | 180974 |
2 | China | 104664 | 104005 | 66984 | 123805 | 113173 |
3 | United States | 79146 | 53645 | 42687 | 49830 | 47355 |
4 | United Kingdom | 51058 | 46295 | 29730 | 36759 | 35688 |
5 | Sri Lanka | 45361 | 57521 | 44367 | 37546 | 32736 |
6 | Thailand | 39154 | 26722 | 32338 | 33422 | 40969 |
7 | South Korea | 34301 | 25171 | 18112 | 23205 | 19714 |
8 | Australia | 33371 | 25507 | 16619 | 24516 | 20469 |
9 | Myanmar | 30852 | 25769 | 21631 | N/A | N/A |
10 | Germany | 29918 | 23812 | 16405 | 18028 | 22263 |
11 | Bangladesh | 29060 | 23440 | 14831 | 21851 | 22410 |
12 | Japan | 27326 | 22979 | 17613 | 25892 | 26694 |
13 | France | 26140 | 20863 | 16405 | 24097 | 21842 |
14 | Malaysia | 18284 | 13669 | 9855 | 18915 | 18842 |
15 | Spain | 15953 | 12255 | 6741 | 13110 | 10412 |
16 | Canada | 15105 | 12491 | 8398 | 11610 | 12132 |
17 | Netherlands | 13393 | 11453 | 7515 | 12320 | 10516 |
This is the data I got from the Scraping Data with R blog post. Now I need to extract the top 10 countries from this data frame.
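One way to pull out the top 10 is sketched here. The name `country_df` is hypothetical, and to keep the example short it carries only the Rank, Country and 2013 columns from the table above, stored as character to match the chr types shown.

```r
library(dplyr)

# Rank, Country and the 2013 column copied from the table above
country_df <- data.frame(
  Rank = as.character(1:17),
  Country = c("India", "China", "United States", "United Kingdom",
              "Sri Lanka", "Thailand", "South Korea", "Australia",
              "Myanmar", "Germany", "Bangladesh", "Japan", "France",
              "Malaysia", "Spain", "Canada", "Netherlands"),
  `2013` = c("160832", "104664", "79146", "51058", "45361", "39154",
             "34301", "33371", "30852", "29918", "29060", "27326",
             "26140", "18284", "15953", "15105", "13393"),
  check.names = FALSE  # keep "2013" as a column name
)

# The table is already ordered by Rank, so the first ten rows are the top 10
top10_df <- country_df %>% slice_head(n = 10)

# Equivalent filter that does not rely on row order
top10_by_rank <- country_df %>% filter(as.integer(Rank) <= 10)
```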
Rank | Country | 2013 | 2014 | 2015 | 2016 | 2017 |
---|---|---|---|---|---|---|
chr | chr | chr | chr | chr | chr | chr |
1 | India | 160832 | 118249 | 75124 | 135343 | 180974 |
2 | China | 104664 | 104005 | 66984 | 123805 | 113173 |
3 | United States | 79146 | 53645 | 42687 | 49830 | 47355 |
4 | United Kingdom | 51058 | 46295 | 29730 | 36759 | 35688 |
5 | Sri Lanka | 45361 | 57521 | 44367 | 37546 | 32736 |
6 | Thailand | 39154 | 26722 | 32338 | 33422 | 40969 |
7 | South Korea | 34301 | 25171 | 18112 | 23205 | 19714 |
8 | Australia | 33371 | 25507 | 16619 | 24516 | 20469 |
9 | Myanmar | 30852 | 25769 | 21631 | N/A | N/A |
10 | Germany | 29918 | 23812 | 16405 | 18028 | 22263 |
This data frame has each year in a separate column. I reshape it so that all the years sit in a single year column, with the matching tourist arrivals in a value column.
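A reshape like this could be done with tidyr's pivot_longer(). The sketch below uses only the first three countries from the table, and the names `top_df` and `long_df` are my own; the original post may have used a different function.

```r
library(dplyr)
library(tidyr)

# First three rows of the top-10 table, stored as character like the original
top_df <- data.frame(
  Rank = c("1", "2", "3"),
  Country = c("India", "China", "United States"),
  `2013` = c("160832", "104664", "79146"),
  `2014` = c("118249", "104005", "53645"),
  `2015` = c("75124", "66984", "42687"),
  `2016` = c("135343", "123805", "49830"),
  `2017` = c("180974", "113173", "47355"),
  check.names = FALSE  # keep the year column names as they are
)

# Gather every column except Rank and Country into year/value pairs,
# then convert both new columns from character to numbers
long_df <- top_df %>%
  pivot_longer(cols = -c(Rank, Country),
               names_to = "year", values_to = "value") %>%
  mutate(year = as.integer(year), value = as.integer(value))
```

With the full table, the "N/A" entries for Myanmar would become NA (with a warning) in the conversion step, so they may need handling before plotting.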
Now, I need to rank the countries by the number of tourist arrivals from each country. I use mutate() to create a new column, “rank”.
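The ranking step might look something like the sketch below. I am assuming the rank is computed within each year, and that Value_lbl is simply the value stored as text for plot labels; neither detail is confirmed by the post. The names `long_df` and `ranked_df` are hypothetical.

```r
library(dplyr)

# Long-format sample for three countries, values taken from the tables above
long_df <- data.frame(
  Country = rep(c("India", "China", "United States"), each = 5),
  year = rep(2013:2017, times = 3),
  value = as.integer(c(160832, 118249, 75124, 135343, 180974,
                       104664, 104005, 66984, 123805, 113173,
                       79146, 53645, 42687, 49830, 47355))
)

# Within each year, the country with the most arrivals gets rank 1
ranked_df <- long_df %>%
  group_by(year) %>%
  mutate(rank = min_rank(desc(value)),
         Value_lbl = as.character(value)) %>%
  ungroup()
```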
Rank | Country | year | value | rank | Value_lbl |
---|---|---|---|---|---|
1 | India | 2013 | 160832 | 1 | 160832 |
1 | India | 2014 | 118249 | 1 | 118249 |
1 | India | 2015 | 75124 | 1 | 75124 |
1 | India | 2016 | 135343 | 1 | 135343 |
1 | India | 2017 | 180974 | 1 | 180974 |
2 | China | 2013 | 104664 | 2 | 104664 |
2 | China | 2014 | 104005 | 2 | 104005 |
2 | China | 2015 | 66984 | 2 | 66984 |
2 | China | 2016 | 123805 | 2 | 123805 |
2 | China | 2017 | 113173 | 2 | 113173 |
This is the final data frame that I got after cleaning. Now it is time to create some visualizations and animations.
I am using the ggplot2 package to create static visualizations and gganimate to turn them into animations.
In this visualization, I need 10 different colours so that each country has its own colour in the animation. I created a colour palette named colors, which is used when creating the visualization below.
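As an illustration of how a palette and an animated bar chart could fit together, here is a minimal sketch. The ten hex codes are placeholders I picked, not the colours from the original post, and the data covers only three countries, so only the first three colours are used.

```r
library(ggplot2)
library(gganimate)

# Hypothetical palette of ten colours, one per country in the top 10
colors <- c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e",
            "#e6ab02", "#a6761d", "#666666", "#1f78b4", "#e31a1c")

# Ranked sample for three countries, values taken from the tables above
ranked_df <- data.frame(
  Country = rep(c("India", "China", "United States"), each = 5),
  year = rep(2013:2017, times = 3),
  value = c(160832, 118249, 75124, 135343, 180974,
            104664, 104005, 66984, 123805, 113173,
            79146, 53645, 42687, 49830, 47355),
  rank = rep(1:3, each = 5)
)

# Horizontal bars ordered by rank, one animation state per year
anim <- ggplot(ranked_df,
               aes(x = rank, y = value, fill = Country, group = Country)) +
  geom_col() +
  coord_flip() +
  scale_x_reverse() +                  # rank 1 at the top
  scale_fill_manual(values = colors) +
  labs(title = "Tourist arrivals in {closest_state}",
       x = "Rank", y = "Arrivals") +
  transition_states(year, transition_length = 4, state_length = 1)

# Rendering needs a renderer such as gifski to be installed, for example:
# animate(anim, nframes = 100, fps = 10)
```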
Conclusion
In this blog post I showed interactive visualizations made with ggplot2 and plotly using the scraped data. I also made an animation showing the number of tourist arrivals by country from 2013 to 2017.
Feel free to send me your feedback and suggestions regarding this post!