I tried a simple analysis using R Notebook
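In the following article, I turned a dataset into an R package so that it can be shared on GitHub.
This time, I used that package for a simple analysis in an R notebook. An R notebook produces an .Rmd file for editing and an .html file (`.nb.html`) as output. Both can be sent by email or similar means, so I think it is handy for sharing work in progress with colleagues or for simple internal reports.
The editing itself went smoothly, and questions about R Markdown or R syntax are answered quickly by a web search, so I only paste the final R Markdown here.
Rendered to HTML, it looks like the page below. (I wrote it in English, partly to practice my English.)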
ANALYSIS ON GOLDEN STATE WARRIORS IN 2016-2017
R markdown (R notebook)
````markdown
---
title: "ANALYSIS ON GOLDEN STATE WARRIORS IN 2016-2017"
output: html_notebook
---

First, I show the roster. I use the **seasons_stats** dataset in the **nbastats** package. This dataset contains player stats from 1950 to 2017, and the package can be installed with `devtools::install_github("hadadada00/nbastats")`.

```{r}
library(tidyverse)
library(nbastats)

# GSW players in the 2016-2017 season
gsw <- seasons_stats %>%
  filter(Year == 2017, Tm == "GSW")

# roster sorted by age
gsw %>%
  select(Year, Player, Pos, Age) %>%
  arrange(Age)
```

There are 17 players on the roster in the 2016-2017 season. The youngest is Kevon Looney (20 y/o), and the oldest are Matt Barnes and David West (36 y/o).

Second, I want to know who the best scorer is. There are many ways to evaluate a player's offensive ability, but to keep it simple, I use two criteria:

1. Total points in the season (PTS)
2. Points per minute in the season (PTS / MP)

```{r}
# total points per player, highest at the top
gsw %>%
  ggplot(aes(reorder(Player, PTS), PTS)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Player", y = "Points (PTS)")
```

The result matches my impression: Curry is first, Thompson second, and KD (Kevin Durant) third. Out of curiosity, I also calculate the top 3 players' share of the team's total points.

```{r}
# top 3 player list
top3 <- c("Stephen Curry", "Klay Thompson", "Kevin Durant")

# share (%) of team points: top 3 vs everyone else
gsw %>%
  mutate(group = ifelse(Player %in% top3, "Top 3", "Else")) %>%
  group_by(group) %>%
  summarise(total = sum(PTS)) %>%
  mutate(prop = total / sum(total) * 100) %>%
  ggplot(aes(reorder(group, desc(prop)), prop)) +
  geom_bar(stat = "identity") +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Top 3 players vs Else", y = "Share of points (%)") +
  scale_y_continuous(breaks = seq(0, 100, by = 10))
```

As shown above, the top 3 players (Curry, Thompson, KD) score more than half of the team's total points.

Next, I look at the second criterion, points per minute, calculated as PTS (points) / MP (minutes played).

```{r}
# points per minute per player, highest at the top
gsw %>%
  mutate(ppm = PTS / MP) %>%
  ggplot(aes(x = reorder(Player, ppm), y = ppm)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Player", y = "Points per minute")
```

By this measure, the most efficient scorer is again Curry, followed by KD and then Thompson, so the top 3 are the same players as in the PTS ranking. From these results, the best offensive player on GSW is Stephen Curry.

Finally, I combine the two results (the total points ranking and the scoring efficiency ranking) in one plot.

```{r}
# total points (x) vs points per minute (y), top 3 highlighted
gsw %>%
  mutate(ppm = PTS / MP,
         group = ifelse(Player %in% top3, "Top 3", "Else")) %>%
  ggplot(aes(x = PTS, y = ppm, color = group)) +
  geom_point() +
  geom_text(aes(label = Player), size = 2, color = "black", nudge_y = 0.03) +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Points", y = "Points\nper minute") +
  guides(color = guide_legend(reverse = TRUE))
```

As shown above, the top 3 players (Curry, Thompson, KD) stand out. Only JaVale McGee is at a similar level in terms of scoring efficiency.
````
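The top 3 share and the claim that both criteria give the same top 3 can also be checked as plain numbers rather than read off the bar charts. Below is a minimal sketch in base R only, so it runs without tidyverse or nbastats. It assumes a data frame with `Player`, `PTS` and `MP` columns like `seasons_stats`. The helper names `top_share()` and `top_n_players()` and the toy data are hypothetical and are not real GSW stats.

```r
# Minimal base-R sketch (hypothetical helpers, not part of nbastats).
# Assumes a data frame with columns Player, PTS and MP, as in seasons_stats.

# Share (%) of the team's total points scored by the given players
top_share <- function(df, players) {
  is_top <- df$Player %in% players
  100 * sum(df$PTS[is_top]) / sum(df$PTS)
}

# Names of the n players with the highest score, highest first
top_n_players <- function(df, score, n = 3) {
  ranked <- df$Player[order(score, decreasing = TRUE)]
  head(ranked, n)
}

# Made-up numbers for illustration only (not real GSW stats)
toy <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  PTS    = c(500, 300, 150, 50),
  MP     = c(1000, 800, 200, 400),
  stringsAsFactors = FALSE
)
toy$ppm <- toy$PTS / toy$MP  # points per minute = PTS / MP

top_share(toy, c("Player A", "Player B"))     # 80
by_pts <- top_n_players(toy, toy$PTS, n = 2)  # "Player A" "Player B"
by_ppm <- top_n_players(toy, toy$ppm, n = 2)  # "Player C" "Player A"
setequal(by_pts, by_ppm)                      # FALSE
```

In this toy data, Player C plays only 200 minutes but has the highest per-minute rate, so the two top-2 lists differ. With the real `gsw` data, `setequal()` turns "the top 3 are the same under both criteria" into an explicit check instead of a visual comparison.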