はだだだだ

I don't think a set meal needs a salad.

A simple analysis with R Notebook

In the post below, I turned a data set into an R package so that it can be shared on GitHub.

hadadada00.hatenablog.com

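This time I used that package for a simple analysis in R Notebook. Working in R Notebook produces an editable .Rmd file and an .html output file. Both can be sent by email or similar, so I think they are useful for sharing analysis progress with colleagues and for simple internal reporting.

Nothing in particular tripped me up while writing it, and questions about R Markdown or R syntax are quickly answered by a web search, so I am only pasting the final R Markdown here.

Rendered to an html file, it looks like the page below. (I wrote it in English, partly as English practice.)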

ANALYSIS ON GOLDEN STATE WARRIORS IN 2016-2017

R markdown (R notebook)

---
title: "ANALYSIS ON GOLDEN STATE WARRIORS IN 2016-2017"
output: html_notebook
---

First, I show the roster. I use the **seasons_stats** data-set in the **nbastats** package. This data-set contains player stats from 1950 to 2017 and can be installed with `devtools::install_github("hadadada00/nbastats")`.
```{r}
library(tidyverse)
library(nbastats)

gsw <- seasons_stats %>% 
  filter(Year == 2017, Tm == "GSW")

gsw %>% select(Year, Player, Pos, Age) %>% 
  arrange(Age)
```

There are 17 players in the 2016-2017 season. The youngest player is Kevon Looney (20 y/o) and the oldest players are Matt Barnes and David West (36 y/o).

Second, I want to know who the best scorer is. There are many ways to evaluate a player's offensive ability, but to keep it simple I use the following two criteria:

  1. Total points in the season (PTS)
  2. Points per minute in the season (PTS / MP)
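
To make the two criteria concrete, here is a minimal sketch on a made-up three-player table. The `toy` data frame and all of its numbers are hypothetical and are not taken from the data-set; only the column names `Player`, `PTS` and `MP` follow it.

```{r}
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
toy <- tibble(
  Player = c("A", "B", "C"),
  PTS    = c(1000, 600, 300),  # total points in the season
  MP     = c(2000, 1500, 500)  # minutes played in the season
)

toy_ranked <- toy %>%
  mutate(ppm = PTS / MP) %>%   # criterion 2: points per minute
  arrange(desc(PTS))           # criterion 1: order by total points

toy_ranked
```

In this toy table, player C has the fewest total points but the highest points per minute, which is why I look at both criteria.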

```{r}
gsw %>% 
  ggplot(aes(reorder(Player, PTS), PTS)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Player", y = "Points(PTS)")
```

The result matches my impression: Curry is 1st, Thompson is 2nd, and KD is 3rd. As a trial, I calculate the top 3 players' share of the team's total points.

```{r}
# top 3 player list
top3 <- c("Stephen Curry", "Klay Thompson", "Kevin Durant")

gsw %>% 
  mutate(group = ifelse(Player %in% top3, "Top 3", "Else")) %>%
  group_by(group) %>% 
  summarise(total = sum(PTS)) %>% 
  mutate(prop = total / sum(total) * 100) %>% 
  ggplot(aes(reorder(group, desc(prop)), prop)) +
  geom_bar(stat = "identity") +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Top 3 players vs Else",
       y = "prop of points(%)") +
  scale_y_continuous(breaks = seq(0, 100, by = 10))
```

As shown above, the top 3 players (Curry, Thompson, KD) score more than half of the team's total points.
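
The share can also be read off as a single number instead of from the chart. The following is only a sketch: `points_share()` is a hypothetical helper written for this note, not part of **nbastats**, and the `toy_points` numbers are invented. It assumes a data frame with `Player` and `PTS` columns.

```{r}
library(dplyr)

# Hypothetical helper: percentage of total PTS scored by the given players.
points_share <- function(data, players) {
  data %>%
    summarise(share = sum(PTS[Player %in% players]) / sum(PTS) * 100) %>%
    pull(share)
}

# Invented numbers, only to show the call.
toy_points <- tibble(
  Player = c("A", "B", "C", "D"),
  PTS    = c(500, 300, 100, 100)
)

points_share(toy_points, c("A", "B"))
```

Calling `points_share(gsw, top3)` should give the "Top 3" percentage from the bar chart as one number, assuming `PTS` has no missing values for these rows.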

Next, I look at the second criterion, points per minute. It is calculated as PTS (Points) / MP (Minutes Played).

```{r}
gsw %>% 
  mutate(ppm = PTS / MP) %>%  
  ggplot(aes(x = reorder(Player, ppm), y = ppm)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Player", y = "Points per minute")
```

As shown above, the most efficient scorer is Curry. KD is second and Thompson is third.

The top 3 players are the same as in the PTS ranking.

From these results, the best offensive player on GSW is Stephen Curry.
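
Whether the two rankings agree can also be checked without reading the charts. This is a minimal sketch on invented numbers: `toy_rank` and its values are hypothetical, and `min_rank()` from dplyr is my choice here, not something used elsewhere in the analysis.

```{r}
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
toy_rank <- tibble(
  Player = c("A", "B", "C", "D"),
  PTS    = c(1000, 800, 600, 100),
  MP     = c(2000, 2000, 1000, 500)
) %>%
  mutate(ppm      = PTS / MP,
         rank_pts = min_rank(desc(PTS)),  # 1 = most total points
         rank_ppm = min_rank(desc(ppm)))  # 1 = most points per minute

# Players who are in the top 3 by both criteria
both_top3 <- toy_rank %>%
  filter(rank_pts <= 3, rank_ppm <= 3) %>%
  pull(Player)

both_top3
```

Replacing the toy table with `gsw` would list the players who are in the top 3 on both criteria, which is the comparison made in the text.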

As a supplement, I combine the two results (the total points ranking and the scoring efficiency ranking).

```{r}
gsw %>% 
  mutate(ppm = PTS / MP,
         group = ifelse(Player %in% top3, "Top 3", "Else")) %>% 
  ggplot(aes(x = PTS, y = ppm, color = group)) +
  geom_point() +
  geom_text(aes(label = Player),
            size = 2,
            color = "black",
            nudge_y = 0.03) +
  theme(axis.title.y = element_text(angle = 0)) +
  labs(x = "Points", y = "Points \n per minute") +
  guides(color = guide_legend(reverse = TRUE))
```

As shown above, the top 3 players (Curry, Thompson, KD) stand out. Only JaVale McGee is at the same level in terms of scoring efficiency.