1 gaku’s work: general

References

Books & links

Style

Google’s R Style Guide
The Tidyverse Style Guide
Coding best practice
Efficient R Programming
Always check the R code in every chunk against The Tidyverse Style Guide.
Use commented lines of - and = to break up your file into easily readable chunks.
Variable and function names should use only lowercase letters, numbers, and `_`. Use underscores (`_`) (so-called snake case) to separate words within a name.
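`<-` can do both argument passing and object assignment (e.g. `f(x <- 1)` assigns `x` and passes its value), whereas `=` does only one of the two, depending on context.
In `if` conditions, use `&&` instead of `&` (and `||` instead of `|`).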
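A minimal base-R sketch of the `<-`/`=` and `&&`/`&` notes above; `double_it()` and `demo_value` are hypothetical names used only for illustration.

```r
# Hypothetical helper used only for illustration
double_it <- function(x) x * 2

# `=` inside a call only matches the argument `x`; no object is created
r1 <- double_it(x = 3)

# `<-` inside a call assigns `demo_value` in the calling environment
# and also passes the value (positionally) to double_it()
r2 <- double_it(demo_value <- 5)

# `&&` returns a single TRUE/FALSE and stops at the first FALSE,
# so the stop() call on the right-hand side is never evaluated
short_circuit <- FALSE && stop("never evaluated")

# `&` is vectorised and returns one result per element
elementwise <- c(TRUE, TRUE) & c(TRUE, FALSE)
```

This side effect of `<-` inside calls is one reason the tidyverse guide reserves `<-` for assignment, and the vectorised behaviour of `&` is why `&&` belongs in `if` conditions.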
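`search()` shows the search path, i.e. the attached packages. `Autoloads` is listed as well, so every entry other than `.GlobalEnv` is a package or package-like environment.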

search()

##  [1] ".GlobalEnv"        "package:forcats"   "package:stringr"  
##  [4] "package:dplyr"     "package:purrr"     "package:readr"    
##  [7] "package:tidyr"     "package:tibble"    "package:ggplot2"  
## [10] "package:tidyverse" "package:stats"     "package:graphics" 
## [13] "package:grDevices" "package:utils"     "package:datasets" 
## [16] "package:methods"   "Autoloads"         "package:base"

Function_List <- getNamespaceExports("ggplot2")

head(Function_List)

## [1] "draw_key_vpath"        "StatDensity2dFilled"   "find_panel"           
## [4] "stat_density2d_filled" "stat_count"            "scale_fill_date"
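To inspect the search path and a namespace without extra packages, the sketch below uses the base `stats` namespace in place of `ggplot2` (a substitution made only so it runs anywhere). As the ggplot2 output above shows, `getNamespaceExports()` does not return names in alphabetical order, so wrap it in `sort()` when you want a readable list.

```r
# The search path: the global environment comes first, base comes last
path <- search()
first_entry <- path[1]
last_entry <- path[length(path)]

# Exported objects of a namespace (works without attaching the package)
stats_exports <- getNamespaceExports("stats")
has_median <- "median" %in% stats_exports
```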

rlang
:= the walrus operator
{{ }} curly-curly (the embracing operator)
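A small sketch of both operators, assuming dplyr 1.0 or later (needed for the glue-style `"{out_name}" :=` output name); `mean_of()` is a hypothetical helper and `mtcars` is R's built-in dataset.

```r
library(dplyr)

# Hypothetical helper: `{{ var }}` forwards the caller's unquoted column name,
# and `"{out_name}" :=` builds the output column name from a string
mean_of <- function(data, var, out_name) {
  data %>%
    summarise("{out_name}" := mean({{ var }}, na.rm = TRUE))
}

res <- mean_of(mtcars, mpg, "mean_mpg")
```

Plain `=` cannot appear with a computed name on its left-hand side, which is why the walrus `:=` is needed for the output column.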

Colours

https://everydayanalytics.ca/2017/03/when-to-use-sequential-and-diverging-palettes.html
https://www.dataembassy.co.nz/Liza-colours-in-R#26
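The first link is about choosing between sequential and diverging palettes. A minimal base-R sketch with `grDevices::hcl.colors()` (R 3.6 or later); the palette names come from `hcl.pals()`, and the `anomaly` values are made up for illustration.

```r
# Sequential palette: ordered data running from low to high (e.g. counts)
seq_pal <- hcl.colors(5, palette = "Blues 3")

# Diverging palette: data with a meaningful midpoint (e.g. anomalies around 0)
div_pal <- hcl.colors(5, palette = "Blue-Red")

# Map hypothetical anomaly values onto the diverging palette,
# using five bins that are symmetric around zero
anomaly <- c(-2, -0.5, 0, 0.7, 1.9)
bins <- cut(anomaly, breaks = seq(-2.5, 2.5, length.out = 6), labels = FALSE)
anomaly_col <- div_pal[bins]
```

A diverging palette only reads correctly when its middle colour sits on the midpoint, which is why the breaks are centred on zero.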

Topics

dplyr

Selection helpers for `select()`: starts_with(), ends_with(), contains(), matches(), num_range(), one_of(), everything(). (In current tidyselect, one_of() is superseded by all_of() and any_of().)

reordered_imf <- imf_data %>%
  relocate(iso:year,
           matches("gdp$"),
           .after = year)

imf_data %>%
  select(
    # Choose columns iso to year
    iso:year,
    # Choose columns whose names start with "gov" (regular expression)
    matches("^gov"),
    # Keep the remaining columns too
    gdp_in_billions_of_usd:last_col()) %>%
  names()
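A self-contained sketch of the same `select()`/`relocate()` patterns. `imf_demo` is a hypothetical two-row stand-in for `imf_data`; its column names and dummy values are assumptions for illustration, and `starts_with("gov")` is used as an equivalent of `matches("^gov")`.

```r
library(dplyr)

# Hypothetical stand-in for imf_data; columns and values are dummies
imf_demo <- tibble(
  iso = c("AAA", "BBB"),
  country = c("Country A", "Country B"),
  year = c(2020L, 2020L),
  gdp_in_billions_of_usd = c(1, 2),
  real_gdp = c(3, 4),
  gov_debt = c(5, 6),
  gov_revenue = c(7, 8)
)

# select(): iso..year, then columns starting with "gov", then everything else
selected <- imf_demo %>%
  select(iso:year, starts_with("gov"), everything())

# relocate(): move columns whose names end in "gdp" to just after `year`
relocated <- imf_demo %>%
  relocate(matches("gdp$"), .after = year)
```

`everything()` never duplicates a column that was already selected, so it is a convenient way to say "and the rest, in their current order".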

In addition to data frames/tibbles, dplyr makes working with other computational backends accessible and efficient. Below is a list of alternative backends:

dtplyr: for large, in-memory datasets. Translates your dplyr code to high performance data.table code.
dbplyr: for data stored in a relational database. Translates your dplyr code to SQL.
sparklyr: for very large datasets stored in Apache Spark.

ggplot

element_blank(): draws nothing, and assigns no space.

element_rect(): borders and backgrounds.

element_line(): lines.

element_text(): text (see the ggplot2 reference for text).
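A sketch using the four element functions in a single `theme()` call on the built-in `mtcars` data; the chosen theme slots and colours are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(
    # element_blank(): draw nothing and reserve no space
    panel.grid.minor = element_blank(),
    # element_rect(): border and background
    panel.background = element_rect(fill = "white", colour = "grey50"),
    # element_line(): lines
    panel.grid.major = element_line(colour = "grey90"),
    # element_text(): text
    axis.title = element_text(size = 12, face = "bold")
  )
```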

List all system fonts: check this whenever you use Japanese text.
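Listing the fonts installed on the system needs an add-on package such as systemfonts (`systemfonts::system_fonts()`). As a dependency-free substitute, the sketch below instead lists the font families registered for base R's `pdf()` device, which include Japanese CID-keyed families.

```r
# Font families known to the pdf() device in this R session
pdf_families <- names(pdfFonts())

# Japanese CID-keyed families that pdf() can use
japanese_families <- grep("^Japan1", pdf_families, value = TRUE)
```

One of these can then be passed as `pdf(..., family = "Japan1")` for base graphics; it does not tell you which system fonts ggplot2 can use on screen.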

Markdown

In Markdown, always put a half-width (ASCII) space after a markup symbol such as `#` or `-`.
Setting a password on HTML output
code chunk options
share it online
Pimp my RMD: a few tips for R Markdown
R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, Garrett Grolemund
rmarkdown site generator
Tips and tricks for working with images and figures in R Markdown documents
RMarkdown for Scientists
When using `{r, attr.source='.numberLines'}`, change the output's highlighting style to tango (`highlight: tango`); line numbers do not work with the default style.
knitr: Elegant, flexible, and fast dynamic report generation with R
rmarkdown-cookbook
HTML book built with bootstrap4

bs4_book(
  theme = bs4_book_theme(),
  repo = NULL,
  ...,
  lib_dir = "libs",
  pandoc_args = NULL,
  extra_dependencies = NULL,
  template = "default",
  split_bib = FALSE
)

bs4_book_theme(primary = "#0068D9", version = 4, ...)

Google Sheets

https://googlesheets4.tidyverse.org/

sample code

library(googlesheets4)
library(googledrive)

# Save the tables to a new Google Sheet whose name carries a timestamp
System_Time <- format(Sys.time(), "%Y-%m-%d_%H:%M")
Output_Sheet_Name <- paste0("GFW_COVID_Project_1_Tables", System_Time)

Results_at_GSheet <- gs4_create(Output_Sheet_Name,
  sheets = list(Summary_Vessel_Size = Summary_Vessel_Size,
                Summary_Gear = Summary_Gear,
                Summary_Longline_Vessel_Prefecture = Summary_Longline_Vessel_Prefecture))

# URL
read_sheet("https://docs.google.com/spreadsheets/d/1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY/edit#gid=780868077")

# Sheet ID
read_sheet("1U6Cf_qEOhiR9AZqTqS3mbMF3zt2db48ZP5v3rkrAEJY")

# a googledrive "dribble"
googledrive::drive_get("gapminder") %>% 
  read_sheet()

flexdashboard

https://pkgs.rstudio.com/flexdashboard/

Spark

An open-source framework for fast distributed processing.
sparklyr
Mastering Spark with R
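Spark's philosophy (notably in its machine-learning feature transformers) is to represent everything as doubles, so when bringing results back into R you need to convert them to categorical (factor) or logical types.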
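A base-R sketch of the conversion step after `collect()`. The `collected` data frame is hypothetical and stands in for a result where Spark returned a 0/1 flag (for example from `ft_binarizer()`) as a double.

```r
# Hypothetical collected result: the flag came back from Spark as a 0/1 double
collected <- data.frame(
  dep_delay = c(-3, 25, 0, 61),
  is_late   = c(0, 1, 0, 1)
)

# Convert the 0/1 double back to logical
collected$is_late <- as.logical(collected$is_late)

# Where a categorical variable is wanted, build a factor with explicit levels
collected$status <- factor(
  ifelse(collected$is_late, "late", "on time"),
  levels = c("on time", "late")
)
```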

library(sparklyr)
library(tidyverse)
library(nycflights13) # install.packages(c("nycflights13", "Lahman"))

# spark_install()  # run this once first to install Spark locally
spark_conn_TEST <- spark_connect(master = "local")
# Print the version of Spark
spark_version(sc = spark_conn_TEST)

TEST_Data <- nycflights13::flights
object.size(TEST_Data)

TEST_tbl <- copy_to(spark_conn_TEST, TEST_Data)

src_tbls(spark_conn_TEST)

TEST_tbl_2 <- tbl(spark_conn_TEST, "test_data")

# Disconnect from Spark
spark_disconnect(sc = spark_conn_TEST)

All of the data now lives in the remote (or local) Spark database, not in R's memory.

library(sparklyr)
library(tidyverse)
library(nycflights13)

spark_conn_TEST <- spark_connect(master = "local")
# Print the version of Spark
spark_version(sc = spark_conn_TEST)

# Use all-lower-case names (Spark lists its table names in lower case)
test_data <- nycflights13::flights
test_tbl <- copy_to(spark_conn_TEST, test_data)

results <- test_tbl %>% filter(month == 1)

# data is at the remote database @spark
class(results)

# collect results from spark
collected <- results %>% collect()

class(collected)

# Disconnect from Spark
spark_disconnect(sc = spark_conn_TEST)

Visualization

Benchmarking & Environment

R.version

##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          1.0                         
## year           2021                        
## month          05                          
## day            18                          
## svn rev        80317                       
## language       R                           
## version.string R version 4.1.0 (2021-05-18)
## nickname       Camp Pontanezen

sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4    
## [5] readr_2.1.1     tidyr_1.1.4     tibble_3.1.6    ggplot2_3.3.5  
## [9] tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1   xfun_0.28          bslib_0.3.1        haven_2.4.3       
##  [5] colorspace_2.0-2   vctrs_0.3.8        generics_0.1.1     htmltools_0.5.2   
##  [9] yaml_2.2.1         utf8_1.2.2         rlang_0.4.12       jquerylib_0.1.4   
## [13] pillar_1.6.4       withr_2.4.3        glue_1.5.1         DBI_1.1.1         
## [17] dbplyr_2.1.1       modelr_0.1.8       readxl_1.3.1       lifecycle_1.0.1   
## [21] munsell_0.5.0      gtable_0.3.0       cellranger_1.1.0   rvest_1.0.2       
## [25] evaluate_0.14      knitr_1.36         tzdb_0.2.0         fastmap_1.1.0     
## [29] fansi_0.5.0        broom_0.7.10       Rcpp_1.0.7         backports_1.4.0   
## [33] scales_1.1.1       jsonlite_1.7.2     fs_1.5.1           hms_1.1.1         
## [37] digest_0.6.28      stringi_1.7.6      grid_4.1.0         cli_3.1.0         
## [41] tools_4.1.0        magrittr_2.0.1     sass_0.4.0         encryptedRmd_0.2.1
## [45] crayon_1.4.2       pkgconfig_2.0.3    ellipsis_0.3.2     xml2_1.3.3        
## [49] reprex_2.0.1       lubridate_1.8.0    rstudioapi_0.13    assertthat_0.2.1  
## [53] rmarkdown_2.11     httr_1.4.2         R6_2.5.1           compiler_4.1.0

library("microbenchmark")

colon <- function(n) 1:n

seq_default <- function(n) seq(1, n)

seq_by <- function(n) seq(1, n, by = 1)

system.time(res <- colon(1e8))

n <- 1e8

microbenchmark(colon(n),
               seq_default(n),
               seq_by(n),
               times = 10) # run each expression 10 times
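A smaller base-R sketch of the same comparison, using `n = 1e6` so it runs quickly. It checks that the three functions agree on values and shows that they differ in storage type: `seq(1, n, by = 1)` returns doubles (8 bytes per element) while the other two return integers (4 bytes per element).

```r
colon       <- function(n) 1:n
seq_default <- function(n) seq(1, n)
seq_by      <- function(n) seq(1, n, by = 1)

n <- 1e6

# All three produce the same values ...
same_values <- isTRUE(all.equal(colon(n), seq_by(n))) &&
  identical(colon(n), seq_default(n))

# ... but not the same storage type
types <- vapply(list(colon = colon, seq_default = seq_default, seq_by = seq_by),
                function(f) typeof(f(n)), character(1))

# Crude timing of 10 calls; microbenchmark() repeats and summarises more carefully
elapsed <- system.time(for (i in 1:10) seq_by(n))[["elapsed"]]
```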

library(benchmarkme)
get_ram()
get_cpu()

res <- benchmark_io(runs = 1, size = 5)
plot(res)

code profiling

library("ggplot2movies")
library("profvis")


library(ggplot2)

data(movies, package = "ggplot2movies")
dim(movies)

Sleepless_in_Seattle <- movies[47412, ]
movies_drama <- movies[movies$Drama == 1, ]

# Highlight one film by giving its layer its own data
# (rather than passing `$`-extracted vectors into aes())
ggplot(movies_drama, aes(x = year, y = rating)) +
  geom_point(alpha = 0.05) +
  xlab("Year") + ylab("Rating") +
  geom_point(data = Sleepless_in_Seattle, colour = "#E16B8C") +
  geom_smooth(method = "loess", colour = "#2EA9DF")


profvis::profvis({
  data(movies, package = "ggplot2movies")
  dim(movies)

  Sleepless_in_Seattle <- movies[47412, ]
  movies_drama <- movies[movies$Drama == 1, ]

  p <- ggplot(movies_drama, aes(x = year, y = rating)) +
    geom_point(alpha = 0.05) +
    xlab("Year") + ylab("Rating") +
    geom_point(data = Sleepless_in_Seattle, colour = "#E16B8C") +
    geom_smooth(method = "loess", colour = "#2EA9DF")

  # print() forces the plot to be drawn, so rendering is profiled too
  print(p)
})
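profvis is built on R's sampling profiler, `Rprof()`. A minimal base-R sketch of using `Rprof()` and `summaryRprof()` directly; the workload (repeatedly sorting random numbers) is arbitrary.

```r
prof_file <- tempfile(fileext = ".out")

# Start the sampling profiler (one sample every 10 ms), do some work, stop it
Rprof(prof_file, interval = 0.01)
for (i in 1:200) sorted <- sort(runif(1e5))
Rprof(NULL)

# Time per function: by.self excludes callees, by.total includes them
prof <- summaryRprof(prof_file)
top_self <- head(prof$by.self)
```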
             

Database

API

API: Application Programming Interface
R API Tutorial: Getting Started with APIs in R
Basic GET and POST requests

#install.packages(c("httr", "jsonlite"))
library(httr)
library(jsonlite)

library(pageviews)
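After a `GET()`, the response body arrives as raw bytes (see `$content` in the `str()` output of the Umishiru request). A minimal sketch of turning such a body into R objects with jsonlite; the JSON string is made up, and with httr you would normally obtain the text via `content(res, as = "text", encoding = "UTF-8")`.

```r
library(jsonlite)

# Hypothetical response body, as raw bytes like httr's response$content
body_raw <- charToRaw('{"status": "ok", "items": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}')

# Decode the bytes to text, then parse; an array of records becomes a data frame
parsed <- fromJSON(rawToChar(body_raw))
items <- parsed$items
```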

Umishiru (海しる)

Umishiru API

library(httr)
# gaku_key: API key defined elsewhere (not shown)
umishiru_Result_1 <- GET(url = "https://api.msil.go.jp/apis/v1/",
                         query = list(api_key = gaku_key))
str(umishiru_Result_1)

## List of 10
##  $ url        : chr "https://api.msil.go.jp/apis/v1/?api_key=<redacted>"
##  $ status_code: int 401
##  $ headers    :List of 4
##   ..$ content-length  : chr "152"
##   ..$ content-type    : chr "application/json"
##   ..$ www-authenticate: chr "AzureApiManagementKey realm=\"https://api.msil.go.jp/apis\",name=\"Ocp-Apim-Subscription-Key\",type=\"header\""
##   ..$ date            : chr "Thu, 02 Dec 2021 22:38:15 GMT"
##   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ all_headers:List of 1
##   ..$ :List of 3
##   .. ..$ status : int 401
##   .. ..$ version: chr "HTTP/1.1"
##   .. ..$ headers:List of 4
##   .. .. ..$ content-length  : chr "152"
##   .. .. ..$ content-type    : chr "application/json"
##   .. .. ..$ www-authenticate: chr "AzureApiManagementKey realm=\"https://api.msil.go.jp/apis\",name=\"Ocp-Apim-Subscription-Key\",type=\"header\""
##   .. .. ..$ date            : chr "Thu, 02 Dec 2021 22:38:15 GMT"
##   .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ cookies    :'data.frame': 0 obs. of  7 variables:
##   ..$ domain    : logi(0) 
##   ..$ flag      : logi(0) 
##   ..$ path      : logi(0) 
##   ..$ secure    : logi(0) 
##   ..$ expiration: 'POSIXct' num(0) 
##   ..$ name      : logi(0) 
##   ..$ value     : logi(0) 
##  $ content    : raw [1:152] 7b 20 22 73 ...
##  $ date       : POSIXct[1:1], format: "2021-12-02 22:38:15"
##  $ times      : Named num [1:6] 0 0.0624 0.0819 0.1363 0.1541 ...
##   ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##  $ request    :List of 7
##   ..$ method    : chr "GET"
##   ..$ url       : chr "https://api.msil.go.jp/apis/v1/?api_key=<redacted>"
##   ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
##   .. ..- attr(*, "names")= chr "Accept"
##   ..$ fields    : NULL
##   ..$ options   :List of 2
##   .. ..$ useragent: chr "libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2"
##   .. ..$ httpget  : logi TRUE
##   ..$ auth_token: NULL
##   ..$ output    : list()
##   .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
##   ..- attr(*, "class")= chr "request"
##  $ handle     :Class 'curl_handle' <externalptr> 
##  - attr(*, "class")= chr "response"
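The 401 response above includes a `www-authenticate` header naming `Ocp-Apim-Subscription-Key` with `type="header"`, which suggests the key is expected in a request header rather than as the `api_key` query parameter. A sketch under that assumption, which also keeps the key out of the code and the printed output; `UMISHIRU_API_KEY` is a hypothetical environment-variable name, and the network call is left commented out.

```r
library(httr)

# Read the key from an environment variable instead of hard-coding it
# (UMISHIRU_API_KEY is a hypothetical variable name)
api_key <- Sys.getenv("UMISHIRU_API_KEY", unset = "dummy-key")

# Build the header named in the 401 response's www-authenticate field
auth <- add_headers(`Ocp-Apim-Subscription-Key` = api_key)

# Network call, not run here:
# res <- GET("https://api.msil.go.jp/apis/v1/", auth)
# status_code(res)
```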

e-Stat access

estatapi: an R package for using the API of e-Stat, the portal site for Japanese government statistics

# install.packages("estatapi")
library(estatapi)

## このサービスは、政府統計総合窓口(e-Stat)のAPI機能を使用していますが、サービスの内容は国によって保証されたものではありません。

# searchWord "魚種" = fish species
estat_Result_1 <- estatapi::estat_getStatsList(appId = appId_estat_gakuLab, searchWord = "魚種")

head(estat_Result_1)

## # A tibble: 6 × 25
##   `@id`      STAT_NAME GOV_ORG STATISTICS_NAME TITLE CYCLE SURVEY_DATE OPEN_DATE
##   <chr>      <chr>     <chr>   <chr>           <chr> <chr> <chr>       <chr>    
## 1 0003228937 海面漁業… 農林水… 海面漁業生産統… 海面… 年次  200301-200… 2019-01-…
## 2 0003228938 海面漁業… 農林水… 海面漁業生産統… 海面… 年次  200301-200… 2019-01-…
## 3 0003228939 海面漁業… 農林水… 海面漁業生産統… 海面… 年次  200301-200… 2019-01-…
## 4 0003228940 海面漁業… 農林水… 海面漁業生産統… 海面… 年次  200301-200… 2019-01-…
## 5 0003228941 海面漁業… 農林水… 海面漁業生産統… 海面… 年次  200301-200… 2019-01-…
## 6 0003228942 海面漁業… 農林水… 海面漁業生産統… 海面… 年次  200301-200… 2019-01-…
## # … with 17 more variables: SMALL_AREA <chr>, COLLECT_AREA <chr>,
## #   MAIN_CATEGORY <chr>, SUB_CATEGORY <chr>, OVERALL_TOTAL_NUMBER <chr>,
## #   UPDATED_DATE <chr>, TABULATION_CATEGORY <chr>,
## #   TABULATION_SUB_CATEGORY1 <chr>, TABULATION_SUB_CATEGORY2 <chr>,
## #   DESCRIPTION <chr>, TABLE_CATEGORY <chr>, TABLE_NAME <chr>,
## #   TABLE_EXPLANATION <chr>, TABLE_SUB_CATEGORY1 <chr>,
## #   TABLE_SUB_CATEGORY2 <chr>, TABLE_SUB_CATEGORY3 <chr>, …

Latex

Aligning equations
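A minimal example of aligning equations with the amsmath `align` environment (use `align*` for unnumbered lines): `&` marks the alignment point and `\\` ends a line. The algebra is an arbitrary example.

```latex
\begin{align}
  (x + 1)^2 &= (x + 1)(x + 1) \\
            &= x^2 + 2x + 1
\end{align}
```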

Parallel computing

library(parallel)
library("microbenchmark")

detectCores()
ncores <- detectCores(logical = FALSE)  # physical cores; may be NA on some platforms
n <- ncores:1

lapply(n, rnorm, mean = 10, sd = 2)

cl <- makeCluster(ncores)
clusterApply(cl, n, fun = rnorm, mean = 10, sd = 2)
stopCluster(cl)  # release the worker processes

Socket vs Fork
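A sketch contrasting the two cluster types in the parallel package. `base_value` is a hypothetical variable used to show that socket workers start as empty R sessions and need `clusterExport()`, while forked workers inherit the parent's objects; forking is not available on Windows, so the sketch falls back to one core there.

```r
library(parallel)

n_workers <- 2
base_value <- 10

# Socket (PSOCK) cluster: fresh R sessions, works on every OS,
# but objects must be sent to the workers explicitly
cl <- makeCluster(n_workers, type = "PSOCK")
clusterExport(cl, "base_value", envir = environment())
socket_res <- parLapply(cl, 1:4, function(i) base_value + i^2)
stopCluster(cl)

# Fork: workers are copies of the current process and see base_value directly;
# on Windows mc.cores must be 1, which simply runs lapply()
fork_cores <- if (.Platform$OS.type == "windows") 1L else n_workers
fork_res <- mclapply(1:4, function(i) base_value + i^2, mc.cores = fork_cores)
```

Sockets are the portable choice; forks avoid copying data to workers but only work on Unix-like systems.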

Analysis

Exploratory data analysis

General

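Greek alphabet

Letter	Name (Japanese reading)	Letter	Name (Japanese reading)
\(\alpha\)	alpha (アルファ)	\(\nu\)	nu (ニュー)
\(\beta\)	beta (ベータ)	\(\xi\)	xi (クシー)
\(\gamma\)	gamma (ガンマ)	\(o\)	omicron (オミクロン)
\(\delta\)	delta (デルタ)	\(\pi\)	pi (パイ)
\(\epsilon\)	epsilon (イプシロン)	\(\rho\)	rho (ロー)
\(\zeta\)	zeta (ゼータ)	\(\sigma\)	sigma (シグマ)
\(\eta\)	eta (エータ / イータ)	\(\tau\)	tau (タウ)
\(\theta\)	theta (テータ / シータ)	\(\upsilon\)	upsilon (ユプシロン)
\(\iota\)	iota (イオタ)	\(\phi\)	phi (フィー)
\(\kappa\)	kappa (カッパ)	\(\chi\)	chi (キー)
\(\lambda\)	lambda (ラムダ)	\(\psi\)	psi (プシー)
\(\mu\)	mu (ミュー)	\(\omega\)	omega (オメガ)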

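Greek alphabet (uppercase)

Letter	Name (Japanese reading)	Letter	Name (Japanese reading)
\(A\)	alpha (アルファ)	\(N\)	nu (ニュー)
\(B\)	beta (ベータ)	\(\Xi\)	xi (クシー)
\(\Gamma\)	gamma (ガンマ)	\(O\)	omicron (オミクロン)
\(\Delta\)	delta (デルタ)	\(\Pi\)	pi (パイ)
\(E\)	epsilon (イプシロン)	\(P\)	rho (ロー)
\(Z\)	zeta (ゼータ)	\(\Sigma\)	sigma (シグマ)
\(H\)	eta (エータ / イータ)	\(T\)	tau (タウ)
\(\Theta\)	theta (テータ / シータ)	\(\Upsilon\)	upsilon (ユプシロン)
\(I\)	iota (イオタ)	\(\Phi\)	phi (フィー / ファイ)
\(K\)	kappa (カッパ)	\(X\)	chi (キー / カイ)
\(\Lambda\)	lambda (ラムダ)	\(\Psi\)	psi (プシー)
\(M\)	mu (ミュー)	\(\Omega\)	omega (オメガ)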

Greek alphabet (lowercase, var forms)

Letter	Name (Japanese reading)
\(\varepsilon\)	epsilon (イプシロン)
\(\vartheta\)	theta (テータ / シータ)
\(\varrho\)	rho (ロー)
\(\varsigma\)	sigma (シグマ)
\(\varphi\)	phi (フィー / ファイ)

How to read symbols

Symbol	Meaning	Reading (Japanese)	Reading (English)
\(x \in A\)	x belongs to A		x in A / x belonging to A

Standard normal distribution table
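No table is included here; the values of a standard normal table can be generated with base R's `pnorm()` (lower-tail probability \(P(Z \le z)\)) and `qnorm()` (its inverse). A minimal sketch; the 0.1 step of the grid is an arbitrary choice.

```r
# Lower-tail probabilities P(Z <= z) on a grid of z values
z <- seq(0, 3, by = 0.1)
z_table <- data.frame(z = z, p = round(pnorm(z), 4))

# Common lookups in both directions
p_196 <- pnorm(1.96)   # area below z = 1.96
z_975 <- qnorm(0.975)  # z with 97.5% of the area below it
```

Printed tables sometimes show the upper tail \(P(Z > z)\) instead; that is `pnorm(z, lower.tail = FALSE)`.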

2 Project

Overview

Note

Log

3 Environment

3.1 Clear all variables

# Remove all variables (including hidden ones) from the current environment
rm(list = ls(all.names = TRUE))
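A sketch of why `all.names = TRUE` matters, run in a throwaway environment (`demo_env`, a hypothetical name) so it does not wipe the real workspace. Note that `rm()` only removes objects; it does not detach packages or reset options, so restarting R gives a cleaner slate.

```r
# Throwaway environment so the demo does not touch the workspace
demo_env <- new.env()
assign("a", 1, envir = demo_env)
assign(".hidden", 2, envir = demo_env)

# ls() skips names starting with "." unless all.names = TRUE
visible_only <- ls(demo_env)
all_names    <- ls(demo_env, all.names = TRUE)

# Remove everything, hidden objects included
rm(list = all_names, envir = demo_env)
remaining <- ls(demo_env, all.names = TRUE)
```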

3.2 Load libraries

Packages General

library(tidyverse)

library(data.table)

library(DT)
# https://rstudio.github.io/DT/005-bootstrap.html
# https://kazutan.github.io/kazutanR/DT_demo.html

Packages Project

library(ggformula)  # http://www.mosaic-web.org/ggformula/

library(latex2exp)

library(ggsci)

library(googlesheets4)

library(ggpubr)

# https://r-coder.com/economics-charts-r/
#library(econocharts)

#library(reconPlots)

Packages Experimental

4 Data

Data List

4.1 Data load (import)

4.2 Exploratory data analysis

Grammar of Graphics

Included in the template as a sample.

GG concepts by Leland Wilkinson

Leland Wilkinson, H2O.ai - The Grammar of Graphics and the Future of Big Data Visualization
The Grammar of Graphics rule:
A statistical graphic is a representation of the graph of a function.
The graph of a function is a subset of the product of its domain and codomain.
The rest is annotation.

Graph: \[G = \{(x, f(x)) : x \in \mathbb{R} \text{ and } f(x) = e^{-x^2}\}\]

In words: the graph is the set that characterises a function, made up of pairs of a real number \(x\) and the corresponding function value \(f(x)\).

Frame: \[F = [-3, 3] \times [0, 1]\]

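In words: the frame is the displayed range of the domain and of the function values; here the domain runs from -3 to 3 and the values from 0 to 1.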

Aesthetic: \[A : x \mapsto x_{\text{position}}, \quad f(x) \mapsto y_{\text{position}}\]
Graphics: \[G_A = A(F \cap G)\]

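# y = exp(-x^2) over the frame [-3, 3]; tibble() can refer to x when building y
Data_1 <- tibble(x = seq(-3, 3, by = 0.001), y = exp(-x^2))

Data_1 %>% ggplot(aes(x, y)) +
  geom_line() +
  labs(y = TeX("$y = e^{-x^2}$"), x = "x") +
  theme_bw(base_family = "HiraKakuProN-W3")  # macOS Japanese font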
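A base-R sketch of the formula \(G_A = A(F \cap G)\) for \(f(x) = e^{-x^2}\): build points of the graph, keep those inside the frame, then map them to positions. Treating the aesthetic as a rescaling to positions in \([0, 1]\) is an assumption made for illustration, and the sampling grid is arbitrary.

```r
# Graph G: pairs (x, f(x)) for f(x) = exp(-x^2), sampled on a grid wider than the frame
f <- function(x) exp(-x^2)
x <- seq(-500, 500) / 100
graph <- data.frame(x = x, y = f(x))

# Frame F = [-3, 3] x [0, 1]: keep only the points of G that lie inside F
in_frame <- graph$x >= -3 & graph$x <= 3 & graph$y >= 0 & graph$y <= 1
visible <- graph[in_frame, ]

# Aesthetic A: map x to a horizontal and f(x) to a vertical position in [0, 1]
positions <- data.frame(
  x_position = (visible$x - (-3)) / (3 - (-3)),
  y_position = (visible$y - 0) / (1 - 0)
)
```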

gaku Template

gaku Ishimura

2021-12-03