Press →, scroll, or swipe to advance
Made with Keydown
Department of Statistics, Iowa State University
Econ workshops on R
Nov 8, 2011
mean()
, var()
, lm()
, glm()
, rnorm()
, boxplot()
, ...)obj.method()
, but often method(obj)
)?
to read documentatione.g. try ?lm
for help on linear models
help.start()
for a view on the whole package## use install.packages(), e.g. install.packages('animation')
You will be asked to choose a mirror.
Too many packages?? Use the task view: http://cran.r-project.org/web/views/ (e.g. Econometrics)
update.packages()
<-
or =
(experts recommend the former but I do not believe them)?Arithmetic
(e.g. x+y
, x %% y
)[]
x <- 10 y <- 15:3 1 + 2 11 %% 2 11 %/% 2 3^4 log(10) y[4] y[-1] # what do negative indices mean?
function_name(arguments)
z = rnorm(100) fivenum # what are the arguments of this function? fivenum(z) fivenum(x = z, na.rm = TRUE) fivenum(z, TRUE)
## the if-else statement if (TRUE) { print(1) } else { print(2) } ## a for-loop (the other type is while-loop) s = 0; x = c(4, 2, 6) for (i in 1:3) { s = s + x[i] } s ## the above loop is the most stupid thing to do in R
read.table()
, read.csv()
, ...## the tips data tips = read.csv('http://dicook.public.iastate.edu/Army/tips.csv') str(tips) # structure of an object; one of the most useful function in R summary(tips) mean(tips$bill) # index by $name var(tips[, 'tip']) # index by character name table(tips[, 4]) # index by column number hist(tips$bill) boxplot(tip~sex, data=tips) plot(tip~bill, data=tips, col=as.integer(tips$sex))
A family of functions for distributions: dfun()
, pfun()
, qfun()
and rfun()
For example, rnorm()
library(animation) demo('fire', ask = FALSE) # an application of image()
Q: How many times do we need to flip the coin until we get a sequence of HTH and HTT respectively? (For example, for the sequence HHTH, the number for HTH to appear is 4, and in THTHTT, the number for HTT is 6.)
A: https://yihui.org/en/2009/08/counterintuitive-results-in-flipping-coins/
Take the tips data for example
fit1 = lm(tip ~ bill, data = tips) summary(fit1)
You get
Call: lm(formula = tip ~ bill, data = tips) Residuals: Min 1Q Median 3Q Max -3.1982 -0.5652 -0.0974 0.4863 3.7434 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.920270 0.159735 5.761 2.53e-08 *** bill 0.105025 0.007365 14.260 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.022 on 242 degrees of freedom Multiple R-squared: 0.4566, Adjusted R-squared: 0.4544 F-statistic: 203.4 on 1 and 242 DF, p-value: < 2.2e-16
The formula is an important component of many R functions for modelling.
fit2 = lm(tip ~ bill + sex, data = tips) # two variables fit3 = lm(tip ~ bill + 0, data = tips) # without intercept ## you try summary() on them
Before we get started, let's see if you can install the gWidgetsRGtk2
package:
install.packages('gWidgetsRGtk2') library(gWidgetsRGtk2) ## follow instructions in case of errors
graphics
packagegrid
packageThere are many add-on packages based on the two systems; see the Graphics task view on CRAN for an overview: http://cran.r-project.org/web/views/Graphics.html
lattice
: Trellis plots
library(lattice)
)ggplot2
: Grammar of Graphics
install.packages('ggplot2')
Last week we mentioned the tips data.
tips = read.csv('http://dicook.public.iastate.edu/Army/tips.csv') str(tips) ## scatter plot: positive correlation with a 'constraint' plot(tip ~ bill, data = tips) ## what is the problem with R's default choice of point shapes? ## plot() is a very 'tricky' function in R; details later hist(tips$tip, main = 'histogram of tips') ## you see nothing except a right-skewed distribution hist(tips$tip, breaks = 30) # more bins hist(tips$tip, breaks = 100) # what do you see now?
20-30 years ago, the research on choosing the histogram binwidth was extremely hot in statistics, but... who cares?
We can change the binwidth interactively in R via many tools; one possibility is to build a GUI
## I prefer gWidgetsRGtk2, but I do not know if you can install it ## gWidgetstcltk is easier to install if (!require('gWidgetstcltk')) install.packages('gWidgetstcltk') library(gWidgetstcltk) # or library(gWidgetsRGtk2) options(guiToolkit = 'tcltk') # or options(guiToolkit = 'RGtk2') x = tips$tip gslider(from = 1, to = 100, by = 1, container = gwindow('Change the number of bins'), handler = function(h, ...) { hist(x, breaks = seq(min(x), max(x), length = svalue(h$obj))) }) ## many many other GUI elements to use
There are many color models in R, like rgb()
, hsv()
, ... And there are built-in color names, e.g. 'red'
, 'purple'
. Here is an example of rgb()
rgb(1, 0, 0) # red (hexidecimal representation) rgb(1, 1, 0) # yellow ## an interactive example g = ggroup(horizontal = FALSE, container = gwindow('Color Mixer')) x = c(0,0,0) # red, green, blue for(i in 1:3) { gslider(from = 0, to = 1, by = 0.05, action = i, handler = function(h, ...) { x[h$action] <<- svalue(h$obj) par(bg = rgb(x[1], x[2], x[3]), mar = rep(0, 4)) plot.new() }, container = g) } colors() # all names you can use plot(rnorm(30), pch = 19, col = sample(colors(), 30), cex = 2)
Old-fashioned but many goodies...
help.start()
and take a look at the graphics
packagecontour()
, filled.contour()
, fourfoldplot()
, mosaicplot()
, pairs()
, smoothScatter()
, stripchart()
, sunflowerplot()
, symbols()
Graphics are not as easy as you might have imagined
rgl
package)Only the source code is real.
x = seq(.1, 10, .1) plot(x, 1/x, type = 'l', lwd = 2) lines(x, 1/x + 2, col = 'red', lwd = 2)
sex
variablesmoker
variablesize
variable)We still use the tips
data here.
library(ggplot2) ## different colors denote the sex variable qplot(bill, tip, data = tips, color = sex) ## point symbols qplot(bill, tip, data = tips, shape = smoker) ## you can manipulate ggplot2 objects p = qplot(bill, tip, data = tips, color = size) p ## do not like the color scheme? change it p + scale_colour_gradient2(low="white", high="blue") ## faceting qplot(tip, data = tips, facets = time ~ day) p + geom_smooth() # smoothing line
Unlike most of R packages, ggplot2 has its own website of documentation, which is a rich source of examples.
As mentioned before, there are many other packages based on the two graphics systems.
animation
: a gallery of statistical animations and tools to export animationsrgl
: interactive 3D plotsiplots
: interactive statistical graphicsrggobi
: connect R with GGobi (a standalone software package for interactive stat graphics)## be prepared install.packages(c('animation', 'rgl', 'iplots'))
The idea is simple:
## rotate the word 'Animation' for (i in 1:360) { plot(1, ann = FALSE, type = "n", axes = FALSE) text(1, 1, "Animation", srt = i, col = rainbow(360)[i], cex = 7 * i/360) Sys.sleep(0.01) }
library(animation) ?brownian.motion ?quincunx ?grad.desc ## export to an HTML page ?saveHTML ## want more hilarious examples? try demo('busybees') demo('CLEvsLAL')
Play with statistical graphics.
## use your mouse (drag or wheel up/down) library(rgl) demo('rgl') ## an artificial dataset library(animation) demo('pollen') ## linked plots library(iplots) ibar(tips$sex) ihist(tips$tip)
parse()
and deparse()
1+1
and I return you 1 + 1
(what the heck is the difference?)well, compare
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
to
for (k in 1:10) { j = cos(sin(k) * k^2) + 3 print(j - 5) }
The function debug
can be used to debug a function.
f = function(x) { m = length(x) x[is.na(x)] = mean(x, na.rm = TRUE) # impute by mean s = sum(x^2) # sum of squares s } f(c(1, NA, 2)) ## begin to debug the function now debug(f) f(c(1, NA, 2)) undebug(f)