Portfolio Investment Research: The R Software Language

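Statistical software and the R language

Have you installed R yet?
- R is free.
- R is open source (it is neither a black box nor a secret).
- R runs on UNIX, Windows and Macintosh.
- R has an excellent built-in help system.
- R has excellent graphics capabilities.
- Students can easily move on to the commercially supported S-Plus program (if they need commercial software).
- The R language has a powerful, easy-to-learn syntax and many built-in statistical functions.
- R is easy to extend with user-written programs, and that is how it has grown.

R is a computer programming language, similar to UNIX shell languages, C, Pascal, GAUSS and so on. Experienced programmers will find it more familiar than other statistics packages. For beginners, learning R makes learning other programming languages later less difficult. The point-and-click packages (SAS, SPSS, etc.) use a completely different syntax.

Drawbacks of R
- There is no commercial support (but there is online support).
- It requires programming; it is not point-and-click.
- It is slower than C++ or FORTRAN.

History of R
The S language was developed at AT&T Bell Laboratories, and the version of S that R most closely follows dates from the late 1980s. The R project was started by Robert Gentleman and Ross Ihaka of the Department of Statistics at the University of Auckland and was released as free software in 1995. It quickly became popular with a wide range of users. It is now maintained by the R Core Team, an international group of developers.

Getting R
Click CRAN to get a list of mirror sites, then click a mirror (for example, Berkeley). Choose the installer file to install R. For packages, choose base.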

Packages
(Each package contains a large amount of data and of functions/programs that you can read, write and modify.)

- base: The R Base Package
- boot: Bootstrap R (S-Plus) Functions (Canty)
- class: Functions for Classification
- cluster: Cluster Analysis Extended Rousseeuw et al.
- concord: Concordance and reliability
- datasets: The R Datasets Package
- exactRankTests: Exact Distributions for Rank and Permutation Tests
- foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ...
- graphics: The R Graphics Package
- grDevices: The R Graphics Devices and Support for Colours and Fonts
- grid: The Grid Graphics Package
- KernSmooth: Functions for kernel smoothing for Wand & Jones (1995)
- lattice: Lattice Graphics Interface
- tools: Tools for Package Development
- utils: The R Utils Package

Packages (continued)

- MASS: Main Package of Venables and Ripley's MASS
- methods: Formal Methods and Classes
- mgcv: GAMs with GCV smoothness estimation and GAMMs by REML/PQL
- multtest: Resampling-based multiple hypothesis testing
- nlme: Linear and nonlinear mixed effects models
- nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
- nortest: Tests for Normality
- outliers: Tests for outliers
- pls: Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR)
- pls.pcr: PLS and PCR functions
- rpart: Recursive Partitioning
- SAGx: Statistical Analysis of the GeneChip
- sma: Statistical Microarray Analysis
- spatial: Functions for Kriging and Point Pattern Analysis
- splines: Regression Spline Functions and Classes
- stats: The R Stats Package
- stats4: Statistical Functions using S4 Classes
- survival: Survival analysis, including penalised likelihood
- tcltk: Tcl/Tk Interface

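Packages (online)
Many more packages are available online. All of them are built on top of the base and stats packages. Base and stats contain the built-in functionality and data, while the other packages contain methods and data developed by individual statisticians. We hope you will be one of the next authors to contribute a package.

Assignment and arithmetic
    z = rnorm(1000000, 4, 0.1)
    median(z)
Assignment: "=" can be replaced by "<-", and "->" assigns to the right:
    x <- z -> y -> w     # x, y and w all receive the value of z

Simple arithmetic operators: +, -, *, /, ^, %*% (matrix product), %% (mod), %/% (integer division), and so on.
Common mathematical functions: abs, sign, log, log2, log10, logb, expm1, log1p(x), sqrt, exp, sin, cos, tan, acos, asin, atan, cosh, sinh, tanh. There are also round, floor, ceiling, gamma, lgamma, digamma and trigamma.
Summaries and sets: sum, prod, cumsum, cumprod, max, min, cummax, cummin, pmax, pmin, range, mean, length, var, duplicated, unique, union, intersect, setdiff.
Comparison and logical operators: >, >=, <, <=, &, |, !
Input and output of data: scan, read.table, dump, save, load, write, write.table, read.csv.
Built-in constants: letters, LETTERS.
Building and rearranging data: list, matrix, array, cbind, rbind, merge, sort, order, sort.list, rev, stack, unstack, reshape.

Sequences and vectors
    z = seq(-1, 10, length = 100)
    z = seq(-1, 10, len = 100)
    z = seq(10, -1, -1)
    z = 10:-1
    x = rep(1:3, 3)
    x = rep(3:5, 1:3)
    > x
    [1] 3 4 4 5 5 5
    x = rep(c(1, 10), c(4, 5))
    w = c(1, 3, x, z); w[3]

Distributions and random numbers
Normal distribution:
    pnorm(1.2, 2, 1); dnorm(1.2, 2, 1); qnorm(.7, 2, 1); rnorm(10, 0, 1)   # same distribution as rnorm(10)
t distribution:
    pt(1.2, 1); dt(1.2, 2); qt(.7, 1); rt(10, 1)
R also provides many other distributions, including the exponential, F, chi-squared, beta, binomial, Cauchy, gamma, geometric, hypergeometric, log-normal, logistic, negative binomial, Poisson, uniform, Weibull and Wilcoxon. The arguments can be vectors!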
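The following sketch illustrates R's naming convention for distributions. The prefixes d, p, q and r give the density, the distribution function, the quantile function and random draws. It checks numerically that qnorm() and pnorm() undo each other when given a whole vector of probabilities. The probabilities and the seed are arbitrary illustrative choices.

```r
# d = density, p = cumulative probability, q = quantile (inverse of p), r = random draws
p <- c(0.1, 0.5, 0.7, 0.975)          # a vector of probabilities
q <- qnorm(p, mean = 2, sd = 1)       # one quantile of N(2, 1) per probability
back <- pnorm(q, mean = 2, sd = 1)    # the CDF maps the quantiles back to p
print(round(q, 4))
stopifnot(isTRUE(all.equal(back, p)))
stopifnot(isTRUE(all.equal(qnorm(0.5, 2, 1), 2)))                # median = mean
stopifnot(isTRUE(all.equal(dnorm(2, 2, 1), 1 / sqrt(2 * pi))))  # peak density when sd = 1
# The same pattern holds for other families, e.g. the t distribution
stopifnot(isTRUE(all.equal(pt(qt(0.7, df = 1), df = 1), 0.7)))
# set.seed() makes random draws reproducible
set.seed(1); a <- rnorm(5)
set.seed(1); b <- rnorm(5)
stopifnot(identical(a, b))
```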

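Vector arithmetic
The shorter vector is recycled to the length of the longer one. R warns when the longer length is not a multiple of the shorter one.
    x = rep(0, 10); z = 1:3; x + z
     [1] 1 2 3 1 2 3 1 2 3 1
    Warning message:
    In x + z : longer object length is not a multiple of shorter object length
    x * z
     [1] 0 0 0 0 0 0 0 0 0 0
    Warning message:
    In x * z : longer object length is not a multiple of shorter object length
    rev(x)
    z = c("no cat", "has ", "nine", "tails")
    z[1] == "no cat"
    [1] TRUE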
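The recycling rule can be checked directly. This sketch uses tryCatch() to confirm that R raises a warning when 10 is not a multiple of 3. It also shows that no warning occurs when the lengths divide evenly. The variable names are illustrative only.

```r
x <- rep(0, 10)
z <- 1:3
# Length 10 is not a multiple of 3: z is recycled and R warns
warned <- tryCatch({ x + z; FALSE }, warning = function(w) TRUE)
stopifnot(warned)
res <- suppressWarnings(x + z)
stopifnot(identical(res, c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)))
# Length 6 is a multiple of 3: recycling happens silently
y <- rep(0, 6) + z
stopifnot(identical(y, c(1, 2, 3, 1, 2, 3)))
# A scalar is a length-1 vector, so x + 1 adds 1 to every element
stopifnot(all(x + 1 == 1))
```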

Vector names and append
    x = 1:3; names(x) = LETTERS[1:3]
    x
    A B C
    1 2 3
    append(x, runif(3), after = 2)
            A         B                                   C
    1.0000000 2.0000000 0.3107987 0.7505149 0.5752226 3.0000000

Assigning to vector elements
    z = 1:5
    z[7] = 8; z
    [1]  1  2  3  4  5 NA  8
    z = NULL
    z[c(1, 3, 5)] = 1:3; z
    [1]  1 NA  2 NA  3
    rnorm(10)[c(2, 5)]
    z[-c(1, 3)]                     # drop elements 1 and 3
    z[(length(z) - 4):length(z)]    # the last five elements

Sorting and ordering vectors
    z = sample(1:100, 10); z        # compare with sampling with replacement: sample(1:100, 10, rep = T)
     [1] 75 68 28 42 17 21 96 34 69 47
    order(z)
     [1]  5  6  3  8  4 10  2  9  1  7
    z[order(z)]
     [1] 17 21 28 34 42 47 68 69 75 96
    sort(z)
     [1] 17 21 28 34 42 47 68 69 75 96
    which(z == max(z))              # gives the index

Matrix
    x = matrix(runif(20), 4, 5)
    > x
              [,1]       [,2]       [,3]      [,4]       [,5]
    [1,] 0.7983678 0.04607601 0.04555323 0.8594483 0.73089500
    [2,] 0.6559851 0.79562222 0.02948270 0.1453364 0.79552838
    [3,] 0.6759171 0.56193147 0.48286653 0.2419931 0.56069988
    [4,] 0.1183701 0.80652627 0.49405167 0.6523137 0.08345406
    > x = matrix(1:20, 4, 5); x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    > x = matrix(1:20, 4, 5, byrow = T); x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    2    3    4    5
    [2,]    6    7    8    9   10
    [3,]   11   12   13   14   15
    [4,]   16   17   18   19   20
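The two matrix(1:20, 4, 5) calls differ only in fill order. This minimal sketch makes that concrete. By default R fills down the columns (column-major order), while byrow = TRUE fills across the rows.

```r
a <- matrix(1:20, nrow = 4, ncol = 5)                 # filled column by column
b <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)   # filled row by row
stopifnot(a[1, 2] == 5)   # column 1 holds 1:4, so row 1 of column 2 is 5
stopifnot(b[1, 2] == 2)   # row 1 holds 1:5
# Filling by row equals transposing a column-filled matrix with swapped dimensions
stopifnot(identical(b, t(matrix(1:20, nrow = 5, ncol = 4))))
# as.vector() reads the elements back in column-major order
stopifnot(identical(as.vector(a), 1:20))
```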

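Some simple functions
max, min, length, mean, median, fivenum, quantile, unique, sd, var, range, rep, diff, sort, order, sum, cumsum, prod, cumprod, rev, print, sample, seq, exp, pi

Rows and columns of a matrix (subsets)
    nrow(x); ncol(x); dim(x)    # numbers of rows and columns
    x = matrix(rnorm(24), 4, 6)
    x[c(2, 1), ]                # rows 2 and 1
    x[, c(1, 3)]                # columns 1 and 3
    x[2, 1]                     # element [2,1]
    x[x[, 1] > 0, 1]            # elements of column 1 that are greater than 0
    sum(x[, 1] > 0)             # number of elements of column 1 greater than 0
    sum(x[, 1] <= 0)            # number of elements of column 1 not greater than 0
    x[, -c(1, 3)]               # x without columns 1 and 3
    x[-2, -c(1, 3)]             # x without row 2 and without columns 1 and 3

Subsets of matrices and vectors
    x[x[, 1] > 0 & x[, 3] <= 1, 1]    # elements of column 1 that are > 0 and whose column-3 entry is <= 1 ("and")
    x[x[, 2] > 0 | x[, 1] < .51, 1]   # elements of column 1 that are < .51 or whose column-2 entry is > 0 ("or")
    x[!x[, 2] < .51, 1]               # elements of column 1 whose column-2 entry is not < .51 ("not")

Logical operators: >, <, ==, <=, >=, !=; &, |, !
    x = rnorm(10)
    all(x > 0); all(x != 0); any(x > 0); (1:10)[x > 0]
    x = sample(1:7, 5, rep = T); unique(x)

Transpose and inverse of a matrix
    x = matrix(runif(9), 3, 3); x
              [,1]      [,2]      [,3]
    [1,] 0.6747652 0.9954731 0.7524502
    [2,] 0.3090199 0.2390141 0.2472961
    [3,] 0.5102675 0.9515505 0.6082803
    t(x)
              [,1]      [,2]      [,3]
    [1,] 0.6747652 0.3090199 0.5102675
    [2,] 0.9954731 0.2390141 0.9515505
    [3,] 0.7524502 0.2472961 0.6082803
    solve(x)      # solve(a, b) solves the equation a %*% x = b
               [,1]       [,2]       [,3]
    [1,] -12.313293  15.125819   9.082300
    [2,]  -8.459725   3.627898   8.989864
    [3,]  23.563034 -18.363808 -20.037986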
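The solve(a, b) form mentioned above can be checked on a small system. This sketch uses a hypothetical 3 x 3 matrix A and right-hand side b. It solves A %*% x = b and confirms the answer by multiplying back. It also checks agreement with the explicit-inverse route, solve(A) %*% b. Calling solve(A, b) directly is generally the numerically preferred option.

```r
# A hypothetical invertible system (det(A) = -1)
A <- matrix(c(2, 1, 1,
              1, 3, 2,
              1, 0, 0), nrow = 3, byrow = TRUE)
b <- c(4, 5, 6)
x <- solve(A, b)                                       # solves A %*% x = b
stopifnot(isTRUE(all.equal(x, c(6, 15, -23))))
stopifnot(isTRUE(all.equal(as.vector(A %*% x), b)))   # multiplying back recovers b
# Same answer via the explicit inverse, up to rounding
stopifnot(isTRUE(all.equal(x, as.vector(solve(A) %*% b))))
# t() swaps rows and columns
stopifnot(identical(dim(t(matrix(1:6, 2, 3))), c(3L, 2L)))
```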

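Warning: what is "zero" on a computer?
    x %*% solve(x)
                  [,1]          [,2]          [,3]
    [1,]  1.000000e+00 -9.454243e-17 -3.911801e-16
    [2,]  5.494737e-16  1.000000e+00  3.248270e-16
    [3,] -3.018419e-16  1.804980e-15  1.000000e+00
In exact arithmetic this product is the identity matrix. The off-diagonal entries are tiny nonzero numbers left over from floating-point rounding. Questions such as how many eigenvalues are nonzero must therefore be settled with linear algebra, for example by comparing against a tolerance. If v is the vector of eigenvalues, you cannot count the nonzero eigenvalues with something like sum(v != 0)!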
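Here is a minimal illustration of the warning above. It uses a hypothetical symmetric matrix of rank 2, so exactly one eigenvalue is zero in exact arithmetic. It counts the nonzero eigenvalues against a tolerance rather than with v != 0. The relative tolerance of 1e-10 is an assumed choice for this example, not a universal constant.

```r
# Column 3 = column 1 + column 2, so X has rank 2; crossprod(X) = t(X) %*% X keeps rank 2
X <- cbind(c(1, 2, 3), c(4, 5, 6), c(5, 7, 9))
m <- crossprod(X)
v <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
print(v)                            # the last value is tiny but usually not exactly 0
tol <- max(abs(v)) * 1e-10          # assumed relative tolerance
stopifnot(sum(abs(v) > tol) == 2)   # two eigenvalues count as nonzero
stopifnot(qr(m)$rank == 2)          # qr() applies its own tolerance (1e-7 by default)
```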

Matrix & Array
    x = array(runif(20), c(4, 5)); x
              [,1]      [,2]        [,3]      [,4]      [,5]
    [1,] 0.5474306 0.2362356 0.687007107 0.4036998 0.5255839
    [2,] 0.8234363 0.4922711 0.960554564 0.4704976 0.1327870
    [3,] 0.1861151 0.8461655 0.390523424 0.2202575 0.4057607
    [4,] 0.8117521 0.5375946 0.004505845 0.4821567 0.7644741
    is.matrix(x)
    [1] TRUE
    x[1, 2]; x[1, ]; x[, 2]
    dim(x)     # gives the dimensions (4, 5)

Array
    x = array(runif(24), c(4, 3, 2))
    is.matrix(x)     # the dimensions (4, 3, 2) can be obtained with dim(x)
    [1] FALSE
    x
    , , 1
              [,1]      [,2]        [,3]
    [1,] 0.3512615 0.7270611 0.009055522
    [2,] 0.1444965 0.2527673 0.697977027
    [3,] 0.6658176 0.6638542 0.773747542
    [4,] 0.4258436 0.4168940 0.634235148
    , , 2
              [,1]      [,2]      [,3]
    [1,] 0.3664152 0.9633497 0.5628006
    [2,] 0.3466645 0.5036830 0.1542986
    [3,] 0.4552553 0.1289775 0.8423017
    [4,] 0.1074899 0.3841463 0.7648297

Subsets of an array
    > x = array(1:24, c(4, 3, 2))
    x[c(1, 3), , ]
    , , 1
         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]    3    7   11
    , , 2
         [,1] [,2] [,3]
    [1,]   13   17   21
    [2,]   15   19   23
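The following sketch uses the same array(1:24, c(4, 3, 2)) to show which dimensions survive subsetting. It shows how x[c(1, 3), , ] keeps all three dimensions. It also shows how a single layer drops to an ordinary matrix unless drop = FALSE is given.

```r
x <- array(1:24, dim = c(4, 3, 2))
s <- x[c(1, 3), , ]                 # rows 1 and 3 of every layer
stopifnot(identical(dim(s), c(2L, 3L, 2L)))
stopifnot(s[2, 3, 2] == 23)         # this is x[3, 3, 2]
layer2 <- x[, , 2]                  # one layer becomes a plain 4 x 3 matrix
stopifnot(is.matrix(layer2), identical(dim(layer2), c(4L, 3L)))
# drop = FALSE keeps the third dimension even though it now has length 1
stopifnot(identical(dim(x[, , 2, drop = FALSE]), c(4L, 3L, 1L)))
```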

Matrix multiplication and row/column operations
    x = matrix(1:30, 5, 6); y = matrix(rnorm(20), 4, 5)
    y %*% x
               [,1]        [,2]       [,3]       [,4]        [,5]        [,6]
    [1,]  -3.231808 -8.13791204 -13.044017 -17.950121  -22.856225  -27.762330
    [2,] -14.072030 -39.33640851 -64.600787 -89.865165 -115.129543 -140.393921
    [3,]  -1.750057 -0.02764783   1.694761   3.417170    5.139578    6.861987
    [4,]   5.862412  9.78064218  13.698872  17.617103   21.535333   25.453563
    apply(x, 1, mean)
    [1] 13.5 14.5 15.5 16.5 17.5
    apply(x, 2, sum)
    [1]  15  40  65  90 115 140
    apply(x, 2, prod)
    [1]      120    30240   360360  1860480  6375600 17100720
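This sketch shows how MARGIN works in apply(): 1 means row by row, 2 means column by column. It checks the results against the specialised rowMeans() and colSums(), which give the same answers. The anonymous function at the end is an illustrative example of passing your own function.

```r
x <- matrix(1:30, nrow = 5, ncol = 6)
stopifnot(isTRUE(all.equal(apply(x, 1, mean), rowMeans(x))))   # MARGIN = 1: rows
stopifnot(isTRUE(all.equal(apply(x, 2, sum), colSums(x))))     # MARGIN = 2: columns
stopifnot(identical(apply(x, 2, sum), c(15L, 40L, 65L, 90L, 115L, 140L)))
# Any function can be applied, e.g. the spread of each column
rng <- apply(x, 2, function(col) max(col) - min(col))
stopifnot(all(rng == 4))
```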

Operations over the dimensions of an array
    x = array(1:24, c(4, 3, 2))
    apply(x, 1, mean)
    [1] 11 12 13 14
    apply(x, 1:2, sum)
         [,1] [,2] [,3]
    [1,]   14   22   30
    [2,]   16   24   32
    [3,]   18   26   34
    [4,]   20   28   36
    apply(x, c(1, 3), prod)
         [,1] [,2]
    [1,]   45 4641
    [2,]  120 5544
    [3,]  231 6555
    [4,]  384 7680

Operations between matrices and vectors
    # here x is the matrix x = matrix(1:30, 5, 6) used above
    sweep(x, 1, 1:5, "*")
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    6   11   16   21   26
    [2,]    4   14   24   34   44   54
    [3,]    9   24   39   54   69   84
    [4,]   16   36   56   76   96  116
    [5,]   25   50   75  100  125  150
    x * 1:5      # the same result, through recycling
    sweep(x, 2, 1:6, "+")
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    2    8   14   20   26   32
    [2,]    3    9   15   21   27   33
    [3,]    4   10   16   22   28   34
    [4,]    5   11   17   23   29   35
    [5,]    6   12   18   24   30   36

scale(x): standardizes x column by column, subtracting each column's mean and dividing by its standard deviation. Each column of the result has zero mean and unit standard deviation; a vector is treated as a one-column matrix. It also works with data frames (column-wise, and only with numeric data!).
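The following sketch uses a small hypothetical two-column matrix to check what scale() does. It verifies that every column ends up with mean 0 and standard deviation 1. It also rebuilds the same result with two sweep() calls, which is the "column by column" description made explicit.

```r
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), ncol = 2)    # hypothetical data
z <- scale(x)
stopifnot(isTRUE(all.equal(colMeans(z), c(0, 0))))
stopifnot(isTRUE(all.equal(apply(z, 2, sd), c(1, 1))))
# The same computation with sweep(): centre each column, then divide by its sd
centred <- sweep(x, 2, colMeans(x), "-")
manual  <- sweep(centred, 2, apply(x, 2, sd), "/")
stopifnot(isTRUE(all.equal(manual, z, check.attributes = FALSE)))
# scale() records the column means it subtracted (here 2.5 and 25)
print(attr(z, "scaled:center"))
```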

Miscellaneous
update.packages() updates the installed packages. complete.cases() and na.omit() deal with missing values. save.image() saves the workspace to the file .RData.
    update.packages()
    algae <- na.omit(algae)     # drop the rows that contain missing values
    save.image()

Function example
    # standard error of the mean
    se <- function(x) {
      v <- var(x)
      n <- length(x)
      return(sqrt(v / n))
    }

Function example
    # summary statistics that ignore NAs; more = TRUE adds skewness and excess kurtosis
    basic.stats <- function(x, more = FALSE) {
      stats <- list()
      clean.x <- x[!is.na(x)]
      stats$n <- length(x)
      stats$nNAs <- stats$n - length(clean.x)
      stats$mean <- mean(clean.x)
      stats$std <- sd(clean.x)
      stats$med <- median(clean.x)
      if (more) {
        stats$skew <- sum(((clean.x - stats$mean) / stats$std)^3) / length(clean.x)
        stats$kurt <- sum(((clean.x - stats$mean) / stats$std)^4) / length(clean.x) - 3
      }
      stats
    }
    basic.stats(c(45, 2, 4, 46, 43, 65, NA, 6, -213, -3, -45))

Operations between arrays and matrices/vectors/arrays
    z = array(1:24, c(2, 3, 4))   # note the order in which the elements are filled
    z
    , , 1
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    , , 2
         [,1] [,2] [,3]
    [1,]    7    9   11
    [2,]    8   10   12
    , , 3
         [,1] [,2] [,3]
    [1,]   13   15   17
    [2,]   14   16   18
    , , 4
         [,1] [,2] [,3]
    [1,]   19   21   23
    [2,]   20   22   24

Operations between arrays and matrices/vectors/arrays
    sweep(z, 1, 1:2, "-")
    , , 1
         [,1] [,2] [,3]
    [1,]    0    2    4
    [2,]    0    2    4
    , , 2
         [,1] [,2] [,3]
    [1,]    6    8   10
    [2,]    6    8   10
    , , 3
         [,1] [,2] [,3]
    [1,]   12   14   16
    [2,]   12   14   16
    , , 4
         [,1] [,2] [,3]
    [1,]   18   20   22
    [2,]   18   20   22

Operations between arrays and matrices/vectors/arrays
    sweep(z, c(1, 2), matrix(1:6, 2, 3), "-")
    , , 1
         [,1] [,2] [,3]
    [1,]    0    0    0
    [2,]    0    0    0
    , , 2
         [,1] [,2] [,3]
    [1,]    6    6    6
    [2,]    6    6    6
    , , 3
         [,1] [,2] [,3]
    [1,]   12   12   12
    [2,]   12   12   12
    , , 4
         [,1] [,2] [,3]
    [1,]   18   18   18
    [2,]   18   18   18
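The sketch below repeats the two sweep() calls on z as checks. The margin argument names the dimensions that STATS lines up with. For margin 1 on this array, ordinary recycling (z - 1:2) happens to give the same result, because the first dimension has length 2. The last call, dividing each layer by its maximum, is an extra illustration of margin 3.

```r
z <- array(1:24, dim = c(2, 3, 4))
a <- sweep(z, 1, 1:2, "-")          # subtract 1 from row 1, 2 from row 2, in every layer
stopifnot(all(a == z - 1:2))        # recycling matches here since dim 1 has length 2
stopifnot(all(a[, , 1] == matrix(c(0, 0, 2, 2, 4, 4), 2, 3)))
b <- sweep(z, c(1, 2), matrix(1:6, 2, 3), "-")   # same 2 x 3 matrix off every layer
for (k in 1:4) stopifnot(all(b[, , k] == 6 * (k - 1)))
d <- sweep(z, 3, apply(z, 3, max), "/")           # one statistic per layer
stopifnot(all(apply(d, 3, max) == 1))
```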

Outer products (producing a matrix or an array)
    outer(1:2, rep(1, 2))
         [,1] [,2]
    [1,]    1    1
    [2,]    2    2
    outer(1:2, matrix(rep(1, 6), 3, 2))
    , , 1
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    , , 2
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2

List (a set of objects)
A list can be a collection of any objects (including lists).
    z = list(1:3, Tom = c(1:2, a = list("R", letters[1:5]), w = "hi!"))
    z[[1]]; z[[2]]; z$T; z$T$a2; z$T[[3]]; z$T$w
    attributes(z)     # attributes!
    $names
    [1] ""    "Tom"
    attributes(matrix(1:6, 2, 3))
    $dim
    [1] 2 3
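This sketch uses the list z defined above to check three behaviours. First, [[ ]] extracts an element, while [ ] returns a sub-list. Second, c() applied to a vector plus a list yields a flat list with names such as a1 and a2. Third, $ matches names partially, which is why z$T finds Tom, whereas [[ ]] is exact by default.

```r
z <- list(1:3, Tom = c(1:2, a = list("R", letters[1:5]), w = "hi!"))
stopifnot(identical(z[[1]], 1:3))            # [[ ]] extracts the element itself
stopifnot(is.list(z[1]), length(z[1]) == 1)  # [ ] returns a list of length 1
# c() of a vector and a list gives a flat list: 1, 2, a1, a2, w
stopifnot(identical(names(z$Tom), c("", "", "a1", "a2", "w")))
stopifnot(identical(z$Tom$a2, letters[1:5]))
# $ matches partially, [[ ]] matches exactly by default
stopifnot(!is.null(z$T), is.null(z[["T"]]))
```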
Matrices, arrays and their dimension names
    x = matrix(1:12, nrow = 3, dimnames = list(c("I", "II", "III"), paste("X", 1:4, sep = "")))
    x
        X1 X2 X3 X4
    I    1  4  7 10
    II   2  5  8 11
    III  3  6  9 12
    y = array(1:12, c(3, 2, 2), dimnames = list(c("I", "II", "III"), paste("X", 1:2, sep = ""), paste("Y", 1:2, sep = "")))
    y
    , , Y1
        X1 X2
    I    1  4
    II   2  5
    III  3  6
    , , Y2
        X1 X2
    I    7 10
    II   8 11
    III  9 12

data.frame
    x = matrix(1:6, 2, 3)
    x = as.data.frame(x); x
      V1 V2 V3
    1  1  3  5
    2  2  4  6
    x$V2
    [1] 3 4
    attributes(x)
    $names
    [1] "V1" "V2" "V3"
    $row.names
    [1] "1" "2"
    $class
    [1] "data.frame"

data.frame
           TOYOTA GM HUNDA
    2001        1  3     5
    2002        2  4     6
    names(x) = c("TOYOTA", "GM", "HUNDA")
    row.names(x) = c("2001", "2002")
    x
    x$GM
    [1] 3 4

data.frame
    attach(x)
    GM
    [1] 3 4
    detach(x)
    GM
    Error: object 'GM' not found
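The error after detach(x) shows that attach() temporarily changes the search path. This minimal sketch rebuilds the same data frame in one call and uses with() instead. with() evaluates an expression inside the data frame without adding anything to the search path. The total column sum is only an illustrative computation.

```r
x <- data.frame(TOYOTA = 1:2, GM = 3:4, HUNDA = 5:6, row.names = c("2001", "2002"))
stopifnot(identical(x$GM, 3:4))
stopifnot(identical(x["2002", "GM"], 4L))
# with() sees the columns by name only while the expression is evaluated
total <- with(x, TOYOTA + GM + HUNDA)
stopifnot(identical(total, c(9L, 12L)))
# nothing was attached, so GM is not visible on its own
stopifnot(!exists("GM"))
```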

Entering and editing data by hand
Type the data in directly, as in x = c(1, 2, 7, 8, …), or use x = scan() and type
    1 2 7 8 ….
(press Enter twice to finish). fix(x) opens an editor in which you can modify the data.

Categorical data
A survey asks people if they smoke or not. The data is: Yes, No, No, Yes, Yes.
    x = c("Yes", "No", "No", "Yes", "Yes")
    table(x); x
    factor(x)

Barplot: Suppose a group of 25 people are surveyed as to their beer-drinking preference. The categories were (1) domestic can, (2) domestic bottle, (3) microbrew and (4) import. The raw data is
    3 4 1 2 3 2 3 1 1 1 1 4 3 1
    beer = scan()
    3 4 1 2 3 2 3 1 1 1 1 4 3 1
    barplot(beer)                        # this isn't correct
    barplot(table(beer))                 # yes, call with summarized data
    barplot(table(beer) / length(beer))  # divide by n for proportions
    table(beer) / length(beer)

CEO salaries: Suppose CEO yearly compensations are sampled and the following are found (in millions). (This is before being indicted for cooking the books.)
    12 .4 5 2 50 8 3 1 4 0.25
    sals = scan()     # read in with scan
    12 .4 5 2 50 8 3 1 4 0.25
    mean(sals); var(sals); sd(sals); median(sals)
    fivenum(sals)     # min, lower hinge, median, upper hinge, max
    summary(sals)
    data = c(10, 17, 18, 25, 28, 28)
    summary(data); quantile(data, .25)
    quantile(data, c(.25, .75))
    sort(sals); fivenum(sals); summary(sals)
    mean(sals, trim = 1/10); mean(sals, trim = 2/10)
    IQR(sals)
MAD: median(|X_i - median(X)|) x 1.4826
    mad(sals)
    median(abs(sals - median(sals)))            # without the constant
    median(abs(sals - median(sals))) * 1.4826   # equals mad(sals)
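The robust summaries above can be checked on the salary data. This sketch confirms that mad() is the median absolute deviation times 1.4826. It also shows that with n = 10, trim = 0.1 drops exactly the smallest and the largest value. The final line compares the mean, which is pulled up by the single 50, with the trimmed mean and the median.

```r
sals <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, .25)
raw_mad <- median(abs(sals - median(sals)))
stopifnot(isTRUE(all.equal(raw_mad, 2.8)))
stopifnot(isTRUE(all.equal(mad(sals), raw_mad * 1.4826)))
# 10% trimming with n = 10 removes one value from each end
stopifnot(isTRUE(all.equal(mean(sals, trim = 0.1), mean(sort(sals)[2:9]))))
c(mean = mean(sals), trimmed = mean(sals, trim = 0.1), median = median(sals))
```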

Stem-and-leaf charts
Suppose you have the box score of a basketball game and the following points per game for players on both teams:
    2 3 2 0 0 0 6 28 31 14 4 8 2 5
    scores = scan()
    2 3 2 0 0 0 6 28 31 14 4 8 2 5
    apropos("stem")   # returns a character vector giving the names of all objects in the search list matching 'what'
For example:
    > apropos("stem")
    [1] "stem"        "system"      "system.file" "system.time"
See also find("stem").
    stem(scores); stem(scores, scale = 2)

The salaries could be placed into broad categories of 0-1 million, 1-5 million and over 5 million. To do this in R, one uses the cut() function and the table() function. Suppose the salaries are again
    12 .4 5 2 50 8 3 1 4 .25
and we want to break that data into the intervals (0, 1], (1, 5] and (5, 50]:
    sals = c(12, .4, 5, 2, 50, 8, 3, 1, 4, .25)    # enter data
    cats = cut(sals, breaks = c(0, 1, 5, max(sals)))   # the breaks
    cats                                              # view the values
    table(cats)                                       # organize
    levels(cats) = c("poor", "rich", "rolling in it")
    table(cats)

Histograms: Suppose the top 25 ranked movies made the following gross receipts for a week:
    29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1
    x = scan()
    29.6 28.2 19.6 13.7 13.0 7.8 3.4 2.0 1.9 1.0 0.7 0.4 0.4 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.1 0.1
    hist(x)                                # frequencies
    hist(x, probability = TRUE)            # proportions (or probabilities)
    rug(jitter(x))                         # add tick marks
    hist(x, breaks = 10)                   # 10 breaks, or just hist(x, 10)
    hist(x, breaks = c(0, 1, 2, 3, 4, 5, 10, 20, max(x)))   # breaks
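cut() with table() and hist() with the same breaks count the same intervals. This sketch checks that on the salary data. hist(..., plot = FALSE) returns the counts without drawing anything, which is a convenient way to inspect a histogram's bins.

```r
sals <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, .25)
brks <- c(0, 1, 5, max(sals))
cats <- cut(sals, breaks = brks)     # right-closed intervals by default
stopifnot(identical(levels(cats), c("(0,1]", "(1,5]", "(5,50]")))
stopifnot(all(as.vector(table(cats)) == c(3, 4, 3)))
h <- hist(sals, breaks = brks, plot = FALSE)   # compute only, no plot
stopifnot(all(h$counts == c(3, 4, 3)))
```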

Frequency polygons:
    x = c(.314, .289, .282, .279, .275, .267, .266, .265, .256, .250, .249, .211, .161)
    tmp = hist(x)     # store the results
    lines(c(min(tmp$breaks), tmp$mids, max(tmp$breaks)), c(0, tmp$counts, 0), type = "l")
    data(faithful)
    attach(faithful)                  # make eruptions visible
    hist(eruptions, 15, prob = T)     # proportions, not frequencies
    lines(density(eruptions))         # lines makes a curve, default bandwidth

Handling bivariate categorical data:

    Smokes   Amount of studying
    Y        less than 5 hours
    N        5 - 10 hours
    N        5 - 10 hours
    Y        more than 10 hours
    N        more than 10 hours
    Y        less than 5 hours
    Y        5 - 10 hours
    Y        less than 5 hours
    N        more than 10 hours
    Y        5 - 10 hours

    library(MASS)
    quine
    attach(quine)
    table(Age)
    table(Sex, Age); tab = xtabs(~ Sex + Age, quine); unclass(tab)
    tapply(Days, Age, mean)
    tapply(Days, list(Sex, Age), mean)
See also the apply family: apply, sapply, tapply, lapply.

    smokes = c("Y", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
    amount = c(1, 2, 2, 3, 3, 1, 2, 1, 3, 2)
    table(smokes, amount)
    tmp = table(smokes, amount)    # store the table
    options(digits = 3)            # only print 3 significant digits
    prop.table(tmp, 1)             # the rows sum to 1 now
    prop.table(tmp, 2)             # the columns sum to 1 now
prop.table(x, margin) is really sweep(x, margin, margin.table(x, margin), "/").
    prop.table(tmp)                # all the numbers sum to 1
    options(digits = 7)            # restore the number of digits
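This sketch uses the smokes/amount table to confirm the identity just stated. prop.table(tmp, 1) equals dividing each row by its row total with sweep() and margin.table(). The two specific cell counts checked are read off the data vectors.

```r
smokes <- c("Y", "N", "N", "Y", "N", "Y", "Y", "Y", "N", "Y")
amount <- c(1, 2, 2, 3, 3, 1, 2, 1, 3, 2)
tmp <- table(smokes, amount)
stopifnot(tmp["Y", "1"] == 3, tmp["N", "3"] == 2)
rows <- prop.table(tmp, 1)                               # each row sums to 1
stopifnot(isTRUE(all.equal(as.vector(rowSums(rows)), c(1, 1))))
manual <- sweep(tmp, 1, margin.table(tmp, 1), "/")      # the same computation by hand
stopifnot(isTRUE(all.equal(as.vector(rows), as.vector(manual))))
stopifnot(isTRUE(all.equal(sum(prop.table(tmp)), 1)))   # whole table sums to 1
```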

Plotting tabular data
    barplot(table(smokes, amount))
    barplot(table(amount, smokes))
    smokes = factor(smokes)     # for names
    barplot(table(smokes, amount), beside = TRUE, legend.text = T)
    barplot(table(amount, smokes), main = "table(amount,smokes)", beside = TRUE,
            legend.text = c("less than 5", "5-10", "more than 10"))

Categorical vs. numerical: A simple example might be in a drug test, where you have data (in suitable units) for an experimental group and for a control group.
    experimental: 5 5 5 13 7 11 11 9 8 9
    control:      11 8 4 5 9 5 10 5 4 10
    x = c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)
    y = c(11, 8, 4, 5, 9, 5, 10, 5, 4, 10)
    boxplot(x, y)
    amount = scan()
    5 5 5 13 7 11 11 9 8 9 11 8 4 5 9 5 10 5 4 10
    category = scan()
    1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
    boxplot(amount ~ category)   # note the tilde ~
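The formula form amount ~ category splits one numeric vector by a grouping vector. This sketch computes the boxplot statistics without drawing (plot = FALSE) and checks that the medians match tapply(). It also shows that split() reproduces the experimental group x from above.

```r
amount   <- c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9, 11, 8, 4, 5, 9, 5, 10, 5, 4, 10)
category <- rep(1:2, each = 10)
b <- boxplot(amount ~ category, plot = FALSE)   # statistics only, no plot
meds <- tapply(amount, category, median)
stopifnot(isTRUE(all.equal(b$stats[3, ], as.vector(meds))))   # row 3 holds the medians
stopifnot(isTRUE(all.equal(as.vector(meds), c(8.5, 6.5))))
groups <- split(amount, category)
stopifnot(identical(groups[["1"]], c(5, 5, 5, 13, 7, 11, 11, 9, 8, 9)))
```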

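Reading ASCII data from a text file
    x = scan("f:\\book\\1.txt")
This reads the numbers in the text file line by line into one vector. If the data were originally laid out as a 4 x 5 matrix, add
    x = matrix(x, 4, 5, byrow = T)
or do it in one step:
    x = matrix(scan("f:\\book\\1.txt"), 4, 5, b = T)
If the data were originally a 4 x 5 data frame with column names, use
    x = read.table("f:\\book\\ww.txt", header = T)
    x
    GM VW HUNDA
     6  7     8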
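Here is a self-contained version of the file-reading steps above. It writes a small hypothetical file to tempfile() instead of a fixed path like f:\book\ww.txt. The second data row "3 4 5" is made up for illustration. The sketch reads the file back with read.table(header = TRUE) and with scan() plus matrix(byrow = TRUE).

```r
f <- tempfile(fileext = ".txt")                     # temporary file, no hard-coded path
writeLines(c("GM VW HUNDA", "6 7 8", "3 4 5"), f)   # hypothetical contents
df <- read.table(f, header = TRUE)
stopifnot(identical(names(df), c("GM", "VW", "HUNDA")), nrow(df) == 2)
# scan() returns all numbers as one vector (skip the header line)
v <- scan(f, skip = 1, quiet = TRUE)
m <- matrix(v, nrow = 2, byrow = TRUE)              # refill row by row
stopifnot(identical(m[2, ], c(3, 4, 5)))
unlink(f)
```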

Control statements
    x = NULL; for (i in 1:5) x = cbind(x, i^2)
    i = 1; x = NULL; while (i <= 5) {x = cbind(x, i^2); i = i + 1}
    x = rnorm(1); if (x > 0) y = x else y = -x + 10
    i = 1; x = rnorm(1); repeat {x = x + rnorm(1); if (x > 3) break; i = i + 1}; print(c(i, x))

How do I load packages for use?
Packages are kept in libraries. Type library() to see which packages are installed; the default is base. For example, to load MASS, type
    > library(MASS)
Each package contains many datasets. Type data() to see which datasets are available. For example, data(Titanic) loads the Titanic data. Note: R is case-sensitive.

How do I get packages from the internet?
Connect from R directly to the CRAN home page, or download the zip file to your computer and install the package from the local zip file…

by
    attach(warpbreaks)
    by(warpbreaks[, 1:2], tension, summary)
    by(warpbreaks[, 1], list(wool = wool, tension = tension), summary)
    ## now suppose we want to extract the coefficients by group
    tmp <- by(warpbreaks, tension, function(x) lm(breaks ~ wool, data = x))
    sapply(tmp, coef)
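by() is one of several ways to compute something per group. This sketch computes the mean number of breaks per tension level with by(), tapply() and aggregate() and checks that they agree. It also confirms that the per-group lm() fits from the example above give a 2 x 3 matrix of coefficients. The variable names are illustrative.

```r
m_by  <- by(warpbreaks, warpbreaks$tension, function(d) mean(d$breaks))
m_tap <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
m_agg <- aggregate(breaks ~ tension, data = warpbreaks, FUN = mean)
stopifnot(isTRUE(all.equal(as.vector(unclass(m_by)), as.vector(m_tap))))
stopifnot(isTRUE(all.equal(m_agg$breaks, as.vector(m_tap))))
# One model per tension level: (Intercept) and woolB for L, M and H
fits <- by(warpbreaks, warpbreaks$tension, function(d) lm(breaks ~ wool, data = d))
co <- sapply(fits, coef)
stopifnot(identical(dim(co), c(2L, 3L)))
```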

%in% and match
    ## The intersection of two sets:
    intersect <- function(x, y) y[match(x, y, nomatch = 0)]
    intersect(1:10, 7:20)
    1:10 %in% c(1, 3, 5, 9)
    sstr <- c("c", "ab", "B", "bba", "c", "@", "bla", "a", "Ba", "%")
    sstr[sstr %in% c(letters, LETTERS)]
    "%w/o%" <- function(x, y) x[!x %in% y]   #-- x without y
    (1:10) %w/o% c(3, 7, 12)
    attach(warpbreaks)
    warpbreaks[tension %in% c("L", "H"), ]
    warpbreaks[warpbreaks$tension %in% c("L", "H"), ]
    warpbreaks[warpbreaks[, 3] %in% c("L", "H"), ]
    unique(tension)

ftable: turn array/matrix data (without frequencies) into a flat contingency table of counts
    ## Start with a contingency table.
    ftable(Titanic, row.vars = 1:3)
    ftable(Titanic, row.vars = 1:2, col.vars = "Survived")
    ftable(Titanic, row.vars = 2:1, col.vars = "Survived")
    ## Start with a data frame.
    x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
    x                                # a flat table of the variables cyl, vs, am, gear (in that order)
    ftable(x, row.vars = c(2, 4))    # choose which of x's variables form the rows
    ## Start with expressions, use table()'s "dnn" to change labels
    ftable(mtcars$cyl, mtcars$vs, mtcars$am, mtcars$gear, row.vars = c(2, 4),
           dnn = c("Cylinders", "V/S", "Transmission", "Gears"))
    ftable(vs ~ carb, mtcars)        # vs is the column, carb the row
                                     # or ftable(mtcars$vs ~ mtcars$carb)
    ftable(carb ~ vs, mtcars)        # vs is the row, carb the column
    ftable(mtcars[, c(8, 11)])       # equivalent to ftable(carb ~ vs, mtcars) above
    ftable(breaks ~ wool + tension, warpbreaks)

as.data.frame(UCBAdmissions) turns an array (a three-way contingency table) into a data frame:
    (DF <- as.data.frame(UCBAdmissions))
         Admit Gender Dept Freq
      Admitted   Male    A  512
      Rejected   Male    A  313
      …
    xtabs(Freq ~ Gender + Admit, DF)           # data frame with frequencies/counts -> contingency table
    xtabs(Freq ~ Admit + Gender + Dept, DF)    # data frame -> the original contingency table
    (a = xtabs(Freq ~ Admit + Gender, data = DF))   # without frequencies (weights), leave the left-hand side empty
              Gender
    Admit      Male Female
      Admitted 1198    557
      Rejected 1493   1278
    library(MASS); biplot(corresp(a, nf = 2))  # one application of these conversions

Qualitative (factor) and quantitative variables in data
    is.factor(DF[, 2])
    [1] TRUE
    is.factor(DF[, 3])
    [1] TRUE
    is.factor(DF[, 4])
    [1] FALSE
    DF[, 4] = as.factor(DF[, 4])
    is.factor(DF[, 4])
    [1] TRUE
    DF[, 4] = as.numeric(as.character(DF[, 4]))   # as.numeric() alone would return the level codes
    is.factor(DF[, 4])
    [1] FALSE

When categorical variables are recorded with numeric codes, the results may be wrong if their type is not changed:
    mtcars[1:4, ]
                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    lm(mpg ~ gear + carb, mtcars)   # treats the qualitative variables as quantitative
    Call:
    lm(formula = mpg ~ gear + carb, data = mtcars)
    Coefficients:
    (Intercept)         gear         carb
          7.276        5.576       -2.754
    mtcars[, 10] = as.factor(mtcars[, 10])   # change the type
    mtcars[, 11] = as.factor(mtcars[, 11])   # change the type
    (lm(mpg ~ gear + carb, mtcars))
    Call:
    lm(formula = mpg ~ gear + carb, data = mtcars)
    Coefficients:
    (Intercept)        gear4        gear5        carb2        carb3        carb4        carb6        carb8
         20.932        7.720        8.349       -3.289       -4.632       -9.064       -9.581      -14.281

Comparing vectors: all
    x = 1:12; y = 1:12; all(y == x)
    [1] TRUE
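all(y == x) compares element by element. It is not the only notion of "equal" in R. This sketch contrasts it with identical(), which also checks the storage type, and all.equal(), which allows a small numerical tolerance. The 0.1 + 0.2 case ties back to the earlier warning about zero on a computer.

```r
x <- 1:12
y <- 1:12
stopifnot(all(y == x))           # every element equal
stopifnot(identical(x, y))       # the whole objects are the same
a <- 0.1 + 0.2
stopifnot(!(a == 0.3))           # floating-point rounding makes == fail
stopifnot(isTRUE(all.equal(a, 0.3)))   # equal up to a small tolerance
# identical() also compares types: 1:3 is integer, c(1, 2, 3) is double
stopifnot(all(1:3 == c(1, 2, 3)), !identical(1:3, c(1, 2, 3)))
```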

cat and print
cat() writes its arguments as plain text and interprets "\n" as a newline. print() shows the value as R displays it, with the [1] index and quotes, and shows escapes such as "\n" literally.
    if (all(x < 0)) cat("all x values are negative\n")    # assumes every element of x is negative
    all x values are negative
    cat("a logical or (positive) numeric controlling how the output is\nbroken into successive lines. If 'FALSE' (default), only\nnewlines created explicitly by\n")
    a logical or (positive) numeric controlling how the output is
    broken into successive lines. If 'FALSE' (default), only
    newlines created explicitly by
    if (all(x < 0)) print("all x values are negative\n")
    [1] "all x values are negative\n"
    print("a logical or (positive) numeric controlling how the output is\nbroken into successive lines. If 'FALSE' (default), only\nnewlines created explicitly by")
    [1] "a logical or (positive) numeric controlling how the output is\nbroken into successive lines. If 'FALSE' (default), only\nnewlines created explicitly by"
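The difference between cat() and print() can be tested by capturing what each writes. This sketch uses capture.output() on a hypothetical two-line string. cat() produces two unquoted lines. print() produces one quoted line with the \n escape spelled out.

```r
s <- "line one\nline two"
out_cat   <- capture.output(cat(s, "\n", sep = ""))
out_print <- capture.output(print(s))
stopifnot(identical(out_cat, c("line one", "line two")))        # escape interpreted
stopifnot(identical(out_print, "[1] \"line one\\nline two\""))  # quoted, escape shown
# cat() separates its arguments with a space by default
stopifnot(identical(capture.output(cat("a", 1, TRUE, "\n")), "a 1 TRUE "))
```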

Explain what the following statements do
    n = 1200; x = runif(n * 10)
    x = matrix(x, n, 10)
    x1 = (x[, 1:5] < .4) * 1
    p = x1 * .8 + (1 - x1) * .2
    x2 = 1 * (x[, 6:10] < p)
    z = cbind(x1, x2)
    ss = function(z) {
      nu = NULL; pattern = NULL
      z0 = as.matrix(z); n = nrow(z0); id = 1
      repeat {
        if (id %% 100 == 0) print(id)
        z1 = sweep(z0, 2, z0[1, ], "==")
        pattern = rbind(pattern, z0[1, ])
        z2 = apply(z1, 1, prod)
        z0 = z0[z2 != 1, ]; nu = c(nu, sum(z2))
        if (sum(nu) >= n) break
        if (is.matrix(z0) == F) {nu = c(nu, 1); pattern = rbind(pattern, z0); id = id + 1; break}
        id = id + 1
      }
      list(pattern = pattern, number = nu, id = id)
    }

Plotting in R
    x = 0:18
    plot(x, x, pch = x, col = x)
    points(x, 18 - x, pch = x)
    matplot(x, cbind(x, 18 - x))

Plotting
    spring = data.frame(compression = c(41, 39, 43, 53, 42, 48, 47, 46),
                        distance = c(120, 114, 132, 157, 122, 144, 137, 141))
    attach(spring)
(Hooke's law: f = .5ks)
    plot(distance ~ compression)
    plot(compression, distance)

Plotting
    par(mfrow = c(2, 2))    # prepare to draw four plots in a 2 x 2 layout
    plot(compression, distance, main = "Hooke's Law")                         # plot with a title only
    plot(compression, distance, main = "Hooke's Law", xlab = "x", ylab = "y") # title plus x and y labels
    identify(compression, distance)                                           # label points with their index numbers
    plot(compression, distance, main = "Hooke's Law")                         # plot with a title only
    text(46, 120, "f=1/2*k*s")                                                # write text at a given position
    plot(compression, distance, main = "Hooke's Law")                         # plot with a title only
    text(locator(2), "I am here!")                                            # write text at the two positions you click

Plotting
    library(MASS); data(Animals); attach(Animals)
    plot(body, brain)
    plot(sqrt(body), sqrt(brain))
    plot((body)^0.1, (brain)^0.1)
    plot(log(body), log(brain))
[Figure: four scatterplots of brain against body, on the raw, square-root, 0.1-power and log scales.]

Plotting (raw data)
    par(mfrow = c(1, 1), pch = 1)
    plot(body, brain, xlim = c(0, 6000))
    text(x = body, y = brain, labels = row.names(Animals), adj = 0)   # adj = 0 implies left-adjusted text
    plot(log(body), log(brain))
    text(x = log(body), y = log(brain), labels = row.names(Animals), adj = 0)
[Figure: brain against body with xlim = c(0, 6000), each point labelled with its animal name, e.g. "Asian elephant", "Human".]

Plotting (a few points)
    plot(body[c(1, 3, 8)], brain[c(1, 3, 8)], xlim = c(0, 200))
    text(x = body[c(1, 3, 8)], y = brain[c(1, 3, 8)], labels = row.names(Animals)[c(1, 3, 8)], adj = 0)
[Figure: brain[c(1, 3, 8)] against body[c(1, 3, 8)], with the three points labelled by animal name.]
