湛蓝之海 发表于 2021-12-29 16:25:38

R语言入门 Chapter05 | 因子

不登高山,不知天之高也;不临深溪,不知地之厚也。 ——荀子
这篇文章讲述的是R语言中关于数据框的相关知识。希望这篇R语言文章对您有所帮助!如果您有想学习的知识或建议,可以给作者留言~
Chapter05 | 因子

在R中名义型变量和有序性变量称为因子,factor。这些分类变量的可能值称为一个水平,level,例如good,better,best,都称为一个leve。
由这些水平值构成的向量就称为因子。
所有元素构成因子
因子类型的数据:

[*] state.division
[*] state.region
因子的应用:

[*]1、计算频数
[*]2、独立性检验
[*]3、相关性检验
[*]4、方差分析
[*]5、主成分分析
[*]6、因子分析
[*]1、频数统计
# cyl这一列可以当作因子类型
> mtcars$cyl
6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

> table(mtcars$cyl)
468
117 14


[*] 2、如何将向量转换为因子
使用factor()函数
> f <- factor(c("red","red","green","red","blue","green","blue","blue"))
> f
red   red   green red   bluegreen blueblue
Levels: blue green red

# 有序性变量也可以作为因子

# 不定义levels时levels自动去重
> week <- factor(c("Mon","Fri","Thu","Wed","Mon","Fri","Sun"))
> week
Mon Fri Thu Wed Mon Fri Sun
Levels: Fri Mon Sun Thu Wed

# 自定义levels不允许重复
> week <- factor(c("Mon","Fri","Thu","Wed","Mon","Fri","Sun"),order = TRUE,
+                levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
> week
Mon Fri Thu Wed Mon Fri Sun
Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun

# 一个向量转换成因子,直接输入到factor()内即可
> fcyl <- factor(mtcars$cyl)
> fcyl
6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Levels: 4 6 8


[*]3、向量和因子图形的对比
[*]向量
plot(mtcars$cyl)


[*]因子
plot(fcyl)


[*] 4、cut()函数
cut函数可以将连续性变量x分割为n个水平的因子
> num <- c(1:100)
> num
   1   2   3   4   5   6   7   8   910111213141516171819202122
23242526272829303132333435363738394041424344
45464748495051525354555657585960616263646566
67686970717273747576777879808182838485868788
8990919293949596979899 100

# 每隔10个进行分组

> cut (num,c(seq(0,100,10)))
(0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]
(0,10]   (10,20](10,20](10,20](10,20](10,20](10,20](10,20](10,20]
(10,20](10,20](20,30](20,30](20,30](20,30](20,30](20,30](20,30]
(20,30](20,30](20,30](30,40](30,40](30,40](30,40](30,40](30,40]
(30,40](30,40](30,40](30,40](40,50](40,50](40,50](40,50](40,50]
(40,50](40,50](40,50](40,50](40,50](50,60](50,60](50,60](50,60]
(50,60](50,60](50,60](50,60](50,60](50,60](60,70](60,70](60,70]
(60,70](60,70](60,70](60,70](60,70](60,70](60,70](70,80](70,80]
(70,80](70,80](70,80](70,80](70,80](70,80](70,80](70,80](80,90]
(80,90](80,90](80,90](80,90](80,90](80,90](80,90](80,90](80,90]
(90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100]
(90,100]
10 Levels: (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (70,80] ... (90,100]
如果数字较大,我们可以通过cut()函数进行频数统计,很方便




https://blog.51cto.com/u_14683590/3737448
页: [1]
查看完整版本: R语言入门 Chapter05 | 因子