R Base Notes

2023-02-27 00:00:00

R language notes

Details of the R language
with, within, transform
dgCMatrix object
Parallel operations with foreach
Command-line parallelism
Renaming a function

Details of the R language

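1. Check whether a package is installed: requireNamespace("nnls", quietly = TRUE) returns TRUE or FALSE without attaching the package. Besides library(), require() can also load a package: if the package is installed, require() loads it and returns TRUE; if not, it returns FALSE.
2. Open a package's vignettes in the browser: browseVignettes('stringr')
3. & and | work element-wise and return a logical vector; && and || return a single TRUE/FALSE value and evaluate the right-hand side only when needed.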
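
The difference in item 3 can be sketched as follows. The variable names are only illustrative; the point is that & and | give one result per position, while && and || give a single value and skip the right-hand side when the left-hand side already decides the answer.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# Element-wise: one result per position
and_vec <- x & y   # TRUE FALSE FALSE
or_vec  <- x | y   # TRUE TRUE TRUE

# Scalar: the right-hand side is skipped when the left side already decides
short_circuit <- FALSE && stop("never evaluated")   # FALSE, no error raised

# Typical use inside if(): reduce vectors to one value first with any()/all()
both_ok <- any(x) && all(y)   # FALSE
```

In conditions for if() or while(), && and || are usually the right choice because those constructs expect a single value.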

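4. Ways to test equality:

`==`	`identical(x,y)`	`all.equal(x,y,tolerance=)`	`dplyr::near()`
element-wise; returns a vector of the same length	exact equality of the whole objects	equality within a tolerance	element-wise equality within a tolerance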
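
A minimal sketch of how these comparisons differ on floating-point numbers, using base R only. The last line is written by hand to mimic what dplyr::near() computes (an absolute difference below a small tolerance); treat that equivalence as my reading of the function rather than a guarantee.

```r
x <- c(0.1 + 0.2, 1)
y <- c(0.3, 1)

eq_vec   <- x == y                    # FALSE TRUE: element-wise, exact
same     <- identical(x, y)           # FALSE: the objects differ in one element
close    <- isTRUE(all.equal(x, y))   # TRUE: equal within the default tolerance

# Hand-written element-wise tolerance check (assumed to match dplyr::near())
tol      <- sqrt(.Machine$double.eps)
near_vec <- abs(x - y) < tol          # TRUE TRUE
```

all.equal() returns a character description instead of FALSE when the objects differ, which is why it is wrapped in isTRUE() here.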

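5. In RStudio, Ctrl+Shift+R inserts a section comment; Ctrl+Shift+C comments or uncomments the selected lines.
6. dir('path', recursive=F) returns the files directly under the given path; recursive=T also returns the files in all subdirectories. dir.create('path') creates a directory. To copy files from another folder to the current location, use file.copy().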

# Pick the files to copy: dir() lists the files in a directory,
# and pattern = "\\.R$" keeps only the files ending in .R (scripts).
a = dir("../pipeline/",pattern = "\\.R$")[2:6]
for (i in a) {
  file.copy(paste0("../pipeline/",i),"./") # file.copy(): copy a file
}

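7. do.call() calls a function with the elements of a list as its arguments, which is a convenient way to collapse a list; for example, do.call(rbind, ls) row-binds all elements of the list ls.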
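
A small sketch of item 7. The list is named pieces here (a hypothetical name, chosen to avoid shadowing the base function ls()); do.call(rbind, pieces) is equivalent to writing out rbind() with every list element as a separate argument.

```r
pieces <- list(a = c(x = 1, y = 2),
               b = c(x = 3, y = 4),
               c = c(x = 5, y = 6))

# One call to rbind() with all list elements as arguments
combined <- do.call(rbind, pieces)

# The same thing written out by hand
by_hand <- rbind(a = pieces$a, b = pieces$b, c = pieces$c)
```

The result is a 3 x 2 matrix whose row names come from the list names and whose column names come from the vector names.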

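8. Difference between NA and NULL:

a. NA marks a missing value in the data. Operating directly on data that contains NA does not drop it; for example, with x <- c(1,2,3,NA), mean(x) returns NA. NA on its own is logical, but it takes on the type of the vector it sits in: since mode(x) is 'numeric', mode(x[4]) is also 'numeric'.
b. NULL represents the absence of any object, so it is ignored when building a vector. For example, with x <- c(1,2,3,NULL), mean(x) is 2. NULL does not count as an element: length(c(NULL)) is 0 but length(c(NA)) is 1, that is, NA occupies a position.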
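
The statements in item 8 can be checked directly. This is a minimal sketch with illustrative variable names; na.rm = TRUE is the usual way to skip missing values explicitly.

```r
with_na   <- c(1, 2, 3, NA)
with_null <- c(1, 2, 3, NULL)

mean_na    <- mean(with_na)                 # NA: the missing value propagates
mean_na_rm <- mean(with_na, na.rm = TRUE)   # 2: NA removed explicitly
mean_null  <- mean(with_null)               # 2: NULL never entered the vector

len_na   <- length(with_na)     # 4: NA occupies a position
len_null <- length(with_null)   # 3: NULL does not

type_bare_na <- class(NA)           # "logical"
type_in_vec  <- class(with_na[4])   # "numeric": NA follows the vector's type
```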
9. Difference between x[x %in% y] and intersect(x,y): the former only filters x and keeps duplicates; the latter removes duplicates automatically.

> x = c(1,3,5,1,3)
> y = c(3,2,5,6,2)
> x[x %in% y]
[1] 3 5 3
> intersect(x,y)
[1] 3 5

10. match(): matching and reordering vectors. match(x,y) returns, for each element of x in turn, its position in y. y[match(x,y)] reorders y using x as the template.

> x <- c("A","B","C","D","E")
> y <- c("B","D","E","A","C")
> match(x,y) # position in y of each element of x, in order
[1] 4 1 5 2 3
> y[match(x,y)] # reorder y to follow the order of x
[1] "A" "B" "C" "D" "E"
> x[match(y,x)] # reorder x to follow the order of y
[1] "B" "D" "E" "A" "C"
> a <- data.frame(x,y,z=sample(100,5))
> a
  x y  z
1 A B 18
2 B D 24
3 C E 54
4 D A 73
5 E C 79
> a[match(a$y,a$x),]
  x y  z
2 B D 24
4 D A 73
5 E C 79
1 A B 18
3 C E 54

11. Subsetting a list: [] returns a list containing the selected element, whereas [[]] returns the element itself.

> l <- list(m=matrix(1:9, nrow = 3),
+           df=data.frame(gene  = paste0("gene",1:3),
+                         sam   = paste0("sample",1:3),
+                         exp   = c(32,34,45)),
+           x=c(1,3,5))
> l[3]
$x
[1] 1 3 5

> class(l[3]) # still a list
[1] "list"
> l[[3]]
[1] 1 3 5
> class(l[[3]]) # a numeric vector
[1] "numeric"

12. Network problems often make GitHub unreachable. In that case, download the code archive from GitHub first and then install it locally with install_local(): devtools::install_local("AnnoProbe-master.zip",upgrade = F)

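13. length() vs. str_length(): length() returns the number of elements in a vector, while stringr's str_length() returns the number of characters in each string.

library(stringr)
y <- c('aaa','fhuv','dvh')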
length(y)
# [1] 3
str_length(y)
# [1] 3 4 3

14. Taking one row from a data.frame gives a one-row data frame, but taking one column gives a vector.

df <- data.frame(x=1:5,y=letters[3:7],z=rnorm(5))
class(df[2,])
# [1] "data.frame"
class(df[,2])
# [1] "character"
class(df[2]) # to take a single column but keep a data frame, omit the comma (a single index selects columns)
# df[2] is equivalent to df[,2,drop=F]
# [1] "data.frame"

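15. factor() defines groups. The levels it generates automatically are sorted alphabetically by default, so it is best to specify them explicitly.

# Set the reference level by specifying levels: control group first, treatment group second, not the other way round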
Group = factor(Group,
               levels = c("control","patient"))
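
To illustrate why item 15 matters, here is a sketch with made-up group labels (wildtype and knockout are hypothetical, not from the notes above). Alphabetical sorting would put knockout first, so the intended reference level has to be set explicitly, either through levels or afterwards with relevel().

```r
genotype <- c("wildtype", "knockout", "wildtype", "knockout")

# Default: levels sorted alphabetically, so "knockout" becomes the reference
default_f <- factor(genotype)

# Explicit levels: the first level is the reference
explicit_f <- factor(genotype, levels = c("wildtype", "knockout"))

# Alternative: move one level to the front of an existing factor
releveled <- relevel(default_f, ref = "wildtype")
```

Model functions such as lm() treat the first level as the baseline, which is why the order matters for the sign of estimated effects.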

16. read.table(), read.csv(), and import() can all read compressed files directly, without decompressing them first.

dat = read.table("counts.tsv.gz",check.names = F,row.names = 1,header = T)

17. Load several packages at once:

lapply(c('ggplot2','glmnet'),
       require,character.only=T)

Check the current package mirror: getOption('repos')
Check the current platform: .Platform$OS.type is either 'windows' or 'unix'

with, within, transform

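with(): evaluates an R expression inside the data and returns the value of the expression
within(): evaluates the expression in the same way but returns the modified object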

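# A single change to a data frame: the with() version is relatively concise
data(anorexia, package = "MASS") # the anorexia data set ships with the MASS package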
anorexia$wtDiff <- with(anorexia, Postwt - Prewt)
anorexia <- within(anorexia, wtDiff2 <- Postwt - Prewt)
anorexia <- transform(anorexia, wtDiff3 = Postwt - Prewt)
 
# Several changes to a data frame: the with() version is long and clumsy; within() and transform() are recommended
fahrenheit_to_celcius <- function(f) (f - 32) / 1.8
airquality[c("cTemp", "logOzone", "MonthName")] <- with(airquality, list(
  fahrenheit_to_celcius(Temp),
  log(Ozone),
  month.abb[Month]
))
  
airquality <- within(airquality,
{
  cTemp2     <- fahrenheit_to_celcius(Temp)
  logOzone2  <- log(Ozone)
  MonthName2 <- month.abb[Month]
})
  
airquality <- transform(airquality,
  cTemp3     = fahrenheit_to_celcius(Temp),
  logOzone3  = log(Ozone),
  MonthName3 = month.abb[Month]
)

dgCMatrix object
It is a class of sparse numeric matrices in the compressed, sparse, column-oriented format. In this implementation the non-zero elements in the columns are sorted into increasing row order.

library(Matrix)
M <- Matrix(c(0, 0,  0, 2,
              6, 0, -1, 5,
              0, 4,  3, 0,
              0, 0,  5, 0),
            byrow = TRUE, nrow = 4, sparse = TRUE)
rownames(M) <- paste0("r", 1:4)
colnames(M) <- paste0("c", 1:4)
str(M)
# Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
# ..@ i       : int [1:7] 1 2 1 2 3 0 1
# ..@ p       : int [1:5] 0 1 2 5 7
# ..@ Dim     : int [1:2] 4 4
# ..@ Dimnames:List of 2
# .. ..$ : chr [1:4] "r1" "r2" "r3" "r4"
# .. ..$ : chr [1:4] "c1" "c2" "c3" "c4"
# ..@ x       : num [1:7] 6 4 -1 3 5 2 5
# ..@ factors : list()

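x, i and p:
The x slot stores the nonzero values, column by column.
The k-th element of the i slot gives the zero-based row index of the k-th element of x, so its row number is M@i[k]+1.
The k-th element of the p slot is the total number of nonzero entries in the first k-1 columns, so M@p[1] = 0 and M@p[j+1] - M@p[j] is the number of nonzero entries in column j of M. For example, the third column has M@p[3+1]-M@p[3]=3 nonzero entries.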

M
# 4 x 4 sparse Matrix of class "dgCMatrix"
#    c1 c2 c3 c4
# r1  .  .  .  2
# r2  6  . -1  5
# r3  .  4  3  .
# r4  .  .  5  .
M@x
# [1]  6  4 -1  3  5  2  5
as.numeric(M)[as.numeric(M) != 0]
# [1]  6  4 -1  3  5  2  5
M@i
# [1] 1 2 1 2 3 0 1
M@p
# [1] 0 1 2 5 7
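
As a check on how the three slots fit together, the sketch below rebuilds the dense matrix from plain copies of i, p, and x using base R only (no Matrix package needed). The vectors are typed in by hand from the example above, and the 4 x 4 dimensions are taken from the Dim slot.

```r
# Slot contents of the example matrix, copied as plain vectors
i <- c(1L, 2L, 1L, 2L, 3L, 0L, 1L)   # zero-based row index of each nonzero
p <- c(0L, 1L, 2L, 5L, 7L)           # cumulative nonzero count per column
x <- c(6, 4, -1, 3, 5, 2, 5)         # nonzero values, column by column

rows <- i + 1L                                     # one-based row numbers
per_col <- diff(p)                                 # nonzeros in each column
cols <- rep(seq_along(per_col), times = per_col)   # column of each nonzero

# Fill a dense 4 x 4 matrix at the (row, column) positions
dense <- matrix(0, nrow = 4, ncol = 4)
dense[cbind(rows, cols)] <- x
dense
```

diff(p) recovers the per-column counts (1, 1, 3, 2), which is exactly the M@p[j+1] - M@p[j] rule described above.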

Parallel operations with foreach

library(doParallel)
library(foreach)

# Choose the number of cores and the cluster type by platform, then create and register the cluster
if(.Platform$OS.type=='windows'){
  core_num <- 4; core_type <- 'PSOCK'
}else{
  core_num <- 8; core_type <- 'FORK'
}
cl <- makeCluster(core_num,type=core_type)
# cores <- detectCores(logical=F)
# cl <- makeCluster(cores-1) # create
registerDoParallel(cl) # register

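result <- foreach(iter_num=1:100, 
.packages = 'InvariantCausalPrediction', # packages to load on each worker
.export = 'get_candidate_lasso', # user-defined functions to send to the workers
.combine = 'rbind', # how to combine the results, e.g. c or rbind
 ... )%dopar%{ 
  # do not name the loop variable iter (e.g. iter=1:10), because iter is already the name of a function
  ... 
  return_result  
}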

# Shut down the cluster
stopImplicitCluster()
stopCluster(cl)

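Possible fixes for errors:

The program runs correctly with a for loop but fails with "object p not found" after switching to foreach and %dopar%. A likely cause is that the variable p is defined in the global environment and is used inside some function that neither defines it nor receives it as an argument, so the workers cannot find it. Passing p as a function argument, or listing it in .export, is worth trying.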

Command-line parallelism

Read the command-line arguments at the top of the R script

# Here command line arguments from the command line are used to determine the tiling
args = commandArgs(trailingOnly = TRUE)
if (length(args)!=4) {
  stop("Supply the first col, last col, first row, last row for tiling as command line arguments!", call.=FALSE)
} 

# The arguments arrive as character strings; convert them to integers for further use
first_col <- as.integer(args[1])
last_col <- as.integer(args[2])
first_row <- as.integer(args[3])
last_row <- as.integer(args[4])
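
The argument handling above could also be wrapped in a small helper so that it can be tested without launching Rscript. This is only a sketch: parse_tile_args is a hypothetical name, and the character vector passed to it stands in for what commandArgs(trailingOnly = TRUE) would return.

```r
# Hypothetical helper: validate and convert the four tiling arguments
parse_tile_args <- function(args) {
  if (length(args) != 4) {
    stop("Supply first col, last col, first row, last row", call. = FALSE)
  }
  values <- suppressWarnings(as.integer(args))   # non-numeric input becomes NA
  if (anyNA(values)) {
    stop("All four arguments must be integers", call. = FALSE)
  }
  names(values) <- c("first_col", "last_col", "first_row", "last_row")
  values
}

# Stand-in for commandArgs(trailingOnly = TRUE)
tile <- parse_tile_args(c("1", "5", "6", "10"))
cols <- tile[["first_col"]]:tile[["last_col"]]   # columns of this tile
rows <- tile[["first_row"]]:tile[["last_row"]]   # rows of this tile
```

Failing early on malformed input keeps a background job from running for a long time on the wrong tile.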

Run the R script from the command line

#!/bin/bash
echo "Starting DI_TILE_cmd.R scripts in the background"
nohup Rscript --vanilla --verbose my_script.R 1 5 1 5 &> nohup_1.out &
nohup Rscript --vanilla --verbose my_script.R 1 5 6 10  &> nohup_2.out &
nohup Rscript --vanilla --verbose my_script.R 6 10 1 5 &> nohup_3.out &
nohup Rscript --vanilla --verbose my_script.R 6 10 6 10 &> nohup_4.out &

Renaming a function

Give an existing function a new name.

a = 1
old_sum <- function(x){
  return(x+a)
}
old_sum(3)

test_func <- get('old_sum')
test_func(3)
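
For comparison, a function can also be aliased by plain assignment; get() is mainly useful when the function name is only available as a string. The sketch below uses illustrative names and shows that an alias keeps pointing at the original function even after the old name is rebound.

```r
a <- 1
old_sum <- function(x) x + a

alias_direct  <- old_sum          # alias by plain assignment
alias_by_name <- get("old_sum")   # alias looked up from a character string

res_direct  <- alias_direct(3)    # 4
res_by_name <- alias_by_name(3)   # 4

# Rebinding the old name does not affect the aliases
old_sum <- function(x) x + 100
res_after <- alias_direct(3)      # still 4
res_new   <- old_sum(3)           # 103
```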
