UNIX 将 LARGE csv 导入 SQLite

2022-01-08 00:00:00 csv r sas sqlite

我有一个 5gig 的 csv 文件(也可以作为 sas 数据文件,如果它更容易的话),我需要将它放入一个 sql 数据库中,以便我可以在 R 中使用它.

I have a 5gig csv file (also as a sas datafile, if it would be easier) which I need to put into a sql database so I can work with it in R.

变量名称都包含在第一个观察行中并用双引号引起来.有 1000 多个变量,其中一些是数字其他字符(虽然一些字符变量是数字字符串,但我不太担心我可以在 R 中修复它).

The variables names are all contained in the first observation line and are double quoted. There are 1000+ variables some of numeric others character (though some of the character variables are strings of numerals, but I'm not too worried about it I can fix it in R).

我的问题是如何以最小的痛苦将 csv 文件导入数据库中的新表?

My question is how can I import the csv file into a new table in my database with minimal pain?

我发现要先创建表(包括指定所有变量,其中我有 1000 多个),然后使用.import 文件表"来引入数据.或者,使用一些 gui 导入向导,这对我来说不是一个选项.

I've found things saying to create your table first (which includes specifying all the variables, of which I have 1000+) and then using ".import file table" to bring in the data. Or, to use some gui import wizard, which is not an option for me.

对不起,如果这是 sql 101,但感谢您的帮助.

Sorry if this is sql 101 but thanks for the help.

推荐答案

这是我的工作流程:

library("RSQLite")
setwd("~/your/dir")
db <- dbConnect(SQLite(), dbname="your_db.sqlite") ## will make, if not present
field.types <- list(
        date="INTEGER",
        symbol="TEXT",
        permno="INTEGER",
        shrcd="INTEGER",
        prc="REAL",
        ret="REAL")
dbWriteTable(conn=db, name="your_table", value="your_file.csv", row.names=FALSE, header=TRUE, field.types=field.types)
dbGetQuery(db, "CREATE INDEX IF NOT EXISTS idx_your_table_date_sym ON crsp (date, symbol)")
dbDisconnect(db)

field.types 不是必需的.如果您不提供此列表,RSQLite 将从标题中猜测.索引也不是必需的,但稍后会加快您的查询速度(如果您为查询建立正确的列的索引).

The field.types isn't necessary. RSQLite will guess from the header if you don't provide this list. The index isn't required either, but will speed up your queries later on (if you index the correct column for your queries).

我已经在 SO 上学习了很多这些东西,所以如果你查看我在 SQLite 上提出/回答的问题,你可能会发现一些标记性的东西.

I've been learning a lot of this stuff here on SO, so if you check my questions asked/answered on SQLite, you may find some tagential stuff.

相关文章