The Census data sources listed below are set up to be used with statistical analysis software (SPSS, Stata, etc.).

American Community Survey Public Use Microdata Sample (PUMS)
Direct download of PUMS data files from the Census Bureau's FTP site. Download files by Region, Division, State, or Public Use Microdata Areas (PUMAs). If you prefer to select variables for download, use IPUMS-USA or MDAT.

Direct download of data files from the Census Bureau's various survey programs.

The primary source of labor force statistics for the population of the United States. CPS Table Creator gives you the ability to create customized tables from the Current Population Survey's Annual Social and Economic Supplement (March supplement).

MDAT is the Census Bureau's microdata extraction tool. It allows you to select individual variables to develop customized tables and perform statistical analysis. Datasets included are the American Community Survey and the Current Population Survey.

Historical Demographic, Economic and Social Data: the United States, 1790-1970
Collection of historical census data from the Inter-university Consortium for Political and Social Research (ICPSR).

Census microdata for social and economic research (Minnesota Population Center). Includes American Community Survey, Census of Population and Housing, Current Population Survey and American Time Use Survey microdata. Can select variables and customize dataset before downloading.

Contains hundreds of data files from the U.S. ...

IPUMS International: 68 countries - 211 censuses - 480 million person records.

North Atlantic Population Project: Census microdata from Canada, Great Britain, Germany, Iceland, Norway, Sweden, and the United States from 1801 to 1910.

I am using data.table and there are many functions which require me to set a key (e.g. ...). As such, I wish to understand what a key does in order to properly set keys in my data tables.

setkey() sorts a data.table and marks it as sorted. The columns are always sorted in ascending order. No copy is made at all, other than temporary working memory as large as one column.

My takeaway here is that a key would "sort" the data.table, resulting in a very similar effect to order(). However, it doesn't explain the purpose of having a key.

3.2 I don't have a key on a large table, but grouping is still really quick. Why is that?
data.table uses radix sorting. Radix is specifically for integers only, see ... When no key is set, or we group in a different order from that of the key, we call it an ad hoc by.

3.3 Why is grouping by columns in the key faster than an ad hoc by?
Because each group is contiguous in RAM, thereby minimising page fetches, and memory can be copied in bulk (memcpy in C) rather than ...

From here, I guess that setting a key somehow allows R to use "radix sorting" over other algorithms, and that's why it is faster.

The 10 minute quick start guide also has a guide on keys.

Let's start by considering data.frame, specifically rownames (or in English, row names). That is, the multiple names belonging to a single ... The multiple names belonging to the single row? That is not what ... A person has at least two names, a first name and a second name. That is useful to organise a telephone directory, for example, which ... However, each row in a ... columns of rownames, which may be integer, factor, character or some ... Therefore, a data.table can have at most one key, because it ...
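The key behaviour described in the data.table passage above can be tried out with a small sketch in R. It assumes the data.table package is installed; the table DT and its columns id and v are made up purely for illustration and do not come from the original question.

```r
library(data.table)

# Hypothetical toy table (names are assumptions for illustration only)
DT <- data.table(id = c("b", "a", "c", "a", "b"), v = 1:5)

# Sort DT by id in place (by reference) and mark it as keyed
setkey(DT, id)

key(DT)   # reports the current key column(s)
DT        # rows are now ordered by id: a, a, b, b, c

# Keyed subset: rows whose id is "a", looked up via the key
DT["a"]

# Grouping by the key column: each group's rows sit next to each other
DT[, .(total = sum(v)), by = id]

# Ad hoc by: grouping on something other than the key
DT[, .(total = sum(v)), by = .(parity = v %% 2)]
```

Both grouping calls return results; the difference the quoted FAQ entries describe is in how the rows of each group are laid out in memory, not in what is computed.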