indexedTable

Syntax

indexedTable(keyColumns, X, [X1], [X2], ...)

or

indexedTable(keyColumns, capacity:size, colNames, colTypes)

or

indexedTable(keyColumns, table)

Arguments

keyColumns is a string scalar or vector indicating the column(s) that form the primary key.

In the first usage, X, X1, ... are vectors.

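In the second usage:

capacity is a positive integer indicating the amount of memory (measured in number of records) that the system allocates for the table when it is created. When the number of records exceeds capacity, the system first allocates a new block of memory 1.2 to 2 times the size of capacity, then copies the data to the new block, and finally releases the original memory. For large tables, this operation uses a lot of memory, so it is recommended to preallocate a reasonable capacity when creating the table.

size can only be 0 or 1. If size=0, an empty table is created. If size=1, a table with a single record is created, and the initial values of that record depend on the data type of each column:

BOOL: the default value is false;
Numeric types, temporal types, IPADDR, COMPLEX, and POINT: the default value is 0;
Literal and INT128 types: the default value is NULL.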

colNames is a string vector of column names.

colTypes is a vector indicating the data type of each column.
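
As a small illustration of the size parameter, the following sketch creates one empty table and one single-record table with the same schema. The variable names tEmpty and tOne and the column names are hypothetical. Going by the defaults listed above, the single record in tOne would be expected to hold 0 for id and price, false for flag, and NULL for name; no output is shown here because it has not been verified.

$ tEmpty=indexedTable(`id, 100:0, `id`flag`price`name, [INT,BOOL,DOUBLE,STRING])
$ tEmpty;
$ tOne=indexedTable(`id, 100:1, `id`flag`price`name, [INT,BOOL,DOUBLE,STRING])
$ tOne;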

In the third usage, table is a table. Note that the keyColumns of table must not contain duplicate values.

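Details

Creates an indexed in-memory table (indexed table). An indexed table has one primary key, which can consist of one or more columns.

When new records are appended to the table, the system checks their keyColumns values. If the keyColumns value of a new record matches that of an existing record, the existing record is updated with the new one.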

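Query optimization:

A SQL query on an indexed table outperforms the same query on a regular in-memory table when both of the following conditions are met:

The query filters on the first column of keyColumns, the filter on that column uses only = or in, and that condition is not followed by an or clause;
The in predicate is used no more than twice across all filter conditions.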

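When querying an indexed table, it is recommended to call sliceByKey for better performance.
Compare these conditions with those under which SQL queries on a keyed table (keyedTable) are optimized.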
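
The sketch below contrasts a SQL lookup with a sliceByKey lookup on a small single-key table. The table t and its columns are hypothetical, and passing column names as an optional third argument to sliceByKey is an assumption that should be checked against the sliceByKey reference before relying on it.

$ t=indexedTable(`sym, 1:0, `sym`price`qty, [SYMBOL,DOUBLE,INT])
$ insert into t values(`A`B`C, 10.5 20.5 30.5, 100 200 300)
$ select * from t where sym=`B;
$ sliceByKey(t, `B);
$ sliceByKey(t, `B, `price`qty);

Both forms look up the record by its key value; sliceByKey avoids SQL parsing, which is one plausible reason for the lower timings it shows on larger tables.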

Examples

Example 1. Create an indexed table

First usage:

$ sym=`A`B`C`D`E
$ id=5 4 3 2 1
$ val=52 64 25 48 71
$ t=indexedTable(`sym`id,sym,id,val)
$ t;

sym	id	val
A	5	52
B	4	64
C	3	25
D	2	48
E	1	71

Second usage:

$ t=indexedTable(`sym`id,1:0,`sym`id`val,[SYMBOL,INT,INT])
$ insert into t values(`A`B`C`D`E,5 4 3 2 1,52 64 25 48 71);

Third usage:

$ tmp=table(sym, id, val)
$ t=indexedTable(`sym`id, tmp);

Create a partitioned in-memory indexed table:

$ t=indexedTable(`sym`id,sym,id,val)
$ db=database("",VALUE, sym)
$ pt=db.createPartitionedTable(t,`pt,`sym).append!(t);

Example 2. Update an indexed table

$ t=indexedTable(`sym,1:0,`sym`datetime`price`qty,[SYMBOL,DATETIME,DOUBLE,DOUBLE])
$ insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:00 2018.06.08T12:30:00 2018.06.08T12:30:00,50.3 45.6 58.0,5200 4800 7800)
$ t;

sym	datetime	price	qty
APPL	2018.06.08T12:30:00	50.3	5200
IBM	2018.06.08T12:30:00	45.6	4800
GOOG	2018.06.08T12:30:00	58	7800

Insert new records whose keyColumns values duplicate those already in the table:

$ insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:01 2018.06.08T12:30:01 2018.06.08T12:30:01,65.8 45.2 78.6,5800 8700 4600)
$ t;

sym	datetime	price	qty
APPL	2018.06.08T12:30:01	65.8	5800
IBM	2018.06.08T12:30:01	45.2	8700
GOOG	2018.06.08T12:30:01	78.6	4600

The values of keyColumns cannot be updated:

$ update t set sym="C_"+sym;
Can't update a key column.

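Example 3. Query an indexed table

If a SQL statement does not use or, one of its filter conditions involves the first column of keyColumns, that condition uses = or the in predicate, and in is used no more than twice, then querying an indexed table is faster than querying a regular in-memory table. By contrast, for a keyed table to outperform a regular in-memory table, the filter conditions must include all of the keyColumns.

First, create a regular in-memory table t and an indexed table t1, each with 1 million records.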

$ id=shuffle(1..1000000)
$ date=take(2012.06.01..2012.06.10, 1000000)
$ type=take(0..9, 1000000)
$ val=rand(100.0, 1000000)
$ t=table(id, date, type, val)
$ t1=indexedTable(`id`date`type, id, date, type, val);

Filter on the first column of keyColumns:

$ timer(100) select * from t where id=500000;
Time elapsed: 177.286 ms

$ timer(100) select * from t1 where id=500000;
Time elapsed: 1.245 ms

$ timer(100) sliceByKey(t1, 500000)
Time elapsed: 0.742 ms

$ timer(100) select * from t where id in [500000, 600000, 700000];
Time elapsed: 1134.429 ms

$ timer(100) select * from t1 where id in [500000, 600000, 700000];
Time elapsed: 1.377 ms

If the filter on the first column of keyColumns does not use = or in, the query on the indexed table is not optimized:

$ timer(100) select * from t where id between 500000:500010;
Time elapsed: 641.544 ms

$ timer(100) select * from t1 where id between 500000:500010;
Time elapsed: 599.752 ms

Filter on the first and third columns of keyColumns:

$ timer(100) select * from t where id=500000, type in [3,6];
Time elapsed: 172.808 ms

$ timer(100) select * from t1 where id=500000, type in [3,6];
Time elapsed: 1.664 ms

If the query does not filter on the first column of keyColumns, the query on the indexed table is not optimized:

$ timer(100) select * from t where date in [2012.06.03, 2012.06.06];
Time elapsed: 490.182 ms

$ timer(100) select * from t1 where date in [2012.06.03, 2012.06.06];
Time elapsed: 544.015 ms

$ timer(100) select * from t where date=2012.06.03, type=8;
Time elapsed: 205.443 ms

$ timer(100) select * from t1 where date=2012.06.03, type=8;
Time elapsed: 204.532 ms

If the filter conditions use more than two in predicates, the query on the indexed table is not optimized:

$ timer(100) select * from t where id in [100,200], date in [2012.06.03, 2012.06.06], type in [3,6];
Time elapsed: 208.714 ms

$ timer(100) select * from t1 where id in [100,200], date in [2012.06.03, 2012.06.06], type in [3,6];
Time elapsed: 198.674 ms
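
The or restriction is the one condition not timed above. As a sketch that reuses t and t1 from this example, the following pair of queries adds an or clause after the filter on id. Based on the conditions stated earlier, the indexed table would not be expected to outperform the regular table here; timings are omitted because they have not been measured.

$ timer(100) select * from t where id=500000 or type=3;
$ timer(100) select * from t1 where id=500000 or type=3;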