连接分布式表
连接分布式表时,若不使用 select, where 等过滤语句,返回的表会包含过多冗余数据。因此,DolphinDB 只能在 SQL 语句中进行分布式表连接。分布式连接仅支持以下几种连接:left join(lj / lsj),(ej),asof join(aj),window join(wj),full join(fj) [2.00.8]。
分布式表连接时,需要遵循以下规则:
1. 分布式表 asof join 或 window join 时,必须提供至少两个连接列。
2. 当分布式表与维度表或内存表连接时,系统会将维度表或内存表复制到分布式表所在的各个节点上执行连接操作。如果本地表数据量非常庞大,表的传送将非常耗时。为了提高性能,系统在数据复制之前用where条件尽可能多地过滤内存表。如果右表数据量太大,会影响查询速度,所以在实际应用中,右表的数据量最好比较小。
例子
创建数据库的脚本如下:
$ dates=2019.01.01..2019.01.31
$ syms="A"+string(1..30)
$ sym_range=cutPoints(syms,3)
$ db1=database("",VALUE,dates)
$ db2=database("",RANGE,sym_range)
$ db=database("dfs://stock",COMPO,[db1,db2])
$ n=10000
$ datetimes=2019.01.01T00:00:00..2019.01.31T23:59:59
$ t=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(1000,n) as qty,rand(500.0,n) as price)
$ trades=db.createPartitionedTable(t,`trades,`trade_time`sym).append!(t)
$ n=200
$ t2=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(500.0,n) as bid,rand(500.0,n) as offer)
$ quotes=db.createPartitionedTable(t2,`quotes,`trade_time`sym).append!(t2)
$ t3=table(syms as sym,take(0 1,30) as type)
$ infos=db.createTable(t3,`infos).append!(t3)
例1. 连接两个分布式表trades和quotes:
select * from ej(trades,quotes,`trade_time`sym);
trade_time |
sym |
qty |
price |
bid |
offer |
---|---|---|---|---|---|
2019.01.01T00:00:00 |
A1 |
39 |
7.366735 |
37.933525 |
446.917644 |
2019.01.01T00:00:09 |
A10 |
15 |
461.381014 |
405.092702 |
26.659516 |
2019.01.01T00:00:10 |
A11 |
987 |
429.981704 |
404.289413 |
347.64917 |
2019.01.01T00:00:11 |
A12 |
266 |
60.466206 |
420.426175 |
83.538043 |
2019.01.01T00:00:12 |
A13 |
909 |
362.057769 |
324.886047 |
162.502655 |
2019.01.01T00:00:13 |
A14 |
264 |
113.964472 |
497.598722 |
103.114702 |
2019.01.01T00:00:14 |
A15 |
460 |
347.518325 |
24.584629 |
357.854207 |
2019.01.01T00:00:15 |
A16 |
196 |
258.889177 |
49.467399 |
13.974672 |
2019.01.01T00:00:16 |
A17 |
198 |
403.564922 |
428.539984 |
208.410852 |
2019.01.01T00:00:17 |
A18 |
30 |
288.469046 |
41.905556 |
378.080141 |
… |
例2. 连接分布式表和维度表:
select * from lj(trades,infos,`sym);
trade_time |
sym |
qty |
price |
type |
---|---|---|---|---|
2019.01.01T00:00:00 |
A1 |
856 |
359.809918 |
0 |
2019.01.01T00:00:09 |
A10 |
368 |
305.801702 |
1 |
2019.01.01T00:00:10 |
A11 |
549 |
447.406744 |
0 |
2019.01.01T00:00:11 |
A12 |
817 |
115.613373 |
1 |
2019.01.01T00:00:12 |
A13 |
321 |
298.317481 |
0 |
2019.01.01T00:00:13 |
A14 |
3 |
2.289171 |
1 |
2019.01.01T00:00:14 |
A15 |
586 |
91.841629 |
0 |
2019.01.01T00:00:15 |
A16 |
745 |
43.256142 |
1 |
2019.01.01T00:00:16 |
A17 |
60 |
0.153205 |
0 |
… |
例3. 连接分布式表和内存表:
tmp=table("A"+string(1..15) as sym,2019.01.11..2019.01.25 as date);
select * from ej(trades,tmp,`sym);
trade_time |
sym |
qty |
price |
date |
---|---|---|---|---|
2019.01.01T00:00:00 |
A1 |
856 |
359.809918 |
2019.01.11 |
2019.01.01T00:00:09 |
A10 |
368 |
305.801702 |
2019.01.20 |
2019.01.01T00:00:10 |
A11 |
549 |
447.406744 |
2019.01.21 |
2019.01.01T00:00:11 |
A12 |
817 |
115.613373 |
2019.01.22 |
2019.01.01T00:00:12 |
A13 |
321 |
298.317481 |
2019.01.23 |
2019.01.01T00:00:13 |
A14 |
3 |
2.289171 |
2019.01.24 |
2019.01.01T00:00:14 |
A15 |
586 |
91.841629 |
2019.01.25 |
2019.01.01T00:00:30 |
A1 |
390 |
325.407485 |
2019.01.11 |
… |
更多连接方式请查看 表连接。