连接分布式表

连接分布式表时,若不使用 select, where 等过滤语句,返回的表会包含过多冗余数据。因此,DolphinDB 只能在 SQL 语句中进行分布式表连接。分布式连接仅支持以下几种连接:left join(lj / lsj),(ej),asof join(aj),window join(wj),full join(fj) [2.00.8]。

分布式表连接时,需要遵循以下规则:

1. 分布式表 asof join 或 window join 时,必须提供至少两个连接列。

2. 当分布式表与维度表或内存表连接时,系统会将维度表或内存表复制到分布式表所在的各个节点上执行连接操作。如果本地表数据量非常庞大,表的传送将非常耗时。为了提高性能,系统在数据复制之前用where条件尽可能多地过滤内存表。如果右表数据量太大,会影响查询速度,所以在实际应用中,右表的数据量最好比较小。

例子

创建数据库的脚本如下:

$ dates=2019.01.01..2019.01.31
$ syms="A"+string(1..30)
$ sym_range=cutPoints(syms,3)
$ db1=database("",VALUE,dates)
$ db2=database("",RANGE,sym_range)
$ db=database("dfs://stock",COMPO,[db1,db2])
$ n=10000
$ datetimes=2019.01.01T00:00:00..2019.01.31T23:59:59
$ t=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(1000,n) as qty,rand(500.0,n) as price)
$ trades=db.createPartitionedTable(t,`trades,`trade_time`sym).append!(t)

$ n=200
$ t2=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(500.0,n) as bid,rand(500.0,n) as offer)
$ quotes=db.createPartitionedTable(t2,`quotes,`trade_time`sym).append!(t2)

$ t3=table(syms as sym,take(0 1,30) as type)
$ infos=db.createTable(t3,`infos).append!(t3)

例1. 连接两个分布式表trades和quotes:

select * from ej(trades,quotes,`trade_time`sym);

trade_time

sym

qty

price

bid

offer

2019.01.01T00:00:00

A1

39

7.366735

37.933525

446.917644

2019.01.01T00:00:09

A10

15

461.381014

405.092702

26.659516

2019.01.01T00:00:10

A11

987

429.981704

404.289413

347.64917

2019.01.01T00:00:11

A12

266

60.466206

420.426175

83.538043

2019.01.01T00:00:12

A13

909

362.057769

324.886047

162.502655

2019.01.01T00:00:13

A14

264

113.964472

497.598722

103.114702

2019.01.01T00:00:14

A15

460

347.518325

24.584629

357.854207

2019.01.01T00:00:15

A16

196

258.889177

49.467399

13.974672

2019.01.01T00:00:16

A17

198

403.564922

428.539984

208.410852

2019.01.01T00:00:17

A18

30

288.469046

41.905556

378.080141

例2. 连接分布式表和维度表:

select * from lj(trades,infos,`sym);

trade_time

sym

qty

price

type

2019.01.01T00:00:00

A1

856

359.809918

0

2019.01.01T00:00:09

A10

368

305.801702

1

2019.01.01T00:00:10

A11

549

447.406744

0

2019.01.01T00:00:11

A12

817

115.613373

1

2019.01.01T00:00:12

A13

321

298.317481

0

2019.01.01T00:00:13

A14

3

2.289171

1

2019.01.01T00:00:14

A15

586

91.841629

0

2019.01.01T00:00:15

A16

745

43.256142

1

2019.01.01T00:00:16

A17

60

0.153205

0

例3. 连接分布式表和内存表:

tmp=table("A"+string(1..15) as sym,2019.01.11..2019.01.25 as date);
select * from ej(trades,tmp,`sym);

trade_time

sym

qty

price

date

2019.01.01T00:00:00

A1

856

359.809918

2019.01.11

2019.01.01T00:00:09

A10

368

305.801702

2019.01.20

2019.01.01T00:00:10

A11

549

447.406744

2019.01.21

2019.01.01T00:00:11

A12

817

115.613373

2019.01.22

2019.01.01T00:00:12

A13

321

298.317481

2019.01.23

2019.01.01T00:00:13

A14

3

2.289171

2019.01.24

2019.01.01T00:00:14

A15

586

91.841629

2019.01.25

2019.01.01T00:00:30

A1

390

325.407485

2019.01.11

更多连接方式请查看 表连接