I'm pleased to announce the release of Spider storage engine version 2.1(beta).
http://spiderformysql.com/
The main changes in this version are following.
- Add UDFs "spider_direct_sql" and "spider_bg_direct_sql".
These UDFs execute hand writing SQLs on remote server by using Spider's table link and store result sets in temporary tables.
Main usages are following.
1.Maintenance remote servers.
(Create table on remote servers for Spider's link tables, etc.)
2.Use every resources (excluding table) on remote servers.
3.Mass update of the bulk data. (Update all rows in tables, etc.
There are next advantages. It is possible to update it in parallel
with multiple remote servers. The amount of the data transfer of
a local server and remote servers can be reduced)
4.Aggregate by 2-steps. (For DWH etc)
First step: Aggregate on remote servers and collect result
to local server.
Second step: Aggregate on local server.
(There are next advantages. It is possible to aggregate it in
parallel with multiple remote servers. The amount of the data
transfer of a local server and remote servers can be reduced)
Example of 2-steps aggregating
(Count the number of rows in 7 shards tables from local server)
-------------------------------------------------------------------------------
Prepare for aggregating:
mysql> create table target_hosts (host char(17))engine=myisam;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into target_hosts (host) values ('192.168.22.13'),('192.168.22.14'),('192.168.22.15'),('192.168.22.16'),('192.168.22.51'),('192.168.22.52'),('192.168.22.53');
Query OK, 7 rows affected (0.00 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> create temporary table a (a int)engine=myisam;
Query OK, 0 rows affected (0.00 sec)
Execute aggregating:
mysql> select sum(a.a) from a, (select sum(spider_direct_sql('select count(*) from data01', 'a', concat('srv "s", host "', host, '"'))) cnt from target_hosts) c;
+-----------+
| sum(a.a) |
+-----------+
| 117641811 |
+-----------+
1 row in set (40.86 sec)
mysql> truncate table a;
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(a.a) from a, (select spider_bg_direct_sql('select count(*) from data01', 'a', concat('srv "s", host "', host, '"')) cnt from target_hosts) c;
+-----------+
| sum(a.a) |
+-----------+
| 117641811 |
+-----------+
1 row in set (5.91 sec)
-------------------------------------------------------------------------------
Please see "99_change_logs.txt" in the download documents for checking other changes.
Enjoy!
Subscribe to:
Post Comments (Atom)
Can you explain the UDFs in more depth? Is there any documentation?
ReplyDeleteBasically I think this is what you are saying can be accomplished:
Given the following on the "spider server":
select
c1,
count(*) the_count,
sum(c2 * c3) thesum,
avg(c4) theavg
from partitioned_table;
on each shard:
create temporary table tmp_materialization
select
c1,
sum(1) the_count,
sum(c2 * c3) thesum,
sum(c4) theavg_sum,
count(c4) theavg_cnt
from partitioned_table;
select
c1,
sum(the_count) the_count,
sum(the_sum) the_sum,
sum(theavg_sum) / sum(theavg_cnt) the_avg
from (
select * from a.tmp_materialization
union all
select * from b.tmp_materialization
union all
select * from c.tmp_materialization
) distributed_materializations
I omitted GROUP BY c1 on the queries. Just pretend it is there:)
ReplyDeleteHi Swanhart,
ReplyDelete> Is there any documentation?
Yes.
Documentation are "21_udf.txt" and "22_udf_parameters.txt" in "spider-doc-2.1-for-5.1.37.tgz".
> Can you explain the UDFs in more depth?
Yes, I can.
At your case,
given the following on the "spider server" (local server):
-- Change column type and storage engine suitably.
create temporary table tmp_materialization (
c1 int,
the_count int,
thesum int,
theavg_sum int,
theavg_cnt int,
key idx1(c1)
)engine=myisam;
-- 'srv "s"' in statement is created by "create server s foreign data wrapper mysql options (...)" statement. Please see "http://dev.mysql.com/doc/refman/5.1/en/create-server.html" for detail.
-- "target_hosts" table in statement has your shard's host names.
select
a.c1,
sum(a.the_count) the_count,
sum(a.thesum) the_sum,
sum(a.theavg_sum) / sum(a.theavg_cnt) the_avg
from
tmp_materialization a,
(select spider_bg_direct_sql('select c1, sum(1) the_count, sum(c2 * c3) thesum, sum(c4) theavg_sum, count(c4) theavg_cnt from partitioned_table group by c1', 'tmp_materialization', concat('srv "s", host "', host, '"')) cnt from target_hosts) c
group by a.c1;
Thanks,
Kentoku
Don't need to do anything on each shard (remote servers).
ReplyDelete