8/29/2009

[MySQL][Spider]Spider storage engine 2.1 released

I'm pleased to announce the release of Spider storage engine version 2.1(beta).
http://spiderformysql.com/

The main changes in this version are following.
- Add UDFs "spider_direct_sql" and "spider_bg_direct_sql".
  These UDFs execute hand writing SQLs on remote server by using Spider's table link and store result sets in temporary tables.
  Main usages are following.
  1.Maintenance remote servers.
    (Create table on remote servers for Spider's link tables, etc.)
  2.Use every resources (excluding table) on remote servers.
  3.Mass update of the bulk data. (Update all rows in tables, etc.
    There are next advantages. It is possible to update it in parallel
    with multiple remote servers. The amount of the data transfer of
    a local server and remote servers can be reduced)
  4.Aggregate by 2-steps. (For DWH etc)
    First step: Aggregate on remote servers and collect result
                    to local server.
    Second step: Aggregate on local server.
    (There are next advantages. It is possible to aggregate it in
    parallel with multiple remote servers. The amount of the data
    transfer of a local server and remote servers can be reduced)

Example of 2-steps aggregating
(Count the number of rows in 7 shards tables from local server)
-------------------------------------------------------------------------------
Prepare for aggregating:
mysql> create table target_hosts (host char(17))engine=myisam;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into target_hosts (host) values ('192.168.22.13'),('192.168.22.14'),('192.168.22.15'),('192.168.22.16'),('192.168.22.51'),('192.168.22.52'),('192.168.22.53');
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> create temporary table a (a int)engine=myisam;
Query OK, 0 rows affected (0.00 sec)

Execute aggregating:
mysql> select sum(a.a) from a, (select sum(spider_direct_sql('select count(*) from data01', 'a', concat('srv "s", host "', host, '"'))) cnt from target_hosts) c;
+-----------+
| sum(a.a)  |
+-----------+
| 117641811 |
+-----------+
1 row in set (40.86 sec)

mysql> truncate table a;
Query OK, 0 rows affected (0.00 sec)

mysql> select sum(a.a) from a, (select spider_bg_direct_sql('select count(*) from data01', 'a', concat('srv "s", host "', host, '"')) cnt from target_hosts) c;
+-----------+
| sum(a.a)  |
+-----------+
| 117641811 |
+-----------+
1 row in set (5.91 sec)
-------------------------------------------------------------------------------

Please see "99_change_logs.txt" in the download documents for checking other changes.

Enjoy!

4 comments:

  1. Can you explain the UDFs in more depth? Is there any documentation?

    Basically I think this is what you are saying can be accomplished:

    Given the following on the "spider server":
    select
    c1,
    count(*) the_count,
    sum(c2 * c3) thesum,
    avg(c4) theavg
    from partitioned_table;

    on each shard:
    create temporary table tmp_materialization
    select
    c1,
    sum(1) the_count,
    sum(c2 * c3) thesum,
    sum(c4) theavg_sum,
    count(c4) theavg_cnt
    from partitioned_table;

    select
    c1,
    sum(the_count) the_count,
    sum(the_sum) the_sum,
    sum(theavg_sum) / sum(theavg_cnt) the_avg
    from (
    select * from a.tmp_materialization
    union all
    select * from b.tmp_materialization
    union all
    select * from c.tmp_materialization
    ) distributed_materializations

    ReplyDelete
  2. I omitted GROUP BY c1 on the queries. Just pretend it is there:)

    ReplyDelete
  3. Hi Swanhart,

    > Is there any documentation?
    Yes.
    Documentation are "21_udf.txt" and "22_udf_parameters.txt" in "spider-doc-2.1-for-5.1.37.tgz".

    > Can you explain the UDFs in more depth?
    Yes, I can.
    At your case,
    given the following on the "spider server" (local server):

    -- Change column type and storage engine suitably.
    create temporary table tmp_materialization (
    c1 int,
    the_count int,
    thesum int,
    theavg_sum int,
    theavg_cnt int,
    key idx1(c1)
    )engine=myisam;

    -- 'srv "s"' in statement is created by "create server s foreign data wrapper mysql options (...)" statement. Please see "http://dev.mysql.com/doc/refman/5.1/en/create-server.html" for detail.
    -- "target_hosts" table in statement has your shard's host names.
    select
    a.c1,
    sum(a.the_count) the_count,
    sum(a.thesum) the_sum,
    sum(a.theavg_sum) / sum(a.theavg_cnt) the_avg
    from
    tmp_materialization a,
    (select spider_bg_direct_sql('select c1, sum(1) the_count, sum(c2 * c3) thesum, sum(c4) theavg_sum, count(c4) theavg_cnt from partitioned_table group by c1', 'tmp_materialization', concat('srv "s", host "', host, '"')) cnt from target_hosts) c
    group by a.c1;

    Thanks,
    Kentoku

    ReplyDelete
  4. Don't need to do anything on each shard (remote servers).

    ReplyDelete