Sharding Databases and Tables with Spring Boot and Sharding-JDBC

2020-02-20

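Requirements

Product requirement: data grows quickly and the total volume is large

Software requirement: performance must meet the load-test standard

Support sharding across both databases and tables
Leave enough freedom for custom development
Low performance overhead
Easy to develop against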

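Technology selection

Sharding-JDBC

Advantages

Lightweight framework
Shipped as a jar, so no extra deployment or dependencies are needed
Fully compatible with JDBC and the various ORM frameworks
Low performance overhead

Disadvantages

Aimed only at developers (transparent to DBAs), and mildly intrusive to the code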

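Overview

Official site: Apache ShardingSphere

Documentation: official documentation for version 3.X

Configuration method: Spring Boot Starter

During the preliminary research, versions 3.1.0 and above turned out to have an unresolvable package conflict with Gaea6.11.X, so the version finally used is 3.0.0. All descriptions and sample code below are based on sharding-jdbc 3.0.0.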

Maven dependencies

<!--sharding-jdbc-->
<dependency>
  <groupId>io.shardingsphere</groupId>
  <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
  <version>3.0.0</version>
</dependency>
<!-- Alibaba Druid connection pool (optional) -->
<dependency>
  <groupId>com.alibaba</groupId>
  <artifactId>druid</artifactId>
  <version>1.1.21</version>
</dependency>

yaml configuration

The official documentation is thorough; for the basics, see its Spring Boot configuration section

#Values may use Groovy-based inline expressions such as $->{0..1}; look up the syntax for details
sharding:
  jdbc:
    datasource:
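      #Data source names, separated by commas
      names: habit_local,habit_local_0,habit_local_1,habit_local_fj
      #Default database, holding the tables that need no database or table sharding. For example, the habit table lives here, while user_habit_u and user_habit_h do not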
      habit_local:
        type: com.alibaba.druid.pool.DruidDataSource
        driver: com.mysql.jdbc.Driver
        #Lesson learned from a neighboring team: this cannot be adapted to ND's fabric driver. Details: http://dwz.date/36U
        url: jdbc:mysql://127.0.0.1:3306/habit_local?autoReconnect=true&useUnicode=true&characterEncoding=UTF8
        username: root
        password: XXX
        driver-class-name: com.mysql.jdbc.Driver
      #Databases allocated by the standard database sharding strategy
      habit_local_0:
        type: com.alibaba.druid.pool.DruidDataSource
        driver: com.mysql.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/habit_local_0?autoReconnect=true&useUnicode=true&characterEncoding=UTF8
        username: root
        password: XXX
        driver-class-name: com.mysql.jdbc.Driver
      habit_local_1:
        type: com.alibaba.druid.pool.DruidDataSource
        driver: com.mysql.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/habit_local_1?autoReconnect=true&useUnicode=true&characterEncoding=UTF8
        username: root
        password: XXX
        driver-class-name: com.mysql.jdbc.Driver
      #Database configured separately for one particular product
      habit_local_fj:
        type: com.alibaba.druid.pool.DruidDataSource
        driver: com.mysql.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/habit_local_fj?autoReconnect=true&useUnicode=true&characterEncoding=UTF8
        username: root
        password: XXX
        driver-class-name: com.mysql.jdbc.Driver
    config:
      sharding:
        props:
          sql:
            show: true
        tables:
          #Logical table name
          user_habit_u:
            #The database.table nodes must be written out in full
            actual-data-nodes: habit_local_$->{0..1}.user_habit_u_$->{0..2},habit_local_fj.user_habit_u_$->{0..2}
            #Database sharding strategy
            database-strategy:
              #Standard strategy
              standard:
                #Column that decides the database
                sharding-column: tenant_id
                #Class implementing the strategy
                precise-algorithm-class-name: com.nd.elearning.habit.cultivate.sdk.api.config.DatabasePreciseShardingConfig
            #Table sharding strategy
            table-strategy:
              #Inline row expression
              inline:
                #Column that decides the table
                sharding-column: user_id
                #Strategy expression; suited only to simple rules such as modulo or hash
                algorithm-expression: user_habit_u_$->{user_id % 3}
          user_habit_h:
            actual-data-nodes: habit_local_$->{0..1}.user_habit_h_$->{0..2},habit_local_fj.user_habit_h_$->{0..2}
            database-strategy:
              standard:
                sharding-column: tenant_id
                precise-algorithm-class-name: com.nd.elearning.habit.cultivate.sdk.api.config.DatabasePreciseShardingConfig
            table-strategy:
              standard:
                sharding-column: habit_id
                precise-algorithm-class-name: com.nd.elearning.habit.cultivate.sdk.api.config.UserHabitPreciseShardingConfig
        #Default data source used by tables that are not sharded
        default-data-source-name: habit_local
  #Custom configuration below, used to give a particular product its own database
  custom:
    independence-app: {5: habit_local_fj}
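
To make the two expressions in the yaml above concrete, here is a minimal plain-Java sketch of what they evaluate to. DataNodeSketch, expand and routeTable are hypothetical names used only for illustration; they are not part of Sharding-JDBC, and the sketch assumes non-negative user_id values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Sharding-JDBC class: it only mimics the
// result of the inline expressions used in the configuration.
final class DataNodeSketch {

  // Expands "<dbPrefix>{0..dbCount-1}.<tablePrefix>{0..tableCount-1}".
  static List<String> expand(String dbPrefix, int dbCount, String tablePrefix, int tableCount) {
    List<String> nodes = new ArrayList<>();
    for (int db = 0; db < dbCount; db++) {
      for (int table = 0; table < tableCount; table++) {
        nodes.add(dbPrefix + db + "." + tablePrefix + table);
      }
    }
    return nodes;
  }

  // Mirrors algorithm-expression: user_habit_u_$->{user_id % 3}
  static String routeTable(long userId) {
    return "user_habit_u_" + (userId % 3);
  }

  public static void main(String[] args) {
    // habit_local_$->{0..1}.user_habit_u_$->{0..2} yields 2 x 3 = 6 nodes
    List<String> nodes = expand("habit_local_", 2, "user_habit_u_", 3);
    // The dedicated database is listed explicitly in actual-data-nodes
    for (int table = 0; table < 3; table++) {
      nodes.add("habit_local_fj.user_habit_u_" + table);
    }
    nodes.forEach(System.out::println);
    System.out.println(routeTable(10L)); // 10 % 3 is 1
  }
}
```

Listing the nodes this way is a quick check that every physical table named in actual-data-nodes really exists before the application starts.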

Java code

Database/table strategy classes

/**
 * Sharding configuration that splits the UserHabit table by habit_id.
 * <p>Description: </p>
 * <p>Create Time: 2020/2/19 0019</p>
 * @author 910204(zys)
 */
public class UserHabitPreciseShardingConfig implements PreciseShardingAlgorithm {
  /**
   * Precise sharding (table) algorithm.
   *
   * @param availableTargetNames names of the candidate tables
   * @param shardingValue value of the sharding column
   * @return String table name
   */
  @Override
  public String doSharding(Collection availableTargetNames, PreciseShardingValue shardingValue) {
    // Math.abs is applied after the modulo, so the result always falls in [0, size)
    int suffix =
        Math.abs(shardingValue.getValue().toString().hashCode() % availableTargetNames.size());
    // A bare numeric suffix could let "2" match table_22, so prefix it with a separator such as "_"
    String suffixStr = "_" + suffix;
    for (Object each : availableTargetNames) {
      String targetName = ((String) each);
      if (targetName.endsWith(suffixStr)) {
        return targetName;
      }
    }
    // Fall back to the first candidate when no table name matches
    return availableTargetNames.iterator().next().toString();
  }
}
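
The suffix-matching step in the class above can be tried out without Sharding-JDBC on the classpath. The following is a minimal sketch under that assumption; SuffixMatchSketch, matches and pick are hypothetical names, and the table names are taken from the yaml configuration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a plain-Java stand-in for the matching step, with no Sharding-JDBC types.
final class SuffixMatchSketch {

  // Returns true when the target name ends with the (optionally separated) numeric suffix.
  static boolean matches(String target, int suffix, boolean withSeparator) {
    return target.endsWith((withSeparator ? "_" : "") + suffix);
  }

  // Hash the sharding value, take it modulo the number of targets, then match on "_<n>".
  static String pick(List<String> targets, Object shardingValue) {
    int suffix = Math.abs(shardingValue.toString().hashCode() % targets.size());
    for (String target : targets) {
      if (matches(target, suffix, true)) {
        return target;
      }
    }
    return targets.get(0); // same fallback as the strategy class
  }

  public static void main(String[] args) {
    System.out.println(matches("table_22", 2, false)); // true: the pitfall of a bare suffix
    System.out.println(matches("table_22", 2, true));  // false: "_2" does not match "_22"
    List<String> tables = Arrays.asList("user_habit_h_0", "user_habit_h_1", "user_habit_h_2");
    System.out.println(pick(tables, "7")); // "7".hashCode() is 55, and 55 % 3 is 1
  }
}
```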

JPA Entity

@Data
@Entity
@Table(name = "user_habit_u")
public class UserHabitByUser {
  @Id
  @Column(nullable = false, length = 36)
  @Type(type = "uuid-char")
  private UUID userHabitId;
  @Column
  private Long tenantId;
  @Column
  private Long userId;
  // remaining fields
}

Repository

All three query styles below are supported; the sharding-jdbc SQL parser maps the logical table to the physical tables

public interface UserHabitByUserRepository extends JpaRepository<UserHabitByUser, UUID> {
  List<UserHabitByUser> findAllByUserId(Long userId);
  @Query(value = "select uhu from UserHabitByUser uhu where uhu.userId=:userId")
  List<UserHabitByUser> getUserAll(@Param(value = "userId") Long userId);
  @Query(value = "select uhu.* from user_habit_u uhu where uhu.user_id=:userId",nativeQuery = true)
  List<UserHabitByUser> getUserAll2(@Param(value = "userId") Long userId);
}

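Custom code

// Reads the object defined in the yaml
@Data
// Managed by Spring
@Component
@ConfigurationProperties(prefix = "sharding.custom")
// The class name is arbitrary
public class ShardingCustomConfig {
  // The field name must be the camelCase form of the yaml key (independence-app)
  private Map<String, String> independenceApp;
}

PS: binding yaml to an object through @ConfigurationProperties is a core Spring Boot feature. The package below is optional and only generates configuration metadata, which gives IDE hints for custom properties. (sharding-jdbc-spring-boot-starter already includes it.)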

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-configuration-processor</artifactId>
    <optional>true</optional>
</dependency>

public class DatabasePreciseShardingConfig implements PreciseShardingAlgorithm {
  /**
   * Database sharding algorithm.
   *
   * @param availableTargetNames names of the candidate databases
   * @param shardingValue value of the database sharding column
   * @return String database name
   */
  @Override
  public String doSharding(Collection availableTargetNames, PreciseShardingValue shardingValue) {
    // Fetched from the application context: this class is instantiated by Sharding-JDBC from the
    // configured class name rather than by Spring, so @Resource injection does not take effect here
    ShardingCustomConfig shardingCustomConfig =
        ApplicationContextUtil.getApplicationContext().getBean(ShardingCustomConfig.class);
    // Tenants that are deployed to a dedicated database
    Set<String> independenceAppTenantIds = shardingCustomConfig.getIndependenceApp().keySet();
    // A tenant with a dedicated deployment maps straight to its configured database
    if (independenceAppTenantIds.contains(shardingValue.getValue().toString())) {
      return shardingCustomConfig.getIndependenceApp().get(shardingValue.getValue().toString());
    }
    // Other tenants are routed by hash; 2 is the number of shared databases (habit_local_0 and habit_local_1)
    int suffix = Math.abs(shardingValue.getValue().toString().hashCode() % 2);
    // A bare numeric suffix could let "2" match db_22, so prefix it with a separator such as "_"
    String suffixStr = "_" + suffix;
    for (Object each : availableTargetNames) {
      String targetName = ((String) each);
      if (targetName.endsWith(suffixStr)) {
        return targetName;
      }
    }
    // Fall back to the first candidate when no database name matches
    return availableTargetNames.iterator().next().toString();
  }
}
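
The routing rule of the database strategy above (dedicated tenants first, hash modulo for everyone else) can be exercised on its own. This is a minimal sketch, assuming the same database names as the yaml configuration; TenantRoutingSketch and route are hypothetical names, and unlike the class above the sketch fails fast instead of falling back to the first candidate.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: plain-Java version of the tenant routing rule, no Spring or Sharding-JDBC.
final class TenantRoutingSketch {

  static String route(
      List<String> databases, Map<String, String> dedicated, Object tenantId, int sharedCount) {
    String key = tenantId.toString();
    // 1. A tenant with a dedicated database always goes there.
    String override = dedicated.get(key);
    if (override != null) {
      return override;
    }
    // 2. Everyone else is spread over the shared databases by hash modulo.
    String suffixStr = "_" + Math.abs(key.hashCode() % sharedCount);
    for (String database : databases) {
      if (database.endsWith(suffixStr)) {
        return database;
      }
    }
    // Assumption of this sketch: a missing database is a configuration error.
    throw new IllegalStateException("no database configured for tenant " + key);
  }

  public static void main(String[] args) {
    List<String> databases = Arrays.asList("habit_local_0", "habit_local_1", "habit_local_fj");
    Map<String, String> dedicated = Collections.singletonMap("5", "habit_local_fj");
    System.out.println(route(databases, dedicated, 5L, 2)); // dedicated tenant
    System.out.println(route(databases, dedicated, 6L, 2)); // "6".hashCode() is 54, 54 % 2 is 0
    System.out.println(route(databases, dedicated, 7L, 2)); // "7".hashCode() is 55, 55 % 2 is 1
  }
}
```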

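Notes and reflections

When only tables are sharded, queries against a sharded table must include the sharding column; otherwise sharding-jdbc falls back to scanning every shard. For example, querying the logical table user_habit_u without a user_id condition makes the engine query user_habit_u_0, user_habit_u_1 and user_habit_u_2 in turn and then return the combined result
When both databases and tables are sharded, the same applies: by default the engine scans every table in every database
If a query genuinely cannot carry the sharding column, use the Hint sharding strategy and implement HintShardingAlgorithm to route the query without it
What about expanding capacity a second time?
- If the table strategy can split by ranges, as UC does, it scales well and no data has to be migrated
- If splitting by ranges is not possible
  - When the component's future data volume is predictable, create enough tables up front based on the estimate and map rows with plain hash modulo; the rule is simple and quick to develop
  - When the data volume is unpredictable, a second expansion will certainly require data migration; in that case consider a consistent hashing algorithm to shrink the range of data that has to move as far as possible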
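
To see why a second expansion is costly with plain modulo, the following minimal sketch counts how many ids land in a different table when the table count grows from 3 to 4. ModuloResizeSketch and countMoved are hypothetical names, and the ids are assumed to be the sequence 0..n-1.

```java
// Illustrative only: measures how much data a modulo rule forces to migrate on resize.
final class ModuloResizeSketch {

  // Counts ids in [0, idCount) whose table index changes when the table count changes.
  static long countMoved(long idCount, int oldTables, int newTables) {
    long moved = 0;
    for (long id = 0; id < idCount; id++) {
      if (id % oldTables != id % newTables) {
        moved++;
      }
    }
    return moved;
  }

  public static void main(String[] args) {
    long total = 12_000;
    long moved = countMoved(total, 3, 4);
    System.out.println(moved + " of " + total + " ids change tables");
  }
}
```

Going from 3 to 4 tables moves 9000 of the 12000 ids, that is 75% of the rows, which is the migration cost that consistent hashing tries to reduce.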

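import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Ketama-style hash: commonly used with consistent hashing so that every key maps into the same 32-bit range
public class DemoClass {
    public static long hash(String key) {
        MessageDigest md5;
        try {
            // MessageDigest is not thread-safe, so a fresh instance is created per call
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no md5 algorithm found", e);
        }
        // Use an explicit charset so the hash is the same on every platform
        byte[] bKey = md5.digest(key.getBytes(StandardCharsets.UTF_8));
        // Combine the first four digest bytes (little-endian) into one value
        long res =
            ((long) (bKey[3] & 0xFF) << 24)
                | ((long) (bKey[2] & 0xFF) << 16)
                | ((long) (bKey[1] & 0xFF) << 8)
                | (long) (bKey[0] & 0xFF);
        // Keep only the low 32 bits as an unsigned value
        return res & 0xffffffffL;
    }
}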
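
A hash function like the one above is usually paired with a ring of virtual nodes. The sketch below is one possible way to do that, not the approach used in this project: HashRingSketch, the "#" naming of virtual nodes and the count of 100 virtual nodes per table are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

// Illustrative consistent-hash ring; node names stand for physical tables.
final class HashRingSketch {
  private static final int VIRTUAL_NODES = 100; // assumed value, tune for balance
  private final TreeMap<Long, String> ring = new TreeMap<>();

  // Same idea as the hash above: first four MD5 bytes as an unsigned 32-bit value.
  static long hash(String key) {
    try {
      byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
      return ((long) (d[3] & 0xFF) << 24)
          | ((long) (d[2] & 0xFF) << 16)
          | ((long) (d[1] & 0xFF) << 8)
          | (long) (d[0] & 0xFF);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  // Places VIRTUAL_NODES points for the node on the ring.
  void addNode(String node) {
    for (int i = 0; i < VIRTUAL_NODES; i++) {
      ring.put(hash(node + "#" + i), node);
    }
  }

  // A key belongs to the first ring point at or after its hash, wrapping around at the end.
  String nodeFor(String key) {
    if (ring.isEmpty()) {
      throw new IllegalStateException("ring is empty");
    }
    Long point = ring.ceilingKey(hash(key));
    return ring.get(point != null ? point : ring.firstKey());
  }

  public static void main(String[] args) {
    HashRingSketch ring = new HashRingSketch();
    for (int i = 0; i < 3; i++) {
      ring.addNode("user_habit_h_" + i);
    }
    System.out.println(ring.nodeFor("habit-42"));
  }
}
```

When a fourth table is added to such a ring, a key either stays where it was or moves to the new table; keys never shuffle between the existing tables, which is what limits the amount of data to migrate.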
