7.2.4 Paimon Catalog

1 Usage Notes

  1. When data is stored on HDFS, place core-site.xml, hdfs-site.xml, and hive-site.xml in the conf directories of both FE and BE. Doris first reads the Hadoop configuration files in the conf directory, and then the configuration files referenced by the HADOOP_CONF_DIR environment variable.

  2. The currently supported Paimon version is 0.8.

2 Creating a Catalog

A Paimon catalog can currently be created with two types of Metastore:

  • filesystem (default): stores both metadata and data on the filesystem.

  • hive metastore: additionally stores metadata in Hive Metastore, so users can access these tables directly from Hive.

2.1 Creating a Catalog Based on FileSystem

2.1.1 HDFS

SQL
CREATE CATALOG `paimon_hdfs` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "hdfs://HDFS8000871/user/paimon",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.username" = "hadoop"
);

CREATE CATALOG `paimon_kerberos` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "hdfs://HDFS8000871/user/paimon",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.security.authentication" = "kerberos",
    "hadoop.kerberos.keytab" = "/doris/hdfs.keytab",
    "hadoop.kerberos.principal" = "hdfs@HADOOP.COM"
);
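
Once the catalog is created, it can be used like any other Doris catalog. Below is a minimal usage sketch; the database db1 and table tbl1 are hypothetical placeholders:

SQL
-- Switch to the new Paimon catalog and browse it (db1/tbl1 are hypothetical)
SWITCH paimon_hdfs;
SHOW DATABASES;
USE db1;
SELECT * FROM tbl1 LIMIT 10;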

2.1.2 MINIO

SQL
CREATE CATALOG `paimon_s3` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "s3://bucket_name/paimons3",
    "s3.endpoint" = "http://<ip>:<port>",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk"
);

2.1.3 OBS

SQL
CREATE CATALOG `paimon_obs` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "obs://bucket_name/paimon",
    "obs.endpoint"="obs.cn-north-4.myhuaweicloud.com",
    "obs.access_key"="ak",
    "obs.secret_key"="sk"
);

2.1.4 COS

SQL
CREATE CATALOG `paimon_cos` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "cosn://paimon-1308700295/paimoncos",
    "cos.endpoint" = "cos.ap-beijing.myqcloud.com",
    "cos.access_key" = "ak",
    "cos.secret_key" = "sk"
);

2.1.5 OSS

SQL
CREATE CATALOG `paimon_oss` PROPERTIES (
    "type" = "paimon",
    "warehouse" = "oss://paimon-zd/paimonoss",
    "oss.endpoint" = "oss-cn-beijing.aliyuncs.com",
    "oss.access_key" = "ak",
    "oss.secret_key" = "sk"
);

2.2 Creating a Catalog Based on Hive Metastore

SQL
CREATE CATALOG `paimon_hms` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "hms",
    "warehouse" = "hdfs://HDFS8000871/user/zhangdong/paimon2",
    "hive.metastore.uris" = "thrift://172.21.0.44:7004",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.username" = "hadoop"
);

CREATE CATALOG `paimon_kerberos` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "hms",
    "warehouse" = "hdfs://HDFS8000871/user/zhangdong/paimon2",
    "hive.metastore.uris" = "thrift://172.21.0.44:7004",
    "hive.metastore.sasl.enabled" = "true",
    "hive.metastore.kerberos.principal" = "hive/xxx@HADOOP.COM",
    "dfs.nameservices" = "HDFS8000871",
    "dfs.ha.namenodes.HDFS8000871" = "nn1,nn2",
    "dfs.namenode.rpc-address.HDFS8000871.nn1" = "172.21.0.1:4007",
    "dfs.namenode.rpc-address.HDFS8000871.nn2" = "172.21.0.2:4007",
    "dfs.client.failover.proxy.provider.HDFS8000871" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "hadoop.security.authentication" = "kerberos",
    "hadoop.kerberos.principal" = "<hdfs@HADOOP.COM>",
    "hadoop.kerberos.keytab" = "/doris/hdfs.keytab"
);
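
Tables in an HMS-backed catalog can also be queried with fully qualified three-part names, without switching catalogs. A minimal sketch, again using hypothetical db1/tbl1 names:

SQL
-- Cross-catalog query with catalog.database.table naming (db1/tbl1 are hypothetical)
SELECT * FROM paimon_hms.db1.tbl1 LIMIT 10;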

2.3 Creating a Catalog Based on Aliyun DLF

This feature is supported since versions 2.1.7 and 3.0.3.

SQL
CREATE CATALOG `paimon_dlf` PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "dlf",
    "warehouse" = "oss://xx/yy/",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.uid" = "xxxxx",
    "dlf.region" = "cn-beijing",
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk"

    -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com",  -- optional
    -- "dlf.catalog.id" = "xxxx", -- optional
);
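
If the Paimon metadata is later changed by other engines, the metadata cached in Doris can be re-synced manually; a minimal sketch:

SQL
-- Invalidate and reload the catalog's cached metadata
REFRESH CATALOG paimon_dlf;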

3 Column Type Mapping

Paimon Data Type                          Doris Data Type              Comment
BooleanType                               Boolean
TinyIntType                               TinyInt
SmallIntType                              SmallInt
IntType                                   Int
FloatType                                 Float
BigIntType                                BigInt
DoubleType                                Double
VarCharType                               VarChar
CharType                                  Char
VarBinaryType, BinaryType                 String
DecimalType(precision, scale)             Decimal(precision, scale)
TimestampType, LocalZonedTimestampType    DateTime
DateType                                  Date
ArrayType                                 Array                        Supports nested Array
MapType                                   Map                          Supports nested Map
RowType                                   Struct                       Supports nested Struct (since versions 2.0.10 and 2.1.3)
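
Complex types map to Doris collection types and can be read with the usual element-access functions. A minimal sketch, assuming a hypothetical table tbl1 with columns arr (Array), mp (Map), and st (Struct):

SQL
-- Element access on mapped complex types (tbl1, arr, mp, st are hypothetical)
SELECT
    element_at(arr, 1),         -- first array element (1-based)
    element_at(mp, 'k'),        -- value for key 'k' in the map
    struct_element(st, 'f')     -- struct field named 'f'
FROM paimon_hdfs.db1.tbl1;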

4 FAQ

  1. Kerberos issues

    • Make sure the principal and keytab are configured correctly.

    • On the BE nodes, set up a scheduled task (e.g. via crontab) that runs kinit -kt your_principal your_keytab at a regular interval (e.g. every 12 hours).

  2. Unknown type value: UNSUPPORTED

    This is a compatibility issue between Doris 2.0.2 and Paimon 0.5. Upgrade to Doris 2.0.3 or later to resolve it, or apply a patch yourself.

  3. Accessing object storage (OSS, S3, etc.) fails with an unsupported file system error

    In versions up to and including 2.0.5, users must manually download the following jar packages, place them in the ${DORIS_HOME}/be/lib/java_extensions/preload-extensions directory, and restart the BE:

    • For OSS: paimon-oss-0.6.0-incubating.jar

    • For other object storage: paimon-s3-0.6.0-incubating.jar

    From version 2.0.6 onward, these jars no longer need to be placed manually.