Conversation

@Kota-SH Kota-SH (Contributor) commented Dec 16, 2025

@Apache9 Apache9 (Contributor) left a comment


I do not think this is the correct way; at the very least, you should not let clients configure the meta table name, right? They should ask the bootstrap node for the meta table name.

Theoretically, we could even support changing the meta table name at runtime: clients and region servers could ask the master for the meta table name whenever they hit a TableNotFoundException while accessing meta.

If we think that is too difficult to implement, and there is likely no need to change the meta table name after initialization, then we could introduce a config to set the meta table name when bootstrapping a cluster and record that name in the master local region or directly on HDFS. Either way, clients and region servers should still ask the master for the meta table name rather than read it from configuration; they just would not need to handle the name changing at runtime.

Thanks.

```java
if (instance == null) {
  synchronized (MetaTableName.class) {
    if (instance == null) {
      instance = initializeHbaseMetaTableName(HBaseConfiguration.create());
    }
  }
}
```


I do not think this is good practice, as this is still static. What if a client wants to connect to different read replica clusters? And it will be difficult to write UTs when you want to set up two clusters with different meta table names...


> I do not think this is good practice, as this is still static

I agree; this does not change the static reference.

> What if a client wants to connect to different read replica clusters?

Could clients use different configurations for each cluster being accessed?
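For illustration, one non-static alternative would be to resolve the name per configuration object rather than per JVM, so two clusters with different meta table names can coexist in one client process. A minimal sketch in plain Java (the `MetaTableNames` helper is hypothetical, and a `Map<String, String>` stands in for HBase's `Configuration`):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: resolve the meta table name per configuration object
// rather than via a JVM-wide static, so clients of different clusters (and
// UTs with two mini-clusters) can each see their own name.
public final class MetaTableNames {

  // One resolved name per configuration (keyed by configuration equality;
  // a real implementation would have to pick stable keys).
  private static final Map<Object, String> CACHE = new ConcurrentHashMap<>();

  private MetaTableNames() {
  }

  public static String forConfig(Map<String, String> conf) {
    // computeIfAbsent gives the same lazy, thread-safe initialization the
    // double-checked lock was aiming for, scoped to the configuration
    // instead of to the class.
    return CACHE.computeIfAbsent(conf, c -> {
      String suffix = conf.get("hbase.meta.table.suffix"); // key assumed from this thread
      return suffix == null || suffix.isEmpty() ? "hbase:meta" : "hbase:meta_" + suffix;
    });
  }
}
```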

@wchevreuil (Contributor)

> If we think that is too difficult to implement, and there is likely no need to change the meta table name after initialization, then we could introduce a config to set the meta table name when bootstrapping a cluster and record that name in the master local region or directly on HDFS. Either way, clients and region servers should still ask the master for the meta table name rather than read it from configuration; they just would not need to handle the name changing at runtime.

Could it be as simple as the hbase.id approach (a single file under the root dir containing the meta suffix, generated during bootstrap)? Then all processes would load that file when initialising.
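To make the analogy concrete, here is a sketch of such an hbase.id-style marker file, with plain `java.nio` standing in for the HDFS FileSystem API (the file name `meta.suffix` and the naming rule are assumptions, not the actual layout):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of an hbase.id-style marker: a single file under the root dir
// holding the meta suffix, written once at bootstrap and read by every
// process at startup. Real code would use the HDFS FileSystem API.
public final class MetaSuffixFile {
  private static final String FILE_NAME = "meta.suffix"; // assumed name

  private MetaSuffixFile() {
  }

  // Bootstrap side: persist the generated suffix under the root dir.
  public static void write(Path rootDir, String suffix) throws IOException {
    Files.write(rootDir.resolve(FILE_NAME), suffix.getBytes(StandardCharsets.UTF_8));
  }

  // Startup side: load the suffix; absence of the file means "no suffix",
  // i.e. the plain "hbase:meta" table name.
  public static String readMetaTableName(Path rootDir) throws IOException {
    Path f = rootDir.resolve(FILE_NAME);
    if (!Files.exists(f)) {
      return "hbase:meta";
    }
    String suffix = new String(Files.readAllBytes(f), StandardCharsets.UTF_8).trim();
    return suffix.isEmpty() ? "hbase:meta" : "hbase:meta_" + suffix;
  }
}
```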

Comment on lines 54 to 58
```java
private static TableName initializeHbaseMetaTableName(Configuration conf) {
  TableName metaTableName =
    TableName.valueOf(NamespaceDescriptor.SYSTEM_NAMESPACE_NAME_STR, "meta");
  LOG.info("Meta table suffix value: {}", metaTableName);
  return metaTableName;
}
```


Where is the conf being used here? And the suffix?


@anmolnar (Contributor)

Shall we add the meta table name to ConnectionRegistry?
ConnectionFactory would fetch it from the active master or from ZooKeeper and set it in the Connection object, where clients can find it easily.

ZooKeeper quorums will be distinct between replica clusters, so ZooKeeper could be a good place to store the meta table name during the bootstrap process. We don't need to support changing it at runtime.

For instance, in the cluster bootstrap process:

  • read the configuration for the suffix / replica id setting, e.g. "replica1":

    ```xml
    <property>
      <name>hbase.meta.table.suffix</name>
      <value>replica1</value>
    </property>
    ```

  • generate the meta table name based on the above setting:
    • if the suffix is not set: "hbase:meta"
    • if the suffix is set: "hbase:meta_386330f"
  • store the generated table name in ZooKeeper and retrieve it via ConnectionRegistry

wdyt?
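The name-generation step above could look roughly like this in plain Java (a sketch only: the exact hashing scheme behind the `386330f`-style tag is an assumption, and a real implementation would read the suffix from HBase's `Configuration`):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the proposed bootstrap rule: no suffix configured -> plain
// "hbase:meta"; otherwise derive a short hex tag from the configured
// suffix and append it, giving names like "hbase:meta_386330f".
public final class BootstrapMetaName {

  private BootstrapMetaName() {
  }

  public static String generate(String configuredSuffix) {
    if (configuredSuffix == null || configuredSuffix.isEmpty()) {
      return "hbase:meta"; // default, unchanged layout
    }
    return "hbase:meta_" + shortHash(configuredSuffix);
  }

  // 7-hex-digit tag derived from the suffix; the choice of SHA-256 and of
  // 7 digits is illustrative, not taken from the patch.
  private static String shortHash(String s) {
    try {
      byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 4; i++) {
        sb.append(String.format("%02x", d[i]));
      }
      return sb.substring(0, 7);
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError("SHA-256 is mandatory in every JRE", e);
    }
  }
}
```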

@Apache9 Apache9 (Contributor) commented Dec 19, 2025

Let's not introduce new persistent data on ZooKeeper; we could store it on HDFS or in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster; the primary cluster would then generate the meta name suffix file or record for the read replica clusters.

@anmolnar (Contributor)

> Let's not introduce new persistent data on ZooKeeper; we could store it on HDFS or in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster; the primary cluster would then generate the meta name suffix file or record for the read replica clusters.

HDFS won't be sufficient, because it's shared between the clusters. It could be in the master local region, but that doesn't conflict with making it accessible via ConnectionRegistry. Is that correct?

@Apache9 Apache9 (Contributor) commented Dec 19, 2025

> Let's not introduce new persistent data on ZooKeeper; we could store it on HDFS or in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster; the primary cluster would then generate the meta name suffix file or record for the read replica clusters.

> HDFS won't be sufficient, because it's shared between the clusters. It could be in the master local region, but that doesn't conflict with making it accessible via ConnectionRegistry. Is that correct?

We could use different file names for different read replica clusters?

@anmolnar anmolnar (Contributor) commented Dec 19, 2025

> Let's not introduce new persistent data on ZooKeeper; we could store it on HDFS or in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster; the primary cluster would then generate the meta name suffix file or record for the read replica clusters.
>
> HDFS won't be sufficient, because it's shared between the clusters. It could be in the master local region, but that doesn't conflict with making it accessible via ConnectionRegistry. Is that correct?
>
> We could use different file names for different read replica clusters?

So basically, we read from the config:

```xml
<property>
  <name>hbase.read.replica.id</name>
  <value>replica1</value>
</property>
```

and create a file on HDFS, `hbase.replica1`:

```
meta_table_name=hbase:meta_3a63c0f
```

?

@Apache9 Apache9 (Contributor) commented Dec 20, 2025

> Let's not introduce new persistent data on ZooKeeper; we could store it on HDFS or in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster; the primary cluster would then generate the meta name suffix file or record for the read replica clusters.
>
> HDFS won't be sufficient, because it's shared between the clusters. It could be in the master local region, but that doesn't conflict with making it accessible via ConnectionRegistry. Is that correct?
>
> We could use different file names for different read replica clusters?
>
> So basically, we read from the config:
>
> ```xml
> <property>
>   <name>hbase.read.replica.id</name>
>   <value>replica1</value>
> </property>
> ```
>
> and create a file on HDFS, `hbase.replica1`:
>
> ```
> meta_table_name=hbase:meta_3a63c0f
> ```
>
> ?

I was thinking of this approach. But on further consideration, if we could use a different master local region for each HBase read replica, we could just store this value in the master local region. That way the primary cluster does not need to know about the read replicas. Maybe this is better?

@anmolnar anmolnar (Contributor) commented Jan 9, 2026

> I was thinking of this approach. But on further consideration, if we could use a different master local region for each HBase read replica, we could just store this value in the master local region. That way the primary cluster does not need to know about the read replicas. Maybe this is better?

We can certainly do that, but in that case I'm not sure how to generate meta table names for the replica clusters. They have to be unique at the same storage location, and with files we would be able to scan the existing files for conflicts. With the master local region I'm not sure how to do it, unless we use UUIDs as suffixes.

The same issue arises with the directory name of the master local region of a replica cluster. Shall we use the configured replica name or the generated suffix?

Actually this might be the right way:

  1. Generate suffix for replica: "3a63c0f"
  2. Try to create master local region directory: "MasterData_3a63c0f"
  3. If succeeds, use this suffix for meta table name: "meta_3a63c0f"
  4. If not, go back to step 1.
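Steps 1–4 could be sketched like this, with `java.nio` directory creation standing in for an atomic create on HDFS (the `MasterData_` prefix follows the example above; everything else here is an assumption):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the generate-and-claim loop: pick a random 7-hex-digit suffix
// and atomically create "MasterData_<suffix>"; an existing directory means
// a collision, so generate a new suffix and retry.
public final class ReplicaSuffixAllocator {

  private ReplicaSuffixAllocator() {
  }

  public static String allocate(Path storageRoot) throws IOException {
    while (true) {
      String suffix = randomSuffix(); // step 1
      try {
        // createDirectory (not createDirectories) fails atomically if the
        // directory already exists -- that failure is the conflict check.
        Files.createDirectory(storageRoot.resolve("MasterData_" + suffix)); // step 2
        return suffix; // step 3: meta table becomes "hbase:meta_" + suffix
      } catch (FileAlreadyExistsException collision) {
        // step 4: go back to step 1
      }
    }
  }

  private static String randomSuffix() {
    // 28 random bits rendered as 7 zero-padded hex digits, e.g. "3a63c0f".
    return String.format("%07x", ThreadLocalRandom.current().nextInt(0x10000000));
  }
}
```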

Question: how would the replica cluster know about its suffix?

@Apache9 Apache9 (Contributor) commented Jan 10, 2026

> I was thinking of this approach. But on further consideration, if we could use a different master local region for each HBase read replica, we could just store this value in the master local region. That way the primary cluster does not need to know about the read replicas. Maybe this is better?
>
> We can certainly do that, but in that case I'm not sure how to generate meta table names for the replica clusters. They have to be unique at the same storage location, and with files we would be able to scan the existing files for conflicts. With the master local region I'm not sure how to do it, unless we use UUIDs as suffixes.
>
> The same issue arises with the directory name of the master local region of a replica cluster. Shall we use the configured replica name or the generated suffix?
>
> Actually this might be the right way:
>
> 1. Generate a suffix for the replica: "3a63c0f"
> 2. Try to create the master local region directory: "MasterData_3a63c0f"
> 3. If it succeeds, use this suffix for the meta table name: "meta_3a63c0f"
> 4. If not, go back to step 1.
>
> Question: how would the replica cluster know about its suffix?

Anyway, you need something in the configuration to tell a cluster whether it is a read replica cluster, so you could just add a configuration named replica_suffix?

@anmolnar (Contributor)

> Anyway, you need something in the configuration to tell a cluster whether it is a read replica cluster, so you could just add a configuration named replica_suffix?

That's where we started. It's already there:

```xml
<property>
  <name>hbase.read.replica.id</name>
  <value>replica1</value>
</property>
```

I think we started the conversation about whether we should use this replica.id in the meta table and master region directory names, or generate a random value for each replica.

@Apache9 Apache9 (Contributor) commented Jan 13, 2026

> Anyway, you need something in the configuration to tell a cluster whether it is a read replica cluster, so you could just add a configuration named replica_suffix?
>
> That's where we started. It's already there:
>
> ```xml
> <property>
>   <name>hbase.read.replica.id</name>
>   <value>replica1</value>
> </property>
> ```
>
> I think we started the conversation about whether we should use this replica.id in the meta table and master region directory names, or generate a random value for each replica.

As long as the replica id must be unique, I think using it as the master local region's suffix is enough. For the meta table, I'm neutral on whether to reuse this id too, or to generate one and store it in the master local region; the former seems sufficient, but the latter is more flexible, and as said above, theoretically we could even change the meta table name.

Thanks.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 43s | | Docker mode activated. |
| | _Prechecks_ | | | |
| +1 💚 | dupname | 0m 2s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | buf | 0m 0s | | buf was not available. |
| +0 🆗 | buf | 0m 0s | | buf was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | _master Compile Tests_ | | | |
| +0 🆗 | mvndep | 0m 33s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 19s | | master passed |
| +1 💚 | compile | 12m 25s | | master passed |
| +1 💚 | checkstyle | 5m 17s | | master passed |
| +1 💚 | spotbugs | 12m 50s | | master passed |
| +1 💚 | spotless | 1m 3s | | branch has no errors when running spotless:check. |
| | _Patch Compile Tests_ | | | |
| +0 🆗 | mvndep | 0m 12s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 53s | | the patch passed |
| +1 💚 | compile | 13m 19s | | the patch passed |
| +1 💚 | cc | 13m 19s | | the patch passed |
| -0 ⚠️ | javac | 1m 3s | /results-compile-javac-hbase-client.txt | hbase-client generated 1 new + 127 unchanged - 0 fixed = 128 total (was 127) |
| -0 ⚠️ | javac | 3m 58s | /results-compile-javac-hbase-server.txt | hbase-server generated 1 new + 192 unchanged - 1 fixed = 193 total (was 193) |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 17s | /results-checkstyle-hbase-common.txt | hbase-common: The patch generated 4 new + 2 unchanged - 0 fixed = 6 total (was 2) |
| -0 ⚠️ | checkstyle | 0m 13s | /results-checkstyle-hbase-diagnostics.txt | hbase-diagnostics: The patch generated 11 new + 0 unchanged - 0 fixed = 11 total (was 0) |
| +1 💚 | rubocop | 0m 5s | | No new issues. |
| +1 💚 | spotbugs | 15m 18s | | the patch passed |
| +1 💚 | hadoopcheck | 13m 31s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.1. |
| +1 💚 | hbaseprotoc | 5m 55s | | the patch passed |
| -1 ❌ | spotless | 0m 51s | /patch-spotless.txt | patch has 63 errors when running spotless:check, run spotless:apply to fix. |
| | _Other Tests_ | | | |
| +1 💚 | asflicense | 2m 34s | | The patch does not generate ASF License warnings. |
| | | 110m 32s | | |
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/7/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7558
JIRA Issue HBASE-29691
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless cc buflint bufcompat hbaseprotoc rubocop
uname Linux 12a20d9a2864 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 0974dd4
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-common hbase-hadoop-compat hbase-client hbase-zookeeper hbase-balancer hbase-server hbase-mapreduce hbase-diagnostics hbase-testing-util hbase-thrift hbase-shell hbase-backup hbase-it hbase-rest U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/7/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 rubocop=1.37.1
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@anmolnar anmolnar (Contributor) left a comment


@Apache9

I don't think we're getting anywhere with this patch, and I'm inclined to stop spending more effort here. We might be able to find an easier approach to integration testing with multiple clusters.

What do you think about this new approach? Is it the same as what you had in mind?

Comment on lines +73 to +74
```java
@Deprecated
public static TableName META_TABLE_NAME =
```


Is it still being used somewhere? Shouldn't we just remove it?

@Kota-SH (Contributor, PR author)


There are no usages of this in the current patch. I have changed all the references to either use the connection or the hard-coded "hbase:meta" (in just a couple of places, but mostly in tests). I marked it Deprecated for backward compatibility.
We could still use this deprecated variable in all the references mentioned above to make this patch smaller, but that would need some rework later, here as well. We would need another PR later, maybe after read replica, to delete this from TableName and move it into TEST_UTILS.

```diff
 Collections.unmodifiableList(Arrays.asList(
-  (String[]) ArrayUtils.addAll(new String[] { TableName.META_TABLE_NAME.getNameAsString() },
-    HBASE_NON_TABLE_DIRS.toArray())));
+  (String[]) ArrayUtils.addAll(new String[] { "hbase:meta" }, HBASE_NON_TABLE_DIRS.toArray())));
```


You don't need to do this refactoring, because later it will be harder to find. Since you just deprecated the static meta table name, you can still use it if there's no better option.

```diff
 LOG.info("BOOTSTRAP: creating hbase:meta region");
 FileSystem fs = rootDir.getFileSystem(env.getMasterConfiguration());
-Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
+Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.valueOf("hbase", "meta"));
```


Again, don't use hardcoded values.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|-----------|---------|---------|---------|
| +0 🆗 | reexec | 1m 42s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 4s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | _Prechecks_ | | | |
| | _master Compile Tests_ | | | |
| +0 🆗 | mvndep | 0m 12s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 2m 29s | | master passed |
| +1 💚 | compile | 4m 9s | | master passed |
| +1 💚 | javadoc | 2m 54s | | master passed |
| +1 💚 | shadedjars | 4m 21s | | branch has no errors when building our shaded downstream artifacts. |
| | _Patch Compile Tests_ | | | |
| +0 🆗 | mvndep | 0m 10s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 2m 21s | | the patch passed |
| +1 💚 | compile | 4m 14s | | the patch passed |
| +1 💚 | javac | 4m 14s | | the patch passed |
| +1 💚 | javadoc | 0m 7s | | hbase-protocol-shaded in the patch passed. |
| +1 💚 | javadoc | 0m 12s | | hbase-common in the patch passed. |
| +1 💚 | javadoc | 0m 10s | | hbase-hadoop-compat in the patch passed. |
| +1 💚 | javadoc | 0m 13s | | hbase-client in the patch passed. |
| +1 💚 | javadoc | 0m 9s | | hbase-zookeeper in the patch passed. |
| +1 💚 | javadoc | 0m 9s | | hbase-balancer in the patch passed. |
| +1 💚 | javadoc | 0m 21s | | hbase-server generated 0 new + 63 unchanged - 2 fixed = 63 total (was 65) |
| +1 💚 | javadoc | 0m 10s | | hbase-mapreduce in the patch passed. |
| +1 💚 | javadoc | 0m 10s | | hbase-diagnostics in the patch passed. |
| +1 💚 | javadoc | 0m 10s | | hbase-testing-util in the patch passed. |
| +1 💚 | javadoc | 0m 23s | | hbase-thrift in the patch passed. |
| +1 💚 | javadoc | 0m 8s | | hbase-shell in the patch passed. |
| +1 💚 | javadoc | 0m 10s | | hbase-backup in the patch passed. |
| +1 💚 | javadoc | 0m 9s | | hbase-it in the patch passed. |
| +1 💚 | javadoc | 0m 12s | | hbase-rest in the patch passed. |
| +1 💚 | shadedjars | 4m 20s | | patch has no errors when building our shaded downstream artifacts. |
| | _Other Tests_ | | | |
| +1 💚 | unit | 0m 25s | | hbase-protocol-shaded in the patch passed. |
| +1 💚 | unit | 1m 57s | | hbase-common in the patch passed. |
| +1 💚 | unit | 0m 27s | | hbase-hadoop-compat in the patch passed. |
| -1 ❌ | unit | 39m 34s | /patch-unit-hbase-client.txt | hbase-client in the patch failed. |
| +1 💚 | unit | 0m 38s | | hbase-zookeeper in the patch passed. |
| +1 💚 | unit | 7m 4s | | hbase-balancer in the patch passed. |
| -1 ❌ | unit | 7m 34s | /patch-unit-hbase-server.txt | hbase-server in the patch failed. |
| +1 💚 | unit | 22m 14s | | hbase-mapreduce in the patch passed. |
| +1 💚 | unit | 3m 46s | | hbase-diagnostics in the patch passed. |
| +1 💚 | unit | 1m 44s | | hbase-testing-util in the patch passed. |
| +1 💚 | unit | 6m 9s | | hbase-thrift in the patch passed. |
| -1 ❌ | unit | 6m 39s | /patch-unit-hbase-shell.txt | hbase-shell in the patch failed. |
| +1 💚 | unit | 12m 28s | | hbase-backup in the patch passed. |
| +1 💚 | unit | 0m 35s | | hbase-it in the patch passed. |
| +1 💚 | unit | 2m 54s | | hbase-rest in the patch passed. |
| | | 148m 7s | | |
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/7/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7558
JIRA Issue HBASE-29691
Optional Tests javac javadoc unit compile shadedjars
uname Linux 38290765b139 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 0974dd4
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/7/testReport/
Max. process+thread count 3895 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-common hbase-hadoop-compat hbase-client hbase-zookeeper hbase-balancer hbase-server hbase-mapreduce hbase-diagnostics hbase-testing-util hbase-thrift hbase-shell hbase-backup hbase-it hbase-rest U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/7/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.
