Kafka under an M1 inconsistently fails #5736

Closed
3 tasks done
alexcu opened this issue Jun 7, 2021 · 2 comments

Comments

@alexcu

alexcu commented Jun 7, 2021

  • I have tried with the latest version of Docker Desktop
  • I have tried disabling enabled experimental features
  • I have uploaded Diagnostics
  • Diagnostics ID: Multiple (see the table below)

Expected behavior

Confluent's linux/amd64 images of Apache Kafka and Kafka Connect at v6.1.1 should run consistently under Docker on an Apple M1 processor.

There are currently no native linux/arm64 images, per confluentinc/kafka-images#80.

Actual behavior

Multiple issues occur inconsistently (including deadlocks, threads shutting down, and qemu crashes) when different versions of Docker Desktop and different combinations of experimental features are used with Kafka/Kafka Connect. The issues observed with the containers described in the steps to reproduce are listed below.

Broker (i.e., kafka)

main-EventThread shuts down:

[main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
[main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /172.18.0.3:59878, server: zookeeper/172.18.0.2:2181
[main-SendThread(zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server zookeeper/172.18.0.2:2181, sessionid = 0x1000002b7970000, negotiated timeout = 40000
[main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1000002b7970000 closed
[main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000002b7970000
===> Launching ...
===> Launching kafka ...
[2021-06-07 02:15:21,232] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2021-06-07 02:15:22,419] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2021-06-07 02:15:22,789] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2021-06-07 02:15:22,810] INFO starting (kafka.server.KafkaServer)

Connect (i.e., kafka-connect)

Deadlock issues (stalls with no response) when plug-ins are loaded or as Kafka Connect launches:

[2021-06-07 02:26:41,541] INFO Scanning for plugin classes. This might take a moment ... (org.apache.kafka.connect.cli.ConnectDistributed)
[2021-06-07 02:26:41,620] INFO Loading plugin from: /usr/share/java/cp-base-new (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)

<stays like this for over 5 mins>
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 6.1.1-ccs
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: c209f70c6c2e52ae
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1623033886126
===> Launching ...
===> Launching kafka-connect ...

<stays like this for over 5 mins>

qemu will randomly crash:

[2021-06-07 02:15:33,430] INFO Scanning for plugin classes. This might take a moment ... (org.apache.kafka.connect.cli.ConnectDistributed)
[2021-06-07 02:15:33,507] INFO Loading plugin from: /usr/share/java/cp-base-new (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[thread 300 also had an error]#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000401eb55600, pid=1, tid=299
#

# JRE version: OpenJDK Runtime Environment Zulu11.48+21-CA (11.0.11+9) (build 11.0.11+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu11.48+21-CA (11.0.11+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 1295[thread 301 also had an error]
 c1 javassist.bytecode.ConstPool.read(Ljava/io/DataInputStream;)V (64 bytes) @ 0x000000401eb55600 [0x000000401eb555e0+0x0000000000000020][thread 298 also had an error]

#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/appuser/hs_err_pid1.log
Compiled method (c1)    3615 1330       3       javassist.bytecode.ClassFile::read (301 bytes)
 total in heap  [0x000000401eb64710,0x000000401eb686d0] = 16320
 relocation     [0x000000401eb64888,0x000000401eb64c90] = 1032
 main code      [0x000000401eb64ca0,0x000000401eb67560] = 10432
 stub code      [0x000000401eb67560,0x000000401eb67770] = 528
 oops           [0x000000401eb67770,0x000000401eb67780] = 16
 metadata       [0x000000401eb67780,0x000000401eb67858] = 216
 scopes data    [0x000000401eb67858,0x000000401eb67f20] = 1736
 scopes pcs     [0x000000401eb67f20,0x000000401eb68630] = 1808
 dependencies   [0x000000401eb68630,0x000000401eb68640] = 16
 nul chk table  [0x000000401eb68640,0x000000401eb686d0] = 144
Compiled method (c1)    3615 1319       3       org.reflections.vfs.ZipDir$$Lambda$120/0x0000000100188c40::test (8 bytes)
 total in heap  [0x000000401eb60e90,0x000000401eb618a0] = 2576
 relocation     [0x000000401eb61008,0x000000401eb61098] = 144
 main code      [0x000000401eb610a0,0x000000401eb61660] = 1472
 stub code      [0x000000401eb61660,0x000000401eb616a0] = 64
 oops           [0x000000401eb616a0,0x000000401eb616b8] = 24
 metadata       [0x000000401eb616b8,0x000000401eb616e8] = 48
 scopes data    [0x000000401eb616e8,0x000000401eb61788] = 160
 scopes pcs     [0x000000401eb61788,0x000000401eb61878] = 240
 dependencies   [0x000000401eb61878,0x000000401eb61880] = 8
 nul chk table  [0x000000401eb61880,0x000000401eb618a0] = 32
Compiled method (c1)    3615 1357       3       java.util.stream.StreamSpliterators$WrappingSpliterator$$Lambda$134/0x000000010018c840::accept (9 bytes)
 total in heap  [0x000000401eb78410,0x000000401eb78778] = 872
 relocation     [0x000000401eb78588,0x000000401eb785c0] = 56
 main code      [0x000000401eb785c0,0x000000401eb786a0] = 224
 stub code      [0x000000401eb786a0,0x000000401eb786e0] = 64
 oops           [0x000000401eb786e0,0x000000401eb786e8] = 8
 metadata       [0x000000401eb786e8,0x000000401eb786f8] = 16
 scopes data    [0x000000401eb786f8,0x000000401eb78710] = 24
 scopes pcs     [0x000000401eb78710,0x000000401eb78760] = 80
 dependencies   [0x000000401eb78760,0x000000401eb78768] = 8
 nul chk table  [0x000000401eb78768,0x000000401eb78778] = 16
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
#
qemu: uncaught target signal 6 (Aborted) - core dumped

Unable to load plugins, e.g., the Postgres plugin:

[2021-06-07 02:18:37,221] INFO Loading plugin from: /usr/share/confluent-hub-components/debezium-debezium-connector-postgresql (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2021-06-07 02:18:37,923] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed)
java.lang.ExceptionInInitializerError
	at io.debezium.connector.postgresql.PostgresConnector.version(PostgresConnector.java:47)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:390)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:395)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:365)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:337)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:268)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:260)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:229)
	at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:206)
	at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:61)
	at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:93)
	at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:80)
Caused by: java.lang.NullPointerException: inStream parameter is null
	at java.base/java.util.Objects.requireNonNull(Objects.java:246)
	at java.base/java.util.Properties.load(Properties.java:406)
	at io.debezium.util.IoUtil.loadProperties(IoUtil.java:491)
	at io.debezium.util.IoUtil.loadProperties(IoUtil.java:521)
	at io.debezium.connector.postgresql.Module.<clinit>(Module.java:19)
	... 12 more
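
Each failure mode above leaves a recognisable marker in the container log. The following is a minimal sketch, assuming the logs are available as plain text, that tags a log by searching for the strings quoted in the excerpts above. The function name `classify_log` and the labels are hypothetical and not part of any tool used in this report. A stall produces no marker, so it cannot be detected this way.

```python
from typing import List

# Marker strings are copied from the log excerpts in this report.
# The labels on the right are made up for illustration.
SIGNATURES = [
    ("qemu: uncaught target signal", "qemu-crash"),
    ("A fatal error has been detected by the Java Runtime Environment", "jvm-crash"),
    ("ERROR Stopping due to error", "connect-stopped"),
    ("EventThread shut down for session", "zk-eventthread-shutdown"),
]


def classify_log(text: str) -> List[str]:
    """Return the label of every known marker found in the log text."""
    return [label for marker, label in SIGNATURES if marker in text]
```

A log can match several markers at once, for example a JVM SIGSEGV followed by the qemu abort.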

Information

  • Is it reproducible? Yes
  • Is the problem new? Yes
  • macOS Version: 11.2.3 (20D91)
  • Intel chip or Apple chip: Apple M1
  • Docker Desktop Version: 3.3.3 (64133)

Steps to reproduce the behavior

Please see the associated Dockerfile and .env files below.

  1. Create a new network:
$ docker network create --driver bridge my_network
  2. Build the image:
$ docker build . --platform='linux/amd64' -t 'connect'
  3. Run zookeeper:
$ docker run --platform="linux/amd64" -p "2181:2181" --name="zookeeper" --env-file=zookeeper.env --network=my_network confluentinc/cp-zookeeper:6.1.1
  4. Run the broker:
$ docker run --platform="linux/amd64" -p "29092:29092" -p "9092:9092" -p "9101:9101" --name="broker" --env-file=broker.env --network=my_network confluentinc/cp-kafka:6.1.1
  5. Run the connector:
$ docker run --platform="linux/amd64" -p "8083:8083" --name="connect" --env-file=connect.env --network=my_network connect
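
For repeated runs, the three `docker run` steps above can be assembled programmatically. This is only a sketch: the helper `docker_run_args` and the `STEPS` list are hypothetical names, and the code builds the argument lists without executing anything. The flags, ports, image names, and env files are the ones given in the steps above.

```python
from typing import List, Sequence


def docker_run_args(name: str, image: str, ports: Sequence[str], env_file: str,
                    network: str = "my_network",
                    platform: str = "linux/amd64") -> List[str]:
    """Build the argv for one `docker run` step; nothing is executed here."""
    args = ["docker", "run", f"--platform={platform}"]
    for port in ports:
        args += ["-p", port]
    args += [f"--name={name}", f"--env-file={env_file}",
             f"--network={network}", image]
    return args


# Same order as the steps: zookeeper, broker, connect.
STEPS = [
    docker_run_args("zookeeper", "confluentinc/cp-zookeeper:6.1.1",
                    ["2181:2181"], "zookeeper.env"),
    docker_run_args("broker", "confluentinc/cp-kafka:6.1.1",
                    ["29092:29092", "9092:9092", "9101:9101"], "broker.env"),
    docker_run_args("connect", "connect", ["8083:8083"], "connect.env"),
]
```

Each list can be passed to `subprocess.run(args)` to start the corresponding container in the foreground, as in the manual steps.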

Overview of Issues

I have explored various permutations of Docker Desktop 3.3.3/3.4.0, the experimental flag on/off, and Big Sur's new Virtualization framework (under experimental features) on/off. These tests were run with resources set to 14 GB of RAM, per this post.

Docker | Experimental | Virtualization | Diagnostics ID | Issues
3.3.3 | Off | Off | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607020858 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Fails to load debezium-debezium-connector-postgresql
3.3.3 | Off | On | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607021859 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Inconsistent behaviour, occasional qemu crash at plugin load, fails to load debezium-debezium-connector-postgresql (two logs provided)
3.3.3 | On | Off | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607024642 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Inconsistent behaviour, deadlocks encountered whilst launching kafka-connect (two logs provided)
3.3.3 | On | On | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607023115 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Inconsistent behaviour, deadlocks encountered whilst loading plugins, fails to load debezium-debezium-connector-postgresql (two logs provided)
3.4.0 | Off | Off | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607034536 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Fails to load debezium-debezium-connector-postgresql
3.4.0 | Off | On | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607034132 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Inconsistent behaviour, deadlocks encountered whilst loading plugins (cp-base-new), fails to load debezium-debezium-connector-postgresql (two logs provided)
3.4.0 | On | Off | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607032648 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Deadlocks encountered while launching kafka-connect (two logs provided) and while loading plugins (kafka-serde-tools), fails to load debezium-debezium-connector-postgresql (two logs provided)
3.4.0 | On | On | E970E6CB-2254-43E8-86EE-7C9B0D73722B/20210607033551 | Zookeeper - None observed; Broker - EventThread shutdown; Connect - Deadlocks encountered while loading plugins (cp-base-new) - this happened twice (two logs provided)
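
The eight rows above are the full cross product of two Docker Desktop versions and two on/off settings. As a small sketch (the names `DOCKER_VERSIONS`, `TOGGLE`, and `MATRIX` are illustrative only), the same combinations can be enumerated in the table's row order:

```python
from itertools import product

DOCKER_VERSIONS = ["3.3.3", "3.4.0"]
TOGGLE = ["Off", "On"]

# Each tuple is (Docker Desktop version, experimental flag, virtualization).
# product() varies the last element fastest, which matches the table order.
MATRIX = list(product(DOCKER_VERSIONS, TOGGLE, TOGGLE))

for docker, experimental, virtualization in MATRIX:
    print(docker, experimental, virtualization)
```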

Logs

Logs captured in the tests above are named in the following format:

<container_name>.docker=<docker_version>.experimental=<on/off>.virtualization=<on/off>.[<1/2>].log

When connect encountered deadlock issues or qemu crashed, I killed the connect container, deleted it, recreated it, and ran it once more, noting the result. There are therefore two log files: connect.docker=X.experimental=X.virtualization=X.1.log and connect.docker=X.experimental=X.virtualization=X.2.log.
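
The naming scheme can be expressed as a small helper. This is a sketch under two assumptions that the report does not state outright: the square brackets in the format mark the attempt number as optional, and the on/off values are written in lower case. The function name `log_name` is hypothetical.

```python
from typing import Optional


def log_name(container: str, docker: str, experimental: bool,
             virtualization: bool, attempt: Optional[int] = None) -> str:
    """Build a log file name following the format described above."""
    parts = [
        container,
        f"docker={docker}",
        f"experimental={'on' if experimental else 'off'}",
        f"virtualization={'on' if virtualization else 'off'}",
    ]
    # Assumption: the attempt suffix (1 or 2) is present only when the
    # container was re-run after a deadlock or qemu crash.
    if attempt is not None:
        parts.append(str(attempt))
    return ".".join(parts) + ".log"


print(log_name("connect", "3.4.0", True, False, attempt=1))
```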

Files

Files for Reproducibility

Logs (Broker, Connect, Zookeeper)

@stephen-turner
Contributor

Unfortunately qemu cannot successfully run all containers. This is out of our control, which is why we document it as "best effort only": https://docs.docker.com/docker-for-mac/apple-silicon/#known-issues.

Sorry, I realise this is not a very satisfactory answer to a detailed bug report, but the only solution is for the container authors to produce an arm64 image as requested at confluentinc/kafka-images#80.

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jul 8, 2021