# dunedaq and pocket in the post-v2.10.0 era

A prototype of dunedaq running in a Kubernetes (kind) cluster.
**BEWARE:**

- At least one step of these instructions (specifically the firewall configuration) may require sudo rights.
- The following instructions rely on `docker` 20 or greater, which is not available natively on CentOS 7 but is provided as a third-party package set under the name `docker-ce`. Installation instructions are available here.

HOWEVER, it is currently recommended to run these instructions on one of several computers in the np04daq cluster (namely one of `np04-srv-001`, `-010`, `-019`, `-021`, `-022`, and `-024`). On those computers, the `docker` user group is used to enable communication with the docker daemon. Please avoid using the readout hosts (`-26`, `-28`, `-29`, `-30`) unless you are testing with readout hardware. (Note that almost all of these nodes run CentOS Stream 8.) The use of the `docker` user group reduces the need for special privileges, as does the fact that a firewall is not typically run on these computers. If/when you are ready to try these instructions on one of the np04daq computers, please get in touch with the system administrators to be added to the `docker` group.
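Before starting, you can quickly verify the Docker prerequisites from the shell. This is a hypothetical helper, not part of the official instructions; `docker_major_ok` is a made-up name, and the version parsing assumes the standard `Docker version X.Y.Z, build ...` output format.

```shell
# Sketch: check that a Docker version string is 20 or newer.
docker_major_ok() {
    # Extract the major version from a string like "20.10.21".
    major=${1%%.*}
    [ "$major" -ge 20 ]
}

# On a real host you would feed in the live version, e.g.:
#   ver=$(docker --version | sed 's/Docker version \([0-9][0-9.]*\).*/\1/')
#   docker_major_ok "$ver" || echo "Docker too old: $ver"
#   id -nG | tr ' ' '\n' | grep -qx docker || echo "not in the docker group"
docker_major_ok 20.10.21 && echo "ok"
docker_major_ok 19.03.5 || echo "too old"
```

The group check matters because, on the np04daq machines, access to the docker daemon is granted through `docker` group membership rather than sudo.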
```shell
cd <MyTopDir>
mkdir pocket-daq
cd pocket-daq
POCKDAQDIR=$PWD
```
This version of `restcmd` allows sending the command notification to a different host from the sender.
```shell
source ~np04daq/bin/web_proxy.sh
cd $POCKDAQDIR
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt dunedaq-v2.10.0-cs8
dbt-create.py dunedaq-v2.10.0-cs8 dunedaq-workdir
cd dunedaq-workdir
cd sourcecode
git clone https://github.com/DUNE-DAQ/restcmd.git
cd ..
dbt-workarea-env
dbt-build.py
```
## Steps for re-setting up this `dunedaq-workdir` software area when you come back to it later

...for example, after you have logged out and logged back in...
```shell
cd <MyTopDir>/pocket-daq
POCKDAQDIR=$PWD
cd $POCKDAQDIR/dunedaq-workdir
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt dunedaq-v2.10.0-cs8
dbt-workarea-env
```
(For later use.) This version of `nanorc` is modified to interface with the `kind` control plane to manage the DAQ processes.
```shell
source ~np04daq/bin/web_proxy.sh # if not already done
git clone [email protected]:DUNE-DAQ/nanorc.git -b plasorak/k8s
cd nanorc
pip install -e .
```
```shell
source ~np04daq/bin/web_proxy.sh # if not already done
# Back to `pocket-daq`
cd $POCKDAQDIR
git clone https://github.com/DUNE-DAQ/pocket.git -b thea/kind-1.20.0
cd pocket/images/daq_application/daq_area_cvmfs
# Builds a docker image importing the `dunedaq-k8s` dbt work area. For TRACE, see note below.
./build.sh ../../../../dunedaq-workdir
```
## Accessing the high-speed TRACE memory buffer file

Set `TRACE_FILE=/dunedaq/pocket/trace_buffer` in `rebuild_work_area.sh` before running `./build.sh` above, and on the host run `export TRACE_FILE=$POCKDAQDIR/pocket/share/trace_buffer`. This gives you access from the host to the TRACE file and lets you control the TRACEing that happens in the containers.

Now the `pocket-daq-area-cvmfs:v2.10.0` image should have been created; you can check that by doing:
```shell
docker images
REPOSITORY              TAG
...
pocket-daq-area-cvmfs   v2.10.0
...
```
Note: if you use a different release, the `v2.10.0` tag will be different, and you will need to change `nanorc` accordingly! Check `src/nanorc/k8spm.py` around line 141, `daq_app_image`.
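Rather than eyeballing the `docker images` output, a small filter can confirm the expected repository:tag pair is present. This is a sketch only; the `has_image` helper is made up, and it assumes the standard column layout of `docker images` (repository in column 1, tag in column 2).

```shell
# Sketch: check that a repository:tag pair appears in `docker images` output.
has_image() {
    # $1 = repository, $2 = tag, stdin = `docker images` output
    awk -v r="$1" -v t="$2" '$1 == r && $2 == t { found = 1 } END { exit !found }'
}

# Real usage would be:
#   docker images | has_image pocket-daq-area-cvmfs v2.10.0 || echo "image missing"
printf 'REPOSITORY TAG\npocket-daq-area-cvmfs v2.10.0\n' \
  | has_image pocket-daq-area-cvmfs v2.10.0 && echo "image found"
```

This is handy in a wrapper script, since a missing or mistagged image only fails much later, at `kind load` or boot time.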
## Time to start the cluster...
```shell
source ~np04daq/bin/web_proxy.sh # if not already done
# Start the cluster
cd $POCKDAQDIR/pocket
SERVICES=cvmfs,opmon,ers make setup.local
# Make your shell use binaries (`kubectl`, ...) that pocket ships with
eval $(make env)
# Load the new docker image into the cluster registry
kind load docker-image pocket-daq-area-cvmfs:v2.10.0 --name pocketdune
```
NOTE 1: The `pocket-daq` image needs to be reloaded every time the cluster is restarted.

NOTE 2: `kafka` takes a while to come up. No messages will appear on daqerrordisplay or on the grafana dashboard until the broker is operational. The pod status can be monitored on the dashboard or by running `kubectl get -A pod`. Pod logs can be examined by running `kubectl logs <podname> --namespace=<namespace_name>`. More detail on the status of a pod can be obtained using `kubectl describe pod/<podname> --namespace=<namespace_name>`.
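While waiting for kafka and friends, a one-line filter can summarize how many pods are still not running. A minimal sketch: `not_running` is a made-up helper, it assumes the `--no-headers` column layout of `kubectl get -A pod` (STATUS in column 4), and the pod names in the canned example are fabricated.

```shell
# Sketch: count pods whose STATUS column is not "Running".
# Feed it the output of `kubectl get -A pod --no-headers`.
not_running() {
    awk '$4 != "Running" { n++ } END { print n+0 }'
}

# Real usage would be:
#   kubectl get -A pod --no-headers | not_running
printf '%s\n' \
  'kafka   kafka-0     0/1  Pending  0  10s' \
  'cvmfs   cvmfs-abc   1/1  Running  0  60s' \
  | not_running    # prints 1
```

When the count reaches 0, the broker should be up and messages should start flowing to the dashboards.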
## Steps to re-start the cluster when you come back to this `pocket` software area later

...for example, after you have shut down the cluster, logged out, and logged back in...
```shell
cd <MyTopDir>/pocket-daq
POCKDAQDIR=$PWD
source ~np04daq/bin/web_proxy.sh # if not already done
cd $POCKDAQDIR/pocket
SERVICES=cvmfs,opmon,ers make setup.local
eval $(make env)
kind load docker-image pocket-daq-area-cvmfs:v2.10.0 --name pocketdune
```

OR, if the cluster is already running:

```shell
cd $POCKDAQDIR/pocket
eval $(make env)
```
Please note that starting the cluster a second time, or at least successfully loading the web pages described below, can take a long time, up to 10 minutes.
If a firewall is running, it needs to be tweaked to allow the DAQ apps to report status changes back to `nanorc`. You can check whether a firewall is running with the following command: `ps -ef | grep -i firewalld`.
Use `docker network ls` to find the network named `kind`:

```shell
--(~)--> docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
71246ea5c5d7   bridge    bridge    local
24580441c323   host      host      local
7fa13261749f   kind      bridge    local
3e5da2d00bba   none      null      local
```

The `ID` should match a bridge network interface on the host called something like `br-<id>` in `ifconfig`.
Find it and use the following commands to put it in the trusted zone:

```shell
sudo firewall-cmd --permanent --zone=trusted --change-interface=br-7fa13261749f
sudo firewall-cmd --reload
```
If firewalld is not running, the above command is not needed.
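The interface name can also be derived from the network ID instead of being read off by eye. A minimal sketch, assuming only that Docker names the host bridge `br-<network-id>` (as in the example above); `kind_bridge_if` is a hypothetical helper, not an official tool.

```shell
# Hypothetical helper: build the host bridge interface name from the
# `kind` docker network ID (Docker names the bridge "br-<id>").
kind_bridge_if() {
    printf 'br-%s' "$1"
}

# On a live host you would feed in the real ID, e.g.:
#   KIND_NET_ID=$(docker network ls -qf name=kind)
#   sudo firewall-cmd --permanent --zone=trusted \
#       --change-interface="$(kind_bridge_if "$KIND_NET_ID")"
kind_bridge_if 7fa13261749f   # prints br-7fa13261749f
```

Deriving the name this way avoids trusting the wrong bridge if several Docker networks exist on the host.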
The `make setup.local` and `kind load` commands above start the Kubernetes cluster for you. Once it is running, you can point your browser to several different pages to check on the status of the cluster and see the graphical displays that are available.
ToDo: provide instructions for setting up a tunnel so that we can visit web pages on np04daq computers from browsers running on computers outside of CERN. In the meantime, I'll just include a link to the part of Marco's 'graphical viewer' page that talks about the tunnel. That is not a perfect reference for what is needed here, but it's a good start.
The definitive list of available services and their ports is printed to the console when you run the `make setup.local` command above. That list includes the in-cluster addresses, the out-of-cluster addresses, and the username and password, if those are needed. All of that is very useful, so you should take a look.
In the meantime, here is a non-definitive list, for reference:

- Kubernetes dashboard: `http://<host>:31001`
- Grafana: `http://<host>:31003`
- ERS: `http://<host>:30080`
- Kafka: `http://<host>:30092`
- InfluxDB: `http://<host>:31002`

where `<host>` is something like `np04-srv-024`.
NOTE: This needs to be modified so that `daq_application` accesses the correct path for ERS/Grafana/frames.bin...

```shell
cd $POCKDAQDIR
mkdir runarea
cd runarea
daqconf_multiru_gen test -d /dunedaq/pocket/frames.bin -o /dunedaq/pocket
```
AND download `frames.bin` into `pocket/share`:

```shell
curl -o ${POCKDAQDIR}/pocket/share/frames.bin -O https://cernbox.cern.ch/index.php/s/7qNnuxD8igDOVJT/download
```
`pocket/share` is mounted on the `daq_application` containers as `/dunedaq/pocket`, which is where the `ruemu` application is instructed to load the raw data file from.
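Before booting, it is worth confirming that the raw data file actually landed in `pocket/share`, since a failed or truncated download only shows up later as a confusing `ruemu` error. A small sketch; the `file_ok` helper name is made up.

```shell
# Hypothetical check: make sure a downloaded file exists and is non-empty.
file_ok() {
    [ -s "$1" ]
}

# Real usage would be:
#   file_ok "$POCKDAQDIR/pocket/share/frames.bin" || echo "frames.bin missing or empty"
tmp=$(mktemp)
echo data > "$tmp"
file_ok "$tmp" && echo "present"    # prints present
rm -f "$tmp"
```

`[ -s file ]` is true only for files that exist and have a size greater than zero, which catches both a missing file and a zero-byte download.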
```shell
source ~np04daq/bin/web_proxy.sh -u
nanorc --k8s test
```

```
user@rc> boot
[18:15:35] INFO test_k8s_2 received command 'boot'
           INFO test_k8s_2 propagating to children nodes (['test_k8s_2']) simultaneously
           INFO Subsystem test_k8s_2 is booting
[18:15:35] Creating a namespace 'user-dunedaq' in kubernetes to hold your DAQ applications
           INFO Resolving the kind gateway
           INFO kind network gateway: 172.18.0.1
           INFO Creating user-dunedaq namespace
           INFO Creating user-dunedaq:dataflow0 daq application (port: 3333)
           INFO Creating user-dunedaq:dfo daq application (port: 3333)
           INFO Creating user-dunedaq:hsi daq application (port: 3333)
           INFO Creating user-dunedaq:ruemu0 daq application (port: 3333)
           INFO Creating user-dunedaq:trigger daq application (port: 3333)
...
[18:15:39] INFO Application dataflow0 booted
           INFO Application dfo booted
           INFO Application hsi booted
[18:15:40] INFO Application ruemu0 booted
           INFO Application trigger booted
                                         test_k8s_2 apps
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ name              ┃ state          ┃ host                   ┃ pings ┃ last cmd ┃ last succ. cmd ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ test_k8s_2        │ booted         │                        │       │          │                │
│ └── test_k8s_2    │ booted         │                        │       │          │                │
│     ├── dataflow0 │ booted - alive │ dataflow0.user-dunedaq │ True  │ None     │ None           │
│     ├── dfo       │ booted - alive │ dfo.user-dunedaq       │ True  │ None     │ None           │
│     ├── hsi       │ booted - alive │ hsi.user-dunedaq       │ True  │ None     │ None           │
│     ├── ruemu0    │ booted - alive │ ruemu0.user-dunedaq    │ True  │ None     │ None           │
│     └── trigger   │ booted - alive │ trigger.user-dunedaq   │ True  │ None     │ None           │
└───────────────────┴────────────────┴────────────────────────┴───────┴──────────┴────────────────┘
```
The pods will take some time to come up. `nanorc` queries the control plane and waits for the applications to open the command port (3333). The scheduling of the applications can be followed on the pocket dashboard at `http://localhost:31001`. (On the dashboard, it is helpful to change the namespace selection at the top of the page from "default" to "All namespaces"; this makes it easier to find the pods associated with your partition.)
At that point, `init`, `conf`, and `start` can be issued.

Note: don't forget to choose the correct partition name in the dunedaq grafana dashboard to be able to see the ERS issues flowing in.
We all love application logs; here is how to get them. First, open a new terminal window on the same host, then go to the pocket directory and do:

```shell
eval $(make env)
```

Now you have `kubectl` in your PATH, so you can do:

```shell
kubectl get pods -n <partition_name>
```
Note: the partition name can be given as a parameter to the nanorc boot command (`boot --partition <partition_name>`), and the default partition name currently seems to be "user-dunedaq". The partition name is used to create a namespace during the boot process.

For example, when the partition name is "user-dunedaq":
```shell
kubectl get pods -n user-dunedaq
NAME                         READY   STATUS    RESTARTS   AGE
dataflow0-84d77d48c9-mcbq9   1/1     Running   0          66s
...
```
And you can use the pod name with the `kubectl logs` command:

```shell
kubectl logs dataflow0-84d77d48c9-mcbq9 -n user-dunedaq
```
To destroy the cluster when you are finished, run:

```shell
cd $POCKDAQDIR/pocket
make destroy.local
```
This will delete the cluster entirely, along with any state.
Please note that if you get logged out of your shell window on the np04daq cluster before you run these steps to destroy the cluster, you can simply log back in and run them, without having to re-do any of the steps from earlier sections of these instructions.