# Old Pocket DAQ (deprecated)

A prototype of dunedaq running in a Kubernetes (kind) cluster.
BEWARE: At least one step of these instructions may require sudo rights, specifically the firewall configuration. However, it is currently recommended to run these instructions on one of several computers in the np04daq cluster (namely np04-srv-001, -010, -019, -021, -022, -024, or -030). On those computers, the `docker` user group is used to enable communication with the docker daemon. (Note that np04-srv-019 is a CentOS Stream 8 node, while the others are CentOS 7 nodes.) This reduces the need for special privileges, as does the fact that a firewall is not typically run on those computers. If/when you are ready to try these instructions on one of the np04daq computers, please get in touch with the system administrators to get added to the `docker` group.
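Before starting, you can confirm that your account is already in the `docker` group. A quick sketch (not part of the original instructions); `id -nG` lists your current group memberships:

```sh
# Check whether the current user is in the 'docker' group,
# which is needed to talk to the docker daemon without sudo.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  in_docker=yes
else
  in_docker=no
fi
echo "member of docker group: $in_docker"
```

If the answer is "no" on an np04daq machine, that is the point at which to contact the system administrators.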
```sh
cd <MyTopDir>
mkdir pocket-daq
cd pocket-daq
POCKDAQDIR=$PWD
```
This version of `restcmd` allows sending the command notification to a different host from the sender.
```sh
source ~np04daq/bin/web_proxy.sh
cd $POCKDAQDIR
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt dunedaq-v2.8.0
dbt-create.sh dunedaq-v2.8.0 dunedaq-k8s
cd dunedaq-k8s
cd sourcecode
git clone https://github.com/DUNE-DAQ/restcmd.git -b thea/k8s
cd ..
dbt-workarea-env
dbt-build.sh
```
## Steps for re-setting up this 'dunedaq-k8s' software area when you come back to it later

(...for example, after you have logged out and logged back in...)
```sh
cd <MyTopDir>/pocket-daq
POCKDAQDIR=$PWD
cd $POCKDAQDIR/dunedaq-k8s
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt dunedaq-v2.8.0
dbt-workarea-env
```
This version of `nanorc` (for later use) is modified to interface to the `kind` control plane to manage the daq processes.
```sh
source ~np04daq/bin/web_proxy.sh # if not already done
pip install -U https://github.com/DUNE-DAQ/nanorc/archive/thea/k8s.tar.gz
```
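To confirm the install took effect, you can ask pip whether the package is visible in the current Python environment. This is just a convenience check (not part of the original instructions), guarded so it degrades gracefully if `pip` itself is missing:

```sh
# Check whether the nanorc package is installed in the
# current Python environment (pip show exits non-zero if not).
if command -v pip >/dev/null 2>&1 && pip show nanorc >/dev/null 2>&1; then
  nanorc_state="installed"
else
  nanorc_state="not installed"
fi
echo "nanorc: $nanorc_state"
```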
```sh
source ~np04daq/bin/web_proxy.sh # if not already done
# Back to `pocket-daq`
cd $POCKDAQDIR
git clone https://github.com/DUNE-DAQ/pocket.git -b thea/daq_application
cd pocket/images/daq_application/cvmfs_import
# Builds a docker image importing the `dunedaq-k8s` dbt work area
./build.sh ../../../../dunedaq-k8s
```
Now the `pocket-daq-cvmfs:v0.1.0` image should have been created.
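You can verify that the image exists before moving on. A small sketch (not part of the original instructions), guarded so it degrades gracefully on a host without docker:

```sh
# Show the freshly built image, if the docker daemon is reachable.
if command -v docker >/dev/null 2>&1; then
  docker images pocket-daq-cvmfs --format '{{.Repository}}:{{.Tag}}'
else
  echo "docker not available on this host"
fi
image_check=done
```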
Time to start the cluster....
```sh
source ~np04daq/bin/web_proxy.sh # if not already done
# Start the cluster
cd $POCKDAQDIR/pocket
SERVICES=opmon,ers make setup.local
## Make your shell use binaries (`kubectl`, ...) that pocket ships with
eval $(make env)
# Load the new docker image into the cluster registry
kind load docker-image pocket-daq-cvmfs:v0.1.0 --name pocketdune
```
NOTE 1: The `pocket-daq-cvmfs` image needs to be reloaded every time the cluster is restarted.

NOTE 2: Don't forget to initialize the `ers` database before moving on. Navigate to the ERS address (`http://<pocket hostname>:30080`) in a web browser, click "Apply Migrations", then reload.
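Once `eval $(make env)` has run, `kubectl` should be on your PATH and pointed at the kind cluster, so a node listing is a quick sanity check. This sketch (not part of the original instructions) is guarded so it degrades gracefully where `kubectl` is absent or the cluster is down:

```sh
# Quick sanity check that the kind cluster is up and reachable.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes || echo "cluster not reachable"
else
  echo "kubectl not on PATH (did you run 'eval \$(make env)'?)"
fi
cluster_check=done
```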
## Steps to re-start the cluster when you come back to this 'pocket' software area later

(...for example, after you have shut down the cluster, logged out, and logged back in...)
```sh
cd <MyTopDir>/pocket-daq
POCKDAQDIR=$PWD
source ~np04daq/bin/web_proxy.sh # if not already done
cd $POCKDAQDIR/pocket
SERVICES=opmon,ers make setup.local
eval $(make env)
kind load docker-image pocket-daq-cvmfs:v0.1.0 --name pocketdune
```
Please note that starting the cluster a second time (or at least the successful loading of the web pages described below) can take a long time, up to 10 minutes.
If a firewall is running, it needs to be tweaked to allow the daq apps to report status changes back to `nanorc`. You can check whether a firewall is running with the following command: `ps -ef | grep -i firewalld`.
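A slightly more robust version of that check is sketched below (not part of the original instructions); the `[f]` in the pattern stops grep from matching its own process in the listing:

```sh
# Report whether firewalld appears in the process table.
# The bracketed first letter keeps grep from matching itself.
if ps -ef | grep -qi '[f]irewalld'; then
  fw_state="running"
else
  fw_state="not running"
fi
echo "firewalld: $fw_state"
```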
Use `docker network ls` to find the network named `kind`:

```
--(~)--> docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
71246ea5c5d7   bridge    bridge    local
24580441c323   host      host      local
7fa13261749f   kind      bridge    local
3e5da2d00bba   none      null      local
```

The ID should match a bridge network interface on the host called something like `br-<id>` in `ifconfig`.
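Deriving the `br-<id>` interface name from that listing can be scripted. The sketch below uses the sample output above as a here-doc so it is self-contained; on a real host you would pipe `docker network ls` into the same `awk` instead:

```sh
# Derive the host bridge interface name (br-<id>) from the 'kind'
# row of `docker network ls` output. Sample data from the listing
# above; on a real host, pipe `docker network ls` in instead.
bridge_if=$(awk '$2 == "kind" {print "br-" $1}' <<'EOF'
NETWORK ID     NAME      DRIVER    SCOPE
71246ea5c5d7   bridge    bridge    local
24580441c323   host      host      local
7fa13261749f   kind      bridge    local
3e5da2d00bba   none      null      local
EOF
)
echo "$bridge_if"
```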
Find it and use the following commands to put it in the trusted zone:

```sh
sudo firewall-cmd --permanent --zone=trusted --change-interface=br-7fa13261749f
sudo firewall-cmd --reload
```
If firewalld is not running, the commands above are not needed.
The `make setup.local` and `kind load` commands above start the Kubernetes cluster for you. Once it is running, you can point your browser to several different pages to check on the status of the cluster and see the graphical displays that are available.
ToDo: provide instructions for setting up a tunnel so that we can visit web pages on np04daq computers from browsers running on computers outside of CERN. In the meantime, I'll just include a link to the part of Marco's 'graphical viewer' page that talks about the tunnel. That is not a perfect reference for what is needed here, but it's a good start.
The definitive list of available services and their ports is printed to the console when you run the `make setup.local` command above. That list includes the in-cluster addresses, the out-of-cluster addresses, and the username and password, if those are needed. All of that is very useful, so you should take a look.

In the meantime, here is a non-definitive list, for reference:
- Kubernetes dashboard: `http://<host>:31001`
- Grafana: `http://<host>:31003`
- ERS: `http://<host>:30080`
- Kafka: `http://<host>:30092`
- InfluxDB: `http://<host>:31002`

where `<host>` is something like `np04-srv-024`.
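When the cluster has been up for a while, a quick way to see which of those services are actually answering is to probe the ports. A sketch (not part of the original instructions; the host name is an example, substitute the machine you started the cluster on):

```sh
# Probe the out-of-cluster service ports listed above.
# np04-srv-024 is an example host; substitute your own.
host=np04-srv-024
probed=0
for entry in dashboard:31001 grafana:31003 ers:30080 kafka:30092 influxdb:31002; do
  name=${entry%%:*}
  port=${entry##*:}
  # /dev/tcp is a bash feature; timeout keeps dead hosts from hanging us
  if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "$name ($port): open"
  else
    echo "$name ($port): not reachable"
  fi
  probed=$((probed + 1))
done
```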
```sh
cd $POCKDAQDIR/dunedaq-k8s
python -m minidaqapp.nanorc.mdapp_multiru_gen mdapp_4proc
```

and tweak the `boot.json` environment section to point `erskafka` to the right place:
```json
"env": {
    "DUNEDAQ_ERS_DEBUG_LEVEL": "getenv:-1",
    "DUNEDAQ_ERS_ERROR": "erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
    "DUNEDAQ_ERS_FATAL": "erstrace,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
    "DUNEDAQ_ERS_INFO": "erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
    "DUNEDAQ_ERS_STREAM_LIBS": "erskafka",
    "DUNEDAQ_ERS_VERBOSITY_LEVEL": "getenv:1",
    "DUNEDAQ_ERS_WARNING": "erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
},
```
Here are some bash commands that will make this edit for you:

```sh
cp -p mdapp_4proc/boot.json mdapp_4proc/boot.json.orig
sed -i 's/erstrace,throttle,lstdout\"/erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)\"/' mdapp_4proc/boot.json
```
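To see what that `sed` substitution does, here is a self-contained dry run on a minimal `boot.json`-style fragment (the temporary file and its contents are just for the demo, not the real generated configuration):

```sh
# Demo: apply the same sed substitution as above to a minimal
# boot.json-style fragment and show the appended erskafka stream.
demo=$(mktemp)
cat > "$demo" <<'EOF'
{
  "env": {
    "DUNEDAQ_ERS_INFO": "erstrace,throttle,lstdout"
  }
}
EOF
sed -i 's/erstrace,throttle,lstdout\"/erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)\"/' "$demo"
result=$(grep -o 'erskafka([^)]*)' "$demo")
echo "$result"
rm -f "$demo"
```

Note that the pattern anchors on the trailing `"` so only stream definitions that end with `lstdout` are extended; `DUNEDAQ_ERS_FATAL`, which has a different stream list, is left for a manual edit.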
```sh
nanorc mdapp_4proc
```
```
shonky rc> boot -p my-partition
[17:03:34] INFO     Resolving the kind gateway
           INFO     kind network gateway: 172.19.0.1
           INFO     Creating p0 namespace
           INFO     Creating p0:dataflow daq application (port: 3333)
           INFO     Creating p0:hsi daq application (port: 3333)
           INFO     Creating p0:ruemu0 daq application (port: 3333)
           INFO     Creating p0:trigger daq application (port: 3333)
           INFO     Creating nanorc responder service p0:nanorc for 172.19.0.1:56789
           INFO     Creating nanorc responder endpoint
                                    Apps
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ name     ┃ host        ┃ alive ┃ pings ┃ last cmd ┃ last succ. cmd ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ dataflow │ dataflow.p0 │ True  │ False │ None     │ None           │
│ hsi      │ hsi.p0      │ True  │ False │ None     │ None           │
│ ruemu0   │ ruemu0.p0   │ True  │ False │ None     │ None           │
│ trigger  │ trigger.p0  │ True  │ False │ None     │ None           │
└──────────┴─────────────┴───────┴───────┴──────────┴────────────────┘
```
The pods will take some time to come up. `nanorc` queries the control plane and waits for the applications to open the command port (3333). The scheduling of the applications can be followed on the pocket dashboard at `http://localhost:31001`.
After a while, `status` should report:

```
                                    Apps
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ name     ┃ host        ┃ alive ┃ pings ┃ last cmd ┃ last succ. cmd ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ dataflow │ dataflow.p0 │ True  │ True  │ None     │ None           │
│ hsi      │ hsi.p0      │ True  │ True  │ None     │ None           │
│ ruemu0   │ ruemu0.p0   │ True  │ True  │ None     │ None           │
│ trigger  │ trigger.p0  │ True  │ True  │ None     │ None           │
└──────────┴─────────────┴───────┴───────┴──────────┴────────────────┘
```
At that point `init` can be issued. A few informational `ers` issues should then appear on both Grafana and the ERS issue collector. [Kurt note: I haven't seen these yet, and my understanding is that this is an issue that is being investigated.]

Note: don't forget to set the correct partition name in the dunedaq Grafana dashboard to be able to see the ers issues flowing in.
At the moment, `init` is as far as we can get. The next step of configuring the system (`conf`) will not currently work because we are still cataloguing the network ports that are needed for the various DAQ applications and then hacking nanorc to open them in the cluster containers. Since we cannot run `conf`, all we can do is shut down the system with the `terminate` command in nanorc.
To destroy the cluster when you are finished, run:

```sh
cd $POCKDAQDIR/pocket
make destroy.local
```

This will delete the cluster entirely, along with any state. Please note that if you get logged out of your shell on the np04daq cluster before you run these steps to destroy the cluster, you can log back in and run them without having to re-do any of the steps from earlier sections of these instructions.