Old Pocket DAQ (deprecated)

Alessandro Thea edited this page Sep 27, 2021 · 1 revision

A prototype of dunedaq running in a kubernetes (kind) cluster

BEWARE: At least one step of these instructions may require sudo rights, specifically the firewall configuration.

HOWEVER, it is currently recommended to run these instructions on one of several computers in the np04daq cluster (namely np04-srv-001, -010, -019, -021, -022, -024, and -030). On those computers, the docker user group is used to enable communication with the docker daemon, which reduces the need for special privileges, as does the fact that a firewall is not typically run on them. (Note that np04-srv-019 is a CentOS Stream 8 node, while the others are CentOS 7 nodes.) When you are ready to try these instructions on one of these np04daq computers, please get in touch with the system administrators to get added to the docker group.
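You can check in advance whether your account already has the required group membership. A minimal sketch:

```shell
# Check whether the current account is in the "docker" group
# (needed on the np04 nodes to talk to the docker daemon without sudo)
if id -nG | grep -qw docker; then
    echo "docker group: yes"
else
    echo "docker group: no"
fi
```

If the answer is "no", contact the system administrators as described above before proceeding.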

Getting started

cd <MyTopDir>
mkdir pocket-daq
cd pocket-daq
POCKDAQDIR=$PWD

Prepare a daq workarea with modified restcmd

This version of restcmd allows the command notification to be sent to a host different from the sender.

source ~np04daq/bin/web_proxy.sh
cd $POCKDAQDIR
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt dunedaq-v2.8.0
dbt-create.sh dunedaq-v2.8.0 dunedaq-k8s
cd dunedaq-k8s
cd sourcecode
git clone https://github.com/DUNE-DAQ/restcmd.git -b thea/k8s
cd ..
dbt-workarea-env
dbt-build.sh
Steps for re-setting up this 'dunedaq-k8s' software area when you come back to it later

...for example, after you have logged out and logged back in...

cd <MyTopDir>/pocket-daq
POCKDAQDIR=$PWD
cd $POCKDAQDIR/dunedaq-k8s
source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt dunedaq-v2.8.0
dbt-workarea-env

Install nanorc

(For later use.) This version of nanorc is modified to interface with the kind control plane to manage the DAQ processes.

source ~np04daq/bin/web_proxy.sh  # if not already done
pip install -U https://github.com/DUNE-DAQ/nanorc/archive/thea/k8s.tar.gz

Create the daq_application docker image

source ~np04daq/bin/web_proxy.sh  # if not already done

# Back to `pocket-daq`
cd $POCKDAQDIR
git clone https://github.com/DUNE-DAQ/pocket.git -b thea/daq_application

cd pocket/images/daq_application/cvmfs_import

# Builds a docker image importing the `dunedaq-k8s` dbt work area
./build.sh ../../../../dunedaq-k8s

The pocket-daq-cvmfs:v0.1.0 image should now have been created. Time to start the cluster....

source ~np04daq/bin/web_proxy.sh  # if not already done

# Start the cluster
cd $POCKDAQDIR/pocket
SERVICES=opmon,ers make setup.local

## Make your shell use binaries (`kubectl`, ...) that pocket ships with
eval $(make env)

# Load the new docker image into the cluster registry
kind load docker-image pocket-daq-cvmfs:v0.1.0 --name pocketdune

NOTE 1: The pocket-daq-cvmfs image needs to be reloaded every time the cluster is restarted.
NOTE 2: Don't forget to initialize the ERS database before moving on. Navigate to the ERS address - http://<pocket hostname>:30080 - in a web browser, click "Apply Migrations", then reload the page.

Steps to re-start the cluster when you come back to this 'pocket' software area later

...for example, after you have shut down the cluster, logged out, and logged back in...

cd <MyTopDir>/pocket-daq
POCKDAQDIR=$PWD
source ~np04daq/bin/web_proxy.sh  # if not already done
cd $POCKDAQDIR/pocket
SERVICES=opmon,ers make setup.local
eval $(make env)
kind load docker-image pocket-daq-cvmfs:v0.1.0 --name pocketdune

Please note that starting the cluster a second time, or at least the successful loading of the web pages described below, can take a long time: up to 10 minutes.
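Rather than reloading the web pages by hand, one way to wait for the cluster to settle is to ask kubectl when all pods are Ready. This is a sketch, not part of the official instructions; the 600-second timeout matches the "up to 10 minutes" mentioned above, and it assumes `eval $(make env)` has been run so that kubectl points at the kind cluster:

```shell
# Wait (up to 10 minutes) for every pod in the cluster to become Ready.
if command -v kubectl >/dev/null 2>&1; then
    kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=600s || true
fi
echo "cluster readiness check finished"
```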

Networking

If a firewall is running, it needs to be tweaked to allow the DAQ applications to report status changes back to nanorc. You can check whether a firewall is running with the following command: ps -ef | grep -i firewalld.

Use docker network ls to find the network named kind

--(~)--> docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
71246ea5c5d7        bridge              bridge              local
24580441c323        host                host                local
7fa13261749f        kind                bridge              local
3e5da2d00bba        none                null                local

The ID should match a bridge network interface on the host named something like br-<id> in ifconfig. Find it and use the following commands to put it in the trusted zone.

sudo firewall-cmd --permanent --zone=trusted --change-interface=br-7fa13261749f
sudo firewall-cmd --reload

If firewalld is not running, the above command is not needed.
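The lookup above can be scripted. In the sketch below, the helper simply prepends `br-` to the network ID, which is docker's default bridge naming scheme; verify that the interface actually exists (e.g. with `ip link`) before trusting it:

```shell
# Map the docker network ID of "kind" to the host bridge interface name.
# Docker names the host-side bridge "br-" + the ID shown by `docker network ls`.
kind_bridge_iface() {
    echo "br-$1"
}

if command -v docker >/dev/null 2>&1; then
    net_id=$(docker network ls --filter name=kind --format '{{.ID}}' 2>/dev/null || true)
    iface=$(kind_bridge_iface "$net_id")
    echo "kind bridge interface: $iface"
    # Then, if firewalld is running:
    #   sudo firewall-cmd --permanent --zone=trusted --change-interface="$iface"
    #   sudo firewall-cmd --reload
fi
```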

Opening web pages for the various services that are started

The make setup.local and kind load commands above start the Kubernetes cluster for you. Once it is running, you can point your browser to several different pages to check on the status of the cluster and see the graphical displays that are available.

ToDo: provide instructions for setting up a tunnel so that we can visit web pages on np04daq computers from browsers running on computers outside of CERN. In the meantime, I'll just include a link to the part of Marco's 'graphical viewer' page that talks about the tunnel. That is not a perfect reference for what is needed here, but it's a good start.

The definitive list of available services and their ports is printed to the console when you run the make setup.local command above. That list includes the 'in-cluster' addresses, the 'out-cluster' addresses, and the username and password, if those are needed. All of that is very useful, so you should take a look.

In the meantime, here is a non-definitive list, for reference:

  • Kubernetes dashboard: http://<host>:31001
  • Grafana: http://<host>:31003
  • ERS: http://<host>:30080
  • Kafka: http://<host>:30092
  • InfluxDB: http://<host>:31002

where <host> is something like np04-srv-024.
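The ports above can be probed in one go with curl. This is a convenience sketch, not part of the official instructions; `POCKET_HOST` defaults to np04-srv-024 as an example, and a "000" code just means the port did not answer (yet) - note that Kafka is not an HTTP service, so it will typically report 000 even when healthy:

```shell
# Probe each pocket service port and report the HTTP status code.
# "000" means the port did not answer within the 2-second timeout.
HOST=${POCKET_HOST:-np04-srv-024}
for svc in dashboard:31001 grafana:31003 ers:30080 kafka:30092 influxdb:31002; do
    name=${svc%%:*}
    port=${svc##*:}
    code=$(curl -s -o /dev/null -m 2 -w '%{http_code}' "http://${HOST}:${port}" || true)
    echo "${name} http://${HOST}:${port} -> ${code}"
done
```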

Generate an mdapp configuration

cd $POCKDAQDIR/dunedaq-k8s
python -m minidaqapp.nanorc.mdapp_multiru_gen mdapp_4proc

and tweak the boot.json environment section to point erskafka to the right place.

    "env": {
        "DUNEDAQ_ERS_DEBUG_LEVEL": "getenv:-1",
        "DUNEDAQ_ERS_ERROR": "erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
        "DUNEDAQ_ERS_FATAL": "erstrace,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
        "DUNEDAQ_ERS_INFO": "erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
        "DUNEDAQ_ERS_STREAM_LIBS": "erskafka",
        "DUNEDAQ_ERS_VERBOSITY_LEVEL": "getenv:1",
        "DUNEDAQ_ERS_WARNING": "erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)",
    },
Here are some bash commands that will make this edit for you:
cp -p mdapp_4proc/boot.json mdapp_4proc/boot.json.orig
sed -i 's/erstrace,throttle,lstdout\"/erstrace,throttle,lstdout,erskafka(kafka-svc.kafka-kraft:9092)\"/' mdapp_4proc/boot.json
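To confirm the substitution worked, a quick check (assuming the mdapp_4proc directory from above):

```shell
# Count how many ERS streams now route to erskafka; expect one per
# DUNEDAQ_ERS_* severity that originally ended in "lstdout".
grep -c 'erskafka(kafka-svc.kafka-kraft:9092)' mdapp_4proc/boot.json
```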

FINALLY

nanorc mdapp_4proc

shonky rc> boot -p my-partition
[17:03:34] INFO     Resolving the kind gateway
           INFO     kind network gateway: 172.19.0.1
           INFO     Creating p0 namespace
           INFO     Creating p0:dataflow daq application (port: 3333)
           INFO     Creating p0:hsi daq application (port: 3333)
           INFO     Creating p0:ruemu0 daq application (port: 3333)
           INFO     Creating p0:trigger daq application (port: 3333)
           INFO     Creating nanorc responder service p0:nanorc for 172.19.0.1:56789
           INFO     Creating nanorc responder endpoint
                                 Apps
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ name     ┃ host        ┃ alive ┃ pings ┃ last cmd ┃ last succ. cmd ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ dataflow │ dataflow.p0 │ True  │ False │ None     │ None           │
│ hsi      │ hsi.p0      │ True  │ False │ None     │ None           │
│ ruemu0   │ ruemu0.p0   │ True  │ False │ None     │ None           │
│ trigger  │ trigger.p0  │ True  │ False │ None     │ None           │
└──────────┴─────────────┴───────┴───────┴──────────┴────────────────┘

The pods will take some time to come up. nanorc queries the control plane and waits for the application to open the command port (3333). The scheduling of the applications can be followed on the pocket dashboard at http://localhost:31001.
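What nanorc is doing here can be sketched as a TCP port poll. The helper below is a hypothetical stand-in (using bash's /dev/tcp redirection), not nanorc's actual code:

```shell
# Poll host:port until it accepts a TCP connection, or give up after N tries.
wait_for_port() {
    local host=$1 port=$2 tries=${3:-60}
    local i
    for i in $(seq "$tries"); do
        # Opening fd 3 in a subshell succeeds only if the port is accepting
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            echo "up"
            return 0
        fi
        sleep 1
    done
    echo "timeout"
    return 1
}

# e.g. wait_for_port dataflow.p0 3333
```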

After a while, status should report...

                                 Apps
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ name     ┃ host        ┃ alive ┃ pings ┃ last cmd ┃ last succ. cmd ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ dataflow │ dataflow.p0 │ True  │ True  │ None     │ None           │
│ hsi      │ hsi.p0      │ True  │ True  │ None     │ None           │
│ ruemu0   │ ruemu0.p0   │ True  │ True  │ None     │ None           │
│ trigger  │ trigger.p0  │ True  │ True  │ None     │ None           │
└──────────┴─────────────┴───────┴───────┴──────────┴────────────────┘

At that point, init can be issued.

Once init has been issued, a few informational ERS issues should appear in both Grafana and the ERS issue collector. [Kurt note: I haven't seen these yet, and my understanding is that this is an issue that is being investigated.]

Note: don't forget to set the correct partition name in the dunedaq Grafana dashboard in order to see the ERS issues flowing in.

At the moment, init is as far as we can get. The next step of configuring the system (conf) will not currently work, because we are still cataloguing the network ports needed by the various DAQ applications and modifying nanorc to open them in the cluster containers.

Until conf can be run, all we can do is shut down the system with the terminate command in nanorc.

Destroying the cluster

To destroy when you are finished, run

cd $POCKDAQDIR/pocket
make destroy.local

This will delete the cluster entirely, along with any state.

Please note that if you get logged out of your shell on the np04daq cluster before you destroy the cluster, you can simply log back in and run these steps, without re-doing anything from the earlier sections of these instructions.