Using Home Assistant to Automate Labs Part 3
Automate a Little Longer, Code A Little Longer, With Node-Red!
Before getting into Node-Red, let’s discuss the configuration.yaml and how we are going to utilize the “command line” platform to send our commands. YAML is very picky about proper indentation, and the Studio Code Server add-on will do its best to help you avoid mistakes in formatting.
After any change, however, regardless of what Studio Code Server tells you, go to Developer Tools and click “Check Configuration.” It will tell you if you’ve made any mistakes that would prevent HA from restarting correctly. Since HA must be restarted after every change to configuration.yaml, this is an important step.
Home Assistant Command Line Integration
Documentation can be found here. Command Line will be the platform, and “switch” will be the domain. “Switch” gives the entity an on/off function. Every PNET/EVE-NG node gets two command lines in its entity: line 1 starts the node and line 2 stops it. Both call the same unetlab wrapper script (unl_wrapper) with a different option.
Here’s an example of two entities I created. After reboot, they are searchable and available.
Let’s review the command string.
ssh -o StrictHostKeyChecking=no -i /config/.keys_ssh/id_rsa root@[server ip] 'sudo /opt/unetlab/wrappers/unl_wrapper -a start -T 1 -S 1 -D 1 -F "/opt/unetlab/labs/Aruba SDWAN.unl"'
ssh -o StrictHostKeyChecking=no = do not prompt to confirm the server’s host key; a new host is added to the “known hosts” file automatically.
-i /config/.keys_ssh/id_rsa = location of the private key
/opt/unetlab/wrappers/unl_wrapper = the script PNET/EVE-NG uses
-a start -T 1 -S 1 -D 1 = after “-a” is the action, start or stop. The others are the Tenant (-T), Session (-S), and Node ID (-D). The only one you really need to change here is the node ID; every node gets a unique number.
-F "/opt/unetlab/labs/[your lab file name].unl" = the name and location of the topology file.
To check whether these values are right for your use case, start a node manually and run “ps aux” on the lab server’s command line to inspect the resulting process.
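Since every entity uses the same string with only the action and node ID changed, the pieces above can be assembled programmatically. Here’s a minimal Python sketch; the function name, its parameters, and the 192.0.2.10 address are placeholders of mine, and the defaults simply mirror the example command.

```python
import shlex

WRAPPER = "/opt/unetlab/wrappers/unl_wrapper"


def wrapper_command(action, node_id, lab_file, server,
                    key="/config/.keys_ssh/id_rsa", tenant=1, session=1):
    """Build the ssh command string that starts or stops one lab node.

    Hypothetical helper: it only formats the string, it does not run it.
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    # The lab file name may contain spaces, so it is double-quoted remotely.
    remote = (f'sudo {WRAPPER} -a {action} -T {tenant} -S {session} '
              f'-D {node_id} -F "{lab_file}"')
    # shlex.quote wraps the whole remote command in single quotes.
    return (f"ssh -o StrictHostKeyChecking=no -i {key} "
            f"root@{server} {shlex.quote(remote)}")


if __name__ == "__main__":
    print(wrapper_command("start", 1,
                          "/opt/unetlab/labs/Aruba SDWAN.unl", "192.0.2.10"))
```

Swapping "start" for "stop" gives the second line of the same entity. Lab file names containing a double quote or a dollar sign would need extra escaping that this sketch does not attempt.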
You can get your node IDs in the GUI, but here’s a script you can run that will return all nodes, their numbers, and template types. Simply change the lab file name to yours at the bottom of the script, and run it on your lab server. As with all scripts, please review and test before running in production.
https://github.com/darthrater78/pnetscripts/blob/main/listnodes.py
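To give a feel for what such a listing involves, here’s a minimal sketch. It is not the linked script. It assumes the .unl file is XML in which each node is a “node” element with id, name, and template attributes; check that against your own lab file first. The sample lab and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample; a real lab file has many more attributes and sections.
SAMPLE = """<lab name="demo"><topology><nodes>
<node id="2" name="R2" template="vios"/>
<node id="1" name="R1" template="vios"/>
</nodes></topology></lab>"""


def list_nodes(root):
    """Return (id, name, template) tuples for every node element, by ID."""
    nodes = []
    for node in root.iter("node"):
        node_id = node.get("id")
        if node_id is None:
            continue  # skip anything that does not carry a node ID
        nodes.append((int(node_id), node.get("name"), node.get("template")))
    return sorted(nodes, key=lambda n: n[0])


if __name__ == "__main__":
    for node_id, name, template in list_nodes(ET.fromstring(SAMPLE)):
        print(node_id, name, template)
```

For a real lab, pass in ET.parse("/opt/unetlab/labs/[your lab file name].unl").getroot() instead of the sample.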
An effective method of testing the ability of HA to send command strings to the lab server is to SSH into HA and paste one of the strings in its entirety and run it. If the node starts, you’re in business. Once validated, start to create all your nodes in configuration.yaml, and reboot HA when complete.
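With many nodes, typing every entity by hand invites indentation mistakes, so generating the entries is one option. The sketch below prints switch entries as YAML text. It assumes the top-level command_line: layout used by recent HA releases (older releases put platform: command_line under switch:), so compare it with the linked documentation for your version. The helper names and the echo commands are placeholders.

```python
import json


def switch_entry(name, command_on, command_off):
    """Return one command_line switch entry as YAML text.

    json.dumps is used for quoting because a JSON string is also a valid
    YAML double-quoted scalar, which keeps the ssh quoting intact.
    """
    return "\n".join([
        "  - switch:",
        f"      name: {json.dumps(name)}",
        f"      command_on: {json.dumps(command_on)}",
        f"      command_off: {json.dumps(command_off)}",
    ])


def command_line_block(entries):
    """Join (name, command_on, command_off) tuples into one YAML block."""
    body = "\n".join(switch_entry(*entry) for entry in entries)
    return "command_line:\n" + body + "\n"


if __name__ == "__main__":
    # Placeholder commands; substitute the validated start/stop strings.
    print(command_line_block([
        ("PNET Router 1", "echo start-node-1", "echo stop-node-1"),
        ("PNET Router 2", "echo start-node-2", "echo stop-node-2"),
    ]))
```

Paste the output into configuration.yaml, then run “Check Configuration” before restarting.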
Fun with Booleans
Before we start with Node-Red, we need to make some Booleans to present our “On/Off” button. In HA they are called “Helpers”.
Navigate to Settings>Devices and Services and in the upper right you will see “helpers.”
Click on “Create Helper” at the bottom. There are lots of options here, but we want the “toggle” helper. This will create an entity in the “input_boolean” domain, e.g. input_boolean.test. You can now find it and have fun toggling it off and on. By itself, it does nothing. In Node-Red, we will link this helper to a node that watches it, and use an action node to trigger our actions when it is actuated.
An Introduction to Node-Red
Node-Red is an add-on installed via Settings>Add-Ons. Search for it and install it. Most add-ons can be shown in the sidebar, which I highly recommend.
Let’s go over the node types I use for this effort. There are many, many different types, and you can go deep down a rabbit hole. At its most basic, Node-Red simply passes packets logically across a flow. You can query the packets, set requirements on them, etc. Here are some examples of the nodes I use in my automation.
Now let’s take a look at how this all works, starting with the simple setup, the Proxmox endpoints. This is on my dashboard. I have some custom settings on this card, which forces me to use all YAML. Everything below the lab control is a status of the VMs and has nothing to do with Node-Red. The important thing here is that I have a Boolean entity created called input_boolean.axis_lab_start. Let’s take a look at how we call this in Node-Red.
The “LAB” node is an event:state node. The GIF shows what will trigger the forwarding. If the Boolean is toggled from off to on, the packet will exit the true output and start the action node.
If we check the properties of the event:state node, the domain for these entities is “button” and I am triggering the command for start I showed earlier. I included multiple entities here, comma separated.
When the Boolean changes from on to off, the packet leaves the “false” output instead and flows to the bottom node, triggering the shutdown logic.
So that’s the easy setup to automate the Proxmox VMs. The PNET/EVE-NG is more complicated. Let’s get into that now.
PNET/EVE-NG NODE-RED AUTOMATION
Let’s start with the dashboard. Here, I am simply calling the Zigbee power adapter and enabling a “toggle” to power the box on. The “Server Shutdown” button is tied to Node-Red.
Ok, let’s step through this. My goal is to turn on the server, walk away, have the “PNET Server Shutdown” boolean change from off to on so I can actuate it later, and have my nodes all start in a staged fashion. I only want this to happen once the server is responding.
Ping server
Once the server responds, use the trigger node to halt all future forwarding until it receives a “reset” packet. This prevents the start-up logic from firing off more than once. Without this trigger node, the startup logic ran while the shutdown logic was running, since the shutdown turns the router Boolean off, which then triggered the startup again, and hilarity ensued.
Use the “Shut Down PNET On?” node to only forward if the Boolean is “off”.
Then turn the Boolean on (so I can use it later to turn it all off)
Delay the next packet for 2 minutes, to ensure the system is up.
Next we have three “current state” nodes for Routers, SDWAN, and Compute. They will only forward a packet if the Booleans are off. Upon validation, all three will receive the next packet; however, notice how two of them have delay nodes to stagger the next actions.
The furthest right nodes are Booleans. These are tied to an event:state node, which is tied to numerous action nodes. We’ll go over that shortly.
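The gating in the first steps is easier to reason about in code. Here’s a small Python model of it: forward the first packet, block everything after it until a reset arrives, then apply the “is the Boolean off?” check. This is an illustration of the behavior only, not Node-Red code, and the class and function names are invented for the example.

```python
class OneShotGate:
    """Forward the first message, then block until reset() is called."""

    def __init__(self):
        self.blocked = False

    def offer(self, msg):
        """Return msg if the gate is open, otherwise None."""
        if self.blocked:
            return None
        self.blocked = True
        return msg

    def reset(self):
        """Re-open the gate, like sending the trigger node a reset packet."""
        self.blocked = False


def startup_should_run(gate, ping_ok, shutdown_boolean):
    """Mirror the flow order: ping, one-shot gate, then the Boolean check."""
    if not ping_ok:
        return False  # server not responding yet, nothing is forwarded
    if gate.offer("go") is None:
        return False  # start-up already fired and no reset has arrived
    return shutdown_boolean == "off"


if __name__ == "__main__":
    gate = OneShotGate()
    print(startup_should_run(gate, True, "off"))  # first reply passes
    print(startup_should_run(gate, True, "off"))  # later replies are blocked
```

The second call is blocked even though the server still answers pings, which is what stops the shutdown logic from re-triggering the startup.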
Now let’s focus on the Router Boolean. The logic is the same for SDWAN and Compute, so there’s no need to detail each. One quick note, however.
For my other Booleans, I have disabled the shutdown portion of the automation. The unetlab wrapper script currently only has “stop” and not shutdown as an option. I’m alright with hard shutdowns of my routers, but my other nodes not so much. Those I shut down by hand until I can find a way to trigger graceful shutdowns.
UPDATE: 3/8/24
Good News Everyone! I have a method to invoke a clean ACPI shutdown of my lab nodes! The feature is new in PNET Beta 5.5.18/6.0.0-100 and likely in GA in EVE. The command structure is a little different.
ssh -o StrictHostKeyChecking=no -i /config/.keys_ssh/id_rsa root@[IP] 'echo system_powerdown | sudo nc -U /opt/unetlab/tmp/1/30/monitor.sock -q 0'
Basically, we are going to look for the presence of the monitor.sock file and, if present, echo “system_powerdown” to qemu to trigger an orderly shutdown. There’s a big difference in how this command locates the node compared with the power-on logic. With the power-on command, we send the node ID as it appears on the topology.
With the ACPI command, we are looking for a file in the node’s physical folder on disk. In PNET, those two numbers are not always the same. For example, one node is 5 on the topology, but linked to folder 60 under /opt/unetlab/tmp.
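Because the folder number is not the node ID, it helps to build this command from the folder explicitly. Here’s a hedged Python sketch that does so. Two things in it are my assumptions rather than part of the command shown: that the first number in the socket path is the tenant (matching -T 1), and the “test -S” guard, which skips the echo when monitor.sock is not present. The function name and the 192.0.2.10 address are placeholders.

```python
import shlex


def powerdown_command(folder_id, server, tenant=1,
                      key="/config/.keys_ssh/id_rsa"):
    """Build the ssh string that asks qemu for an ACPI shutdown.

    folder_id is the node's folder under /opt/unetlab/tmp, which may
    differ from the node ID shown on the topology.
    """
    # Assumption: the path layout is tmp/<tenant>/<folder>/monitor.sock.
    sock = f"/opt/unetlab/tmp/{tenant}/{folder_id}/monitor.sock"
    # Assumption: guard with test -S so nothing is sent if the socket
    # does not exist (the node is not running).
    remote = (f"test -S {sock} && "
              f"echo system_powerdown | sudo nc -U {sock} -q 0")
    return (f"ssh -o StrictHostKeyChecking=no -i {key} "
            f"root@{server} {shlex.quote(remote)}")


if __name__ == "__main__":
    print(powerdown_command(30, "192.0.2.10"))
```

With the guard in place the command exits non-zero when the node is already off, so expect a logged failure in that case instead of a silent no-op.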
Ok, back to the routers. Here, I list each entity separately and link them end to end. This causes the nodes in the chain to start one after another.
The switch domain has “turn on” and “turn off” services (switch.turn_on and switch.turn_off).
After about five minutes or so everything is up and running with all nodes started. I have the same process in place for the SDWAN and Compute Booleans. Let’s go over the logic now to turn it all off.
Remember our friend here?
When I toggle that off, here’s what happens.
Toggle off the Shutdown.
Pass a state condition: only forward if the Router Boolean is “on”.
Action node call to Shutdown Routers. This is where the loop occurred in my startup logic that my trigger node prevents.
Delay next message 50 seconds.
Run “switch.pnet_server_shutdown”, which sends a graceful shutdown command to the lab server: ssh -o StrictHostKeyChecking=no -i /config/.keys_ssh/id_rsa root@[serverip] 'shutdown now'
Delay next message 30 seconds.
Remove power from server via zigbee switch.
Toggle off remaining Booleans.
Note that I have SDWAN and Compute at the end of this logic. Once I find a way to shut things down gracefully, I will move them in between the 50 second delay and the shutdown. This setup is to keep the dashboard clean.
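The shutdown chain is really just an ordered list of actions with pauses between them. Here’s a minimal Python model of that idea, using the delays from the steps above. It is only an illustration of the sequencing, not how Node-Red runs it; the step labels and function names are mine, and the sleep function is injectable so the order can be checked without waiting.

```python
import time

# (action label, seconds to wait afterwards), following the steps above.
SHUTDOWN_STEPS = [
    ("stop routers", 50),
    ("switch.pnet_server_shutdown", 30),
    ("cut power via Zigbee switch", 0),
    ("toggle off remaining Booleans", 0),
]


def run_sequence(steps, act, sleep=time.sleep):
    """Run each action in order, pausing after it when a delay is set."""
    for name, delay_after in steps:
        act(name)
        if delay_after:
            sleep(delay_after)


if __name__ == "__main__":
    # Dry run: print the actions and skip the real waiting.
    run_sequence(SHUTDOWN_STEPS, print, sleep=lambda seconds: None)
```

Adding graceful SDWAN and Compute shutdowns later would mean inserting two more entries between the first delay and the server shutdown.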
Ok, there you have it!
Happy labbing, and until next time.