Troubleshooting

Investigate API object health

Use sclctl to inspect the current state of the infrastructure and of individual API objects.

Check infrastructure information

sclctl node list: Are Nodes registered as expected?
sclctl controller list: Are all Controllers registered and reporting heartbeats regularly?
Are VLAN tags depleted? Is it possible to create new SCs (each SC has a unique VLAN tag)?
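The first two checks above can be run together. The following is a minimal sketch that uses only the two subcommands named in the list. The function name `check_infrastructure` is hypothetical, and the sketch assumes that sclctl is on the PATH and already configured to reach the SCL API.

```shell
#!/bin/sh
# Hypothetical helper: run the two listing commands named above in sequence.
check_infrastructure() {
    for resource in node controller; do
        echo "== sclctl $resource list =="
        # Report a failing command instead of aborting, so both checks run.
        sclctl "$resource" list || echo "sclctl $resource list failed (exit $?)" >&2
    done
}
```

Calling `check_infrastructure` prints both listings one after the other. Interpreting the output (missing Nodes, Controllers without recent heartbeats) is still a manual step.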

Check object status information

Most SCL Objects that users interact with expose status information written by controllers. The OpenAPI reference provides detailed descriptions of fields such as ControllerStatus, NodeStatus, VolumeStatus, RouterStatus, and VmStatus. Status fields help answer questions such as:

Does the SCL Object indicate any unrecoverable error status?
Does an SCL Object appear to be stuck, that is, not making any progress?

Investigate health of systemd services

Use the regular systemd tools (systemctl status, journalctl -b -u) to gain deeper insight into each component:

Component	Systemd Service Name
etcd	`etcd`
SCL API	`scl-api`
L2 network controller	`scl-local-l2-net-ctrl`
L3 network controller	`scl-local-l3-net-ctrl`
Image registry	`vm-image-registry`
VM scheduler	`scl-scheduler-ctrl`
VM controller	`scl-vm-ctrl`
Volume controller	`scl-local-vol-ctrl`

Useful questions are:

Is the service running?
Are there any log entries with the Error or Warning level?
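Both questions can be checked for several services at once. The sketch below uses the standard `systemctl is-active` and `journalctl` commands. The function name `check_services` is hypothetical, and the sketch assumes that the services log with journal priorities, so that `-p warning` selects warnings and anything more severe. If a service writes its level only into the message text, filter the output with grep instead.

```shell
#!/bin/sh
# Hypothetical helper: for each unit name given as an argument, print its
# state and its most recent warning-or-worse journal entries from this boot.
check_services() {
    for svc in "$@"; do
        # is-active prints the state (active, inactive, failed, ...).
        state=$(systemctl is-active "$svc" 2>/dev/null)
        echo "$svc: ${state:-unknown}"
        # -b: current boot only; -p warning: priority warning and above.
        journalctl -b -u "$svc" -p warning --no-pager -n 20 2>/dev/null
    done
}
```

For example, `check_services etcd scl-api scl-vm-ctrl` reports on three of the services from the table. Reading the journal may require root privileges or membership in the systemd-journal group.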

If necessary, increase the log level of a service, as documented in the references, to get more detailed output. The services are designed to be stateless, so restarting them should not cause major problems such as data loss.
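Since restarts are expected to be safe, a restart followed by a quick health check is a reasonable next step. The following is a minimal sketch using standard systemctl subcommands. The function name `restart_and_verify` and the two-second settle time are assumptions, not values taken from the SCL documentation.

```shell
#!/bin/sh
# Hypothetical helper: restart one unit and confirm that it is active again.
restart_and_verify() {
    svc=$1
    systemctl restart "$svc" || return 1
    # Give the unit a moment to settle before checking it (assumed delay).
    sleep 2
    if systemctl is-active --quiet "$svc"; then
        echo "$svc restarted and active"
    else
        echo "$svc did not come back up" >&2
        # Show the unit status to help diagnose the failed start.
        systemctl status "$svc" --no-pager
        return 1
    fi
}
```

For example, `restart_and_verify scl-api` restarts the SCL API service and returns a non-zero exit status if the unit is not active afterwards. Restarting system services typically requires root privileges.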