A few months ago, we added “Let’s Encrypt” certificates to our websites, which lets us generate free certificates for our custom domain URLs. Alongside them, we added web jobs that automatically renew the certificates once they have fewer than 30 days remaining. Because the certificates are only valid for 90 days, this job is essential.
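The renewal rule is simple date arithmetic: with a 90-day certificate and a 30-day window, renewal should start shortly after day 60. Here is a minimal sketch of that check, assuming we already have the certificate’s expiry (`not_after`) as a timezone-aware datetime. The function name `needs_renewal` is hypothetical and is not taken from the web job itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Let's Encrypt certificates are valid for 90 days; renew inside the last 30.
CERT_LIFETIME = timedelta(days=90)
RENEWAL_WINDOW = timedelta(days=30)


def needs_renewal(not_after: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when fewer than 30 days remain before not_after (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now < RENEWAL_WINDOW


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    not_after = issued + CERT_LIFETIME
    # Days 59 and 60 leave 31 and 30 days, so no renewal yet; day 61 triggers it.
    for day in (59, 60, 61):
        print(day, needs_renewal(not_after, issued + timedelta(days=day)))
```

Using a strict “fewer than 30 days” comparison matches the rule as described; if the job misses its run around day 60, every later run should still see the certificate as due.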
Earlier this week, we received an email warning us that our certificate would expire in less than 30 days. That was strange, since the web job should already have renewed it. We checked the web job and found it was pending a restart.
When we restarted the web job, it failed.
When we dive into the logs, all we see is a wall of errors. This is not good. It looks like we are missing a connection string for the storage account where the web job logs are stored.
When we check our web app’s configuration, we find that two storage connection strings are indeed missing.
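A quick check like the one below could flag missing connection strings before the job fails. It is a sketch under two assumptions: that the two strings are the WebJobs SDK’s usual `AzureWebJobsStorage` and `AzureWebJobsDashboard` (the names are not confirmed here), and that, as App Service does, connection strings surface as environment variables prefixed by their type (for example `CUSTOMCONNSTR_`). The helper `missing_connection_strings` is hypothetical.

```python
import os
from typing import Iterable, List, Mapping

# Assumption: the two storage connection strings the WebJobs SDK expects.
REQUIRED = ("AzureWebJobsStorage", "AzureWebJobsDashboard")

# App Service exposes connection strings as environment variables whose
# prefix depends on the connection string type.
PREFIXES = ("CUSTOMCONNSTR_", "SQLAZURECONNSTR_", "SQLCONNSTR_",
            "MYSQLCONNSTR_", "POSTGRESQLCONNSTR_")


def missing_connection_strings(env: Mapping[str, str],
                               required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required names that have no non-empty value under any prefix."""
    missing = []
    for name in required:
        if not any(env.get(prefix + name) for prefix in PREFIXES):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_connection_strings(os.environ)
    print("Missing:", ", ".join(gaps) if gaps else "none")
```

Run from the web app’s console, an empty result means both strings are present; anything listed points straight at the gap we hit here.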
With the connection strings restored, logging works again and we can see the real problem: we hadn’t specified the email address that Let’s Encrypt requires when it creates the certificate automatically.
We add the missing property to the ARM template, push the code to Azure DevOps, and open a pull request. Because our pull request build also deploys the changes to Dev, we can verify the fix there before merging.
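Since the fix lives in the ARM template, a small check in the pull request build can keep a required setting from going missing again. The sketch below walks the template’s `Microsoft.Web/sites` resources and reports any required app setting absent from `siteConfig.appSettings`. The setting name `letsencrypt:Email` is an assumption about how the extension reads the email address, and `missing_app_settings` is a hypothetical helper; substitute the names your own template uses.

```python
import json
import sys
from typing import Any, Dict, Iterable, List

# Assumption: the app setting the Let's Encrypt web job reads for the email.
REQUIRED_SETTINGS = ("letsencrypt:Email",)


def missing_app_settings(template: Dict[str, Any],
                         required: Iterable[str] = REQUIRED_SETTINGS) -> List[str]:
    """List 'site: setting' pairs for required settings absent from siteConfig.appSettings.

    Only inline siteConfig.appSettings arrays are inspected; settings declared in
    separate Microsoft.Web/sites/config resources are not handled by this sketch.
    """
    problems = []
    for resource in template.get("resources", []):
        if resource.get("type") != "Microsoft.Web/sites":
            continue
        site_config = resource.get("properties", {}).get("siteConfig", {})
        present = {s.get("name") for s in site_config.get("appSettings", [])}
        for name in required:
            if name not in present:
                problems.append(f"{resource.get('name')}: {name}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = missing_app_settings(json.load(f))
    for issue in issues:
        print("Missing app setting ->", issue)
    # A nonzero exit code fails the pipeline step.
    sys.exit(1 if issues else 0)
```

Run as a build step with the template path as its argument, the script fails the pull request whenever a site resource lacks the setting, which catches this class of omission before anything reaches Dev.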
Now the logs show success, and our certificates are being generated.
We complete the pull request and watch our build redeploy to Dev, then deploy the missing changes to QA and Prod.
Once it’s done, we confirm that the web job ran and updated the certificate.
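To confirm from outside Azure that the renewed certificate is actually being served, we can read its expiry over TLS. This is a minimal standard-library sketch; the helper names are hypothetical and the host you pass in should be your own custom domain.

```python
import socket
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate's notAfter string (as returned by getpeercert) to days remaining."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def served_cert_days_left(host: str, port: int = 443) -> float:
    """Connect to host, validate its certificate chain, and return days until it expires."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_left(cert["notAfter"])
```

Right after a successful renewal, `served_cert_days_left("www.example.com")` (placeholder host) should return a value close to 90; a value under 30 means the old certificate is still in place.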
Today we’ve learned the basics of troubleshooting and resolving web job issues. More importantly, we now understand the root cause that let this slip through the first time around.