Performing a DNS Migration with Reduced Risk
Updated May 26, 2024 | Created March 24, 2022
Read through this post fully before getting started. Plan for at least 2 days to complete all the steps with the least risk and stress.
DNS migrations can be stressful because they affect multiple services. Careful planning and preparation can mitigate most risks, but issues do occur and should be planned for.
Pre-Migration Testing
The goal is to catch errors before the DNS migration rather than during it. Test as much as possible locally.
Tools
- Any browser
- Switchhosts for easy and consistent hosts file management
- ping (already installed)
- hoppscotch.io / postman
- curl
Steps
Set up the new production server. Note any routes or APIs to be tested.
Get the IP address of the new server. This can be achieved with ping.
ping example.com
Optionally, create a backup of your existing hosts file.
sudo cp /etc/hosts /etc/hosts.backup-$(date '+%Y-%m-%d-%Hh%Mm')
Append an entry for the new server to your hosts file, using either Switchhosts or a manual edit.
Then flush the local DNS cache (macOS):
sudo killall -HUP mDNSResponder && echo "done"
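The hosts-file entry from the step above can be sketched as a small shell helper. This is only an illustration: `hosts_entry` is a hypothetical function name, and `203.0.113.10` is a placeholder documentation IP standing in for the address of the new server.

```shell
# Sketch: print one hosts-file line mapping a domain to the new server.
# Both arguments are supplied by the caller; nothing is written to disk here.
hosts_entry() {
  local ip="$1"
  local domain="$2"
  printf '%s %s\n' "$ip" "$domain"
}

# Usage (sudo is needed because /etc/hosts is owned by root):
#   hosts_entry 203.0.113.10 example.com | sudo tee -a /etc/hosts
```

Printing the line first, rather than editing the file directly, makes it easy to review the entry before it is appended.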
Test the site with your browser. If this doesn't work as expected, continue with the next step to debug.
Test the API(s) with hoppscotch.io / postman (start building up a collection of the endpoints for testing if you haven't already). When debugging, you may need to add a Host header to the request.
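As an alternative to setting a Host header by hand, curl's `--resolve` option pins a hostname to a chosen IP for a single request, so the hosts file can stay untouched. The sketch below assumes HTTPS on port 443. `check_route` is a hypothetical helper name, and the IP and path in the usage line are placeholders.

```shell
# Sketch: request a path from the new server while presenting the final
# hostname, and print only the HTTP status code.
check_route() {
  local ip="$1"
  local host="$2"
  local path="$3"
  curl --silent --show-error --output /dev/null \
       --write-out '%{http_code}\n' \
       --resolve "${host}:443:${ip}" \
       "https://${host}${path}"
}

# Usage:
#   check_route 203.0.113.10 example.com /api/health
```

Because the hostname is kept in the URL, the TLS handshake also asks for the certificate of the final domain, which a plain Host header sent to an IP address would not do.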
Check that the SSL certificate being issued covers the expected domain, i.e. if the final domain is example.com, the SSL certificate needs to specify that (not a default AWS domain).
# From Stack Overflow: https://stackoverflow.com/a/34812039
curl --insecure -vvI https://www.example.com 2>&1 | awk 'BEGIN { cert=0 } /^\* SSL connection/ { cert=1 } /^\*/ { if (cert) print }'
Pre-Day Planning
- Look at analytics and advise the client on the best time for the migration. Plan for downtime; you may need to inform users. Prefer early mornings or early in the week. DNS is the gateway to the client's system and, regardless of planning, changes will carry risk.
- Plan a brief session with the product owner, ideally two technical members who will be performing the DNS migration, and invite the client/stakeholder for transparency.
- Get access to the production DNS host or inform the client that a team member with access will need to attend the briefing call.
- Reduce the DNS TTL ahead of the migration day to minimise downtime. Make the change at least one full period of the old TTL before the switch, so resolvers have already dropped the long-lived cached records.
- Schedule 3 sessions with the client for the migration day, at least 30 minutes each and at least 2 hours apart. The sessions are for rollout (±5 minutes), testing (±10 minutes, adjust as needed) and, if there are any issues, rollback (±5 minutes). The planned gaps are for debugging and making appropriate changes to code and systems before the next attempt.
- Discuss and agree with the client the bug triaging process. See this triaging table as a starting point.
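To confirm that the reduced TTL from the list above is actually being served, the TTL can be read from dig's answer section. This is a minimal sketch: `answer_ttl` is a hypothetical helper name, and `ns1.example.com` in the usage line is a placeholder for the domain's real authoritative nameserver.

```shell
# Sketch: read dig answer lines on stdin and print the TTL of the first one.
# dig's answer format is: name  TTL  class  type  data
answer_ttl() {
  awk '$3 == "IN" { print $2; exit }'
}

# Usage:
#   dig +noall +answer example.com A | answer_ttl
#   dig +noall +answer @ns1.example.com example.com A | answer_ttl
```

A recursive resolver reports the time remaining on its cached copy, so the value counts down between queries. Querying the authoritative nameserver directly, as in the second usage line, shows the configured TTL.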
Rollout Process
Tools
- Read the DNS host documentation and think through the steps.
- dnschecker
- Flush Cache
- Opera Browser or an equivalent browser and VPN
- hoppscotch.io / postman
Steps
Ensure all hosts files changed during prior testing have been restored.
Then flush the local DNS cache (macOS):
sudo killall -HUP mDNSResponder && echo "done"
Talk through the steps and the plan. Invite all relevant parties and stakeholders, but set the expectation that they are there for transparency, not to add pressure. Acknowledge any previously failed attempts, the fixes that have been implemented, and the next steps.
Create a backup of the values you plan to change and store them in a shared location. Consider doing an export via the existing provider. Be aware that nameserver transfers will require more records to be moved.
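Where a provider export is not available, a rough snapshot of the current records can be taken with dig. This is a sketch under assumptions: `backup_records` is a hypothetical helper name and the list of record types is only a common starting set. dig can only return records for names you ask about, so it will not discover subdomains the way a provider export would.

```shell
# Sketch: print the current answers for a set of common record types.
backup_records() {
  local domain="$1"
  local type
  for type in A AAAA CNAME MX TXT NS; do
    dig +noall +answer "$domain" "$type"
  done
}

# Usage (repeat for every subdomain you plan to change):
#   backup_records example.com > "dns-backup-example.com-$(date '+%Y-%m-%d-%Hh%Mm').txt"
```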
Update the DNS values in the DNS hosting provider.
Open the page via VPN in an Opera incognito window.
Ensure there are no unexpected SSL issues.
Test the routes and APIs previously noted. Run any Hoppscotch collections, e2e test suites, or smoke tests.
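If no test suite exists yet, a very small smoke test can be sketched with curl. `smoke_test` is a hypothetical helper name and the routes in the usage line are placeholders for the ones noted during pre-migration testing. Treating any 2xx or 3xx status as a pass is an assumption that may be too lenient for some endpoints.

```shell
# Sketch: request each route and flag any status outside 2xx/3xx.
# Returns non-zero if at least one route failed.
smoke_test() {
  local base="$1"
  shift
  local failed=0
  local route code
  for route in "$@"; do
    # curl prints 000 as the status when the connection itself fails.
    code="$(curl --silent --output /dev/null --write-out '%{http_code}' "${base}${route}" || true)"
    case "$code" in
      2??|3??) echo "ok   $code $route" ;;
      *)       echo "FAIL $code $route"; failed=1 ;;
    esac
  done
  return "$failed"
}

# Usage:
#   smoke_test https://example.com / /api/health
```

The printed lines can be pasted straight into the report that is sent after the session.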
If there are any issues, note them down and gather as much information as possible. Aim to test the whole system: consider other paths around the failures to see whether other areas were overlooked, e.g. the homepage may not work but the APIs do.
Assess if the errors experienced are critical, as per triage table.
Move to Rollback Process if needed.
Use dnschecker to check if propagation has finished.
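A local complement to dnschecker is to ask a few public resolvers directly. The sketch below queries Cloudflare (1.1.1.1), Google (8.8.8.8) and Quad9 (9.9.9.9). `check_propagation` is a hypothetical helper name, and the function assumes the domain resolves to a single A record, since it compares only the last line of dig's short output.

```shell
# Sketch: compare each resolver's answer with the expected new IP.
# Returns non-zero while at least one resolver still serves another value.
check_propagation() {
  local domain="$1"
  local expected="$2"
  local pending=0
  local resolver answer
  for resolver in 1.1.1.1 8.8.8.8 9.9.9.9; do
    answer="$(dig +short "@${resolver}" "$domain" A | tail -n 1)"
    if [ "$answer" = "$expected" ]; then
      echo "$resolver: updated"
    else
      echo "$resolver: still ${answer:-no answer}"
      pending=1
    fi
  done
  return "$pending"
}

# Usage:
#   check_propagation example.com 203.0.113.10
```

Three resolvers are only a sample, so a clean result here does not prove that every resolver worldwide has picked up the change.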
Write out and send a report with the results, steps followed, errors experienced, and next steps.
If successful, increase the DNS TTL.
Rollback Process
Tools
As per rollout process.
Steps
- Most importantly: if there are any issues, note them down and gather as much information as possible. Aim to test the whole system: consider other paths around the failures to see whether other areas were overlooked, e.g. the homepage may not work but the APIs do.
- Update the DNS values in the DNS hosting provider to the original values.
- Flush Public Cache.
- Open the page via VPN in an Opera incognito window.
- Ensure there are no unexpected SSL issues.
- Test the routes and APIs previously noted. Run any Hoppscotch collections, e2e test suites, or smoke tests.
- Flush Public Cache.
- Use dnschecker to check if propagation has finished.
- Write out and send a report with the results, steps followed, errors experienced, and next steps.
Further Info
Further Tools
- Cloudflare DNS
- Traceroute for figuring out the routing hops that data has to go through.
- Google Admin Toolbox - Dig for DNS lookup by querying name servers.
- dig as above, but locally run.
- Nginx for host rewrite