Feed: Matillion.
Author: Julie Polito
Let’s say you have a Matillion instance that works perfectly in your AWS environment. You’ve built a ton of jobs and installed all the necessary dependencies to support your workloads. You’ve taken the care to properly configure and automatically back up your external RDS Postgres metadata repository for storing your projects and jobs. The instance has been enhanced with third-party Python packages, a carefully configured user store, and possibly even some security hardening and cron jobs applied. Everything works per your exact data and security requirements, and you have all the access you need to your data and cloud resources.
Now, let’s say another department in your organization also uses your AWS account and has created a new VPC with a few VMs and services deployed in it. Maybe as part of this new environment’s setup, a site-to-site VPN tunnel has been established between the new VPC and your on-premises Microsoft SQL Server or Oracle database. You’d like Matillion to connect to that source data and load it into your data platform. How might you allow Matillion to leverage all this new infrastructure, including its access to on-premises resources, when Matillion is siloed in its own VPC?
The Challenge
It’s not possible to move an existing EC2 instance between VPCs, so maybe you’ve looked into standing up a new Matillion instance in the new VPC and using the Migration tool to move your jobs to the new instance. But you run into a few gotchas with this approach:
- It requires you to make special networking accommodations between the old and new VPCs, which might be difficult or impossible depending on your network and security teams’ requirements and change management processes.
- The Migration tool facilitates the migration of Matillion projects, environments, jobs, and some system-wide configurations. However, it doesn’t help migrate other important things, like the LDAP configuration, third-party Python packages, and aftermarket security hardening measures.
Despite these and other potential challenges with using the built-in tools to move your prized Matillion environment to the new, better-connected VPC, have no fear! We have a solution…
The Solution
The approach we recommend to customers seeking to move their Matillion instance from one VPC to another in scenarios like the one above involves a little bit of cloud knowledge and a little bit of resource juggling. With the right level of access and permissions in AWS, it’s possible to create backups of your Matillion instance, as well as its external RDS Postgres metadata repository. Thankfully, if you have copies of both of these items, you can create replicas of each in the new VPC and begin taking advantage of all the access and resources it has to offer.
Here’s the general approach:
Clone the Existing Resources
- In the EC2 console, create an AMI of the original Matillion instance.
- If the target VPC is in a different region, copy the AMI to that region in the EC2 console.
- Launch a new EC2 instance from the AMI in the target VPC.
- In the source Matillion instance’s web interface, disable any schedules that should not run on the new instance until the migration is complete. This step is important because schedules are saved in the Postgres metadata repository: once the cloned instance comes online against the restored database, it will begin running any schedules that were enabled when the snapshot was taken.
- In the RDS console, create a snapshot of the instance’s Postgres database.
- In the source Matillion instance’s web interface, re-enable the schedules you disabled before taking the snapshot.
- If the target VPC is in a different region, copy the database snapshot to that region in the RDS console.
- In the RDS console, restore the database snapshot into a new RDS instance in the target VPC, applying a security group that allows inbound Postgres traffic (port 5432 by default) from the new Matillion instance.
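If you prefer to script the cloning steps rather than click through the consoles, here is a minimal sketch using boto3-style clients. It assumes you pass in clients created with `boto3.client("ec2")` and `boto3.client("rds")` for the source and target regions (the same client twice for a same-region move). Every identifier (instance, subnet, security group, snapshot and DB subnet group names) is a hypothetical placeholder, and this is not an official Matillion tool. Schedule toggling is left to the Matillion web interface as described in the steps above.

```python
from typing import Any, List, Optional


def clone_matillion_instance(
    src_ec2: Any,
    dst_ec2: Any,
    instance_id: str,
    image_name: str,
    subnet_id: str,
    security_group_ids: List[str],
    instance_type: str,
    src_region: Optional[str] = None,
) -> str:
    """Create an AMI of the source instance and launch a copy in the new VPC."""
    # By default EC2 reboots the source instance so the AMI captures a
    # consistent file system; plan for that brief interruption.
    image_id = src_ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]
    src_ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    if src_region is not None:
        # Cross-region copy: copy_image is called on a client in the target region.
        image_id = dst_ec2.copy_image(
            Name=image_name, SourceImageId=image_id, SourceRegion=src_region
        )["ImageId"]
        dst_ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # The subnet determines which VPC the clone is launched into.
    resp = dst_ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        SubnetId=subnet_id,
        SecurityGroupIds=security_group_ids,
    )
    return resp["Instances"][0]["InstanceId"]


def clone_metadata_database(
    src_rds: Any,
    dst_rds: Any,
    db_instance_id: str,
    snapshot_id: str,
    new_db_instance_id: str,
    db_subnet_group: str,
    vpc_security_group_ids: List[str],
    src_region: Optional[str] = None,
) -> str:
    """Snapshot the RDS Postgres repository and restore it into the new VPC.

    Call this only after disabling schedules in the source Matillion UI.
    """
    snapshot = src_rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id, DBInstanceIdentifier=db_instance_id
    )["DBSnapshot"]
    src_rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

    if src_region is not None:
        # Cross-region copies must reference the source snapshot by its ARN.
        dst_rds.copy_db_snapshot(
            SourceDBSnapshotIdentifier=snapshot["DBSnapshotArn"],
            TargetDBSnapshotIdentifier=snapshot_id,
            SourceRegion=src_region,
        )
        dst_rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

    # The DB subnet group places the restored instance in the new VPC; the
    # security group must allow inbound 5432 from the new Matillion instance.
    resp = dst_rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_db_instance_id,
        DBSnapshotIdentifier=snapshot_id,
        DBSubnetGroupName=db_subnet_group,
        VpcSecurityGroupIds=vpc_security_group_ids,
    )
    return resp["DBInstance"]["DBInstanceIdentifier"]
```

The waiters block until each AMI or snapshot is usable, which keeps the steps in the same order as the list above. Call `clone_metadata_database` only after the schedules are disabled, then re-enable them in the source UI once the snapshot exists.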
Here are some links to resources that may assist with implementing the above steps:
Configure Matillion
- In an SSH session to the new Matillion instance, configure it to use the newly restored Postgres database.
- Reboot the new Matillion instance, and ensure the expected projects, jobs, shared jobs, schedules, etc. from the source instance appear in the web interface upon logging in.
- In the new Matillion instance’s UI, find a job or two that can be run without impacting your production workload or customer data, ideally jobs that connect to both external data sources and your target data platform. Run them (or relevant portions of them) to confirm that your new instance’s networking configuration allows connectivity to the necessary sources and targets.
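Before running real jobs, a quick TCP reachability check from the new instance can help separate networking problems (security groups, route tables, the VPN tunnel) from job configuration problems. The sketch below is our own assumption, not part of Matillion, and it only confirms that a TCP handshake succeeds; it says nothing about credentials or permissions. The hostnames are hypothetical placeholders, and the ports are the common defaults for Postgres (5432), SQL Server (1433) and Oracle (1521).

```python
import socket
from typing import Dict, Iterable, Tuple


def check_tcp_endpoints(
    endpoints: Iterable[Tuple[str, str, int]], timeout: float = 3.0
) -> Dict[str, bool]:
    """Map each endpoint label to True if a TCP connection succeeds."""
    results: Dict[str, bool] = {}
    for label, host, port in endpoints:
        try:
            # Resolves the hostname and completes a TCP handshake, then closes.
            with socket.create_connection((host, port), timeout=timeout):
                results[label] = True
        except OSError:
            # DNS failures, refused connections and timeouts all land here.
            results[label] = False
    return results


# Hypothetical endpoints; substitute your own hosts and run from the new instance.
EXAMPLE_ENDPOINTS = [
    ("metadata-postgres", "matillion-db.example.internal", 5432),
    ("onprem-sqlserver", "sql01.corp.example", 1433),
    ("onprem-oracle", "ora01.corp.example", 1521),
]
# for name, ok in check_tcp_endpoints(EXAMPLE_ENDPOINTS).items():
#     print(f"{name}: {'reachable' if ok else 'UNREACHABLE'}")
```

An unreachable endpoint here usually points at a security group, routing or VPN issue in the new VPC rather than at the Matillion job itself.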
Complete the Cut-Over
- In the original Matillion instance’s web interface, disable all schedules now that the new instance is available, noting which were enabled.
- In the new Matillion instance’s web interface, enable any schedules that were disabled in any previous step and manually run any missed schedules as needed.
- Update any Application Load Balancers (ALBs), DNS entries, CI/CD pipelines, and other integrations that refer to the original Matillion instance so they point to the new instance, or create new ones if desired.
- Decommission the original Matillion instance. For example, if it was created from a CloudFormation template, you can delete the stack and the AWS resources associated with it.
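The DNS update and decommissioning steps can also be scripted. This minimal sketch assumes users reach Matillion through a Route 53 A record pointing at the instance’s private IP, and that the original instance was either deployed from a CloudFormation stack or launched on its own. The clients are assumed to come from `boto3.client("route53")`, `boto3.client("cloudformation")` and `boto3.client("ec2")`; the zone ID, record name, stack name and instance ID are hypothetical placeholders. ALB and CI/CD updates are left out because they vary too much between environments.

```python
from typing import Any, Optional


def point_dns_at_new_instance(
    route53: Any, hosted_zone_id: str, record_name: str, new_ip: str, ttl: int = 300
) -> str:
    """UPSERT an A record so the existing hostname resolves to the new instance."""
    resp = route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={
            "Comment": "Matillion VPC migration cut-over",
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": record_name,
                        "Type": "A",
                        "TTL": ttl,
                        "ResourceRecords": [{"Value": new_ip}],
                    },
                }
            ],
        },
    )
    change_id = resp["ChangeInfo"]["Id"]
    # Block until Route 53 reports the change as INSYNC.
    route53.get_waiter("resource_record_sets_changed").wait(Id=change_id)
    return change_id


def decommission_original(
    cloudformation: Any,
    ec2: Any,
    stack_name: Optional[str] = None,
    instance_id: Optional[str] = None,
) -> str:
    """Delete the original stack, or terminate a standalone instance.

    Run only after the new instance is validated and its schedules are running.
    """
    if stack_name:
        # Deletes every resource the stack created, subject to DeletionPolicy.
        cloudformation.delete_stack(StackName=stack_name)
        cloudformation.get_waiter("stack_delete_complete").wait(StackName=stack_name)
        return "stack-deleted"
    if instance_id:
        ec2.terminate_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_terminated").wait(InstanceIds=[instance_id])
        return "instance-terminated"
    raise ValueError("Provide either stack_name or instance_id")
```

Lowering the record’s TTL a day or so before the maintenance window shortens the time clients keep resolving the old address after the UPSERT.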
And here are a few links to assist with the above steps:
Conclusion
This migration path is somewhat involved, but it should have minimal impact on your development team. Plan to run through the process twice: first as a dry run, following only the steps in the “Clone the Existing Resources” and “Configure Matillion” sections to identify all the permissions, resources, and detailed steps required; then again during a scheduled maintenance window, following all the steps to complete the migration. The final migration should take roughly 4 hours, and once it’s done, you’re off to the races.
We hope this guide helps you formulate rock-solid migration plans so you can keep taking advantage of the benefits of running your ELT workloads in the cloud with Matillion, with minimal impact on your environment, your developers, and your customers.
More tips for your Matillion ETL on AWS
If you’d like more tips and tricks for optimizing Matillion ETL within AWS, download one of our “Optimizing” ebooks. There’s one for Amazon Redshift and one for Snowflake.
The post Migrating an existing AWS RDS-backed Matillion ETL instance to another VPC appeared first on Matillion.