Talend & PostgreSQL - Data Migration Dream Team

05/05/2017 by James Bujold
Introduce yourself to a great way to migrate any amount of data using an ETL tool (Talend) and a database (PostgreSQL)

The idea of migrating massive amounts of data from any database into Salesforce can be intimidating. If you’ve ever tried to migrate hundreds of thousands or even millions of records using Excel, you’ve most likely experienced painfully slow processing and regular crashes, if the program can manage to open the file at all. One solution is to migrate the data in batches, but that is slow and cumbersome and only increases the likelihood of human error. As with any task that at first seems overwhelming, it simply needs to be broken down into its component parts and taken on one piece at a time. Migrations of this size also require a different set of tools. The goal of this two-part post is to introduce those tools and walk through the process of setting up a basic, but highly scalable, migration job from one Salesforce org to another using Talend and a PostgreSQL database. Explaining every nuance of setting up and using these tools is well beyond the scope of this post, but hopefully it can serve as an introduction to further learning.

So what if it takes longer, why should I use an ETL tool?

Extract, transform, and load (ETL) tools offer a level of automation and repeatability that a data loader can’t match. The Salesforce Data Loader and other products like it are great for inserting, updating, deleting, and even upserting data into Salesforce. The tricky part is preparing that data before it’s loaded. IDs need to be mapped, picklist values need to be translated, and some records may need to be filtered out entirely. Performing those updates in a spreadsheet is a manual process that’s easy to do once, twice, or maybe a few times, but before long you’ll be wishing there were some way to automate it. That’s where ETL tools come in: they do exactly what the name suggests, extract, transform, and load, all in one place with almost limitless possibilities. Anything from migrations and simple data transfers to highly complex integrations can be run with ETL tools. For the sake of this example, we’ll be using Talend.
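
As a rough sketch of the kind of transformation an ETL job automates, here’s how a picklist translation and record filter might look if expressed as SQL against a hypothetical staging table (the table and values below are made up for illustration; in a Talend job this logic would typically live in a mapping component rather than in SQL):

    -- Hypothetical staging table of source-org Account rows.
    -- Translate source picklist values to the target org's values
    -- and filter out records we don't want to migrate.
    SELECT
        legacy_id,
        name,
        CASE industry
            WHEN 'Tech'    THEN 'Technology'
            WHEN 'Finance' THEN 'Financial Services'
            ELSE industry                  -- pass anything else through unchanged
        END AS industry
    FROM staging_accounts
    WHERE is_active = true;                -- drop inactive records entirely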

Talend

Talend is a free, open source ETL tool. This is where we’ll be setting up our migration jobs. You’ll want ‘Talend Open Studio for Data Integration’; grab the latest version here. Why Talend? The world is full of data integration and ETL tools that could be used to perform the same process I’m going to describe. Talend just happens to be free and accessible to everyone. It’s also open source, which means it can be a bit buggy at times, and it helps to know Java when running into issues.

ETL seems cool. What do we need a database for?

We still need a place to store our ID values so we can accurately map our records’ old lookup IDs to their new values. Similar to a VLOOKUP in Excel, we’ll use one column of a table in our database to store the legacy ID and another to store the record’s new ID. It would be possible to write the values to a CSV file from the ETL tool, but doing this in a database like PostgreSQL allows for almost infinite scalability, along with a way to query the data and look for errors without having to open a file with possibly millions of rows that Excel will likely choke on.
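
As a minimal sketch of what such a lookup table could look like (all table and column names here are hypothetical; Part 2 walks through the real setup), the idea is simply two columns pairing each legacy ID with its new ID, which child records can then join against:

    -- Hypothetical ID-mapping table: one row per migrated Account,
    -- pairing the legacy Salesforce ID with the ID assigned by the new org.
    CREATE TABLE account_id_map (
        legacy_id VARCHAR(18) PRIMARY KEY,   -- ID from the source org
        new_id    VARCHAR(18)                -- ID returned by the target org
    );

    -- Resolve a child record's lookup, much like a VLOOKUP: swap each
    -- staged Contact's old Account ID for the new one before loading.
    SELECT
        c.legacy_id,
        c.last_name,
        m.new_id AS account_id
    FROM staging_contacts AS c
    JOIN account_id_map   AS m
      ON m.legacy_id = c.account_legacy_id;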

PostgreSQL + PgAdmin

PostgreSQL is an open source, multi-platform database that we’ll be running locally to handle our lookup tables. PgAdmin is a GUI for managing and querying our PostgreSQL database. You can find the latest version on this download page for nearly any operating system.
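
To give a sense of the kind of sanity check PgAdmin makes easy, here are two hypothetical queries (reusing the made-up account_id_map table from above) that surface records that never received a new ID, something you would otherwise have to hunt for in a spreadsheet with millions of rows:

    -- Rows that failed to load into the target org (no new ID assigned).
    SELECT legacy_id
    FROM account_id_map
    WHERE new_id IS NULL;

    -- Quick mapped vs. unmapped count for a progress snapshot
    -- (FILTER is standard PostgreSQL syntax as of version 9.4).
    SELECT
        COUNT(*) FILTER (WHERE new_id IS NOT NULL) AS mapped,
        COUNT(*) FILTER (WHERE new_id IS NULL)     AS unmapped
    FROM account_id_map;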

Another scenario where these tools come in handy is when a migration needs to take place in a narrow time frame, like over a weekend. They let you set up all the mapping and migration jobs in advance, with plenty of testing, so that on the day of the migration you can simply focus on running the jobs in sequence and looking out for bulk data job errors.

Migrations can be carried out in many different ways.  Learning to use an ETL tool can help streamline the process. Combining that with a local database can help you scale those processes to handle any size dataset.  

Part 2 of this post will walk through, step by step, how to set up each of these tools to run a basic sample migration.

What’s your favorite way to migrate data into Salesforce? Share it with me on the Arkus Facebook page, in the comments below, in the Success Community, or with me directly via Twitter at @jpbujold.