Talend & PostgreSQL - Part 2 - Setup

05/26/2017 by James Bujold
This is a follow-up post to part 1, "Talend & PostgreSQL - Data Migration Dream Team".

This is part 2 of a blog post about using Talend and a PostgreSQL database to assist with data migrations. Part 1 explains the reasons to use an ETL + database approach to migrating data and gives an overview of the tools involved. You'll also find links there to download and install the tools needed to complete the following setup.

Setting Up Your Database

Once you've installed your tools, we're ready to start setting up connections to our various datasets: in this case, two Salesforce orgs and a PostgreSQL database. First, let's add a database and a table to our PostgreSQL server. Open pgAdmin and go to 'Object' → 'Create' → 'Database…'. I'll call mine 'IDLookup'. Save.

Next we’ll add a table with some columns.  

Go to 'Tools' → 'Query Tool' and paste in the following:

CREATE TABLE id_table
(
   "Legacy_ID__c" VARCHAR(18),
   "Id" VARCHAR(18)
);

Press Execute / F5


Executing this script will create a table with two columns ‘Legacy_ID__c’ and ‘Id’.

After refreshing the view ('Object' → 'Refresh'), you'll see the table show up under your database name ('IDLookup') → 'Schemas' → 'public' → 'Tables'.




Note that we just created a table via the Query Tool. Create, update, and delete operations are all possible for both tables and records in a SQL database using the Query Tool.
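To illustrate, here is a sketch of those create/update/delete statements run from code, using Python's built-in sqlite3 module purely as a stand-in for a SQL session (the sample Ids are made up; the equivalent statements work the same way in pgAdmin's Query Tool against PostgreSQL):

```python
import sqlite3

# In-memory database standing in for the PostgreSQL 'IDLookup' database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: the same table from the post.
cur.execute('CREATE TABLE id_table ("Legacy_ID__c" VARCHAR(18), "Id" VARCHAR(18))')

# Insert a sample row, then update it.
cur.execute('INSERT INTO id_table VALUES (?, ?)',
            ("001A000001abcDE", "001B000002fghIJ"))
cur.execute('UPDATE id_table SET "Id" = ? WHERE "Legacy_ID__c" = ?',
            ("001B000002fghXY", "001A000001abcDE"))

rows = cur.execute('SELECT "Legacy_ID__c", "Id" FROM id_table').fetchall()

# Delete: clear the table again.
cur.execute('DELETE FROM id_table')
```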


Connecting Data Sources to Talend

Now that we have a table to connect to in our database we can switch to setting up Talend.  

Once Talend is installed, be sure to install the additional packages offered under the Help menu; some of these are needed in order to connect with Salesforce.
 



We’ll start by creating a connection with the PostgreSQL database.  

In Metadata, create a new Db Connection.


Name the connection and fill in the database details, then click 'Check' to make sure you've successfully connected to the database. The default port is 5432. To confirm which port your server is actually using, execute the following script in pgAdmin's Query Tool:


SELECT *
FROM pg_settings
WHERE name = 'port';


Note: mine is running on 5433, so that's what I’ll enter in Talend.


Once you’ve connected to the database you’ll need to retrieve the schema.



Select the table(s) to include in your schema.


Click ‘Finish’.

 



You’ll see the database, table, and columns in your metadata repository.

 



Next, we'll create connections with each of the Salesforce orgs we'll be working with.

Create a new Salesforce connection from the Metadata section.  

 



Enter your org credentials and click 'Test connection'. If you're connecting to a sandbox, be sure to update the Salesforce URL in the 'Advanced…' section from https://www.salesforce.com/services/Soap/u/37.0 to https://test.salesforce.com/services/Soap/u/37.0

 

 

Click ‘Next’ and then select all the objects you’ll be exporting/importing to/from the org.






Repeat for each org you’d like to connect for this job.

 

 

Now that we have all of our connections available to select from Metadata, we can start adding components to our job.

The first thing we’ll do is extract all the Account records from our Source org and move them into our destination org.  

Drag the 'Account' object from the source Salesforce org onto the design screen and select the tSalesforceInput component. From the Palette, add a tMap component. Finally, drag the 'Account' object from the destination Salesforce org onto the design screen and select tSalesforceOutputBulkExec.

Connect the components by dragging the connector from tSalesforceInput to tMap and then from tMap to tSalesforceOutputBulkExec. This should populate the schema from both orgs into the tMap component.  



Double-click the tMap and simply drag fields from the source on the left to the destination on the right. Most importantly, map the Id field from the source to an external Id field in the destination org. In this case, I'm mapping to OldSalesforceID__c.


Formula fields and some audit fields will have to be removed from the destination schema. CreatedDate and CreatedById can be set only on insert, and the profile of the user logging in through Talend will need the permission to set audit fields.
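As a sketch of that filtering step (using hypothetical describe-style field metadata; in practice you prune these fields by hand in the tMap, or check the 'createable' and 'calculated' flags returned by Salesforce's describe call):

```python
# Hypothetical field metadata: 'calculated' marks formula fields, and
# 'createable' is False for audit fields like CreatedDate unless the
# special audit-field permission has been granted.
fields = [
    {"name": "Name",          "createable": True,  "calculated": False},
    {"name": "AnnualRevenue", "createable": True,  "calculated": False},
    {"name": "CreatedDate",   "createable": False, "calculated": False},
    {"name": "MyFormula__c",  "createable": False, "calculated": True},
]

# Keep only fields that can actually be written on insert.
insertable = [f["name"] for f in fields if f["createable"] and not f["calculated"]]
```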




Run the job.  

Now we'll create and run a job to populate both the legacy and new Ids from the destination Salesforce org into our PostgreSQL database. This time, drag the 'Account' object from the destination Salesforce org onto the design screen and select a tSalesforceInput, followed by a tMap into a tPostgresqlOutput.


Edit the map to put the Id from the account into the Id column of the database schema and the OldSalesforceID__c field into the Legacy_ID__c column.

Deactivate the previous subjob before running this one. Run it.

At this stage, we've migrated the accounts from one org to the other and created a map of Ids for those accounts in the id_table of our PostgreSQL database. When we migrate the contacts, we'll use that map to assign contacts from the source org to the correct account in the destination org.

Next, we'll set up the job to migrate the contacts and reference the database. Add both the 'Contact' object tSalesforceInput and a tPostgresqlInput component that references the id_table as inputs to a tMap component. The output will be a tSalesforceOutputBulkExec to the 'Contact' object of the destination Salesforce org.

Map the AccountId field to the Legacy_ID__c field of the tPostgresqlInput, then map the Id field of the tPostgresqlInput to the AccountId field of the output. Referencing the id_table in our database is that simple, and it can be done as many times as necessary for any number of lookups on the object; just add another instance of the tPostgresqlInput for each lookup.
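Conceptually, the tMap lookup does something like the following (a minimal sketch with made-up Ids; the real join happens inside Talend):

```python
# Rows from the id_table in PostgreSQL:
# legacy (source-org) account Id -> new (destination-org) account Id.
id_table = [
    ("001LEGACYAAA", "001NEWAAA"),
    ("001LEGACYBBB", "001NEWBBB"),
]
id_map = dict(id_table)

# Contacts exported from the source org still reference legacy account Ids.
source_contacts = [
    {"LastName": "Smith", "AccountId": "001LEGACYAAA"},
    {"LastName": "Jones", "AccountId": "001LEGACYBBB"},
]

# The tMap join: swap each contact's AccountId for the destination-org Id.
output_contacts = [
    {**c, "AccountId": id_map[c["AccountId"]]} for c in source_contacts
]
```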


 

Run the job and your contacts will insert with the correct account Ids. Since CreatedById and CreatedDate can only be set via insert, this method allows you to correctly migrate those values as well.

This is a very basic example of what can be done using the combination of an ETL tool and a database to assist with data migrations. Like anything worth learning, it may take some time to wrap your head around the potential benefit of using these tools. Take your time, and make sure you do your learning in a sandbox.

Have any great ETL learning stories? Share them with me on the Arkus Facebook page, in the comments below, in the Success Community, or directly via Twitter at @jpbujold.