The developer can manage how and when data is committed to a database by using database sessions. This recipe demonstrates how to accomplish this in Talend; the available management options, such as committing writes, keeping a session open after a commit, and sharing a connection across jobs, are detailed in the sections below.
Getting ready
Open the job jo_cook_ch07_0060_databaseSession.
On closer inspection, you'll notice that the job is set up to commit after each record is written.
How to do it…
The following are the steps to be taken:
How does it work?
The tMysqlConnection component connects to MySQL and starts a session. When you choose Use an existing connection in an output component, that component is added to the session, ensuring that the records it writes are not committed until an explicit commit is issued. The commit component then commits the writes to the database and closes the session.
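Under the covers, Talend's generated Java code manages the session in much the same way as plain JDBC. Here is a minimal sketch of the same pattern, assuming the MySQL JDBC driver is on the classpath and using a hypothetical cookbook database, table, and credentials:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SessionDemo {
    public static void main(String[] args) throws Exception {
        // tMysqlConnection: open the connection and start a session
        // (auto-commit off means nothing is written until we commit).
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/cookbook", "user", "password");
        conn.setAutoCommit(false);

        // Output component using the existing connection: writes join the session.
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO customer (customerName) VALUES (?)")) {
            ps.setString(1, "Acme Ltd");
            ps.executeUpdate(); // written, but not yet visible to other sessions
        }

        // tMysqlCommit: commit the writes and close the session.
        conn.commit();
        conn.close();
    }
}
```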
Executions
Multiple Outputs
Remember the Commit
A common beginner's mistake is to assume that the connection component merely removes the need to configure each component manually. This is correct, but it also initiates a database session. As a result, if you don't include a commit component, no data will be written to the database.
Committing but not Closing
By default, the session is closed when the commit component fires. If you want to commit while keeping the session open, the tMysqlCommit component includes a setting that allows you to keep the connection open after the commit.
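In JDBC terms, this is simply a commit without a close. A minimal sketch of the equivalent behaviour:

```java
import java.sql.Connection;
import java.sql.SQLException;

public class CommitKeepOpen {
    // Commit the session's work so far, but leave the connection open
    // (the JDBC equivalent of committing without closing the session).
    static void commitKeepOpen(Connection conn) throws SQLException {
        conn.commit(); // writes become visible to other sessions
        // conn.close() is deliberately NOT called; the session continues
        // and later writes can be committed by a subsequent commit.
    }
}
```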
A common real-world requirement is to share a database session between a parent Job and its child Jobs.
Environment
This procedure was created in the following environment:
Getting ready
Open the job jo_cook_ch07_0070_databaseSessionParent.
It is identical to the completed version from the previous recipe, but with a child job in place of the main process. On closer inspection, you'll see that the child job also has a connection set up, and that it is the same connection as the parent job's.
How to do it…
The following are the steps to be taken:
How does it work?
The parent job's tMysqlConnection component makes a connection and starts a session, and the child job's tMysqlConnection component does the same. The problem with this scenario is that they each create a separate session, so no records are committed to the database when the parent is executed, even though the child has written records.
When the parent registers a shared connection (named "cookbook" in this job), the session information becomes available as a session in the child if we choose to use it, which we do in this case by selecting the shared connection registered by the parent.
As a result, the parent and child now share the same session, and when the commit is executed, the records written by the child are committed as well.
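Conceptually, the shared connection behaves like a single JDBC connection handed from the parent to the child, so both write within one transaction. A minimal sketch of the idea, with hypothetical table and method names:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SharedSessionDemo {
    public static void main(String[] args) throws Exception {
        // Parent job: open the connection and register it as shared.
        Connection shared = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/cookbook", "user", "password");
        shared.setAutoCommit(false);

        childJob(shared); // child uses the shared session rather than its own

        // Parent commits: the child's writes are committed too,
        // because they belong to the same session.
        shared.commit();
        shared.close();
    }

    // Child job writes through the parent's connection.
    static void childJob(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO customer (customerName) VALUES (?)")) {
            ps.setString(1, "Globex Corp");
            ps.executeUpdate();
        }
    }
}
```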
Many applications write to, update, and delete from the same table in a variety of ways, with different fields serving as keys and different fields being updated at different times. This recipe demonstrates how to accomplish this without having to construct a new schema every time.
Getting ready
Open the job jo_cook_ch07_0080_fieldSelection.
How to do it…
The following are the steps to be taken:
How does it work?
First, we set the Action on data to Insert or update. This allows us to use the same component to write both new and existing records.
The first operation is an insert, which populates the createdDate and createdBy columns for the table's record while leaving the updatedDate and updatedBy columns empty.
Any subsequent write to this database for the supplied key is an update, so the createdDate and createdBy columns are left untouched, while the updatedDate and updatedBy columns are populated along with the new value of customerName.
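The two statements the job issues look roughly like the following sketch. The audit columns follow the recipe's description; the table name customer and the key columns customerURN and source are assumptions:

```java
public class InsertOrUpdateSql {
    // Insert: populate the created* audit columns; updated* stay NULL.
    static final String INSERT =
        "INSERT INTO customer (customerURN, source, customerName, createdDate, createdBy) "
      + "VALUES (?, ?, ?, NOW(), ?)";

    // Update: created* columns are untouched; updatedDate, updatedBy,
    // and customerName are refreshed.
    static final String UPDATE =
        "UPDATE customer SET customerName = ?, updatedDate = NOW(), updatedBy = ? "
      + "WHERE customerURN = ? AND source = ?";
}
```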
There’s more…
This is a wonderful way to ensure that jobs follow pre-defined schemas, which encourages reusability and traceability, and it also enables us to update rows using a different set of key values depending on the data we have.
Updating
Because Talend generates a SQL WHERE clause under the covers, an update can be performed using any key, not just the table's primary key. Simply tick the Update key checkbox in the Field options for the columns you want to use as the key.
Deleting
You'll also see a column for defining the deletion key in the list of fields. Deleting rows works the same way as updating them; however, tick the Deletion key column in the Advanced settings tab instead, and set the Action on data in the Basic settings to Delete. The SQL generated for both cases is sketched below.
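In both cases Talend builds the WHERE clause from whichever columns you flag, so the statements have roughly this shape. This is a sketch, and the non-primary key column email is an assumption:

```java
public class FieldOptionsSql {
    // Update keyed on a non-primary-key column flagged as the Update key.
    static final String UPDATE_BY_EMAIL =
        "UPDATE customer SET customerName = ?, updatedDate = NOW() "
      + "WHERE email = ?";

    // Delete keyed on a column flagged as the Deletion key.
    static final String DELETE_BY_EMAIL =
        "DELETE FROM customer WHERE email = ?";
}
```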
Many database applications require that a log of rejects and errors be kept in order to identify erroneous records and make manual corrections. This recipe demonstrates how to use the reject row facility to capture errors for individual rows.
Getting ready
Open the job jo_cook_ch07_0090_rejectsAndErrors.
How to do it…
The following are the steps to be taken:
How does it work?
There’s more…
There are a few other things to consider:
Die on Error
For rejects to be recorded using this approach, the Die on error option must be deselected, and any bulk insert methods must be disabled as well.
Efficiency
The ability to reject rows without killing the job is really valuable and makes code and error management much easier, but it comes at a cost: the rows must be inserted one at a time.
Because inserting rows one at a time is inefficient compared to bulk insertion methods or block sizes, you may find in some databases that there is a trade-off between load speed and error reporting and management.
Error Handling
Rejecting individual rows makes troubleshooting considerably easier, because each rejected row is a one-to-one match with its error message.
Batch insert techniques, such as the MySQL extended insert method, will instead return a single error and reject the entire batch of rows.
As a result, when one row fails, the whole batch is rejected, leaving us with both good and bad records in a batch of rejects and forcing us to develop more complex techniques to identify and correct the single bad row.
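The trade-off is easy to see in plain JDBC terms: inserting row by row lets each failure be caught and logged individually, whereas a batched insert would reject the whole batch on a single bad row. A minimal sketch, with hypothetical table and column names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class RejectRowDemo {
    // Insert rows one at a time; failed rows are collected with their
    // exact error message (a one-to-one match), as in the reject flow.
    static List<String> insertWithRejects(Connection conn, List<String> names)
            throws SQLException {
        List<String> rejects = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO customer (customerName) VALUES (?)")) {
            for (String name : names) {
                try {
                    ps.setString(1, name);
                    ps.executeUpdate();
                } catch (SQLException e) {
                    // Row-level reject: we know exactly which row failed and why.
                    rejects.add(name + " -> " + e.getMessage());
                }
            }
        }
        return rejects;
    }
}
```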
This simple recipe demonstrates how to run table-related and database management commands.
Getting ready
Create a new job jo_cook_ch07_0100_tableManagement.
How to do it…
The following are the steps to be taken:
How does it work?
The tMysqlRow component is the database equivalent of both tJava and tJavaRow: it can be used within a flow like tJavaRow, or standalone like tJava.
Like tJava, however, the tMysqlRow component is most often used standalone, as in this case, to create a table.
There’s more…
This basic example shows tMysqlRow used in a single, isolated situation. It is typically used before processing to build temporary tables or to drop constraints or indexes prior to bulk loading, and after processing to remove temporary tables and reinstate constraints or indexes.
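The SQL placed in a tMysqlRow component is executed in much the same way that plain JDBC executes a statement. A sketch of the before/after housekeeping just described, in which the table and index names are made up:

```java
import java.sql.Connection;
import java.sql.Statement;

public class TableManagementDemo {
    static void beforeLoad(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            // Typical pre-processing: build a temporary table and
            // drop an index so the bulk load runs faster.
            st.execute("CREATE TEMPORARY TABLE tmp_customer LIKE customer");
            st.execute("DROP INDEX idx_customer_name ON customer");
        }
    }

    static void afterLoad(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            // Typical post-processing: reinstate the index and tidy up.
            st.execute("CREATE INDEX idx_customer_name ON customer (customerName)");
            st.execute("DROP TEMPORARY TABLE IF EXISTS tmp_customer");
        }
    }
}
```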
Many application databases use surrogate keys to uniquely identify rows in a table. As a result, after writing a record it is common to need to capture the surrogate key so that any related child records can use the parent's surrogate as a foreign key. This recipe demonstrates one approach for generating surrogate keys and then discusses a few alternatives.
Getting ready
Open the job jo_cook_ch07_0110_surrogateKeys.
How to do it…
The following are the steps to be taken:
How does it work?
The following two stages are used to create unique keys:
The first stage is to capture the highest value of the key in the customer3NF and order3NF tables, which is done in tMysqlInput components. These values are then saved in globalMap variables for later use.
The second stage, now that we know the highest key value in the customer3NF table, is to create a sequence that starts at that maximum value plus one and to use the sequence's output as the surrogate key for each input customer row.
We then write the customer data to the customer3NF table, read the order data, and join it to the customer table in tMap using the natural key (customerURN and source) to extract the customer surrogate key for use as the foreign key in the order data. We also generate the order surrogate in tMap and write the completed order row to the database.
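In the job, the sequence is typically produced with Talend's built-in Numeric.sequence routine, seeded from the value captured in globalMap, for example Numeric.sequence("customerKey", maxKey + 1, 1) in a tMap expression. A plain-Java sketch of the two stages, in which the table and column names are assumptions:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicInteger;

public class SurrogateKeyDemo {

    // Stage 1: capture the highest existing key (what tMysqlInput
    // stores in globalMap in the job).
    static int maxKey(Connection conn) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COALESCE(MAX(customerKey), 0) FROM customer3NF")) {
            rs.next();
            return rs.getInt(1);
        }
    }

    // Stage 2: a sequence seeded at max, equivalent to the tMap
    // expression Numeric.sequence("customerKey", maxKey + 1, 1).
    static AtomicInteger newSequence(int maxKey) {
        // incrementAndGet() yields maxKey + 1, maxKey + 2, ...
        return new AtomicInteger(maxKey);
    }
}
```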
There’s more…
The method shown processes a whole table at a time, which is particularly efficient for batch processing and data migrations. There is one caveat: we must ensure that no other job or process is simultaneously writing to the same table using the same mechanism. If we cannot be certain of this, we should use a different approach.
This approach can be made more efficient by avoiding re-reading the customer table. If you have enough memory, it is advisable to load the surrogate key (the generated key) and the natural keys (source and customerURN) into a hashMap. This hashMap can then be used as a lookup, reading data already in memory rather than reloading the full customer table from the database.
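A minimal sketch of such an in-memory lookup, mapping the natural key (source plus customerURN) to the surrogate key:

```java
import java.util.HashMap;
import java.util.Map;

public class CustomerKeyCache {
    // Natural key -> surrogate key, e.g. "WEB|000123" -> 42.
    private final Map<String, Integer> cache = new HashMap<>();

    public void put(String source, String customerURN, int surrogateKey) {
        cache.put(source + "|" + customerURN, surrogateKey);
    }

    // Probe the cache instead of re-reading the customer table.
    public Integer lookup(String source, String customerURN) {
        return cache.get(source + "|" + customerURN);
    }
}
```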
A slight variation on the preceding method is to reserve a range of key values before writing. This can be accomplished by inserting a temporary marker record into the table, with a key equal to the current maximum value plus the number of records to be written plus one (or a given range plus one). Any second process will then begin loading above the reserved range, avoiding any potential for overlap. Don't forget to delete the marker record once the range has been written.
Sequences are database objects that return a unique value each time they are read. This method simply adds a lookup to the main flow that selects the next sequence value for the table and uses the returned value to populate the surrogate key.
This approach is suitable for both real-time and batch updates; however, sequences are not supported by all database versions.
It is slightly less efficient than the Talend-generated sequence approach, but it has the benefit of being usable by multiple processes at once. As a result, it is the best method to use in a real-time or web-service context.
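Where the database supports sequences (PostgreSQL and Oracle do, for example; classic MySQL does not), the lookup simply selects the next value. A hedged sketch using PostgreSQL syntax and an assumed sequence named customer_seq:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbSequenceDemo {
    // Fetch the next surrogate key from a database sequence
    // (PostgreSQL syntax; Oracle would use customer_seq.NEXTVAL FROM DUAL).
    static long nextKey(Connection conn) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT nextval('customer_seq')")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```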
Some databases allow a field to generate a new one-up number automatically whenever a record is written to a table. If the key field is set to auto-increment, we don't need to generate a key at all; we simply write the record to the table and the database handles the rest.
This strategy, however, has a drawback: we must re-read the table using the natural key in order to locate the database-generated surrogate.
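That re-read looks like this in JDBC terms; the table and column names are the same assumptions as before:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AutoIncrementReRead {
    // After the insert, find the database-generated surrogate key
    // by querying on the natural key.
    static Integer findSurrogate(Connection conn, String source, String customerURN)
            throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT customerKey FROM customer3NF WHERE source = ? AND customerURN = ?")) {
            ps.setString(1, source);
            ps.setString(2, customerURN);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : null;
            }
        }
    }
}
```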
The final option is to use an auto-increment key field but write the record to the database via a stored procedure, which can write the record and then capture the most recently generated value. This has the benefit of working with most databases, but the drawback of requiring a stored procedure or function to be maintained.
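A hedged MySQL-flavoured sketch of this option: a hypothetical procedure inserts the row and hands the auto-increment value back via LAST_INSERT_ID(), so no re-read is needed:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Types;

public class StoredProcKeyDemo {
    /* Hypothetical MySQL procedure the job would call:
       CREATE PROCEDURE insert_customer(IN p_name VARCHAR(100), OUT p_key INT)
       BEGIN
           INSERT INTO customer3NF (customerName) VALUES (p_name);
           SET p_key = LAST_INSERT_ID(); -- auto-increment value just written
       END
    */
    static int insertAndGetKey(Connection conn, String name) throws Exception {
        try (CallableStatement cs = conn.prepareCall("{call insert_customer(?, ?)}")) {
            cs.setString(1, name);
            cs.registerOutParameter(2, Types.INTEGER);
            cs.execute();
            return cs.getInt(2); // surrogate key captured without re-reading
        }
    }
}
```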
Conclusion
We hope this blog has been helpful in learning the different procedures for working with databases in Talend.