Skip to main content

Pentaho Data Integration - Multi-part Form submission with file upload using the User Defined Java Class Step

I recently needed to use Pentaho Data Integration (PDI) to send a file to a server for processing using HTTP Post. I spent several hours trying to use the existing steps HTTP Post, HTTP Client & Rest Client but I couldn't get it to work. After some more research I came across the issue PDI-10120 - Support for Multi-part Form Submittal In Web Service Steps and I thought I was out of luck.

I previously wrote a small Java client for a similar use case and remembered the PDI has a step called User Defined Java Class (UDJC). After reading this great tutorial I created the following basic transaction. I have a dataset with the URL and the full file path and use the UDJC to make the HTTP call.

HTTP Post using User Defined Java Class

The Java class handles the actual HTTP Post. It uses 2 input variables, the URL (url) which is used for the call and the file name (longFileName). The HTTP call then contains the file (line 30) and the file name (line 31). I included some basic error handling based on the HTTP status code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
import java.io.File;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.*;
import org.apache.http.impl.client.HttpClients;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
 Object[] r = getRow();
 
    if (r == null) {
        setOutputDone();
        return false;
    }

 r = createOutputRow(r, data.outputRowMeta.size());
 
  // Get the value from url
    String urlString = get(Fields.In, "url").getString(r);
 // Get the file name 
 String longFileNameString = get(Fields.In, "longFileName").getString(r);
 // Load the file
 File file = new File(longFileNameString);
 
 HttpPost post = new HttpPost(urlString);
 MultipartEntityBuilder entity = MultipartEntityBuilder.create();
 entity.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);
 entity.addBinaryBody("file", file);
 entity.addTextBody("fileName", file.getName());
 post.setEntity(entity.build());

 HttpClient httpClient = HttpClients.createDefault();

 try{ 
  //Make HTTP Call 
  HttpResponse response = httpClient.execute(post);
  // Check if response code is as expected
  if (response.getStatusLine().getStatusCode() != 200) {
   putError(data.outputRowMeta, r, 1, "Error returned", "HTTP Status Code", "HTTP Status: " + response.getStatusLine().getStatusCode());
   return true;
  } 
  // Set value of HTTP Status, integer values need conversion
  get(Fields.Out, "http_status").setValue(r,  Long.valueOf(response.getStatusLine().getStatusCode()));
 } catch (Exception e) {
  // Set value of HTTP Status to -1 since HTTP Post caused exception
  get(Fields.Out, "http_status").setValue(r,  Long.valueOf(-1));
  return true;
 }

 // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
 
 return true;
}

To get this example to work you will have to download the Apache HTTP Client and add the JAR files to the lib folder from PDI.

Click here to download the sample transformation. I hope it will be helpful for others, use at your own risk.

Comments

Anonymous said…
Can we send PDF as well?
dhakim said…
Hello, I am getting error on Line 28, Column 31: Cannot determine simple type name "MultipartEntityBuilder" although, I have copied "httpmime-4.5.10.jar" in my Pentaho DI Lib folder.
Ben said…
I just checked my lib folder, I added the following files. I dont' remember 100% if all of them were required for multi-part form submission but this version should work.
httpclient-4.3.3.jar
httpmime-4.3.3.jar
httpcore-4.3.2.jar
Anonymous said…
For some reason the transformation is not working on PDI 9. The transformation doesn't get any errors but the file is never sending.

The response field is also empty so i can't figure out what the error is.
Angel said…
Any luck resolving this issue?? You're right in that it only happens on PDI 9, but I also found that it happens on the PDI server instance. The issue is not present in the PDI client tools. I was digging through the logs and I found this error after copying the apache client jars:

ERROR (version 9.0.0.0-423, build 9.0.0.0-423 from 2020-01-31 04.53.04 by buildguy) : java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.http.client.methods.HttpEntityEnclosingRequestBase.setEntity(Lorg/apache/http/HttpEntity;)V" the class loader (instance of org/codehaus/janino/ByteArrayClassLoader) of the current class, Processor, and the class loader (instance of org/apache/catalina/loader/ParallelWebappClassLoader) for the method's defining class, org/apache/http/client/methods/HttpEntityEnclosingRequestBase, have different Class objects for the type org/apache/http/HttpEntity used in the signature
Anonymous said…
I haven't tried it myself yet with PDI 9 but the message " have different Class objects for the type org/apache/http/HttpEntity used in the signature" seems to indicate that PID includes some of the libraries in a different version causing this issue
Angel said…
It looks like PDI 9 includes httpclient and and httpcore in their WEB-INF/lib directory, so httpmime just needs to be deployed there as well instead of the main tomcat/lib directory.

This solved the issue for me

Popular posts from this blog

Dynamic cell references in spreadsheets with Google Docs

During my former internships in a consulting company I had to work A LOT with Microsoft Excel and often had to use dynamic cell references over multiple worksheets. Recently I started using the spreadsheets of Google Docs to track my bank account balance and to figure out where all my money goes. I decided to have one sheet for every month + one sheet of the month I want to analyze. But how do I dynamically change the reference to the sheet (the monthly sheet) I want to analyze without editing every single formula? Here is my solution : Create the target sheets and your overview sheet - I gave my sheets the names Month + Year (July 09) In the overview choose one cell that you want to contain the reference sheet and enter the sheet name - cell D24 in the example - Using the month names you might have to write 'July 09 , otherwise Google will think it's a date. In the overview you can now dynamically reference to a detail sheet using the following formula: =INDIRECT("'

Creating YTD transformation tables

The other day I had to setup a new data warehouse that will be used for reporting with MicroStrategy. Part of it was setting up the date dimension including the transformation tables. I had a quick look online and couldn't find any script doing the work for me so I created them myself (with the help of a colleague). All you need is an existing date dimension with date_id, year_id, quarter_id, month_id and week_id, you can find plenty of scripts for that online. YTD table select t1.day_id, t2.day_id INTO YTD_DAY from LU_DAY t1, LU_DAY t2 where t1.day_id >= t2.day_id and t1.year_id = t2.year_id QTD table  select t1.day_id, t2.day_id as qtd_day_id INTO QTD_DAY from LU_DAY t1, LU_DAY t2 where t1.day_id >= t2.day_id and t1.QUARTER_id = t2.QUARTER_id  MTD table select t1.day_id, t2.day_id as mtd_day_id INTO MTD_DAY from LU_DAY t1, LU_DAY t2 where t1.day_id >= t2.day_id and t1.month_id = t2.month_id  WTD table select t1.day_id, t2.day_id as wtd_day_id INTO WTD_D