Friday, April 7, 2017

Replacing source files on the fly - Playing around with XML files

The following post came up after seeing that a lot of people in the FDMEE community were asking "how can we manipulate the source file and replace it with a new one on the fly?"
In other words, how can we replace the source file selected by the end user with a new file we create during the import process... or, let's be clear, how can we cheat FDMEE? :-)

I thought it was a good idea to share with you a real case study we had from a customer. Their ERP system had a built-in process which extracted data in XML format. Hold on, can FDMEE import XML files? Not out of the box, but yes with some imagination and scripting.

The Requirement
As stated above, FDMEE does not support all kinds of formats out of the box. We usually have to ask our ERP admin (or IT) to create a file in a format that FDMEE can easily read, mainly delimited files such as CSV.

But what about Web Services like SOAP or REST? They mainly return XML or JSON responses. We need to be prepared for that in case we want our FDMEE integration to consume a WS. This is quite useful in on-premises FDMEE, as I guess Data Management for the Cloud will include "some kind of JSON adapter" sooner or later in order to integrate with non-Oracle web services.

And what about any other integration where the source files come in a format other than fixed/delimited?

Luckily I had that one... we get an XML file from IT and we need to convert it to CSV so we can import it through FDMEE.

High Level Solution
The main idea is to convert the selected XML file into a CSV file that FDMEE can understand. Now the questions are: where and how?

  • Where? It makes sense that we do the conversion before the actual import happens. BefImport event script?
  • How? FDMEE will be expecting a CSV file, how do we convert the XML to CSV? There are multiple methods: Python modules, Java libraries for XPath... I will show one of them.
The XML File
The image below doesn't show the real XML (confidentiality) but a basic one:
As you can see, data is enclosed in a <data> tag and lines are enclosed in <dataRow> tags. Besides, each dimension has its own tag.
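In case the image is hard to read, a minimal file with that shape could look like the sketch below (the tag names match the ones used by the script later in this post; the values are made up, and the real sample had three <dataRow> blocks):

<data>
    <dataRow>
        <entity>EastSales</entity>
        <account>10100</account>
        <icp>[ICP None]</icp>
        <custom1>GolfBalls</custom1>
        <custom2>Customer2</custom2>
        <custom3>[None]</custom3>
        <custom4>Increases</custom4>
        <amount>1000</amount>
    </dataRow>
</data>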
As an extra for this post, just to let you know that I usually use the Notepad++ plugin XML Tools, which allows me to perform multiple operations including XPath queries:
Before we move into more detail, what do you think would happen if we tried to import the XML file with no customization?
FDMEE rejects all records in the file. What were you expecting? That's the reason I'm blogging about this (lol)

Import Format, Location and DLR for the CSV File
In this case, our source type is File. However, I usually define instances of File when I want FDMEE admins/users to see the real source system (this is optional):
The Import Format (IF) has been defined to import a semicolon-delimited file with numeric data only (you can use any delimiter):
I'm going to keep it simple. One-to-one mapping between XML dimension tags and HFM dimensions:
The Data Load Rule is using the IF we just defined. As you may know, we can have one location with multiple DLRs using different IFs when source is File.

BefImport Event Script
The conversion will be done in the BefImport event script, which is triggered before FDMEE imports the file the end user selected when running the DLR.

We can split this script into two main steps:
  1. Create the new CSV file in the location's inbox folder
  2. Update database tables to replace the original file selected with the new one created in step 1
The final solution could be more sophisticated (create the CSV based on IF definition, parse null values, etc.). Today we will go for the simple one.

Let's dive into details.

Converting XML to CSV
There are multiple ways of converting an XML to CSV. To simplify, we could group them as:
  • Method A: parses the entire XML and converts it to CSV
  • Method B: converts nodes into CSV lines as we iterate them
Method A would be good for small files. It's also quite useful if our XML structure is complex. However, with big files we may want to avoid loading the whole file into memory before converting, which makes Method B more efficient. Therefore, I have decided to implement Method B. Among the different options available, I will show the event-style method using the xml Python module.
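For reference, a minimal sketch of Method A (not the approach used in the rest of this post, and assuming the basic XML structure shown above with hypothetical file paths) would simply parse the whole tree and write one CSV row per <dataRow>:

from xml.etree.ElementTree import parse
import csv

# Load the whole XML document into memory (fine for small files)
tree = parse("C:/FDMEE/inbox/MYLOC/source.xml")

csvFile = open("C:/FDMEE/inbox/MYLOC/source.csv", "wb")
writer = csv.writer(csvFile, delimiter=';', quoting=csv.QUOTE_NONNUMERIC)

# One CSV row per <dataRow>, taking the child node texts in document order
for dataRow in tree.getroot().findall("dataRow"):
    writer.writerow([node.text for node in dataRow])

csvFile.close()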

Which Python modules am I using?
  • xml module to iterate the XML nodes (iterparse)
  • csv module to create the CSV file
  • os to build the file paths
Let's import the modules:

# Import Section
try:
    from xml.etree.ElementTree import iterparse
    import csv
    import os
    fdmAPI.logInfo("Modules successfully imported")
except ImportError, err:
    fdmAPI.logFatal("Error importing libraries: " % err)

Then we need to build the different paths for the XML and CSV files. We will also create a file object for the CSV file. This object will be used to create the csv writer.
The XML file is automatically uploaded to the location's inbox folder when import begins. The CSV file will be created in the same folder.

# Get Context details
inboxDir = fdmContext["INBOXDIR"]
locName = fdmContext["LOCNAME"]
fileName = fdmContext["FILENAME"]
loadId = fdmContext["LOADID"]

# XML File
xmlFile = os.path.join(inboxDir, locName, fileName)
fdmAPI.logInfo("Source XML file: %s" % xmlFile)

# CSV file will be created in the inbox folder
csvFilename = fileName.replace(".xml", ".csv")
csvFilepath = os.path.join(inboxDir, locName, csvFilename)

# To avoid blank lines in between lines: csv file 
# must be opened with the "b" flag on platforms where 
# that makes a difference (like windows)
csvFile = open(csvFilepath, "wb")
fdmAPI.logInfo("New CSV file: %s" % csvFilepath)

The writer object for the CSV file must use a semicolon as the delimiter so it matches our IF definition. We also enclose non-numeric values in quotes to avoid issues in case you define your import format as comma-delimited:

try:
    # Writer
    writer = csv.writer(csvFile, delimiter=';', quoting=csv.QUOTE_NONNUMERIC)
except Exception, err:
    fdmAPI.logDebug("Error creting the writer: %s" % err)

Once the writer is ready, it's time to iterate the nodes and build our CSV. Before seeing the code, I'd like to highlight some points:
  • We just want to capture start tags, so we only capture the start event in iterparse
  • We can include event in the for statement for debugging purposes (we can print how the XML file is read)
  • Property tag returns the XML node name (<entity>...)
  • Property text returns the XML node text (<entity>EastSales</entity>)
  • We know amount is the last XML tag, so we will write the CSV line when it's found
  • The CSV writer generates the delimited line from the list of node texts (row)
try: 
    # Iterate the XML file to build lines for CSV file
    for (event, node) in iterparse(xmlFile, events=['start']):
        
        # Skip the container tags (<data> and <dataRow>)
        if node.tag in ["data", "dataRow"]:            
            continue

        # For other nodes, get node value based on tag
        if node.tag == "entity":
            entity = node.text
        elif node.tag == "account":
            account = node.text
        elif node.tag == "icp":
            icp = node.text
        elif node.tag == "custom1":
            c1 = node.text
        elif node.tag == "custom2":
            c2 = node.text
        elif node.tag == "custom3":
            c3 = node.text
        elif node.tag == "custom4":
            c4 = node.text        
        elif node.tag == "amount":
            amount = node.text 
        
        # Build CSV row as a list (only when amount is reached)
        if node.tag == "amount":
            row = [entity, account, icp, c1, c2, c3, c4, amount] 
            fdmAPI.logInfo("Row parsed: %s" % ";".join(row))        
            # Output a data row
            writer.writerow(row)
        
except Exception, err:
    fdmAPI.logDebug("Error parsing the XML file: %s" % err)
finally:
    # Close the CSV file so all rows are flushed to disk before FDMEE imports it
    csvFile.close()

The result of this step is the CSV file created in the same folder as the XML one:
If we open the file, we can see the 3 lines generated from the 3 XML dataRows:
Cool, first challenge completed. Now we need to make FDMEE import the new file. Let's move forward.

Replacing the Source File on the fly
FDMEE stores the file name to be imported in several tables. It took me some time and several tests to figure out which tables I had to update. Finally I got them:
  • AIF_PROCESS_DETAILS: to show the new file name in Process Details page
  • AIF_BAL_RULE_LOADS: to set the new file name for the current process
  • AIF_PROCESS_PERIODS: the file name is also used in the table where FDMEE stores periods processed by the current process
To update the tables we need two parameters: the CSV file name and the current Load Id (Process Id).

# ********************************************************************
# Replace source file in FDMEE tables
# ********************************************************************

# Table AIF_PROCESS_DETAILS
sql = "UPDATE AIF_PROCESS_DETAILS SET ENTITY_NAME = ? WHERE PROCESS_ID = ?"
params = [csvFilename, loadId]
fdmAPI.executeDML(sql, params, True)

# Table AIF_BAL_RULE_LOADS
sql = "UPDATE AIF_BAL_RULE_LOADS SET FILE_NAME_STATIC = ? WHERE LOADID = ?"
params = [csvFilename, loadId]
fdmAPI.executeDML(sql, params, True)

# Table AIF_PROCESS_PERIODS
sql = "UPDATE AIF_PROCESS_PERIODS SET IMP_ENTITY_NAME = ? WHERE PROCESS_ID = ?"
params = [csvFilename, loadId]
fdmAPI.executeDML(sql, params, True)

Let's have a look at the tables after they have been updated:
  • AIF_BAL_RULE_LOADS
  • AIF_PROCESS_DETAILS
  • AIF_PROCESS_PERIODS
At this point, FDMEE doesn't know anything about the original XML file. Maybe some references in the process log, but nothing important.

Let's give it a try!
Ready to go. The FDMEE user selects the XML file when running the DLR:
Import is happening... and... data imported! XML with 3 dataRows = 3 lines imported.
Process Details shows the new file name (although it's not mandatory to change it if you don't want to).

I'm going to leave it here for today. Processing XML files can be very useful, not only when we have to import data but also in other scenarios. For example, I'm sure some of you have had solutions in mind where the Intersection Check Report (FDMEE generates an XML file which is then converted to PDF) had to be processed...

I hope you enjoyed this post and find it useful for your current or future requirements.

Have a good weekend!

Wednesday, March 29, 2017

BBT for FDMEE #1 - Target Accounts as Source in Mappings

Hola!
Working for customers, preparing training and conferences, and, the most important one, Francisco Jr running around, have been keeping me busy during the last few months.

One of the presentations I've been working on is "Black Belt Techniques for FDMEE" (aka BBT for FDMEE). I thought it would be interesting for people to know how to meet requirements of different complexity with some techniques which, of course, aren't in the books :-)

Although I can't go into too much detail (I don't want to spoil the presentation), this is a foretaste of what you will enjoy if attending Kscope17.

The Requirement
As you know, FDMEE can pull data from very heterogeneous source systems. Once data has been extracted, it has to be mapped into our target system (let's say HFM). Usually, the people responsible for maintaining mappings (aka mappers) are more familiar with the target system than with the source.
This is not always the case, but it's a common scenario when finance departments are split. How often do you hear "Not sure about this, we need to ask our ERP guy..."?

Another common scenario is that ICP/Custom dimension mappings use the source ERP account as a driver, either by importing the source account into the source ICP/Custom dimensions or by using Multi-dim/Conditional maps.

Have you ever asked the mapper: would it be easier for you if you could use the HFM account to define ICP/Custom dimension mappings rather than the source account?

In my case, I always do. And what I found is that if they can define mappings using the target HFM account, maintenance tasks are much simpler and the number of mapping rules is greatly reduced.

Of course, the immediate question is: Can we do that? Yes we can. How? 

Lookup dimensions as a Bridge, that's the answer
Lookup dimensions can be used in FDMEE for different purposes. How can they help us to meet our requirement?
  • They don't have an impact on target application
  • We can define a #SQL mapping to copy our target values into other source dimension values including the lookup dimension itself
  • We can define the order in which the lookup dimension is mapped
Have a look at this: Source Account > Target Account > Lookup > Source C1 > Target C1
Did you understand the above flow? Let's shed some light on it.

Let's start defining our lookup dimension "HFM Account":
In this example, we are going to use the lookup dimension to copy the target HFM account into the source lookup. For that purpose, we need to make sure that the lookup dimension is mapped after the Account dimension. As you can see above, the sequence number for the lookup is 2 while Account has 1.

Besides, column UD5 of TDATASEG(_T) has been assigned to HFM Account (I could have used any other column, like UD10, so I leave some UDx columns free in case we get new HFM custom dimensions).

Copying the Target HFM Account into other/lookup dimensions
As for any other dimension, we can create maps for lookup dimensions. Our main goal is to copy a target value into other dimensions, so why not use a SQL mapping?
The good thing about #SQL mapping scripts is that we can update multiple columns (a sketch follows the list):
  • Set target HFM Account to "Lookup"
  • Set source HFM Account (UD5) to target Account (ACCOUNTX)
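As a sketch of what that #SQL script could look like (syntax from memory, so double-check it in your environment; it assumes UD5/UD5X are the lookup source/target columns as above): FDMEE injects the script into the SET clause of its mapping UPDATE against TDATASEG_T, so chaining an extra column assignment after the comma is what copies the target account into the source lookup column:

-- #SQL mapping script for the HFM Account (lookup) dimension
-- Ends up as: SET UD5X = 'Lookup', UD5 = ACCOUNTX
'Lookup', UD5 = ACCOUNTX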
Done, our target account has been copied and it's now available as a source value in our lookup dimension.

A picture is worth a thousand words
Let's create a multi-dimensional mapping to show how this works:
Mapping says: when source Product is "'001" and HFM Account is "Price" then target Product is "P7000_Phones"

Thanks to our lookup dimension we can use the HFM account as a source. The mapping rule is clear and easy to create. There is no need to change the SQL map we created; that one is static.

What happens in Data Load Workbench?
  1. Source ERP account "10100" has been mapped to "Price"
  2. "Price" has been copied to source HFM Account
  3. Product has been mapped using source Product and HFM Account
At some point I expect Oracle to enhance multi-dim maps to also support target dimensions, so let's see another example where this approach is quite useful as well.

Another example: write-back to SAP using SAP accounts as Source in Explicit maps
In this case, we are extracting data from HFM in order to generate report files for SAP (write-back).
The requirement is to map SAP IFRS using SAP Accounts.

Following our approach:
  1. Map HFM Account to SAP Account
  2. Copy SAP Account to source SAP IFRS
  3. Map SAP IFRS based on SAP Accounts
Let's see the workflow:
As you can see, we have now copied the SAP Account into another dimension rather than the lookup. That allows us to create our Explicit mappings using SAP accounts in a very easy way.

Cloud-Friendly
One nice thing is that this solution is cloud-friendly. Data Management for the Cloud allows creating lookup dimensions and #SQL mapping scripts, so you can implement it even if you are not using my beloved on-premises FDMEE.

I'm going to leave it here for today. I hope you found this BBT useful.

More BBTs soon!

Tuesday, January 17, 2017

Universal Data Adapter for SAP HANA Data Source - Part 3

Time to finish this blog series on using the UDA with a SAP HANA data source. In the previous parts I covered both the introduction and the configuration in ODI:
In this last part we will go through the configuration in the FDMEE web interface and, of course, we will extract our data from HANA.

My Source Table: FDMEE_DATA_T
Just to show you what my source data looks like:
I will be extracting data from a column-based table. I haven't tested Virtual Tables yet, so my test just shows how to extract data from standard tables.

FDMEE Artifacts
A basic configuration to make the UDA work includes:
  • Source System
  • Source Adapter
  • Source Period Mappings
  • Import Format
  • Location and Data Load Rule
  • Dummy Mappings (just to make it work)

Source System
I have registered the source system with the new context I created for HANA:

Source Adapter
Once the source adapter is created for my HANA table (FDMEE_DATA_T), we need to import the table definition into FDMEE. This actually performs a reverse-engineering in ODI so we get the data store with its columns:
We are now able to see the columns in the source adapter. The next step is to classify the Amount and Year/Period columns (in case we have them). In my case, we do have columns for Year and Period, so we can filter data when running the extract:
I will keep the display names as the column names, but remember that these are the names you will see in your import format. Therefore, I'd change them if the columns have technical names.
I haven't defined any source parameters for my test, but this would be done in the same way as in any other UDA configuration.

Once the table definition is imported and parameters are created, we have to generate the template package. This step will create the ODI interface and package:
Now we are done with the source adapter setup.

Source Period Mappings
You need to configure calendars if you are going to filter by source period and year:
If your table or view has current period data, you will probably be fine setting the period mapping type in the data load rule to None. In my case, I just created one source period mapping for January-2016.

Import Format
Easy one-to-one map:
After configuring the import format, we have to generate the ODI scenario that will be executed when we run the data load rule. To generate the scenario, the source adapter needs to be configured first so the table definition is imported and the ODI interface/package are successfully generated:

Location and Data Load Rule
I have created one location with one data load rule which uses the source period mappings previously defined:

Running the Data Load Rule
Voilà! Here is my HANA data extracted:

Giving it a try with a standard View: FDMEE_DATA_V
I have also tested the data extract from a standard HANA View. The configuration steps are the same, so I'm not going to replicate them; this is just to show you that it works. My view filters only accounts starting with 1:

In case you get errors...
It may happen that you get errors when generating the ODI scenario. In that case, I would suggest raising an SR with Oracle, as we identified some issues when working with case sensitivity enabled.
You may get a different error but the one below shows in the FDMEE server log that the generation failed due to fatal errors in the ODI interface:

And this completes my trip around the SAP HANA world. I hope you enjoyed it and that it helps you avoid some headaches.

Thursday, January 12, 2017

Universal Data Adapter for SAP HANA Data Source - Part 2

In my last post about SAP HANA integration I went through an introduction to what we can do with the new data source in the UDA. Basically, we saw that we can now extract data from HANA tables and views (pure views, not to be confused with Core Data Services for HANA) through the database layer using JDBC.

In order to use the UDA we need to perform the following tasks:
  1. Import ODI Models Folder and Project into the FDMEE repository
  2. Configure the JDBC driver
  3. Configure ODI Topology
  4. Configure Source System and Source Adapter in FDMEE
  5. Configure FDMEE artifacts (period mappings, import format, location, etc.)
The 1st task was already covered in the review of PSU210. However, there is a related bug which you can see at the end of this post.

In this second part I will cover the following topics:
  • Configuration of the JDBC driver for SAP HANA (2nd task)
  • Configuration of ODI Topology for SAP HANA (3rd task)

JDBC driver for SAP HANA (ngdbc.jar)
As an important remark, I already showed that we can't use the latest version of the JDBC driver (2.0). Once you have downloaded the driver (you need access to the SAP marketplace, or get it from your SAP guys), you need to place it in a location where FDMEE (the agent) can use it.

As usual, I perform several tests with the local agent in ODI, so I need to place it in the drivers location for ODI Studio as well:
I just need to add a reference to the driver in the additional_path.txt file located in WINDOWS_USER_HOME\AppData\Roaming\odi\oracledi\userlib (user "Administrator" in my case).

In the same way, we need to place it in the FDMEE agent's drivers location:

Please note that the driver must be copied to every FDMEE server in case you have a high-availability architecture. For the local agent, just to the machines where ODI Studio is installed.

Both ODI Studio and FDMEE service(s) need to be restarted before using the driver.

JDBC Url (connection string)
The JDBC connection string is used by the agent to connect to the source HANA database. The generic URL is jdbc:sap//<server>:<port>[/?<options>] and it is configured in the Data Server created in ODI Topology for the HANA DB server.

In my case, I tried to keep it simple and used the following URL to connect to the SystemDB database:

jdbc:sap//<server>:<port>?databaseName=SystemDB

I used the following ports, but the value depends on your SAP environment (a quick connectivity test sketch follows the list):
  • 30013 for SAP HANA 1.0 (HANA instance number = 00)
  • 39013 for SAP HANA 2.0 (HANA instance number = 90)
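If you want to sanity-check the driver and URL before touching ODI, a minimal sketch from a Jython console could look like this (ngdbc.jar must be on the classpath; server, port, user and password are placeholders):

# Quick JDBC connectivity test from Jython
from java.lang import Class
from java.sql import DriverManager

# Driver class shipped in ngdbc.jar
Class.forName("com.sap.db.jdbc.Driver")

url = "jdbc:sap//myhanaserver:30013?databaseName=SystemDB"
conn = DriverManager.getConnection(url, "FDMEEHANA", "myPassword")
print "Connected to: %s" % conn.getMetaData().getDatabaseProductName()
conn.close()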
You can also configure tracing in the JDBC driver, but this is something I won't cover as it is not essential. These two sites are good references for how the JDBC URL is configured and how to enable tracing:

Configuring the ODI Topology
Once SAP HANA technology has been imported into ODI, we have to configure our physical and logical architecture:
  1. Create new Context(s)
  2. Create new Data Server(s) 
  3. Create new Physical Schema(s)
  4. Link Physical Schema(s) with the Logical Schema UDA_HANA through context(s) created in step 1
I guess you noticed the plurals for each artifact. This is just to highlight that you may have to create multiple objects in case you have multiple HANA DB servers. In my example it's simpler: just one source.

Context
I always recommend creating new contexts rather than using the GLOBAL one (it gives you more flexibility). My new context is UDA_HANA:
Data Server
Then I create a new data server under SAP_HANA technology:
As shown above, ODI will connect to HANA with the specific user FDMEEHANA created for this purpose. This user owns the schema where my source table is located. You will have to ask your HANA admin which user will connect to HANA via JDBC.
Don't get confused by seeing two data servers; it's just because I tested the two versions of HANA.

Time for the JDBC URL:
I'm using the SystemDB database but again, just to keep it simple.

Physical Schema
Once the data server is created, I have to create a physical schema where I can set the HANA schema storing my source table/view:
Configure Context (link Physical-Logical schema)
You may have noticed that I skipped the creation of the logical schema. This is because the schema UDA_HANA is created when importing the technology into ODI. Therefore, the last step is just to link our physical schema with the logical schema through the context we created:
As an additional step, we also have to link the Global context, as FDMEE uses it when generating the ODI project template for the source adapter. That's a common root cause when the generation fails.

Testing the ODI connection to SAP HANA Database Server
Now that our topology is configured, we can test the connection to the SAP HANA DB server so we can confirm that everything is good so far:
Common errors you may get:
  • Invalid user/password: check with your HANA admin
  • ODI cannot reach the host: check that your FDMEE server can ping the HANA DB server
  • ODI cannot load the driver: check that the driver is in the right location as shown above and that the FDMEE service was restarted
Bug: SAP_HANA technology is missing the Data Types
I haven't covered this one before as I wanted you to have all the steps clear. Now it's time to talk about it.
The technology delivered in PSU210 is not complete. Data Types are missing. Human error. It can happen to anyone :-)
Without data types, nothing will work.

At this point you have two options:
  1. Raise an SR with Oracle, although this will hopefully be fixed in a PSE soon. Please confirm that the logical schema is present after importing the new technology delivered in the PSE. In case it doesn't exist, you have to create it manually (name it UDA_HANA)
  2. Reverse the data types yourself. This option also requires creating mappings between the HANA and Generic SQL technologies for each data type
I would suggest option 1 so you are safe. In case you prefer option 2, I'll let you proceed at your own risk:
Once the data types are reversed, you need to map them one by one in the Converted To tab:
This map needs to be replicated in the Logical Schema as well:
FYI, I verified that data types reversed from HANA 1.0 match the ones reversed from HANA 2.0.


That's all for today. With these steps, UDA for SAP HANA is now configured in ODI.
What are the next steps? Configure it in FDMEE so we can get our data extracted.

Enjoy!

Tuesday, January 10, 2017

Executing Custom Scripts with REST

This is a quick one...

As you may know, GJG (aka Great John Goodwin) has introduced FDMEE and REST in his blog:
So I'm going to steal a bit of his hard work :-)

In my review of PSU210 I went through executing a custom script from the command line.

The formula is then easy: My post + GJG's post = Executing Custom Scripts with REST

We already saw that a custom script is actually treated as a report. Therefore, the syntax to execute a custom script with REST does not differ much from the one John showed. The only difference is that reportFormatType is SCRIPT rather than REPORT.

BTW, I'm using RESTClient in Firefox. I'm not going to cover the basics of adding the authentication header, the JSON content type, etc. You can visit John's blog for that.

Also, be aware this is only possible in on-premises FDMEE, as Data Management in the Cloud does not support scripting yet.

Executing Custom Script without Parameters
We have three mandatory keys in the request body (a request sketch follows the list):
  • jobType = REPORT
  • jobName = this is the name you used when registering the script (not the py file)
  • reportFormatType = SCRIPT
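To make it concrete, here is a hedged sketch of the call using Python requests (server URL, port, credentials and script name are placeholders; the /aif/rest/V1/jobs resource and payload keys are the ones John covers, so check his post or the REST API documentation if your version differs):

import requests

# Placeholders: adjust server, credentials and registered script name
url = "http://fdmeeserver:19000/aif/rest/V1/jobs"
auth = ("admin", "myPassword")
payload = {
    "jobType": "REPORT",             # custom scripts are treated as reports
    "jobName": "My Custom Script",   # name given when registering the script
    "reportFormatType": "SCRIPT"
}

# Run the job; the response includes the status and the job id
response = requests.post(url, json=payload, auth=auth)
print(response.json())

# Poll the job status using the returned job id (key name as per the REST API doc)
jobId = response.json().get("jobId")
status = requests.get("%s/%s" % (url, jobId), auth=auth)
print(status.json())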
After calling the REST resource to run jobs, we get both the status and the job id. We can then easily check the status to see whether it completed or an error was raised:
As you can see, processType is now COMM_EXECUTE_CUSTOM_SCRIPT.

If you navigate to FDMEE, the Process Details page will show the script executed successfully:

Executing Custom Script with Parameters
If your script uses parameters, then you need to add them to the JSON body:
Status can be checked in the same way as before:
And finally, Process Details will show the execution:

And that's all I wanted to show. With the new PSU, you can now execute custom scripts from external systems in a very easy way.

Thanks again to John for introducing FDMEE and REST.

Enjoy!