4- Moving Data Out of NiFi

Objectives:

  • Examine data in Queues from GetTwitter
  • Create flow from GetTwitter to Local Folder
  • Use Templates to Create Multiple Flows

Examine data in Queues from GetTwitter

Last time we successfully pulled data from a Twitter account into our local instance of NiFi.  We are going to pick up right where we left off, so the NiFi Flow should look something like this (I have removed the GenerateFlowFile to Log Processors):


So if you run the GetTwitter Processor without turning on the LogAttribute Processor, you will see the Queue in the link between the processors fill up.  You'll notice in the above picture that I have 976 files queued.

Looking in these Queues is a good way to inspect the data before it reaches the next processor, which is useful for debugging when files move through multiple processors in a larger flow.
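If it helps to picture what is going on, here is a loose analogy in plain Python. This is not NiFi code and does not use any NiFi API; the `Connection` class and `run_source` function are names I made up for illustration. It just shows a source filling a first-in, first-out queue while the downstream side is stopped, and that listing the queue does not remove anything from it.

```python
from collections import deque


class Connection:
    """Toy stand-in for the queue between two processors (hypothetical, not NiFi)."""

    def __init__(self):
        self._queue = deque()

    def put(self, flowfile):
        # The upstream processor adds a file to the back of the queue.
        self._queue.append(flowfile)

    def take(self):
        # The downstream processor removes the oldest file, if there is one.
        return self._queue.popleft() if self._queue else None

    def list_queue(self):
        # Look at everything that is waiting without removing it.
        return list(self._queue)

    def __len__(self):
        return len(self._queue)


def run_source(connection, count):
    """Pretend source: pushes `count` fake tweets into the connection."""
    for i in range(count):
        connection.put({"id": i, "content": f"tweet {i}"})


conn = Connection()
run_source(conn, 5)          # source is running, downstream is stopped
queued_before = len(conn)    # everything piles up in the queue
listing = conn.list_queue()  # inspecting does not consume anything
first = conn.take()          # downstream starts and takes the oldest file
queued_after = len(conn)
```

The real Queue does the same job between GetTwitter and LogAttribute: files wait there until the downstream processor is started.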

To see a file in the Queue, right click on the Queue, and click on the "List queue" option.  You will see a list like the one below:


If you click on the "i" icon in the leftmost column, you will see details on that specific FlowFile.  Below are the details of one of the files in my Queue:


Open the details on a tweet that is at least 1 KB in size and click on the View button, which should open a viewer.  I have selected one below, opened it using View, and formatted the JSON so that it can be easily read.  This tweet looks like a Retweet, so it has a lot of information tied to it.


By looking at the data in the files in the Queue we can see what the FlowFiles look like before they reach the next processor.
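If you copy the content of a FlowFile out of the viewer, a few lines of Python can do the same formatting. This is only a sketch: the sample string is a made-up, heavily trimmed tweet (a real one has many more fields), and the retweet check assumes the classic Twitter API layout, where a Retweet carries a `retweeted_status` object.

```python
import json

# Hypothetical, trimmed-down stand-in for the content of one tweet FlowFile.
raw = (
    '{"id_str": "1", "text": "RT @someone: hello", '
    '"user": {"screen_name": "example_user"}, '
    '"retweeted_status": {"id_str": "0", "text": "hello"}}'
)


def pretty(raw_json):
    """Return the JSON re-indented so it is easy to read."""
    return json.dumps(json.loads(raw_json), indent=2, sort_keys=True)


def is_retweet(raw_json):
    """Assumption: a Retweet contains a top-level 'retweeted_status' key."""
    return "retweeted_status" in json.loads(raw_json)


print(pretty(raw))
print("Retweet?", is_retweet(raw))
```

The nested `retweeted_status` object is why Retweets tend to be the larger files in the Queue.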

Create flow from GetTwitter to Local Folder

Great, so next we want to create a target for the data that isn't a Log.  Add two Processors to your flow: PutFile and another LogAttribute, then link PutFile to the new LogAttribute on Failure, not on Success.  Then create a connection from GetTwitter to PutFile on Success.  We only want to Log Failures for PutFile because we will know if it's a success by looking at the folder.
Your flow should look something like this:


PutFile writes the data in each FlowFile to a local folder.  The destination is specified in the first entry in the Processor configuration (right click on the Processor and click "Configure"), under Directory.  You will need the full file path of a folder with enough room for as much data as you want to send to it.  Here's my configuration:


Now when you start GetTwitter and PutFile, JSON files get sent to the directory you specified in the PutFile processor.
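To confirm the files are landing and are readable, you can scan the folder from outside NiFi. The sketch below is my own helper, not part of NiFi: `summarize_output` is a hypothetical name, and the demo at the bottom uses a temporary folder with two fake files so that it runs anywhere. Point it at your own Directory value instead.

```python
import json
import tempfile
from pathlib import Path


def summarize_output(directory):
    """Count the files in a folder, how many parse as JSON, and their total size."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    valid = 0
    total_bytes = 0
    for path in files:
        total_bytes += path.stat().st_size
        try:
            json.loads(path.read_text(encoding="utf-8"))
            valid += 1
        except ValueError:
            # Covers both invalid JSON and undecodable bytes; just skip the file.
            pass
    return {"files": len(files), "valid_json": valid, "bytes": total_bytes}


# Demo with a throwaway folder standing in for the PutFile directory.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "good.json").write_text('{"text": "hello"}', encoding="utf-8")
    Path(demo_dir, "bad.json").write_text("not json", encoding="utf-8")
    demo_summary = summarize_output(demo_dir)

print(demo_summary)
```

Running it a couple of times while the flow is started should show the file count going up.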


Now we have a flow that can successfully pull data from one source (GetTwitter) and send it out of NiFi to a destination (PutFile).  Next we want to be able to repeat this flow at any time.  To do this, we will use Templates.

Use Templates to Create Multiple Flows

Templates are used to take a snapshot of the current NiFi flow and then recreate it later.  As a simple example, we will make a template of our current flow and instantiate it so that we can pull from two Twitter feeds and send them to the same destination.  It is important to note that properties and configurations are saved when a Template is made (except for Twitter's Secret properties).

First, we will create a Template by clicking the "Create Template" button, which is to the right of the Stop button in the Operate Panel on the left side of the Flow.


This will pull up the Create Template dialog.  Give it a title and click Create.


Now, to instantiate the Template into our flow, use the Template button in the top row of buttons.  It is second to last.


An Add Template dialog will appear with a dropdown list of all the Templates available.  Select the one you want and click "Add".


Now your flow will have two of the Twitter to Folder flows.  Feel free to configure the second GetTwitter for a different account and send its data to the same folder.


To manage the list of Templates, open the Menu from the top right (the 3 bars) and click on the Template option.  This opens the Template Menu, where you can delete Templates.


Now you know how to make, use, and manage Templates.

Next Time:

Thanks for reading this blog post!

Now that we can pull data into NiFi and move it back out of NiFi, we can try to manage what the data looks like when it gets put into a local directory.  Next time we will attempt to make a flow that does some parsing of the Twitter JSON object before it gets output to the local directory.  Instead of going step by step, I will create a template for you to access, and you will be able to upload it into your flow for your own use.  Exciting times ahead.

We're now over half way through with this story... let's keep trucking along, but until then...
Take it easy, Dude.


Comments

  1. My favorite part of this blog is showing us how to manage multiple templates for different Twitter accounts. I think it showed many capabilities of the NiFi and allows the user to pull data from different accounts that someone would like to view. I love how each file gives you an idea what is in a tweet. Very neat. Graphics are super helpful as well for tutorial purposes.

  2. I think this post further amplifies my comment on the last post regarding the utility of Nifi in aggregating disparate data sources.

