Posts

7 - GCP to Tableau Finale

7 - Grand Finale - GCP to Tableau. Objectives: Install and Set Up Tableau; Connect; Final Thoughts. Install and Set Up Tableau: Welcome to the final post that I will be making for this blog... more sentimental writing later on; for now, DATA! Remember that this is our game plan: Last post we handled the Twitter to NiFi to GCP part, so now we just need to tackle GCP to Tableau. If you are a student, you can get Tableau, a data visualization tool, for free here: https://www.tableau.com/academic/students Once you have Tableau, start a new workbook. Connect GCP to Tableau: Your data is saved in a GCP bucket. For Tableau to see it, the data must be accessible through Google BigQuery, so go into BigQuery and create a new table that points to the GCP bucket. Unfortunately, I'm running into a problem where BigQuery is denying me access to my bucket to build the table. Without this table, we can't access it in Tab...
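A minimal sketch of the BigQuery step, assuming the tweets sit in the bucket as newline-delimited JSON files. It only builds the `CREATE EXTERNAL TABLE` statement you would paste into the BigQuery console; the project, dataset, table, bucket, and prefix names are hypothetical placeholders, and `external_table_ddl` is a hypothetical helper. It leaves out a column list, assuming BigQuery's schema auto-detection can infer the columns from the JSON. Running the statement still requires that the account running it can read the bucket, which is exactly the permission wall this post ran into.

```python
def external_table_ddl(project: str, dataset: str, table: str,
                       bucket: str, prefix: str = "") -> str:
    """Build BigQuery DDL for an external table over JSON files in a GCS bucket.

    All names are placeholders supplied by the caller; nothing here talks to GCP.
    """
    # One trailing wildcard: match every object under the (optional) prefix.
    uri = f"gs://{bucket}/{prefix}*"
    return (
        f"CREATE EXTERNAL TABLE `{project}.{dataset}.{table}`\n"
        f"OPTIONS (\n"
        f"  format = 'NEWLINE_DELIMITED_JSON',\n"
        f"  uris = ['{uri}']\n"
        f")"
    )


if __name__ == "__main__":
    # Hypothetical names: replace with your own project, dataset, and bucket.
    print(external_table_ddl("my-project", "twitter", "tweets",
                             "my-tweet-bucket", "nifi/"))
```

Once the table exists, Tableau would connect to BigQuery and read that table, rather than reading the bucket directly.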

6 - Finale Part 1: NiFi

6 - Finale Part 1: NiFi. Objectives: Current State of the Flow; Processing Data; Where's It All Going? Current State of the Flow: Before I get into this, I would like to briefly remind you of our goal: take data from Twitter, display it at some endpoint, and try to make it look nice. Spoiler alert: it's not working out for us this time, but we're going to put the pieces together so that it can happen in the future. Last post I said that we would be switching to Tableau, and we're still going to try something with it... I'm not sure if it will work. Now we are adding a step to the process: Google Cloud. What will Google Cloud allow us to do? Google Cloud Platform (GCP) stores data in the cloud in containers called "buckets". NiFi has a processor for writing to Google Cloud Storage (PutGCSObject), which we will use to connect NiFi to GCP. From there we should be able to connect to Tableau, but that will be for next time. Here is a visual b...
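One detail worth knowing while processing the data before it lands in the bucket: BigQuery reads JSON only in newline-delimited form, one complete JSON object per line. A minimal sketch of that conversion, assuming the flow has already collected a batch of tweet JSON strings; in the actual NiFi flow this would be handled by processors rather than a Python script, and `to_ndjson` is a hypothetical helper name.

```python
import json


def to_ndjson(raw_tweets: list[str]) -> str:
    """Join JSON documents into newline-delimited JSON (one object per line).

    Each input string is parsed and re-serialized compactly, so pretty-printed
    input that spans several lines still ends up on a single line.
    """
    lines = [json.dumps(json.loads(raw), separators=(",", ":")) for raw in raw_tweets]
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    batch = ['{"id_str": "1", "text": "hello"}',
             '{\n  "id_str": "2",\n  "text": "world"\n}']
    print(to_ndjson(batch), end="")
```

The second input is spread over several lines, but it comes out on one line, which is the shape BigQuery expects.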

5 - Major Brick Wall

5 - Major Brick Wall. Objectives: Address the Problem; Find a Solution; Begin Implementing the Solution. In his talk, which has become known as "The Last Lecture", Randy Pausch discusses the many challenges a person faces while trying to achieve their goals. He calls them "brick walls" and says that they are there to help you figure out how to get around them. We have hit a major brick wall. I am intentionally making this post uninteresting so that you skip it and go to the next post, which will solve the problems outlined here. Address the Problem: The next step in our NiFi flow is to process the data we are getting from GetTwitter. We know that it is providing us with JSON, which is great, but it is raw data. Unprocessed, this JSON is practically useless to us: it provides WAY too much information, and some of the files that we are outputting are not even real tweets. Boiled down, the probl...
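To picture what "processing" has to do here: keep only messages that are actual tweets, and throw away most of their fields. A minimal sketch, assuming the raw JSON follows the Twitter v1.1 streaming format, where tweets carry `id_str` and `text` while control messages such as `{"delete": ...}` or `{"limit": ...}` do not. The fields kept are an arbitrary choice for illustration, `slim_tweet` is a hypothetical helper, and in the NiFi flow this job would fall to processors, not Python.

```python
import json
from typing import Optional

KEEP = ("id_str", "created_at", "text")  # illustrative choice of fields


def slim_tweet(raw: str) -> Optional[dict]:
    """Return a trimmed dict for a real tweet, or None for anything else."""
    msg = json.loads(raw)
    # Control messages (delete, limit, ...) have no tweet id/text: skip them.
    if not isinstance(msg, dict) or "text" not in msg or "id_str" not in msg:
        return None
    slim = {key: msg.get(key) for key in KEEP}
    # The author lives in a nested "user" object; keep only the handle.
    slim["user"] = (msg.get("user") or {}).get("screen_name")
    return slim


if __name__ == "__main__":
    # Made-up sample messages: one delete notice, one tweet.
    samples = [
        '{"delete": {"status": {"id_str": "123"}}}',
        '{"id_str": "456", "text": "NiFi is neat", '
        '"user": {"screen_name": "example", "id": 1}, "lang": "en"}',
    ]
    for raw in samples:
        print(slim_tweet(raw))
```

The delete notice comes back as `None`, and the tweet is reduced to a handful of fields.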

4 - Moving Data Out of NiFi

4 - Moving Data Out of NiFi. Objectives: Examine Data in Queues from GetTwitter; Create a Flow from GetTwitter to a Local Folder; Use Templates to Create Multiple Flows. Examine Data in Queues from GetTwitter: Last time we successfully pulled data from a Twitter account into our local instance of NiFi. We are going to pick up right where we left off, so the NiFi flow should look something like this (I have removed the GenerateFlowFile --> LogAttribute flow): If you run the GetTwitter processor without turning on the LogAttribute processor, you will see the queue on the connection between the two processors fill up. You'll notice in the picture above that I have 976 files queued. Queues are a good way to look at the data before it reaches a processor, which is useful when debugging a larger flow that moves files through many processors. To see a file in the queue, right-click the queue and click the "List queue" option. You will...
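Once GetTwitter feeds a processor that writes to a local folder (PutFile is the usual NiFi choice, though the excerpt is cut off before naming one), you can also check the output outside NiFi by scanning that folder. A minimal sketch, assuming each file holds one JSON message and treating any object with a `"text"` field as a tweet; the folder name `nifi-output` and the helper `summarize_folder` are hypothetical.

```python
import json
from pathlib import Path


def summarize_folder(folder: str) -> dict:
    """Count files in an output folder as tweets, other JSON, or unreadable."""
    counts = {"tweets": 0, "other_json": 0, "unreadable": 0}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # ignore subdirectories
        try:
            msg = json.loads(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            counts["unreadable"] += 1
            continue
        # Assumption: a JSON object with a "text" field is a tweet.
        if isinstance(msg, dict) and "text" in msg:
            counts["tweets"] += 1
        else:
            counts["other_json"] += 1
    return counts


if __name__ == "__main__":
    target = Path("nifi-output")  # placeholder: the folder your flow writes to
    if target.is_dir():
        print(summarize_folder(str(target)))
```

This complements "List queue": the queue shows files before a processor touches them, while the folder scan shows what actually made it out of NiFi.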

3 - Getting Twitter To Talk to NiFi using Get Twitter

Getting Twitter to Talk to NiFi using Get Twitter. Objectives: Create a GetTwitter Processor in NiFi; Set Up Twitter to Talk to NiFi; Create a Flow Using GetTwitter. Create a GetTwitter Processor in NiFi: Welcome back! Today we are going to use NiFi's GetTwitter processor to create a flow of data into NiFi. To start, we need to start our local NiFi instance and add a GetTwitter processor. Your flow should look something like this: You'll notice two things. First, our GenerateFlowFile --> LogAttribute flow from last time is still there, which is good to see. Second, GetTwitter isn't linked to anything and won't be able to do anything until it's configured. Let's open the configuration and go to the Properties tab to see what we need: The first entry determines what type of endpoint we are creating. This essentially determines whether we want to see all tweets in real time or pick which Twitter feeds to ...
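To make the endpoint choice concrete, here is a minimal sketch of it as a configuration check. It assumes the property names and values listed in NiFi's GetTwitter documentation (`Twitter Endpoint` set to `Sample Endpoint`, `Firehose Endpoint`, or `Filter Endpoint`; the four credential properties; and `Terms to Filter On`, `IDs to Follow`, `Locations to Filter On`). The function `check_gettwitter_config` is hypothetical and only mirrors the idea; it is not NiFi's own validator.

```python
REQUIRED_CREDENTIALS = ("Consumer Key", "Consumer Secret",
                        "Access Token", "Access Token Secret")
FILTER_PROPERTIES = ("Terms to Filter On", "IDs to Follow", "Locations to Filter On")
ENDPOINTS = {"Sample Endpoint", "Firehose Endpoint", "Filter Endpoint"}


def check_gettwitter_config(props: dict[str, str]) -> list[str]:
    """Return a list of problems with a GetTwitter-style property map (empty means OK)."""
    problems = []
    endpoint = props.get("Twitter Endpoint", "")
    if endpoint not in ENDPOINTS:
        problems.append(f"unknown endpoint: {endpoint!r}")
    for name in REQUIRED_CREDENTIALS:
        if not props.get(name):
            problems.append(f"missing {name}")
    # Sample/Firehose stream tweets without a selection; Filter only makes
    # sense once you say which terms, accounts, or locations to follow.
    if endpoint == "Filter Endpoint" and not any(props.get(p) for p in FILTER_PROPERTIES):
        problems.append("Filter Endpoint needs terms, IDs, or locations to filter on")
    return problems


if __name__ == "__main__":
    creds = {name: "<your value>" for name in REQUIRED_CREDENTIALS}
    print(check_gettwitter_config({"Twitter Endpoint": "Filter Endpoint", **creds}))
```

Picking specific Twitter feeds corresponds to the Filter endpoint plus at least one filter property; watching the live stream corresponds to the Sample or Firehose endpoint.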

2 - NiFi: UI, Processors, and Flow

NiFi: UI, Processors, and Flow. Objectives: NiFi Web UI Overview; Create a Processor and Flow; Start a Flow. NiFi Web UI Overview: First things first, fire up your local NiFi instance by running the run-nifi.bat file in the bin folder. If you are new here, starting up the instance is covered in the previous post. Open a web browser and go to the NiFi UI on your local machine at this URL: http://localhost:8080/nifi You should see your empty NiFi flow, which is in the root Process Group. The NiFi flow is the whitish grid area (the canvas). This first flow is in the root Process Group, meaning everything is contained within this initial root flow. Process Groups can be nested, so we can create Process Groups within the root Process Group... Your screen should look like this (pictured: a blank NiFi UI). On the top dark blue bar, there are several buttons that you can use to drag various items into your NiFi flow. Those items include: ...
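NiFi takes a while to start after run-nifi.bat is launched, so the UI may not answer right away. A minimal sketch, using only Python's standard library, that polls the UI address from this post until the web server responds; the timeout and polling interval are arbitrary choices, and `wait_for_nifi` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_nifi(url: str = "http://localhost:8080/nifi",
                  timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll url until the server answers or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # successful response: the web server is up
        except urllib.error.HTTPError:
            return True  # server answered with an error status, but it is up
        except OSError:
            pass  # URLError is an OSError: not listening yet, refused, timed out
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, calling `wait_for_nifi()` right after launching the batch file returns `True` once http://localhost:8080/nifi starts answering, or `False` after two minutes without a response.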

1 - Getting Started with Apache NiFi

Getting Started with Apache NiFi. Objectives: Learn About the History of NiFi; Install NiFi; Start Up NiFi for the First Time on a Local Machine. NiFi Overview: The project was created by the United States' infamous National Security Agency (NSA) under the codename "NiagaraFiles". In 2014 the NSA released it as open-source software. The primary developer of the project so far has been Hortonworks, which is also well known for Hadoop and its suite of data science software. At its core, Apache NiFi is used to manage data flow: its job is to take in data, process it, and then send it along to another destination. Think of it as a mail service for data. One of its advantages is a simple web-based interface that lets users manage the data flow in real time. Installing NiFi: I am installing NiFi on my Windows 7 PC; installing it on macOS or Linux follows a different procedure. Also of note: We w...