Big Data University – an IBM Community Initiative
November 17, 2015 § Leave a comment
I have continued to pursue my interest in Big Data and Spark. As I mentioned before, Big Data University is where I ended up. I took the Big Data Spark Fundamentals course. It was a great overview of the capabilities of Spark. Here are some of my impressions.
- Data Source simplicity – It was amazing the amount of simplicity in the Spark programming model. Much of the elements needed to perform tasks are already setup and created for you. There are numerous existing Spark connectors that have been built to allow you to work on data from many sources. Remember, one of the benefits of Spark is that the programming model is the same regardless of the data source. Obviously, Hadoop provides the infrastructure to store massive amounts of data. But there are times when data exists in files or existing SQL databases. The connectors do all the hard work.
- Programming simplicity – All of the examples that the course took me through showed the tremendous power of Spark and how it can be achieved with an incredibly small amount of code. Being able to trudge through millions of records in a very short amount of time in a few lines of code is really amazing.
- It’s all about the API – The programming model is very simple, the concepts are fairly simple, but all the power of Spark is in understanding the API. The basics of transformations and actions are easy to understand, but knowing how to construct the transformations is the key. I suspect most people rely on really good examples. The open source community is much better at this today than in the past. I write very little original code these days and often borrow from good examples that are out there in the community.
- Data Scientist and Spark Developer – The magic here is the collaboration between the data scientist (the one who knows the data and the questions to ask) and the Spark developer (the one that translate that into Spark code). There is lots of business value power in this small collaborative team. You can envision a very large enterprise with large data and and relatively small team being able to garner immense business value using Spark.
Now that I am armed with my Big Data Spark Fundamentals badge from Big Data University, I am going to check out what else I can learn from this portal. You should too.