November 20, 2015
The agile edge capability within a Hybrid IT model is what we are calling the environment that allows for both rapid experimentation and elastic scaling to cope with exponential adoption. It enables faster IT delivery and increased innovation from both internal employees and external partners or ecosystems. This requires a major change in culture, procedures and technologies. Enterprises will need to shift away from rigid methodologies and processes to those that enable agile and collaborative development (e.g. hackathons). Besides Platform as a Service (PaaS) offerings, this could include the use of IBM Design Thinking, which is based on user-focused development and emphasizes the user experience rather than the product itself through frequent updates and feature releases. Additionally, for an enterprise with mature Agile Edge capabilities, venture funding partnerships may be created that allow new growth models.
November 17, 2015
I have continued to pursue my interest in Big Data and Spark. As I mentioned before, Big Data University is where I ended up. I took the Big Data Spark Fundamentals course. It was a great overview of the capabilities of Spark. Here are some of my impressions.
- Data Source simplicity – The simplicity of the Spark programming model is amazing. Many of the elements needed to perform tasks are already set up and created for you. There are numerous existing Spark connectors that allow you to work on data from many sources. Remember, one of the benefits of Spark is that the programming model is the same regardless of the data source. Obviously, Hadoop provides the infrastructure to store massive amounts of data. But there are times when data exists in files or existing SQL databases. The connectors do all the hard work.
- Programming simplicity – All of the examples that the course took me through showed the tremendous power of Spark and how it can be achieved with an incredibly small amount of code. Being able to trudge through millions of records in a very short amount of time in a few lines of code is really amazing.
- It’s all about the API – The programming model is very simple, the concepts are fairly simple, but all the power of Spark is in understanding the API. The basics of transformations and actions are easy to understand, but knowing how to construct the transformations is the key. I suspect most people rely on really good examples. The open source community is much better at this today than in the past. I write very little original code these days and often borrow from good examples that are out there in the community.
- Data Scientist and Spark Developer – The magic here is the collaboration between the data scientist (the one who knows the data and the questions to ask) and the Spark developer (the one who translates that into Spark code). There is a lot of business value in this small collaborative team. You can envision a very large enterprise with large data and a relatively small team being able to garner immense business value using Spark.
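To make the transformation/action split concrete, here is a minimal sketch in plain Python. It is not Spark's API: `LazyDataset` and its methods are hypothetical names I made up to mimic the idea that transformations only describe the work, and nothing actually runs until an action asks for a result.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable]):
        # Keep a factory instead of the data, so every action re-reads the source.
        self._source = source

    @classmethod
    def from_list(cls, items: List) -> "LazyDataset":
        return cls(lambda: iter(items))

    # Transformations: return a new dataset description and do no work yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole chain and hand back a concrete result.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


parsed = []  # records which inputs were actually processed


def parse(text: str) -> int:
    parsed.append(text)
    return int(text)


records = LazyDataset.from_list(["3", "10", "7", "42"])
large = records.map(parse).filter(lambda n: n > 5)  # transformations only

print(len(parsed))   # 0: nothing has been parsed yet
result = large.collect()  # the action triggers the work
print(result)        # [10, 7, 42]
```

The point of the toy is the ordering: building `large` costs nothing, and the parsing only happens inside `collect()`. That is the mental model I found most useful when reading real Spark examples.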
Now that I am armed with my Big Data Spark Fundamentals badge from Big Data University, I am going to check out what else I can learn from this portal. You should too.
November 10, 2015
by Russell Hargraves and Sumit Patel
Nothing is more difficult to undertake, more perilous to conduct or more uncertain in its outcome, than to take the lead in introducing a new order of things. For the innovator has for enemies all those who have done well under the old and lukewarm defenders amongst those who may do well under the new. Niccolo Machiavelli, The Prince (c. 1513)
The world is entering into a new era of computing that will enable the digital transformation of society and business based on the advancement and personalization of cognitive computing. Cognitive computing systems learn and interact naturally with people to extend what either a human or machine could do on their own. Cognitive systems like IBM Watson are redefining society, business and human interaction in the increasingly pervasive digital economy by helping everyone and everything make better decisions.
Today, the world is being rewritten in software code, igniting an explosion of big data enabled by apps, mobile devices, social networks and the Internet of Things (IoT), and ushering in the new Cognitive era. The cloud and the emergence of the industrial hybrid cloud are the platforms on which the new digital builders, developers, business professionals, governments and individuals are reimagining everything from education, banking and retail to healthcare, transportation and beyond, as seen in the figure below.
November 4, 2015
I have been a developer (at least at heart) all of my professional career. I have always found ways to keep my hands dirty in some type of coding effort. However, the older you get the more removed you become. Development is a young man’s game. However, I think I have found my next programming model playground.
If you haven’t noticed by now, Big Data is kind of a big deal. When you hear things like “each day we produce 2.5 quintillion bytes of data” and “90% of the world’s data has been created in the last 2 years”, it doesn’t take a genius to hypothesize that there might be some hidden value in all that data. It appears that the storage industry has no problem keeping up with this demand, and networking is also getting better and better (I thought physics was involved, but apparently we keep finding ways to get more through the same pipe).
So the problem to solve falls to the foot soldiers of every technical problem, the developers. And the default landscape that all developers maneuver and work in is open source. The Apache Spark project is another great example of the open source ecosystem gaining unfathomably quick traction in solving a problem.
I got to go to the IBM Insight conference last week and I took that opportunity to learn some things (between my booth duty stints). There were many Spark sessions to attend but most were full. They turned away many people at many Spark-based sessions. And being an IBMer, I got sent to the end of the line for any walk-up lab spots. However, I was able to attend a few sessions and learn some good things. I am beginning my self-education by taking some Big Data University courses online (check them out here).
I have spent a lot of my time in the developer community not only pushing out code, but also constantly tweaking and consulting on the interactions between developers and the rest of the larger team. Spark brings a new interesting dimension to this dynamic. As we talked about before, the reason we are here is that there is lots of value in all that data. So Spark was created as a programming model to make it simple to carry out highly compute-intensive data manipulation. The difference here is that someone typically asks a question that potentially has a simple answer, but getting it goes beyond the typical programming models/platforms that exist today.
Let’s examine some of the differences that this problem set brings to the party.
- The user interface is not important. We have spent so much effort on UI design, frameworks, etc. due to the boom of the mobile device. Application look-and-feel and user experience are so important in the mobile space due to the intense competition between vendors. In the big data space, the answer is really the only thing we care about.
- The requirements are simple and the results are typically simple; getting there is the hard part. The vast majority of the work done by Spark applications is the chunking of data. A very simple application (very few lines) can perform massive amounts of processing. The Spark platform is the ultimate effort in pushing all of the complexity below the development experience.
- Data scientists are the big data analysts. The role of the data scientist is the role that sits between the line-of-business and the developer. Data scientists know the data that is being captured and help translate the question being asked to the Spark developer. As a matter of fact, with tools like the Data Scientist Workbench, we are providing a platform for Data Scientists to learn enough about Spark to do the work themselves.
- The art is in understanding how Spark works and programming the effort accordingly. Understanding how Spark divides up the work, when and where to store intermediate data (if at all), and tuning the program accordingly is where a Spark developer brings the value. Programming a web application can be done with a single-user mentality in mind. Scaling the application can be tackled at other levels of the architecture. As I said before, the requirements for Big Data apps are typically simple and the answer is typically also simple, but the time it takes for the application to get to that answer is all that counts.
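The "chunking of data" idea can be sketched without Spark at all. The plain-Python example below is only an illustration under my own assumptions: the function names (`partition`, `count_words`, `merge`) and the sample lines are hypothetical, and a real cluster would run each chunk on a different machine rather than in a list comprehension. The shape is the same, though: split the data, do independent work per chunk, then combine the small partial results.

```python
from collections import Counter
from typing import Iterable, List


def partition(records: List[str], num_partitions: int) -> List[List[str]]:
    """Split records into roughly equal chunks (round-robin)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(chunk: Iterable[str]) -> Counter:
    """Work done independently on a single chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge(partials: Iterable[Counter]) -> Counter:
    """Combine the per-chunk results into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = [
    "spark makes big data simple",
    "big data is a big deal",
    "simple code big results",
]

# Each chunk could be processed on a different worker; here it is sequential.
partials = [count_words(chunk) for chunk in partition(lines, 2)]
totals = merge(partials)
print(totals["big"])  # 4
```

The tuning questions from the last bullet show up even in this toy: how many partitions to use, and whether the partial results are worth keeping around if a second question is going to be asked of the same data.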
As I explore Spark more, I will keep you posted along the way. Let me know your thoughts.
October 19, 2015
Dev/Test clouds are often where many organizations start with Cloud. I’ve seen Cloud provide some big benefits for organizations wanting to improve their software development and testing practices (especially when you look at the ability to provision environments quickly for testing scenarios).
In working with clients, we’ve gotten into some interesting discussions around the key differences between Dev/Test clouds and production clouds. First, I am a believer that Dev/Test should mirror production as closely as possible. Sometimes that is possible and sometimes it is not. These are the common variances I’ve run into:
- Production clouds have higher uptime requirements (e.g. think five 9’s of availability).
- Data in production clouds is usually more sensitive. We typically leverage obfuscation and data masking to eliminate any data privacy issues in Dev/Test clouds.
- Dev/Test clouds may not have all the scale-out capacity or support all the HA/DR scenarios that production clouds do.
- Dev/Test clouds should leverage service virtualization where it makes sense (e.g. if you are waiting for off-peak hours to test CICS transactions on the mainframe in your Dev/Test environment, you might want to explore how this could be improved by leveraging service virtualization).
Once you’ve set up your Dev/Test Cloud, these are the gotchas to watch out for:
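To put the uptime difference in perspective, the arithmetic is easy to sketch in Python. The function name and the list of targets are just my illustration; the only assumption is a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be down and still meet the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

Five nines works out to a little over five minutes of downtime per year, while two nines allows more than three and a half days. That gap is usually why a Dev/Test cloud does not get the same HA investment as production.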
- Dependent systems availability (or lack of availability) due to production schedules, security, or team contention
- Improper lifecycle management (e.g. teardown) that does not account for the complete application lifecycle and speed. For example, Agile teams with a large number of iterations moving through DEV, QA, and production can lead to virtualized application crawl.
- Unpredictable demand spikes
- Test data validity, obfuscation, and data movement (setup/teardown, etc)
- Usage fees with testing 3rd party services
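On the test data point, here is one possible way to obfuscate sensitive fields before data lands in a Dev/Test cloud. This is a sketch under my own assumptions, not a description of any particular masking product: the function names, the field names and the secret are all hypothetical. It uses a keyed hash so the same input always maps to the same token, which keeps joins between tables working after masking.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes, length: int = 12) -> str:
    """Replace a value with a stable, non-reversible token (keyed SHA-256)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record: dict, sensitive_fields: set, secret: bytes) -> dict:
    """Return a copy of the record with only the sensitive fields masked."""
    return {
        key: mask_value(str(value), secret) if key in sensitive_fields else value
        for key, value in record.items()
    }


# Hypothetical example data; the secret would live outside the Dev/Test cloud.
secret = b"example-only-secret"
customer = {"id": 1001, "name": "Jane Example", "email": "jane@example.com", "plan": "gold"}

masked = mask_record(customer, {"name", "email"}, secret)
print(masked["plan"])                       # gold (left untouched)
print(masked["email"] != customer["email"])  # True
```

Keyed hashing is only one option. If testers need realistic-looking values (valid email formats, plausible names), format-preserving substitution is a better fit, at the cost of more setup.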
In my next blog, I’ll talk about approaches to handle these gotchas!
October 19, 2015
I’ve never been part of a more “disruptive” movement in my 25 years in the industry than Cloud. (I’m using the term “disruptive” in a mostly positive way.) I’ve already started to see the good, the bad, and sometimes the ugly emerging as I work with clients, and I will share those experiences (with names hidden to protect the innocent!).
I’m looking forward to learning from my colleagues about their experiences as well.
October 14, 2015
OpenStack projects are an important part of the greater OpenStack vision. Projects can be thought of as subcomponents covering various parts of the cloud environment and are known more commonly by names such as nova for compute, neutron for networking, or cinder for block storage. The focus is more on tying together resources at the software level, while vendors such as IBM focus on enabling their infrastructure to be OpenStack compliant. As this is open source, projects follow a self-organized model where a designated Project Team Lead (PTL) seeks to carry out a vision of the project in the sense that complete functionality is delivered. Before I had a chance to take part in an IBM OpenStack Dojo a few weeks back, I had always assumed these projects were tightly coupled, meaning that an ideal OpenStack implementation couldn’t have one component without another, but I’m beginning to realize this was incorrect.