April 20, 2016
By Darrell Schrag
You can’t go very far in the business world these days without running across the concept of disruptors. As a company, if you are not actively seeking ways to disrupt your industry, you will be passed by quickly, or so they say. Even more challenging is when the disruptor comes out of the blue from a company you didn’t consider a competitor. The latest example I just read about is how retailer Overstock.com is actively pursuing trading technologies based on blockchain: http://www.bloomberg.com/news/articles/2015-12-17/overstock-wins-sec-s-nod-to-upend-how-companies-issue-shares.
Who in the financial services industry would have seen a retailer completely shaking up the status quo from out of the blue? Companies like this are making the news and vaulting ahead of their industries, if not creating new ones. We at IBM lead the Bluemix story with just this discussion. Bluemix provides the agile development platform that allows your developers to conceive of and deliver new solutions quickly.
But there is another type of disruption that must be dealt with before going to market with new ideas: the need to disrupt the status quo within the IT department. I have had numerous conversations with customers who understand this concept of disruption but are at the same time knee-deep in the molasses of their existing development processes. They look around the room seeking a brave soul willing to stick their neck out and suggest a drastic change to the way they do development. Disrupting their industry requires disrupting their IT department.
The organizations blazing this disruptive trail are the ones driven by a line-of-business leader who demands this disruptive ability from their IT leaders. Rare is the IT leader who gets out in front of this by pursuing something like Bluemix ahead of the demands of the business, but it does happen. Being in the IT business, I usually spend much more time on the IT side of a customer’s house. But helping those IT leaders tie what they do back to the business drivers that keep the CEO up at night can improve the success rate. The best part of my job is getting opportunities to challenge corporate leaders to be disruptive, both in their industry and within their development walls.
The speed of innovation is increasing every day. And this survival-of-the-fittest disruptive behavior will only get more and more commonplace. It is an exciting time to be in IT.
November 17, 2015
I have continued to pursue my interest in Big Data and Spark. As I mentioned before, Big Data University is where I ended up. I took the Big Data Spark Fundamentals course. It was a great overview of the capabilities of Spark. Here are some of my impressions.
- Data source simplicity – It was amazing how much simplicity there is in the Spark programming model. Many of the elements needed to perform tasks are already set up and created for you. Numerous Spark connectors have been built to let you work on data from many sources. Remember, one of the benefits of Spark is that the programming model is the same regardless of the data source. Obviously, Hadoop provides the infrastructure to store massive amounts of data, but there are times when data lives in files or existing SQL databases. The connectors do all the hard work.
- Programming simplicity – All of the examples the course took me through showed the tremendous power of Spark and how it can be achieved with an incredibly small amount of code. Being able to churn through millions of records in a very short amount of time with just a few lines of code is really amazing.
- It’s all about the API – The programming model is very simple and the concepts are fairly simple, but all the power of Spark is in understanding the API. The basics of transformations and actions are easy to understand; knowing how to construct the transformations is the key. I suspect most people rely on really good examples. The open source community is much better at this today than in the past. I write very little original code these days and often borrow from good examples out there in the community.
- Data scientist and Spark developer – The magic here is the collaboration between the data scientist (the one who knows the data and the questions to ask) and the Spark developer (the one who translates that into Spark code). There is a lot of business value in this small collaborative team. You can envision a very large enterprise with large data and a relatively small team garnering immense business value using Spark.
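The transformation/action distinction above is the heart of the API. Here is a rough plain-Python sketch of that model (this is an illustration, not actual Spark code; the MiniRDD class is invented for the example): transformations build up a lazy pipeline, and nothing runs until an action asks for a result.

```python
# Plain-Python illustration of Spark's transformation/action model.
# NOT real Spark code: MiniRDD is a toy stand-in for an RDD.

class MiniRDD:
    def __init__(self, data):
        self._data = data

    # Transformations: return a new MiniRDD; no work happens yet.
    def map(self, fn):
        return MiniRDD(map(fn, self._data))

    def filter(self, pred):
        return MiniRDD(filter(pred, self._data))

    # Actions: force evaluation and return a concrete result.
    def collect(self):
        return list(self._data)

    def count(self):
        return sum(1 for _ in self._data)

lines = MiniRDD(["spark is fast", "spark is simple", "hadoop stores data"])
pipeline = lines.filter(lambda l: "spark" in l).map(str.upper)  # still lazy
print(pipeline.collect())  # the action triggers the whole pipeline
# -> ['SPARK IS FAST', 'SPARK IS SIMPLE']
```

In real Spark the same chain would run distributed across a cluster, which is exactly why such a small amount of code can trudge through millions of records.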
Now that I am armed with my Big Data Spark Fundamentals badge from Big Data University, I am going to check out what else I can learn from this portal. You should too.
November 4, 2015
I have been a developer (at least at heart) my entire professional career. I have always found ways to keep my hands dirty in some type of coding effort. However, the older you get, the more removed you become; development is a young man’s game. Still, I think I have found my next programming-model playground.
If you haven’t noticed by now, Big Data is kind of a big deal. When you hear things like “each day we produce 2.5 quintillion bytes of data” and “90% of the world’s data has been created in the last 2 years,” it doesn’t take a genius to hypothesize that there might be some hidden value in all that data. It appears the storage industry has no problem keeping up with this demand, and networking keeps getting better and better (I thought physics was involved, but apparently we keep finding ways to push more through the same pipe).
So the problem to solve falls to the foot soldiers of every technical problem: the developers. And the default landscape that all developers maneuver and work in is open source. The Apache Spark project is another great example of the open source ecosystem gaining unfathomably quick traction in solving a problem.
I got to go to the IBM Insight conference last week and took the opportunity to learn some things (between my booth duty stints). There were many Spark sessions to attend, but most were full; they turned away many people at many Spark-based sessions. And being an IBMer, I got sent to the end of the line for any walk-up lab spots. Still, I was able to attend a few sessions and learn some good things. I am beginning my self-education by taking some Big Data University courses online (check them out here).
I have spent a lot of my time in the developer community not only pushing out code but also constantly tweaking and consulting on the interactions between developers and the rest of the larger team. Spark brings an interesting new dimension to this dynamic. As we talked about before, the reason we are here is that there is a lot of value in all that data. So Spark was created as a programming model to make it simple to carry out highly compute-intensive data manipulation. The difference here is that someone typically asks a question that potentially has a simple answer, but getting it goes beyond the typical programming models and platforms that exist today.
Let’s examine some of the differences that this problem set brings to the party.
- The user interface is not important. We have spent so much effort on UI design, frameworks, etc. due to the boom of the mobile device. Application look-and-feel and user experience are so important in the mobile space because of the intense competition between vendors. In the big data space, the answer is really the only thing we care about.
- The requirements are simple and the results are typically simple; getting there is the hard part. The vast majority of the work done by Spark applications is the chunking of data. A very simple application (very few lines) can perform massive amounts of processing. The Spark platform is the ultimate effort in pushing all of the complexity below the development experience.
- Data scientists are the big data analysts. The role of the data scientist is the role that sits between the line-of-business and the developer. Data scientists know the data that is being captured and help translate the question being asked to the Spark developer. As a matter of fact, with tools like the Data Scientist Workbench, we are providing a platform for Data Scientists to learn enough about Spark to do the work themselves.
- The art is in understanding how Spark works and programming the effort accordingly. Understanding how Spark divides up the work, when and where to store intermediate data (if at all), and tuning the program accordingly is where a Spark developer brings the value. Programming a web application can be done with a single-user mentality in mind. Scaling the application can be tackled at other levels of the architecture. As I said before, the requirements for Big Data apps are typically simple and the answer is typically also simple, but the time it takes for the application to get to that answer is all that counts.
As I explore Spark more, I will keep you posted along the way. Let me know your thoughts.
September 23, 2015
We have finally reached the point where large organizations are beginning to adopt Docker as their foundational virtual deployment technology. I am sure this has been happening faster than I realized, but it is now reaching the customers I spend time with. Customers who have years invested in hypervisor-based solutions for their virtual environments are now looking very hard at moving to Docker.
I don’t think anyone can argue that the Docker concepts are anything but simple. It takes no time at all to set up a local Docker environment and begin to play with the technology. Put the technology into a developer’s hands and it quickly becomes a favorite. However, truly creating a Docker infrastructure for an enterprise takes some additional tooling to help manage and control the environment. Managing the Docker infrastructure, as well as managing entities up the stack such as coordinating the containers that make up an application, are needs being filled by various projects out there. The landscape is pretty immature, but sooner or later one or more of these will emerge, get merged with other efforts, or be embraced by large players. This space is the Wild West right now, and there are many, many solutions out there trying to stand out. Here is a laundry list of solutions, some just getting started and others that have been in the oven for a while. Check out the Open Container mind map (https://www.mindmeister.com/389671722/docker-ecosystem) to get an idea of the breadth of the Docker ecosystem.
- OpenStack – the OpenStack ecosystem is now embracing Docker
July 21, 2015
I read a great blog post on Docker over at Valdhaus. You can find it here. All credit goes to them. It highlights some common misconceptions that I want to call out.
1. If I learn Docker, I don’t have to learn the other systems stuff – Boy, is this not true. Docker makes things very simple for the developer, but not for the operations team. Managing large servers that run the Docker runtime, and tuning them to be good Docker hosts, is still a job for the professional mechanic, not Tim the Toolman Taylor. As with any other technology, the “hello world” example is simple and easy, but getting to running production workloads is a big leap. Ensure you have a good operational understanding of Docker before you get there.
2. You should only have one process per Docker container – This is another problem with the “hello world” example. Proper crafting of Docker container content is a big key to success, not only from a runtime perspective but also at the Dockerfile level. Creating proper layers in the Dockerfile is important. Levels of abstraction are usually game changers for any type of architecture, and Docker is no exception. I like the blog’s mindset of treating a Docker container as a “role-based virtual machine.” The single-process mindset leads to the wrong level of abstraction in many cases. I also highly recommend reading the referenced blog “Microservices – Not a free lunch.”
3. If I use Docker, I don’t need an orchestration tool (my edit) – The Valdhaus blog really talks about the need for orchestration and promotes the use of Ansible. Of course, being from IBM, I would encourage UrbanCode Deploy, but the point is well taken. Coordinating the deployment of containers and the associated networking is challenging enough, but with an orchestration tool you can also do some really cool blue/green deployment strategies to achieve seamless zero-downtime production deployments.
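The layering point in misconception 2 is easiest to see in a Dockerfile. This is a rough sketch (the base image, package list, and file names are all hypothetical): layers are ordered from least to most frequently changed, so Docker’s build cache can reuse the stable ones on every rebuild.

```dockerfile
# Base layer: changes rarely, cached across builds
FROM ubuntu:20.04

# OS packages: a layer of their own, rebuilt only when this line changes
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Application dependencies: change more often than the OS layer
COPY requirements.txt /app/
RUN pip3 install -r /app/requirements.txt

# Application code: changes most often, so it goes last
COPY . /app/
WORKDIR /app

# A "role-based virtual machine" may supervise a small set of related
# processes; a single entry point keeps this sketch simple.
CMD ["python3", "app.py"]
```

Getting this ordering wrong (code copied in before dependencies, say) forces every build to redo the expensive layers, which is exactly the craftsmanship point above.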
The rest of the Valdhaus blog is great, but these three points were targeted at where I spend my time. The big benefits of Docker are achievable as long as you spend time ensuring the systems underneath your Docker environment are well maintained. Developers love Docker. Operations teams can also love Docker as long as they understand how to manage it.
June 30, 2015
The move to cloud is, to say the least, a major topic on every CxO’s “keeps me up at night” list. But building a business case to move to the cloud is not very straightforward. There are many factors involved, and the OpEx/CapEx topic itself can be very intense (another reason I did not become an accountant). However, some interesting perspectives are starting to be studied, with some interesting results.
Information Services Group (ISG) has published the results of a study that shows the effect of usage on the overall cost-benefit analysis of cloud. This is a challenging problem to study, as the overall costs of public cloud vendors vary greatly, not to mention the wide array of features included or not included in those costs. That being said, enterprises are intrigued by the “only pay for what you use” concept of public cloud. And the fact that they can offload all of the daily care and feeding of the infrastructure to the cloud vendor makes it that much more enticing. The problem is that comparing a public cloud “pay for what you use” model against internal IT, where usage is not a factor, is not easy.
So the report takes a “standard” infrastructure configuration for an application, typical of what you would use to “test drive” the cloud, and uses ISG’s deep background to estimate the monthly cost it would incur at a typical large, internally managed IT organization. I have to admit this seems a bit arbitrary, and I am not sure the degree of variation here isn’t greater than the variability in cloud costs per vendor. However, a stake needed to be put in the ground as a control point, so I guess it is as good as any.
The conclusion of the study indicated that at 55% usage the costs were at a break-even point between internal IT and public cloud. Obviously there are a tremendous number of factors that can make this break-even point vary, but what I take out of it is that there is a break-even point.
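The study’s actual inputs aren’t public here, but the break-even arithmetic itself is simple to sketch. The prices below are made up for illustration; they happen to land the break-even point near 55%:

```python
# Hypothetical numbers (not from the ISG study): compare a fixed monthly
# internal-IT cost against a pay-per-hour public cloud cost and find the
# utilization level at which the two are equal.

HOURS_PER_MONTH = 730

internal_monthly_cost = 400.0  # fixed: paid whether servers are busy or idle
cloud_hourly_rate = 1.0        # paid only for the hours actually used

def cloud_monthly_cost(utilization):
    """Cloud cost for one month at a given utilization (0.0 - 1.0)."""
    return cloud_hourly_rate * HOURS_PER_MONTH * utilization

# Break-even utilization: where cloud cost equals the fixed internal cost.
break_even = internal_monthly_cost / (cloud_hourly_rate * HOURS_PER_MONTH)
print(f"Break-even utilization: {break_even:.0%}")  # ~55% with these numbers

# Below break-even the cloud is cheaper; above it, internal IT wins.
assert cloud_monthly_cost(break_even * 0.5) < internal_monthly_cost
assert cloud_monthly_cost(min(1.0, break_even * 1.5)) > internal_monthly_cost
```

The interesting part for a real enterprise is that every input here varies by vendor and by workload, which is why the break-even point does too.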
What does that mean for the CxO losing sleep? To me this study again validates the hybrid cloud story. One size does not fit all, and the promise of being able to turn out the lights completely on an internal data center is still unrealistic. It also points me in a direction to pursue. What are the lower-usage scenarios (dev/test, capacity overflow, DR, etc.)? How can an enterprise make use of the price points a public cloud offers for these scenarios in a seamless way? How can I do my own cost comparison and have a dashboard that keeps me on the right side of the break-even point? These are all factors involved in a hybrid cloud pursuit. We now have some interesting data being collected to help us answer those questions.