In theory, computing capacity should be like water: if you need more, just leave the tap running a little longer. All manner of virtualization, grid, and attached processing technologies are being developed and deployed in an effort to make this theory a reality. It would therefore be easy to assume that capacity management is a non-issue for SOA: just deploy the services and SOA infrastructure on your tap-water computing power, add a few automation policies, and capacity management ceases to be a concern.
What a lovely fantasy.
The real world, unfortunately, does not work that way. The combination of great software flexibility (SOA) and great capacity flexibility (tap-water computing power) does not eliminate the work of capacity management and planning. It transforms the job description in two ways: it makes advanced analysis of capacity usage a near-real-time activity, and it requires complex systems analysis for longer-term capacity forecasting.
The need for capacity management as a near-real-time activity is growing as enterprises demand consistently high-performing applications and services. Performance problems caused by capacity bottlenecks are no longer tolerated. Unfortunately, making the capacity provisioning process easy and efficient can lead to a cycle of continually assigning more resources to a service, which provides temporary relief but never addresses (or resolves) the true root cause. In the worst-case scenarios, well-meaning operations staff add capacity to the wrong service component, or a runaway capacity allocation system over-provisions a particular service. The result is the opposite of what enterprises were looking for from their tap-water computing solutions.
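To make the runaway cycle concrete, here is a minimal sketch (in Python, with purely hypothetical numbers) of a naive threshold-based provisioning policy. Because adding a server temporarily lowers measured utilization without touching the root cause, the loop keeps allocating capacity every time the real problem resurfaces:

```python
# Naive threshold-based provisioning: any utilization spike above the
# threshold triggers another server, whether or not capacity is the
# real problem.
THRESHOLD = 0.75

def provision_step(demand: float, servers: int) -> int:
    """Add a server whenever per-server utilization exceeds the threshold."""
    if demand / servers > THRESHOLD:
        servers += 1  # temporary relief; the root cause is untouched
    return servers

# A misbehaving service (say, a slow memory leak driving CPU demand up)
# defeats the policy: demand grows no matter how many servers are added.
servers, demand = 1, 0.6
for hour in range(12):
    demand *= 1.2  # root-cause growth, independent of capacity
    servers = provision_step(demand, servers)
    print(f"hour {hour:2d}: demand={demand:5.2f} servers={servers}")
```

Run it and the cluster grows from one server to seven in twelve hours while per-server utilization keeps bouncing back toward the trigger point: capacity is being spent, but nothing is being fixed.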
To avoid this scenario, IT operations needs real-time what-if capacity analysis allows application managers to preview how the planned resolution will impact not only the performance of the problematic transaction but the impact on all other related service workloads. This is significant because individual services and the SOA infrastructure are supporting multiple business applications, transactions and processes. Implementing a resolution to a problem with one application may negatively impact the performance of other important processes. Without getting a preview of the performance impact on other workloads, it could be that IT will be shooting itself in the foot in their attempt to solve a problem.
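As a rough illustration of the idea (a back-of-the-envelope sketch, not any vendor's method), the following Python fragment uses the textbook M/M/1 approximation, response time ≈ service time / (1 − utilization), to preview how shifting capacity from one shared service to another can trade one problem for a new one. All service names and numbers are hypothetical:

```python
def response_time(rate: float, service_time: float, capacity: float) -> float:
    """M/M/1-style approximation: R = S / (1 - U), where U = rate * S / capacity."""
    utilization = rate * service_time / capacity
    if utilization >= 1.0:
        return float("inf")  # saturated: the queue grows without bound
    return service_time / (1.0 - utilization)

# Two services sharing a fixed pool of capacity units (hypothetical numbers).
# 'orders' is the problem transaction; 'billing' is a related workload.
workloads = {"orders": (180.0, 0.010), "billing": (150.0, 0.012)}  # (req/s, s)
scenarios = {
    "current":  {"orders": 1.0, "billing": 3.0},
    "proposed": {"orders": 2.0, "billing": 2.0},  # shift one unit to orders
}

for name, alloc in scenarios.items():
    print(name)
    for svc, (rate, s_time) in workloads.items():
        r = response_time(rate, s_time, alloc[svc])
        print(f"  {svc:8s} R = {r * 1000:6.1f} ms")
```

The proposed change rescues the problem transaction from saturation, but it also quadruples the response time of the neighboring workload (30 ms to 120 ms), exactly the kind of chain reaction a preview is meant to catch.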
Yes, there are several workload and performance modeling solutions on the market that can perform what-if analysis. However, they are most often designed for off-line use, such as capacity planning or detailed performance optimization of a particular technology. That capability must be brought into the fast-paced world of operational problem resolution, not only to speed the resolution of existing problems but also to prevent chain-reaction problems. Problem prevention is something harried operations staff can always use more of and will happily embrace!
The other benefit of real-time analysis is that it can provide feedback to the development organization. End-to-end transaction or process problems are not easily recreated in lab environments, and developers and quality assurance engineers need all the information they can get to ensure that their creative genius (aka the reusable service they released into the wilds of the datacenter) is fully optimized and appreciated.
Now let’s take a look at longer-range capacity forecasting. Business executives take one look at their datacenter resource utilization percentages (20%-40%) and are appalled. Out comes the axe, and IT Operations’ head is always first on the chopping block. What has this got to do with SOA? Plenty, because SOA compartmentalizes the development and sizing of software services.
When most developers think about capacity sizing for a software service, the thought process runs: normal load will be about 60%, heavy load about 80%, which leaves 20% as a safety net. This makes a lot of sense, until you deploy the service stack in a clustered environment. Even a seemingly reasonable provisioning automation policy, such as adding a new server to the cluster when usage reaches 75%, can get IT operations into hot water. By the time the third system joins the cluster, the aggregate resource usage under heavy load is only about 53% ((75 + 75 + 10) / 300): two servers at the 75% trigger point plus a freshly added, nearly idle one. It looks as if half of your resources are lying around doing nothing. Those aggregate numbers drive infrastructure VPs and CFOs crazy (combining crazy people and axes is not a good situation for anyone).
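A quick sketch makes the arithmetic visible. Assuming, hypothetically, that incoming load is rebalanced evenly after each scale-out, aggregate utilization sags every time the 75% policy fires:

```python
# Aggregate cluster utilization under a "scale out at 75%" policy,
# assuming load is rebalanced evenly across servers after each addition.
TRIGGER = 0.75

servers, total_load = 1, 0.70  # each server offers 1.0 unit of capacity
for step in range(6):
    total_load += 0.40                         # demand keeps climbing
    while total_load / servers > TRIGGER:
        servers += 1                           # policy fires
    print(f"load={total_load:4.2f} servers={servers} "
          f"aggregate={total_load / servers:.0%}")
```

The aggregate number bounces between roughly 55% and 75%, well below the 80% heavy-load figure the developer sized for, even though every individual scale-out decision was perfectly sensible.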
There is also the shared aspect of SOA services. We are already seeing instances where a service developed for a departmental-level process gets reused for an enterprise-level process and instantly sees its workload increase by orders of magnitude. My last article discussed how this affects performance management, but it will also throw all the historical capacity trending analysis out the window (and with it goes your infrastructure budget).
Basically, there is a disconnect between forecasting for a single component and forecasting for the aggregate behavior of all these loosely coupled components.
What can we do about this? Capacity forecasting in an SOA world needs to incorporate a good understanding of service relationships. An understanding of the existing relationships between individual services is required to get a complete picture of what is happening in the datacenter as a whole. Additionally, architectural plans for reusing existing services in future projects must be included to understand how new projects will affect overall datacenter utilization.
This is so easy to say and so difficult to do.
The information is housed in many different places: project and portfolio management solutions, application performance management systems, configuration management databases, and provisioning systems. The planning solution must then model all of this information against a variety of workload growth scenarios to see how datacenter needs will change over time. The model must be rich enough to capture the reality of an SOA datacenter, yet simple enough to be used by the mere humans working in corporate IT departments. Additionally, it must generate reports that can be easily consumed by a variety of enterprise staff: corporate executives, enterprise architects, and infrastructure managers.
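To give a flavor of what relationship-aware forecasting might look like (a minimal sketch with entirely hypothetical services and numbers, not a description of any product), the fragment below pushes one growth scenario through a small service dependency map and reports which shared services run out of headroom:

```python
# Propagate a workload growth scenario through service relationships.
# calls_per_request: how many calls each business process makes to each
# shared service per request (all numbers hypothetical).
calls_per_request = {
    "order_entry":  {"auth": 1, "inventory": 2, "billing": 1},
    "self_service": {"auth": 1, "billing": 1},
}
current_rate = {"order_entry": 50.0, "self_service": 20.0}  # requests/s
growth = {"order_entry": 1.5, "self_service": 8.0}  # scenario: self_service
                                                    # is reused enterprise-wide
capacity = {"auth": 300.0, "inventory": 250.0, "billing": 150.0}  # calls/s

demand = {}
for process, fanout in calls_per_request.items():
    future_rate = current_rate[process] * growth[process]
    for service, calls in fanout.items():
        demand[service] = demand.get(service, 0.0) + future_rate * calls

for service, load in sorted(demand.items()):
    status = "OVER" if load > capacity[service] else "ok"
    print(f"{service:9s} demand={load:6.1f}/s "
          f"capacity={capacity[service]:6.1f}/s {status}")
```

Even this toy model makes the point: the departmental process being scaled up looks fine in isolation, but a shared downstream service it reuses (billing, in this sketch) is the one that actually runs out of capacity. That is precisely the insight single-component forecasting can never deliver.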
It seems to me that, far from killing capacity planning, SOA demands more sophisticated types of capacity analysis, used by a much broader audience.