There are two things that I rank as being the most important resources in virtual infrastructures. Number one is RAM. Most of what takes place in a virtual environment runs in RAM, so having lots of RAM is key to performance. Second is shared storage. One of the biggest benefits of a virtual infrastructure is being able to move a VM live from one host to another. This hinges on the fact that storage is shared between the hosts and that hosts can concurrently access the same LUNs without adversely affecting other hosts. The downside to this is that all of the I/O for the infrastructure is now consolidated into a single SAN instead of being spread across the local disks of each server. This can lead to headaches when disk contention or bandwidth saturation occurs between the hosts and the storage.
Of all the on-site performance assessments I do for clients as The VMware Guy, 80% or more turn up problems directly related to storage. More processor power and RAM can always be thrown at performance issues, but storage is another story. You are still bound by the speed and capacity of your storage network as well as the spindle count of the LUNs being accessed by the hosts. Traditional SANs that do not have virtualized storage are a nightmare to deal with, but even virtualized storage has its issues. Furthermore, the hosts themselves do not manage storage as efficiently as they could. Everything runs in RAM and is written to disk as needed; there are no optimizations to queue data locally and keep the storage network from saturating.
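To see why spindle count matters so much, a quick back-of-envelope calculation helps. The sketch below uses the standard rule of thumb for estimating a LUN's front-end IOPS ceiling from its spindle count, a per-spindle IOPS figure, and the RAID write penalty; the specific numbers (175 IOPS per 15K drive, a write penalty of 4 for RAID-5) are my assumptions for illustration, not measurements from any particular array.

```python
def lun_iops_ceiling(spindles, iops_per_spindle, read_fraction, raid_write_penalty):
    """Rough front-end IOPS a LUN can sustain before latency climbs.

    The back-end budget is spindles * iops_per_spindle; each front-end
    write costs raid_write_penalty back-end I/Os (e.g. 4 for RAID-5,
    2 for RAID-10), so the blended cost per front-end I/O is
    read_fraction + (1 - read_fraction) * raid_write_penalty.
    """
    write_fraction = 1.0 - read_fraction
    backend_cost_per_frontend_io = read_fraction + write_fraction * raid_write_penalty
    return (spindles * iops_per_spindle) / backend_cost_per_frontend_io

# Example: 8 x 15K drives (~175 IOPS each), 70% reads, RAID-5
print(round(lun_iops_ceiling(8, 175, 0.70, 4)))   # ~737 IOPS
# Same spindles on RAID-10 (write penalty of 2)
print(round(lun_iops_ceiling(8, 175, 0.70, 2)))   # ~1077 IOPS
```

The point of the exercise: once a few dozen VMs share that LUN, 737 IOPS disappears fast, and no amount of host-side RAM or CPU will buy it back.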
The performance monitoring tools from VMware do not interact directly with the storage platform, so there is no way to get an in-depth, end-to-end view of what is happening with your storage. The VMware side only tells you about the virtual disk performance seen by the host. While this information can point you in the right direction for locating a storage contention issue, it does not correlate symptoms with causes, such as high virtual disk latency stemming from limited IOPS on a single LUN.
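That correlation is exactly the step an admin ends up doing by hand today: line up the host-side latency counters against the array-side LUN load and see where they spike together. A minimal sketch of that idea, using made-up sample data and thresholds (none of this comes from an actual vSphere or array API):

```python
def correlate_latency_with_lun_load(samples, latency_ms_threshold, lun_iops_ceiling):
    """Flag intervals where high guest disk latency coincides with a LUN
    running near its IOPS ceiling -- the cross-domain correlation that
    the host-side counters alone will not draw for you.

    samples: list of (timestamp, vm_latency_ms, lun_iops) tuples,
             host latency and array LUN load joined on time.
    """
    flagged = []
    for ts, latency_ms, lun_iops in samples:
        # High latency alone is ambiguous; high latency while the LUN is
        # at >=90% of its ceiling points at storage contention.
        if latency_ms > latency_ms_threshold and lun_iops >= 0.9 * lun_iops_ceiling:
            flagged.append(ts)
    return flagged

samples = [
    ("10:00", 4.0, 500),    # healthy
    ("10:05", 38.0, 1450),  # latency spike while the LUN is saturated
    ("10:10", 35.0, 300),   # latency spike with an idle LUN: look elsewhere
]
print(correlate_latency_with_lun_load(samples, 20.0, 1500))  # ['10:05']
```

Note the third sample: high latency with an idle LUN is precisely the case where the VMware counters alone would send you chasing storage when the problem is somewhere else.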
There are several ways to approach this problem. One is to optimize how a host deals with shared storage by reducing the overall storage space it needs. Another is to orchestrate the flow of data from all hosts as if they were a single host, which can help ease saturation on the storage network. Products such as Virsto try to solve these issues at the host level. Many other vendors are jumping in to solve the storage management conundrum as well. Akorri offers the BalancePoint virtual appliance, which aims to manage the mapping of virtual servers to physical storage and can notify IT of potential problems with capacity, I/O, or failures.
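The mapping-and-alerting idea is simple to illustrate. The toy sketch below keeps a map of which VMDKs sit on which LUN and raises a capacity alert when provisioned space crosses a threshold; products like BalancePoint do far richer analytics than this (the names, threshold, and data layout here are all my own invention for illustration).

```python
def capacity_alerts(luns, threshold=0.85):
    """Toy capacity check over a VM-to-LUN mapping: warn when the VMDKs
    provisioned on a LUN approach its usable capacity.

    luns: dict of lun_name -> (capacity_gb, [(vm_name, vmdk_gb), ...])
    Returns a list of (lun_name, used_gb, capacity_gb) tuples.
    """
    alerts = []
    for lun, (capacity_gb, vmdks) in luns.items():
        used_gb = sum(size for _, size in vmdks)
        if used_gb / capacity_gb >= threshold:
            alerts.append((lun, used_gb, capacity_gb))
    return alerts

luns = {
    "LUN01": (500, [("web01", 120), ("db01", 350)]),  # 94% full -- alert
    "LUN02": (500, [("app01", 100)]),                 # 20% full -- fine
}
print(capacity_alerts(luns))  # [('LUN01', 470, 500)]
```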
I believe the solutions that can deliver the most impact will have to come from the storage vendors themselves. They will have to develop software (similar to the Storage Replication Adapters available for SRM) that hooks into the vSphere API and allows real-time analysis of storage performance in relation to activity in the virtual infrastructure. This two-way communication will need to allow vSphere to manage I/O at the infrastructure level, evening out the load over time, while the storage platform dynamically increases spindle count and balances storage across tiers. This two-way balance is the only way to properly manage storage in a virtual environment. Deduplication will also be key, both in reducing the amount of storage a virtual environment consumes and in increasing the internal performance of the SAN.
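Deduplication pays off so well in virtual environments because dozens of VMDKs carry near-identical copies of the same guest OS. A minimal fixed-block sketch makes the mechanism concrete: hash every block across all disks and store one copy per unique hash. (Real SAN dedup engines use variable-length chunking and compression; this only illustrates the idea, with two tiny synthetic "VMDKs".)

```python
import hashlib

def dedup_savings(blobs, block_size=4096):
    """Fixed-block deduplication sketch: hash each block of each virtual
    disk and keep one physical copy per unique hash.

    Returns (total_blocks, unique_blocks); the ratio between the two is
    the dedup savings.
    """
    seen = set()
    total_blocks = 0
    for blob in blobs:
        for off in range(0, len(blob), block_size):
            block = blob[off:off + block_size]
            seen.add(hashlib.sha256(block).hexdigest())
            total_blocks += 1
    return total_blocks, len(seen)

# Two "VMDKs" sharing a common 4-block OS image plus one unique block each
os_image = b"".join(bytes([i]) * 4096 for i in range(4))
vm1 = os_image + bytes([100]) * 4096
vm2 = os_image + bytes([200]) * 4096

total, unique = dedup_savings([vm1, vm2])
print(total, unique)  # 10 blocks provisioned, only 6 stored
```

Fewer physical blocks means fewer back-end I/Os and a warmer array cache, which is why dedup helps SAN performance internally, not just capacity.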