For some time, Enterprise Storage vendors have touted Solid State Devices (SSDs) as a true game changer. They talk about nearly infinite performance and far lower energy use than spinning disk drives. They promote dynamic tiering software as the best way to ensure that workloads are optimized for SSDs. But will slapping a few SSDs into a storage frame really turn a Mack truck into a Ferrari? And are SSDs truly the land of milk and honey they are portrayed to be? While there is a lot of truth to the claims, there are many things you are not hearing about SSDs. Here are some points that storage vendors should be telling you but rarely mention.
1. The code that manages SSDs is just as complicated as what runs our storage arrays
Under the covers, an SSD is a combination of flash memory and a storage controller. The controller has sophisticated algorithms to manage the flash. For example, flash can be read and written a page at a time, but it can only be erased a whole block at a time, and a page must be erased before it can be written again. To accommodate this, an SSD uses a log structured array: a rewrite goes to a fresh page, and the old copy is simply marked invalid. If you have been around for a while, recall how the STK Iceberg array worked. If you are newer to storage, the Network Appliance WAFL is fundamentally based on a log structured array design. A log structured array needs active “garbage collection” to reclaim the space that rewrites have invalidated. Needless to say, this introduces quite a bit of complexity.
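To make the log structured idea concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions, not how any particular SSD controller works: the class name `ToyFlash`, the tiny geometry, and the "fewest valid pages" victim policy are all illustrative choices, and surviving pages are buffered in memory during garbage collection where a real device would need spare blocks.

```python
import random


class ToyFlash:
    """Toy log-structured flash: page-level writes, block-level erases."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # Each block is an append-only list of slots holding a logical
        # page number, or None once that copy has been invalidated.
        self.blocks = [[] for _ in range(num_blocks)]
        self.mapping = {}        # logical page -> (block, slot) of live copy
        self.active = 0          # block currently receiving new writes
        self.host_writes = 0     # writes requested by the host
        self.flash_writes = 0    # pages actually programmed, incl. GC copies
        self.erases = 0

    def _program(self, lpn):
        block = self.blocks[self.active]
        block.append(lpn)
        self.mapping[lpn] = (self.active, len(block) - 1)
        self.flash_writes += 1

    def _garbage_collect(self):
        # Victim = block with the fewest still-valid pages.
        def valid(b):
            return sum(p is not None for p in self.blocks[b])

        victim = min(range(len(self.blocks)), key=valid)
        survivors = [p for p in self.blocks[victim] if p is not None]
        if len(survivors) == self.pages_per_block:
            raise RuntimeError("flash is full of valid data")
        self.blocks[victim] = []          # whole-block erase
        self.erases += 1
        self.active = victim
        for lpn in survivors:             # relocating live data costs writes
            self._program(lpn)

    def write(self, lpn):
        self.host_writes += 1
        if lpn in self.mapping:           # a rewrite never updates in place
            b, s = self.mapping[lpn]
            self.blocks[b][s] = None      # the old copy becomes garbage
        if len(self.blocks[self.active]) == self.pages_per_block:
            free = [i for i, blk in enumerate(self.blocks) if not blk]
            if free:
                self.active = free[0]
            else:
                self._garbage_collect()
        self._program(lpn)


random.seed(1)
flash = ToyFlash()
for _ in range(200):
    flash.write(random.randrange(8))      # 8 logical pages on 16 physical
print("write amplification:", flash.flash_writes / flash.host_writes)
```

The ratio of flash writes to host writes is the extra work garbage collection adds. In this toy it depends heavily on the write pattern and on how much spare space the device has.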
2. Our storage array algorithms were designed for hard disk drives
For the most part, the microcode used by all of the high end Enterprise Storage arrays was developed for hard disk drives. For example, when a read miss occurs, a storage controller will typically do full track staging even if only a 4 KB page was requested. This made great sense for hard disk drives: once the heads are positioned, reading the rest of the track costs little extra, and other pages on the track may produce subsequent cache hits. It does not make sense for SSDs, where reading a track costs about as much as reading the same capacity through individual page requests, so staging data that is never used is wasted work. Another example is a point in time copy from an SSD to an HDD RAID array. Since the SSD can run many more I/Os per second than the HDD, the point in time copy may back up because of HDD contention and cause performance delays on the host volumes allocated to the SSD.
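The staging argument can be checked with back-of-the-envelope arithmetic. The sketch below is a deliberately crude model; every number in it (seek time, spindle speed, physical track size, flash page read time, a 64 KiB staging unit) is an assumption chosen for illustration, and it ignores the internal parallelism of a real SSD.

```python
def hdd_read_ms(kib, seek_ms=4.0, rpm=15000, physical_track_kib=512):
    """Rough HDD service time: seek + average rotational delay + transfer."""
    rotation_ms = 60_000 / rpm
    latency_ms = rotation_ms / 2
    transfer_ms = rotation_ms * kib / physical_track_kib
    return seek_ms + latency_ms + transfer_ms


def ssd_read_ms(kib, page_kib=4, page_ms=0.1):
    """Rough SSD service time: one fixed cost per flash page, read serially."""
    pages = -(-kib // page_kib)           # ceiling division
    return pages * page_ms


stage_kib = 64                            # assumed staging unit ("track")
hdd_penalty = hdd_read_ms(stage_kib) / hdd_read_ms(4)
ssd_penalty = ssd_read_ms(stage_kib) / ssd_read_ms(4)
print(f"HDD: staging the track costs {hdd_penalty:.2f}x a 4 KiB read")
print(f"SSD: staging the track costs {ssd_penalty:.2f}x a 4 KiB read")
```

Under these assumptions the HDD pays only a few percent more to stage the whole track, because positioning dominates, while the SSD's cost grows in step with the amount of data read.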
3. Guess what, our storage array hardware was designed for hard disk drives too
A single SSD may be capable of servicing tens of thousands of I/Os per second. As a result, the device adapters in a storage system may be fully saturated with only a few SSDs installed. In theory, it does not take many terabytes of SSDs to uncover bottlenecks in storage systems that perform very well with hundreds of terabytes of hard disk drives.
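A quick calculation shows how lopsided this can be. The figures below are hypothetical round numbers, not measurements of any adapter or drive; substitute the ratings of your own hardware.

```python
import math


def devices_to_saturate(adapter_iops, device_iops):
    """Smallest number of devices whose combined IOPS reaches the adapter limit."""
    return math.ceil(adapter_iops / device_iops)


ADAPTER_IOPS = 100_000   # assumed device adapter limit
SSD_IOPS = 30_000        # assumed per-SSD capability
HDD_IOPS = 200           # assumed per-HDD capability

print("SSDs needed to saturate the adapter:", devices_to_saturate(ADAPTER_IOPS, SSD_IOPS))
print("HDDs needed to saturate the adapter:", devices_to_saturate(ADAPTER_IOPS, HDD_IOPS))
```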
4. SSD internal algorithms may have an intermittent effect on performance
Besides managing the log structured array, the SSD controller also implements algorithms to maintain data integrity. For example, wear leveling avoids overuse of certain data blocks, which could cause flash cells to wear out prematurely. If cells do wear out, they must be removed from the available free space pool. Repeated reads can also gradually disturb the voltage of nearby cells that have not been rewritten for a long time. To avoid errors and data loss from this read disturb phenomenon, the controller periodically forces rewrites of static data. Most SSDs incorporate RAID under the covers to provide redundancy in the event of cell failures. Since all of these algorithms require extra I/O operations on the flash, they can affect SSD performance when they kick in.
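As an illustration of wear leveling and block retirement, here is a minimal sketch. The `WearLeveler` class, the "least-worn free block first" policy, and the `endurance` limit are assumptions made for the example; real controllers use more elaborate, vendor-specific schemes.

```python
import heapq


class WearLeveler:
    """Hands out the least-worn free block and retires worn-out ones."""

    def __init__(self, num_blocks, endurance=3000):
        self.endurance = endurance              # assumed erase-cycle limit
        self.erase_count = [0] * num_blocks
        self.retired = set()
        self.free = [(0, b) for b in range(num_blocks)]  # (erases, block)
        heapq.heapify(self.free)

    def allocate(self):
        if not self.free:
            raise RuntimeError("no usable free blocks left")
        _, block = heapq.heappop(self.free)     # lowest erase count first
        return block

    def release(self, block):
        """Erase a block and return it to the pool, unless it is worn out."""
        self.erase_count[block] += 1
        if self.erase_count[block] >= self.endurance:
            self.retired.add(block)             # shrink the free pool
        else:
            heapq.heappush(self.free, (self.erase_count[block], block))


leveler = WearLeveler(num_blocks=8)
for _ in range(1000):
    leveler.release(leveler.allocate())
print("erase counts:", leveler.erase_count)
```

Because the allocator always picks the least-erased free block, erases spread evenly instead of hammering a few blocks. The bookkeeping is extra work that the host never sees directly.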
5. SSD performance may not be stable until the drive has been in use for a while
When you take an SSD off the shelf, the performance will look outstanding until you have written the total capacity of all the flash memory within the SSD. Note that SSDs are over-provisioned, so this total is usually much greater than the rated capacity (up to 2X). It is important to condition the SSD by writing the full capacity, including the over-provisioned space, at least once before conducting any performance tests. This ensures that the internal wear leveling and garbage collection algorithms kick in. And don’t just write all zeros, since some controllers detect or compress such patterns and may write no actual data to the flash cells.
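A conditioning pass can be sketched in a few lines. The function below is a hypothetical helper, not a vendor tool: `precondition`, its parameters, and the default of two passes (to cover over-provisioning of up to 2X) are assumptions for illustration. It overwrites its target, so the demo runs against a scratch file rather than a real device.

```python
import os
import tempfile


def precondition(path, capacity_bytes, passes=2, chunk_bytes=1 << 20):
    """Overwrite `path` with random data `passes` times.

    WARNING: destroys whatever is stored at `path`.
    """
    written = 0
    with open(path, "r+b") as dev:
        for _ in range(passes):
            dev.seek(0)
            remaining = capacity_bytes
            while remaining > 0:
                n = min(chunk_bytes, remaining)
                dev.write(os.urandom(n))   # random data rather than zeros
                remaining -= n
                written += n
            dev.flush()
            os.fsync(dev.fileno())         # push the pass out of OS caches
    return written


# Demo against a scratch file standing in for the device under test.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(bytes(4096))
total = precondition(scratch.name, capacity_bytes=4096, passes=2)
print("bytes written:", total)
os.remove(scratch.name)
```

For a real benchmark you would point a purpose-built I/O tool at the raw device and use the same random pattern and block sizes you plan to test with. The sketch only shows the principle of filling the device more than once with incompressible data.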
6. Dynamic tiering puts the right workload on your SSDs only if history repeats itself
Dynamic tiering is outstanding technology based on a simple premise: use workload patterns to decide which data belongs on which tier of storage. All forms of dynamic tiering will attempt to put data that has many small block read misses onto an SSD while keeping data that is mostly accessed sequentially on HDDs. But these decisions are based on past history. If workload patterns change, the dynamic part of dynamic tiering will kick in and move data as appropriate. However, if your workload patterns are not very predictable, dynamic tiering cannot stay ahead of the curve, and you may just wind up with a lot of churn and little performance benefit.
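The dependence on history is easy to see in a toy model. The sketch below is not any vendor's tiering algorithm; `plan_tiers`, `ssd_hit_fraction`, and the extent names are hypothetical, and heat is reduced to a simple count of past read misses per extent.

```python
from collections import Counter


def plan_tiers(read_miss_history, ssd_extents):
    """Pick the extents with the most past read misses for the SSD tier."""
    heat = Counter(read_miss_history)
    return {extent for extent, _ in heat.most_common(ssd_extents)}


def ssd_hit_fraction(placement, workload):
    """Fraction of read misses in `workload` that land on SSD-resident extents."""
    return sum(extent in placement for extent in workload) / len(workload)


# Yesterday's read misses, by extent: "a" and "b" were hot.
yesterday = ["a"] * 50 + ["b"] * 30 + ["c"] * 5
placement = plan_tiers(yesterday, ssd_extents=2)

# If today looks like yesterday, the placement pays off.
print("same pattern:   ", ssd_hit_fraction(placement, yesterday))

# If the hot set moves, the SSD tier holds the wrong data until the next move.
today = ["c"] * 40 + ["d"] * 40
print("shifted pattern:", ssd_hit_fraction(placement, today))
```

The placement is only as good as the assumption that tomorrow's hot extents match yesterday's. When the hot set keeps moving, each replan triggers data movement without ever catching up.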
7. Quality is critical and we depend on our SSD vendors to provide it
All of the enterprise storage vendors source their SSDs from either an outside company or, in the case of Hitachi, a different division. That means the people developing the storage systems don’t collaborate very closely with the SSD developers. Of course, the same could be said for HDDs, but they have been around much longer and their reliability characteristics are much better known. With SSDs, the storage system vendors have no choice but to rely on the SSD vendors to guarantee their quality. And if quality problems do arise, you can be certain that the SSD engineers will be called in to investigate. The storage vendors are then in the unenviable position of mediating between an end user and a supplier, which means that the ultimate resolution could take a long time.
Go into SSD Deployment with your Eyes Open
Of course SSDs have many outstanding attributes and often deliver order of magnitude performance improvements compared to traditional hard disk drives. They can be a real difference maker in modern Enterprise Storage systems. But you need to go into any SSD deployment with your eyes open.
IntelliMagic Vision provides you with the tools to keep a careful eye on performance of your Enterprise Storage arrays including any SSDs that may be installed. If you are embarking on an SSD project or already have one going, consider IntelliMagic Vision as a key piece of the performance management puzzle.