Updating the firmware on a Micron P420m

Update: Micron released B218 in July 2015 to resolve a critical issue with command timeouts. Micron highly recommends upgrading to this firmware release.

The Micron P420m is a common PCIe flash card in PernixData customer environments. It’s a very high-performing, cost-effective card with a variety of applications.

Like all hardware devices, the P420m has firmware that occasionally needs to be updated. To perform the update, we’re going to download the Micron rssdm utility (packaged with the ESXi drivers) from Micron’s site, where it ships in the Support Pack for Linux and VMware. As of January 2015, support pack B145.03 from September 2014 is still current.

To determine which firmware version the card is running, we first need to install rssdm. Put the host into maintenance mode, copy the VIB for your version of ESXi to the host, run the esxcli software vib install -v command, and reboot the host.
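
For those who prefer to see the install steps as commands, here is a rough sketch of what that looks like from the ESXi shell. The VIB path is a placeholder I made up (use the actual file from the support pack), and the DRY_RUN switch and run helper are my own additions so the script only prints the commands unless you explicitly turn that off.

```shell
#!/bin/sh
# Sketch only. VIB_PATH is a hypothetical placeholder, not the real file name
# from the Micron support pack. esxcli expects an absolute path (or URL) for -v.
VIB_PATH="${VIB_PATH:-/tmp/micron-rssdm.vib}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_rssdm() {
    run esxcli system maintenanceMode set --enable true
    run esxcli software vib install -v "$VIB_PATH"
    run reboot
}

install_rssdm
```

Running it as-is just echoes the three commands so you can review them first; set DRY_RUN=0 on the host once the path is correct.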

Once the host is back up, log in and execute the /opt/micron/bin/rssdm -L command to see the firmware version of the card.

[Screenshot: rssdm -L output]

As you can see, my card is running firmware version B2100600 and needs to be updated. At the time of writing, the current firmware version is B2120500. We’re going to copy the new firmware to the host or a shared datastore and perform the upgrade.
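
If you have more than a couple of hosts to check, eyeballing version strings gets old. Below is a small sketch of a comparison helper. It assumes that firmware strings like the two above have the same length and sort in release order when compared as plain text; that is my assumption, not something Micron documents here. The needs_update function name is made up for this example, and you would pass in the version you read from the rssdm -L output by hand.

```shell
#!/bin/sh
# Returns 0 (true) when the card's firmware sorts before the target version.
# Assumption: version strings are equal length and sort lexicographically
# in release order (e.g. B2100600 comes before B2120500).
needs_update() {
    current="$1"
    target="$2"
    if [ "$current" = "$target" ]; then
        return 1
    fi
    # The string that sorts first is treated as the older firmware.
    oldest="$(printf '%s\n%s\n' "$current" "$target" | LC_ALL=C sort | head -n 1)"
    [ "$oldest" = "$current" ]
}

if needs_update "B2100600" "B2120500"; then
    echo "update needed"
else
    echo "firmware is current"
fi
```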

With the host in maintenance mode and the device removed from the FVP Cluster, copy the B145.03.00.ubi firmware image downloaded from the Micron Support Pack above to a location accessible by the host. The B145.03.00.ubi file will be in the Unified Image folder.

Then execute /opt/micron/bin/rssdm -T /path/to/file/B145.03.00.ubi -n 0

[Screenshot: firmware update output]

Once it’s complete, reboot the host.

When the host is back up, verify the new firmware is active.

[Screenshot: rssdm -L output showing the new firmware]

Secure erase the drive by executing: /opt/micron/bin/rssdm -X -n 0 -p ffff

[Screenshot: secure erase output]

Note: For those interested, -X performs the secure erase, -n specifies the drive ID, and -p supplies the password (default is ffff).

After the secure erase is complete, take the host out of maintenance mode and add the device back to the FVP Cluster.
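
To recap the steps that follow the post-upgrade reboot, here is a sketch that strings them together. The rssdm commands are the ones used above; the DRY_RUN switch, the run helper, and the DRIVE_ID and ERASE_PASSWORD variables are my own additions for illustration. Secure erase is destructive, so the script only prints the commands by default. Adding the device back to the FVP Cluster is not scripted here.

```shell
#!/bin/sh
# Sketch only: prints the post-reboot steps unless DRY_RUN=0 is set.
RSSDM="/opt/micron/bin/rssdm"
DRIVE_ID="${DRIVE_ID:-0}"                 # drive ID passed to -n
ERASE_PASSWORD="${ERASE_PASSWORD:-ffff}"  # default password from the note above
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

post_update() {
    # 1. Check the firmware version reported by the card.
    run "$RSSDM" -L
    # 2. Secure erase the drive (destroys all data on it).
    run "$RSSDM" -X -n "$DRIVE_ID" -p "$ERASE_PASSWORD"
    # 3. Take the host out of maintenance mode.
    run esxcli system maintenanceMode set --enable false
}

post_update
```

In practice you would stop after the first step and confirm the new version by eye before letting the erase run.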

Use Case for Invalidating PernixData Read Cache

When sizing an acceleration resource for PernixData FVP, we look to recommend a resource that can contain the working set of data for the VMs using that resource. The majority of the time this isn’t a problem. However, we recently came across a use case where the working set of the VM was so large that the VM used the whole flash device and began garbage collection after just 3 days. Let’s take a closer look at what’s going on and how we found a feasible solution for the customer.
The workload is a batch processing application that runs once a day and churns through a lot of data during its processing time. Before implementing FVP, this was taking approximately 2.25 hours. The first day that it ran after installing FVP, the processing time was reduced a noticeable amount, as FVP accelerates every write operation in write back but caches blocks for read acceleration only as the VM requests them. By the 2nd or 3rd day, batch processing time was reduced by nearly 50% and the customer was thrilled. But on the 4th day, processing time was creeping back up to around 2.25 hours. It turned out that even with an 800 GB SSD, the drive filled up and was hit by frequent write amplification and garbage collection as it made room for newly read blocks and serviced the writes.
So what about stringing some SSDs together in RAID0?
This is a very common question that we get so let’s investigate SSD and RAID a bit.
First, it’s important to understand that SSDs don’t fail like HDDs. As SSDs near failure, they might continue to service IO, but at an extremely high and unpredictable latency, whereas a healthy SSD typically delivers consistent performance. In turn, a single failing SSD will drag down the performance of the whole RAID array, because the array can only perform at the latency and throughput of its slowest SSD.
As a side note, FVP has adaptive resource management. As soon as FVP determines that going to a given flash device is slower than simply using the backend SAN, it will deactivate that flash device from the FVP cluster. This technology is built from the ground up to deal with the failure characteristics of SSDs (i.e., good SSD vs. slow SSD vs. failed SSD). RAID was never built to make a distinction between a slow SSD and a failed SSD. Being highly available with an impractical-to-use flash device is pointless; that is actually worse than being unavailable.
The customer is also using blade servers, so a larger-capacity PCIe card isn’t an option, and they already have the largest capacity available in SSD format. The customer still needed a solution to consistently achieve the reduction in batch processing time.
What can we do?
PernixData FVP ships with some very robust PowerShell cmdlets that allow us to manage the environment and automate all the things. We first considered using PowerShell to remove the VM from the FVP cluster every few days to force the cache to drop, but that would also lose all the VM performance graphs. Instead, our solution was to use PowerShell to blacklist the VM every 3 days, which invalidates its cache, and then add it back in write back. The customer is happy with this solution because even on the first batch run with FVP, run time was still better than going straight to the array.
What does this look like?
#Setup some variables
$fvp_server = "fvpservername"
$fvp_username = "domain\username"
#TODO: This isn't very scalable for large environment. I know there's a better way.
$vms = "vm01", "vm02"
#A file we're going to use to store the password in an encrypted format
$passwordfile = "c:\path\fvp_enc.txt"

import-module prnxcli

# Authentication and connection strings removed for brevity. 

#Loop through the list of VMs: blacklist each one, wait 10 seconds, then add it back to the cluster in WB
foreach ($vm in $vms)
{
     write-host "Blacklisting $vm to invalidate cache"
     Set-PrnxProperty $vm -Name cachePolicy -value 1
     Start-Sleep -seconds 10
     write-host "Adding $vm back to FVP cluster in write back"
     Set-PrnxProperty $vm -Name cachePolicy -value 3
     Set-PrnxAccelerationPolicy -WaitTimeSeconds 30 -name $vm -WB -NumWBPeers 1 -NumWBExternalPeers 0
}

Disconnect-PrnxServer

This scenario isn’t very typical but with large data sets, sometimes you need to get creative!

Clone this repo on GitHub to get the full script or fork and make it fancy! https://github.com/bdwill/prnx-invalidate-cache