9 Months at PernixData: Recapping My Best Career Decision

It’s hard to believe that my one-year anniversary at PernixData is only 3 months away. I have to say that coming here has been the best career decision I’ve ever made. I’m proud to be part of the fastest growing software infrastructure company in history and to be surrounded by highly talented and passionate people. Having come from the customer side, specifically a medical practice, it’s interesting to now be part of a software company, and a startup at that.

PernixData is now a 3 year old company, and I previously didn’t think I would ever join a startup. When I considered opportunities at other young companies in the past, my family always cautioned me about giving up stability. I stayed at a 30 year old company for 10 years in the name of comfort and stability, and I have some regrets about it. Louisiana isn’t bursting at the seams with enterprise IT jobs, so the risk of joining a startup and losing my job one day weighed heavily on me. The security of an established organization was very comforting, but I recognized that my growth was extremely limited by the size of the organization and its plans for growth. Building the proverbial IT mansion was fun because I left the organization with a solid infrastructure, but after the projects were over, the upkeep was minimal and the days became mundane. The decision to move on came down to fulfilling a desire to grow professionally by facing a new challenge outside of my comfort zone.

Joining a startup carries a lot of risk and a lot of potential reward. Aside from believing in the architecture, what gave me comfort in joining a young company was its leaders. Satyam Vaghani and Poojan Kumar aren’t household names, but they are both seasoned VMware alumni who had a vision and brought along a team of world-class developers that could execute it.

There’s risk with everything in life, but I opted to set aside my overly cautious feelings about job security and jump right in, because I only see PernixData continuing to grow.

The transition from Director of IT to Systems Engineer was exactly the change I was looking for because I was burnt out on operations. The career change also gave me the opportunity to explore existing, new, and upcoming technologies and understand how they relate to FVP. In my previous role, my exposure to various hardware and software configurations was limited because I felt learning about them wasn’t beneficial; they weren’t relevant to my job, and we would never need to implement anything like them. (Note: I recognize this was a terrible mindset to have and have since changed.)

As the Director of IT of a small company, I was responsible for maintaining the infrastructure and I managed a desktop tech. The desktop guy was great at his job, but I didn’t have a peer to collaborate with. That’s why starting a VMUG in Louisiana was important to me: I wanted a community of peers to learn about virtualization with and share experiences.

The PernixData SE team is a great example of the peer community that I wanted to be part of. I’m surrounded by around 20 other engineers who have come from various backgrounds: a fellow IT Director, virtualization admins, a VMware instructor, and experienced SEs. Each of us has worked in different verticals and with different applications, hardware, and end users. This diversity lets each of us bring our unique experiences to the team and further develop a highly skilled, technical group. I’m also very proud that over 90% of our team are VMware vExperts.

The most fun part of working at PernixData has been meeting people across the country and challenging them to rethink how they purchase storage and drive application performance. During the POC process, I love talking technology with customers, learning about each company’s environment and challenges, and ultimately letting FVP speak for itself. In my opinion, being able to stand behind the product you sell is what removes some of the challenges of being in sales and makes it enjoyable.

There are a lot of perks to working at PernixData HQ that I don’t benefit from as a remote employee, but working from home more than makes up for it. I don’t know how I could ever go back to working in an office, that’s for sure! Personally, the transition from office worker to teleworker hasn’t been difficult because I talk to quite a few customers every day and stay in constant touch with my team members.

On a personal note, my wife is almost always home because she works nights as a registered nurse. For some, working from home while their spouse is there has presented challenges. This hasn’t been the case for us, but YMMV! Another perk I enjoy about working from home is being able to take my kids to school occasionally and always being here when they get home. Once they’re home, they love to come into my office and keep me company for the rest of the day. It’s not always unicorns and rainbows, though; I have to kick them out quite a bit!

Overall, I’m very pleased with how the last 9 months have turned out, and I’m always looking forward to the next day.

Updating the firmware on a Micron P420m

Update: Micron released firmware B218 in July 2015 to resolve a critical issue with command timeouts. Micron highly recommends upgrading to this release.

The Micron P420m is a common PCIe flash card among PernixData customers. It’s a very high-performing, cost-effective card with a variety of applications.

Like all hardware devices, the P420m has firmware that occasionally needs to be updated. To perform the firmware update, we’re going to use the Micron rssdm utility (packaged with the ESXi drivers), available on Micron’s site in the Support Pack for Linux and VMware. As of January 2015, support pack B145.03 from September 2014 is still current.

The first step in determining which firmware version the card is running is to install rssdm. Put the host into maintenance mode, copy the vib for your version of ESXi to the host, install it with the esxcli software vib install -v command, and reboot the host.
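
As a rough sketch of those steps (the vib filename and path below are illustrative and will vary with the support pack and your ESXi version):

# Enter maintenance mode (evacuate or power off VMs first)
esxcli system maintenanceMode set --enable true
# Install the rssdm vib that was copied to /tmp (filename is illustrative)
esxcli software vib install -v /tmp/rssdm-esxi55.vib
# Reboot to complete the installation
reboot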

Once the host is back up, log in and execute the /opt/micron/bin/rssdm -L command to see the firmware version of the card.

[Screenshot: rssdm -L output showing the card’s current firmware version]

As you can see, my card is running firmware version B2100600 and needs to be updated. At the time this article was posted, the current firmware version was B2120500. We’re going to copy the new firmware to the host (or a shared datastore) and perform the upgrade.

With the host in maintenance mode and the device removed from the FVP Cluster, copy the B145.03.00.ubi firmware image downloaded from the Micron Support Pack above to a location accessible by the host. The B145.03.00.ubi file will be in the Unified Image folder.
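
As an example (the hostname and destination path here are illustrative, and SSH must be enabled on the host), the image can be copied over with scp:

scp B145.03.00.ubi root@esxi01:/tmp/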

Then execute /opt/micron/bin/rssdm -T /path/to/file/B145.03.00.ubi -n 0 to flash the new firmware.

[Screenshot: rssdm firmware update output]

Once it’s complete, reboot the host.

When the host is back up, verify the new firmware is active.

[Screenshot: rssdm -L output showing the new firmware version]

Secure erase the drive by executing: /opt/micron/bin/rssdm -X -n 0 -p ffff

[Screenshot: rssdm secure erase output]

Note: For those interested, -X performs the secure erase, -n specifies the drive ID, and -p specifies the password (the default is ffff).

After the secure erase is complete, take the host out of maintenance mode and add the device back to the FVP Cluster.
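
If you entered maintenance mode from the command line as shown earlier, you can exit it the same way before adding the device back in the FVP UI:

esxcli system maintenanceMode set --enable false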

Use Case for Invalidating PernixData Read Cache

When sizing an acceleration resource for PernixData FVP, we look to recommend a resource that can contain the working set of data for the VMs using that resource. The majority of the time this isn’t a problem. However, we recently came across a use case where the working set of the VM was so large that the VM used the whole flash device and began garbage collection after just 3 days. Let’s take a closer look at what’s going on and how we found a feasible solution for the customer.

The workload is a batch-processing application that runs once a day and churns through a lot of data during its processing time. Before implementing FVP, a run took approximately 2.25 hours. The first day it ran after installing FVP, processing time dropped a noticeable amount, since FVP accelerates every write operation in write back and caches blocks for read acceleration as the VM requests them. By the 2nd or 3rd day, batch processing time had been cut by nearly 50% and the customer was thrilled. But on the 4th day, processing time crept back up to around 2.25 hours. It turned out that even with an 800 GB SSD, the drive filled up and suffered frequent write amplification and garbage collection in order to take on newly read blocks while still servicing the writes. So what about stringing some SSDs together in RAID 0?

This is a very common question, so let’s investigate SSDs and RAID a bit.

First, it’s important to understand that SSDs don’t fail like HDDs. As an SSD nears failure, it might continue to service IO, but at extremely high and unpredictable latency, whereas a healthy SSD typically delivers consistent performance. In turn, a single failing SSD will drag down the performance of the whole RAID array, because the array can only perform at the latency and throughput of its slowest member.

As a side note, FVP has adaptive resource management. As soon as FVP determines that going to a given flash device is slower than simply using the backend SAN, it will deactivate that flash device from the FVP cluster. This technology was built from the ground up to deal with the failure characteristics of SSDs (i.e. good SSD vs. slow SSD vs. failed SSD). RAID was never built to make a distinction between a slow SSD and a failed SSD. Being highly available with an unusably slow flash device is pointless; that’s actually worse than not being available at all.

The customer is also using blade servers, so a larger-capacity PCIe card isn’t an option; they already have the largest capacity available in SSD form. What they needed was a solution that consistently achieves the reduction in batch processing time.
What can we do?

PernixData FVP ships with some very robust PowerShell cmdlets that allow us to manage the environment and automate all the things. We first considered using PowerShell to remove the VM from the FVP cluster every few days to force the cache to drop, but that would also lose all of the VM’s performance graphs. Instead, our solution was to use PowerShell to blacklist the VM every 3 days, which forces the cache to drop without losing the performance graphs. The customer is happy with this solution because even the first batch run after the cache is dropped still finishes faster than going straight to the array.

What does this look like?
#Setup some variables
$fvp_server = "fvpservername"
$fvp_username = "domain\username"
#TODO: This isn't very scalable for large environment. I know there's a better way.
$vms = "vm01", "vm02"
#A file we're going to use to store the password in an encrypted format
$passwordfile = "c:\path\fvp_enc.txt"

import-module prnxcli

# Authentication and connection strings removed for brevity. 

#Loop through the list of VMs: blacklist each to invalidate its cache, wait briefly, then add it back to the cluster in write back
foreach ($vm in $vms)
{
     write-host "Blacklisting $vm to invalidate cache"
     Set-PrnxProperty $vm -Name cachePolicy -Value 1
     Start-Sleep -seconds 10
     write-host "Adding $vm back to FVP cluster in write back"
     Set-PrnxProperty $vm -Name cachePolicy -value 3
     Set-PrnxAccelerationPolicy -WaitTimeSeconds 30 -name $vm -WB -NumWBPeers 1 -NumWBExternalPeers 0
}

Disconnect-PrnxServer
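
To actually run this every 3 days, one simple option (assuming the script is saved on a Windows machine with the prnxcli module installed; the task name, script path, and start time below are illustrative) is a scheduled task:

schtasks /Create /TN "Invalidate FVP Cache" /TR "powershell.exe -ExecutionPolicy Bypass -File C:\scripts\invalidate-fvp-cache.ps1" /SC DAILY /MO 3 /ST 03:00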

This scenario isn’t very typical but with large data sets, sometimes you need to get creative!

Clone this repo on GitHub to get the full script or fork and make it fancy! https://github.com/bdwill/prnx-invalidate-cache