Sunday, December 29, 2013

Part 4 - Architecting vSphere - Remember the Design Dimensions & Process - From My vForum Prezo!

This article is Part 4 of the series "Architecting vSphere Environments". Here are the other 3 parts, which I would highly recommend reading:-

I have a very strong feeling that this should actually have been Part 1 of the entire series. In this article we will take a quick look at the different facets of a vSphere design and also throw some light on the processes and procedures one needs to follow to create a successful architecture for vSphere or, for that matter, any IT infrastructure.

While I write this, I realize that processes can be a little boring compared to technical stuff. However, in my experience, no matter how technical you are, unless and until you have the correct approach to an architecture, you will end up failing 9 times out of 10. This approach to architecture forms the Process.

Let us take a quick look at some of the dimensions or facets involved in designing and architecting a vSphere environment and briefly discuss them before looking at the stages in the design process. This time around, the content credit goes to the VMware vSphere Design book authors - Forbes Guthrie, Scott Lowe & Maish Saidel-Keesing. The way they explain this topic in that book is absolutely fantastic. The slide below depicts the same:-



I have seen articles about this concept from the book before. I still wanted to include it in my presentation and this article, since I experience these facets on a daily basis while working on projects of small to large scale.

For any environment you need to architect, you have to consider the Technical, Organizational & Operational facets. For each facet, you need to ask yourself and the project members questions which will help you create a design that not only meets the requirements but also defines the scope of the project.

  • With the Technical Dimension, you dive into your favorite questions, which usually begin with "WHAT".
  • With the Organizational Dimension, you start looking at Responsibility, Authority and Accountability related concerns, hence the questions begin with "WHO".
  • And with the Operational Dimension, you look at the most important part of an architecture: the impact on Operational Procedures and Processes.

Therefore, the above-mentioned facets are critical, and we must give them due weight throughout the entire process of architecting a vSphere environment. With this, let's see what the various stages in a design process are:-



If you carefully walk through all the stages in the design process, you will end up with a successful design and the right tools to implement and validate it. With this I will close this article, and I hope the recommendations here help you adopt the right strategy when you architect a vSphere environment for any organization.


Share & Spread the Knowledge!!

Monday, December 16, 2013

Part 3 - Architecting Storage for vSphere Environments - A Scoop from my vForum Prezo!


In continuation of the series on "Architecting vSphere Environments", this post talks about architecting the storage for a vSphere infrastructure. For those who have not read the previous parts of this series, I would highly recommend going through them in order to get a complete picture of the considerations which matter the most while designing the various components of a vSphere infrastructure.

Here are the links to the parts written before:-

Considering you have read the other parts, you will know that I have been talking about "Key Considerations" only in this series; this post is no different and talks only about the most important points to keep in mind while designing the STORAGE architecture on which you will run your virtual machines. As mentioned before, this comes from the experience I have gained in the field and from advice which I have read about & discussed with a lot of gurus in the VMware community.

With the changes in the storage arena over the past 2 to 3 years, I will not only talk about traditional storage design, but will also throw in some advice on the strategy of adopting Software Defined Storage (SDS). The slide below indicates the same.

I will start with traditional storage and its key areas. Let's begin with IOPS (Input/Output Operations Per Second). Have a look at the slide below.

The credit for the numbers and formula shown in the above slide goes to Duncan Epping. He has a great article which explains IOPS; it is probably the first search result on Google for the keyword "IOPS". I included this in my presentation because IOPS is still the most miscalculated and ignored area in more than 50% of virtual infrastructures. In my experience, only 2 out of 10 customers I meet discuss IOPS. Such facts worry me, as deep inside I know that someday the virtual infrastructure will come down like a pack of cards if the storage is not sized appropriately. With this, let's look at some of the key areas around IOPS.

  • Size for Performance & Not Capacity - Your storage array should not be sized merely for the capacity of data you need to store; you always have to size for the performance you need. In 90% of cases you will need to buy more disks than capacity alone demands, in order to make sure you meet the IOPS requirements of the workloads you plan to run on a Volume/LUN/Datastore.

  • Front End vs. Back End IOPS - This is the most common mistake committed while sizing storage. Though your intent to size storage on the basis of workload requirements might be correct, please remember that workloads demand FRONT-END IOPS while storage arrays provide BACK-END IOPS. To convert front-end IOPS to back-end IOPS, you need to consider the Read/Write ratio, the type of disk being used (i.e. SSD, SAS, SATA etc.) and finally the RAID penalty. This concept is explained with the formula below (Courtesy - Duncan Epping).

Say an application owner asks you for 1000 IOPS for a workload which has 40% reads and 60% writes. These 1000 IOPS are front-end IOPS. Here is how you determine the back-end IOPS, on the basis of which you will architect the Volume/Datastore, or maybe buy disks if you are in a procurement cycle.

Back-End IOPS = (Front-End IOPS X % READ) + ((Front-End IOPS X % Write) X RAID Penalty)

The RAID penalty is shown in the slide above, along with the number of IOPS you get from the different types of disks available for storage arrays today. So, considering our example, if we choose RAID 5 for this workload, the RAID penalty would be 4. Let's do the math now:-


Back-End IOPS = (1000 X 40%) + ((1000 X 60%) X 4)
                           = (1000 X 0.4) + ((1000 X 0.6) X 4)
                           = (400) + (600 X 4)
                           = 400 + 2400
                           = 2800 IOPS

So you can clearly see that 1000 front-end IOPS mean 2800 back-end IOPS, which is 2.8 times the stated requirement. While you can get 1000 IOPS from just seven 15K RPM SAS drives, you need a whopping 17 disks to serve those 1000 front-end IOPS once the RAID penalty is factored in.
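
If you would rather script this math than work it out on a whiteboard, here is a minimal shell sketch of the same conversion. The inputs are assumptions for illustration, in particular the figure of 175 IOPS for a single 15K RPM SAS drive; plug in your own workload profile and your vendor's per-drive numbers.

#!/bin/sh
# Sketch: convert front-end IOPS to back-end IOPS and a spindle count.
FRONTEND=1000     # front-end IOPS requested by the application owner (assumed)
READ_PCT=40       # percentage of reads (assumed)
WRITE_PCT=60      # percentage of writes (assumed)
PENALTY=4         # RAID 5 write penalty
DISK_IOPS=175     # assumed IOPS for one 15K RPM SAS drive

# Back-End IOPS = (FE x %Read) + ((FE x %Write) x RAID Penalty)
BACKEND=$(( (FRONTEND * READ_PCT / 100) + ((FRONTEND * WRITE_PCT / 100) * PENALTY) ))

# Round up when dividing, since you cannot buy a fraction of a disk.
DISKS=$(( (BACKEND + DISK_IOPS - 1) / DISK_IOPS ))

echo "Back-end IOPS : $BACKEND"   # 2800 for this example
echo "Spindles      : $DISKS"     # 16 with the assumed 175 IOPS per drive

With a slightly more conservative per-drive figure (say 170 IOPS), the count rounds up to the 17 disks quoted above; the point stands either way.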

I hope that gives you a clear picture: you not only need to size for performance, you need to size correctly for performance, since storage can make or break your virtual infrastructure. If you don't believe me, ask the VMware Technical Support team the next time you speak to them. A whopping 80% of the performance issue cases they deal with are related to poorly designed storage.


Along with IOPS, let us look at a few more areas where we need to be cautious.



  • DO NOT simply upgrade from an older version of VMFS to a new version. Please note that I am referring to major releases only; in fact, to be precise, I am referring to an upgrade from VMFS 3.x to VMFS 5.x. Those who follow VMFS (the proprietary Virtual Machine File System developed by VMware) will know that VMFS 5.x was introduced with vSphere 5.0; prior to this, the version of VMFS was 3.x. VMware made some major changes in VMFS 5.0 which alter the way blocks and metadata function. If you are upgrading from vSphere 4.1 or earlier to vSphere 5.x, I highly recommend re-formatting the datastores and creating them afresh to get the latest version of VMFS. An in-place upgrade from VMFS 3.x to 5.x carries over the legacy properties of the file system, and that can have a performance impact on operations such as vMotion & SCSI locking. You should empty each datastore using Storage vMotion, re-format it, and then move the workloads back onto it. Time consuming, but absolutely worth it. (You can check the current VMFS version of a datastore with the first command in the sketch after this list.)
  • If you are using IP-based storage, you should have a separate IP fabric for the transport of storage data. This ensures security through isolation and high performance, since you have dedicated network cards / switching gear for storage. This is a very basic recommendation, and people tend to overlook it. Please invest here and you will see IP storage working on par with FC storage.
  • Storage DRS is cool, but should only be partially used if you have an array with auto-tiered disks. Auto-tiering is usually available in newer arrays and allows workloads to access HOT data (frequently used) from the fastest disks (SSD), while cool data (not frequently used) is staged off to slower spindles such as SAS and SATA. The I/O metric feature within Storage DRS is designed to improve performance for NON-auto-tiered storage only. Hence, if you have an auto-tiered array, please disable this feature, since it will do no good and simply become an overhead. Having said that, SDRS also gives you the ability to automatically place virtual machines, when they are first created, on an appropriate datastore on the basis of capacity and performance requirements (with storage profiles). Hence, let the auto-tiering feature of your storage manage performance, and use SDRS only for the initial placement of VMs.
  • With vSphere 5.5, the maximum size of VMDKs and RDMs has been increased; the limit is now 62 TB. Use these monster disks only if you have monster workloads in your infrastructure and only if you need them. Over-provisioning in a VMware environment is as bad as under-provisioning.
  • IOPS help you size, but throughput and multipathing calculations are also critical, since they are the bridge between the source (servers) & the destination (spindles). Make sure you have the correct path policies and an appropriate amount of throughput and queue depth for smooth transmission of data across the fabric (the sketch after this list shows how to inspect and set path policies).
  • Coming to the last point, it is important to understand the nuances of the VMFS file system. Thin vs. thick disks, thin on thick, thick on thick etc. are various considerations which you need to keep in mind. I would recommend you read the excellent VMware blog article written by Cormac Hogan, which covers almost all the possible file system combinations one can have. My 2 cents would be to keep things simple and to use EAGER ZEROED THICK VMDKs for all your latency-sensitive virtual machines. This saves the extra time which ESXi takes to zero the blocks before it can write new data in the case of LAZY ZEROED THICK VMDKs. You do not have to panic if you are reading this now and already have latency-sensitive workloads running on lazy-zeroed disks; there are a number of ways to convert existing lazy-zeroed disks to eager-zeroed. The slide below shows all the options, and the command sketch after this list covers the common ones.
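
Since the slides themselves are not reproduced here, below is a minimal command-line sketch of a few of the checks and conversions mentioned above. Treat it as a hedged sketch rather than a definitive reference: the names MyDatastore, MyVM and naa.xxxx are placeholders, and the in-place vmkfstools conversions require the VM to be powered off.

~ # vmkfstools -Ph /vmfs/volumes/MyDatastore

This reports the file system version, so you can spot datastores still carrying VMFS-3 properties.

~ # esxcli storage nmp device list

This lists each device along with its current path selection policy.

~ # esxcli storage nmp device set --device naa.xxxx --psp VMW_PSP_RR

An example of switching a device to the Round Robin path policy.

~ # vmkfstools -k /vmfs/volumes/MyDatastore/MyVM/MyVM.vmdk

This zeroes out a lazy-zeroed thick disk in place, converting it to eager-zeroed thick.

~ # vmkfstools -j /vmfs/volumes/MyDatastore/MyVM/MyVM.vmdk

This inflates a thin disk to eager-zeroed thick. Alternatively, a Storage vMotion with the destination disk format set to "Thick Provision Eager Zeroed" achieves the conversion while the VM keeps running.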



Finally, let's quickly move on to things from the new world of Software Defined everything. Yesssss.. I am talking about storage of the new era, a.k.a. SOFTWARE DEFINED STORAGE (SDS). If you have a good memory, you will remember that I told you to size your storage for performance and not capacity. I am taking those words back right away: with Software Defined Storage, you no longer have to size for performance. You just need to size for capacity and availability.

Let's have a look at the slide below and then we will discuss the key areas of software defined storage as conceptualized by VMware.



The above slide clearly indicates the 2 basic solutions from VMware in the arena of Software Defined Storage. The first is vFRC (vSphere Flash Read Cache), which allows you to pool flash, from either SSDs or PCIe flash cards, as the first read device, hence improving performance for read-intensive workloads such as collaboration apps, databases etc. Imagine ripping the cache out of your storage array and bringing it close to the ESXi server. This cuts down the travel path and hence improves performance considerably.
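
If you want to poke at vFRC from the command line, ESXi 5.5 exposes an esxcli namespace for virtual flash. Take these two commands as hedged pointers rather than a definitive reference, since the exact output varies by build:

~ # esxcli storage vflash module list

This lists the virtual flash modules loaded on the host.

~ # esxcli storage vflash cache list

This lists the vFRC caches currently backing virtual disks.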

Another feather in the cap of Software Defined Storage is VSAN. As Rawlinson of Punching Clouds fame always says, it's VSAN, not vSAN with a small "v" like other VMware products. Only he knows the secret behind the uppercase "V" of VSAN. Earlier this year I met Rawlinson at a technology event in Malaysia and he gave me some great insights into VSAN.

VSAN uses the local disks installed in an ESXi server, which have to be a combination of SSD plus SAS or SATA drives. The entire storage from all the ESXi hosts in a cluster (currently a minimum of 3 and a maximum of 8 nodes) is pooled together in 2 tiers: the SSD tier for performance and the SAS/SATA tier for capacity. To read more about VSAN, I suggest the series of articles written by Duncan available on this link, and by Cormac on this link.
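
If you are playing with the beta bits, there is also an esxcli namespace for VSAN on the host. Again, treat these as hedged pointers, since the beta CLI may differ from what eventually ships:

~ # esxcli vsan cluster get

This shows whether the host is part of a VSAN cluster and what its role is.

~ # esxcli vsan storage list

This lists the local SSDs and magnetic disks the host has contributed to the pool.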

Remember, VSAN is in beta right now, so look at trying it for the use cases I have mentioned in the slide above, and not for your core production machines.

With this I will close this article, and I hope it guides you to make the right choices when selecting, architecting and using storage, be it traditional or modern, for your vSphere environments. Feel free to comment and share your thoughts and opinions on this post.


Share & Spread the Knowledge!!



Wednesday, December 11, 2013

Using Secondary Management Network for vSphere Replication!!

In the past I have written a lot about using vSphere Replication and the best practices around it. This time around, I wanted to share an experience of implementing SRM and vSphere Replication in a brownfield virtual infrastructure.

As we all know, vSphere Replication uses the management interface of the ESXi servers to send replication traffic to the vSphere Replication appliance at the DR site, so it is important to understand the network flow clearly before diving deep into the network configuration. The diagram below illustrates how the data flows. To avoid any confusion, I have tried to include all the objects involved in the implementation.

Let's see how the traffic flows in generic sense and then we will add IP addressing to it:-



1- Changed blocks are captured by the VR filter on the ESXi server in the primary site.

2- This data is sent to the DR Site VR Appliance using the primary management interface of the ESXi server.

3- The VR Appliance in the DR Site passes the data to the ESXi servers in the DR Site using the NFC service.

4- This data is then written to the designated DR Site datastore.

Note - Just reverse this sequence for reverse replication when you do a re-protect in SRM.

We will now look at a real-life setup and see how this replication flows. Let me give you a quick view of my setup along with the IP addresses:-

Let's look at each component one by one:-

1 - This is the IP address of the vCenter Server. Notice that the IP sub-nets are different in the Primary Site and DR Site.

2- This is the IP address of the SRM Server. Notice that the IP sub-nets are different in the Primary Site and DR Site, similar to the vCenter Server.

3- The IP address of the VRA server is not in the same range, because we do not want to use the same IP segment as the management network. In this case we have Point-to-Point connectivity between the sites, and the IP configured is on that Point-to-Point sub-net (10.12.12.x at the Primary Site). This is configured on both sites, as the VRA server will receive the traffic from the ESXi servers on this interface. Remember, this would be a Virtual Machine port group to which the appliance connects.

The default gateway for this Subnet is 10.12.12.1 at Primary Site and 10.11.12.1 at the DR Site.

4- VMK0 is the primary management network interface. This is used to manage the ESXi servers in the primary site. If you notice, ESXi and vCenter are on the same sub-net.

5- VMK1 is configured for vMotion on a non-routable VLAN. That is why you see a completely different IP segment here; it is not our concern here anyway.

6- VMK2 is the third VMkernel interface I have configured. This one uses the Point-to-Point connectivity for vSphere Replication. I want the vSphere Replication traffic to go out of this VMK interface and reach the vSphere Replication appliance at the DR Site.

7- Last, and one of the most important things to note: in the case of ESXi, the default gateway is always the one defined with VMK0. Hence you will notice that all the VMkernel port-groups have the same default gateway. You can verify this layout with the commands below.
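
Here is a quick sketch of the standard esxcli commands for that check; nothing here is specific to my setup:

~ # esxcli network ip interface list

This lists VMK0, VMK1 and VMK2 along with the port groups they live on.

~ # esxcli network ip interface ipv4 get

This shows the IPv4 address and netmask assigned to each VMkernel interface.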


This last point is the problem for me. I do not want the vSphere Replication traffic to hit that gateway (172.16.3.1) in the DR Site when the traffic is sent to the vSphere Replication appliance in that site; I want it to hit the gateway configured for the 10.11.12.x sub-net, which is 10.11.12.1 to be precise.

This is not possible until you define a static route which forces the vSphere Replication traffic out through the vSphere Replication interface (VMK2) so it reaches the vSphere Replication appliance at the DR Site via the correct gateway. Remember, you will have to do just the reverse on the ESXi servers in the DR Site, adding a static route towards the Primary Site replication sub-net (10.12.12.x).

Here are the commands to do it.

~ # esxcli network ip route ipv4 add --gateway <gateway for the local vSphere Replication sub-net> --network <IP range of the vSphere Replication network at the remote site>

So, in my case, I will run the following command on the Primary Site ESXi servers:-

~ # esxcli network ip route ipv4 add --gateway 10.12.12.1 --network 10.11.12.0/24

You will also need to add this line to /etc/rc.local.d/local.sh to make the setting persistent across reboots.

~ # vi /etc/rc.local.d/local.sh

Add the following line just before the exit command in the script:-

esxcli network ip route ipv4 add --gateway 10.12.12.1 --network 10.11.12.0/24

Save and exit the file and you are done at the Primary Site. You need to do the same on the ESXi servers in the DR Site for reverse replication to work. The command for the DR Site ESXi servers would be:-

~ # esxcli network ip route ipv4 add --gateway 10.11.12.1 --network 10.12.12.0/24

Do remember to add this to the local.sh script as you did in the primary site.
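
Before trusting the setup, it is worth verifying on both sides that the routes actually took effect and survived the edit to local.sh. These are standard commands; the expected entries below are from my setup above:

~ # esxcli network ip route ipv4 list

The 10.11.12.0/24 route should appear on the Primary Site hosts, and the 10.12.12.0/24 route on the DR Site hosts.

~ # cat /etc/rc.local.d/local.sh

Confirm the route command sits just before the exit statement.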


Now let's see how the traffic would flow in this case diagrammatically:-

Here is a KB article from VMware which might help you with this setup.

Configuring static routes for vmkernel ports on an ESXi host (2001426)


Hope this makes things easy for you and allows you to set up vSphere Replication on your preferred network interface.

Share & Spread the Knowledge!!

Wednesday, December 4, 2013

Part 2 - Architecting vCenter Single Sign On (SSO) - A Scoop from my vForum Prezo

This article is the second part of the series of articles on - "Architecting vSphere Environments - Everything you wanted to know!"

Although each part of this series can be used as an individual tool to learn about architecting the different components of a VMware vSphere infrastructure, I would highly recommend reading the first part before this article to understand the context of and the reason behind my writing this series.


In this article I will specifically talk about best practices around the vCenter Single Sign-On server and its related components. I will begin the discussion by giving you a taste of the need for and importance of vCenter Single Sign-On, and later move on to recommendations on how to lay out the SSO architecture. I would also like to credit these slides to Nick Marshall from VMware. Nick presented this material at vForum Sydney and was kind enough to share it with his global counterparts.

  • As mentioned in the slide above, vCenter SSO is the authentication platform for just vSphere and its related management components. It is very commonly mistaken for an enterprise-wide single sign-on solution. You do not have to buy a separate license for SSO, as it is part of the vCenter license and installation bundle.
  • SSO was launched with vCenter 5.1 and ships with vCenter 5.5 as well. SSO forms the authentication domain in a vSphere infrastructure; hence, unlike in earlier versions of vCenter, a user does not log in directly to the vCenter Server. When a user logs in to vCenter, via either the Web Client or the C# (thick) client, the request first hits the SSO server, which can be integrated with an AD/LDAP source for user mapping. At this point a SAML 2.0 token is generated, and it is exchanged in place of the user's credentials to log that user in to vCenter or the other vSphere components supported by vCenter SSO today.
  • No operational SSO means no access to vSphere Components, hence it is the first component which needs to be designed and implemented to have a stable access mechanism.

With this I will move to the 2nd slide which talks about the VMware solutions which are integrated with vCenter SSO today. This makes it even more obvious that SSO is here to stay and we need to ensure that we design & implement it properly for a stable infrastructure.

  • Nearly all the components in a VMware Stack are integrated with SSO.
  • It is important to note that for vCloud Director, it is the provider side of things that is integrated with SSO.
  • From a future perspective, I can clearly see VMware integrating SSO with other components of the management stack in the days to come.

Those who have used SSO with vSphere 5.1 will agree that there were issues & concerns around implementing and using it. There was a lot of buzz in the community, much of it not in favor of the concept of Single Sign-On as a vSphere component. Being a hands-on guy, I completely agree with the community, since I faced many of those issues which made the rounds of the blogs & forums.

Thanks to the engineering teams at VMware, the entire SSO was re-written for vSphere 5.5. This was a great move, since it not only solved the issues noticed in 5.1, it also improved the performance of the vCenter Server in its new avatar. Let's have a quick look at what is new with vCenter Single Sign-On 5.5.
 
I believe the slide is self-explanatory; however, I would like to point out a few changes which impress me, one being built-in replication and the other the removal of the database. With built-in replication you no longer have to manually update new roles/users when you have multiple SSO instances: you can update one site, and replication takes care of syncing that information between all the SSO servers which are paired together. With no database, you do not have to run those nasty scripts to keep a working database for SSO. Quite cool, ain't it!


On this note let's see what deployment models & upgrade options you have with vCenter SSO 5.5 in the slide below.

  • If you upgrade from vCenter 5.1 to vCenter 5.5, you can do so from any of the existing deployment models which you chose while installing 5.1.
  • If you have the option of re-installing, or if you are installing vCenter 5.5 for the first time, you do not have to worry about the complex deployment models at all. You can use a single virtual machine for all vCenter components, within the same site or across sites. In case you have 6 or more local vCenter Servers, you can have a single SSO instance to which all the vCenter Servers talk for authentication. This avoids multiple streams of replication among the SSO servers within the same site.

The recommendation of having a single virtual machine for all the components of the vCenter Server is showcased in the slide below.

  • Use the simple installer to have all the components installed on the same virtual machine, rather than performing a split install.
  • You can install the database here too; however, having it on a separate VM is beneficial when the environment scales.
  • Make sure you give enough compute power to this single virtual machine as it is hosting all the components.

Let us also look at recommendations around the multi-site deployment model in the last slide.

  • Each site runs all its components individually while SSO replication maintains a single SSO domain across sites.
  • Use of Linked Mode configuration can give you a single pane of glass here.
  • So a simple install at each site is the best way of getting rid of all the SSO nightmares you can think of.

With this, I will close this article. Hopefully you enjoyed reading it and will apply the recommendations mentioned here in your environments. Feel free to leave your thoughts in the comments section.


As mentioned before, I will continue to share stuff around Architecting vSphere in the forthcoming parts. 

Stay tuned!!


Share & Spread the Knowledge!!