Top Support issues and how to solve them – Part II

Darren Burnett, Senior Technical Support Engineer

Unable to connect to the Service Console

Presentation Transcript
Rebuild Networking

Network Connection Problem.

Deleting the vSwitch that vswif0 is connected to

Connecting the wrong NICs to vSwitch0

Upgrade issues

Incorrect IP Address

External Network Changes


At this stage you can no longer connect to your ESX server using VI Client or SSH.

You can connect to the Service Console remotely if you have iLO, DRAC, an IP KVM, or something similar.

Otherwise, it’s time to use some shoe leather and walk to the server room.


The following procedure will work, but it is a quick and inelegant way to get your VI Client connected.

Other options include:

Crossover cable connected to a laptop

Adding or removing NICs to a vSwitch


Use the esxcfg-vswitch -l command to list all of your vSwitches.


[root@newross root]# esxcfg-vswitch -l

Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch0       32          4           32                vmnic0

  PortGroup Name      Internal ID   VLAN ID  Used Ports  Uplinks
  VM Network          portgroup1    0        0           vmnic0
  Service Console     portgroup0    0        1           vmnic0
  VMkernel            portgroup7    0        1           vmnic0

Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch1       64          2           64                vmnic1

  PortGroup Name      Internal ID   VLAN ID  Used Ports  Uplinks
  vlan100             portgroup10   100      0           vmnic1
  install VLAN 310    portgroup6    0        0           vmnic1

Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch2       64          2           64                vmnic6

  PortGroup Name      Internal ID   VLAN ID  Used Ports  Uplinks
  crossovercable      portgroup9    0        0           vmnic6


Delete all vSwitches

esxcfg-vswitch -d vSwitch0

esxcfg-vswitch -d vSwitch1

esxcfg-vswitch -d vSwitch2


Create a vSwitch

esxcfg-vswitch -a vSwitch0

Create the Service Console portgroup

esxcfg-vswitch -A "Service Console" vSwitch0

Add a NIC to the vSwitch

esxcfg-vswitch -L vmnic0 vSwitch0

Add a vswif interface and configure it

esxcfg-vswif -a vswif0 -p "Service Console" -i 10.10.10.3 -n 255.0.0.0
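
When typing at a console in the server room it is easy to get a flag wrong, so it can help to print the whole rebuild sequence first and check it. The sketch below is only a dry run: the function name rebuild_sc_commands is made up for illustration, it prints the commands rather than running them, and it assumes -A is the add-portgroup option of esxcfg-vswitch on your build (check with esxcfg-vswitch -h).

```shell
# Dry run: print the rebuild commands instead of executing them.
# $1 = uplink NIC, $2 = Service Console IP address, $3 = netmask
# (rebuild_sc_commands is a hypothetical helper, not a VMware tool.)
rebuild_sc_commands() {
    nic=$1
    ip=$2
    mask=$3
    printf 'esxcfg-vswitch -a vSwitch0\n'
    # Assumption: -A adds a portgroup on this ESX build.
    printf 'esxcfg-vswitch -A "Service Console" vSwitch0\n'
    printf 'esxcfg-vswitch -L %s vSwitch0\n' "$nic"
    printf 'esxcfg-vswif -a vswif0 -p "Service Console" -i %s -n %s\n' "$ip" "$mask"
}
```

Calling rebuild_sc_commands vmnic0 10.10.10.3 255.0.0.0 prints the four commands with the values from this example filled in; review them, then run them one at a time.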


Check if you can connect.

Use PING both to and from the ESX server

Try SSH

Try VI Client


If you still can't connect, use esxcfg-nics -l to list the available NICs.


[root@newross root]# esxcfg-nics -l

Name    PCI       Driver  Link  Speed     Duplex  Description
vmnic0  03:0c.00  e1000   Up    1000Mbps  Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic1  03:0c.01  e1000   Down  0Mbps     Half    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic8  07:00.00  tg3     Up    1000Mbps  Full    Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet
vmnic6  08:00.00  tg3     Down  0Mbps     Half    Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet
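
Only NICs with a link are worth trying as an uplink. A minimal sketch for picking them out of the listing, assuming the column layout shown above (Link is the fourth field); nics_with_link is a made-up helper name.

```shell
# Read `esxcfg-nics -l` output on stdin and print the names of the
# NICs whose Link column (4th field) says "Up".
# (nics_with_link is a hypothetical helper, not part of ESX.)
nics_with_link() {
    awk '$4 == "Up" { print $1 }'
}
```

Used as esxcfg-nics -l | nics_with_link. With the listing above this leaves vmnic0 and vmnic8 as candidates.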


Remove and add NICs to vSwitch0

Remove

esxcfg-vswitch -U vmnic0 vSwitch0

Add

esxcfg-vswitch -L vmnic8 vSwitch0


It might also need a VLAN ID

esxcfg-vswitch -v 101 -p "Service Console" vSwitch0


To avoid this issue, be careful when changing the Service Console virtual NIC, or any property of its parent virtual switch that can affect Service Console connectivity, such as the uplink.

If possible, before updating the Service Console virtual NIC, create another independent working Service Console NIC so that, in the event the configuration brings the console NIC down, the second Service Console NIC is still available to repair the configuration.


As stated, depending on your environment it may be easier not to delete all your configurations.

It may just require reassigning a NIC or changing VLAN ID.

Check the following guide.

http://www.rtfm-ed.eu/docs/vmwdocs/esx3.x-vc2.x-serviceconsole-guide.pdf

Network Bond

NICs in a Bond not on the same Broadcast Domain


NICs in a Bond should be in the same broadcast domain.

Determine what NICs are in each Bond.


[root@cork root]# esxcfg-info|grep -i -A 2 VirtualSwitchImpl

\==+VirtualSwitchImpl :

|----Name.........................................vSwitch0

|----Uplinks......................................vmnic0

--

\==+VirtualSwitchImpl :

|----Name.........................................vSwitch1

|----Uplinks......................................vmnic4

--

\==+VirtualSwitchImpl :

|----Name.........................................vSwitch2

|----Uplinks......................................vmnic1,vmnic3

--

[root@cork root]# esxcfg-info |grep -i -B 5 hint

\==+PnicImpl :

|----_name..............................................vmnic3

|----_bus...............................................6

|----_slot..............................................3

|----_function..........................................1

|----Network Hint.......................................0 10.16.157.00/255.255.255.192

--

\==+PnicImpl :

|----_name..............................................vmnic0

|----_bus...............................................11

|----_slot..............................................7

|----_function..........................................0

|----Network Hint.......................................0 10.16.156.00/255.255.255.00

--

\==+PnicImpl :

|----_name..............................................vmnic1

|----_bus...............................................12

|----_slot..............................................8

|----_function..........................................0

|----Network Hint.......................................0 10.16.156.00/255.255.255.00
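
Comparing the two listings by eye gets tedious on a host with many NICs. The sketch below is one rough way to do it, assuming the esxcfg-info line layout shown above; nic_hints and same_network are made-up helper names. The first pairs each NIC name with its network hint, and the second succeeds only if the NICs you name all report the same hint.

```shell
# Read esxcfg-info output on stdin and print "nic hint" pairs.
# Assumes each "_name" line is followed, within the same PnicImpl
# block, by that NIC's "Network Hint" line, as in the output above.
nic_hints() {
    awk '
        /_name\.+/        { sub(/^.*_name\.+/, ""); name = $0 }
        /Network Hint\.+/ { sub(/^.*Network Hint\.+/, ""); print name, $2 }
    '
}

# Read "nic hint" pairs on stdin; exit 0 only if every NIC named in
# the arguments reports exactly the same hint.
same_network() {
    awk -v nics="$*" '
        BEGIN { n = split(nics, want, " "); for (i = 1; i <= n; i++) wanted[want[i]] = 1 }
        ($1 in wanted) { seen[$2] = 1 }
        END { c = 0; for (h in seen) c++; exit (c == 1 ? 0 : 1) }
    '
}
```

For example, esxcfg-info | nic_hints | same_network vmnic1 vmnic3 checks the two uplinks of vSwitch2. With the hints shown above it fails, because vmnic1 reports 10.16.156.x and vmnic3 reports 10.16.157.x.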

HA and STP

Spanning Tree Protocol

When there is a network change, STP can cause a temporary outage on your network.

HA

An ESX Server will determine that it is isolated after 15 seconds.

Depending on the "Isolation Response" that you have set, all your VMs may power down.

It is therefore worth checking your network to determine whether STP can be configured to reduce the temporary network outage.
Expanding VM with a Snapshot

You can NOT expand a VM’s VMDK file while it still has snapshots.

e.g.

#ls *

important.vmdk important-000001-delta.vmdk

#vmkfstools -X 20G important.vmdk


If you do, you will now have a VM that won’t boot


Tricking ESX into seeing the expanded VMDK as the original size.

In this example we have a test.vmdk that we expand from 5GB to 6GB

#vmkfstools -X 6G test.vmdk


If we check test.vmdk we see

# Disk DescriptorFile

version=1

CID=3f24a1b3

parentCID=ffffffff

createType="vmfs"

# Extent description

RW 12582912 VMFS "test-flat.vmdk"

# The Disk Data Base

#DDB

ddb.virtualHWVersion = "4"

ddb.geometry.cylinders = "783"

ddb.geometry.heads = "255"

ddb.geometry.sectors = "63"

ddb.adapterType = "buslogic"


Original - RW 10485760 VMFS "test-flat.vmdk"

New - RW 12582912 VMFS "test-flat.vmdk"
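
The RW value is the disk size in 512-byte sectors, which is why 5GB shows as 10485760 and 6GB as 12582912. A small sketch of the arithmetic, assuming 1GB means 1024 x 1024 x 1024 bytes; gb_to_sectors is a made-up helper name.

```shell
# Convert a size in GB (1 GB = 1024^3 bytes) to 512-byte sectors.
# 1024 * 1024 * 2 is the number of 512-byte sectors in one GB.
# (gb_to_sectors is a hypothetical helper for illustration.)
gb_to_sectors() {
    echo $(( $1 * 1024 * 1024 * 2 ))
}
```

gb_to_sectors 5 gives 10485760 and gb_to_sectors 6 gives 12582912, matching the two extent lines above.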


If we have no "BACKUPS", how do we get the original value? The snapshot descriptor still records it.

#grep -i rw test-000001.vmdk

RW 10485760 VMFSSPARSE "test-000001-delta.vmdk"


We change the RW value in test.vmdk back to the original.

# Disk DescriptorFile

version=1

CID=3f24a1b3

parentCID=ffffffff

createType="vmfs"

# Extent description

RW 10485760 VMFS "test-flat.vmdk"

# The Disk Data Base

#DDB

ddb.virtualHWVersion = "4"

ddb.geometry.cylinders = "783"

ddb.geometry.heads = "255"

ddb.geometry.sectors = "63"

ddb.adapterType = "buslogic"
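
The edit can be done in vi, but a scripted version avoids typos in the sector count. This is only a sketch under the assumptions of this example (one base extent line of the form RW n VMFS "..."); extent_sectors and set_extent_sectors are made-up helper names, and you should keep a copy of the descriptor before changing it.

```shell
# Print the sector count from the first "RW" extent line of a
# descriptor read on stdin.
extent_sectors() {
    awk '$1 == "RW" { print $2; exit }'
}

# Copy a descriptor from stdin to stdout, replacing the sector count
# on the base-disk extent line ("RW <n> VMFS ...") with $1. Snapshot
# extents (VMFSSPARSE) are untouched because the pattern needs "VMFS ".
set_extent_sectors() {
    sed "s/^RW [0-9][0-9]* VMFS /RW $1 VMFS /"
}

# Possible use, with the file names from this example:
#   orig=$(extent_sectors < test-000001.vmdk)
#   cp test.vmdk test.vmdk.bak
#   set_extent_sectors "$orig" < test.vmdk.bak > test.vmdk
```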


Commit the snapshot(s)

#vmware-cmd /pathtovmx/test.vmx removesnapshots


Grow the VMDK file

#vmkfstools -X 6G test.vmdk


If needed, add a snapshot

#vmware-cmd pathtovmx/test.vmx createsnapshot <name> <description>

Corrupted Snapshots

In this example we will deal with a corrupt .VMSD file.

Let’s first look at a working .VMSD file, separated into 3 slides for the people at the back


snapshot.lastUID = "4"

snapshot.numSnapshots = "3"

snapshot.current = "4"

snapshot0.uid = "2"

snapshot0.filename = "VC1.3to201_standard-Snapshot2.vmsn"

snapshot0.displayName = "myfirst"

snapshot0.description = "My first test snapshot"

snapshot0.createTimeHigh = "273684"

snapshot0.createTimeLow = "942632403"

snapshot0.numDisks = "1"

snapshot0.disk0.fileName = "VC1.3to201_standard.vmdk"

snapshot0.disk0.node = "scsi0:0"


snapshot.needConsolidate = "FALSE"

snapshot1.uid = "3"

snapshot1.filename = "VC1.3to201_standard-Snapshot3.vmsn"

snapshot1.parent = "2"

snapshot1.displayName = "second"

snapshot1.description = "My second test snapshot"

snapshot1.createTimeHigh = "273684"

snapshot1.createTimeLow = "980947483"

snapshot1.numDisks = "1"

snapshot1.disk0.fileName = "VC1.3to201_standard-000001.vmdk"

snapshot1.disk0.node = "scsi0:0"


snapshot2.uid = "4"

snapshot2.filename = "VC1.3to201_standard-Snapshot4.vmsn"

snapshot2.parent = "3"

snapshot2.displayName = "third"

snapshot2.description = "My third test snapshot"

snapshot2.createTimeHigh = "273684"

snapshot2.createTimeLow = "1088942286"

snapshot2.numDisks = "1"

snapshot2.disk0.fileName = "VC1.3to201_standard-000002.vmdk"

snapshot2.disk0.node = "scsi0:0"
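
In the working file above, snapshot.numSnapshots is "3" and there are three snapshotN.uid entries. A quick consistency check along those lines is sketched below. Treating a mismatch as a sign of damage is an assumption for illustration, not something the slides state, and vmsd_counts is a made-up helper name.

```shell
# Read a .vmsd file on stdin and print two numbers: the declared
# snapshot.numSnapshots value and the count of snapshotN.uid entries.
# (vmsd_counts is a hypothetical helper; a mismatch is only a hint.)
vmsd_counts() {
    awk -F'"' '
        /^snapshot\.numSnapshots/ { declared = $2 }
        /^snapshot[0-9]+\.uid/    { found++ }
        END { print declared + 0, found + 0 }
    '
}
```

Run as vmsd_counts < VC1.3to201_standard.vmsd. For the working file shown above both numbers are 3.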


After corruption of the .VMSD file the file now looks like this.


However we see that the snapshots still exist

[root@newross VC1.3to201_standard]# ls

VC1.3to201_standard-000001-delta.vmdk VC1.3to201_standard-Snapshot4.vmsn

VC1.3to201_standard-000001.vmdk VC1.3to201_standard.vmdk

VC1.3to201_standard-000002-delta.vmdk VC1.3to201_standard.vmsd

VC1.3to201_standard-000002.vmdk VC1.3to201_standard.vmx

VC1.3to201_standard-000003-delta.vmdk VC1.3to201_standard.vmxf

VC1.3to201_standard-000003.vmdk vmware-1.log

VC1.3to201_standard-flat.vmdk vmware-2.log

VC1.3to201_standard.nvram vmware-3.log

VC1.3to201_standard-Snapshot2.vmsn vmware.log

VC1.3to201_standard-Snapshot3.vmsn


At this stage rename the .VMSD file to .VMSD.OLD


We are going to create a new .VMSD file

What kind of magic is required to build a new VMSD file?


Create another snapshot to automatically recreate a .VMSD file

#vmware-cmd VC1.3to201_standard.vmx createsnapshot addedforrecovey "Hope it works"


You won't be able to selectively roll back to a particular snapshot.

You will have to commit them all.


Commit the Snapshots

#vmware-cmd VC1.3to201_standard.vmx removesnapshots

All a bit too easy 


Corrupted Snapshot


What happens if the last snapshot is corrupt?

This can be caused by the VMFS volume being full.

Now there is data loss.

We can try to limit the loss to the changes made since the last snapshot.


Move the last delta file to a temp area (or delete).


Edit the .VMX file and point it to the second-to-last -000xx.vmdk file.


[root@newross VC1.3to201_standard]# ls

VC1.3to201_standard-000001-delta.vmdk VC1.3to201_standard-Snapshot4.vmsn

VC1.3to201_standard-000001.vmdk VC1.3to201_standard.vmdk

VC1.3to201_standard-000002-delta.vmdk VC1.3to201_standard.vmsd

VC1.3to201_standard-000002.vmdk VC1.3to201_standard.vmx

VC1.3to201_standard-000003-delta.vmdk VC1.3to201_standard.vmxf

VC1.3to201_standard-000003.vmdk vmware-1.log

VC1.3to201_standard-flat.vmdk vmware-2.log

VC1.3to201_standard.nvram vmware-3.log

VC1.3to201_standard-Snapshot2.vmsn vmware.log

VC1.3to201_standard-Snapshot3.vmsn


Before:

scsi0:0.present = "TRUE"

scsi0:0.fileName = "VC1.3to201_standard-000003.vmdk"

After:

scsi0:0.present = "TRUE"

scsi0:0.fileName = "VC1.3to201_standard-000002.vmdk"
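
The same edit can be scripted rather than done by hand. A minimal sketch, assuming the disk is on scsi0:0 as in this example; point_disk_at is a made-up helper name, and it writes to stdout so the original .VMX is never modified in place.

```shell
# Copy a .vmx from stdin to stdout with scsi0:0.fileName set to $1.
# Other lines, including scsi0:0.present, pass through unchanged.
# (point_disk_at is a hypothetical helper for illustration.)
point_disk_at() {
    sed "s|^scsi0:0\.fileName *= *\".*\"|scsi0:0.fileName = \"$1\"|"
}

# Possible use, with the file names from this example:
#   cp VC1.3to201_standard.vmx VC1.3to201_standard.vmx.bak
#   point_disk_at VC1.3to201_standard-000002.vmdk \
#       < VC1.3to201_standard.vmx.bak > VC1.3to201_standard.vmx
```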


When the .VMX has been updated to point to the second-to-last snapshot, commit the snapshots.

#vmware-cmd VC1.3to201_standard.vmx removesnapshots


Examining the Snapshots.


The original file will contain something similar.

[root@newross VC1.3to201_standard]# more VC1.3to201_standard.vmdk

# Disk DescriptorFile

version=1

CID=9e6bfa08

parentCID=ffffffff

createType="vmfs"

# Extent description

RW 16777216 VMFS "VC1.3to201_standard-flat.vmdk"

# The Disk Data Base

#DDB

ddb.virtualHWVersion = "4"

ddb.geometry.cylinders = "1044"

ddb.geometry.heads = "255"

ddb.geometry.sectors = "63"

ddb.adapterType = "lsilogic"

ddb.toolsVersion = "7201"


A snapshot disk can look similar to this

[root@newross VC1.3to201_standard]# more VC1.3to201_standard-000001.vmdk

# Disk DescriptorFile

version=1

CID=9e6bfa08

parentCID=9e6bfa08

createType="vmfsSparse"

parentFileNameHint="VC1.3to201_standard.vmdk"

# Extent description

RW 16777216 VMFSSPARSE "VC1.3to201_standard-000001-delta.vmdk"

# The Disk Data Base

#DDB


[root@newross VC1.3to201_standard]# more VC1.3to201_standard-000007.vmdk

# Disk DescriptorFile

version=1

CID=678cf29b

parentCID=9e6bfa08

createType="vmfsSparse"

parentFileNameHint="VC1.3to201_standard-000006.vmdk"

# Extent description

RW 16777216 VMFSSPARSE "VC1.3to201_standard-000007-delta.vmdk"

# The Disk Data Base

#DDB
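
When examining a chain, the two values to compare are CID and parentCID. In the first snapshot above, parentCID (9e6bfa08) matches the CID of the base disk. The sketch below just pulls both values out of a descriptor. The expectation that a snapshot's parentCID should equal its parent's CID is my reading of the examples rather than a rule stated in the slides, and descriptor_cids is a made-up helper name.

```shell
# Read a VMDK descriptor on stdin and print "CID parentCID".
# (descriptor_cids is a hypothetical helper for illustration.)
descriptor_cids() {
    awk -F= '
        $1 == "CID"       { cid = $2 }
        $1 == "parentCID" { parent = $2 }
        END { print cid, parent }
    '
}

# Possible use: compare a snapshot's parentCID with its parent's CID.
#   descriptor_cids < VC1.3to201_standard.vmdk
#   descriptor_cids < VC1.3to201_standard-000001.vmdk
```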

Avoiding Issues with Extents

When you add an extent to a VMFS volume only one ESX server is aware of the change.

It is best practice to rescan VMFS volumes from all hosts.

Otherwise it is possible to add another extent from another ESX server and cause issues with the VMFS volume.

#esxcfg-rescan vmhba1

#esxcfg-rescan vmhba2

#service mgmt-vmware restart

Recover VMFS

How to recover after deleting a VMFS partition


Why would somebody delete their partition?


fdisk

dd over the beginning of the disk

Unattended install of Linux with clearpart --all

LUN corruption of partition information


What options are available?

If the VMFS volume itself is corrupted or has been formatted over, the recovery procedure is less likely to work.

If only the partition information is lost, then recreating the partition will most likely bring the volume back.

Here we will use fdisk to recreate the partition information


First we need to identify the correct device

ESX 2.X

vmkpcidivy -q vmhba_devs

ESX 3.X

esxcfg-vmhbadevs -m

For this example we will assume it is /dev/sdf

Check that there are no partitions by using the command

fdisk -l /dev/sdf


Checking the Volume Header of a VMFS3 Volume.

What does it look like?


f15e2fab000400002c15accef245465e293b00032304a5c5026c00006300616c69726f695f6e756c316e0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200001000000000002c00acce014500002400accec64507f7449a00002304a5c5016c000000000000000000000000000000000100040000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000



Extracting Volume Header

longstring=$(dd if=/dev/sdf bs=1k count=1 skip=19456 2>/dev/null | od -x -v | awk '{print $2, $3, $4, $5, $6, $7, $8, $9}' | tr -d " " | tr -d '\n')

magicnumber=${longstring:4:4}${longstring:0:4}
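
The swap in the last line is needed because od -x prints 16-bit words in host byte order, so on an x86 host the two halves of a 32-bit value come out reversed. A sketch of the same swap written with cut, which also works in shells without the ${var:offset:length} syntax; header_magic is a made-up helper name.

```shell
# Given the hex string produced by the dd | od pipeline, print the
# first 32-bit value with its two 16-bit words swapped back.
# (header_magic is a hypothetical helper for illustration.)
header_magic() {
    first=$(printf '%s' "$1" | cut -c1-4)
    second=$(printf '%s' "$1" | cut -c5-8)
    printf '%s%s\n' "$second" "$first"
}
```

The header dump shown earlier begins with f15e2fab, so header_magic returns 2fabf15e for it. If a candidate device gives a different value at this offset, you are probably not looking at the start of the volume header from this example.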


Recreate the partition using the following commands

fdisk /dev/sdf

n (to create a new partition)

p (to create a primary partition)

1 (to create the 1st partition)

[enter] to keep the default value

[enter] to keep the default value

t (to change the type of partition)

fb (to set the partition as VMFS)

w (to save)

vmkfstools -V (to discover the VMFS)


If the VMFS still isn't present, the partition may need to be realigned.

To do this use fdisk again

fdisk /dev/sdf

x (to move to expert mode)

b (to change the beginning of the partition)

128 (to move the beginning of the partition to block 128)

w (to save)

vmkfstools -V (to rediscover the VMFS)


What happens if the LUN was formatted?

Do you have a BACKUP?


At this stage any data that you can get back is a bonus.

If there are still ESX servers with running VMs on the VMFS volume, there are three options:

Run VMware Converter. Convert the VM doing a V2V conversion.

Run backup software in the VM.

Copy the files from the VM.


If the VMs are powered down and an ESX server can still see the LUN, copy all files (VMDK, VMX, etc.) to another LUN.

Best Practices

Backups

Run vm-support before and after changes, and also periodically, and move the resulting tar.gz to another location.

Change control for your full environment

If issues are intermittent, record the time and date when this happens.