Presentation Transcript
slide1

DirectX® And Streaming Video Drivers
Jeff Noyle, Development Lead
Gary Sullivan, Software Design Engineer
William Messmer, Software Design Engineer
Eric Rudolph, Software Design Engineer
Microsoft Corporation

speakers
Speakers
  • “DirectX Graphics Drivers,” Jeff Noyle, Lead Developer, DirectDraw®/Direct3D®, Microsoft Corporation
  • “DirectX VA Video Acceleration Drivers,” Gary Sullivan, Software Design Engineer, DMD Video Services Group, Microsoft Corporation
  • “Writing AVStream Minidrivers for Windows® XP,” William Messmer, Software Design Engineer, Digital Audio-Video, Microsoft Corporation
  • “Testing Your WDM Driver with DirectShow®,” Eric Rudolph, SDE, DirectShow Editing Services, Microsoft
directx graphics drivers jeff noyle development lead directdraw direct3d microsoft corporation
DirectX Graphics Drivers
Jeff Noyle
Development Lead
DirectDraw/Direct3D
Microsoft Corporation
prerequisites
Prerequisites
  • I’m assuming
    • Basic familiarity with DirectDraw and Direct3D concepts:
      • System Architecture
      • Surfaces
      • Page flipping
    • The DDK can be hard to read
agenda
Agenda
  • Single-source issues
  • Windows 9x issues
  • OS-independent issues
  • DirectX 7.0 implementation details
  • Changes in DirectX 8.0
  • What can you do next?
slide7

Single-Source Issues: Stuff you should know if you want one code base to support Windows 9x OS versions and Windows NT® OS versions

allocating system memory per surface
Allocating System Memory Per-Surface
  • (Do NOT use this process to allocate surface memory itself...See later)
  • Normally system memory is charged against a particular process
    • Can’t free it in some other process (as in ctrl-alt-del mechanism)
  • Use EngAllocPrivateUserMem and EngFreePrivateUserMem
    • Uses DirectDraw object to locate proper process context
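A minimal sketch (not from the deck) of that allocation pattern; the context structure, pool tag, and helper names are hypothetical, and the Eng prototypes are as documented in winddi.h:

    #include <winddi.h>

    typedef struct _MY_SURFACE_CONTEXT {     /* hypothetical per-surface driver state */
        DWORD dwFlags;
        DWORD dwLastLockLine;
    } MY_SURFACE_CONTEXT;

    MY_SURFACE_CONTEXT *AllocSurfaceContext(PDD_SURFACE_LOCAL pSurfaceLocal)
    {
        /* EngAllocPrivateUserMem uses the surface to locate the owning process,
           so the memory can still be freed from another process context. */
        return (MY_SURFACE_CONTEXT *)EngAllocPrivateUserMem(
            pSurfaceLocal, sizeof(MY_SURFACE_CONTEXT), 'CfrS');
    }

    VOID FreeSurfaceContext(PDD_SURFACE_LOCAL pSurfaceLocal, MY_SURFACE_CONTEXT *pCtx)
    {
        if (pCtx != NULL) {
            EngFreePrivateUserMem(pSurfaceLocal, pCtx);
        }
    }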
yuv fourcc surfaces
YUV/FOURCC Surfaces
  • System memory YUV/FOURCC surfaces on NT systems
    • DirectDraw Kernel-mode “pretends” that these surfaces are 8bpp RGB for the purposes of allocating memory
    • DXTn:
      • Height: height in 4x4 blocks
      • Width: width in blocks * sizeof(block)
      • You must undo these transformations at CreateSurface time
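A minimal sketch (not from the deck) of undoing that packing in DdCreateSurface; the helper name is hypothetical, and the 8/16-byte block sizes are the standard DXT1 versus DXT2-DXT5 block sizes:

    void UndoDxtnPacking(DWORD dwFourCC, DWORD dwPackedWidth, DWORD dwPackedHeight,
                         DWORD *pdwRealWidth, DWORD *pdwRealHeight)
    {
        /* DXT1 stores a 4x4 block in 8 bytes; DXT2..DXT5 use 16 bytes. */
        DWORD dwBlockBytes = (dwFourCC == MAKEFOURCC('D','X','T','1')) ? 8 : 16;

        *pdwRealWidth  = (dwPackedWidth / dwBlockBytes) * 4;  /* blocks back to texels */
        *pdwRealHeight = dwPackedHeight * 4;                  /* 4x4 block rows back to texel rows */
    }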
yuv fourcc surfaces1
YUV/FOURCC Surfaces
  • NT kernel mode doesn’t understand any FOURCC formats, so:
    • The driver must handle video memory allocation for these types
    • The driver must handle Lock for these types
windows 2000 issue fixed in windows xp
Windows 2000 Issue (Fixed In Windows XP)
  • During allocation of an AGP surface...
  • If the driver fails to allocate and:
    • returns DDHAL_DRIVER_HANDLED
    • AND sets an error code in ddRVal
    • AND sets the surface’s lpVidMemHeap to non-zero
  • Then the system will ignore the error
  • So NULL the lpVidMemHeap on error!
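A minimal sketch (not from the deck) of that failure path inside DdCreateSurface, assuming lpSurf is the surface's global structure and AllocateAgpMemory is a hypothetical helper:

    if (!AllocateAgpMemory(lpSurf))
    {
        lpSurf->fpVidMem     = 0;
        lpSurf->lpVidMemHeap = NULL;        /* NULL this, or Windows 2000 ignores the error */
        lpCreateSurface->ddRVal = DDERR_OUTOFVIDEOMEMORY;
        return DDHAL_DRIVER_HANDLED;
    }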
atomic surface creation
Atomic Surface Creation
  • On Windows 9x, drivers are given a list of surfaces
  • On Windows NT, drivers are given surfaces one-at-a-time, unless:
    • Driver reports GUID_NTPrivateDriverCaps
    • and sets DDHAL_PRIVATECAP_ATOMICSURFACECREATION
windows nt extra
Windows NT Extra
  • You can use GUID_NTPrivateDriverCaps to request notification of primary surface creation:
    • Set DDHAL_PRIVATECAP_NOTIFYPRIMARYCREATION
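A minimal sketch (not from the deck) of reporting both private caps from the DdGetDriverInfo callback; the structure and field names are as I recall them from the NT DDK headers, so treat them as assumptions to verify:

    if (IsEqualGUID(&lpGetDriverInfo->guidInfo, &GUID_NTPrivateDriverCaps))
    {
        DD_NTPRIVATEDRIVERCAPS privCaps;

        memset(&privCaps, 0, sizeof(privCaps));
        privCaps.dwSize        = sizeof(privCaps);
        privCaps.dwPrivateCaps = DDHAL_PRIVATECAP_ATOMICSURFACECREATION |
                                 DDHAL_PRIVATECAP_NOTIFYPRIMARYCREATION;

        lpGetDriverInfo->dwActualSize = sizeof(privCaps);
        memcpy(lpGetDriverInfo->lpvData, &privCaps,
               min(lpGetDriverInfo->dwExpectedSize, sizeof(privCaps)));
        lpGetDriverInfo->ddRVal = DD_OK;
    }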
system to video blts
System-To-Video Blts
  • To speed up some titles, implement system-to-video blts
  • All you need to implement is SRCCOPY, no stretch
    • But you should implement sub-rects
  • DirectDraw assumes your driver requires system memory to be page-locked during Blt
    • If this is not true, set DDCAPS2_NOPAGELOCKREQUIRED
heapvidmemallocaligned
HeapVidmemAllocAligned
  • It’s an “Eng” function in Windows NT versions
  • It’s a ddraw.dll export in Windows 9x
  • You can use this to allocate surface memory
  • You must have passed the heap to DirectDraw previously
  • You must fill in the fpHeapOffset, fpVidMem, and lpVidMemHeap of the surface
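A minimal sketch (not from the deck), assuming lpHeap is a heap the driver already reported to DirectDraw and lpSurf is the surface's global structure; whether fpVidMem equals the heap offset is driver-specific, so that line is an assumption:

    LONG    lNewPitch;
    FLATPTR fpOffset = HeapVidMemAllocAligned(lpHeap,
                                              dwPitchInBytes,   /* width for the allocator */
                                              dwSurfaceHeight,
                                              NULL,             /* no special alignment */
                                              &lNewPitch);
    if (fpOffset != 0)
    {
        lpSurf->fpHeapOffset = fpOffset;     /* heap-relative offset (see next slide) */
        lpSurf->fpVidMem     = fpOffset;     /* assumed identical mapping here */
        lpSurf->lpVidMemHeap = lpHeap;
        lpSurf->lPitch       = lNewPitch;
    }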
heap offsets explained
Heap Offsets Explained

[Diagram: return values from HeapVidmemAllocAligned are offsets into the heap, which runs from fpStart (shown as "0") up to fpEnd (which points TO the last byte). The value returned from HeapVidmemAllocAligned, stored in fpHeapOffset, locates the surface within the heap. Note that fpStart is set to 0x1000 by DirectDraw for AGP heaps.]

ddscaps videomemory
DDSCAPS_VIDEOMEMORY
  • Remember that this includes AGP unless combined with DDSCAPS_LOCALVIDMEM
  • At GetAvailDriverMem time, a request that specifies DDSCAPS_VIDEOMEMORY (and not any explicit type: local or non-local) should include both types in the total
getscanline
GetScanLine
  • Implement this, if you can!
  • DirectX 8.0 uses it a lot for presentation-Blt timing
  • Set DDCAPS_READSCANLINE, so DirectX 8.0 knows
createsurfaceex
CreateSurfaceEx
  • More on this later
  • NEVER fail CreateSurfaceEx for system memory surfaces, even if you don’t understand the pixel format
    • Just return DDHAL_DRIVER_HANDLED and DD_OK
    • (Otherwise new system-memory formats used by the reference rasterizer can’t be created)
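A minimal sketch (not from the deck) of that rule inside the CreateSurfaceEx callback, assuming lpcsxd is the CreateSurfaceEx data and the system-memory check goes through the surface's caps:

    if (lpcsxd->lpDDSLcl->ddsCaps.dwCaps & DDSCAPS_SYSTEMMEMORY)
    {
        /* Record the handle if the format is understood; otherwise ignore it,
           but still succeed so new reference-rasterizer formats can be created. */
        lpcsxd->ddRVal = DD_OK;
        return DDHAL_DRIVER_HANDLED;
    }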
alpha in the primary
Alpha-In-The-Primary
  • If your driver can do this in 32bpp:
    • Create an A8R8G8B8 render target
    • Blt that to the primary surface IGNORING the alpha channel
      • (And stretch/shrink (please))
  • Then you should set:
    • DDHALINFO.vmiData.ddpfDisplay.dwFlags |= DDPF_ALPHAPIXELS
    • DDHALINFO.vmiData.ddpfDisplay.dwRGBAlphaBitMask = 0xFF000000
windowed applications and blt queuing
Windowed Applications And Blt Queuing
  • Don’t allow “many” presentation-blts in your queue
    • That is, don’t allow a large latency between scheduling and retiring a presentation-blt
  • WHQL enforces low latency for DirectX 8.0 drivers
    • Check DDBLT_PRESENTATION, and don’t allow more than three
  • More info in ddraw.h
ddblt wait and ddblt donotwait
DDBLT_WAIT And DDBLT_DONOTWAIT
  • Drivers should never look at these
  • They are set by the application/ DirectDraw runtime
  • They are handled by the DirectDraw runtime
    • Sometimes DirectDraw spins, and wants to do that in user-mode
    • Applies to DDFLIP_WAIT as well
ddblt async
DDBLT_ASYNC
  • Ignore this flag
  • Always perform your blts asynchronously, if possible
what are ddrops
What Are DDROPS?
  • We don’t know either
  • An idea of the original designer of DirectDraw, but never implemented or specified
  • In short: ignore!
blt and yuv surfaces
Blt And YUV Surfaces
  • DirectShow can gain performance benefits if it knows it can use Blt to copy Overlay surfaces
  • Check to see if you can support DDCAPS2_COPYFOURCC
  • This means you can SRCCOPY, no sub-rects, no stretch, no overlap between two FOURCC surfaces of the same type
update overlay etc
Update Overlay, Etc.
  • If multiple overlays are created, but you have hardware for only one:
    • Succeed all CreateSurface calls
    • Fail the UpdateOverlay call
flip flags
Flip Flags
  • DDFLIP_NOVSYNC
    • This means: flip immediately; do not wait for vertical blank
    • The hardware must be capable of re-latching the new primary surface address immediately, or at least on the next scanline
    • In other words, don’t allow the remaining raster scans to read from the old back buffer
flip flags1
Flip Flags
  • DDFLIP_INTERVALn
    • Please don’t implement by busy-waiting in the driver
    • But please do implement if your hardware can defer flips for n frames
gamma ramps
Gamma Ramps
  • DirectDraw and Direct3D’s gamma ramps are passed through the GDI DDI call SetDeviceGammaRamp
  • This call is poorly prototyped
  • This is the struct you will be passed:

    struct {
        WORD red[256];     // WORDs, not BYTEs
        WORD green[256];
        WORD blue[256];
    };
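A minimal sketch (not from the deck) of consuming that struct, assuming the ramp arrives through the DrvIcmSetDeviceGammaRamp DDI as an untyped pointer and ProgramHardwareLut is a hypothetical helper:

    typedef struct {
        WORD red[256];      /* WORDs, not BYTEs */
        WORD green[256];
        WORD blue[256];
    } GAMMA_RAMP;

    BOOL APIENTRY DrvIcmSetDeviceGammaRamp(DHPDEV dhpdev, ULONG iFormat, LPVOID lpRamp)
    {
        GAMMA_RAMP *pRamp = (GAMMA_RAMP *)lpRamp;   /* the "poorly prototyped" part */

        return ProgramHardwareLut(dhpdev, pRamp->red, pRamp->green, pRamp->blue);
    }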

overview of directx 7 0 model
Overview Of DirectX 7.0 Model
  • Direct3D refers to surfaces via “handles”
  • Driver keeps a look-up table indexed by handle
  • Driver keeps everything it needs to know about a surface in this table
createsurfaceex1
CreateSurfaceEx
  • Called after CreateSurface
  • Assigns a Direct3D-allocated handle to the surface(s)
  • Driver runs attachment lists, creates internal structures for each surface in list
createsurfaceex is hard
CreateSurfaceEx Is Hard
  • Driver has to run surface attachment list
  • Z buffer might be attached, or separate surface
  • Cubic Environment Maps are the hardest...
cubemap attachments abstract view

Cubemap Attachments (Abstract View)

[Diagram: a cube map has six face surfaces (Positive X, Negative X, Positive Y, and so on), and each face carries its own chain of mip sub-levels.]

cubemaps struct view

Cubemaps (Struct View)

[Diagram: each face surface (Positive X, Negative X, Positive Y, ...) has an lpAttachList pointing to attachment-list entries, chained through lpLink pointers, that lead to that face's mip sub-level surfaces; e.g., the +X face's list reaches the +X mip chain and the -X face's list reaches the -X mip chain.]

drivers cannot
Drivers Cannot
  • Keep pointers to DirectDraw’s surface structures in their own structures
  • Flip confusion (explained later)
  • Overhead
    • Under DirectX 8.0, we don’t keep the DirectDraw structure
    • ...So DirectX 8.0 drivers CAN’T store pointers – they will crash
flip confusion explained
Flip Confusion Explained

[Diagram, before flip: the user-mode Front Buffer structure (Handle A) refers to driver Surface A, and the user-mode Back Buffer structure refers to driver Surface B.]

after flip
After Flip

[Diagram, after flip: the user-mode Back Buffer structure (Handle A) and the user-mode Front Buffer structure (Handle B) now refer to driver Surface A and Surface B swapped relative to before.]

The user-mode structures now refer to different pieces of memory. => You cannot store pointers to the user-mode structs in the driver structs.

aliasing what it is
Aliasing: What It Is
  • Video memory is a shared resource
  • On mode switch, all must be given up
  • But the application may be writing directly to video memory
  • We re-map the application’s view of video memory to a dummy page, then allow the mode switch to proceed
    • Only done at app’s request: DDLOCK_NOSYSLOCK
aliasing how it s done
Aliasing: How It’s Done
  • When the driver returns a pointer to video memory at CreateSurface time:
    • The offset into the frame buffer is calculated, and then an equivalent aliased pointer is returned to the application
    • If the pointer lies outside of video memory, no aliasing is done (we don’t know enough to do so)
aliasing how to break it
Aliasing: How To Break It
  • On Windows NT systems, the driver must NOT return a pointer outside of video memory at Lock time
    • This pointer will not be aliased
    • The application will crash if a mode switch happens
  • Drivers should allocate system memory at CreateSurface time (PLEASE_ALLOC_USERMEM)
driver capabilities are constant across modes
Driver Capabilities Are Constant Across Modes
  • This means everything in D3DCAPS8
  • The caps are allowed to be “nothing” in some modes, e.g., 24bpp
  • You are allowed to support different back buffer formats
    • That is, the one that matches the front buffer
pixel formats in directx 8 0

Pixel Formats In DirectX 8.0
  • Goodbye DDPIXELFORMAT
  • Hello D3DFORMAT
    • All FOURCCs are D3DFORMATs
    • D3DFMT has this form: Bytes 3-2 hold the Vendor ID (0 = Microsoft; use your PCI Vendor ID) and Bytes 1-0 hold the format number; a nonzero Byte 2 marks the value as a FOURCC

d3dformat examples
D3DFORMAT Examples
  • D3DFMT_A1R5G5B5
    • 0x00000019
  • IHV-defined Format
    • 0xACAT0001
    • (PCI ID 0xACAT, not FOURCC, format 1)
  • FOURCC “UYVY”
    • 0x55595659
    • (Byte 2 is non-zero)
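A minimal sketch (not from the deck): a hypothetical helper macro that builds an IHV-defined D3DFORMAT value matching the layout above, with the PCI vendor ID in the top 16 bits and the format number in the low 16 bits:

    #define MAKE_IHV_D3DFMT(wPciVendorId, wFormatNumber) \
        ((DWORD)(((DWORD)(wPciVendorId) << 16) | (WORD)(wFormatNumber)))

    /* Example: vendor 0x1234, private format 1 -> 0x12340001 */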
ihv def d texture formats
IHV-Def’d Texture Formats
  • Since Direct3D doesn’t understand these formats:
    • These formats cannot be “managed”
    • Applications can lock thesesurfaces directly
    • (In fact this is the only way to fill such surfaces with data)
directx 8 0 format op list
DirectX 8.0 Format Op-list
  • The format op-list tells DirectX 8.0 everything about capabilities that vary with surface format
  • For each format, the driver sets bits that indicate:
    • Can Texture from this format
    • Render to this format
    • Switch display mode to this format
    • Has caps in modes of this format
format op list tricks
Format Op-List Tricks
  • The runtime searches for the first entry that has all required capabilities
  • Example: Application wishes to render to 565 texture
  • Runtime will search for an Op-List entry with:
    • D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
format op list tricks1
Format Op-List Tricks
  • Driver A can render to 565 texture
  • Sets this entry:
    • Format = D3DFMT_R5G6B5
    • Ops = D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
format op list tricks2
Format Op-List Tricks
  • Driver B can NOT render and texture from the same surface, but can do both operations individually
  • Sets TWO entries
    • Format1 = D3DFMT_R5G6B5
    • Ops1 = D3DFORMAT_OP_TEXTURE
    • Format2 = D3DFMT_R5G6B5
    • Ops2 = D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
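A minimal sketch (not from the deck) of the two drivers' entries as a simple {format, ops} table; the FORMAT_OP_ENTRY type is hypothetical (the real DDI reports formats through the driver's surface-format list), but the flag combinations are the point:

    typedef struct { DWORD Format; DWORD Ops; } FORMAT_OP_ENTRY;   /* hypothetical */

    /* Driver A: one entry; can texture from AND render to the same R5G6B5 surface. */
    static const FORMAT_OP_ENTRY DriverA[] = {
        { D3DFMT_R5G6B5, D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET },
    };

    /* Driver B: two entries; can do each operation, but not on the same surface. */
    static const FORMAT_OP_ENTRY DriverB[] = {
        { D3DFMT_R5G6B5, D3DFORMAT_OP_TEXTURE },
        { D3DFMT_R5G6B5, D3DFORMAT_OP_OFFSCREEN_RENDERTARGET },
    };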
what can you do next
What Can You Do Next?
  • If you develop DX Graphics Drivers:
    • You need a relationship with Microsoft’s DirectX team, and should contact IHV Program Manager:
      • Michele Boland (MBoland@microsoft.com)
    • Install and run against DEBUG runtimes
      • Available in the DirectX SDK
      • Will output debug messages for common errors
slide54

DirectX VA Video Acceleration Drivers
Gary Sullivan (GarySull@microsoft.com)
Software Design Engineer
DMD Video Services Group
Microsoft Corporation

agenda1
Agenda
  • DirectX VA design and status
  • Current and future requirements and tests
  • Future plans and potential extensions
  • What can you do next?
directx va prime directive

DirectX VA Prime Directive

  • Decouple software decoder operation from hardware accelerator design to achieve full interoperability

[Diagram: decoders for MPEG-1, MPEG-2, MPEG-4, H.261, H.263++, and any other codec all drive a common accelerator pipeline of VLD, inverse DCT, and motion compensation.]

what is dxva what can it achieve
What Is DXVA? What Can It Achieve?
  • Interoperable interface between video decoding software and advanced-capability graphics accelerators
  • Increases video capability for the consumer’s PC
  • Increases the demand for advanced graphics accelerators and video applications
  • Decreases implementation effort for software decoder writers
  • Decreases support burden for graphics accelerator companies
  • Decreases testing burden for OEMs
directx va general status
DirectX VA General Status
  • Spec went 1.0 with DirectX 8.0 Beta 2 (October ’00)
  • See http://www.microsoft.com/hwdev/DirectX_VA
  • OEMs love it – it enables separate WHQL qualification of decoders and drivers
  • Software decoder companies are developing with it (Mediamatics, Intervideo, Ravisent, Cyberlink, MGI/Zoran, MbyN, …)
  • Hardware accelerator companies are supporting it in drivers (ATI, Nvidia, Intel, SiS, S3, SiliconMotion, …)
directx va capabilities
DirectX VA Capabilities
  • Emphasis on MPEG-2 and DVD “sub-picture”
  • Support of all important video coding standards (H.261, H.263, MPEG-1, MPEG-2, MPEG-4)
    • And some non-standard variations on the standards
  • Alpha graphic blending (e.g., DVD subpicture)
  • Three basic degrees of decoding configuration capability:
    • Motion compensation on accelerator with host residual difference decoding
    • Motion compensation and IDCT on accelerator
    • Full raw bitstream decoding
  • Externally-defined encryption support
how does dxva operate
How Does DXVA Operate?
  • Operation with Windows 2000 Overlay Mixer (OVM) or new Windows XP Video Mixing Renderer (VMR)
  • Requires DirectX 8.0 or Windows XP
  • Decoders use it through existing Windows 2000 “IAMVideoAccelerator” API
  • Drivers use it through corresponding Windows 2000 “MoComp” DDI
  • DirectX VA specifies payload content of data buffers that previously had accelerator-specific formats
host versus accelerator functional split
Host Versus Accelerator Functional Split
  • Bitstream processing either on host or accelerator
  • Accelerator handles the primary data flow and performs the intensive signal processing
  • PCI/AGP is the bridge between the two
  • Reconstruction loop maintained in graphics Accelerator memory
  • Host processing converts standard-specific streams into generic Accelerator work units
today s directx va
Today’s DirectX VA

[Diagram: a compressed video source passes through variable-length decoding, residual difference decoding (IDCT), motion compensation, and sum & clip into frame storage, which feeds OVM/VMR/3D; a graphic source passes through a graphic decoder and graphic blending. Content protection is supported outside of scope.]

constrained parameter profiles
Constrained Parameter Profiles
  • Strategy is to define a general interface and a number of constrained-parameter profiles, with decoder data structure configuration settings
  • Profiles defined:
    • MPEG-2 Main Profile with and without DVD Subpicture
    • Several H.263/MPEG-4 profiles
    • MPEG-1
    • H.261 with and without deblocking post-processing
defined buffer types
Defined Buffer Types
  • Picture-level decoding parameter buffers
  • Buffers for bitstream decoding:
    • Bitstream data buffers
    • Bitstream slice control buffers
    • Inverse quantization matrix buffers
  • Buffers for macroblock-level decoding:
    • Macroblock control buffers
    • Residual difference data buffers
  • Buffers for graphic blending:
    • Alpha+YUV graphic buffers
    • AI44 graphic buffers
    • DVD DPXD graphic buffers
    • DVD highlight definition buffers
    • DVD display control command buffers
    • Alpha blend combination buffers
  • Deblocking filter control buffers
  • Picture resampling buffers
  • Read-back data buffers
dxva requirement plans primary goals
DXVA Requirement Plans: Primary Goals
  • Clear specification for MPEG-2 interoperability (and front-end DVD subpicture) is the primary goal
  • Driver and decoder that claim video acceleration must support DXVA
  • Specific “minimal interoperability set” for each defined profile
july 01 stated requirements
July ’01 Stated Requirements
  • MPEG2_A and MPEG2_C required
  • MPEG1_A required
  • H263_A required (?!)
  • Arithmetic accuracy required
  • IDCT accuracy required
  • Picture resolutions up to 720x576
  • Uncompressed surface types must include NV12 in supported list
  • Must have “front end” capability to convert to YUY2 from format in use
july 01 actual tests
July ’01 Actual Tests
  • StRowe test decoder developed
  • Test driver also developed
  • Released DCT400 driver tests cover MPEG2_A, _B, _C, _D profiles
  • Pass/Fail based on MPEG2_A and _B
  • Tests are currently of functional operation and visual performance
  • Contact us (?!) if any test problems
  • Don’t ship untested features (?!)
structure of motion comp data
Structure Of Motion Comp Data
  • All standards send only luma motion vectors, deriving chroma vectors from luma vectors
  • Each standard derives chroma vectors in its own way
  • Switches for configuring the motion comp are provided to minimize host “translation” requirements
  • MPEG-2 Dual-Prime motion vectors derived on host
dxva macroblock control example
DXVA Macroblock Control Example

/* Basic form for P and B pictures */
typedef struct _DXVA_MBctrl_P_OffHostIDCT_1 {
    WORD          wMBaddress;
    WORD          wMBtype;
    DWORD         dwMB_SNL;
    WORD          wPatternCode;
    UINT8         NumCoef[6];
    DXVA_MVvalue  MVector[4];
} DXVA_MBctrl_P_OffHostIDCT_1;
structure of residual data background 1 of 2
Structure Of Residual Data: Background (1 of 2)
  • Things that vary within and across standards:
    • Coefficient scan schemes
    • Intra Coefficient prediction schemes
    • VLC schemes
    • Inverse quantization schemes
    • Mismatch-control schemes
  • These things need lots of logic – not always justified for accelerator implementation
structure of residual data background 2 of 2
Structure Of Residual Data: Background (2 of 2)
  • Things that do not vary within and across standards
    • IDCT definition
      • Conformance rules may slightly differ – but multi-standard conformance not a big problem
    • Many zero-valued coefficients
    • Predicted-versus-Intra operation
    • Only a few currently-specified inverse scans
structure of residual data the chosen method
Structure Of Residual Data: The Chosen Method
  • Keep standard-specific issues on the host to the extent possible
  • Support host-based or accelerator-based IDCT
  • Send only non-zero coefficients
  • Send index or run-length for coefficients
residual difference example off host idct 16b tcoeff
Residual Difference Example (Off-Host IDCT, 16b TCOEFF)

typedef struct _DXVA_TCoefSingle {
    WORD  wIndexWithEOB;
    SHORT TCoefValue;
} DXVA_TCoefSingle, *LPDXVA_TCoefSingle;

/* Macros for reading EOB and index values */
#define readDXVA_TCoefSingleIDX(ptr)  ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr)  ((ptr)->wIndexWithEOB & 1)

/* Macros for writing EOB and index values */
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) \
    ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
#define setDXVA_TCoefSingleIDX(ptr, idx) ((ptr)->wIndexWithEOB |= ((idx) << 1))
#define setDXVA_TCoefSingleEOB(ptr)      ((ptr)->wIndexWithEOB |= 1)
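A minimal usage sketch (not from the deck) of the macros above, writing coefficient value -3 at inverse-scan index 5 and marking it as the last coefficient of the block:

    DXVA_TCoefSingle coef;

    writeDXVA_TCoefSingleIndexWithEOB(&coef, 5, 1);   /* index 5, EOB = 1 */
    coef.TCoefValue = -3;

    /* Reading back: readDXVA_TCoefSingleIDX(&coef) == 5,
                     readDXVA_TCoefSingleEOB(&coef) == 1 */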
decoding configurations part 1 of 2
Decoding Configurations (Part 1 of 2)
  • Bitstream decoding vs. Host VLD
  • Encryption:
    • Bitstream data if bitstream decoding
    • Macroblock control commands and/or residual difference data if Host VLD
    • Type of encryption protocol supported
  • For Host VLD:
    • Host-based residual difference decoding versus Accelerator-based residual difference decoding versus both
    • Macroblock control commands in raster-scan order versus arbitrary order
decoding configurations part 2 of 2
Decoding Configurations (Part 2 of 2)
  • For host-based residual difference decoding
    • 8b vs. 16b differences
    • If 8b differences, overflow supported, or not
    • If 8b differences, subtract second pass, or not
    • Interleaved chroma or not
    • Host clips range of data, or not
    • Intra residuals unsigned, or not
  • For accelerator-based difference decoding
    • Specific IDCT support
    • Inverse scan on host or accelerator
    • Coefficients sent in groups of four, or singly
alpha blending configurations
Alpha Blending Configurations
  • AYUV alpha blend graphic loading
    • AI44 or IA44 + palette, or DPXD + Highlight, or AYUV
  • Alpha blend combination operation:
    • Front-end versus back-end
    • Picture resizing or not
    • Only use picture destination area or not
    • Graphic resizing or not
    • Whole plane alpha or not
longer term requirements
Longer Term Requirements
  • Include H263_A, _B, _C in tested requirements
  • Include mathematical motion comp and IDCT accuracy in tests
  • Add speed performance testing
  • Picture resolutions up to 1920x1088
  • Six or more uncompressed surfaces
  • Specific FOURCC surface types for uncompressed surfaces
kill superfluous configs
Kill Superfluous Configs
  • bConfigRasterOrder = 0
  • bConfigResidDiffHost = 1
  • (bConfigResid8Subtraction = 1 with bConfigSpatialResid8 = 1) or (bConfigResidDiffHost = 1 with (bConfigSpatialResid8 = 0 and bConfigSpatialHost8or9Clipping = 0))
  • bConfigIntraResidUnsigned = 0
  • bConfigSpatialResidInterleaved = 0
  • bConfigHostInverseScan = 0
  • bConfig4GroupedCoefs = 0
enhance blending configs
Enhance Blending Configs
  • Eliminate duplication of AI44 & IA44 (bConfigDataType = 0 & 1)
  • Require both AYUV and AI44/IA44 (bConfigDataType = 3 and 0/1)
  • Require front-end blend (bConfigBlendType = 0)
  • bConfigPictureResizing = 1
  • bConfigOnlyUsePicDestRectArea = 0
  • bConfigGraphicResizing = 1
  • bConfigWholePlaneAlpha = 1
hot issue wmv h 263 mpeg 4
Hot Issue: WMV/H.263/MPEG-4
  • Codecs beyond MPEG-2 need support
  • H263_A profile needs:
    • Different derivation of chroma motion
  • H263_B profile needs:
    • Rounding control
    • Motion vectors over picture boundaries
    • 8x8 motion vectors
    • Alternative inverse scan (or host inverse scan)
  • H263_C profile needs:
    • Deblocking filter support (also in H263_B?!)
desirable future extensions
Desirable Future Extensions
  • De-interlacing
  • Interoperable encryption / DRM
  • Compressed-video encoding (including ME, DCT, and so on)
  • Inverse-telecine
  • Hue/contrast/brightness/gamma/color corrections
  • Future decoding methods (MPEG-4v2, WMV, H.26L)
  • Frame rate conversion
  • Precise separable re-sampling
  • Gen lock/frame rate synchronization
  • TV out control
new guids reducing memory use
New GUIDs: Reducing Memory Use
  • Add three new GUIDs to parallel MPEG2_A, MPEG2_B, and MPEG2_D
  • New GUID adds raw bitstream decoding to the “minimal interoperability set” of the corresponding existing GUID
  • Driver with raw bitstream support then need not allocate buffers for macroblock-level processing with these GUIDs
  • Drivers could also choose not to expose bitstream processing with the existing GUIDs, to save memory
interoperable encryption
Interoperable Encryption
  • Define an interoperable encryption scheme
  • Much like the old draft DXVA scheme
  • Certificates for establishing trust (perhaps X.509 or something else rather than old draft scheme)
  • RSA key exchange
  • AES (RIJNDAEL) content encryption
other in scope additions
Other In-Scope Additions
  • Add new features for other codecs – WMV, H.26L, MPEG-4v2, etc.
    • 1/4-sample motion comp
    • Added motion comp sizes and shapes
    • New inverse transforms (e.g., 4x4)
    • Fine granularity scalability
    • Global motion comp
    • Studio profile features
  • More possible GUIDs for precise codec/configuration needs
new video building blocks
New Video Building Blocks
  • Deinterlacing
  • Inverse Telecine
  • Frame rate conversion
  • Contrast/Brightness/Gamma/Color
  • Precisely-specified resampling
  • Video compression encoding
deinterlace inverse telecine
Deinterlace/Inverse Telecine
  • Deinterlace is crucial
  • Becoming a standard feature of high-end consumer TVs
  • 1080i in weave can look awful
  • 1080i in bob can look wrong too
  • Deinterlace can be useful for either decoding or encoding
hypothetical dxva structure
Hypothetical DXVA Structure

[Diagram: today's scope of DirectX VA (decoding) shown inside a larger hypothetical pipeline wrapped by interoperable DRM/conditional access/content protection/encryption, followed by de-interlace / inverse telecine (?), frame rate conversion (?), color conversions & adjustments (??), and scaling (???), feeding OVM/VMR/3D.]

video encoding
Video Encoding

[Diagram: a hypothetical encoding pipeline: an uncompressed video source passes through color conversions and adjustments and inverse telecine / de-interlace; motion estimation and mode & motion vector decision drive motion compensation, the residual difference transform (DCT), quantization, and variable length encoding; residual difference decoding (IDCT), sum and clip, and frame storage close the reconstruction loop.]

what can you do next to all give us your proposals
What Can You Do Next? (To All) Give Us Your Proposals
  • About any difficulties/problems in design
  • About encryption design
  • About new in-scope feature needs
  • About how to support new features
    • Deinterlace/inverse telecine
    • Encoding
    • Frame rate conversion
    • Contrast/Brightness/Gamma/Color
    • Resampling
what can you do next for graphic accelerator designers
What Can You Do Next? (For Graphic Accelerator Designers)
  • Make your MPEG-2 and DVD subpicture DXVA solution rock-solid, fully-tested with every available decoder, and frighteningly fast
  • Fully support YUV surfaces as textures for input to 3-D
    • Conversion to RGB, and so on
  • Design maximal WMV/H.263/MPEG-4 feature support into your next generation
    • But don’t expose them unless fully tested
  • Move to the preferred configurations and uncompressed surface types
  • Support new memory-conserving GUIDs
slide91
Writing AVStream Minidrivers For Windows XP
William Messmer, Software Design Engineer
Digital Audio-Video
Microsoft Corporation
agenda2
Agenda
  • AVStream minidriver architecture
    • When and why to use AVStream
    • Exposing minidriver functionality
  • Data processing
  • Writing a minidriver: key issues and pitfalls
    • Walk through sample code
    • Common problems and mistakes
    • DirectX 8.0 versus Windows XP
  • What can you do next?
why avstream
Why AVStream
  • THE next generation class driver
    • More efficient streaming
    • Reduces the amount of minidriver code
    • Simplifies development; faster to market
    • One minidriver, one model – no more confusion over stream class versus port class
  • New features, new technologies will only be supported in AVStream; stream and port class, however, are still supported!
when to use avstream
When To Use AVStream
  • BDA Drivers
  • New Device Types
    • Which are not already written to stream class or port class
  • Combined A/V devices
  • Kernel Software Transforms
    • Audio Global Effects (GFX) Filters
  • No necessity to port existing stream or port class drivers
minidriver architecture
Minidriver Architecture
  • Functionality is exposed as a tree hierarchy described through static descriptors
    • Device – described by Device Descriptor
    • Filter Factory – creates a type of Filter
    • Filter – described by Filter Descriptor
    • Pin Factory – creates a type of Pin
    • Pin – described by Pin Descriptor
  • Functionality provided through static dispatch and automation tables
minidriver architecture1

Minidriver Architecture

[Diagram: the Device (described by the Device Descriptor, with Add Device and Device Dispatch entries) owns one or more Filter Factories; each Filter Factory (described by a Filter Descriptor, with Filter Create and Filter Dispatch entries and a Filter Automation table) creates one or more Filters; each Filter owns one or more Pin Factories; each Pin Factory (described by a Pin Descriptor, with Pin Create and Pin Dispatch entries and a Pin Automation table) creates one or more Pins. Key: minidriver dispatch routines and minidriver-provided tables hook into public and private AVStream constructs.]

exposing minidrivers
Exposing Minidrivers
  • Expose your driver to AVStream
    • Call KsInitializeDriver in DriverEntry passing your Device Descriptor
    • Return the status from KsInitializeDriver
  • AVStream handles PnP to get your driver set up; minidriver gets calls through device dispatch
  • Filter Factories set up by AVStream during Add Device and Start Device
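A minimal sketch (not from the deck) of that DriverEntry, assuming MyDeviceDescriptor is the minidriver's KSDEVICE_DESCRIPTOR defined elsewhere:

    #include <ks.h>

    extern const KSDEVICE_DESCRIPTOR MyDeviceDescriptor;

    NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
    {
        /* AVStream takes over PnP/power; the minidriver is called back through
           the dispatch tables referenced by the device descriptor. */
        return KsInitializeDriver(DriverObject, RegistryPath, &MyDeviceDescriptor);
    }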
exposing minidrivers1
Exposing Minidrivers
  • AVStream creates filters/pins based on descriptors
    • Minidriver receives creation dispatch
    • Creation dispatch associates minidriver specific context with object
    • Object bags available as containers for dynamic memory like contexts
  • AVStream handles cleanup of objects based on bags
    • No forgetting to free dynamic memory
minidriver architecture2
Minidriver Architecture
  • Sample Code (Exposing Functionality)
data processing
Data Processing
  • AVStream queues data/buffers
    • Minidriver queues not necessary
    • Cancellation handled in the queue
    • Data exposed through two abstractions: stream pointers and process pins
      • Stream pointers are robust and allow versatile queue management; typically used in hardware drivers
      • Process pins work purely at a single buffer level making for very simple software transforms
design issues
Design Issues
  • Two distinct ways to handle data processing
    • Filter-Centric processing
      • Specify filter process dispatch
    • Pin-Centric processing
      • Specify pin process dispatches
  • The choice of which to use will influence design greatly
filter centric processing
Filter-Centric Processing
  • Filter is called to process data in a context where data is available on all required pins
  • Typically used for software transforms
  • Stream pointer use not required
  • Processing based on an index of process pins
    • Index/pins stable during processing
  • Minidriver does transform, specifies how many bytes of each buffer used
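A minimal sketch (not from the deck) of a filter-centric process dispatch that copies whatever fits from the first input pin instance to the first output pin instance; INPUT_PIN_ID and OUTPUT_PIN_ID are hypothetical pin factory indices from the filter descriptor:

    NTSTATUS FilterProcess(PKSFILTER Filter, PKSPROCESSPIN_INDEXENTRY ProcessPinsIndex)
    {
        PKSPROCESSPIN InPin  = ProcessPinsIndex[INPUT_PIN_ID].Pins[0];
        PKSPROCESSPIN OutPin = ProcessPinsIndex[OUTPUT_PIN_ID].Pins[0];

        ULONG Bytes = min(InPin->BytesAvailable, OutPin->BytesAvailable);

        RtlCopyMemory(OutPin->Data, InPin->Data, Bytes);

        /* Report how much of each buffer the transform used; AVStream calls
           back while data remains and the processing conditions still hold. */
        InPin->BytesUsed  = Bytes;
        OutPin->BytesUsed = Bytes;

        return STATUS_SUCCESS;
    }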
process pins
Process Pins
  • One per pin – points back to the pin
  • Contains a stream pointer if needed
  • Contains a buffer virtual address and size for data manipulation
  • Informs the process routine of the pin’s relationships with other pins
    • InPlaceCounterpart – other pin in an in-place transform pair
    • CopySource – pin data is copied from
    • DelegateBranch – pin that delegates frames (in the same pipe)
transform example

Transform Example

[Diagram: an IN/OUT transform walked through in steps: 1. Frame(s) arrive. 2. The filter is called to process and sees two process pins (INPUT and OUTPUT). 3. The process pins point to the buffers (the example shows a 2880-byte input frame holding 1920 bytes of data and a 960-byte output buffer). 4. The filter performs the transform and sets 1920 bytes used on input and output. 5. The filter is called back; there is more data to transform (1100 bytes of data and a 140-byte frame are also shown). 6. The process repeats similarly as consumed frames leave the queue ("frame gone") and new frames arrive.]

pin centric processing
Pin-Centric Processing
  • Each pin called to process data in a context independent of other pins
  • Typically used for hardware drivers
  • Data accessed through stream pointer abstraction
stream pointers
Stream Pointers
  • Reference a single frame in a queue
  • Hold that frame in the queue
  • Can be in multiple states
    • Locked – referenced data is safe to access; Irp cannot be cancelled
    • Unlocked – not guaranteed to even reference data; Irp can be cancelled
  • Can be cloned to create new pointers into the data stream
  • Can schedule time-outs
stream pointers1
Stream Pointers
  • Contain two offsets into the data stream for ease of in-place use
  • Address data at one of two granularities:
    • Byte – access via virtual address
    • Mapping – access via logical DMA address
      • KSPIN_FLAG_GENERATE_MAPPINGS
  • Minidriver usable context available per stream pointer
stream pointers and queues

Stream Pointers And Queues

[Diagram: three example queues of frames ordered from oldest to newest. Each queue has a leading edge stream pointer, and some also have a trailing edge and clone stream pointers holding particular frames; the count shown on each frame (0, 1, 2, 3) indicates how many stream pointers currently reference it.]

direct dma example
Direct DMA Example

[Diagram: the queue contents are shown evolving at each numbered step below.]

  • 1. Frame(s) arrive; minidriver called to process
  • 2. Processing routine acquires leading edge
    • KsPinGetLeadingEdgeStreamPointer
  • 3. Leading edge is cloned
    • KsStreamPointerClone
  • 4. DMA hardware is programmed
  • 5. Leading edge is advanced
  • 6. Process may repeat for more frames
  • 7. Hardware interrupts for DMA completion
  • 8. ISR schedules a DPC
  • 9. DPC releases the associated frames
    • KsStreamPointerDelete
  • 10. May need to continue processing
    • KsPinAttemptProcessing
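A minimal sketch (not from the deck) of steps 2 through 5 in a pin-centric process dispatch: acquire the leading edge, clone it so the frame stays in the queue, program the hardware (ProgramDmaForFrame is a hypothetical helper), then advance the edge:

    NTSTATUS PinProcess(PKSPIN Pin)
    {
        PKSSTREAM_POINTER Leading =
            KsPinGetLeadingEdgeStreamPointer(Pin, KSSTREAM_POINTER_STATE_LOCKED);

        while (Leading != NULL) {
            PKSSTREAM_POINTER Clone = NULL;

            if (!NT_SUCCESS(KsStreamPointerClone(Leading, NULL, 0, &Clone))) {
                break;                          /* try again when resources free up */
            }

            ProgramDmaForFrame(Pin->Context, Clone);    /* hypothetical hardware helper */

            /* Move the leading edge past this frame; the clone keeps the frame
               alive until the DMA-completion DPC calls KsStreamPointerDelete. */
            if (!NT_SUCCESS(KsStreamPointerAdvance(Leading))) {
                break;                          /* no more frames queued right now */
            }
        }

        return STATUS_SUCCESS;
    }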
data frame control
Data Frame Control
  • Held non-cancelable for a period
    • Use locked stream pointers
    • Consider stream pointer timeouts
  • Can relinquish claim with callback
    • Use unlocked stream pointers with a cancel callback
  • Periodic access where frame can disappear between accesses
    • Use unlocked stream pointers and lock periodically
processing decisions
Processing Decisions
  • Filter-Centric
    • All pins are involved in the decision
    • Each pin type can have separate requirements
    • One pin not fulfilling requirements will veto processing for the entire filter
  • Pin-Centric
    • Only one pin is involved in the decision
    • Each pin type can have separate requirements which do not influence other pins
when processing happens
When Processing Happens
  • Default case (no pin flags)
    • Attempt made when frame arrives and leading edge points to no frame
  • Attempt will succeed if
    • Involved pin(s) are >= KSSTATE_PAUSE
    • Involved pin(s) all have data
  • Continuing processing
    • STATUS_SUCCESS returned from dispatch and conditions still met
adjusting processing
Adjusting Processing
  • KSPIN_FLAG_
    • _INITIATE_PROCESSING_ON_EVERY…
      • Every frame arrival initiates
    • _DO_NOT_INITIATE_PROCESSING
      • No frame arrival initiates
    • PROCESS_IN_RUN_STATE_ONLY
      • Pin must be in KSSTATE_RUN
    • FRAMES_NOT_REQUIRED…
      • Data is not required on this pin
adjusting processing1
Adjusting Processing
  • Some mentioned flags usefulfor pin-centric
  • Most flags useful for filter-centric where all pins are involved in the decision as to when to process data
  • See the DDK for a complete description of flags
  • Understand when processing happens based on your flags!
adjusting processing2
Adjusting Processing
  • Processing can happen in a DPC!
    • KSFILTER_FLAG_DISPATCH_LEVEL_PROCESSING
    • KSPIN_FLAG_DISPATCH_LEVEL_PROCESSING
  • Dispatch level processing still synchronized
    • Processing mutex still held during dispatch level processing
    • Can still be used to synchronize with processing
  • Data manipulation (stream pointer) API fully dispatch level ready!
walkthrough sample code
Walkthrough Sample Code
  • Pin-centric sample code
common problems
Common Problems
  • Internal mutexes are exposed
    • Three mutex types in a hierarchy
      • Device Mutex
      • Filter Control Mutex
      • Processing Mutex
  • Some calls require mutexes held
    • Sometimes AVStream holds the mutex for you; sometimes you must hold the mutex!
    • See the DDK for this!
common problems1
Common Problems
  • Mutex Rules
    • Do NOT take mutexes out of order: device then control then processing
    • Do NOT take a mutex and call out – not for properties, not for anything!
    • Walking the object hierarchy requires mutexes held:
      • Device Mutex – device down to filter
      • Filter Control Mutex – filter down to pins
common problems2
Common Problems
  • Do not traverse the object tree (filters and pins) during processing!
    • KsFilterGetFirstChildPin
    • KsPinGetNextSiblingPin
  • Pin-centric filters should not need to do this; filter-centric filters have the process pins index
directx 8 0 versus windows xp
DirectX 8.0 Versus Windows XP
  • Mutexes in DirectX 8.0 are fast mutexes
    • Certain APIs require mutexes held
    • Client must be careful of when to acquire mutexes!
  • Mutexes in Windows XP arefull mutexes
    • Completely backwards compatible with DirectX 8.0 drivers
    • Fewer APIs require mutex acquisition
    • Mutex acquisition more lenient
directx 8 0 versus windows xp1
DirectX 8.0 Versus Windows XP
  • New flags in Windows XP
    • _SOME_FRAMES_REQUIRED…
      • One or more pin instances of this type requires frames
      • Can be programmatically done in DirectX 8.0
    • _PROCESS_IF_ANY_IN_RUN_STATE
      • One or more pin instances of this type must be >= KSSTATE_RUN; others must be >= KSSTATE_PAUSE
      • Processing routine must check in DirectX 8.0
what can you do next1
What Can You Do Next?
  • Install the DirectX 8.0 or Windows XP DDK
  • Try out the samples in the DDK
  • Write AVStream minidrivers for new hardware!
testing your wdm driver with directshow
Testing your WDM Driver with DirectShow

Eric Rudolph
Software Design Engineer
DirectShow Editing Services
Microsoft Corporation

agenda3
Agenda
  • DirectShow supports capture from 1394, USB, analog video/audio, TV tuner, and custom devices
    • Demonstrate the use of the DirectShow-based generic graph editor, GraphEdt, as a WDM driver test tool
    • Walk through sample code that uses the GraphBuilder COM object
what tools exist to test your driver
What tools exist to test your driver?
  • Included in DX8: GraphEdt.exe, a generic graph editor
  • Also in DX8: AmCap.exe, a simple capture application
  • New for Windows XP: Still Image devices show up in the shell (Explorer)
  • New for Windows XP: Movie Maker (on Start Menu)
graphedt overview
GraphEdt Overview
  • Ships with DX8
  • Provides UI to build dataflow graphs and then uses DirectShow to run, pause, and stop the data
  • Views different filter categories
    • Capture, compressor, crossbar, DMO, and so on
  • Connects different filters together
  • Accesses property pages
  • Writes out files
  • Controls 1394 devices
graphedt filter categories
GraphEdt Filter Categories
  • Categories enable you to easily find a particular type of DirectShow filter
  • Many categories predefined in ksuuids.h & uuids.h
  • WDM drivers have many of their own categories
  • Capture devices can show up in both non-WDM and WDM categories
  • As you add/remove WDM devices, if they send device notifications, they will auto show/hide from category lists
graphedt property pages
GraphEdt Property Pages
  • The filter itself can expose multiple property pages
  • Each pin can expose 1 or more property pages
  • When you query an output pin’s property pages, you will see 1 extra page per pin which lists available output media connection types
  • Capture property pages are often exposed by capture applications (using standard DirectShow methods), so make them look nice!

Example property page

graphedt property pages and media types
GraphEdt Property Pages And Media Types
  • Output pins provide one or more media types
  • Input pins normally do not provide a list of types, but instead accept types
  • When you render a pin, DirectShow will try to find appropriate filters to render
  • When you try to connect two pins, DirectShow will try to find intermediate filters
  • The media types must agree between any output pin and its connected input pin
  • Buffers are also negotiated

The different media types the Indeo 5.11 decompressor provides

slide130

Common Problems

  • Hot unplug while streaming
  • Device add/remove while streaming
  • Enter hibernation while streaming
  • Multiple camera enumeration
  • Multiple camera streaming (one driver, multiple devices)
  • Video shows up black or wrong
  • Changing display props while streaming
  • Overlay and DDraw issues
slide131

GraphEdt Demos, Part 1

  • Capture from USB, both with 1 pin and with 2 pins (capture & preview)
  • DV capture and device control
  • Device Insertion / Removal and how the Graph refreshes
graphedt demos part 2
GraphEdt Demos, Part 2
  • How to write AVI, WAV, and WM files
  • New Video Mixing Renderer has slightly different connection model than old Video Renderer
  • How to force a filter to produce a media type with a Type Enforcer
  • Timestamps are important!
  • Using .GRF files
sample code using the graphbuilder com object
Sample Code: Using the GraphBuilder COM Object
  • CaptureGraphBuilder makes connecting capture devices easy
  • See the AmCap sample code in the DX8/DirectShow SDK directory
  • Sample code walkthrough
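A minimal sketch (not from the deck), in C, of the pattern the AmCap sample uses: create the capture graph builder and a filter graph, add a capture filter obtained elsewhere from device enumeration, and render its preview pin (error handling omitted):

    #include <dshow.h>

    HRESULT BuildPreviewGraph(IBaseFilter *pCapFilter, IGraphBuilder **ppGraph)
    {
        ICaptureGraphBuilder2 *pBuilder = NULL;
        IGraphBuilder         *pGraph   = NULL;
        HRESULT hr;

        CoCreateInstance(&CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC_SERVER,
                         &IID_ICaptureGraphBuilder2, (void **)&pBuilder);
        CoCreateInstance(&CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                         &IID_IGraphBuilder, (void **)&pGraph);

        pBuilder->lpVtbl->SetFiltergraph(pBuilder, pGraph);
        pGraph->lpVtbl->AddFilter(pGraph, pCapFilter, L"Capture");

        /* Let the builder insert whatever intermediate filters are needed. */
        hr = pBuilder->lpVtbl->RenderStream(pBuilder, &PIN_CATEGORY_PREVIEW,
                                            &MEDIATYPE_Video, (IUnknown *)pCapFilter,
                                            NULL, NULL);

        pBuilder->lpVtbl->Release(pBuilder);
        *ppGraph = pGraph;                   /* caller runs and releases the graph */
        return hr;
    }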
what can you do next2
What Can You Do Next?
  • Test your WDM drivers! Under many different conditions!
  • Read up on the DX8 docs, they’re great!
  • DirectShow contact:stanpenn@microsoft.com
  • Get on the DirectX A/V list