Changes to Sizing Spread Sheet for Documentum 5.3

jgoodwin

Presentation Transcript


  1. Changes to Sizing Spread Sheet for Documentum 5.3 Documentum Performance Group

  2. Agenda • Changes to the Customer Input Page • Changes to the Output Page • Some Sizing Examples

  3. Changes to the Customer Input Page • App server cluster support in WDK/Webtop • Fulltext query rate • Fulltext space

  4. App Server Cluster support overhead Will Factor in CPU cost associated with Session Serialization in Clustered HA environment

  5. 5.3 Sizing changes for WDK • 5.3 Webtop consumes 40% more CPU than 5.2.5 • Due partly to inclusion of new features (drag & drop) and infrastructure changes • This overhead is being reduced for SP1; the sizing spreadsheet for SP1 will reflect this • 5.3 App Server cluster support has an additional 50% overhead • This is due to the cost of replicating state • This is the worst case: memory-based replication (between two App Servers) • To be reduced in 5.3 SP1; will be reflected in the SP1 spreadsheet
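
As a rough illustration of how the two overheads on this slide might be applied to an existing 5.2.5 app-server estimate, here is a minimal sketch. The slide does not say how the 40% and 50% figures combine; this sketch assumes they multiply, and the function name `scaled_cpus` is hypothetical.

```python
# Relative CPU factors quoted on the slide.
WEBTOP_53_VS_525 = 1.40  # 5.3 Webtop uses 40% more CPU than 5.2.5
CLUSTER_OVERHEAD = 1.50  # clustering adds a further 50% (worst case)


def scaled_cpus(baseline_cpus: float, clustered: bool) -> float:
    """Scale a 5.2.5 app-server CPU estimate to 5.3.

    Assumption (not stated on the slide): the overheads multiply.
    """
    factor = WEBTOP_53_VS_525 * (CLUSTER_OVERHEAD if clustered else 1.0)
    return baseline_cpus * factor


if __name__ == "__main__":
    for clustered in (False, True):
        cpus = scaled_cpus(4, clustered)
        print(f"clustered={clustered}: {cpus:.1f} CPUs")
```

Under that assumption, a 4-CPU 5.2.5 estimate grows to about 5.6 CPUs unclustered and about 8.4 CPUs clustered.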

  6. Fulltext query rate Will Factor in CPU cost associated with large numbers of full text queries

  7. Fulltext indexing Characteristics Most sizing requests specify docs/day, but normally that load is not for 365 days out of the year

  8. Fulltext indexing Characteristics Will Factor in CPU cost and Disk I/O associated with the indexing portion of fulltext

  9. Fulltext indexing Characteristics: Options • None = No full text indexing enabled • Immediate Indexing = Attempt to minimize index time from ‘save’ to ‘searchable’ • Default for 5.3 • Expensive in terms of disk space, CPU utilization, and I/O • Delayed Indexing = Attempt to reduce disk space, memory, or CPU utilization at the cost of higher ‘save to searchable’ latency • Initial Focus: Transient Disk Space tuning • Requires some detailed Index Server tuning

  10. Transient Fulltext Index Space Tuning • Transient Space needs for building a large partition with all documents • Transient Space needs for building four small equal sized partitions within Index • More information on this tuning to be provided in an FAQ

  11. Fulltext space consumption Will Factor in content information for fulltext disk space and CPU calculations

  12. Will at times factor in known platform differences

  13. Output Page changes Hardware resources needed for Index Agent and Index Server

  14. Option #2 is changed to reflect likely “Content Server and Indexing Servers” on same host scenario

  15. Example Option #2 • Content Server & Indexing software on same host • Pros: - Easy to install and administer - Grow Capacity by adding more CPUs, disk, and memory • Cons: - Resource contention risks - Footprint of Indexing subsystem could exceed excess capacity of a pre-5.3 production system • [Diagram: Content Server, Content, Index Agent, Staging Area, Index Server (FAST), Index; flows labeled Dftxml msg, Meta data & content, Query & results]

  16. Option #3 is changed to add Index Agent/Index Server on separate host scenario Note: The initial release will not cover multi-node configurations of the Index Agent/Server

  17. Additional Supported Scenarios for FCS • All Full Text Components on a Separate host • Pros: • Separates resource consumption of the “new” 5.3 full text from the rest of Content Server • Likely to arise in upgrade scenarios from 5.2.X • Cons: • Additional server required • [Diagram: Content Server, Content, Index Agent, Staging Area, Index Server (FAST), Index; flows labeled Dftxml msg, Meta data & content, Query & results]

  18. Sizing Exercises • Generic document repository (< 2 million docs) • Large system: 100,000 docs/day

  19. Generic Document repository • Provided System characteristics: • Upgrade from 5.2.5 (repository already existing) • Total size of system < 1 million objects • Total content Size = 240 GB • Ingest: ½ GB/day • Approximately 1,000 objects/day • Average file size ½ MB • Less than 1000 users (20 active at any one time)

  20. Questions: How much of the content might be fulltext indexable? • Check size and number of objects by format • Example: • 40 GB of the 240 GB of content is in a format that can be indexed • Less than 500,000 objects have content that can be indexed • About 360,000 objects have content that can’t be indexed • At least 102 separate formats! • However, Word and PDF dominate the content space that can be fulltext indexed (90%) • All objects have at least their meta-data indexed

  21. Enter average size, number of docs, and whether content can be indexed for 4 rows below: • Word: 106K byte average, 160,000 docs, content indexed=Y • PDF: 352K bytes average, 56,000 docs, content indexed=Y • Other: 20K bytes average, 275,000 docs, content indexed=Y • Images: 550K bytes average, 360,000 docs, content indexed=N
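
The four rows above can be cross-checked against the figures quoted on the earlier slides (about 240 GB in total, about 40 GB indexable, fewer than 500,000 indexable objects). The sketch below is only that consistency check; it assumes decimal units (1 GB = 1,000,000 KB) and does not reproduce the spreadsheet's own index-space formula.

```python
# (format, average size in KB, document count, content indexable?) from the slide
PROFILE = [
    ("Word", 106, 160_000, True),
    ("PDF", 352, 56_000, True),
    ("Other", 20, 275_000, True),
    ("Images", 550, 360_000, False),
]

KB_PER_GB = 1_000_000  # assumption: decimal units


def volume_gb(rows):
    """Total content volume in GB for the given profile rows."""
    return sum(avg_kb * docs for _, avg_kb, docs, _ in rows) / KB_PER_GB


indexable = [row for row in PROFILE if row[3]]
total_gb = volume_gb(PROFILE)
indexable_gb = volume_gb(indexable)
indexable_docs = sum(row[2] for row in indexable)

if __name__ == "__main__":
    print(f"total content:     {total_gb:.1f} GB")
    print(f"indexable content: {indexable_gb:.1f} GB")
    print(f"indexable objects: {indexable_docs:,}")
```

The totals come out at roughly 240 GB overall and 42 GB indexable across 491,000 objects, which agrees with the rounded numbers on the slides.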

  22. What would that imply for hardware to do upgrade? • So far we haven’t calculated growth • Estimate for space needed for fulltext: 19 GB

  23. What about growth? • Assume 260 busy days in the year and 1,000 docs per busy day • Assume document proportions remain the same: • Word (19%) → 19% of 1000 = 190 → 190 x 260 = 49,400/yr • PDF (7%) → 7% of 1000 = 70 → 70 x 260 = 18,200/yr • Other (32%) → 32% of 1000 = 320 → 320 x 260 ≈ 83,000/yr • Image (42%) → 42% of 1000 = 420 → 420 x 260 ≈ 109,000/yr
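
The growth arithmetic on this slide can be written out as a short sketch. The percentages, the 260 busy days, and the 1,000 docs per busy day come from the slide; the function and variable names are illustrative only.

```python
BUSY_DAYS_PER_YEAR = 260
DOCS_PER_BUSY_DAY = 1_000

# Share of daily documents by format, in percent (from the slide).
SHARE_PCT = {"Word": 19, "PDF": 7, "Other": 32, "Image": 42}


def yearly_growth(share_pct, docs_per_day=DOCS_PER_BUSY_DAY,
                  busy_days=BUSY_DAYS_PER_YEAR):
    """Documents added per year for each format, using integer arithmetic."""
    return {
        fmt: (pct * docs_per_day // 100) * busy_days
        for fmt, pct in share_pct.items()
    }


growth = yearly_growth(SHARE_PCT)

if __name__ == "__main__":
    for fmt, docs in growth.items():
        print(f"{fmt:<6} {docs:>8,} docs/yr")
    print(f"total  {sum(growth.values()):>8,} docs/yr")
```

The exact products for Other and Image are 83,200 and 109,200, which the slide rounds to 83,000 and 109,000.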

  24. Index Size after 3 years • Around 30 GB needed

  25. Could I size fulltext as a simple 40% of total content size? • Old, tried and true(?) method • It can overestimate, especially if “non-indexable” content dominates! • In this example (system without growth): 40% of 240 GB = 96 GB vs. 19 GB • Example with system including growth: 40% of 385 GB = 154 GB vs. 30 GB • For small systems, the cost of overestimating is small
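
To make the comparison on this slide concrete, the sketch below applies the 40% rule of thumb to the two content sizes and sets the result against the spreadsheet estimates quoted on the slides (19 GB and 30 GB). The function name `rule_of_thumb_gb` is hypothetical.

```python
def rule_of_thumb_gb(total_content_gb: float, factor: float = 0.40) -> float:
    """Fulltext index size under the 'simple 40% of content' rule."""
    return total_content_gb * factor


# (label, total content in GB, profile-based estimate in GB) from the slides
CASES = [
    ("without growth", 240, 19),
    ("with growth", 385, 30),
]

if __name__ == "__main__":
    for label, content_gb, profiled_gb in CASES:
        rough = rule_of_thumb_gb(content_gb)
        ratio = rough / profiled_gb
        print(f"{label}: {rough:.0f} GB vs. {profiled_gb} GB "
              f"(about {ratio:.0f}x larger)")
```

In both cases the rule of thumb lands at roughly five times the profile-based figure, because most of this repository's content is images that are not content-indexed.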

  26. Other notes • Index Subsystem can co-reside with Content Server • Existing system must have spare CPU capacity & memory capacity • New fulltext index should reside on high capacity disk array or SAN, not on NAS device or single disk • At 1+ million docs the indexing side could bottleneck on the disk • Spreadsheet shows minimal disk I/O requirements, but these are averages spread over 24 hour periods • actual ones will be higher during indexing process
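
The last point, that a 24-hour average understates the I/O rate during indexing, can be illustrated with a simple model. This is a sketch under a strong assumption that is not taken from the slides: all indexing I/O is concentrated evenly in a fixed number of active hours. The input values in the example are made up for illustration.

```python
def peak_iops(avg_iops_24h: float, active_hours: float) -> float:
    """I/O rate during the active window, given a 24-hour average.

    Assumption: the same total I/O is spread evenly over `active_hours`
    instead of over the whole day.
    """
    if not 0 < active_hours <= 24:
        raise ValueError("active_hours must be in (0, 24]")
    return avg_iops_24h * 24 / active_hours


if __name__ == "__main__":
    # Hypothetical inputs: a 50 IOPS daily average, indexing confined to 4 hours.
    print(peak_iops(50, 4))
```

With those illustrative inputs, the disk subsystem would need to sustain six times the reported average while indexing runs.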

  27. Large system (100,000 docs per day) • Provided System characteristics: • Ingest: 110 GB/day • The data is primarily static once submitted • Approximately 100,000 objects/day • Average file size 1 MB • Average metadata size per file: 10 KB • Estimated total in 3 years: 4 TB on Tier 1, 120 TB on Tier 2, 1 TB database • Tier 1 Storage - Symmetrix for 30 days • Tier 2 Storage - Centera • Initial pilot: 50 users • 10% of objects/capacity applying text search

  28. Initial observations on provided information • How many days in a year will see 100,000 docs/day? • Let’s assume 260 busy days a year • If the weekend load rate is significant, it should be factored into the average per day • 100,000 docs/day x 260 days/yr x 3 yrs = 78 million docs • This is more than can be handled by 5.3 FCS! • 5.3 SP1 features needed for large systems: • Ability for a single repository to have multiple “collections” • Multi-node Index Server support
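
The document count on this slide follows from three inputs, sketched below so the busy-day assumption can be varied. The function name `total_docs` is illustrative; the 365-day case is added only to show how sensitive the total is to that assumption.

```python
def total_docs(docs_per_busy_day: int, busy_days_per_year: int, years: int) -> int:
    """Documents accumulated over the sizing horizon."""
    return docs_per_busy_day * busy_days_per_year * years


if __name__ == "__main__":
    # Slide assumption: 260 busy days a year, 3-year horizon.
    print(f"{total_docs(100_000, 260, 3):,}")
    # If the load really ran every day of the year instead:
    print(f"{total_docs(100_000, 365, 3):,}")
```

With 260 busy days the total is 78 million documents; loading every calendar day would raise it to 109.5 million.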

  29. 5.3 Large Full Text support: FCS vs. SP1 • In 5.3 FCS each Content Server repository is mapped to a single Index Server “collection” • In 5.3 SP1: • Collections can be mapped to a single index search “column” • Content Server will be able to have multiple collections per repository • Index Agent to provide mapping of “a_storage_type” to Index Server collection • This can be used to “range partition” the fulltext data • Once a collection reaches a certain size ( < 10 million) data can be routed to • Older “static” data can be put in older collections • CPU burn no longer needed to rebuild older collections
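
As a rough feel for what range partitioning implies at this scale, the sketch below counts how many collections the 78 million documents would need and assigns documents to collections in arrival order. It assumes a cap of 10 million documents per collection (the slide only says a collection stays under 10 million) and sequential filling. It is plain arithmetic, not Index Agent or Index Server configuration, and all names are hypothetical.

```python
import math

MAX_DOCS_PER_COLLECTION = 10_000_000  # assumed cap, from the "< 10 million" note


def collections_needed(total_docs: int,
                       cap: int = MAX_DOCS_PER_COLLECTION) -> int:
    """Minimum number of collections so that none exceeds the cap."""
    return math.ceil(total_docs / cap)


def collection_for(doc_ordinal: int,
                   cap: int = MAX_DOCS_PER_COLLECTION) -> int:
    """Zero-based collection index for the n-th document ingested.

    Assumption: collections are filled one after another in arrival order,
    so older documents end up in older, static collections.
    """
    return doc_ordinal // cap


if __name__ == "__main__":
    print(collections_needed(78_000_000))
    print(collection_for(0), collection_for(25_000_000))
```

Under these assumptions the three-year load needs at least eight collections, and only the newest one is still receiving documents at any time.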

  30. Which area in the spreadsheet should I enter the document profile?

  31. Normally, this input area could be used exclusively • This assumes about 40% of the original content size is for fulltext • Probably not a big deal for small repositories, but could potentially lead to large overestimate for ones like this (with 78 million docs)

  32. What does “10% of objects/capacity applying text search” mean? • Does it mean: “10% of objects will have content to full text”? • Does it mean: “10% of the objects will be fulltext indexed”? • Does it mean: “10% of the searches will be fulltext (as opposed to just the attributes)”? • Assume the first one. • Note that the “Content Loading” area does not allow you to model this!!

  33. Alternate model • 10% Word docs to have content FT indexed: space consumption based on meta-data + content • 90% images to have only meta-data fulltext indexed: space consumption based only on the attributes fulltext indexed

  34. Other model (cont’d) • Uses an alternate space calculation • Reflects that most documents will just have a small amount of meta-data to fulltext index • Total fulltext index size is now 4 TB vs. 30 TB for the previous model
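
The spreadsheet's actual space formulas are not shown in the slides, but a back-of-envelope calculation lands in the same range as both totals. The sketch below is a guess at the shape of the calculation, not the spreadsheet's method. It assumes 78 million documents at 1 MB each, the 40% content factor from the earlier slides, metadata index space equal to the raw 10 KB of metadata per document, and decimal units.

```python
DOCS = 78_000_000
AVG_CONTENT_MB = 1.0
AVG_METADATA_KB = 10.0
CONTENT_FACTOR = 0.40      # the "40% of content" rule from the earlier slides
CONTENT_INDEXED_SHARE = 0.10  # 10% of documents have their content indexed

MB_PER_TB = 1_000_000      # assumption: decimal units
KB_PER_TB = 1_000_000_000


def simple_model_tb() -> float:
    """Every document's content counted at 40% of its size."""
    return DOCS * AVG_CONTENT_MB * CONTENT_FACTOR / MB_PER_TB


def alternate_model_tb() -> float:
    """Content for 10% of documents, plus metadata for all of them.

    Assumption: metadata index space equals the raw metadata size.
    """
    content = DOCS * CONTENT_INDEXED_SHARE * AVG_CONTENT_MB * CONTENT_FACTOR
    metadata = DOCS * AVG_METADATA_KB
    return content / MB_PER_TB + metadata / KB_PER_TB


if __name__ == "__main__":
    print(f"simple model:    {simple_model_tb():.1f} TB")
    print(f"alternate model: {alternate_model_tb():.1f} TB")
```

These assumptions give about 31 TB for the simple model and just under 4 TB for the alternate one, close to the 30 TB and 4 TB quoted on the slide.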

  35. Other model (cont’d): note CPU • Note that the CPUs have not changed between models • This is incorrect (the initial model should have at least twice the CPUs stated) • To be fixed in an upcoming version of the spreadsheet

  36. Other items to worry about • Disk I/O needs (I/Os per sec) reported in the spreadsheet reflect average values (over the loading period), not the peak values needed • To reach high throughput, the fulltext disk I/O subsystem must always be able to achieve several hundred I/Os per second • Do not put the fulltext index on a single drive (except in the case of a tiny repository)
