
Re-development of the Cell Suppression Methodology at the US Census Bureau



Presentation Transcript


  1. Re-development of the Cell Suppression Methodology at the US Census Bureau • Philip Steel, James Fagan, Paul Massell, Richard Moore Jr., John Slanta, Bei Wang

  2. Background • Jewett’s network flow program • Need for new program • 2012 economic census • LP (linear programming) methodology • R&M cell suppression team

  3. Processing Model • Preprocessing • Create table description • Determine primaries • Unduplicate • Sequential processing of primaries • Queue reduction • Test company protection (aggregate/supercell) • Sequential processing of supercells

  4. Table relations • Marginals are the sum of interior cells • Geographic relationships tend to generate our most complex sets of table relations • State is the sum of the metropolitan areas within the state and the balance of the state • State is also the sum of its counties • Relations are of the form A = B + ... + Z, where A, B, ..., Z are rows, columns, or levels (one kind per relation) that define some Cartesian integer space (i,j,k) • Duplicates are recorded as A = B (e.g. a county is also a place)
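A minimal sketch of how relations of the form A = B + ... + Z might be recorded and checked on a two-dimensional table. The `Relation` class, the `residuals` helper, and the toy numbers are illustrative assumptions, not the structures used in the Census Bureau program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A = B + ... + Z along one axis (hypothetical representation)."""
    axis: int      # 0 = rows, 1 = columns
    total: int     # index of A
    parts: tuple   # indices of B, ..., Z; a single part records a duplicate A = B


def residuals(table, rel):
    """Return A minus the sum of its parts at every position on the other axis."""
    if rel.axis == 0:
        width = len(table[0])
        return [table[rel.total][j] - sum(table[p][j] for p in rel.parts)
                for j in range(width)]
    height = len(table)
    return [table[i][rel.total] - sum(table[i][p] for p in rel.parts)
            for i in range(height)]


# Toy rows: 0 = state, 1 = metro area, 2 = balance of state,
#           3 = county X, 4 = county Y, 5 = place identical to county Y
table = [
    [100, 40],
    [70, 25],
    [30, 15],
    [60, 30],
    [40, 10],
    [40, 10],
]

state_by_metro = Relation(axis=0, total=0, parts=(1, 2))
state_by_county = Relation(axis=0, total=0, parts=(3, 4))
county_is_place = Relation(axis=0, total=4, parts=(5,))

for rel in (state_by_metro, state_by_county, county_is_place):
    print(rel, residuals(table, rel))
```

The same state row appears as the total of two different relations, which is the overlap that makes geographic tables the most complex case on this slide.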

  5. Objective Function

  6. Additivity constraint generator (based on row relations) (b) for ii = 1, ..., rr, j = 1, ..., cols, k = 1, ..., levs : limr(ii) ≥ 1, ws(ii,j,k) = 0

  7. Bounds h(i,j,k) = max(0, v(i,j,k)) for i = 1, ..., rows, j = 1, ..., cols, k = 1, ..., levs : (i,j,k) ∈ A

  8. For the primary

  9. Skip P • The model changes only in the target primary constraints. • How can the minimal solution for one target be transformed into a solution for another target? • By applying a scalar that converts the flow through the second P to the fixed value of the model! • This can be done when the scalar does not violate the bounding conditions and the complementary flow in the target is 0. • That is, when the solution's flow through the secondary target exceeds its protection requirement.
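The Skip P test can be sketched as follows. This is an illustration under stated assumptions, not the production code: it assumes each cell carries a flow and a complementary (opposite-direction) flow, each limited by a per-cell bound, and that protecting a primary means pushing a fixed required amount through it. The function name, arguments, and toy values are hypothetical. The idea rests on the additivity constraints being linear and homogeneous, so any scalar multiple of a feasible flow still satisfies them; only the bounds need rechecking.

```python
def rescale_for_second_primary(flow, comp_flow, bound, second_p, required, tol=1e-9):
    """Try to reuse an existing solution for a second primary (hypothetical helper).

    flow, comp_flow: dicts mapping cell -> nonnegative flow in each direction.
    bound: dict mapping cell -> upper bound on either flow.
    Returns the scaled (flow, comp_flow) pair, or None if Skip P does not apply.
    """
    through = flow.get(second_p, 0.0)
    # The existing flow must already meet the second primary's requirement.
    if through <= 0.0 or through < required:
        return None
    # The complementary flow in the target must be 0.
    if comp_flow.get(second_p, 0.0) > tol:
        return None

    scalar = required / through  # at most 1, so flows only shrink
    scaled = {cell: scalar * f for cell, f in flow.items()}
    scaled_comp = {cell: scalar * f for cell, f in comp_flow.items()}

    # Guard: the scaled flows must still respect the bounding conditions.
    for flows in (scaled, scaled_comp):
        for cell, f in flows.items():
            if f > bound[cell] + tol:
                return None
    return scaled, scaled_comp


# Toy solution found for target P1 that also sends 8 units through P2.
flow = {"P1": 10.0, "P2": 8.0, "C1": 8.0}
comp_flow = {"C2": 10.0}
bound = {"P1": 50.0, "P2": 20.0, "C1": 30.0, "C2": 40.0}

accepted = rescale_for_second_primary(flow, comp_flow, bound, "P2", required=4.0)
rejected = rescale_for_second_primary(flow, comp_flow, bound, "P2", required=12.0)
print(accepted)
print(rejected)
```

When the call returns a result, the second primary is already protected by the existing pattern and no new LP has to be solved for it.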

  10. Empirical confirmation • In our large sparse tables, we would see a lot of objective 0 results. • That is, the solver finds a 0 cost pattern to protect the primary … it is already protected! • Skip P eliminated most objective 0 results and left intact the sequence of positive objectives and their solutions.

  11. Fat solution • CPLEX is using a dual simplex method to find solutions. • The solutions have a growing 0 cost component, with many more cells than are required to protect the target P. • The flow in the 0 cost cells far exceeds what is required to protect the target P (except in very small or dense examples). • The solution “lights up” the possible flows in the table’s current state, giving a “fat” solution.

  12. Skip P and the fat solution

  13. dg10 sector 44 • Cartesian cells: 367,605 (2d) • Non-zero cells: 159,849 • Relations: 283 (row and column) • 14,000 potential tables, linked • P: 95,062 • LP problems: 10,604 • Typical LP size: reduced LP has 64,826 rows, 156,809 columns, and 528,838 nonzeros • Time: 8 hr 37 min (includes everything)
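A few ratios follow directly from the figures on this slide. The short calculation below uses only those figures; the variable names are illustrative. About 43% of the Cartesian cells are non-zero, roughly one LP is solved for every nine primaries, and the run averages about 3 seconds of total elapsed time per LP. The slide does not say which processing steps account for the gap between primaries and LP problems, and since the time includes everything, the per-LP figure is only an upper bound on average solve time.

```python
# Figures copied from the dg10 sector 44 slide.
cartesian_cells = 367_605
nonzero_cells = 159_849
primaries = 95_062
lp_problems = 10_604
lp_rows, lp_cols, lp_nonzeros = 64_826, 156_809, 528_838
total_seconds = 8 * 3600 + 37 * 60  # 8 hr 37 min

fill = nonzero_cells / cartesian_cells           # share of cells that are non-zero
lp_per_primary = lp_problems / primaries         # LPs solved per primary
lp_density = lp_nonzeros / (lp_rows * lp_cols)   # sparsity of the typical reduced LP
seconds_per_lp = total_seconds / lp_problems     # elapsed time per LP, all steps included

print(f"non-zero share of cells: {fill:.1%}")
print(f"LP problems per primary: {lp_per_primary:.1%}")
print(f"reduced LP matrix density: {lp_density:.2e}")
print(f"elapsed seconds per LP: {seconds_per_lp:.2f}")
```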

  14. Comparison between network and LP on one dataset (of hundreds) from 2007 • Statistics based on unduplicated data with an approximation of a published status flag

  15. Thank you! philip.m.steel@census.gov
