
.Stat Suite Architecture




Presentation Transcript


  1. .Stat Suite Architecture A look at the key non-functional topics Prepared and presented by Jens Dossé (OECD)

  2. .Stat Suite Architecture

  3. Code quality

  4. .Stat Suite DevOps
  • managed Gitflow process
  • automated unit tests with a minimum of 70% coverage
  • coding standards applied
  • automated builds and deployments
  • QA/Staging environments in the cloud
  • Containers
  • Code base: Dev/Master branches

  5. Kanban development methodology
  • Issue review
  • Peer review
  • Quality assurance
  • Documentation standards

  6. Data Model & Performance Review

  7. Revised V8 Data model
  In V8 2.0 the Filter table (series identifiers) is back!
  Filter table [Data].[FILT_{dsd id}]
  - PK: SID (int) = unique surrogate key (auto-incremented int)
  - ROW_ID (binary) = unique, derived ID of the data row (ensuring series uniqueness)
  - DIM_{component id#x}… = dimension ID, with non-clustered columnstore index
  Fact table [Data].[FACT_{dsd id}_{A|B}]
  - PK: SID (int) = unique surrogate key generated on the FILT table
  - PK: DIM_TIME (int) = references the time dimension member in the CL_TIME table, with non-clustered columnstore index*
  - COMP_{component id#x}… = for coded/uncoded attributes, with non-clustered columnstore index*
  - VALUE
  - LAST_UPDATED
  Deleted-data fact table [Data].[FACT_{dsd id}_{A|B}_DELETED]
  - SID, DIM_TIME, LAST_UPDATED
  *To speed up actual/available content constraints
  https://gitlab.com/sis-cc/dotstatsuite-documentation/blob/9abdbdbf7bb9c23dfecf422b9da58521f49b4ef0/content/framework/architecture/db%20architecture.md
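The filter/fact split above can be sketched with an in-memory SQLite database standing in for SQL Server (columnstore indexes and the `{dsd id}` naming are omitted; the two dimension columns `DIM_FREQ` and `DIM_REF_AREA` are illustrative assumptions, not the actual .Stat schema):

```python
import sqlite3

# SQLite stand-in for the V8 schema: FILT holds one row per series,
# FACT holds one row per observation, joined on the SID surrogate key.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FILT_DSD (
    SID      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    ROW_ID   BLOB UNIQUE,                        -- derived binary series ID
    DIM_FREQ TEXT,
    DIM_REF_AREA TEXT
);
CREATE TABLE FACT_DSD_A (
    SID          INTEGER,
    DIM_TIME     INTEGER,
    VALUE        REAL,
    LAST_UPDATED TEXT,
    PRIMARY KEY (SID, DIM_TIME)
);
""")

def get_or_create_sid(row_id, freq, ref_area):
    """Look up a series by its binary ROW_ID; insert it on first sight."""
    hit = con.execute(
        "SELECT SID FROM FILT_DSD WHERE ROW_ID = ?", (row_id,)).fetchone()
    if hit:
        return hit[0]
    cur = con.execute(
        "INSERT INTO FILT_DSD (ROW_ID, DIM_FREQ, DIM_REF_AREA) VALUES (?, ?, ?)",
        (row_id, freq, ref_area))
    return cur.lastrowid

# All observations of one series share a single SID in the fact table.
sid = get_or_create_sid(b"\x00\x01\x02", "A", "FR")
con.execute(
    "INSERT INTO FACT_DSD_A (SID, DIM_TIME, VALUE, LAST_UPDATED) "
    "VALUES (?, ?, ?, ?)",
    (sid, 2019, 3.14, "2019-06-01"))
```

The point of the design is that the wide dimension columns are stored once per series in the filter table, while the fact table stays narrow for fast scans of observation values.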

  8. Revised V8 Data model
  Dataflow attributes [Data].[ATTR_{dsd id}_{A|B}_DF]
  - DF_ID (int) = ID of the referenced dataflow item in the ARTEFACT table
  - COMP_{component id#x}… = for coded/uncoded attributes
  Dimension-group attributes [Data].[ATTR_{dsd id}_{A|B}_DIMGROUP]
  - PK: ROW_ID = unique ID of the related data row in the FACT_{dsd id}_{A|B} table
  - DIM_TIME (int) = references the time dimension member in the CL_TIME table
  - DIM_{component id#x}… = dimension ID
  - COMP_{component id#x}… = for coded/uncoded attributes
  Binary unique ROW_ID generation for the FACT table:
  Dimension #1 ID: 1,600,000 → binary 0x186A00
  Dimension #2 ID: 32,100 → binary 0x007D64
  Dimension #3 ID: 16,000,000 → binary 0xF42400
  Unique row ID (concatenation): 0x186A00007D64F42400
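The ROW_ID construction in the example is a concatenation of each dimension member ID encoded as fixed-width big-endian bytes. A minimal sketch (the 3-byte width per dimension is inferred from the example values; the actual width may depend on the DSD):

```python
def row_id(dim_ids, width=3):
    """Concatenate dimension member IDs as fixed-width big-endian bytes
    to form the unique binary row identifier of a series."""
    return b"".join(d.to_bytes(width, "big") for d in dim_ids)

# Values from the slide: 1,600,000 / 32,100 / 16,000,000
rid = row_id([1_600_000, 32_100, 16_000_000])
print("0x" + rid.hex().upper())  # → 0x186A00007D64F42400
```

Because the width is fixed, two distinct combinations of dimension IDs can never produce the same byte string, which is what guarantees series uniqueness in the FILT table.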

  9. .Stat Suite SDMX Data upload and download speed
  Upload, update, and delete: slices of 500K rows; data download: 1 million rows.
  Test (virtual) machine setup: 16 GB RAM, 8 vCPUs (Intel Xeon E5-2680 v4 @ 2.4 GHz), 250 GB hard disk, Windows Server 2016 Standard (64-bit), Microsoft SQL Server 2017 Enterprise Edition.
  Dataset: 9 dimensions, 1 observation-level attribute, 4 series-level attributes, 70 million observations.
  Performance objectives (per 1 million rows): up to 1 billion rows; first 100 million rows.
  ISTAT measurements: 1.37; 0.42; 1.29 and 1.48; 0.54; 2.29.

  10. .Stat Suite SDMX Data download speed in sec

  11. .Stat Suite SDMX Data upload and download speed
  Filtered data download speed (in milliseconds) per 1,000 data points, for up to 3,000 data points. [Chart: measured speeds against the per-1,000-data-point performance objectives]

  12. Remaining potential for SDMX performance improvements
  To be discussed with Eurostat:
  - Speed of SDMX structure retrievals in the NSI web service
  - SDMXSource speed

  13. Focus on DE Performance Tuning
  SOLR-based search engine migrated from .Net to Node.js, with goals:
  • Query response time of at most a few hundred milliseconds, and in general under 100 milliseconds
  • 99.99% availability
  • Sharding and load balancing can be used to guarantee availability and performance when scaling
  • Indexing activities should not significantly impact search API performance and availability, even when indexing happens very frequently (every one or few minutes), as dataflows are updated almost constantly
  Data Explorer front-end optimisation:
  • A scoped list replaces the classical tree views
  • A new table design and specific implementation optimise performance
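For context, clients talk to Solr over its standard HTTP `/select` query handler. A minimal sketch of building such a query URL (the host, port, and the `datasearch` core name are hypothetical; the `q`, `rows`, and `wt` parameters are standard Solr query parameters):

```python
from urllib.parse import urlencode

def solr_select_url(base, core, text, rows=10):
    """Build a URL for Solr's /select query handler.
    The core name passed in is an assumption, not the actual .Stat core."""
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{base}/solr/{core}/select?{params}"

url = solr_select_url("http://localhost:8983", "datasearch", "unemployment rate")
print(url)
```

Keeping queries this simple (a single HTTP GET answered from the index) is what makes sub-100 ms response times and horizontal scaling via sharding realistic.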

  14. Availability

  15. Availability

  16. Security

  17. Security review – Source code (JavaScript)

  18. Security review – Source code (.Net)

  19. Other security measures
  • secure configuration/credentials/password management

  20. Scalability/Recoverability

  21. Scalability/Recoverability

  22. Thank you! Contact: jens.dosse@oecd.org
