
.Stat Suite Architecture




Presentation Transcript


  1. .Stat Suite Architecture A look at the key non-functional topics Prepared and presented by Jens Dossé (OECD)

  2. .Stat Suite Architecture

  3. Code quality

  4. .Stat Suite DevOps
  • managed Gitflow process
  • automated unit tests with a minimum of 70% coverage
  • coding standards applied
  • automated builds and deployments
  • QA/Staging environments in the cloud
  • Containers
  • Code base: Dev/Master branches

  5. Kanban development methodology
  • Issue review
  • Peer review
  • Quality assurance
  • Documentation standards

  6. Data Model & Performance Review

  7. Revised V8 Data model
  In V8 2.0 the Filter table (series identifiers) is back!
  Filter table [Data].[FILT_{dsd id}]
  - PK: SID (int) = unique surrogate key (auto-incremented int)
  - ROW_ID (binary) = unique, derived ID of the data row (ensuring series uniqueness)
  - DIM_{component id#x}… = dimension ID, with non-clustered columnstore index
  Fact table [Data].[FACT_{dsd id}_{A|B}]
  - PK: SID (int) = unique surrogate key generated on the FILT table
  - PK: DIM_TIME (int) = references the time dimension member in the CL_TIME table, with non-clustered columnstore index*
  - COMP_{component id#x}… = for coded/uncoded attributes, with non-clustered columnstore index*
  - VALUE
  - LAST_UPDATED
  Deleted-data fact table [Data].[FACT_{dsd id}_{A|B}_DELETED]
  - SID, DIM_TIME, LAST_UPDATED
  *To speed up actual/available content constraints
  https://gitlab.com/sis-cc/dotstatsuite-documentation/blob/9abdbdbf7bb9c23dfecf422b9da58521f49b4ef0/content/framework/architecture/db%20architecture.md
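The filter/fact split above can be sketched with an in-memory SQLite database standing in for SQL Server (columnstore indexes and the `{dsd id}` naming are omitted; the two dimension columns `DIM_FREQ` and `DIM_REF_AREA` are illustrative assumptions, not the actual .Stat schema):

```python
import sqlite3

# SQLite stand-in for the V8 schema: FILT holds one row per series,
# FACT holds one row per observation, joined on the SID surrogate key.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FILT_DSD (
    SID      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    ROW_ID   BLOB UNIQUE,                        -- derived binary series ID
    DIM_FREQ TEXT,
    DIM_REF_AREA TEXT
);
CREATE TABLE FACT_DSD_A (
    SID          INTEGER,
    DIM_TIME     INTEGER,
    VALUE        REAL,
    LAST_UPDATED TEXT,
    PRIMARY KEY (SID, DIM_TIME)
);
""")

def get_or_create_sid(row_id, freq, ref_area):
    """Look up a series by its binary ROW_ID; insert it on first sight."""
    hit = con.execute(
        "SELECT SID FROM FILT_DSD WHERE ROW_ID = ?", (row_id,)).fetchone()
    if hit:
        return hit[0]
    cur = con.execute(
        "INSERT INTO FILT_DSD (ROW_ID, DIM_FREQ, DIM_REF_AREA) VALUES (?, ?, ?)",
        (row_id, freq, ref_area))
    return cur.lastrowid

# All observations of one series share a single SID in the fact table.
sid = get_or_create_sid(b"\x00\x01\x02", "A", "FR")
con.execute(
    "INSERT INTO FACT_DSD_A (SID, DIM_TIME, VALUE, LAST_UPDATED) "
    "VALUES (?, ?, ?, ?)",
    (sid, 2019, 3.14, "2019-06-01"))
```

The point of the design is that the wide dimension columns are stored once per series in the filter table, while the fact table stays narrow for fast scans of observation values.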

  8. Revised V8 Data model
  Dataflow attributes [Data].[ATTR_{dsd id}_{A|B}_DF]
  - DF_ID (int) = ID of the referenced dataflow item in the ARTEFACT table
  - COMP_{component id#x}… = for coded/uncoded attributes
  Dimension-group attributes [Data].[ATTR_{dsd id}_{A|B}_DIMGROUP]
  - PK: ROW_ID = unique ID of the related data row in the FACT_{dsd id}_{A|B} table
  - DIM_TIME (int) = references the time dimension member in the CL_TIME table
  - DIM_{component id#x}… = dimension ID
  - COMP_{component id#x}… = for coded/uncoded attributes
  Binary unique ROW_ID generation for the FACT table:
  Dimension #1 ID: 1,600,000 → binary 0x186A00
  Dimension #2 ID: 32,100 → binary 0x007D64
  Dimension #3 ID: 16,000,000 → binary 0xF42400
  Unique row ID (concatenation): 0x186A00007D64F42400
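The ROW_ID construction in the example is a concatenation of each dimension member ID encoded as fixed-width big-endian bytes. A minimal sketch (the 3-byte width per dimension is inferred from the example values; the actual width may depend on the DSD):

```python
def row_id(dim_ids, width=3):
    """Concatenate dimension member IDs as fixed-width big-endian bytes
    to form the unique binary row identifier of a series."""
    return b"".join(d.to_bytes(width, "big") for d in dim_ids)

# Values from the slide: 1,600,000 / 32,100 / 16,000,000
rid = row_id([1_600_000, 32_100, 16_000_000])
print("0x" + rid.hex().upper())  # → 0x186A00007D64F42400
```

Because the width is fixed, two distinct combinations of dimension IDs can never produce the same byte string, which is what guarantees series uniqueness in the FILT table.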

  9. .Stat Suite SDMX Data upload and download speed
  Upload, update, and delete: slices of 500K rows; data download: 1 million rows.
  Test (virtual) machine setup: 16 GB RAM, 8 vCPUs (Intel Xeon E5-2680 v4 @ 2.4 GHz), 250 GB hard disk, Windows Server 2016 Standard (64-bit), Microsoft SQL Server 2017 Enterprise Edition.
  Dataset: 9 dimensions, 1 observation-level attribute, 4 series-level attributes, 70 million observations.
  Performance objectives (per 1 million rows): up to 1 billion rows; first 100 million rows.
  ISTAT measurements: 1.37; 0.42; 1.29 and 1.48; 0.54; 2.29.

  10. .Stat Suite SDMX Data download speed in sec

  11. .Stat Suite SDMX Data upload and download speed
  Filtered data download speed (in milliseconds) per 1,000 data points, for up to 3,000 data points. [Chart: measured speeds against the per-1,000-data-point performance objectives]

  12. Remaining potential for SDMX performance improvements
  To be discussed with Eurostat:
  - Speed of SDMX structure retrievals in the NSI web service
  - SDMXSource speed

  13. Focus on DE Performance Tuning
  SOLR-based search engine migrated from .Net to Node.js, with goals:
  • Query response time of at most a few hundred milliseconds, and in general under 100 milliseconds
  • 99.99% availability
  • Sharding and load balancing can be used to guarantee availability and performance when scaling
  • Indexing activities should not significantly impact search API performance and availability, even when indexing happens very frequently (every one or few minutes), as dataflows are updated almost constantly
  Data Explorer front-end optimisation:
  • A scoped list replaces the classical tree views
  • A new table design and specific implementation optimise performance
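For context, clients talk to Solr over its standard HTTP `/select` query handler. A minimal sketch of building such a query URL (the host, port, and the `datasearch` core name are hypothetical; the `q`, `rows`, and `wt` parameters are standard Solr query parameters):

```python
from urllib.parse import urlencode

def solr_select_url(base, core, text, rows=10):
    """Build a URL for Solr's /select query handler.
    The core name passed in is an assumption, not the actual .Stat core."""
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{base}/solr/{core}/select?{params}"

url = solr_select_url("http://localhost:8983", "datasearch", "unemployment rate")
print(url)
```

Keeping queries this simple (a single HTTP GET answered from the index) is what makes sub-100 ms response times and horizontal scaling via sharding realistic.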

  14. Availability

  15. Availability

  16. Security

  17. Security review – Source code (JavaScript)

  18. Security review – Source code (.Net)

  19. Other security measures
  • secure configuration/credentials/password management

  20. Scalability/Recoverability

  21. Scalability/Recoverability

  22. Thank you! Contact: jens.dosse@oecd.org
