
A State of PHP 2018 Backed by Data

How is our beloved language doing? Analyzing all of GitHub, StackOverflow and Hacker News via the publicly available Google BigQuery datasets (some 40TB of data) this presentation aims to give humorous and ingenious insights into our TIOBE Index Top 10 language. If you’ve ever wondered which PHP versions are still in use, which packages are most widely used or if you’re the only ‘Full StackOverflow Developer’ this is the presentation for you. Similarly we’ll look into things like PSR adoption (who’s still using tabs), framework popularity and community participation with a view to how we are doing and where we should be going. Trolls are welcome – I have the data…

bradm

Presentation Transcript


  1. A State of PHP 2018 Backed by Data • PHP South Africa 2018 • Brad Mostert • /bsinkwa /mostertb

  2. What? • Analyse Big Data on GitHub • Give some insight on our Community and Craft • Be Interesting (or at least funny) • Influenced heavily by the work of Felipe Hoffa (@felipehoffa) from Google • Lies, Damn Lies and Statistics

  3. Who Am I? • Senior Developer at Afrihost • Server Shepherd • PHP Joburg Organizer • Crazy! • Gave the Advanced Composer workshop and the Design Patterns in PHP talk last year

  4. Data Sources: GitHub Contents • BigQuery Public Dataset • cloud.google.com/bigquery/public-data/github • 3.5TB+* • 3.4 Million Projects • 222 Million Commits • 2.3 Billion Unique File Paths • Latest Revision of 245 Million Files (RegEx Searchable) • Updated ~Weekly • *Statistics compiled by me this week

  5. Data Sources: GitHub Contents • To Be Included: • Public Repo • Clear Open Source License • Detected by GitHub API • developer.github.com/v3/licenses/ • ASCII Files Less than 10MB • Mostly non-forked • Excludes “Un-notable” projects

  6. Data Sources: GHTORRENT ghtorrent.org • Watch GitHub Public Event Timeline • api.github.com/events • Exhaustively Retrieve Related Information from GitHub Knowledge Graph • Data Since 2012 • MongoDB • Raw JSON Representations • 10TB+ • MySQL • Links Dependencies between Data

  7. Data Sources: GHTORRENT – MySQL Data • Updated Monthly • Most recent version in BigQuery from April 2018 • Manually Imported 2018-09-01 dump • 291GB in CSV • 98+ Million Repos • 89 Million Excluding deleted Repos and Users • 1+ Billion Commits

  8. Data Sources: GH Archive gharchive.org • Also queries the GitHub Events API • Stores only events in BigQuery Tables • bigquery.cloud.google.com/table/githubarchive:day.yesterday • Records contain both common fields (like Repo Name) and the full JSON Payloads • Broken up into tables per Day, Month and Year • Updated Hourly • Raw JSON also available for download • Size • Day: 2.7k tables, Total 3.636TB • Times 3 for Month and Day

  9. Tools: Google BigQuery • Highly scalable, fully managed data warehouse and analytics platform • Part of Google Cloud Platform • Distributed Columnar Database (Dremel) • Billed on ‘amount of data processed’ and storage • $5 per 1TB processing • 1TB processing free per month + 10GB storage • $300 Free Tier Trial (cloud.google.com/free) • github.com/mostertb/state-of-php-2018-scratch
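
The billing bullets above can be turned into a rough cost estimate. This is a minimal sketch using only the two figures quoted on the slide ($5 per TB processed, first 1TB per month free); the function name and the 3.5TB example scan are illustrative assumptions, and storage charges are ignored.

```python
# Figures taken from the slide; actual pricing may differ.
PRICE_PER_TB_USD = 5.0
FREE_TB_PER_MONTH = 1.0


def estimate_query_cost(tb_processed: float,
                        free_tb_remaining: float = FREE_TB_PER_MONTH) -> float:
    """Estimate the processing cost in USD for scanning `tb_processed` terabytes.

    Only the data-processed charge is modelled; storage is not included.
    """
    billable_tb = max(0.0, tb_processed - free_tb_remaining)
    return billable_tb * PRICE_PER_TB_USD


# Hypothetical example: one full scan of a 3.5TB dataset with the whole
# free monthly allowance still available.
full_scan_cost = estimate_query_cost(3.5)
print(f"${full_scan_cost:.2f}")
```

Because BigQuery is columnar, a query that selects only a few columns scans far less than the full table size, so a full-scan figure like this is an upper bound.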

  10. Tools: HomeLab • Personal Server: pre-processing the GHTorrent Data • Dell R720 • 128GB RAM • Dual Xeon E5-2670 @ 2.60GHz • 3x 300GB 15k RPM SAS + 480GB SSD • I’ll give you access to my GHTorrent Datasets • MyISAM actually works well for this application

  11. Number of Projects: GHTORRENT • Total in GHTorrent: 98 Million • Less Deleted Repos and Users: 89 Million • Octoverse 2017 Report has 67 Million • Empty Repos? Forks without changes? • With any PHP: 2.5 Million • Non-forked with any PHP: 1.2 Million • >10KB Code and ‘PHP Bytes’ > 0: 939,895 • Reported by github/linguist (No vendor, docs or generated) • Same criteria over all GHTorrent Repos: 7.9 Million

  12. 11.91% of unique, non-trivial projects on GitHub involve PHP • 939,895 / 7,892,367 = 11.91% • Based on GHTorrent Data • Not Deleted • Not Forked • More than 10KB non-vendor / non-generated • Lies, Damn Lies and Statistics…
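
The headline percentage can be reproduced from the two counts on the slide. A minimal sketch; the helper name `share_percent` is an illustrative assumption.

```python
def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts quoted on the slide (GHTorrent, not deleted, not forked, >10KB code).
php_projects = 939_895
all_projects = 7_892_367

php_share = share_percent(php_projects, all_projects)
print(php_share)  # 11.91
```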

  13. Active projects?

  14. PHP Repo Events over past year • GH Archive • Between September 2017 and August 2018 • Non-forked Repos • Repo Size >10KB • Events counted: • Pushes: all branches, includes pushing tags • ‘Starring’ a repo (doesn’t include ‘un-starring’) • Anything to do with a PR (assigned, unassigned, labeled, unlabeled, opened, edited, closed, reopened) • Create repository, branch, or tag • New Release published • Private Repo becomes Public

  15. PHP Projects Active over Last Year 161,896 Projects out of 939,895 • Events: • PushEvent • WatchEvent • PullRequestEvent • CreateEvent • ReleaseEvent • Non-forked, non-deleted • > 10KB non-vendor/non-generated code 17.22 % compared to 20.96% over all repos
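
The "active" definition above amounts to keeping any repo with at least one qualifying event inside the twelve-month window. The sketch below applies that filter to in-memory tuples; the record layout, the sample repos, and the exact window boundaries (1 September 2017 to 31 August 2018) are assumptions for illustration, not the queries used for the talk.

```python
from datetime import date

# Event types listed on the slide.
ACTIVE_EVENT_TYPES = {
    "PushEvent", "WatchEvent", "PullRequestEvent", "CreateEvent", "ReleaseEvent",
}
# Assumed inclusive boundaries for "September 2017 to August 2018".
WINDOW_START = date(2017, 9, 1)
WINDOW_END = date(2018, 8, 31)


def active_repos(events):
    """Return the set of repo names with a qualifying event in the window.

    `events` is an iterable of (repo_name, event_type, event_date) tuples.
    """
    return {
        repo
        for repo, event_type, event_date in events
        if event_type in ACTIVE_EVENT_TYPES
        and WINDOW_START <= event_date <= WINDOW_END
    }


# Hypothetical sample events.
sample_events = [
    ("acme/shop", "PushEvent", date(2018, 3, 14)),
    ("acme/old", "PushEvent", date(2016, 1, 2)),      # outside the window
    ("acme/blog", "IssuesEvent", date(2018, 5, 1)),   # not a counted event type
    ("acme/lib", "ReleaseEvent", date(2017, 9, 1)),
]
active = active_repos(sample_events)
print(sorted(active))

# Activity rate from the counts on the slide.
activity_rate = round(100 * 161_896 / 939_895, 2)
print(activity_rate)  # 17.22
```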

  16. How much PHP?

  17. Languages used with PHP • C and C++ in 11th and 12th in both cases • Smarty is still at ~1% • Vue gains 2.57% • Hack gains 0.64% putting it in 16th on Active • Dockerfile ranks 61st at 0.13% • Java sees no percentage change • Perl drops down to 14th All Active in Last Year

  18. Languages where PHP is Primary Active in Last Year >=90% Bytes PHP

  19. PHP Project Owner location 37.84% of owners provide a geo-codable location

  20. PHP Project Owner location: Africa

  21. South African Project Owners

  22. GitHub Contents • 3,353,813 Projects Total • 344,215 Projects with any PHP • 290,206 PHP Projects >= 10KB Detected Code • To Be Included: • Clear Open Source License • ASCII Files Less than 10MB • Mostly non-forked • Excludes “Un-notable” projects

  23. GitHub Contents: Composer Files • Only in the root directory • Only on master branch • All PHP projects included in the BigQuery Public Dataset • Total Files: 152,188 • In Root Path: 150,899 • Master Branch: 144,044
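
The two selection rules (root directory only, master branch only) can be expressed as a simple row filter. This sketch assumes each file row carries a repo name, a Git ref such as `refs/heads/master`, and a repo-relative path; the `FileRow` type and the sample rows are hypothetical.

```python
from typing import NamedTuple


class FileRow(NamedTuple):
    """Hypothetical shape of one file-listing row."""
    repo_name: str
    ref: str
    path: str


def is_root_composer_on_master(row: FileRow) -> bool:
    """True for a composer.json that sits in the repo root on the master branch."""
    return row.path == "composer.json" and row.ref == "refs/heads/master"


sample_rows = [
    FileRow("acme/shop", "refs/heads/master", "composer.json"),
    FileRow("acme/shop", "refs/heads/master", "tests/fixtures/composer.json"),  # not root
    FileRow("acme/lib", "refs/heads/develop", "composer.json"),                 # not master
    FileRow("acme/blog", "refs/heads/master", "README.md"),
]
selected = [row for row in sample_rows if is_root_composer_on_master(row)]
print([row.repo_name for row in selected])
```

Comparing the full path to `composer.json`, rather than matching on a suffix, is what excludes fixture and vendor copies nested deeper in the tree.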

  24. GitHub Contents: Composer Packages

  25. GitHub Contents: Composer Packages • 34,510 Distinct Packages

  26. GitHub Contents: Composer Packages • Other Favorites • Rather use ‘require-dev’ • Easy pull request?

  27. Composer Packages: Framework Ranking • Only considering BigQuery GitHub Public Dataset • Not taking into account activity or size • Not taking into account versions

  28. Composer Packages by Vendor 12,213 Unique Vendors ~35% Unique
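
Composer package names take the form `vendor/package`, so the vendor count comes from the part before the slash. A minimal sketch with a hypothetical package list; the last lines check the "~35%" figure against the 12,213 vendors and 34,510 distinct packages quoted in the slides.

```python
def vendor_of(package_name: str) -> str:
    """Return the vendor part of a Composer package name like 'vendor/package'."""
    return package_name.split("/", 1)[0].lower()


# Hypothetical sample of distinct package names.
sample_packages = [
    "symfony/console",
    "symfony/yaml",
    "phpunit/phpunit",
    "monolog/monolog",
]
unique_vendors = {vendor_of(name) for name in sample_packages}
print(len(unique_vendors), "vendors for", len(sample_packages), "packages")

# Ratio of unique vendors to distinct packages, using the slide's counts.
vendor_ratio = round(100 * 12_213 / 34_510, 2)
print(vendor_ratio)
```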

  29. Composer Packages: Required Extensions

  30. Composer Packages: Require-Dev Top 10 Notable

  31. Composer Packages: MINIMUM PHP VERSION • Naively matched: LIKE ‘%<version>%’ • 840 unmatched values • 402 only provide major version 7 • 105,948 Composer Files provide a PHP version
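
The naive `LIKE ‘%<version>%’` matching can be imitated with plain substring checks. In this sketch the version list, the lowest-match-wins rule, and the sample constraints are assumptions; it also shows why a constraint such as `>=7`, which gives only a major version, ends up unmatched.

```python
# Assumed list of minor versions to look for, lowest first.
KNOWN_VERSIONS = ["5.3", "5.4", "5.5", "5.6", "7.0", "7.1", "7.2"]


def naive_min_version(constraint: str):
    """Return the lowest known version that appears as a substring, else None.

    Mirrors a SQL `LIKE '%<version>%'` test; it does not parse the constraint.
    """
    for version in KNOWN_VERSIONS:
        if version in constraint:
            return version
    return None


# Hypothetical values of the "php" entry in a composer.json "require" block.
for constraint in [">=5.3.3", "^7.1.3", ">=5.6 || ^7.0", ">=7", "*"]:
    print(repr(constraint), "->", naive_min_version(constraint))
```

Substring matching is cheap but imprecise: a patch number that happens to contain a listed version would also match, which is one reason to treat the resulting distribution as approximate.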

  32. Contents: PHP Projects • 344,215 with any PHP • 290,206 >= 10KB • 28,778 In the List of GHTorrent Active

  33. Contents • 2,896,713 PHP Files (<=10MB) • 16.66 GB

  34. TABS vs Spaces Credit to Felipe Hoffa • Files at least 10 lines • 413,175,973 Analysed • 649,150 Tab Files • 1,748,235 Space Files
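
A per-file classification along these lines could look like the sketch below. It is an assumption about the method rather than the query behind the slide: a file with at least 10 lines is labelled by whichever character starts more of its indented lines, and ties or unindented files are left unclassified.

```python
def classify_indentation(content: str, min_lines: int = 10):
    """Label a file 'tabs' or 'spaces' by its leading whitespace, or None.

    Files shorter than `min_lines` and files with no clear majority are skipped.
    """
    lines = content.split("\n")
    if len(lines) < min_lines:
        return None
    tab_lines = sum(1 for line in lines if line.startswith("\t"))
    space_lines = sum(1 for line in lines if line.startswith(" "))
    if tab_lines == space_lines:
        return None
    return "tabs" if tab_lines > space_lines else "spaces"


# Hypothetical file contents.
tab_file = "<?php\n" + "\tfoo();\n" * 10
space_file = "<?php\n" + "    foo();\n" * 10
short_file = "<?php\n\techo 1;\n"

print(classify_indentation(tab_file))
print(classify_indentation(space_file))
print(classify_indentation(short_file))

# Share of tab-indented files among the classified files, from the slide's counts.
tab_share = round(100 * 649_150 / (649_150 + 1_748_235), 2)
print(tab_share)
```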

  35. Full Stack Overflow Developers • “Stack Overflow” (and variations) occurs on 4694 lines in 3892 files in PHP projects in the dataset updated in the last year • These are just the examples with attribution
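
Matching "Stack Overflow" and its variations is a natural fit for a case-insensitive regular expression. The pattern below is an assumed example, not the one used for the talk; it accepts an optional space, underscore or hyphen between the two words.

```python
import re

# Assumed pattern: "stackoverflow", "Stack Overflow", "stack_overflow", "stack-overflow".
SO_PATTERN = re.compile(r"stack[\s_-]?overflow", re.IGNORECASE)


def count_matching_lines(source: str) -> int:
    """Count the lines of `source` that mention Stack Overflow in some form."""
    return sum(1 for line in source.splitlines() if SO_PATTERN.search(line))


# Hypothetical PHP file content.
sample_source = """<?php
// See https://stackoverflow.com/a/123 for why this works
// Copied from Stack Overflow, thanks!
function stack_trace_overflow_guard() {}
$x = 1;
"""
print(count_matching_lines(sample_source))
```

A pattern like this also matches comments about genuine stack overflow errors, so the counts it produces overstate attribution to the website.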

  36. Stack overflow comments

  37. PHP 7 Language Features • Files with: • Spaceship Operator (<=>): 1,869 • “yield”: 3,951 • (something,); : 110,556
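
Feature counts like these are typically gathered with regular expressions over file contents. The patterns below are assumptions for illustration, in particular the reading of `(something,);` as a trailing comma before a closing parenthesis; they do not distinguish code from strings or comments.

```python
import re

# Assumed patterns; none of them parse PHP, they only scan text.
FEATURE_PATTERNS = {
    "spaceship": re.compile(r"<=>"),
    "yield": re.compile(r"\byield\b"),
    "trailing_comma": re.compile(r",\s*\)\s*;"),
}


def detect_features(source: str) -> dict:
    """Return which of the assumed patterns occur anywhere in `source`."""
    return {name: bool(pattern.search(source))
            for name, pattern in FEATURE_PATTERNS.items()}


# Hypothetical PHP file content.
sample_php = """<?php
usort($items, function ($a, $b) {
    return $a <=> $b;
});
function numbers() {
    yield 1;
}
"""
print(detect_features(sample_php))
print(detect_features("<?php foo($a, $b,);"))
```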

  38. South African Developers: 0 to E

  39. South African Developers: E to L

  40. South African Developers: L to S

  41. South African Developers: S to Z

  42. Questions? • PHP South Africa 2018 • Brad Mostert • /bsinkwa /mostertb

  43. github.com/mostertb/phpsa-2018-profiles /bsinkwa /mostertb
