Learn about Visual Studio IntelliCode's AI-powered capabilities to enhance productivity. Discover the data science principles and practical solutions used to improve model precision and efficiency. Explore the key features and benefits for developers. Unleash the power of intelligent coding assistance with IntelliCode.
Improving Developer Productivity with Visual Studio IntelliCode
Allison Buchholtz-Au, Program Manager II, @allison_au
Shengyu Fu, Principal Data Science Lead
What is Visual Studio IntelliCode?
• A range of capabilities that offers new productivity enhancements through artificial intelligence (AI)
• AI-assisted IntelliSense:
  • Uses the current code context and patterns learned from thousands of highly rated open-source projects on GitHub
  • Predicts the most likely and most relevant suggestions
• Argument completion
Data Science Journey
• Understand the data first: draw intuition and define metrics before building a machine learning model.
• Keep engineering constraints in mind; be practical about model productization.
• Rely heavily on offline evaluation for model improvement.
Data Source – Open source code
• Number of good-quality C# repos on GitHub: >2K
• Number of solutions we were able to build and parse to form our training dataset: >5K
• Number of .cs documents in the dataset: >200K
Extract Training Data from Source Code
Features we can extract about an invocation:
• Span start: 139
• Is in conditional bracket: false
• Is in loop: false
• Class invoked: System.Console
• Method/property invoked: WriteLine
• Containing class: Program
• Containing function: Main
• Method is override, static, virtual, definition, abstract, sealed: all false
• Invoking kind: named type
Used Roslyn APIs to compile each solution to get, for each document, the
• syntax tree
• semantic model
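As a rough illustration, the invocation features listed on this slide could be held in a record like the one below. This is a minimal sketch, not the team's actual schema: the class name `InvocationFeatures` and all field names are hypothetical, and only the example values (span start 139, `System.Console`, `WriteLine`, and so on) come from the slide.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InvocationFeatures:
    """One training record per invocation (hypothetical field names)."""
    span_start: int
    in_conditional: bool
    in_loop: bool
    class_invoked: str
    member_invoked: str          # method or property being invoked
    containing_class: str
    containing_function: str
    # Modifier flags; all false in the slide's example.
    is_override: bool = False
    is_static: bool = False
    is_virtual: bool = False
    is_definition: bool = False
    is_abstract: bool = False
    is_sealed: bool = False
    invoking_kind: str = "named type"


# The example invocation from the slide.
example = InvocationFeatures(
    span_start=139,
    in_conditional=False,
    in_loop=False,
    class_invoked="System.Console",
    member_invoked="WriteLine",
    containing_class="Program",
    containing_function="Main",
)

# asdict() gives a plain dictionary that is easy to serialize as a training row.
row = asdict(example)
```

A frozen dataclass keeps each record immutable, which is convenient when the same rows are reused across several modeling experiments.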
What questions can we ask of this dataset?
How is C# used?
• Which are the most frequently used classes?
• Are there patterns in how methods of one class are used?
Which features are useful?
• Which pieces of information extracted by the parser would be helpful?
• What is the reasonable code context to look at: the entire document/function or the most recent calls?
How to make recommendations?
• Will the same model and parameters work for all classes?
• Do we have enough data?
Do we have enough training data?
Learning curves: model precision on training and test data across a range of training set sizes
• Prediction on test data improves as the training data size increases
• Test and training results converge to similar values
• Allows us to verify when a model has learned as much as it can from the data
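The learning-curve idea can be sketched in a few lines. The example below is an assumption-laden toy, not the talk's setup: it uses a most-frequent-member predictor as a stand-in model, top-1 accuracy as a stand-in for precision, and a tiny hand-made dataset; `top1_accuracy` and `learning_curve` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def top1_accuracy(train, evaluate_on):
    """Fit a most-frequent-member predictor on `train`, score it on `evaluate_on`."""
    counts = defaultdict(Counter)
    for cls, member in train:
        counts[cls][member] += 1
    if not evaluate_on:
        return 0.0
    correct = 0
    for cls, member in evaluate_on:
        seen = counts[cls]
        # A class never seen in training yields no prediction, counted as a miss.
        if seen and seen.most_common(1)[0][0] == member:
            correct += 1
    return correct / len(evaluate_on)


def learning_curve(train, test, sizes):
    """Return (size, training score, test score) for each training-set size."""
    curve = []
    for n in sizes:
        subset = train[:n]
        curve.append((n, top1_accuracy(subset, subset), top1_accuracy(subset, test)))
    return curve


# Toy (class, member) pairs, invented for illustration only.
train = [
    ("List", "Add"), ("List", "Add"), ("List", "Count"),
    ("Console", "WriteLine"), ("Console", "WriteLine"), ("Console", "ReadLine"),
]
test = [
    ("List", "Add"), ("Console", "WriteLine"),
    ("List", "Count"), ("Console", "WriteLine"),
]

curve = learning_curve(train, test, sizes=[1, 3, 6])
for n, train_score, test_score in curve:
    print(f"n={n}: train={train_score:.2f} test={test_score:.2f}")
```

On this toy data the training score falls from 1.00 to 0.67 while the test score rises from 0.25 to 0.75, which is the converging shape the slide describes.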
Metrics
• Precision
• Coverage
• Average Reciprocal Rank
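The slide names the three metrics without defining them, so the sketch below uses common definitions as assumptions: coverage is the share of invocations for which the model returns any recommendation, precision is the share of covered invocations whose true member appears in the top k, and average reciprocal rank is the mean of 1/rank of the true member (0 when it is absent) over covered invocations. The function name `evaluate` and the sample data are hypothetical.

```python
def evaluate(predictions, labels, k=5):
    """Compute coverage, precision@k and mean reciprocal rank.

    predictions: one ranked list of recommended members per invocation
                 (an empty list means the model made no recommendation).
    labels:      the member the developer actually used, per invocation.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    total = len(labels)
    covered = 0
    hits = 0
    reciprocal_rank_sum = 0.0

    for ranked, label in zip(predictions, labels):
        if not ranked:
            continue  # no recommendation: counts against coverage only
        covered += 1
        if label in ranked[:k]:
            hits += 1
        if label in ranked:
            reciprocal_rank_sum += 1.0 / (ranked.index(label) + 1)

    return {
        "coverage": covered / total if total else 0.0,
        "precision_at_k": hits / covered if covered else 0.0,
        "mean_reciprocal_rank": reciprocal_rank_sum / covered if covered else 0.0,
    }


# Invented sample: four invocations, one with no recommendation.
predictions = [
    ["WriteLine", "Write", "ReadLine"],
    ["Add", "Count"],
    [],
    ["Append", "ToString"],
]
labels = ["WriteLine", "Count", "Open", "Clear"]

metrics = evaluate(predictions, labels, k=1)
```

With `k=1` this sample gives a coverage of 0.75, a precision of 1/3, and a mean reciprocal rank of 0.5. Whether the denominators should be covered invocations or all invocations is a design choice; the version here keeps precision and coverage independent of each other.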
Modeling
Frequency Model
• Simple
• Low precision
Clustering Model
• Difficult to tune the cluster size for each class
• Reasonable precision with bigger model size
Statistical Language Model
• Much better precision with smaller model size
• In Production!
Deep Learning Model
• Best precision and coverage
• Largest model size
• Slowest execution time
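Of the four approaches, the frequency model is the simplest to illustrate. The sketch below assumes it means "recommend the members most often invoked on a class in the training data", which the slide does not spell out; `FrequencyModel` and its methods are hypothetical names, and the training pairs are invented.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Recommend the members most frequently invoked on each class."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def fit(self, invocations):
        """invocations: iterable of (class_name, member_name) pairs."""
        for class_name, member in invocations:
            self._counts[class_name][member] += 1
        return self

    def recommend(self, class_name, k=5):
        """Return up to k members, most frequent first; empty if class unseen."""
        counts = self._counts.get(class_name)
        if not counts:
            return []
        return [member for member, _ in counts.most_common(k)]


# Invented training pairs for illustration.
training = (
    [("System.Console", "WriteLine")] * 3
    + [("System.Console", "ReadLine")] * 2
    + [("System.Console", "Write")]
)

model = FrequencyModel().fit(training)
top_two = model.recommend("System.Console", k=2)
unseen = model.recommend("System.IO.File")
```

Because this model ignores the surrounding code context entirely, it returns the same list for every call site on a class, which is consistent with the low precision noted on the slide.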
Offline Model Evaluation
[Diagram: an example of a data point constructed from usage data. Labels: Training Data, Test Data, Features, Label, Model, Prediction (provided by model), Model Evaluation, Test Results]
Online Evaluation
[Flowchart: New Invocation → Features → Model makes recommendation → Log a recommendation event → If the user chooses a method from the recommended list: Yes → Log a commit event]
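The online evaluation flow can be approximated as a pair of counters. This is a minimal sketch under assumptions: the class `TelemetryLog`, its method names, and the derived acceptance rate (commit events divided by recommendation events) are hypothetical and not taken from the talk, which only describes the two logged event types.

```python
from dataclasses import dataclass


@dataclass
class TelemetryLog:
    """Counts the two event types shown in the online evaluation flow."""
    recommendation_events: int = 0
    commit_events: int = 0

    def record_invocation(self, recommended, chosen):
        """Log one invocation: what the model recommended and what the user picked."""
        if not recommended:
            return  # the model made no recommendation, so nothing is logged
        self.recommendation_events += 1
        if chosen in recommended:
            self.commit_events += 1

    def acceptance_rate(self):
        """Share of recommendation events followed by a commit event (assumed metric)."""
        if self.recommendation_events == 0:
            return 0.0
        return self.commit_events / self.recommendation_events


# Invented session for illustration.
log = TelemetryLog()
log.record_invocation(["WriteLine", "Write"], chosen="WriteLine")  # commit
log.record_invocation(["Add", "Count"], chosen="Clear")            # no commit
log.record_invocation(["Append"], chosen="Append")                 # commit
log.record_invocation([], chosen="Open")                           # not logged

rate = log.acceptance_rate()
```

Only counts are stored here, with no code content or identifiers from the user's project.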
Current Status and Future Work
• IntelliCode for member completion has been released for C#/C++/XAML in VS and for Python/TypeScript/JavaScript/Java in VS Code.
• Method argument recommendation is in preview for C# in VS
• Custom model training on the user's own codebase is in preview for C# in VS
• Enable live A/B testing for different models
• How do we further improve the model in production?
  • Tune the current statistical language model
  • Improve model precision by incorporating feedback telemetry while staying GDPR compliant
  • Deep learning: how to reduce the model size and improve runtime?
• Expand to line/snippet-level completion
How do DS/Engineering/PM work together?
What lessons did we learn?
Online Survey
• 73% of users across languages see an increase in productivity.
• Survey users every quarter.