Code Recognition and Cognitive Linguistic Modeling Using AST in Semantic Signal Processing
This work explores the application of Abstract Syntax Trees (AST) in Code Recognition and Cognitive Linguistic Modeling within the field of Semantic Signal Processing (SSP). By generating and analyzing AST representations, we can semantically understand programming code fragments for various applications. This paper reviews AST-based code recognition techniques, discusses the re-hosting and semantic analysis of code, and presents a demo of recognizing filter patterns in C++ files. Future work aims to expand coverage to more programming languages and enhance computational modeling capabilities.
Presentation Transcript
Code Recognition & CL Modeling through AST
Xingzhong Xu, Hong Man
Outline
• Introduction of AST in SSP
• AST for Code Recognition
• AST for Cognitive Linguistic Modeling
• Summary and Future Work
Semantic Signal Processing Stevens
Introduction of AST in SSP
• Most language applications use an Abstract Syntax Tree (AST) as an Intermediate Representation (IR) to help the computer semantically understand code in the programming domain.*
• Signal processing code:
  • How can we semantically analyze it?
  • How can we semantically model it?
    for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
*Terence Parr, The Definitive ANTLR Reference: Building Domain-Specific Languages (Pragmatic Programmers), 2007
**ANTLR
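As a rough illustration of what an AST for the loop above might look like, the sketch below builds the tree by hand and prints it in a parenthesized form similar to the text that ANTLR tree dumps produce. The Node struct, the make helper, and the node labels (FOR, INDEX) are assumptions made for this example; a real ANTLR grammar for C++ would choose its own token names and tree shape.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal AST node: a token label plus ordered children.
struct Node {
    std::string label;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

// Convenience constructor for a node with optional children.
NodePtr make(const std::string& label, std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(Node{label, std::move(children)});
}

// Renders a tree as "(root child1 child2 ...)"; leaves print as their label.
std::string toStringTree(const NodePtr& n) {
    assert(n != nullptr);
    if (n->children.empty()) return n->label;
    std::string out = "(" + n->label;
    for (const auto& c : n->children) out += " " + toStringTree(c);
    return out + ")";
}

// Hand-built AST for:
//   for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
// Labels such as FOR and INDEX are illustrative, not taken from a real grammar.
NodePtr buildFilterLoopAst() {
    auto indexed = [](const std::string& array) {
        return make("INDEX", {make(array), make("i")});
    };
    auto body = make("+=", {make("acc0"),
                            make("*", {indexed("d_taps"), indexed("input")})});
    return make("FOR", {make("=", {make("i"), make("0")}),
                        make("<", {make("i"), make("n")}),
                        make("++", {make("i")}),
                        body});
}
```

Calling toStringTree(buildFilterLoopAst()) gives (FOR (= i 0) (< i n) (++ i) (+= acc0 (* (INDEX d_taps i) (INDEX input i)))), which makes the nesting of the loop, the accumulation, and the multiply explicit.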
Code Recognition
• To perform code re-hosting and other semantic code analysis, we first need to recognize the functionality of each code segment.
• In computer science, there are two approaches to code recognition:
  • AST-based recognition [Gabel, 2008] [Roy, 2009]
    • Generate the AST
    • Run a tree matcher
  • Random-test-based recognition [Jiang, 2009] [Bertran, 2005]
    • Segment the code
    • Test the I/O behavior
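The random-test-based approach can be sketched in a few lines: treat each code segment as a black box, feed both the same random inputs, and compare the outputs. The example below is only a simplified illustration of that idea, not the method of the cited papers. The Segment type, the three sample segments, the fixed seed, and the tolerance are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// A code segment under test, modeled as a function from (taps, input) to one output.
using Segment = std::function<double(const std::vector<double>&,
                                     const std::vector<double>&)>;

// Reference segment: a plain multiply-accumulate loop.
double plainFilter(const std::vector<double>& taps, const std::vector<double>& input) {
    assert(taps.size() == input.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < taps.size(); i++) acc += taps[i] * input[i];
    return acc;
}

// Candidate segment: the same computation unrolled four ways.
// The length must be a multiple of 4.
double unrolledFilter(const std::vector<double>& taps, const std::vector<double>& input) {
    assert(taps.size() == input.size() && taps.size() % 4 == 0);
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    for (std::size_t i = 0; i < taps.size(); i += 4) {
        acc0 += taps[i + 0] * input[i + 0];
        acc1 += taps[i + 1] * input[i + 1];
        acc2 += taps[i + 2] * input[i + 2];
        acc3 += taps[i + 3] * input[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}

// A segment that is not a filter: it sums the input and ignores the taps.
double plainSum(const std::vector<double>&, const std::vector<double>& input) {
    double acc = 0.0;
    for (double x : input) acc += x;
    return acc;
}

// Returns true if both segments agree (within tol) on every random trial.
// Agreement on random inputs is evidence of equivalence, not a proof.
bool sameIoBehavior(const Segment& a, const Segment& b, int trials,
                    std::size_t len, double tol = 1e-9) {
    std::mt19937 rng(12345);  // fixed seed so runs are repeatable
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (int t = 0; t < trials; t++) {
        std::vector<double> taps(len), input(len);
        for (std::size_t i = 0; i < len; i++) {
            taps[i] = dist(rng);
            input[i] = dist(rng);
        }
        if (std::fabs(a(taps, input) - b(taps, input)) > tol) return false;
    }
    return true;
}
```

With these definitions, sameIoBehavior(plainFilter, unrolledFilter, 100, 16) should report agreement, while comparing plainFilter against plainSum should not. The tolerance is needed because the unrolled version adds the products in a different order, so floating-point results can differ slightly.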
Code Recognition
• The AST represents the source code in the programming domain.
• Radio and computational primitives have characteristic features in the AST.
• Filter ≈ LOOP + ACCUMULATION + MULTIPLY
    for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
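The rule Filter ≈ LOOP + ACCUMULATION + MULTIPLY can be written as a small tree matcher. The sketch below assumes a hypothetical AST in which a loop is labeled FOR, accumulation is a "+=" node, and multiplication is a "*" node; these labels and the Node type are inventions for this example, and the demo's actual ANTLR tree pattern is not shown in the slides.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal AST node: a token label plus ordered children.
struct Node {
    std::string label;
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(const std::string& label, std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(Node{label, std::move(children)});
}

// True if the subtree rooted at n contains a node with the given label.
bool hasLabel(const NodePtr& n, const std::string& label) {
    if (n->label == label) return true;
    for (const auto& c : n->children)
        if (hasLabel(c, label)) return true;
    return false;
}

// True if the subtree contains a "+=" node with a "*" somewhere beneath it.
bool hasAccumulatedProduct(const NodePtr& n) {
    if (n->label == "+=")
        for (const auto& c : n->children)
            if (hasLabel(c, "*")) return true;
    for (const auto& c : n->children)
        if (hasAccumulatedProduct(c)) return true;
    return false;
}

// LOOP + ACCUMULATION + MULTIPLY: a FOR node containing an accumulated product.
bool isFilterLoop(const NodePtr& n) {
    assert(n != nullptr);
    if (n->label != "FOR") return false;
    for (const auto& c : n->children)
        if (hasAccumulatedProduct(c)) return true;
    return false;
}

// Counts matching FOR nodes anywhere in the tree. An outer loop that wraps a
// matching inner loop is counted too, which a real matcher may want to avoid.
int countFilterLoops(const NodePtr& n) {
    int count = isFilterLoop(n) ? 1 : 0;
    for (const auto& c : n->children) count += countFilterLoops(c);
    return count;
}

// Sample tree for: for (i = 0; i < n; i++){ acc0 += d_taps[i] * input[i]; }
NodePtr sampleFilterLoop() {
    auto at = [](const std::string& a) { return make("INDEX", {make(a), make("i")}); };
    return make("FOR", {make("=", {make("i"), make("0")}),
                        make("<", {make("i"), make("n")}),
                        make("++", {make("i")}),
                        make("+=", {make("acc0"), make("*", {at("d_taps"), at("input")})})});
}

// Sample tree for a loop that only accumulates: for (...) { sum += x[i]; }
NodePtr sampleSumLoop() {
    return make("FOR", {make("=", {make("i"), make("0")}),
                        make("<", {make("i"), make("n")}),
                        make("++", {make("i")}),
                        make("+=", {make("sum"), make("INDEX", {make("x"), make("i")})})});
}
```

Requiring the "*" to sit beneath the "+=" keeps a loop header such as i += N_UNROLL from triggering a match by itself. A purely structural rule like this will still produce false positives (for example, any accumulated product that is not a filter), so it is best read as a first-pass candidate detector.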
Code Recognition Result
• To test the idea, I designed a code recognition demo (not fully debugged).
• Source: GNU Radio 3.2.2 (C++)
• Objective: Recognize and print the filter code.
• Platform: Ubuntu 10.04 + Java SE 1.6 + ANTLR 3.2
• Process:
  • Generate an AST for each C++ file.
  • Match the filter sub-tree pattern.
  • Print the matched code segment.
Code Recognition Result
• Result:
  • 932 C++ source files in total in GNU Radio.
  • 689 files successfully analyzed (to be continued).
  • 59 filter patterns found.
    for (i = 0; i < n; i += N_UNROLL){
      acc0 += d_taps[i + 0] * input[i + 0];
      acc1 += d_taps[i + 1] * input[i + 1];
      acc2 += d_taps[i + 2] * input[i + 2];
      acc3 += d_taps[i + 3] * input[i + 3];
    }

    for (int j = 0; j < d_len; j++) {
      if (j != 0)
        d_pn = 2.0*d_reference->next_bit()-1.0;
      sum += *in++ * d_pn;
    }

    for (i=0; i < d_ff_taps.size(); i++)
      acc += conj(d_ff_delayline[(i+d_ff_index) & ff_mask]) * d_ff_taps[i];
CL Modeling
• Intermediate representation:
  • AST (programming domain)
  • CL Modeling (signal processing domain)
    k = N - i;
CL Modeling
• Rewrite and map the structure and tokens from the AST to the CL Modeling Tree.
    k = N - i;
CL Modeling Result
• To test the idea, I designed a CL Modeling demo based on the AST.*
• A tree rewriter translates and modifies the current AST into a CL Modeling Tree.
• The CL Modeling XML file is printed from the CL Modeling Tree.
https://sites.google.com/site/stevensxingzhong/home/clmb
*Terence Parr, Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, Pragmatic Programmers, 2010.
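The rewrite-then-print step can be sketched for the statement k = N - i. The slides do not show the CL Modeling schema, so every tag name below (assign, subtract, variable, operation) is a placeholder invented for this example, as are the AstNode and ModelNode types. The sketch also treats every leaf as an identifier, which a real rewriter would refine.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Programming-domain tree: operator or identifier tokens.
struct AstNode {
    std::string token;
    std::vector<std::shared_ptr<AstNode>> children;
};
using AstPtr = std::shared_ptr<AstNode>;

// Modeling-domain tree: a tag, an optional name, and children.
struct ModelNode {
    std::string tag;
    std::string name;
    std::vector<std::shared_ptr<ModelNode>> children;
};
using ModelPtr = std::shared_ptr<ModelNode>;

AstPtr ast(const std::string& token, std::vector<AstPtr> children = {}) {
    return std::make_shared<AstNode>(AstNode{token, std::move(children)});
}

// Rewrites an AST into a modeling tree. The operator-to-tag table is hypothetical.
ModelPtr rewrite(const AstPtr& node) {
    assert(node != nullptr);
    static const std::map<std::string, std::string> tags = {
        {"=", "assign"}, {"-", "subtract"}, {"+", "add"}, {"*", "multiply"}};
    if (node->children.empty())  // simplification: every leaf is an identifier
        return std::make_shared<ModelNode>(ModelNode{"variable", node->token, {}});
    auto it = tags.find(node->token);
    std::string tag = (it != tags.end()) ? it->second : std::string("operation");
    auto out = std::make_shared<ModelNode>(ModelNode{tag, "", {}});
    for (const auto& c : node->children) out->children.push_back(rewrite(c));
    return out;
}

// Serializes the modeling tree as XML. Names are assumed to be plain
// identifiers, so no attribute escaping is performed.
std::string toXml(const ModelPtr& n) {
    std::string open = "<" + n->tag;
    if (!n->name.empty()) open += " name=\"" + n->name + "\"";
    if (n->children.empty()) return open + "/>";
    std::string out = open + ">";
    for (const auto& c : n->children) out += toXml(c);
    return out + "</" + n->tag + ">";
}

// AST for: k = N - i;
AstPtr buildAssignmentAst() {
    return ast("=", {ast("k"), ast("-", {ast("N"), ast("i")})});
}
```

For this input, toXml(rewrite(buildAssignmentAst())) produces <assign><variable name="k"/><subtract><variable name="N"/><variable name="i"/></subtract></assign>. Keeping the rewrite and the XML printing as separate passes mirrors the two steps listed on the slide.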
Summary & Future Work
• The programming-domain AST is a key interface for language applications. In the SSP project it supports:
  • Code Recognition: determine the functionality of a code segment.
  • Cognitive Linguistic Modeling: serve as an intermediate form for modeling the radio code.
• Future Work:
  • Cover more code: C++, Matlab, VHDL, etc.
  • Discover more computational and radio primitives.
  • Fully support CL Modeling.
References
• Jiang, L. and Su, Z. 2009. Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.
• Gabel, M., Jiang, L., and Su, Z. 2008. Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software Engineering.
• Roy, C. K., Cordy, J. R., and Koschke, R. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming.
• Bertran, M., Babot, F., and Climent, A. 2005. An input/output semantics for distributed program equivalence reasoning. Electron. Notes Theor. Comput. Sci. 137, 1 (Jul. 2005).
• Parr, T. 2007. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers.
• Parr, T. 2010. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages. Pragmatic Programmers.