Standard Types and Regular Expressions


  1. CS 480/680 – Comparative Languages: Standard Types and Regular Expressions

  2. Numbers
     • Most integers are Fixnum objects
     • When they grow too large, they are converted to Bignum objects
       • A Bignum is an arbitrary-length list of Fixnums
     • Literals:
       • 12345 – decimal
         • Underscores are ignored (12_345 == 12345) (Why?)
       • 0377 – octal (leading 0)
       • 0x3F7A – hex
       • 0b110111010001 – binary
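The literal forms above can be tried directly. This is a minimal sketch; the variable names are illustrative only. Note that Ruby 2.4 and later fold Fixnum and Bignum into a single Integer class, so the sketch does not name either class.

```ruby
# Each literal below denotes an ordinary integer object.
decimal = 12_345          # underscores are ignored: same as 12345
octal   = 0377            # leading 0  means base 8  (255 in decimal)
hex     = 0x3F7A          # leading 0x means base 16 (16250 in decimal)
binary  = 0b110111010001  # leading 0b means base 2  (3537 in decimal)

# Integers grow past the machine word size without any special handling.
big = 2**100

puts decimal, octal, hex, binary, big
```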

  3. Numeric Classes
     • Integer classes support a number of iterators
       • 3.times { … }
       • 1.upto(5) { … }
       • 99.downto(7) { … }
       • 50.step(80, 5) { … } = 50, 55, 60, 65, …, 80
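As a small illustration of the iterators listed above, the sketch below collects the values each one yields. The array names are assumptions made for the example, and downto is shortened to three values to keep the output small.

```ruby
# Collect what each integer iterator yields to its block.
times = []
3.times { |i| times << i }         # yields 0, 1, 2

up = []
1.upto(5) { |i| up << i }          # yields 1 through 5

down = []
99.downto(97) { |i| down << i }    # yields 99, 98, 97

steps = []
50.step(80, 5) { |i| steps << i }  # yields 50, 55, ..., 80

p times, up, down, steps
```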

  4. Strings
     • A String is a sequence of 8-bit bytes
       • It usually holds ASCII characters, but this is not required; it can also hold numbers (arbitrary byte values)
     • String literals
       • Single quotes: the only escapes are \\ and \'
       • Double quotes:
         • Escape sequences like \n
         • Any Ruby expression, interpolated with #{ }:
           • #{var1}
           • #{2*$var2+var3/7}
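The difference between the two quoting styles can be sketched as follows. The values assigned to var1, $var2 and var3 are invented for the example; only the variable names come from the slide.

```ruby
var1  = "Ruby"
$var2 = 4       # a global, as in the slide's expression
var3  = 14

single = 'no #{var1} here\n'        # no interpolation, \n stays two characters
quoted = 'it\'s'                    # \' is one of the two single-quote escapes
double = "hello #{var1}\n"          # interpolation and a real newline
expr   = "#{2 * $var2 + var3 / 7}"  # any expression: 8 + 2

puts single, quoted, double, expr
```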

  5. String Literals
     • If you want to use another delimiter, you can use %q (single quotes) or %Q (double quotes)
       • %q/string string "string"/
       • %Q(This 'is' a #{var2} string)
     • Opening bracket, brace, parenthesis, or less-than sign: the string ends at the matching closing delimiter
     • Anything else: the string ends at the same character
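A brief sketch of the %q and %Q forms; the value of var2 and the third example are assumptions added for illustration.

```ruby
var2 = "delimited"

a = %q/string string "string"/      # behaves like single quotes
b = %Q(This 'is' a #{var2} string)  # behaves like double quotes
c = %q{braces close with a brace}   # paired delimiter

puts a, b, c
```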

  6. “Here Documents”
     • Specify a delimiter string using <<STRING
       • The closing delimiter must start in the first column
       • <<-STRING allows an indented closing delimiter
     • The string includes the newlines and spaces of its body (the listing is indented here for layout; in a source file END_OF_STRING starts in column one):

         aString = <<END_OF_STRING
             The body of the string
             is the input lines up to
             one ending with the same
             text that followed the '<<'
         END_OF_STRING

     • Several here documents can start on one line:

         print <<-STRING1, <<-STRING2
            Concat
            STRING1
               enate
               STRING2

       produces:

            Concat
               enate
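Because the column of the closing delimiter matters, the same two here documents are sketched below as they might appear in a source file. The names a_string and pair are illustrative, and the two indented here documents are collected in an array rather than printed directly.

```ruby
# Plain form: the terminator must start in the first column.
a_string = <<END_OF_STRING
    The body of the string
    is the input lines up to
    one ending with the same
    text that followed the '<<'
END_OF_STRING

# Dash form: terminators may be indented; two here documents start on one line.
pair = [<<-STRING1, <<-STRING2]
   Concat
   STRING1
      enate
      STRING2

print a_string
print pair.join
```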

  7. String Methods
     • String is one of the largest classes in Ruby
       • Over 75 standard methods
     • Many of the more powerful methods use regular expressions, so we’ll come back to the topic of String methods after we discuss regular expressions in more detail…

  8. Ranges
     • In Ruby, ranges can be used for sequences, conditions, and intervals
       • 1..5 = 1, 2, 3, 4, 5
       • 1...5 = 1, 2, 3, 4 (0...x is useful for arrays)
     • Stored efficiently: a range object stores only the min and max values as Fixnums
     • Can convert to an array with to_a
       • (1..5).to_a » [1, 2, 3, 4, 5]
       • ('bar'..'bat').to_a » ['bar', 'bas', 'bat']
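The two range forms can be compared side by side. In this sketch the letters array is an invented example of using 0...x to index an array.

```ruby
inclusive = (1..5).to_a           # two dots include the end value
exclusive = (1...5).to_a          # three dots exclude it
words     = ('bar'..'bat').to_a   # strings step with String#succ

letters = %w[a b c d]
indices = (0...letters.length).to_a  # exactly the valid indices

p inclusive, exclusive, words, indices
```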

  9. Range Methods and Iterators
     • A few useful operations on ranges:

         digits = 0..9
         digits.include?(5)          » true
         digits.min                  » 0
         digits.max                  » 9
         digits.reject {|i| i < 5 }  » [5, 6, 7, 8, 9]
         digits.each do |digit|
           dial(digit)
         end

  10. Range Contents
      • Ranges can even be created on objects that you define, provided that your class…
        • Implements the succ() method, which returns the next object in the sequence, and
        • Makes its objects comparable using <=> (the “spaceship operator”)
          • Returns -1/0/1 depending on whether the first object is less-than/equal-to/greater-than the second

  11. Ranges of objects
      • VU holds a volume level, 0 to 9

          class VU
            include Comparable
            attr_reader :volume

            def initialize(volume)  # Should be 0..9
              @volume = volume      # ERROR CHECKING HERE!
            end

            def inspect             # Prints out as ######...
              '#' * @volume
            end

            # Support for ranges
            def <=>(other)
              self.volume <=> other.volume
            end

            def succ
              raise(IndexError, "Too loud") if @volume >= 9
              VU.new(@volume.succ)
            end
          end

  12. Volume Example
      • A VU object prints out as 0 to 9 #’s
      • We can make ranges of VU objects, since they follow the rules

          medium = VU.new(4)..VU.new(7)
          medium.to_a                 » [####, #####, ######, #######]   (actually, four VU objects)
          medium.include?(VU.new(3))  » false

  13. Conditions and Intervals
      • Ranges can also be used as conditions and as intervals for controlling loops
      • We’ll see these uses when we talk about loops in Ruby

  14. Regular Expressions
      • Regular expressions are a powerful tool for matching patterns against strings
      • Available in many languages (AWK, Sed, Perl, Python, C/C++, others)
      • Matching strings with regexps is usually efficient and fast
      • In Ruby, regexps are objects, like everything else

  15. RegExp literals
      • There are three ways to create a regular expression
        • a = Regexp.new('pattern')
        • b = /pattern/
        • c = %r(pattern)
      • Match a Regexp against a string using
        • exp.match(string)
        • string =~ exp (positive match)
        • string !~ exp (negative match)
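The three construction forms and the three matching forms can be exercised together. This is only a sketch; the sample text is invented.

```ruby
# Three equivalent ways to build the same Regexp object.
a = Regexp.new('pattern')
b = /pattern/
c = %r(pattern)
same = (a == b) && (b == c)   # Regexps compare equal by source and options

text = "a pattern here"
md   = b.match(text)          # a MatchData object, or nil when nothing matches
pos  = text =~ b              # index of the first match, or nil
miss = text !~ /missing/      # true when the pattern does NOT match

p same, md[0], pos, miss
```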

  16. String Matching
      • =~ and !~ are also defined for strings
        • In early Ruby versions, a string on the right was converted to a Regexp; current versions raise a TypeError instead, so pass a Regexp
      • Return the position of the first match, or nil
        • Zero-based

          a = "Fats Waller"
          a =~ /a/   » 1
          a =~ /z/   » nil
          a =~ /ll/  » 7
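The return values of =~ can be checked with a short sketch. In current Ruby a String on the right-hand side raises a TypeError rather than being converted, so the sketch converts explicitly with Regexp.escape and Regexp.new; the variable names are illustrative.

```ruby
a = "Fats Waller"

first_a = a =~ /a/    # position of the first "a" (zero-based)
no_z    = a =~ /z/    # nil: no match
ll      = a =~ /ll/   # position where "ll" starts

# To match a plain string, convert it to a Regexp explicitly.
via_string = a =~ Regexp.new(Regexp.escape("ll"))

# A bare String on the right is rejected by current Ruby.
error = begin
  a =~ "ll"
  nil
rescue TypeError => e
  e.class
end

p first_a, no_z, ll, via_string, error
```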

  17. Regular Expression Patterns
      • Most characters match themselves
      • Wildcard: . (period) = any character
      • Anchors
        • ^ = “start of line”
        • $ = “end of line”

  18. Character Classes
      • Character classes appear within [] pairs
        • Most special Regexp characters (^, $, etc.) are turned off
        • Escape sequences (\n etc.) still work
      • Examples: [aeiou], [0-9]
      • ^ as the first character negates the class
      • You can use the literal characters ] and - if they appear first: []-abn-z] (escaping them as \] and \- is the unambiguous alternative)

  19. Predefined character classes
      • These work inside or outside []’s:
        • \d = digit = [0-9]
        • \D = non-digit = [^0-9]
        • \s = whitespace, \S = non-whitespace
        • \w = word character = [a-zA-Z0-9_]
        • \W = non-word character
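Character classes and the predefined shortcuts are easy to try with String#scan, which returns every match. The sample strings below are invented for this sketch.

```ruby
vowels     = "regular expression".scan(/[aeiou]/)  # explicit class
digits     = "room 101".scan(/\d/)                 # same as [0-9]
non_vowels = "ruby".scan(/[^aeiou]/)               # ^ first negates the class
words      = "a_b c-d".scan(/\w+/)                 # \w covers letters, digits, _
mixed      = "x-1 y_2".scan(/[\d_-]/)              # shortcuts work inside [] too

p vowels, digits, non_vowels, words, mixed
```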

  20. Repetition in Regexps
      • These quantify the preceding character or class:
        • * = zero or more
        • + = one or more
        • ? = zero or one
        • {m,n} = at least m and at most n (written without spaces)
        • {m,} = at least m
      • High precedence: a quantifier applies to only one character or class, unless grouped:
        • /^ran*$/ vs. /^r(an)*$/
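The precedence point can be made concrete by testing both patterns against the same inputs. This is a sketch: the test strings and the bounded example are assumptions, and Regexp#match? requires Ruby 2.4 or later.

```ruby
ungrouped = /^ran*$/     # * applies to the single character "n"
grouped   = /^r(an)*$/   # * applies to the group "an"

u1 = ungrouped.match?("rannn")   # "n" repeated
u2 = ungrouped.match?("ranan")   # "an" repeated
g1 = grouped.match?("rannn")
g2 = grouped.match?("ranan")

bounded = /^\d{2,4}$/            # between two and four digits
b1 = bounded.match?("123")
b2 = bounded.match?("12345")

p [u1, u2, g1, g2, b1, b2]
```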

  21. Alternation
      • | is like “or”: it matches either the regexp before the | or the one after
      • Low precedence: it alternates entire regexps unless grouped
        • /red ball|angry sky/ matches the text “red ball” or “angry sky”; it does not mean “red ball sky” or “red angry sky”
        • /red (ball|angry) sky/ does the latter
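To see which part of a string each pattern actually matches, the sketch below uses String#[] with a Regexp, which returns the matched text or nil. The variable names are illustrative.

```ruby
loose = /red ball|angry sky/      # "red ball" OR "angry sky"
tight = /red (ball|angry) sky/    # "red ball sky" OR "red angry sky"

loose_hit1 = "red ball sky"[loose]    # only the "red ball" part matches
loose_hit2 = "red angry sky"[loose]   # only the "angry sky" part matches
tight_hit1 = "red ball sky"[tight]    # the whole phrase matches
tight_hit2 = "red angry sky"[tight]
tight_miss = "red ball"[tight]        # nil: " sky" is required

p loose_hit1, loose_hit2, tight_hit1, tight_hit2, tight_miss
```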

  22. Side Effects (Ruby Magic)
      • After you match a regular expression, some “special” Ruby variables are automatically set:
        • $& – the part of the string that matched the pattern
        • $` – the part of the string before the match
        • $' – the part of the string after the match
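A short sketch of the three special variables, with the equivalent MatchData methods for comparison. The sample string is reused from the string-matching slide; the variable names are illustrative.

```ruby
"Fats Waller" =~ /ll/

# Copy the special variables right away; the next match overwrites them.
before, matched, after = $`, $&, $'

# The same information is available from a MatchData object.
md = /ll/.match("Fats Waller")
same = [md.pre_match, md[0], md.post_match] == [before, matched, after]

puts "#{before}[#{matched}]#{after}"
```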

  23. Side effects and grouping
      • When you use ()’s for grouping, Ruby assigns the text matched by the first () pair to:
        • \1 within the pattern
        • $1 outside the pattern

          "mississippi" =~ /^.*(iss)+.*$/   » $1 = "iss"
          /([aeiou][aeiou]).*\1/            (a vowel pair that occurs again later in the string)
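Both uses of a group can be sketched briefly: $1 after a match, and \1 as a backreference inside the pattern. The "moon and spoon" test strings are invented for the example.

```ruby
"mississippi" =~ /^.*(iss)+.*$/
first_group = $1                   # text captured by the first () pair

doubled = /([aeiou][aeiou]).*\1/   # a vowel pair, then the same pair again
hit  = "moon and spoon"[doubled]   # matched text, from the first "oo" to the second
pair = $1                          # the pair that was repeated
miss = "moon and sun"[doubled]     # nil: no vowel pair occurs twice

p first_group, hit, pair, miss
```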

  24. Repetition and greediness
      • By default, repetition is greedy, meaning that it will match as many characters as possible
      • You can make a repetition modifier non-greedy by adding ‘?’

          a = "The moon is made of cheese"
          showRE(a, /\w+/)             » <<The>> moon is made of cheese
          showRE(a, /\s.*\s/)          » The<< moon is made of >>cheese
          showRE(a, /\s.*?\s/)         » The<< moon >>is made of cheese
          showRE(a, /[aeiou]{2,99}/)   » The m<<oo>>n is made of cheese
          showRE(a, /mo?o/)            » The <<moo>>n is made of cheese
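The slide does not show how showRE is defined. One plausible definition, offered here as an assumption rather than the author's actual code, wraps the matched text in << >> using the special variables $`, $& and $'.

```ruby
# Hypothetical helper: marks the matched part of the string with << >>.
def showRE(string, pattern)
  if string =~ pattern
    "#{$`}<<#{$&}>>#{$'}"
  else
    "no match"
  end
end

a = "The moon is made of cheese"
greedy = showRE(a, /\s.*\s/)    # .* runs to the last whitespace
lazy   = showRE(a, /\s.*?\s/)   # .*? stops at the first whitespace
word   = showRE(a, /\w+/)
none   = showRE(a, /\d+/)

puts greedy, lazy, word, none
```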

  25. String Methods Revisited
      • s.split(regexp) – returns a list of substrings, with regexp as a delimiter
        • Can assign to an array, or use multiple assignment
      • s.squeeze(string) – reduces each run of a repeated character found in string to a single character

          songFile.each do |line|
            file, length, name, title = line.chomp.split(/\s*\|\s*/)
            songs.append(Song.new(title, name, length))
          end
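The split-and-squeeze step can be tried on a single line without the songFile, songs and Song objects, which the slide does not define. The sample line below is invented for this sketch.

```ruby
# One hypothetical record: fields separated by "|" with uneven spacing.
line = "/jazz/j00132.mp3  | 3:45 | Fats     Waller | Ain't Misbehavin'\n"

# Split on a bar together with any surrounding whitespace.
file, length, name, title = line.chomp.split(/\s*\|\s*/)

# Collapse the run of spaces inside the name to a single space.
name = name.squeeze(" ")

all_runs  = "aaabbb  cc".squeeze     # no argument: squeeze every run
only_as   = "aaabbb".squeeze("a")    # only runs of "a"

p file, length, name, title, all_runs, only_as
```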

  26. String Methods
      • s.scan(regexp) – returns a list of parts that match the pattern

          st = "123 45 hello out67there what's 23up?"
          a = st.scan(/\d+/)
          puts a
          »
          123
          45
          67
          23

      • Many more in Built-in Classes and Methods!

  27. Regexp substitutions
      • a.sub (one replacement) & a.gsub (global)
        • Replace the text matched by a regular expression with a string
        • The string can include \1, \2, etc. to refer to the groups matched by the original pattern
      • See substitutions.rb & Ruby book: Standard Types
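The file substitutions.rb is not reproduced in the transcript, so the following is an independent sketch of sub, gsub and group references, using invented sample strings. The replacement string is single-quoted so that \1 and \2 reach sub unchanged.

```ruby
a = "quick brown fox"

once = a.sub(/[aeiou]/, '*')    # replaces only the first vowel
all  = a.gsub(/[aeiou]/, '*')   # replaces every vowel

# \1 and \2 in the replacement refer to the first and second groups.
swapped = "fats waller".sub(/(\w+) (\w+)/, '\2 \1')

puts once, all, swapped
```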
