parsing strings

Presentation Transcript

  1. Parsing Strings
     Fred Kuhns
     Computer Science and Engineering, Applied Research Laboratory
     Washington University in St. Louis

  2. Working Example
     • As software developers we frequently find ourselves needing to parse strings.
     • We will focus on character strings using the STL string class, but the techniques have broader applicability.
     • A common scenario: we have a file containing tabulated data that we must read, process, and then store the results back in a file.
     • A popular file format is CSV, or Comma-Separated Values.
     • Commas separate fields and newlines separate records.
     CS422 – Operating Systems Concepts

  3. CSV
       # Registration table
       # Last <fs> First <fs> MI <fs> ID <fs> Email <fs> Comments
       # Each line represents the record for one registered person
       Smith , John , M , 1001 , john@someplace.com , needs receipt
       Jackson, Mary , I , 2010 , mary@thatplace.edu ,
       Mitchel, Mark, L, 4000, mm@candy.com, must call
       Hicks,,, 2110, , must get missing information
     • It is convenient to think of the file as a two-dimensional array of records and fields.
     • Be specific about your assumptions concerning the data format and whether comments and escape sequences are permitted.
     • Don't assume that all fields will have values or that the proper number of field separators is present, especially if people are permitted to edit the file.
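
One way to make the format assumptions explicit is to encode them in small helper functions. The sketch below assumes that a line whose first non-blank character is '#' is a comment and that ',' is the field separator; both helper names are hypothetical and do not come from the slides.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true when a line carries no record, i.e. it is empty,
// contains only whitespace, or its first non-blank character is '#'.
bool is_comment_or_blank(const std::string& line) {
    const std::string::size_type first = line.find_first_not_of(" \t\r");
    return first == std::string::npos || line[first] == '#';
}

// Hypothetical helper: counts field separators so a caller can check that a
// record has the expected number of fields before trusting it.
std::string::size_type count_separators(const std::string& line, char sep = ',') {
    return static_cast<std::string::size_type>(
        std::count(line.begin(), line.end(), sep));
}
```

With the six-column table above, every well-formed record should contain five separators; the "Hicks" line passes that check even though three of its fields are empty.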

  4. Accessing the file
     • C++ gives you access to the C and C++ standard libraries for I/O. See the required text for details.
     • I assume you need input and output files open:
         char ch;
         ifstream fin("data.csv");
         if (!fin) { cerr << "open fin failed"; exit(1); }
         ofstream fout("result.csv");
         if (!fout) { cerr << "open fout failed"; exit(1); }
         while (fin.get(ch)) { … fout.put(ch); }
         if (!fin.eof() || !fout) { cerr << "File IO error"; exit(1); }
     • You may explicitly open a file: fin.open("filename");
     • The stream destructor closes the file, or you may explicitly close it: fin.close();
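
The open, copy, and check sequence on this slide can be packaged as a function. This is a minimal sketch: the name copy_file and the choice to return false instead of calling exit(1) are assumptions made for illustration.

```cpp
#include <fstream>
#include <string>

// Copies src to dst one character at a time.
// Returns false if either file cannot be opened or an I/O error occurs.
bool copy_file(const std::string& src, const std::string& dst) {
    std::ifstream fin(src);
    if (!fin) return false;            // input could not be opened

    std::ofstream fout(dst);
    if (!fout) return false;           // output could not be opened

    char ch;
    while (fin.get(ch)) {
        fout.put(ch);
    }

    // The loop should have ended because input was exhausted, not because
    // of a read error, and every write should have succeeded.
    fout.flush();
    return fin.eof() && !fout.fail();
}
```

Both streams are closed by their destructors when the function returns, so no explicit close() call is needed.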

  5. Operations
     • Stream objects have state:
         good() – next operation expected to succeed
         eof() – end of file (input) reached
         fail() – next operation will fail
         bad() – corrupted stream
     • An operation on a stream that is not in a good state is a null op.
     • bool operator!() const on a stream returns fail().
     • operator void*() const returns fail() ? 0 : -1 (C++11 and later replace it with explicit operator bool() const, which returns !fail()).
     • Character-oriented I/O uses get, put, read, write, getline and the operators << and >>.
     • get(char*, …) does not remove '\n' from the stream, but getline(char*, …) does.
     • You can also use the non-member function getline, which takes a string.
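
To see the state functions at work, the sketch below reads whitespace-separated integers and recovers when a token cannot be parsed. The function name read_ints and the recovery policy (discard the offending token and continue) are assumptions for illustration.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, convenient for trying this without a file
#include <string>
#include <vector>

// Reads integers from the stream, discarding input that cannot be parsed
// as an integer. Stops at end of input or on an unrecoverable error.
std::vector<int> read_ints(std::istream& in) {
    std::vector<int> values;
    int v = 0;
    while (true) {
        if (in >> v) {                 // the stream converts to true when !fail()
            values.push_back(v);
            continue;
        }
        if (in.bad() || in.eof()) {    // corrupted stream or no more input
            break;
        }
        // Only failbit is set. Until it is cleared, every further read
        // on this stream is a null op.
        in.clear();
        std::string junk;
        in >> junk;                    // discard the token that failed to parse
    }
    return values;
}
```

Feeding it a std::istringstream holding "12 abc 7" yields the values 12 and 7. Afterwards eof() and fail() are both true and bad() is false, because the final read attempt found no more input.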

  6. Reading CSV
     • Questions:
         • Is it OK to add fl to the vector records?
         • Does the line read retain all whitespace?
       ifstream fin(argv[1]);
       string line;
       vector<string> lines;
       vector<FieldList> records;
       while (getline(fin, line)) {
         lines.push_back(line); // example of using vectors
         FieldList fl(line);
         records.push_back(fl);
       }
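
A small experiment can help with the whitespace question. The sketch below covers only the line-reading half of the loop; it leaves out FieldList, whose definition the slides do not give. The function names are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from the stream. std::getline removes the terminating
// '\n' but leaves all other characters, including spaces, untouched.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);   // push_back stores a copy of line
    }
    return lines;
}

// Convenience wrapper for experimenting without a file.
std::vector<std::string> lines_of(const std::string& text) {
    std::istringstream in(text);
    return read_lines(in);
}
```

Calling lines_of("  a , b \n\nlast") returns three strings: "  a , b " with its leading and trailing spaces intact, an empty string for the blank line, and "last". Because push_back stores a copy, reusing the local variable on the next iteration does not disturb elements already in the vector.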

  7. Reading the fields, one of many ways
       FieldList::FieldList(const string &rec, …)
       {
         // you can fill in the missing pieces
         string fld;
         string::size_type indx, fend, tmp, end = rec.size();
         for (indx = 0; indx <= end; indx = fend + 1) {
           // skip over any initial white space
           indx = rec.find_first_not_of(ws_, indx);
           ???
           flds_.push_back(fld);
         }
       }
     • To solve this, consider the edge cases.
     • Make sure you explicitly address each case.
     • Draw a picture.
     • Do you allow comments?
     • What about quoted text with embedded field separators?
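
As a point of comparison for the exercise, here is one possible free-function splitter. It is a sketch under stated assumptions, not the FieldList constructor from the slide: the name split_fields and its parameters are hypothetical, whitespace around each field is trimmed, empty fields are kept, and neither comments nor quoted text are handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits rec at each sep, trimming the characters in ws from both ends of
// every field. Empty fields are preserved, so a record with N separators
// always yields N + 1 fields (an empty record yields one empty field).
std::vector<std::string> split_fields(const std::string& rec,
                                      char sep = ',',
                                      const std::string& ws = " \t\r") {
    typedef std::string::size_type size_type;
    std::vector<std::string> fields;
    size_type start = 0;
    while (true) {
        // fend is the index of the separator ending this field,
        // or rec.size() for the last field.
        size_type fend = rec.find(sep, start);
        if (fend == std::string::npos) fend = rec.size();

        const size_type first = rec.find_first_not_of(ws, start);
        if (first == std::string::npos || first >= fend) {
            fields.push_back(std::string());        // empty or all-blank field
        } else {
            const size_type last = rec.find_last_not_of(ws, fend - 1);
            assert(last >= first && last < fend);   // rec[first] is not blank
            fields.push_back(rec.substr(first, last - first + 1));
        }

        if (fend == rec.size()) break;              // last field consumed
        start = fend + 1;                           // step over the separator
    }
    return fields;
}
```

Applied to the record "Hicks,,, 2110, , must get missing information" it returns six fields: "Hicks", "", "", "2110", "", and "must get missing information". A trailing separator, as in the "Jackson" record, produces a final empty field.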

  8. Simple Examples
     • You can use the find family of string member functions to split up this line: find(), find_first_of(), find_first_not_of(), find_last_of(), find_last_not_of()
     Record as it appears in the file:
       a,b,c\n
     String representation of the record after a call to getline(fin, line):
       char:   a  ,  b  ,  c
       index:  0  1  2  3  4  5
     line.size() == 5
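
The find family can be tried directly on the five-character string from this slide. In the sketch below, separator_positions is a hypothetical helper that collects the index of every separator using find_first_of.

```cpp
#include <string>
#include <vector>

// Returns the index of every character in line that appears in seps.
std::vector<std::string::size_type> separator_positions(
        const std::string& line, const std::string& seps = ",") {
    std::vector<std::string::size_type> positions;
    // find_first_of returns npos when nothing is found, including when the
    // starting index is at or past the end of the string.
    for (std::string::size_type i = line.find_first_of(seps);
         i != std::string::npos;
         i = line.find_first_of(seps, i + 1)) {
        positions.push_back(i);
    }
    return positions;
}
```

For the string "a,b,c" this returns the indices 1 and 3. On the same string, find_last_of(',') gives 3, find_first_not_of(",") gives 0, find_last_not_of(",") gives 4, and a search for a character that is absent gives std::string::npos.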