Searching and Sorting

pilar

Presentation Transcript


  1. Searching and Sorting

  2. Why Use Data Files? • There are many cases where the input to a program may come from a data file. Using data files in your programs offers the following advantages: • Users do not have to input repetitive information. • Several users, and even several programs, can share common information. • Programs are able to run on their own without waiting for a user to input anything.

  3. Disadvantages of Data Files • The disadvantage of using data files is that it is usually a little more involved to take input from a file than directly from the user.

  4. Searching and Sorting • The reason for this is that a data file may contain hundreds or thousands of units of data, so you must be able to search for the information you wish to find. In addition, when writing to data files it is preferable to keep the file in some sort of order.

  5. Searching/Sorting Commands • Unix comes with several commands to facilitate searching and sorting, including the following: • grep • (g)awk • sed • sort • cut • uniq • diff

  6. grep • grep is a command that searches a file for a certain “string” of information. When it finds a match, it shows the whole line. • Example: grep bigelow /etc/passwd • Output: bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash
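
The search on this slide can be tried without touching the real /etc/passwd. The following is a minimal sketch that assumes `mktemp` is available; the temporary file and every entry other than bigelow are made up for illustration.

```shell
# Build a small sample file in passwd format (hypothetical entries).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash
fred:x:1712:100:,,,:/home/fred:/bin/sh
EOF

# Print every line that contains the string "bigelow".
match=$(grep bigelow "$sample")
echo "$match"

rm -f "$sample"
```

Only the bigelow line is printed, because it is the only line containing that string.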

  7. The password File • The password file in Unix contains information about the users on the system. • Every user on the system has an entry in the password file. • This file is consulted during login and to calculate file permissions. • In most modern versions of Unix, the password file doesn’t contain user passwords.

  8. /etc/passwd file syntax • USER:PASSWORD:UID:GID:COMMENT:HOME DIR:SHELL • USER: The user’s login name • PASSWORD: Where the password used to be (now in the shadow file) • UID/GID: The user’s ID number and group ID number • COMMENT: Stores address or other general info about the user • HOME DIR: Specifies the user’s home directory • SHELL: Specifies the user’s shell
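
To see how one entry maps onto the seven fields, the line can be split on the colon separator. This is only a sketch: it reuses the bigelow entry from the grep slide, and the variable names are chosen here for illustration.

```shell
# One passwd-style line (the bigelow entry from the grep example).
line='bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash'

# Split the line on ":" into the seven fields, in file order.
IFS=: read -r user password uid gid comment home shell <<EOF
$line
EOF

echo "user=$user uid=$uid gid=$gid home=$home shell=$shell"
```

The password field holds only the placeholder x, which matches the slide's note that the real password now lives in the shadow file.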

  9. grep switches • grep -i BiGeLoW /etc/passwd → Case insensitive. • grep -v bigelow → Search for everything but the string (used to filter lines out of a file). • Example: grep -v -i bobo ~/data.txt > data.temp • This writes every line that does not contain the string bobo (in any mix of upper and lower case) to a temp file. The original file is left unchanged.
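
A small sketch of both switches, using a made-up data file in place of ~/data.txt (the file contents and temp file names are assumptions):

```shell
data=$(mktemp)
kept=$(mktemp)
printf '%s\n' 'bobo' 'fred' 'BoBo again' 'john' > "$data"

# -i ignores case, so both "bobo" and "BoBo again" match.
hits=$(grep -i bobo "$data")

# -v inverts the match: keep only the lines that do NOT contain bobo.
grep -v -i bobo "$data" > "$kept"
kept_content=$(cat "$kept")

# grep never edits its input; the original file still has all four lines.
original_lines=$(grep -c '' "$data")

echo "$hits"
echo "$kept_content"
rm -f "$data" "$kept"
```

To actually replace the original, the temp file would have to be moved over it in a separate step.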

  10. More grep switches • There may be times when you need to see what occurs above or below the line being searched for. • Example: grep -2 Jim ~/data.txt → Shows the 2 lines above and the 2 lines below each line containing Jim, plus the line itself. • grep -c Jim → Shows how many lines contain the string Jim.
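
The context and count switches can be sketched on a made-up list of names. The example spells the context switch as `-C 2`, which is the long-standing equivalent of the `-2` shorthand in GNU grep; the data is an assumption.

```shell
data=$(mktemp)
printf '%s\n' 'Al' 'Bea' 'Cy' 'Jim' 'Dee' 'Ed' 'Flo' > "$data"

# -C 2 prints two lines of context before and after each matching line.
context=$(grep -C 2 Jim "$data")

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c Jim "$data")

echo "$context"
echo "$count"
rm -f "$data"
```

Here the context output runs from Bea to Ed, and the count is 1 because only one line contains Jim.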

  11. diff • The diff command shows what is different between two files. It uses < and > to indicate which file contains a line. • Example: diff file1 file2 • < line in file1 but not file2 • > line in file2 but not file1
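
A minimal sketch with two made-up three-line files that differ in their middle line. With the traditional default output format, the file1 line is marked with < and the file2 line with >; some minimal diff implementations default to a different format, so treat the markers as typical rather than guaranteed.

```shell
dir=$(mktemp -d)
printf '%s\n' 'apple' 'banana' 'cherry' > "$dir/file1"
printf '%s\n' 'apple' 'blueberry' 'cherry' > "$dir/file2"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
changes=$(diff "$dir/file1" "$dir/file2" || true)

echo "$changes"
rm -rf "$dir"
```

Only the lines that differ (banana and blueberry) are reported as changes.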

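  12. uniq • The uniq command removes duplicate entries from its input. It only collapses duplicates that sit on adjacent lines, so the input is usually sorted first. The result goes to standard output; the file itself is not changed. • Example: uniq data.txt • bobo • fred • john • Frank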
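
Because uniq only compares neighbouring lines, it is commonly paired with sort. The sketch below uses a made-up file in which one duplicate is not adjacent to the others:

```shell
data=$(mktemp)
printf '%s\n' 'bobo' 'fred' 'bobo' 'bobo' 'john' > "$data"

# uniq alone only collapses adjacent repeats, so the first "bobo"
# survives separately from the later pair.
adjacent_only=$(uniq "$data")

# Sorting first puts all duplicates next to each other.
all_unique=$(sort "$data" | uniq)

echo "$adjacent_only"
echo "$all_unique"
rm -f "$data"
```

The first result still lists bobo twice; the second lists each name once.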

  13. The sort command • The sort command can be used to sort any text file. • Example: sort /etc/passwd would put the file in alphabetical order, starting with the first field.

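  14. Sorting by other fields • sort -t<field separator> +<field number> file • Example: sort -t: +2 /etc/passwd sorts the passwd file based on UID (the third field, which sort numbers as 2 because it counts from 0). • The +<field number> form is obsolete and newer versions of sort may not accept it. The -k option, which counts fields from 1, is the usual replacement: sort -t: -k3 /etc/passwd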

  15. How sort numbers the fields • Fields: NAME:PASS:UID:GID • Field numbers: 0 1 2 3 • sort <file> → sorts by first field • sort +0 → sorts by first field • sort +1 → sorts by second field
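
Many current sort implementations no longer accept the zero-based +N form, so the sketch below uses the `-k` key option instead, which counts fields from 1. The sample entries are made up, and the `n` modifier is added here so that UIDs compare as numbers rather than as text.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
fred:x:1712:100:,,,:/home/fred:/bin/sh
root:x:0:0:root:/root:/bin/bash
bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash
daemon:x:2:2:daemon:/sbin:/bin/false
EOF

# -t: sets the field separator; -k3,3 limits the sort key to field 3 (UID),
# and the trailing n compares that key numerically.
# cut then keeps only the name and UID so the order is easy to read.
by_uid=$(sort -t: -k3,3n "$sample" | cut -d: -f1,3)

echo "$by_uid"
rm -f "$sample"
```

Without the numeric modifier, 1711 would sort before 2, because the comparison would be character by character.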

  16. (g)awk • awk, or gawk, is more than just a simple command; it is a powerful programming language. • awk is a great prototyping language. • Start with a few lines and keep adding until it does what you want.

  17. AWK • A programming language for handling common data manipulation tasks with only a few lines of code • Awk is a pattern-action language • The language looks a little like C but automatically handles input, field splitting, initialization, and memory management • Built-in string and number data types • No variable type declarations

  18. gawk general syntax • gawk '/pattern/ {output}' file • pattern: what is being searched for. • output: what the command outputs when the pattern is matched. • file: the file being searched. • The quotes are the single quotes found next to the Enter key.

  19. Simple Output From AWK • If an action has no pattern, the action is performed for all input lines. • gawk '{ print }' filename will print all input lines on stdout. • gawk '{ print $0 }' filename will do the same thing.

  20. Printing specific fields • Multiple items can be printed on the same output line with a single print statement. • gawk '{ print $1, $3 }' file • This will print the first and third fields of every line in the file. • The commas in the print statement separate the items; each comma becomes a space in the output.
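
The effect of the comma is easiest to see by printing the same two fields with and without it. This sketch calls the interpreter as `awk`, on the assumption that it is installed under that name; gawk accepts the same program. The input line is made up.

```shell
# With a comma, the items are joined by the output separator (a space).
with_comma=$(echo 'alpha beta gamma' | awk '{ print $1, $3 }')

# Without a comma, the two fields are concatenated directly.
without_comma=$(echo 'alpha beta gamma' | awk '{ print $1 $3 }')

echo "$with_comma"
echo "$without_comma"
```

The first command prints the two words separated by a space; the second runs them together.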

  21. Changing the field separator • The default field separator in gawk is whitespace (spaces or tabs). • To specify a different field separator, use the field separator switch (-F). • gawk -F: '{print $1,$7}' /etc/passwd • would print the first and seventh fields (name and shell) from the password file.
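
A minimal sketch of the -F switch on a throwaway file in passwd format. The entries are illustrative, and `awk` is used on the assumption that it is the installed name (gawk takes the same arguments).

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash
EOF

# -F: splits each line on ":" so $1 is the login name and $7 the shell.
name_shell=$(awk -F: '{ print $1, $7 }' "$sample")

echo "$name_shell"
rm -f "$sample"
```

Leaving out -F: would make awk split on whitespace instead, and $7 would be empty for these lines.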

  22. Using gawk to search • Including a pattern in the gawk statement makes the gawk command search. • gawk -F: '/root/ {print $1,$7}' /etc/passwd • This will print the login and shell of only those lines that contain the string root. The -F: switch is needed so that $1 and $7 refer to the colon-separated fields.
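
A pattern such as /root/ is tested against the whole line, not just the login name. The sketch below uses made-up entries, including an operator account whose home directory is /root, to show the difference between matching the line and comparing one field. It assumes the interpreter is available as `awk`.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
EOF

# /root/ matches any line containing "root" anywhere, including a home dir.
anywhere=$(awk -F: '/root/ { print $1, $7 }' "$sample")

# Comparing $1 restricts the match to the login name itself.
exact=$(awk -F: '$1 == "root" { print $1, $7 }' "$sample")

echo "$anywhere"
echo "$exact"
rm -f "$sample"
```

The whole-line pattern reports both root and operator; the field comparison reports only root.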

  23. Consider the following text file • Joe,Smith,1234567 • fred,Sam,7654321 • Hank,Joe,9876543 • Suppose you wanted only the people whose last name is Joe. • How would you structure a gawk command to accomplish that?

  24. Solution • gawk -F, '$2~/Joe/ {print }' datafile • This gawk statement is read as follows: • Using a comma as the field separator, search the second field for the string 'Joe' and print the whole line, using datafile as input. • The match is case sensitive, so the pattern must be written Joe, exactly as it appears in the file.
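
Pattern matching in awk is case sensitive, which matters for this data file. The sketch below recreates the three records from the previous slide in a temporary file and tries both spellings; it assumes the interpreter is installed as `awk`.

```shell
datafile=$(mktemp)
printf '%s\n' 'Joe,Smith,1234567' 'fred,Sam,7654321' 'Hank,Joe,9876543' > "$datafile"

# The file spells the name "Joe", so a lowercase pattern finds nothing.
lower=$(awk -F, '$2 ~ /joe/ { print }' "$datafile")

# Matching the second field against "Joe" selects the one record
# whose last name is Joe.
upper=$(awk -F, '$2 ~ /Joe/ { print }' "$datafile")

echo "lower: $lower"
echo "upper: $upper"
rm -f "$datafile"
```

The first record is correctly skipped in both cases, because Joe appears there in the first field rather than the second.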

  25. Interactive exercise • Determine the commands that will accomplish the following: • Sort the password file based on UID and save a copy of the file in your home directory called passwd.sorted
