This article explains how to read a file line by line in Linux, with examples and tips. It covers the errors commonly made when reading a file line by line, shows with samples how the output of a for loop differs from that of a while loop, gives the syntax of the while loop along with tips for using it, and closes with the side effects a while loop can have.

One of the most common errors in bash scripts on GNU/Linux is to read a file line by line with a for loop (for line in $(cat file.txt); do ...). The loop then runs once for every word of the file, not once for every line, because the shell splits the output of cat on spaces, tabs and newlines.

Sample output with a for loop:

    for line in $(cat file.txt)
    do
        echo "$line"
    done
    This
    is
    row
    No
    1
    This
    is
    row
    No
    2
    This
    [...]

The solution is to use a while loop coupled with the read builtin. The same result can also be obtained with a for loop, provided the value of the variable $IFS (Internal Field Separator) is changed before the loop starts.

While loop

The while loop remains the most appropriate and easiest way to read a file line by line.

Syntax

    while read line; do COMMAND; done < file.txt
Example
The starting file:
This is line 1
This is line 2
This is line 3
In a "bash" script:

    #!/bin/bash
    while read line
    do
        echo -e "$line\n"
    done < file.txt

The output on the screen (stdout); the \n passed to echo -e adds a blank line after each line:

    This is line 1

    This is line 2

    This is line 3
Tips
From a structured file (such as an address book or /etc/passwd), it is entirely possible to retrieve the value of each field and assign the values to several variables with the 'read' command. Be careful to set the IFS variable to the right field separator (by default: space, tab and newline). Example:
    #!/bin/bash
    while IFS=: read user pass uid gid full home shell
    do
        echo -e "$full :\n\
     Username : $user\n\
     UID :\t $uid\n\
     GID :\t $gid\n\
     Home :\t $home\n\
     Shell :\t $shell\n\n"
    done < /etc/passwd
Bonus
The loop can also read the output of a command through process substitution:

    while read i; do echo -e "Parameter: $i"; done < <(echo -e "a\nab\nc")
Initiate a Loop
Although the while loop is the easiest method, it has side effects. By default, read strips the leading and trailing spaces and tabs of each line, so indentation is lost.
By contrast, a for loop coupled with a change of IFS keeps the structure of each line in the output (empty lines are skipped, however).
Syntax
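    old_IFS=$IFS     # save the field separator
    IFS=$'\n'        # new field separator, the end of line
    for line in $(cat file.txt)
    do
        echo "$line"
    done
    IFS=$old_IFS     # restore the saved field separator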
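The whitespace loss described above comes from the field splitting that read performs, so it can also be avoided without giving up the while loop. The sketch below compares a plain read with `IFS= read -r`; the sample file and the variable names `stripped` and `preserved` are made up for this illustration.

```bash
#!/bin/bash
# Hypothetical sample file with an indented line, created only for this demo.
sample=$(mktemp)
printf '  indented line\nplain line\n' > "$sample"

# Default read: leading and trailing spaces and tabs are removed.
stripped=""
while read line; do
    stripped="${stripped}[${line}]"
done < "$sample"

# IFS= disables the trimming for this read only; -r keeps backslashes literal.
preserved=""
while IFS= read -r line; do
    preserved="${preserved}[${line}]"
done < "$sample"

echo "default read: $stripped"     # [indented line][plain line]
echo "IFS= read -r: $preserved"    # [  indented line][plain line]
rm -f "$sample"
```

Because the assignment `IFS=` is placed in front of read, it applies to that command only and the global value of IFS does not need to be saved and restored.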
[...] scripting, I hope you can understand it easily. I extracted the last five lines of my /etc/passwd file and stored them in a file named "file_passwd".

    [root@www blog]# tail -5 /etc/passwd > file_passwd
    [root@www blog]# cat file_passwd
    venu:x:500:500:venu madhav:/home/venu:/bin/bash
    padmin:x:501:501:Project Admin:/home/project:/bin/bash
    king:x:502:503:king:/home/project:/bin/bash
    user1:x:503:501::/home/project/:/bin/bash
    user2:x:504:501::/home/project/:/bin/bash

I use this file whenever a sample file is required.
Method 1:
    #!/bin/bash
    # SCRIPT: method1.sh
    # PURPOSE: Process a file line by line with PIPED while-read loop.
    FILENAME=$1
    count=0
    cat "$FILENAME" | while read LINE
    do
        let count++
        echo "$count $LINE"
    done
    echo -e "\nTotal $count Lines read"

By catting a file and piping its output to a while-read loop, a single line of text is read into a variable named LINE on each loop iteration. The loop runs until all of the lines in the file have been processed, one at a time.

Bash runs each command of a pipeline in a subshell, and that includes a PIPED "while-read" loop (unless the lastpipe shell option is in effect). Variables set within the loop are therefore lost outside of it, and $count still holds 0, its initial value, after the loop.

Output:

    [root@www blog]# sh method1.sh file_passwd
    1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
    2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
    3 king:x:502:503:king:/home/project:/bin/bash
    4 user1:x:503:501::/home/project/:/bin/bash
    5 user2:x:504:501::/home/project/:/bin/bash

    Total 0 Lines read
Method 2:
    #!/bin/bash
    #SCRIPT: method2.sh
    #PURPOSE: Process a file line by line with redirected while-read loop.
    FILENAME=$1
    count=0
    while read LINE
    do
        let count++
        echo "$count $LINE"
    done < "$FILENAME"
    echo -e "\nTotal $count Lines read"

We still use the while read LINE syntax, but this time we feed the loop from the bottom (using file redirection) instead of using a pipe. You will find that this is one of the fastest ways to process each line of a file. The first time you see this it looks a little unusual, but it works very well. Unlike method 1, method 2 gives you the total number of lines outside of the loop.

Output:

    [root@www blog]# sh method2.sh file_passwd
    1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
    2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
    3 king:x:502:503:king:/home/project:/bin/bash
    4 user1:x:503:501::/home/project/:/bin/bash
    5 user2:x:504:501::/home/project/:/bin/bash

    Total 5 Lines read

Note: In some older shells, the redirected loop also runs in a subshell.
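When the lines come from a command rather than from a file, the loop can still be fed "from the bottom" with process substitution, so the counter survives as it does in method 2. The following is a minimal sketch, assuming bash (process substitution is not available in plain sh); the printf call is only a stand-in for whatever command produces the lines.

```bash
#!/bin/bash
count=0
while IFS= read -r line
do
    count=$((count + 1))
    echo "$count $line"
done < <(printf 'alpha\nbeta\ngamma\n')   # stand-in for any command

# The loop ran in the current shell, so the counter is still set here.
echo "Total $count lines read"            # Total 3 lines read
```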
Method 3:

[...] another file, command, program, or script. Each open file gets assigned a file descriptor. The file descriptors for stdin, stdout, and stderr are 0, 1, and 2, respectively. For opening additional files, there remain descriptors 3 to 9 (this may vary depending on the OS). It is sometimes useful to assign one of these additional file descriptors to stdin, stdout, or stderr as a temporary duplicate link. This simplifies restoration to normal after complex redirection and reshuffling.

There are two steps in the method we are going to use. The first step is to save the current standard input by duplicating file descriptor 0 onto our new file descriptor 3. We use the following syntax for this step:

    exec 3<&0

File descriptor 3 now refers to whatever standard input was connected to (usually the keyboard), so that it can be restored later. The second step is to send our input file, specified by the variable $FILENAME, into file descriptor 0 (zero), which is standard input. This second step is done using the following syntax:

    exec 0<$FILENAME

At this point any command requiring input will receive the input from the $FILENAME file. Now is a good time for an example.

    #!/bin/bash
    #SCRIPT: method3.sh
    #PURPOSE: Process a file line by line with while read LINE Using
    #File Descriptors
    FILENAME=$1
    count=0
    exec 3<&0
    exec 0< "$FILENAME"
    while read LINE
    do
        let count++
        echo "$count $LINE"
    done
    exec 0<&3
    echo -e "\nTotal $count Lines read"

The while loop reads one line of text at a time, but the beginning of this script does a little file descriptor redirection. The first exec command saves a copy of stdin on file descriptor 3. The second exec command redirects the $FILENAME file into stdin, which is file descriptor 0. Now the while loop can just execute without our having to worry about how we assign a line of text to the LINE variable. When the while loop exits we restore stdin from the copy that was saved on file descriptor 3:

    exec 0<&3

In other words we set it back to its original value.

Output:

    [root@www tempdir]# sh method3.sh file_passwd
    1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
    2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
    3 king:x:502:503:king:/home/project:/bin/bash
    4 user1:x:503:501::/home/project/:/bin/bash
    5 user2:x:504:501::/home/project/:/bin/bash

    Total 5 Lines read
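A variation on the same idea, which is not part of the original script, is to leave stdin alone and open the file on descriptor 3 instead. Commands inside the loop can then still read from the keyboard. This is only a sketch; `input_file` is a hypothetical temporary file created so that the example is runnable.

```bash
#!/bin/bash
# Hypothetical input file, created here only so the sketch is runnable.
input_file=$(mktemp)
printf 'first\nsecond\nthird\n' > "$input_file"

count=0
exec 3< "$input_file"            # open the file for reading on descriptor 3
while IFS= read -r line <&3      # each read takes one line from descriptor 3
do
    count=$((count + 1))
    echo "$count $line"
done
exec 3<&-                        # close descriptor 3 when finished

echo "Total $count lines read"   # Total 3 lines read
rm -f "$input_file"
```

Since descriptor 0 is never touched, there is nothing to restore after the loop; closing descriptor 3 is the only cleanup.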
Method 4:

[...] when a print command appears by itself, the full contents of the current line are printed. Here is another awk example that does exactly the same thing:

    $ awk '{ print $0 }' /etc/passwd

In awk, the $0 variable represents the entire current line, so print and print $0 do exactly the same thing. Now is a good time for an example.

    #!/bin/bash
    #SCRIPT: method4.sh
    #PURPOSE: Process a file line by line with awk
    FILENAME=$1
    awk '{kount++;print kount, $0}
    END{print "\nTotal " kount " lines read"}' "$FILENAME"

Output:

    [root@www blog]# sh method4.sh file_passwd
    1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
    2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
    3 king:x:502:503:king:/home/project:/bin/bash
    4 user1:x:503:501::/home/project/:/bin/bash
    5 user2:x:504:501::/home/project/:/bin/bash

    Total 5 lines read

Awk is really good at handling text that has been broken into multiple logical fields, and allows you to effortlessly reference each individual field from inside your awk script. The following script will print out a list of all user accounts on your system:
    awk -F":" '{ print $1 "\t " $3
    }' /etc/passwd
Above, when we called awk, we used the -F option to specify ":" as the field separator. By default, white space (spaces and tabs) acts as the field separator; the -F option sets a new one. When awk processes the print $1 "\t " $3 command, it prints the first and third fields of each line of the input file. "\t" separates the fields with a tab.
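To see what the separator changes, the sketch below runs awk on one record of the sample file, first with the default separator and then with -F. The variable names `record`, `default_first` and `colon_fields` are made up for this illustration.

```bash
#!/bin/bash
# One record taken from the sample file_passwd shown earlier.
record='venu:x:500:500:venu madhav:/home/venu:/bin/bash'

# Default separator: the line is split on white space, so the first field
# runs up to the space inside "venu madhav".
default_first=$(printf '%s\n' "$record" | awk '{ print $1 }')

# With -F: the line is split on colons, giving the user name and the UID.
colon_fields=$(printf '%s\n' "$record" | awk -F: '{ print $1 "\t" $3 }')

printf '%s\n' "$default_first"   # venu:x:500:500:venu
printf '%s\n' "$colon_fields"    # venu<TAB>500
```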
    [root@www blog]# sh method5.sh file_passwd
    1 venu:x:500:500:venu madhav:/home/venu:/bin/bash
    2 padmin:x:501:501:Project Admin:/home/project:/bin/bash
    3 king:x:502:503:king:/home/project:/bin/bash
    4 user1:x:503:501::/home/project/:/bin/bash
    5 user2:x:504:501::/home/project/:/bin/bash

    Total 5 lines read
    [root@www blog]# time ./method1.sh bigfile.4227 > /dev/null
    real 6m2.911s
    user 2m58.207s
    sys  2m58.811s

    [root@www blog]# time ./method2.sh bigfile.4227 > /dev/null
    real 2m48.394s
    user 2m39.714s
    sys  0m8.089s

    [root@www blog]# time ./method3.sh bigfile.4227 > /dev/null
    real 2m48.218s
    user 2m39.322s
    sys  0m8.161s

    [root@www blog]# time ./method4.sh bigfile.4227 > /dev/null
    real 0m2.054s
    user 0m1.924s
    sys  0m0.120s

    [root@www blog]# time ./method5.sh bigfile.4227 > /dev/null

I waited more than half a day and still didn't get a result, so I created a 10000-line file to test this method.

    [root@www tempdir]# time ./method5.sh file.10000 > /dev/null
    real 2m25.739s
    user 0m21.857s
    sys  1m12.705s
Method 4 came in first place, taking only 2.05 seconds, but we can't really compare Method 4 with the other methods, because awk is not just a command but a programming language too. Method 2 and Method 3 are tied for second place; they produce almost the same real execution time, about 2 minutes and 48 seconds. Method 1 came in third at 6 minutes and 2.9 seconds. Method 5 ran for more than half a day on the big file without finishing, and needed 2 minutes 25 seconds to process just a 10000-line file, which makes it impractical.

Note: If the file contains backslashes, use read -r instead of read. With -r, the backslash does not act as an escape character and is considered to be part of the line. In particular, a backslash-newline pair may not be used as a line continuation.
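The effect of -r is easy to check on a single line that contains backslashes. In this sketch the sample line and the variable names `without_r` and `with_r` are made up for the illustration.

```bash
#!/bin/bash
# Hypothetical one-line file containing literal backslashes.
sample=$(mktemp)
printf '%s\n' 'C:\temp\new' > "$sample"

read without_r < "$sample"    # backslashes are treated as escapes and dropped
read -r with_r < "$sample"    # backslashes are kept as part of the line

printf 'read:    %s\n' "$without_r"   # read:    C:tempnew
printf 'read -r: %s\n' "$with_r"      # read -r: C:\temp\new
rm -f "$sample"
```

printf with %s is used for the output so that the backslashes are shown exactly as they were stored.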
The script accepts the file name as its first argument. It assumes that:
1. All the files are named *.csv
2. All the fields are separated by whitespace
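    #!/bin/sh

    # Leaving $LINE unquoted lets word splitting collapse runs of spaces,
    # which cut -d ' ' relies on to find fields 5 and 9.
    grep "\.csv" "$1" | while read LINE; do
        FILENAME=`echo $LINE | cut -d ' ' -f 9`
        SIZE=`echo $LINE | cut -d ' ' -f 5`
        echo "File: " $FILENAME ", size: " $SIZE
    done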
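The same extraction can be sketched with read alone, letting it split each line into variables instead of calling cut twice per line. This assumes the input holds `ls -l` style output, which is a guess based on the script taking the size from field 5 and the name from field 9; the sample listing and all variable names below are hypothetical.

```bash
#!/bin/bash
# Hypothetical listing in "ls -l" layout, created only for this demo.
listing=$(mktemp)
cat > "$listing" <<'EOF'
-rw-r--r-- 1 venu venu  120 Jan  1 10:00 report.csv
-rw-r--r-- 1 venu venu 2048 Jan  2 11:30 notes.txt
-rw-r--r-- 1 venu venu  512 Jan  3 09:15 data.csv
EOF

summary=""
# read splits on white space; the last variable receives the rest of the line.
while read -r perms links owner group size month day time name
do
    case $name in
        *.csv) summary="${summary}File: $name, size: $size
" ;;
    esac
done < "$listing"

printf '%s' "$summary"
rm -f "$listing"
```

The loop is fed by redirection, so `summary` is still available after it ends, and no external command is started for each line.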
Example script:

    FILE=/home/file.txt

    if [ -f "$FILE" ];
    then
        echo "File $FILE exists"
        cnt=$(cat "$FILE" | wc -l) # deliberate UUOC
        if [ "$cnt" -gt 3 ] ;
        then
            echo "$FILE is larger than 3 lines"
        fi
    else
        echo "File $FILE does not exist"
    fi

Another one, which counts the lines of a file with awk:

    awk '{x++}END{ print x}' filename
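For comparison, here is a sketch of three ways to count lines: wc with input redirection (no cat needed), awk's built-in record counter NR, and a while-read loop. The sample file and the variable names are made up for this illustration. One difference from the awk one-liner above: with an empty file, `print x` prints an empty line because x was never set, whereas `print NR` prints 0.

```bash
#!/bin/bash
# Hypothetical three-line file, created only for this demo.
sample=$(mktemp)
printf 'one\ntwo\nthree\n' > "$sample"

# wc reads the file through redirection; $(( )) strips any padding
# that some wc implementations put in front of the number.
by_wc=$(( $(wc -l < "$sample") ))

# NR holds the number of records read so far; in END it is the line count.
by_awk=$(awk 'END { print NR }' "$sample")

by_loop=0
while IFS= read -r line; do
    by_loop=$((by_loop + 1))
done < "$sample"

echo "wc: $by_wc, awk: $by_awk, loop: $by_loop"   # wc: 3, awk: 3, loop: 3
rm -f "$sample"
```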