Software Engineering Assignment 2 (Personal Project): Text File Statistics

Source code for this project: https://github.com/amekao/SE_work1

Interface:

[Figure: screenshot of the program interface]


1. Requirement analysis stage:

Requirement analysis:

Overall requirement: the user runs the program from cmd, and the program performs the calculations selected by the command-line parameters.

Basic functions: -c, -w, and -l display the number of characters, words, and lines, respectively.

Extended functions: -a displays more detailed line statistics (code lines, blank lines, and comment lines); -s recursively processes the matching files in the specified directory.

Advanced function: -x opens a graphical interface in which the user selects the file to be counted, and then displays all of its statistics.


2. Design stage:

Because Python offers more convenient interfaces for text processing and richer support for file encodings, I decided to implement this project in Python, even though this meant paying the extra time cost of learning Python.
Time allocation:

| Subtask | Estimated time (h) | Actual time (h) |
|---|---|---|
| Learning Python basics | 6 | 7 |
| Building the program framework | 2 | 1.5 |
| Implementing -c | 1 | 0.2 |
| Implementing -w | 1 | 1.5 |
| Implementing -l | 1 | 0.2 |
| Implementing -a | 1.5 | 0.5 |
| Implementing -s | 2 | 2 |
| Supporting wildcards | 1.5 | 2 |
| Implementing -x | 1 | 1 |
| Integrating the modules | 1 | 2 |
| Designing additional features | 2 | 4 |
| Total | 20 | 21.9 |

Design ideas for each function:

1. Reading the cmd parameters: sys.argv holds the arguments the user typed as a list. Flag variables record which parameters are present, so that parameters can be combined.

2. -c and -l: read the file's characters (or its lines) and use len() to count the entries.

3. -w: use a pattern that matches runs of English letters to split out each word, store the words in a list, and take len() of the list.

4. -a: examine the lines one by one and decide which type each line is (code, blank, or comment).

5. Recursive directory traversal: use the functions provided by the os library to list the files and folders in the current directory. If an entry is a folder, traverse it recursively; otherwise check whether the file is of the requested type and, if so, save it in a list.

6. Wildcard support: use the fnmatch module.

7. -x pop-up file selection dialog: use the file dialog provided by win32ui to get the path of the file the user selects.
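The counting ideas in items 2 to 4 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the project's implementation: the function names mirror those used in the pytest test in the testing section, but their bodies, the letter-run word pattern, and the line-classification rules (blank if empty after stripping, comment if it starts with //) are hypothetical. Each function rewinds the file first, so one open file object can be passed to all of them.

```python
import io
import re
from typing import TextIO, Tuple

# Assumption: a "word" is any run of English letters (see item 3).
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def char_count(f: TextIO) -> int:
    """-c: length of the whole text; line breaks count as characters."""
    f.seek(0)  # rewind so the same handle can be reused by every counter
    return len(f.read())


def line_count(f: TextIO) -> int:
    """-l: number of entries in the list of lines."""
    f.seek(0)
    return len(f.readlines())


def word_count(f: TextIO) -> int:
    """-w: collect every letter run into a list and take its length."""
    f.seek(0)
    return len(WORD_PATTERN.findall(f.read()))


def complex_line_count(f: TextIO) -> Tuple[int, int, int]:
    """-a: classify each line; returns (blank, comment, code) counts.

    Simplified, hypothetical rules: a line that is empty after stripping
    whitespace is blank, a line starting with // is a comment, and
    everything else is code. /* */ block comments are not handled.
    """
    f.seek(0)
    blank = comment = code = 0
    for line in f:
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith("//"):
            comment += 1
        else:
            code += 1
    return blank, comment, code


if __name__ == "__main__":
    sample = io.StringIO("int main() {\n\n    // comment\n    return 0;\n}\n")
    print(char_count(sample), line_count(sample), word_count(sample))
    print(complex_line_count(sample))
```

Run as a script, the demo prints 45 5 4 followed by (1, 1, 3): the five line breaks count toward the 45 characters, and the closing-brace line counts as code.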



3. Software development stage:

While actually writing the code, I kept running into new problems and often realized that a different design would work better. The final call relationships among the program's functions are shown in the figure below.

[Figure: program call graph]

Additional features added later:

1. Added -b to display a detailed character count. When testing len(open_file.read()), I found that the reported number of characters never matched what I expected; it turned out that line breaks were being counted as characters. That does not match how most people count characters, and writing a character-classification module is not difficult, so -b displays the number of line breaks separately and lets the user decide whether to include them in the character count.

2. At first, the program restricted the length of argv: it supported only one parameter and one file path at a time. When I gave it to classmates to help with testing, they found the parameter restrictions too tight. After re-analyzing the program, I found that the -c, -w, -l, -a, and -b modules do not affect one another, so I changed the design: whenever one of these options appears anywhere in argv, its corresponding flag is set to True. All five common parameters can now be used together, and all five functions can be tested with a single command, which greatly speeds up testing.

3. While designing the -s mode, I found that the fnmatch library (used for wildcard matching) is powerful, so I added wildcard matching to the non -s mode as well. Even if the user does not type the full file name "Homemade array.c", the file can still be found with a wildcard pattern.

Input: [Figure: example command]

Output: [Figure: example output]

4. Besides .c files, Python's read function can also open other plain-text formats such as .py and .java, so this software can compute statistics for those files as well.
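The flag-based argv design in item 2 can be sketched roughly as below. This is an illustration under assumptions rather than the project's code: the name parse_args, the option list, and the rule that any argument which is not a known option is a file path or pattern are all hypothetical. Because each option only sets its own flag, options can appear in any order and in any combination.

```python
import sys
from typing import Dict, List, Tuple

# Hypothetical option table: one boolean flag per independent option.
OPTIONS = ("-c", "-w", "-l", "-a", "-b", "-s", "-x")


def parse_args(argv: List[str]) -> Tuple[Dict[str, bool], List[str]]:
    """Set a flag to True for every option that appears anywhere in argv.

    argv[0] is the program name (e.g. wc.exe) and is skipped; every
    argument that is not a known option is treated as a file path/pattern.
    """
    flags = {option: False for option in OPTIONS}
    paths = []
    for arg in argv[1:]:
        if arg in flags:
            flags[arg] = True
        else:
            paths.append(arg)
    return flags, paths


if __name__ == "__main__":
    flags, paths = parse_args(sys.argv)
    print({option for option, on in flags.items() if on}, paths)
```

For example, parse_args(["wc.exe", "-s", "-a", "-c", "file.c"]) sets the -s, -a, and -c flags, leaves the others False, and returns ["file.c"] as the path list, so one command exercises several counting branches at once.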

Design experience:

1. This project is the first program I have written in Python. In C, many functions would have to be written by hand, and its text-encoding support is poor; in Java, I am not very familiar with the concept of classes and am often confused by the methods defined inside them. So I chose Python and spent a day or two on Python tutorials to get started. I have to say that Python is really easy to use.

2. The most difficult module in this design was the recursive directory traversal. I had read that the os.walk function can do the traversal, but the descriptions of os.walk online were hard to follow, and the 3-tuples it yields were beyond what I understood at the time. Since I knew a directory is a tree structure, writing the traversal myself was not difficult, but getting it to pass testing took a lot of time.

3. Another problem was integrating the code. At first, the main function parsed argv very crudely: element [0] was wc.exe, element [1] was the parameter, and element [2] was the file path. This made commands such as wc.exe -s -a file.c impossible, and changing the parameters one at a time in the test module was tedious, so I spent some time making it support multiple parameters.

4. The more code I wrote, the more I wanted to improve it; after all, the program is my own creation. I found that the traversal function I had worked hard on could be reused with small modifications to improve the user experience. I spent a little time splitting the "file" argument into a "file path" and a "file name", which makes wildcards more flexible, supports both relative and absolute paths, and lets the file type be changed: three birds with one stone.
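The traversal from design item 5, combined with fnmatch and the path/name split described in item 4 above, might look like the sketch below. It assumes os.listdir plus manual recursion (the hand-written approach this post chose over os.walk); the function names split_target and find_files are hypothetical. Note that os.listdir raises PermissionError for folders the user cannot read, which is one plausible source of the permission problem listed under the known bugs.

```python
import fnmatch
import os
import tempfile
from typing import List, Tuple


def split_target(target: str) -> Tuple[str, str]:
    """Split a target such as 'src/*.c' into a directory and a name pattern."""
    directory, pattern = os.path.split(target)
    return directory or ".", pattern  # a bare pattern means the current folder


def find_files(directory: str, pattern: str, recursive: bool) -> List[str]:
    """Return paths of files in `directory` whose names match `pattern`.

    With recursive=True (the -s mode) sub-folders are traversed too,
    since a directory is a tree whose sub-folders are smaller trees.
    """
    matches = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            if recursive:
                matches.extend(find_files(full_path, pattern, recursive))
        elif fnmatch.fnmatch(name, pattern):
            matches.append(full_path)
    return matches


if __name__ == "__main__":
    # Build a small throwaway tree and search it with and without -s.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("a.c", "b.py", os.path.join("sub", "Homemade array.c")):
            open(os.path.join(root, rel), "w").close()
        print(find_files(root, "*.c", recursive=False))
        print(find_files(root, "*array.c", recursive=True))
```

Changing only the pattern (for example "*.py" or "*.java") switches the file type, and split_target accepts relative or absolute paths alike. Be aware that fnmatch.fnmatch ignores case on Windows, because it normalizes names with os.path.normcase, while on Linux it is case-sensitive.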


4. Testing stage:

All tests were done manually, mostly while writing the program. Fortunately, the program has few branch statements: they are needed only when parsing the argv parameters, and since most parameters are independent of one another, there are no deeply nested if-else statements. Because multiple parameters can be given at once, a single command can exercise most branches.

Later, I learned to use pytest and wrote a test for each counting module:

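```python
# Assumption: the counting functions are defined in wc.py; the module name is hypothetical.
from wc import (char_count, complex_char_count, complex_line_count,
                line_count, word_count)


def test_wc():
    test_path = "Homemade array.c"
    # Open the sample file once and close it automatically when the test ends.
    with open(test_path, "r", encoding="UTF-8") as op:
        assert char_count(op) == 551
        op.seek(0)  # rewind in case the previous count read to the end of the file
        assert complex_char_count(op) == (9, 327, 49, 45, 25, 96)
        op.seek(0)
        assert line_count(op) == 46
        op.seek(0)
        assert complex_line_count(op) == (4, 10, 32)
        op.seek(0)
        assert word_count(op) == 61
```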


Currently known bugs:

1. Permission problem: an error may occur when the program encounters an inaccessible file (such as a system file).

2. Encoding problem: the software is meant to support UTF-8 and GBK encodings, but in rare cases a file cannot be opened and a UnicodeEncodeError appears.

3. Because of my limited ability, /* */ block comments are not handled. C block comments can appear in many forms, and covering every case would fill the code with if-else statements and make it very messy, so lines inside /* */ comments are not counted as comment lines.

4. Words are extracted by matching runs of English letters, so the program cannot tell whether a letter string is a real word; for example, a string such as "adsadasds" is counted as a word.
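Bug 2 mentions support for both UTF-8 and GBK. One common way to support several encodings is to try decoding with each in turn; the sketch below does that. It is an assumption, not necessarily how this project opens files, and read_text and CANDIDATE_ENCODINGS are hypothetical names. Two caveats: GBK accepts many byte sequences, so a file in some other encoding may be decoded into wrong characters instead of raising an error; and a failed decode while reading raises UnicodeDecodeError, whereas UnicodeEncodeError typically comes from writing or printing text, for example to a console whose encoding cannot represent a character.

```python
import os
import tempfile
from typing import Optional, Tuple

# Assumption: the encodings this tool targets, tried in this order.
CANDIDATE_ENCODINGS = ("utf-8", "gbk")


def read_text(path: str) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate encoding that works.

    Raises the last UnicodeDecodeError if no candidate can decode the file.
    """
    last_error: Optional[UnicodeDecodeError] = None
    for encoding in CANDIDATE_ENCODINGS:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError as err:
            last_error = err  # fall through to the next candidate
    raise last_error


if __name__ == "__main__":
    # Write a GBK-encoded sample, then read it back without knowing its encoding.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "gbk_sample.c")
        with open(path, "wb") as f:
            f.write("// 中文注释\n".encode("gbk"))
        print(read_text(path))
```

Trying UTF-8 first is the safer order: GBK bytes for Chinese text are usually invalid UTF-8 and fail fast, while UTF-8 bytes can sometimes be decoded as GBK without error, producing garbled text.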

