[ale] Text Processing Happiness - I'm lost
David Tomaschik
ozone at webgroup.org
Sat Aug 18 01:49:22 EDT 2007
Bruce wrote:
> Hey all, it's been a while since I was on the Ale list
> - but I have a question, and figured this is the best
> place to ask.
>
> I am running a Netflow Collector (NFC5.0.2) and have a
> config file in XML. The config file basically
> associates applications with TCP and UDP ports. Since
> the config file is pretty limited, most of my traffic
> is not getting associated correctly.
>
> I pulled down a listing of well-known and registered
> ports from IANA, figuring on taking the scattershot
> approach.
>
> A short section is here:
> "<case><value> 1 </value><label> TCP_ tcpmux - 1 -tcp
> </label></case>"
> "<case><value> 2 </value><label> TCP_ compressnet - 2
> -tcp </label></case>"
> "<case><value> 3 </value><label> TCP_ compressnet - 3
> -tcp </label></case>"
> "<case><value> 5 </value><label> TCP_ rje - 5 -tcp
> </label></case>"
> "<case><value> 7 </value><label> TCP_ echo - 7 -tcp
> </label></case>"
> "<case><value> 9 </value><label> TCP_ discard - 9 -tcp
> </label></case>"
> "<case><value> 11 </value><label> TCP_ systat - 11
> -tcp </label></case>"
> "<case><value> 13 </value><label> TCP_ daytime - 13
> -tcp </label></case>"
> "<case><value> 17 </value><label> TCP_ qotd - 17 -tcp
> </label></case>"
> "<case><value> 18 </value><label> TCP_ msp - 18 -tcp
> </label></case>"
> "<case><value> 19 </value><label> TCP_ chargen - 19
> -tcp </label></case>"
> "<case><value> 20 </value><label> TCP_ ftp-data - 20
> -tcp </label></case>"
> "<case><value> 21 </value><label> TCP_ ftp - 21 -tcp
> </label></case>"
> "<case><value> 22 </value><label> TCP_ ssh - 22 -tcp
> </label></case>"
> "<case><value> 23 </value><label> TCP_ telnet - 23
> -tcp </label></case>"
> "<case><value> 25 </value><label> TCP_ smtp - 25 -tcp
> </label></case>"
> "<case><value> 27 </value><label> TCP_ nsw-fe - 27
> -tcp </label></case>"
> "<case><value> 29 </value><label> TCP_ msg-icp - 29
> -tcp </label></case>"
> "<case><value> 31 </value><label> TCP_ msg-auth - 31
> -tcp </label></case>"
> "<case><value> 33 </value><label> TCP_ dsp - 33 -tcp
> </label></case>"
> "<case><value> 37 </value><label> TCP_ time - 37 -tcp
> </label></case>"
> "<case><value> 38 </value><label> TCP_ rap - 38 -tcp
> </label></case>"
> "<case><value> 39 </value><label> TCP_ rlp - 39 -tcp
> </label></case>"
> "<case><value> 41 </value><label> TCP_ graphics - 41
> -tcp </label></case>"
> "<case><value> 42 </value><label> TCP_ name - 42 -tcp
> </label></case>"
> "<case><value> 42 </value><label> TCP_ nameserver - 42
> -tcp </label></case>"
> "<case><value> 43 </value><label> TCP_ nicname - 43
> -tcp </label></case>"
> "<case><value> 44 </value><label> TCP_ mpm-flags - 44
> -tcp </label></case>"
>
> And what I want it to look like is here:
> <case><value>1</value><label>TCP_tcpmux-1-tcp</label></case>
> <case><value>2</value><label>TCP_compressnet-2-tcp</label></case>
> <case><value>3</value><label>TCP_compressnet-3-tcp</label></case>
> <case><value>5</value><label>TCP_rje-5-tcp</label></case>
> <case><value>7</value><label>TCP_echo-7-tcp</label></case>
> <case><value>9</value><label>TCP_discard-9-tcp</label></case>
> <case><value>11</value><label>TCP_systat-11-tcp</label></case>
> <case><value>13</value><label>TCP_daytime-13-tcp</label></case>
> <case><value>17</value><label>TCP_qotd-17-tcp</label></case>
> <case><value>18</value><label>TCP_msp-18-tcp</label></case>
> <case><value>19</value><label>TCP_chargen-19-tcp</label></case>
> <case><value>20</value><label>TCP_ftp-data-20-tcp</label></case>
> <case><value>21</value><label>TCP_ftp-21-tcp</label></case>
> <case><value>22</value><label>TCP_ssh-22-tcp</label></case>
> <case><value>23</value><label>TCP_telnet-23-tcp</label></case>
> <case><value>25</value><label>TCP_smtp-25-tcp</label></case>
> <case><value>27</value><label>TCP_nsw-fe-27-tcp</label></case>
> <case><value>29</value><label>TCP_msg-icp-29-tcp</label></case>
> <case><value>31</value><label>TCP_msg-auth-31-tcp</label></case>
> <case><value>33</value><label>TCP_dsp-33-tcp</label></case>
> <case><value>37</value><label>TCP_time-37-tcp</label></case>
> <case><value>38</value><label>TCP_rap-38-tcp</label></case>
> <case><value>39</value><label>TCP_rlp-39-tcp</label></case>
> <case><value>41</value><label>TCP_graphics-41-tcp</label></case>
> <case><value>42</value><label>TCP_name-42-tcp</label></case>
> <case><value>42</value><label>TCP_nameserver-42-tcp</label></case>
> <case><value>43</value><label>TCP_nicname-43-tcp</label></case>
> <case><value>44</value><label>TCP_mpm-flags-44-tcp</label></case>
>
> The label is the name - I am keeping TCP_ (and UDP_)
> at the start of the label, as the tool I use to
> display stats looks for the TCP and UDP character. I
> follow the IANA name with the port and protocol so I
> won't get duplicate application names (a lot of the
> apps. listen on both UDP and TCP).
>
> Any pointers? How do I get rid of the " character? I'm
> guessing there are tabs in the file, since I created
> it using Excel (I know, I should have figured a way to
> simply grab the IANA well-known ports page and process
> it directly). How do I get rid of tabs?
>
>
>
>
No more quotes or tabs (tr -d deletes every character listed in the set):
tr -d '"\t' infile > outfile
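The target output above also has no spaces, so here is a rough sketch that deletes spaces in the same pass. It assumes each record sits on a single line in the real file (the line breaks in the quoted sample look like mail wrapping) and that no label ever needs an embedded space. The function name clean_cases is made up for illustration.

```shell
# Hypothetical helper: delete double quotes, spaces, and tabs from stdin.
# Assumes one <case> record per input line and no spaces wanted anywhere.
clean_cases() {
    tr -d '" \t'
}

# Sample record copied from the question, fed through the filter.
printf '%s\n' '"<case><value> 1 </value><label> TCP_ tcpmux - 1 -tcp </label></case>"' | clean_cases
# prints: <case><value>1</value><label>TCP_tcpmux-1-tcp</label></case>
```

On the real file this would be run as: clean_cases < infile > outfile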
--
David Tomaschik
Moderator, LinuxQuestions.org
http://matir.wordpress.com