[ale] Text Processing Happiness - I'm lost

Bruce callmebruce2002 at yahoo.com
Fri Aug 17 23:07:11 EDT 2007


Hey all, it's been a while since I was on the Ale list
- but I have a question, and figured this is the best
place to ask.

I am running a NetFlow Collector (NFC 5.0.2) and have a
config file in XML. The config file basically
associates applications with TCP and UDP ports. Since
the config file only lists a limited set of ports, most
of my traffic is not getting associated correctly.

I pulled down a listing of well-known and registered
ports from IANA, figuring on taking the scattershot
approach. 

A short section is here:
"<case><value>	1	</value><label>	TCP_	tcpmux	-	1	-tcp
</label></case>"
"<case><value>	2	</value><label>	TCP_	compressnet	-	2
-tcp	</label></case>"
"<case><value>	3	</value><label>	TCP_	compressnet	-	3
-tcp	</label></case>"
"<case><value>	5	</value><label>	TCP_	rje	-	5	-tcp
</label></case>"
"<case><value>	7	</value><label>	TCP_	echo	-	7	-tcp
</label></case>"
"<case><value>	9	</value><label>	TCP_	discard	-	9	-tcp
</label></case>"
"<case><value>	11	</value><label>	TCP_	systat	-	11
-tcp	</label></case>"
"<case><value>	13	</value><label>	TCP_	daytime	-	13
-tcp	</label></case>"
"<case><value>	17	</value><label>	TCP_	qotd	-	17	-tcp
</label></case>"
"<case><value>	18	</value><label>	TCP_	msp	-	18	-tcp
</label></case>"
"<case><value>	19	</value><label>	TCP_	chargen	-	19
-tcp	</label></case>"
"<case><value>	20	</value><label>	TCP_	ftp-data	-	20
-tcp	</label></case>"
"<case><value>	21	</value><label>	TCP_	ftp	-	21	-tcp
</label></case>"
"<case><value>	22	</value><label>	TCP_	ssh	-	22	-tcp
</label></case>"
"<case><value>	23	</value><label>	TCP_	telnet	-	23
-tcp	</label></case>"
"<case><value>	25	</value><label>	TCP_	smtp	-	25	-tcp
</label></case>"
"<case><value>	27	</value><label>	TCP_	nsw-fe	-	27
-tcp	</label></case>"
"<case><value>	29	</value><label>	TCP_	msg-icp	-	29
-tcp	</label></case>"
"<case><value>	31	</value><label>	TCP_	msg-auth	-	31
-tcp	</label></case>"
"<case><value>	33	</value><label>	TCP_	dsp	-	33	-tcp
</label></case>"
"<case><value>	37	</value><label>	TCP_	time	-	37	-tcp
</label></case>"
"<case><value>	38	</value><label>	TCP_	rap	-	38	-tcp
</label></case>"
"<case><value>	39	</value><label>	TCP_	rlp	-	39	-tcp
</label></case>"
"<case><value>	41	</value><label>	TCP_	graphics	-	41
-tcp	</label></case>"
"<case><value>	42	</value><label>	TCP_	name	-	42	-tcp
</label></case>"
"<case><value>	42	</value><label>	TCP_	nameserver	-	42
-tcp	</label></case>"
"<case><value>	43	</value><label>	TCP_	nicname	-	43
-tcp	</label></case>"
"<case><value>	44	</value><label>	TCP_	mpm-flags	-	44
-tcp	</label></case>"

And what I want it to look like is here:
<case><value>1</value><label>TCP_tcpmux-1-tcp</label></case>
<case><value>2</value><label>TCP_compressnet-2-tcp</label></case>
<case><value>3</value><label>TCP_compressnet-3-tcp</label></case>
<case><value>5</value><label>TCP_rje-5-tcp</label></case>
<case><value>7</value><label>TCP_echo-7-tcp</label></case>
<case><value>9</value><label>TCP_discard-9-tcp</label></case>
<case><value>11</value><label>TCP_systat-11-tcp</label></case>
<case><value>13</value><label>TCP_daytime-13-tcp</label></case>
<case><value>17</value><label>TCP_qotd-17-tcp</label></case>
<case><value>18</value><label>TCP_msp-18-tcp</label></case>
<case><value>19</value><label>TCP_chargen-19-tcp</label></case>
<case><value>20</value><label>TCP_ftp-data-20-tcp</label></case>
<case><value>21</value><label>TCP_ftp-21-tcp</label></case>
<case><value>22</value><label>TCP_ssh-22-tcp</label></case>
<case><value>23</value><label>TCP_telnet-23-tcp</label></case>
<case><value>25</value><label>TCP_smtp-25-tcp</label></case>
<case><value>27</value><label>TCP_nsw-fe-27-tcp</label></case>
<case><value>29</value><label>TCP_msg-icp-29-tcp</label></case>
<case><value>31</value><label>TCP_msg-auth-31-tcp</label></case>
<case><value>33</value><label>TCP_dsp-33-tcp</label></case>
<case><value>37</value><label>TCP_time-37-tcp</label></case>
<case><value>38</value><label>TCP_rap-38-tcp</label></case>
<case><value>39</value><label>TCP_rlp-39-tcp</label></case>
<case><value>41</value><label>TCP_graphics-41-tcp</label></case>
<case><value>42</value><label>TCP_name-42-tcp</label></case>
<case><value>42</value><label>TCP_nameserver-42-tcp</label></case>
<case><value>43</value><label>TCP_nicname-43-tcp</label></case>
<case><value>44</value><label>TCP_mpm-flags-44-tcp</label></case>

The label is the name - I am keeping TCP_ (and UDP_)
at the start of the label, as the tool I use to
display stats looks for the TCP_ and UDP_ prefix. I
follow the IANA name with the port and protocol so I
won't get duplicate application names (a lot of the
apps listen on both UDP and TCP).
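
As a rough illustration of that naming scheme, here is a small Python
sketch. The function names make_label and make_case are made up for
this example and are not part of NFC; the element layout is simply
copied from the sample lines above.

```python
def make_label(proto: str, name: str, port: int) -> str:
    """Build a label such as TCP_ssh-22-tcp.

    proto is "tcp" or "udp", name is the IANA service name.
    The port and protocol suffix keep labels unique when a
    service is registered for both TCP and UDP.
    """
    return f"{proto.upper()}_{name}-{port}-{proto.lower()}"


def make_case(proto: str, name: str, port: int) -> str:
    """Wrap the port and label in the <case> element shown above."""
    label = make_label(proto, name, port)
    return f"<case><value>{port}</value><label>{label}</label></case>"


if __name__ == "__main__":
    # The same service on both protocols yields two distinct labels.
    print(make_case("tcp", "ssh", 22))
    print(make_case("udp", "ssh", 22))
```

Because the protocol appears in both the prefix and the suffix, an
entry for TCP and an entry for UDP on the same port never collide.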

Any pointers? How do I get rid of the " character? I'm
guessing there are tabs in the file, since I created
it using Excel (I know, I should have figured out a way
to simply grab the IANA well-known ports page and
process it directly). How do I get rid of the tabs?
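
One possible approach, sketched in Python under a few assumptions:
the export contains only double quotes, tabs, and line breaks as
unwanted characters, no label needs to keep any whitespace, and
every record ends with </case>. The function name clean_export is
made up for this example. Since some records in the sample are
split across two lines, the sketch removes all whitespace first and
then cuts the text back into one record per element.

```python
import re


def clean_export(text: str) -> list[str]:
    """Return one cleaned <case>...</case> string per record.

    Assumes no label contains whitespace that must be kept.
    """
    # Drop every double quote and all whitespace (tabs, spaces,
    # line breaks), which also rejoins records split over two lines.
    flat = re.sub(r'["\s]', "", text)
    # Each record runs from <case> to the nearest </case>.
    return re.findall(r"<case>.*?</case>", flat)


if __name__ == "__main__":
    sample = (
        '"<case><value>\t1\t</value><label>\tTCP_\ttcpmux\t-\t1\t-tcp\n'
        '</label></case>"\n'
        '"<case><value>\t2\t</value><label>\tTCP_\tcompressnet\t-\t2\n'
        '-tcp\t</label></case>"\n'
    )
    for record in clean_export(sample):
        print(record)
```

To run it over a whole file, read the file into a string, pass it
to clean_export, and write the returned records out one per line.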


       


