Sort and separate data from a file

Original Message
Name: alanevenden
Date: July 18, 2007 at 06:52:50 Pacific
Subject: Sort and separate data from a file
OS: Unix
CPU/Ram: N/A
Model/Manufacturer: Sun
Comment:

I have an .xml file that I need to parse and extract data from via a ksh script.

The file (simplified) looks like this:

<Site>
blah blah blah
VARIABLE 1
</Site>
<Site>
blah blah blah
VARIABLE 1
</Site>

What I need to do is print everything between and including <Site> and </Site>, for every occurrence, to individual files, and preferably only do so if VARIABLE 1 is matched.

I'm sure some sort of if/sed combo could do it, but I just don't know how.

All help is very much appreciated!





Response Number 1
Name: lankrypt0
Date: July 18, 2007 at 12:11:53 Pacific
Subject: Sort and separate data from a file
Reply:
There is probably an easier way, but this will work:

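#!/usr/bin/ksh
# Usage: pass the text to search for as the first argument; tfile3 is the input file.
# Note: "\n" in a sed replacement is not supported by every sed (GNU sed accepts it).
results=$(sed -e "s/$/:zzaqmkoz:/g" < tfile3 | tr -d "\n" | sed -e "s/<\/Site>/<\/Site>\n/g" | sed -e "s/:zzaqmkoz:<Site>/<Site>/g" | grep -e "$1")
case $results in
"")
    # No <Site> block contained the search text.
    print BLANK
    ;;
*)
    # Turn the markers back into newlines and squeeze repeated newlines.
    print -r -- "$results" | sed -e "s/:zzaqmkoz:/\n/g" | tr -s "\n"
    ;;
esac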



Response Number 2
Name: alanevenden
Date: July 19, 2007 at 02:09:43 Pacific
Subject: Sort and separate data from a file
Reply:
This looks great... but a little complicated for my newbie (lack of) skills!

Could you please explain it a little more? Is tfile3 the .xml file I want to extract data from?

Many thanks!



Response Number 3
Name: lankrypt0
Date: July 19, 2007 at 09:40:55 Pacific
Subject: Sort and separate data from a file
Reply:
Yeah sorry, I used tfile3 as my input file. You would just change that to whatever your input file is. The script takes whatever you want to search for as its first argument ($1).

The script is not as complicated as it looks; let me try to break it down for you. In defining $results, the first command:
sed -e "s/$/:zzaqmkoz:/g" < tfile3
takes your file and appends :zzaqmkoz: to the end of every line (I use that as a custom delimiter so I can easily go back and replace it later, and I'm 99% sure it won't be found "naturally").
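To see what that first step produces, here is a small sketch you can paste into a shell. The sample text is made up for illustration and stands in for tfile3; the variable names are mine, not part of the script above.

```shell
# Hypothetical sample input standing in for tfile3.
sample=$(printf '<Site>\nblah blah blah\nVARIABLE 1\n</Site>\n')

# Append the marker to the end of every line.
marked=$(printf '%s\n' "$sample" | sed -e 's/$/:zzaqmkoz:/')

# Every line of the output now ends in :zzaqmkoz:
printf '%s\n' "$marked"
```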

The second command:
tr -d "\n" removes the new line characters in the file, so now you have one HUGE line of text.
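A quick way to check the joining step on its own, again with made-up sample text (the variable names are only for illustration):

```shell
# Hypothetical marked-up lines, as produced by the first sed.
marked=$(printf '<Site>:zzaqmkoz:\nblah blah blah:zzaqmkoz:\n</Site>:zzaqmkoz:\n')

# Delete every newline so the whole input becomes a single line.
joined=$(printf '%s\n' "$marked" | tr -d '\n')

printf '%s\n' "$joined"
```

The markers are what let you find the old line boundaries again later.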

The third command:
sed -e "s/<\/Site>/<\/Site>\n/g"
takes any </Site> it finds and replaces it with </Site> followed by a newline. So now every line after the first starts with :zzaqmkoz:<Site>, then has your text, then ends with a </Site>, so it looks like:
:zzaqmkoz:<Site> text text </Site>
This makes it easy to search for your variable, because each <Site> block now sits on a single line.
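A sketch of the splitting step in isolation, on made-up text. One assumption to flag: not every sed understands "\n" in the replacement, so this version uses a backslash followed by a real newline, which is the form POSIX sed specifies.

```shell
# Hypothetical one-line input, as left behind by tr -d "\n".
joined='<Site>:zzaqmkoz:a VARIABLE 1:zzaqmkoz:</Site>:zzaqmkoz:<Site>:zzaqmkoz:b:zzaqmkoz:</Site>:zzaqmkoz:'

# Insert a newline after every </Site>; "&" stands for the matched text.
records=$(printf '%s\n' "$joined" | sed -e 's/<\/Site>/&\
/g')

# One <Site> block per line, plus a leftover line holding only the last marker.
printf '%s\n' "$records"
```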

The next command:
sed -e "s/:zzaqmkoz:<Site>/<Site>/g"
finds any line that starts with my custom delimiter ":zzaqmkoz:" followed by <Site>, and simply changes that back to <Site>.

The last command is a grep and simply searches for your variable.
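These last two steps can be tried together on a couple of made-up records. The search text 'VARIABLE 1' is just the example from the original question.

```shell
# Hypothetical records, one <Site> block per line.
records=':zzaqmkoz:<Site>:zzaqmkoz:VARIABLE 1:zzaqmkoz:</Site>
:zzaqmkoz:<Site>:zzaqmkoz:other:zzaqmkoz:</Site>'

# Drop the marker in front of <Site>, then keep only the blocks containing the search text.
matches=$(printf '%s\n' "$records" | sed -e 's/:zzaqmkoz:<Site>/<Site>/g' | grep -e 'VARIABLE 1')

printf '%s\n' "$matches"
```

Keep in mind that grep treats the search text as a regular expression, so characters like . or * have special meaning.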

The case statement takes $results from above as its input. If it is empty, it simply prints BLANK. Otherwise, it changes the custom delimiter :zzaqmkoz: back to a newline, then "squeezes" multiple newlines into one (the tr -s "\n" command).
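The script above prints the matching blocks to the screen. If each matching block should land in its own file, as the original question asked, one possible sketch uses awk instead of the sed/tr pipeline. This is a different technique from the script above, and the function name split_sites, the output prefix, and the .xml suffix are all made up for illustration. It assumes <Site> and </Site> each appear at most once per line and that the search text is a plain string, not a regular expression.

```shell
# split_sites INPUT TEXT PREFIX  (hypothetical helper, not part of the script above)
# Writes every <Site>...</Site> block containing TEXT to PREFIX1.xml, PREFIX2.xml, ...
split_sites() {
    awk -v pat="$2" -v prefix="$3" '
        # Opening tag: start collecting a new block.
        /<Site>/   { inblock = 1; buf = ""; hit = 0 }
        # Inside a block: save the line and note whether the text was seen.
        inblock    { buf = buf $0 "\n"; if (index($0, pat)) hit = 1 }
        # Closing tag: write the block out only if the text was seen in it.
        /<\/Site>/ { if (inblock && hit) { n++; out = prefix n ".xml"; printf "%s", buf > out; close(out) }
                     inblock = 0 }
    ' "$1"
}
```

It would be called as, for example, split_sites tfile3 "VARIABLE 1" site_ to write site_1.xml, site_2.xml and so on into the current directory.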








All content ©1996-2007 Computing.Net, LLC