HELP! Boot problems on production system

Name: James
Date: June 25, 2002 at 02:06:57 Pacific
Comment:

Hi,

If anyone can help me out with this one it would be HUGELY appreciated, as it's a production machine and I'm seriously desperate!!!!

The system is a Sun E250 running Solaris 2.6, with 2 GB RAM and an 18 GB SCSI (bootable) disk. It also has an A1000 RAID array attached, though the problem still occurs if it's removed.

The machine packed up at the weekend (it's been running for over a year without problems), and when I got to the console I found it continuously rebooting itself with the same sequence, shown below:


Boot device: /pci@1f,4000/scsi@3/disk@0,0 File and args: -r
SunOS Release 5.6 Version Generic_105181-30 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
BAD TRAP: cpu=1 type=0x10 rp=0x301f9120 addr=0x1000c094 mmu_fsr=0x0
autopush: illegal instruction fault:
addr=0x1000c094
pid=4, pc=0x1000c094, sp=0x301f91b0, tstate=0x44f0001e02, context=0x6
g1-g7: 10013f58, be, be, 1ffffc000, 0, 0, 6184ba40
Begin traceback... sp = 301f91b0
Called from 100229f4, fp=301f9290, args=2fa80400 1c00 0 117ae8c0 104111cc 0
Called from 601b7fbc, fp=301f92f8, args=2fa80400 400 1c00 0 2fa80000 10044c98
Called from 601b7b98, fp=301f9388, args=1 600bbd88 0 0 301f94c0 1
Called from 10093924, fp=301f9450, args=0 2000 0 0 0 1
Called from 1006ace8, fp=301f94d0, args=0 0 5020c000 0 0 6018c9c0
Called from 601acc08, fp=301f9530, args=5020c000 0 10412000 400 1 301f9590
Called from 601aa478, fp=301f9598, args=600bbd00 0 0 301f9624 6161e000 ff00
Called from 601b58d4, fp=301f9628, args=0 301f9710 600bbde0 0 1 600bbd00
Called from 1007be14, fp=301f9690, args=0 301f9710 301f970c 60129f80 0 600bbd00
Called from 1007bae8, fp=301f9810, args=301f9920 0 1 0 301f99ac 61625ec0
Called from 1007b974, fp=301f9878, args=301f9920 0 1 0 301f99ac 1
Called from 100dd258, fp=301f9930, args=0 0 1 0 301f99ac 3b8e4
Called from 10083bc0, fp=301f9a10, args=3b8e4 0 3 1 301f9a74 0
Called from 1002f8e4, fp=301f9a80, args=3b8e4 3 1 4d440000 4d44 ff00
Called from 3260c, fp=effffbe8, args=3b8e4 2 1 4d440000 4d44 ff00
End traceback...
panic[cpu1]/thread=0x6184ba40: trap
syncing file systems... done
rebooting...
Resetting ...
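
The output above names the CPU that took the trap twice: `cpu=1` in the BAD TRAP line and `cpu1` in the panic line. As a minimal sketch, assuming the console output has been captured as plain text, those fields could be pulled out like this. The function name `trap_summary` and the `CONSOLE` constant are hypothetical, and the patterns are only matched against the lines quoted in this post, not against every Solaris panic format.

```python
import re

# Lines copied from the panic output quoted above.
CONSOLE = """\
BAD TRAP: cpu=1 type=0x10 rp=0x301f9120 addr=0x1000c094 mmu_fsr=0x0
autopush: illegal instruction fault:
pid=4, pc=0x1000c094, sp=0x301f91b0, tstate=0x44f0001e02, context=0x6
panic[cpu1]/thread=0x6184ba40: trap
"""


def trap_summary(text):
    """Return the cpu, trap type and pc reported in a BAD TRAP message.

    Hypothetical helper: fields that cannot be found are returned as None.
    """
    trap = re.search(r"BAD TRAP: cpu=(\d+) type=(0x[0-9a-f]+)", text)
    pc = re.search(r"\bpc=(0x[0-9a-f]+)", text)
    panic = re.search(r"panic\[cpu(\d+)\]", text)
    return {
        "trap_cpu": int(trap.group(1)) if trap else None,
        "trap_type": trap.group(2) if trap else None,
        "pc": pc.group(1) if pc else None,
        "panic_cpu": int(panic.group(1)) if panic else None,
    }


if __name__ == "__main__":
    print(trap_summary(CONSOLE))
```

If the same CPU number shows up on every panic, that is a hint (not proof) that the fault follows that processor rather than the disk or the memory.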

First of all I checked .post for hardware errors but it said all was OK.

I thought it may have been a CPU or memory problem so I tried disabling various parts of the subsystem, but get this...

I tried disabling bank0 of memory using asr-disable bank0, followed by boot -r.

As it came up again I received the following message and the box did boot:

cpu0: SUNW,UltraSPARC-II (upaid 0 impl 0x11 ver 0xa0 clock 400 MHz)
SunOS Release 5.6 Version Generic_105181-30 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
WARNING: status 'fail-Disabled by Command' for '/SUNW,UltraSPARC-II@1,0'
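
The device path in that warning is worth a close look: the node that is marked disabled is named `SUNW,UltraSPARC-II`, which looks like a CPU node rather than a memory bank. As a rough sketch, assuming the warning always has the `status '...' for '...'` shape shown above, the status and device node could be split out as follows. `disabled_device` is a hypothetical helper name.

```python
import re

# Warning line copied from the boot output above.
WARNING = "WARNING: status 'fail-Disabled by Command' for '/SUNW,UltraSPARC-II@1,0'"


def disabled_device(line):
    """Split a boot-time status warning into status, node name and unit address.

    Hypothetical helper: returns None if the line does not match the
    "status '...' for '...'" shape seen in this thread.
    """
    m = re.search(r"status '([^']*)' for '([^']*)'", line)
    if m is None:
        return None
    status, path = m.groups()
    node = path.rsplit("/", 1)[-1]        # last path component
    name, _, unit = node.partition("@")   # node name and unit address
    return {"status": status, "name": name, "unit": unit}


if __name__ == "__main__":
    print(disabled_device(WARNING))
```

Here the unit address is `1,0` and the only CPU reported at boot is cpu0, so the one boot that succeeded appears to have run without the second processor.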

But! Upon restarting the machine (via an init 6) it reverted to the first error, AND trying the same trick again (asr-disable bank0) didn't work a second time!

I've tried again with different banks/CPUs and it only works once, then reverts to the original error!

Any ideas? I'm running out of subsystems to disable! Pleeeeeaaasssee!!!!!!

Many thanks,

James




Response Number 1
Name: PaulS
Date: June 25, 2002 at 07:27:48 Pacific
Reply:

I think I would be inclined to replace memory if possible.



Response Number 2
Name: Boarddude
Date: June 25, 2002 at 21:16:47 Pacific
Reply:

I think you have a serious problem with your CPU!
Call your hardware support guys and let them replace CPU 1, or remove that CPU first, then boot your system and see if it works!
(Don't forget the interlock switch when opening the cover!)

Good luck



Response Number 3
Name: James
Date: June 26, 2002 at 02:16:32 Pacific
Reply:

Thanks for your posts guys.

I'm almost certain it's a CPU fault now. As the box is up, I can see via top that it's only using CPU 0.

I've opened the box up and switched the memory about and know that all the banks are sound.

What I don't know is how to remove a CPU! Does anyone know how they are mounted, or can anyone point me to any tutorials?

Thanks again !

James



Response Number 4
Name: James
Date: June 26, 2002 at 02:22:40 Pacific
Reply:

Forget that! Used my common sense and tracked it down on the Sun site. Doh!

Many thanks again :)

James



Response Number 5
Name: Babu Raj
Date: June 29, 2002 at 00:42:12 Pacific
Reply:

Seems that it's a memory problem... try replacing it if possible...


Thanks.

Babu


Response Number 6
Name: James
Date: July 8, 2002 at 05:01:48 Pacific
Reply:

For anyone else unlucky enough to get this set of errors, it was a faulty CPU :( Taking it out was the simple solution.

Thanks to everyone who replied.

James

