RX Vega 64 Hot Spot Temperature

    Hello gentlemen and the lady or two,

    I have two related questions that I hope can be answered without me having to resort to going to Reddit.

    What is the highest hot spot temperature you've seen on your Vega 64? What is the maximum temperature it can get to before your card turns itself off in a max-fan-shutdown fashion?

    GPU-Z seems to be the best application for this, except it doesn't log in real time; instead it dumps the data when you tell it to stop logging. That is no good for seeing what the temps were right before a crash.

    Under load, I am seeing regular hits of 106C and peaks of 108C while the GPU core temp stays at a comfortable 80C. Before learning about hot spots today, I was leaning towards the max-fan-shutdown being a driver glitch, because it happens even when the GPU is reporting as low as 74C. I now think it may be heat related, caused by this dastardly array of hot spots that my MSI AirBoost OC may not cool properly.

    What little I find on the Vega 64 hot spot is sourceless posts on reddit, sometimes referencing other reddit posts that have no source. The 5700XT apparently can hit 110C and be okay.

    #2
    When I looked at mine when I first got it, it seemed to stay under 100C at all times. But undervolting it to 1.1V really brought temps and power consumption down a lot. It didn't crash before undervolting, but it was slower: about 1580 MHz core under load. Now I get a solid 1700 MHz core.

    This is the blower cooler model from Sapphire. So if you are crashing, I'd undervolt, even if only a little bit.

    Some have also repasted and/or tightened the screws, as some heatsinks were apparently not making good contact with the GPU.
    I talked to the tree. Thats why they put me away!..." Peter Sellers, The Goon Show
    Only superficial people cant be superficial... Oscar Wilde

    Piledriver Rig 2016: Gigabyte G1 gaming 990fx. FX 8350 cpu. XFX RX 480 GTR Cats 22.7.1, SoundBlaster ZXR, 2 x 8 gig ddr3 1866 Kingston. 1 x 2tb Firecuda seagate with 8 gig mlc SSHD. Sharp 60" 4k 60 hz tv. Win 10 home.

    Ryzen Rig 2017: Gigabyte X370 K7 F50d bios. Ryzen 5800X3D :). 2 x 8 ddr4 3600 (@3200) Cas 16 Gskill. Sapphire Vega 64 Reference Cooler Cats 22.4.1. 1700 mhz @1.1v. Soundblaster X Ae5, 32" Dell S3220DGF 1440p Freesync Premium Pro monitor, Kingston A2000 1TB NVME. 4 TB HGST NAS HD. Win 11 pro.

    Ignore List: Keystone, Andino... -My Baron, he wishes to inform you that vendetta, as he puts it in the ancient tongue, the art of kanlee is still alive... He does not wish to meet or speak with you...-
    "Either half my colleagues are enormously stupid, or else the science of darwinism is fully compatible with conventional religious beliefs and equally compatible with atheism." -Stephen Jay Gould, Rock of Ages.
    "The Intelligibility of the Universe itself needs explanation. It is not the gaps of understanding of the world that points to God but rather the very comprehensibility of scientific and other forms of understanding that requires an explanation." -Richard Swinburne

    www.realitysandwich.com

    www.plasma-universe.com/pseudoskepticism/

      #3
      I'd love to see my readings stay below 100C.

      As for the crashing, a year ago it happened all the time with The Division 2, and then with Red Dead Redemption 2. Those issues seemed to have been resolved by driver updates, but now that I'm folding, the problem has reared its ugly head again.

      I forgot to mention the above numbers are with a -5% power limit and a 930 MHz HBM down-clock; I suspect that at default clock/power settings my hot spot temps reach 110C, and that's where my crash can happen.

        #4
        Have you undervolted at all, or just lowered your power limit?
        Main rig: look at system spec tab
        Storage Server: Dual AMD Opteron 6120 CPUs, 64Gigs ECC Ram 50TB usable space across 3 zfs2 pools


        HOURGLASS = most appropriate named ICON/CURSOR in the Windows world :-)

        In a dank corner of ATI central, the carpet covered with corn flakes, the faint sound of clicking can be heard........Click......click, click............as the fate of the graphics world and the future of the human race hangs in the balance.

        I know....I know........Keep my day job :-)- catcather

          #5
          Yeah you really need to lower your voltage, not just adjust the power limit. Power limit will cap how much the card draws, but it will still be drawing too much at any given frequency because the voltage is set too high.

          By lowering voltage you lower the power consumption at a given frequency. Usually with Vegas you can cut the voltage back quite a bit, although presumably there are some bad chips where that doesn't work or they wouldn't have shipped at those voltages (I wouldn't fully rule out straight up incompetence though because Vega seems severely overvolted).
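
A rough sketch of why lowering voltage helps, assuming the common first-order approximation that dynamic power scales with voltage squared times frequency (P ~ C * V^2 * f). It ignores leakage and other real-world effects, and the 1.2V starting point is only an illustrative figure, not a measured stock voltage for any particular card. The function name is made up for this example.

```python
def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of new to old dynamic power under the approximation P ~ C * V^2 * f.

    Assumption: leakage and voltage-dependent capacitance are ignored.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Illustrative numbers only: 1.2V -> 1.1V at an unchanged clock.
ratio = relative_dynamic_power(1.1, 1.2)
print(f"about {(1 - ratio) * 100:.0f}% less dynamic power at the same clock")
```

Under that approximation, dropping from 1.2V to 1.1V at the same clock cuts dynamic power by roughly 16%, which is why an undervolt can help where a power limit alone only forces the clocks down.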

            #6
            Originally posted by Gandalfthewhite View Post
            Have you undervolted at all, or just lowered your power limit?
            Just decreasing the power limit seems to be enough to make it stable, so I haven't really bothered learning how to tweak things for this card.

            I did Auto Undervolt religiously but got tired of re-enabling it after a crash. It didn't seem to help much, but I don't know what the Hot Spot readings were at the time either.

              #7
              Yeah, so typically I cap mine at 1.1V for both core and mem.

                #8
                Originally posted by Gandalfthewhite View Post
                Yeah, so typically I cap mine at 1.1V for both core and mem.
                ... and what hot spot temps are being reported?

                  #9
                  Mine hover in the 140-150F range (roughly 60-66C) at 1,800 MHz core, 1.240V.
                  Be a pirate.

                    #10
                    @ Mr. Watercooled

                    I had another Max Fan crash this morning using settings that kept the hotspot at 104C or lower. Just limiting the voltage to 1.1V was not enough.

                    After a number of tweaks, this is what I ended up with, and it keeps the hotspot below 100C.



                    Lowering the speed to this increases my GPU Time Per Frame (folding) by about 5%. Hopefully I'm done with crashes.

                      #11
                      Originally posted by Crawdaddy79 View Post
                      After a number of tweaks, this is what I ended up with, and it keeps the hotspot below 100C.
                      This is no longer true. I see the above numbers when folding a Pr 16435 work unit - which is the one giving me the most trouble. Hotspot temps consistently reach 104C with other folding Work Units while the system stays stable. GPU-Z reports a max voltage of 1.0875, but the large majority of the time it's below 1.07.

                        #12
                        Capturing a fold of a 11748 Work Unit today at default settings - it shows 110C max (averages about 104C).



                        ^ Stable.

                        Over Sunday night, it was folding a 16435 WU with downclocked/undervolted settings and it got a Max Fan crash. Capture of the logfile (changed header titles to tighten it up for posting; sorted largest to smallest at hotspot column):



                        ^ Not stable.

                        So this "max fan" crash that I've been chasing appears to be not heat related at all, but a bug. Notice that the GPU-Z capture shows my SOC VRM temp reaching over 3000C. Afterburner captures the same thing, but I blamed the program, thinking it was bugged. I don't know of a program other than Afterburner that saves stats in real time rather than only dumping what's in memory when you tell it to. Right now I don't have a way of knowing what's going on at the time of the crash, but I may reinstall Afterburner if this bothers me more.

                        The 3000C+ reading is not limited to SOC VRM; I've also seen it in GPU VRM, Mem VRM, Power Draw (yes, it read over 3000W), and Memory Temp.

                        I think somehow my fan RPM reading is bleeding over to other values. I'm wondering if this is a common issue with all Vega 64 or all MSI Airboost OC Vega 64, or just my Vega 64.

                        My new suspicion is that sometimes these blips happen in rapid succession, causing a software panic of SHUT THIS CARD DOWN NOW, with the fan immediately going to 100% and staying there indefinitely until I hold the power button to shut the PC off. It doesn't seem to occur when the card is idle (fans at the minimum 233 RPM don't show a 233C max on other temp readings; it could be that I just haven't let it idle long enough, though).
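
For anyone who wants to check their own log for the same blips, here is a minimal sketch that scans a comma-separated sensor log for implausible temperature readings. The column names, the "[C]" unit suffix, and the sample rows are all made up for illustration; they are not taken from the log above, and a real GPU-Z or Afterburner log will likely need the header matching adjusted.

```python
import csv
import io


def find_glitches(log_text, limit=1000.0, unit_suffix="[C]"):
    """Return (row_number, column, value) for temperature readings above `limit`.

    Assumption: temperature columns are identified by a header ending in
    `unit_suffix`; other columns (fan RPM, clocks) are skipped.
    """
    reader = csv.reader(io.StringIO(log_text))
    header = [name.strip() for name in next(reader)]
    glitches = []
    for row_number, row in enumerate(reader, start=1):
        for name, cell in zip(header, row):
            if not name.endswith(unit_suffix):
                continue
            try:
                value = float(cell.strip())
            except ValueError:
                continue  # skip empty or non-numeric cells
            if value > limit:
                glitches.append((row_number, name, value))
    return glitches


# Invented sample data: one reading mirrors the fan RPM, as suspected above.
sample = """Date,Hot Spot [C],SOC VRM Temp [C],Fan Speed [RPM]
2020-05-18 01:00:00,98.0,67.0,3050
2020-05-18 01:00:01,99.0,3051.0,3051
2020-05-18 01:00:02,98.0,67.0,3049
"""

for row_number, column, value in find_glitches(sample):
    print(f"row {row_number}: {column} = {value}")
```

Comparing the flagged value with the fan speed on the same row would be one way to test the "fan RPM bleeding over" theory.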

                          #13
                          Is your HBM memory with resin or not? If you have resin, it would be easier to get better temps than with the 'naked' one. I used to have the naked reference 56. I had to re-seat the cooler 3 times before I got good Hot Spot temps. If you seat it properly, it should be about a 10C difference from core temps; if not, it could vary between 15-20C.

                          https://linustechtips.com/main/topic...made-the-same/
                          Originally posted by jimjobob
                          If 3 fans left the station at 10:30am cooling at a temp of 63c and one fan was derailed by a stray sata cable on the track, how long would it take all 3 fans to get on Seyiji's last nerve?

                          (a) Mmmm donuts
                          (b) 70c
                          (c) Hey there muscly arms, why the long face?
                          (d) Must use fire.

                            #14



                            GPU speed fluctuates as I don't have it locked, but that's what I got.

                              #15
                              Originally posted by Apocalypsee View Post
                              Is your HBM memory with resin or not? If you have resin, it would be easier to get better temps than with the 'naked' one. I used to have the naked reference 56. I had to re-seat the cooler 3 times before I got good Hot Spot temps. If you seat it properly, it should be about a 10C difference from core temps; if not, it could vary between 15-20C.

                              https://linustechtips.com/main/topic...made-the-same/
                              I learned about that in the last couple of weeks and I honestly have no idea which version I have, and I don't have the thermal grease or expertise to take my cooler off and put it back on.

                              Originally posted by Flyordie View Post



                              GPU speed fluctuates as I don't have it locked, but that's what I got.
                              Can you set GPUz to hold the maximum (high) values, and let it run for 45 minutes or so? I'm really curious if your SOC VRM temp (or some other reading) will glitch out like mine does. 99.999% of the time mine is at 67C - but 0.001% of the time it reads over 3000C and it's easy to miss unless you set the software to hold max values.

                                #16
                                Originally posted by Crawdaddy79 View Post
                                I learned about that in the last couple of weeks and I honestly have no idea which version I have, and I don't have the thermal grease or expertise to take my cooler off and put it back on.



                                Can you set GPUz to hold the maximum (high) values, and let it run for 45 minutes or so? I'm really curious if your SOC VRM temp (or some other reading) will glitch out like mine does. 99.999% of the time mine is at 67C - but 0.001% of the time it reads over 3000C and it's easy to miss unless you set the software to hold max values.
                                The "naked" ones are the ones with Hynix memory; the ones with Samsung memory are not naked in most cases. GPU-Z can tell you which you have, so you can find out without taking it apart. Almost all AIB Vega 56 cards use Hynix memory, and for Vega 64 it seems to be 50/50. I can't give you much advice on your hot spot temperatures because when I had my reference Vega 64, I converted it to water cooling right after I bought it, and my old man memory can't remember the two weeks I ran it on air 2 years ago. It is now sitting in my closet collecting dust because I don't have another liquid cooled rig to put it in.

                                However, you can find just about any answer here (it will just take time to read through all the posts):

                                https://www.overclock.net/forum/67-a...rs-thread.html
                                Last edited by NWR_Midnight; May 20, 2020, 04:03 PM.
                                I speak my mind! if you can't handle that, you might want to leave, because **** is going to get real!!

                                ~I had the right to remain silent, I just didn't have the ability. ~ Ron White
                                ~You can't fix Stupid! ~ Ron White
                                ~There's not a pill you can take; there's not a class you can go to. - ~Stupid is forever. ~ Ron White
                                ~Life is a hard teacher, it gives you the test before it teaches you the lesson.
                                ~It's never to late to have a good childhood! The older you are, the better the toys! ~ My Dad
                                ~Live everyday as though it is your last, it can all end at any moment!

                                  #17
                                  Mine has Samsung HBM - didn't expect to learn that my GPU is likely one with the resin.

                                  I read through the reddit Vega underclocking/undervolting megathread and learned a bit from it. What I've found from playing with my settings is that underclocking/undervolting does not make this a more stable folding machine. I did learn that most people raise their power limit to +50% while undervolting (makes sense if the board is requiring higher amperage because you're limiting the voltage) and I thought that was the solution that I needed. I never would have tried it on my own, but as it turns out, it fixes nothing.

                                  There is no pattern that associates my crashes with higher/lower temperatures, just with 16435 Work Units with folding, and today I had issues with another project number - my PC crashed three times in two hours. The only thing it had in common with 16435 was the checkpointing frequency was high. Usually they're 2.5% or higher, but this was 0.25% - and 16435 is 0.20%, which means it's "taking a breath" every 30 seconds as opposed to every two minutes with other projects (this also means that temps are generally lower for these projects I'm having the most trouble with). My next move is to uninstall FAH from my SSD, then install it on a platter drive and see if that does anything.

                                  If I get time during work hours to read that thread, I will.
                                  Last edited by Crawdaddy79; May 20, 2020, 04:28 PM.

                                    #18
                                    Originally posted by Crawdaddy79 View Post
                                    Mine has Samsung HBM - didn't expect to learn that my GPU is likely one with the resin.

                                    I read through the reddit Vega underclocking/undervolting megathread and learned a bit from it. What I've found from playing with my settings is that underclocking/undervolting does not make this a more stable folding machine. I did learn that most people raise their power limit to +50% while undervolting (makes sense if the board is requiring higher amperage because you're limiting the voltage) and I thought that was the solution that I needed. I never would have tried it on my own, but as it turns out, it fixes nothing.

                                    There is no pattern that associates my crashes with higher/lower temperatures, just with 16435 Work Units with folding, and today I had issues with another project number - my PC crashed three times in two hours. The only thing it had in common with 16435 was the checkpointing frequency was high. Usually they're 2.5% or higher, but this was 0.25% - and 16435 is 0.20%, which means it's "taking a breath" every 30 seconds as opposed to every two minutes with other projects (this also means that temps are generally lower for these projects I'm having the most trouble with). My next move is to uninstall FAH from my SSD, then install it on a platter drive and see if that does anything.

                                    If I get time during work hours to read that thread, I will.

                                    It could be a motherboard BIOS issue. There were some issues when the Vega 64 first came out, with black screens and crashes that were related to the motherboard BIOS. You may want to check into that for your board. Also, although you have the minimum recommended power supply (750 watt) for a Vega 64, it could still be the issue.
                                    Last edited by NWR_Midnight; May 20, 2020, 04:59 PM.

                                      #19
                                      Originally posted by NWR_Midnight View Post
                                      It could be a motherboard BIOS issue. There were some issues when the Vega 64 first came out, with black screens and crashes that were related to the motherboard BIOS. You may want to check into that for your board. Also, although you have the minimum recommended power supply (750 watt) for a Vega 64, it could still be the issue.
                                      BIOS is updated, and my UPS reads a 515W draw at max - this includes my monitor, router, speakers, etc.

                                        #20
                                        Originally posted by Crawdaddy79 View Post
                                        I learned about that in the last couple of weeks and I honestly have no idea which version I have, and I don't have the thermal grease or expertise to take my cooler off and put it back on.



                                        Can you set GPUz to hold the maximum (high) values, and let it run for 45 minutes or so? I'm really curious if your SOC VRM temp (or some other reading) will glitch out like mine does. 99.999% of the time mine is at 67C - but 0.001% of the time it reads over 3000C and it's easy to miss unless you set the software to hold max values.

                                        Mine will glitch, but it takes a day or so of running. It's usually on the water temp reading, though, not the core/hotspot readings.

                                          #21
                                          Here's an example of a small washer mod that got the junction temp a long way down:

                                          https://www.reddit.com/r/Amd/comment...the_owners_of/

                                          It's basically the same as the "tighten the screws on the GPU heatsink a bit" mod...

                                          Hi everyone, I'm writing this post for the few owners of this card: the mounting system for the heatsink is trash.

                                          TLDR:

                                          Before: 73C edge and 103C junction

                                          3 cents and 5 minutes after: 73C edge and 83C junction

                                          I was playing with a few stress tests and I noticed that I always had a huge difference between edge temperature and junction temperature. Edge was at around 75C and junction was at 105C. A 30C delta seemed way too much and it severely limited my overclock. I even reapplied the thermal paste two times using Noctua NT-H1, but nothing changed. Then user u/f-ben (thanks again!) suggested not using the blob method, so I spread the paste on the die to ensure even coverage. This helped a little with temps, now at 73C and 103C, but the delta was still 30C.

                                          The system that makes the heatsink apply pressure on the die consists of four spring-loaded screws, but the thing is that the screw threads bottom out way before the springs can apply the proper mounting pressure. I started to wonder whether the washer mod used for the reference design could work on this card. I bought a pack of 100 3mm washers, about 0.7mm thick, for 70 cents. I put one in each of the four recesses on the backplate and tightened the screws in the usual cross pattern, without even reapplying the thermal paste. The results are way better than I expected (I didn't even think this could do anything significant; I thought I just had bad luck with the silicon lottery).

                                          Edge temperature has not changed and sits at 73C, but the junction is now a massive 20C lower at 83C. Of course, every test was done with the exact same settings and a fixed fan speed.

                                            #22
                                            Unfortunately I don't have the thermal grease or expertise to take my card apart and put it back together. It is interesting that a design flaw like that could exist, and I wouldn't put it past my card to have something similar to that 5700XT.

                                            The pattern behind my crashes doesn't follow times of high heat. In fact, stability seems to increase when I decrease air flow and increase the temps. I haven't had a crash during a game in a very long time - probably since last year. It's only when folding.

                                            @ Flyordie: Thanks for looking at it. Mine glitches 1 - 20 times per ten minute interval. I also found that it glitches when not under load as well, but haven't had a max fan crash while idling.

                                            I haven't put the folding work on my platter drive yet. It's been pretty stable today. Not a single 16435 WU but did get a BSOD on a 13851 (haven't seen that before). It recovered and finished.

                                              #23
                                              I got onto a chat with MSI support. I told them my issue.

                                              Video card crashes with fans kicking instantly to 100%, seemingly regardless of temperature but always when under load.

                                              Their response?

                                              You need to RMA it. No "Have you tried X, Y, Z?" No "Is it caked with dust?" No "What type of PSU do you have?" No questions at all. Here's the RMA link. Your card is bad.

                                              My worry now is that if I RMA it, they may not be able to replicate my issue and will send it back to me as is, unnecessarily costing me shipping $$$ plus time without playing s00per 1nt3ns3 gamez. But if I can get a new card out of it that doesn't crash, HOW WONDERFUL, that would be awesome for my sanity.

                                                #24
                                                Originally posted by Crawdaddy79 View Post
                                                I got onto a chat with MSI support. I told them my issue.

                                                Video card crashes with fans kicking instantly to 100%, seemingly regardless of temperature but always when under load.

                                                Their response?

                                                You need to RMA it. No "Have you tried X, Y, Z?" No "Is it caked with dust?" No "What type of PSU do you have?" No questions at all. Here's the RMA link. Your card is bad.

                                                My worry now is that if I RMA it, they may not be able to replicate my issue and will send it back to me as is, unnecessarily costing me shipping $$$ plus time without playing s00per 1nt3ns3 gamez. But if I can get a new card out of it that doesn't crash, HOW WONDERFUL, that would be awesome for my sanity.
                                                Honestly, I feel like that's the way support has been working lately. No real attempt to troubleshoot. Just push you towards RMA.

                                                I always spend like an hour writing up a very descriptive message about the issue I'm having and try to ask questions about what could be the problem and how I could fix it. Then two days later, they have a basic message about trying to RMA it, and don't even bother to address anything I've said.

                                                And the RMA process is frustrating because you're usually out of the shipping cost. The problem isn't your fault. It's like a $10-$20 extra fee on top of your purchase. And, at least for me, I usually look for a deal and specifically bought the product at a price I wanted. Not $20 more.

                                                  #25
                                                  I was surprised as well. I think because people have to pay for shipping, most won't even bother RMAing a sub-$200 card. Saves the company money in repair technicians, back-to-customer shipping costs (though large businesses get like 90% discounts - when I worked for L3 I used it all the time), and time wasted with the help desk by just telling the customer to RMA. It cost me $30 to ship this one for 5 day ground.

                                                  They quote a 15 - 35 day turnaround for video cards, but I see threads on Reddit showing that a 1 - 3 day turnaround is quite common. Some people report free upgrades because the parts needed to repair their current card were not in stock.

                                                  Looking forward to my 5700XT Evoke next week.

                                                    #26
                                                    I hope you guys don't mind me turning this thread in this mostly dead section into a "blog" of sorts on my experience with RMAing with MSI.

                                                    I requested an RMA - one of the options as an issue was "BSOD". While I had other issues with the card, I often did get BSOD so I picked that option. Submitted the RMA request and waited my 48 hours for feedback. Got nothing.

                                                    Waited an additional 24 hours, and still nothing.

                                                    Submitted a 2nd RMA request but picked "Other" and described my GPU crashing while kicking the fans on to 100%. Within 10 minutes, got an RMA number - it was enough 'automation' that I thought I must have done something wrong on my first try.

                                                    I removed the card and put my 8yr old 7970 in its place. Used it for a day to make sure it was good. Sent the Vega 64 out 32 hours after getting the RMA number.

                                                    Eight days after the initial request, I got an approved RMA number for that first request. I ignored the new RMA#, and will continue to do so.

                                                    MSI says they have received it.



                                                    MSI says they are working on it.



                                                    Please fix this thing guys.

                                                      #27


                                                      Seems like good news, except no tracking information:


                                                      (RMA number and ship time edited because it seems like the right thing to do in this day of bots)

                                                      Also no feedback. No emails. Nothing stating they found something or they did not find something. So far, only about 50% as bad as the experience could possibly be. If they actually fixed the card, everything is roses. If the card is not fixed, I am boycotting MSI.

                                                      Late edit: have tracking info now. Should arrive Tues.
                                                      Last edited by Crawdaddy79; Jul 23, 2020, 04:16 AM.



                                                        #28
                                                        7970 basically folded the entire time the Vega was gone. Not a single crash.

                                                        Vega arrived this morning.

                                                        Packed neatly in a foam cut out box. They then put that box in a larger box with no packaging material so it could continuously slide around on its voyage across the country.

                                                        Did all necessary steps to uninstall drivers, put Vega back in, reinstalled drivers. Ran 3DMark. Not as high scoring as previous runs but new drivers whatever.

                                                        Left all settings at default.

                                                        Started folding. Two "fan at 100%" crashes in 5 hours.

                                                        **** this card. **** MSI.

                                                        Very late edit: I edited two days ago to take the above statement back. It was premature. New PSU did not fix my issue.
                                                        Last edited by Crawdaddy79; Aug 9, 2020, 01:17 PM.



                                                          #29
                                                          Originally posted by Crawdaddy79
                                                          7970 basically folded the entire time the Vega was gone. Not a single crash.

                                                          Vega arrived this morning.

                                                          Packed neatly in a foam cut out box. They then put that box in a larger box with no packaging material so it could continuously slide around on its voyage across the country.

                                                          Did all necessary steps to uninstall drivers, put Vega back in, reinstalled drivers. Ran 3DMark. Not as high scoring as previous runs but new drivers whatever.

                                                          Left all settings at default.

                                                          Started folding. Two "fan at 100%" crashes in 5 hours.

                                                          **** this card. **** MSI.
                                                          What a sham(e)



                                                          Fargin MSI indeed >:E
                                                          ,____,
                                                          [^_^]
                                                          /)___)

                                                          -"---"-
                                                          Rage3D PC Gaming Hit-List
                                                          Official PC Gaming Deals Thread
                                                          Has the above thread been misplaced/renamed/merged/stickied/locked? Well then there's a doins transpirin! Find the tome and bring forth the sacrifice to restore peace and order.
                                                          "VIAGRA FALLS, slowly I turned, and step by step, inch by inch, I walked up to him, I smashed him, I hit him, I bonked him, I bopped him, I socked him and I mashed his face and I knocked him down."



                                                            #30
                                                            Originally posted by Seyiji
                                                            What a sham(e)



                                                            Fargin MSI indeed >:E
                                                            This is the 2nd GPU I've had with their brand on it and the 2nd GPU that has been jacked up in my lifetime of owning GPUs. Previous was MSI X800 Pro with artifacting straight out of the box.

                                                            I have an MSI laptop. It works great. I have had three MSI motherboards. They are great as well. No more MSI GPUs for me.



                                                              #31
                                                              Originally posted by Crawdaddy79
                                                              This is the 2nd GPU I've had with their brand on it and the 2nd GPU that has been jacked up in my lifetime of owning GPUs. Previous was MSI X800 Pro with artifacting straight out of the box.

                                                              I have an MSI laptop. It works great. I have had three MSI motherboards. They are great as well. No more MSI GPUs for me.
                                                              This really sucks. Do you have any other recourse to get the card fixed?



                                                                #32
                                                                Originally posted by luxor
                                                                This really sucks. Do you have any other recourse to get the card fixed?
                                                                I've read a few threads/reddit posts that say after you've RMA'd the same card three times they replace it, or you get a credit or refund for the purchase price. For me that would be $280, because I redeemed the three games from the promotion at the time, effectively subtracting $150 from my $430 purchase price. After three shipments at $30 each, I come out with $190 and ~three months of using my 7970 instead (which is on its last legs).
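For anyone checking the math, here is a quick sketch of that refund arithmetic. The dollar figures are the ones quoted in the post; the variable names are made up for illustration, and the three-RMA refund policy itself is only hearsay from other threads.

```python
# Figures quoted in the post; variable names are illustrative only.
purchase_price = 430     # USD paid for the Vega 64
promo_games_value = 150  # USD value of the three redeemed promo games
shipping_per_rma = 30    # USD per shipment to MSI
rma_count = 3            # RMAs reportedly needed before a refund (unconfirmed)

refund = purchase_price - promo_games_value    # expected refund amount
shipping_total = shipping_per_rma * rma_count  # money spent getting there
net_recovered = refund - shipping_total        # what is actually left over

print(refund, shipping_total, net_recovered)   # 280 90 190
```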

                                                                Other than that, my only recourse is to get screwed.

                                                                Yesterday I clicked the "Request Service" button. In it I explained that the card had returned from RMA with the same issue, and explicitly said "I am hoping for another solution other than RMAing the card again." Their response? An RMA ticket. Pretty sure I'm being trolled.



                                                                  #33
                                                                  How much power is it drawing/how hot is it getting when folding? It sucks that there seems to be something wrong with the card, but maybe it's worth trying to get to the bottom of what the problem is.

                                                                  You could install MSI Afterburner and try bumping the core and memory clocks down a little, and/or reducing the power limit (do them one at a time), and see if you can figure out what the issue is. That may help get it resolved, or at least provide a workaround.

                                                                  Anyway, this sort of thing really sucks. I had a Gigabyte X370 motherboard that suffered from the "soft brick" bug (basically sometimes when powered off the board just wouldn't boot as if it were dead, but pulling the battery fixed it). It was so bad that it was happening multiple times every week, so I tried to RMA it but it wasn't fixed. I eventually just gave the board away to a friend as there wasn't any way to get it fixed properly, and I couldn't sell it to anyone in good conscience with that problem. You definitely have my sympathies dealing with a troublesome RMA situation.



                                                                    #34
                                                                    Originally posted by Nagorak
                                                                    How much power is it drawing/how hot is it getting when folding? It sucks that there seems to be something wrong with the card, but maybe it's worth trying to get to the bottom of what the problem is.

                                                                    You could install MSI Afterburner and try bumping the core and memory clocks down a little, and/or reducing the power limit (do them one at a time), and see if you can figure out what the issue is. That may help get it resolved, or at least provide a workaround.

                                                                    Anyway, this sort of thing really sucks. I had a Gigabyte X370 motherboard that suffered from the "soft brick" bug (basically sometimes when powered off the board just wouldn't boot as if it were dead, but pulling the battery fixed it). It was so bad that it was happening multiple times every week, so I tried to RMA it but it wasn't fixed. I eventually just gave the board away to a friend as there wasn't any way to get it fixed properly, and I couldn't sell it to anyone in good conscience with that problem. You definitely have my sympathies dealing with a troublesome RMA situation.
                                                                    I've changed numerous settings, and different WUs cause different power loads. For instance, the one I'm crunching now on the Power Saver preset was resumed from a crash overnight. It's drawing a meager 115W at peak.

                                                                    I've fiddled with core voltages, clock settings, and aggressive fan ramp-up. None of it seems to matter.

                                                                    The one time (before RMA) that it ran for days without a crash was when I used the Turbo preset (screenshot), at 82C average GPU temp and 240W average power draw. (Vega has Power Saver, Balanced, and Turbo presets in addition to the Auto/Manual tabs.) But then it randomly went back to crashing every few hours.

                                                                    I ran/folded with my HD 7970 at (guesstimate) 180W continuously for over a month without a single crash. It was awesome in that respect.

                                                                    I'm tempted to pick up a $280 5600XT at a ~15% performance loss and be done with it.



                                                                      #35
                                                                      Likewise, sorry to hear about your motherboard situation. So far that's a score of 0/2 for anecdotal computer component RMA experiences. My previous motherboard was a Gigabyte UD3R X58. Memory controller failed, but it was 13 years old.

                                                                      I don't know if I could stomach RMAing a motherboard on my primary PC.



                                                                        #36
                                                                        Outside of stock settings, and especially when undervolting or manually overclocking/underclocking, the Vega 64 has (or had) an issue with clock spikes that would hard lock the machine. It comes from how the card controls clock speed based on temperature/voltage/heat and the headroom left across all of those. Mine was a reference air-cooled card converted to water, and when I overclocked it I had this issue constantly, but only in a couple of particular games. It took me a lot of time to figure out what was causing it and to find settings that wouldn't spike. For example, I could not set my max core clock higher than 1680, otherwise it would spike to 1800+ and hard lock (AMD's liquid-cooled cards come set to 1750 by default, air-cooled to 1560). Since my Vega was not stable over 1720, any spike above that would cause issues, and undervolting made it worse. The reason is how AMD's clock speed algorithm works in conjunction with temperature/voltage/heat, as I mentioned above.

                                                                        Also, I know you were originally concerned about the hot spot temp. IIRC, the max hot spot temp for the air-cooled Vega 64 is 115C before it throttles on hot spot temperature. There is this little voice in the back of my head that keeps trying to tell me it's 130C, but I like to ignore that voice... lol (it could very well be 130C, I don't fully remember).



                                                                        Anyhow, it could very well be your clock speeds spiking without you realizing it.


                                                                        One game that seemed to draw this issue out, believe it or not, was 7 Days to Die. I could play nearly every other game for hours (10+ hours of Battlefield 1 / Battlefield 5) and never have an issue. But with 7 Days to Die, due to the GPU clock spike (caused by my fiddling with core clocks, voltages, undervolting, etc.), it would hard lock either 5 minutes or 2 hours into the game. Many times it was when I paused the game to grab something to eat or drink, and I would come back to a locked-up machine. I had to start logging GPU clock speeds with MSI Afterburner to notice the clock spikes, and then do a lot of reading to try to understand why. I still don't fully understand how AMD's algorithm works, which is why I won't even attempt to explain it to you. Power limits play a role in this as well.
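A rough sketch of that kind of spike hunt, assuming you already have the logged core clocks as a plain list of MHz values. The 1720 MHz ceiling is the stability limit mentioned above for this particular card; the function name and the sample values are hypothetical, and nothing here reads MSI Afterburner's actual log format.

```python
from typing import List, Tuple


def find_clock_spikes(clocks_mhz: List[int], ceiling_mhz: int = 1720) -> List[Tuple[int, int]]:
    """Return (sample_index, clock_mhz) for every sample above the ceiling."""
    return [(i, c) for i, c in enumerate(clocks_mhz) if c > ceiling_mhz]


# Made-up samples: a card capped at 1680 MHz that briefly overshoots.
samples = [1675, 1680, 1678, 1812, 1680, 1679]
spikes = find_clock_spikes(samples)
print(spikes)  # [(3, 1812)]
```

If the list of spikes lines up with the timestamps of the hard locks, that points at the clock algorithm rather than temperature.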
                                                                        Last edited by NWR_Midnight; Aug 1, 2020, 06:25 AM.
                                                                        I speak my mind! if you can't handle that, you might want to leave, because **** is going to get real!!

                                                                        ~I had the right to remain silent, I just didn't have the ability. ~ Ron White
                                                                        ~You can't fix Stupid! ~ Ron White
                                                                        ~There's not a pill you can take; there's not a class you can go to. - ~Stupid is forever. ~ Ron White
                                                                        ~Life is a hard teacher, it gives you the test before it teaches you the lesson.
                                                                        ~It's never to late to have a good childhood! The older you are, the better the toys! ~ My Dad
                                                                        ~Live everyday as though it is your last, it can all end at any moment!



                                                                          #37
                                                                          This card is supposed to throttle the clock down when it gets to 75C, and for the most part, it's successful. Right now it's at 1530 MHz and drawing 170W (at 75C).

                                                                          Here's an old log capture from GPU-Z when playing a $2 shooter on Steam.
                                                                          Code:
                                                                                  Date        	 GPU Clock [MHz] 	 Memory Clock [MHz] 	 GPU Temperature [°C] 	 GPU Temperature (Hot Spot) [°C] 
                                                                          5/15/2020 23:40	1613	920	76	93
                                                                          5/15/2020 23:40	1598	919	76	93
                                                                          5/15/2020 23:40	1548	924	76	91
                                                                          5/15/2020 23:40	1526	917	76	93
                                                                          5/15/2020 23:40	1502	920	76	93
                                                                          5/15/2020 23:40	1599	919	77	93
                                                                          5/15/2020 23:40	1599	919	77	93
                                                                          5/15/2020 23:40	1557	833	76	90
                                                                          5/15/2020 23:40	1557	833	76	85
                                                                          5/15/2020 23:40	1609	741	76	90
                                                                          5/15/2020 23:40	1609	741	76	90
                                                                          5/15/2020 23:40	1516	819	76	88
                                                                          5/15/2020 23:40	1516	819	76	90
                                                                          5/15/2020 23:40	1613	882	76	87
                                                                          5/15/2020 23:40	1412	870	76	97
                                                                          5/15/2020 23:40	1459	855	77	99
                                                                          5/15/2020 23:40	1431	871	78	101
                                                                          5/15/2020 23:40	1431	871	78	101
                                                                          5/15/2020 23:40	1514	842	78	102
                                                                          5/15/2020 23:40	26	836	78	96
                                                                          5/15/2020 23:40	26	924	78	96
                                                                          5/15/2020 23:40	1563	926	77	95
                                                                          5/15/2020 23:40	1596	919	78	94
                                                                          5/15/2020 23:40	1545	908	78	95
                                                                          5/15/2020 23:40	1617	910	78	95
                                                                          5/15/2020 23:40	1582	923	78	94
                                                                          5/15/2020 23:40	1617	921	78	94
                                                                          5/15/2020 23:40	1595	903	77	95
                                                                          5/15/2020 23:41	1600	915	77	94
                                                                          5/15/2020 23:41	1561	920	78	95
                                                                          5/15/2020 23:41	1614	923	77	100
                                                                          5/15/2020 23:41	1407	845	78	102
                                                                          5/15/2020 23:41	1539	845	78	102
                                                                          5/15/2020 23:41	1475	844	78	102
                                                                          5/15/2020 23:41	1418	811	79	102
                                                                          5/15/2020 23:41	1469	825	80	103
                                                                          5/15/2020 23:41	1449	827	80	104
                                                                          5/15/2020 23:41	1437	827	80	104
                                                                          5/15/2020 23:41	1571	824	80	104
                                                                          5/15/2020 23:41	1462	827	80	104
                                                                          5/15/2020 23:41	1456	825	81	104
                                                                          5/15/2020 23:41	1446	814	81	104
                                                                          5/15/2020 23:41	1427	827	81	104
                                                                          5/15/2020 23:41	1429	840	82	104
                                                                          5/15/2020 23:41	1431	831	82	104
                                                                          5/15/2020 23:41	1568	839	81	105
                                                                          5/15/2020 23:41	1445	832	82	105
                                                                          5/15/2020 23:41	1441	841	81	104
                                                                          5/15/2020 23:41	1432	840	81	104
                                                                          5/15/2020 23:41	1465	804	82	103
                                                                          5/15/2020 23:41	1459	801	81	104
                                                                          5/15/2020 23:41	1431	800	81	104
                                                                          5/15/2020 23:41	1442	800	82	104
                                                                          5/15/2020 23:41	1388	800	82	105
                                                                          5/15/2020 23:41	1400	819	82	104
                                                                          5/15/2020 23:41	1428	800	82	104
                                                                          5/15/2020 23:41	1417	800	82	104
                                                                          5/15/2020 23:41	1434	800	82	104
                                                                          5/15/2020 23:41	1472	801	82	106
                                                                          5/15/2020 23:41	1517	857	82	104
                                                                          5/15/2020 23:41	1518	844	82	104
                                                                          5/15/2020 23:41	1454	800	81	105
                                                                          5/15/2020 23:41	1584	918	83	104
                                                                          5/15/2020 23:41	1526	895	82	105
                                                                          5/15/2020 23:41	1378	882	82	104
                                                                          5/15/2020 23:41	1398	807	81	105
                                                                          5/15/2020 23:41	1404	800	81	105
                                                                          5/15/2020 23:41	1433	804	82	104
                                                                          5/15/2020 23:41	1498	830	82	104
                                                                          5/15/2020 23:41	1444	834	82	106
                                                                          5/15/2020 23:41	1526	837	82	104
                                                                          5/15/2020 23:41	1522	844	82	105
                                                                          5/15/2020 23:41	1443	811	82	106
                                                                          5/15/2020 23:41	1416	830	82	104
                                                                          5/15/2020 23:41	1463	800	82	105
                                                                          5/15/2020 23:41	1427	815	82	104
                                                                          5/15/2020 23:41	1430	802	81	103
                                                                          5/15/2020 23:41	1555	805	82	107
                                                                          5/15/2020 23:41	1417	846	82	104
                                                                          5/15/2020 23:41	1443	800	81	104
                                                                          5/15/2020 23:41	1439	800	82	105
                                                                          5/15/2020 23:41	1438	804	82	104
                                                                          5/15/2020 23:41	1465	802	83	104
                                                                          5/15/2020 23:41	1444	800	82	104
                                                                          5/15/2020 23:41	1392	800	82	105
                                                                          5/15/2020 23:41	1465	812	82	107
                                                                          5/15/2020 23:41	1453	839	82	105
                                                                          5/15/2020 23:41	1439	848	82	104
                                                                          5/15/2020 23:42	1439	800	82	105
                                                                          5/15/2020 23:42	1420	809	82	104
                                                                          5/15/2020 23:42	1430	801	82	104
                                                                          5/15/2020 23:42	1442	799	82	104
                                                                          5/15/2020 23:42	1431	800	82	104
                                                                          5/15/2020 23:42	1419	800	82	106
                                                                          5/15/2020 23:42	1431	840	82	104
                                                                          5/15/2020 23:42	1438	800	82	105
                                                                          5/15/2020 23:42	1540	828	82	105
                                                                          5/15/2020 23:42	1395	809	83	104
                                                                          5/15/2020 23:42	1451	801	82	106
                                                                          5/15/2020 23:42	1451	853	82	104
                                                                          5/15/2020 23:42	1463	800	81	103
                                                                          5/15/2020 23:42	1464	856	82	104
                                                                          5/15/2020 23:42	1460	801	82	104
                                                                          5/15/2020 23:42	1442	800	81	105
                                                                          5/15/2020 23:42	1398	825	82	104
                                                                          5/15/2020 23:42	1444	800	82	104
                                                                          5/15/2020 23:42	1436	800	81	105
                                                                          5/15/2020 23:42	1423	809	82	104
                                                                          5/15/2020 23:42	1442	800	81	106
                                                                          5/15/2020 23:42	1408	836	82	104
                                                                          5/15/2020 23:42	1397	802	82	104
                                                                          5/15/2020 23:42	1441	800	82	104
                                                                          5/15/2020 23:42	1449	800	82	104
                                                                          5/15/2020 23:42	1441	800	81	104
                                                                          5/15/2020 23:42	1440	825	81	106
                                                                          5/15/2020 23:42	1423	800	82	105
                                                                          5/15/2020 23:42	1483	817	82	105
                                                                          5/15/2020 23:42	1425	800	81	105
                                                                          5/15/2020 23:42	1506	800	83	104
                                                                          5/15/2020 23:42	1444	800	82	104
                                                                          5/15/2020 23:42	1450	800	82	105
                                                                          5/15/2020 23:42	1422	845	82	105
                                                                          5/15/2020 23:42	1438	803	82	102
                                                                          5/15/2020 23:42	26	721	81	98
                                                                          5/15/2020 23:42	1548	923	80	96
                                                                          5/15/2020 23:42	1524	909	80	97
                                                                          5/15/2020 23:42	1612	922	80	98
                                                                          5/15/2020 23:42	1595	928	80	96
                                                                          5/15/2020 23:42	1547	911	79	97
                                                                          5/15/2020 23:42	1599	922	79	97
                                                                          As you can see, temps got pretty high and the clock speeds were kept down. To be fair to your point, I did not experience any crashes while playing this.
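For reading a log like this without scrolling through every row, here is a minimal sketch that summarizes it. It assumes the five tab-separated columns exactly as pasted above (date, GPU clock, memory clock, GPU temperature, hot spot), which may not match the raw file GPU-Z writes. The function name, the 100 MHz "clock drop" threshold, and the choice of sample rows are assumptions; the five rows themselves are copied from the log.

```python
import csv
import io

# A handful of rows copied from the log above (tab-separated, header omitted).
LOG_ROWS = (
    "5/15/2020 23:40\t1613\t920\t76\t93\n"
    "5/15/2020 23:40\t1514\t842\t78\t102\n"
    "5/15/2020 23:40\t26\t836\t78\t96\n"
    "5/15/2020 23:41\t1555\t805\t82\t107\n"
    "5/15/2020 23:42\t26\t721\t81\t98\n"
)


def summarize(log_text, drop_below_mhz=100):
    """Summarize core clock and temperatures from tab-separated log rows."""
    rows = []
    for fields in csv.reader(io.StringIO(log_text), delimiter="\t"):
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        core, temp, hot = int(fields[1]), int(fields[3]), int(fields[4])
        rows.append((core, temp, hot))
    # Note: max() raises ValueError if no valid rows were found.
    return {
        "samples": len(rows),
        "max_gpu_temp_c": max(t for _, t, _ in rows),
        "max_hot_spot_c": max(h for _, _, h in rows),
        "max_delta_c": max(h - t for _, t, h in rows),  # hot spot minus GPU temp
        "clock_drops": sum(1 for c, _, _ in rows if c < drop_below_mhz),
    }


print(summarize(LOG_ROWS))
```

On these five rows the hot spot peaks at 107C, up to 25C above the reported GPU temperature, and the core clock falls to 26 MHz twice.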

                                                                          Seems a buggy algorithm causing a clock spike would be a BIOS issue and fixed easily. That said, being part of a 0.2% - 0.5% market share has its downsides.



                                                                            #38
                                                                            So... I found out by pure accident that the strange switch on the side of my card is a BIOS switch. There's very little official information about it... Primary mode is full power, Secondary mode is reduced.

                                                                            In an attempt to solve my crashing issues and save $280, I decided to try switching the switch. PC crashed, I powered it down, took case apart, switched the switch, powered the PC on. Three long beeps and no video. Powered PC back off, switched back the switch, and it boots up fine.

                                                                            Is there anything special I need to do to try out Secondary mode, or is my card more dorked up than I previously thought?



                                                                              #39
                                                                              Originally posted by Crawdaddy79
                                                                              So... I found out by pure accident that the strange switch on the side of my card is a BIOS switch. There's very little official information about it... Primary mode is full power, Secondary mode is reduced.

                                                                              In an attempt to solve my crashing issues and save $280, I decided to try switching the switch. PC crashed, I powered it down, took case apart, switched the switch, powered the PC on. Three long beeps and no video. Powered PC back off, switched back the switch, and it boots up fine.

                                                                              Is there anything special I need to do to try out Secondary mode, or is my card more dorked up than I previously thought?
                                                                              You done fugged up poking the BIOS switch with the rig on. You're only supposed to do that with the computer off. You might have corrupted the BIOS by switching it with the PC on.

                                                                              That's just me adding a bit of common sense to the mix, so I may be wrong, but my thought process usually checks out on things like this.

                                                                              If it's corrupted, there may be a way to flash the BIOS so it works again on the proper setting.
                                                                              ,____,
                                                                              [^_^]
                                                                              /)___)

                                                                              -"---"-
                                                                              Rage3D PC Gaming Hit-List
                                                                              Official PC Gaming Deals Thread
                                                                              Has the above thread been misplaced/renamed/merged/stickied/locked? Well then there's a doins transpirin! Find the tome and bring forth the sacrifice to restore peace and order.
                                                                              "VIAGRA FALLS, slowly I turned, and step by step, inch by inch, I walked up to him, I smashed him, I hit him, I bonked him, I bopped him, I socked him and I mashed his face and I knocked him down."



                                                                                #40
                                                                                Please tell me you didn't flip the BIOS switch with the PC running.

                                                                                My God, what would possess you to think that's a good idea?
                                                                                Originally posted by curio
                                                                                Eat this protein bar, for it is of my body. And drink this creatine shake, for it is my blood.
                                                                                "If you can't handle me when I'm bulking, you don't deserve me when I'm cut." -- Marilyn Monbroe

