Lessons from installing TensorFlow 1.7 for NVIDIA GPU on a Samsung Odyssey running Ubuntu 17.10

I’ve never been so jubilant to see custard apple (score = 0.00147) in my terminal window. It meant that I had finally classified an image using TensorFlow on my brand new GPU. Despite my confidence as I sat down with the visually appealing official guide, I found the process to be time-consuming and frustrating. Based on the number and diversity of issues I saw others having as I Googled (actually DDGed) around, it looks like I’m not alone. As the beneficiary of their hard-won experience, I wanted to contribute some of the things that I learned in the process.

I’m going to experiment a bit with the structure, alternating between abstract and specific thoughts. The value of specific thoughts is intuitive, but worth illuminating: None of this article has any value if it doesn’t help you, the reader, do something differently. Not “change your viewpoint” or “deepen your understanding”, but literally tap a different sequence of keys on your keyboard than you would have otherwise. Directly saying “Type this, not that” is the shortest path to this goal, and shorter paths are less likely to be waylaid.

Unfortunately, as Barney the purple dinosaur tried to warn us, we’re all unique in our own way. This is mostly a good thing, but it can make it difficult to share advice. If nothing else, simply copying my .bash_history would start to fail as soon as you got to paths starting with `/home/mritter/`. You’re smart enough to trivially take that, abstract it up to “He means his home directory”, and granularize it back to `/home/jsmith/` or whatever. You could do this, but there’s no reason I should make all of my readers perform that same first step, particularly in less obvious situations.

Specific: Ensure the right graphics driver is being used by blacklisting the default

Even after going through the installation steps, my Samsung Odyssey laptop wasn’t recognizing the existence of my GPU. The final step to fixing this was editing my `/etc/modprobe.d/blacklist-nouveau.conf` file to contain:

blacklist nouveau
options nouveau modeset=0

then running `sudo update-initramfs -u`

and restarting. I could then confirm the recognition of the GPU with `lshw -C video`. I tried other things beforehand (the probably-relevant parts of which will be detailed below), but I can’t know whether they were critical to this final bug or totally separate.
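Here is a minimal sketch of that fix as two shell functions, assuming a POSIX shell; the function names are my own, hypothetical helpers. One prints the two configuration lines above, and the other checks whether a given file already contains them, which is handy when you're not sure what an earlier attempt left behind.

```shell
# Print the two lines that belong in /etc/modprobe.d/blacklist-nouveau.conf
nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Exit 0 only if the given file already contains both lines verbatim
has_nouveau_blacklist() {
    grep -qx 'blacklist nouveau' "$1" && grep -qx 'options nouveau modeset=0' "$1"
}

# Example use (writes the file as root, then rebuilds the initramfs):
#   nouveau_blacklist | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   has_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf && echo ok
#   sudo update-initramfs -u && sudo reboot
```

Piping through `sudo tee` rather than using `sudo ... > file` matters here: with a plain redirect, the unprivileged shell opens the file, so writing under `/etc` fails.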

Abstract: The state space with positive outcomes was much smaller than I expected

Because the guides that I was reading were a few months old (which is years in internet time, and centuries in Deep Learning time), I assumed that I should just use the latest version of each suggested library or driver. This assumption has served me well for dozens of previous installation processes, but it failed this time. Perhaps I should have been more suspicious because of the unusual cross-corporate nature of the situation, or maybe you just win some and lose some. I won’t get into all of the other instances where a minor deviation from the advice caused cascading issues, but it was an important reminder that “extremely similar” configurations are not always good enough.

Specific: Be careful with CUDA 9.1!

The first major issue that I identified after trying to follow this comprehensive guide was that I’d installed CUDA 9.1 instead of 9.0. I assumed that since it wasn’t a major version change, it would be fully backwards compatible. To its credit, the official documentation mentions the correct version number, but some of the commands it suggests default to the most recent version of various libraries, which have presumably changed since it was published. This short video does a good job of outlining the small changes you need to make for it to work.

Note that you can get away with 9.1 if you build TensorFlow from source. But that sounded like opening up a shipping container of boxes of cans of worms, so I didn’t go down that route.
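One way to catch this early is to check the toolkit version before going any further. The sketch below assumes `nvcc` from the CUDA install is on your PATH and that its `--version` output contains a line like `Cuda compilation tools, release 9.0, V9.0.176`; the function names are hypothetical.

```shell
# Read `nvcc --version` output on stdin and print the release number, e.g. "9.0"
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Return 0 if the installed toolkit matches the expected release (default 9.0,
# the version the prebuilt TensorFlow 1.7 binaries wanted); otherwise complain
require_cuda_release() {
    expected="${1:-9.0}"
    actual=$(nvcc --version | cuda_release)
    if [ "$actual" != "$expected" ]; then
        echo "Found CUDA '${actual:-none}', expected $expected" >&2
        return 1
    fi
}

# Example use:
#   require_cuda_release 9.0 && echo "CUDA 9.0 found"
```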

General: This stuff is still bleeding edge

I’ve always had a romantic notion of what it would have been like to work with steam engines during the Victorian age, or airplanes when they were new. New records being set every day! Limitless opportunity! …And frustrating setbacks caused by obscure parts!


The Wright Brothers, for example, had attempted a flight before the one which went down in history. It took two whole days to repair the ‘minor’ damage that the machine suffered, so that they could make their successful attempt. Their inspiration, a world-famous glider pilot named Lilienthal, had (over the course of his 5 years in the spotlight) spent just 5 hours in the air: about half a workday actually doing the thing he was world famous for, with the rest of his time spent handling logistics.

Good user experience fades into the background, and it’s easy to forget how hard and complex things are. When you’re at the bleeding edge, there’s nobody in charge of making your experience pleasant, or even of guaranteeing that what you want to do is possible. When you’re lucky enough to find a guide, it usually assumes that you have considerable experience to fill in the gaps. For example, when was the last time someone digressed from their Stack Overflow answer to clarify “sudo means that you have to type your admin password”? That’s just a common denominator on that website, as are hundreds of other little bits of knowledge. Somehow our computing culture has converged on a de facto curriculum that lets most people understand each other, most of the time. But on the bleeding edge, when you’re talking about graphics drivers and rapidly updating libraries, those gaps can become impossible to bridge.

Specific: These commands are your friend

sudo apt-get purge <package> # Completely remove an installed system package, including drivers (or: sudo dpkg --purge <package>)
apt list --installed | grep <package> # Search through installed packages (make sure they're all the right version!)
dpkg -l | grep "cuda" # List installed CUDA packages with their versions (make sure they're all the right version!)
lshw -C video # See whether the GPU is visible to the machine
lsmod | grep nvidia # See which NVIDIA kernel modules are loaded
cat /proc/driver/nvidia/version # See driver information, including the loaded driver version
/usr/lib/nvidia-384/bin/nvidia-smi # See GPU details
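`cat /proc/driver/nvidia/version` tells you which driver version is actually loaded, which is what you want to compare against the `nvidia-384` paths. The sketch below automates that comparison. It assumes the first line of that file looks like `NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.111  <build date>`, and the helper names are hypothetical.

```shell
# Print the loaded kernel driver version, e.g. "384.111", from the
# "NVRM version: ... Kernel Module  <version>  <build date>" line (format assumed)
nvidia_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' \
        "${1:-/proc/driver/nvidia/version}"
}

# Return 0 if the loaded driver's major version matches the expected one
# (default 384, matching the /usr/lib/nvidia-384 paths); an optional second
# argument points at a different version file
require_driver_major() {
    expected="${1:-384}"
    version=$(nvidia_driver_version "$2")
    if [ "${version%%.*}" != "$expected" ]; then
        echo "Loaded driver is '${version:-unknown}', expected $expected.x" >&2
        return 1
    fi
}

# Example use:
#   require_driver_major 384 && echo "driver 384 is loaded"
```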

The hardest part of the project was not doing things, but UNdoing them. Followed closely by knowing whether I had to undo them in the first place.
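Much of that undoing came down to spotting packages from the wrong CUDA release. Here's a sketch that filters `dpkg -l` output for installed (`ii`) packages whose names start with `cuda` and end in a version suffix other than the one you want. It assumes the `cuda-toolkit-9-0`-style naming of NVIDIA's Ubuntu packages, and the helper name is hypothetical. Unversioned packages such as a bare `cuda` meta-package are deliberately not matched, so check those by hand.

```shell
# Read `dpkg -l` output on stdin and print installed CUDA packages whose
# name ends in a version suffix (e.g. cuda-toolkit-9-1) other than the one to keep
stray_cuda_packages() {
    keep="${1:-9-0}"
    awk -v keep="$keep" '
        $1 == "ii" && $2 ~ /^cuda/ && $2 ~ /-[0-9]+-[0-9]+$/ {
            n = split($2, parts, "-")
            if (parts[n-1] "-" parts[n] != keep) print $2
        }'
}

# Example use: review the list first, then purge it
#   dpkg -l | stray_cuda_packages 9-0
#   sudo apt-get purge $(dpkg -l | stray_cuda_packages 9-0)
```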

General: Learn to quickly Create, Read, Update and Delete in the system you’re debugging

Because I was largely operating in a space that I’m unfamiliar with, I didn’t know how to verify that I was on track until the end of the installation process. That would not have been as bad if the errors I got there had been more specific, but I was left with a diagnosis that boiled down to “One (or more!) of the 10 steps that you took is interacting with one (or more!) of your unknowable number of system configurations incorrectly.” That, combined with my lack of fluency with the basic CRUD operations around drivers, made debugging by elimination extremely slow.

Working through it with a friend who had both this background and a running system to validate against was critical for getting mine set up. THANK YOU STAN!!!

Specific: cuDNN doesn’t usually seem to be the root of issues, but CUDA versions often are

The download page is mercifully specific about which CUDA version each cuDNN option requires. I didn’t have to re-install it after moving a bunch of other things around – it really is just a few files, which you can see with the `ls /usr/local/cuda-9.0/lib64/libcudnn*` command.
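If you want to confirm those files are in place as part of a repeatable check, a small function like this works. It's a sketch that defaults to the `/usr/local/cuda-9.0/lib64` directory from the `ls` command above; pass a different directory if your install lives elsewhere. The function name is hypothetical.

```shell
# List the cuDNN libraries in a CUDA lib directory; return 1 if there are none
check_cudnn() {
    libdir="${1:-/usr/local/cuda-9.0/lib64}"
    found=0
    for f in "$libdir"/libcudnn*; do
        [ -e "$f" ] || continue   # the glob matched nothing and stayed literal
        echo "$f"
        found=1
    done
    [ "$found" -eq 1 ]
}

# Example use:
#   check_cudnn || echo "no cuDNN files in /usr/local/cuda-9.0/lib64"
```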

Make sure that you’ve got the right NVIDIA driver version (denoted by the three-digit number, 384 in my case) on your PATH, as `/usr/lib/nvidia-384/bin`, and its parent directory, `/usr/lib/nvidia-384`, on your LD_LIBRARY_PATH.
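As a sketch, the corresponding lines in `~/.bashrc` might look like this. The driver paths come from above; the CUDA 9.0 paths assume the toolkit's default `/usr/local/cuda-9.0` prefix, and the helper at the end is hypothetical. It lists every `nvidia-<version>` directory on a colon-separated path, so a stale driver entry stands out.

```shell
# Assumed locations: driver 384 from the post, CUDA 9.0 at its default prefix
NVIDIA_DIR=/usr/lib/nvidia-384
CUDA_DIR=/usr/local/cuda-9.0

# Prepend the bin directories to PATH and the library directories to LD_LIBRARY_PATH
# (${VAR:+:$VAR} avoids a trailing ":" when the variable starts out empty)
export PATH="$NVIDIA_DIR/bin:$CUDA_DIR/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$NVIDIA_DIR:$CUDA_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Print every nvidia-<version> directory in a colon-separated list such as "$PATH",
# so a stale driver directory (say nvidia-390 next to nvidia-384) stands out
nvidia_dirs_on() {
    printf '%s\n' "$1" | tr ':' '\n' | grep 'nvidia-[0-9][0-9]*'
}

# Example use:
#   nvidia_dirs_on "$PATH"; nvidia_dirs_on "$LD_LIBRARY_PATH"
```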

That’s what I’ve got for you. If you’re trying to get TensorFlow set up, I wish you the best of luck – it’s definitely possible (as long as you actually have a GPU!), and it actually doesn’t take too long if you’re lucky enough to find a guide that aligns perfectly with your needs. If you run into issues, I definitely recommend finding someone who has recently been through it. In this and so many things, there’s a lot to be said for good friends! (I’ll take this opportunity to thank Stan again – I couldn’t have done it without the 150+ chat messages that we shared while debugging everything.)

[Source] http://www.matthewritter.net/tensorflow-nvidia-gpu-ubuntu/
