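Deep Learning Environment
Note
The hands-on practice here follows 动手学深度学习 v2 (the Chinese edition of Dive into Deep Learning), which introduces deep learning algorithms and their code implementations from scratch.
My runtime environment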
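Deploy a FreeBSD Linux Jail and start a jail container named d2l. The jail must also be configured as a FreeBSD VNET Jail with its own VNET network stack; otherwise Jupyter fails to run because the Linux compatibility layer lacks socket support.
For convenience, complete the Linux Jail initialization first. Then, after logging in to the container over ssh, use chroot to enter the Debian runtime environment.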
Build dependencies that may be needed can be installed through build-essential.
Minimal package installation (see Debian minimal system initialization):
sudo apt install build-essential cmake vim-nox python3-dev -y
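In the 动手学深度学习 v2 tutorial, Li Mu installs Python 3.8; here I install python3-dev instead (which pulls in python3).
Installing Conda
Install Conda (installing the full Anaconda distribution also works).
Anaconda Download provides installer scripts for both Anaconda and Miniconda.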
# x86_64 version
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
# ARM 64-bit version
# curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -o Miniconda3-latest-Linux-aarch64.sh
# Make the script executable and run the installer
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
After accepting the license, follow the prompts to complete the installation:
Do you accept the license terms? [yes|no]
>>> yes
Miniconda3 will now be installed into this location:
/home/admin/miniconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
[/home/admin/miniconda3] >>> /home/admin/conda
...
Downloading and Extracting Packages:
Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
run the following command when conda is activated:
conda config --set auto_activate_base false
You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
no change /home/admin/conda/condabin/conda
no change /home/admin/conda/bin/conda
no change /home/admin/conda/bin/conda-env
no change /home/admin/conda/bin/activate
no change /home/admin/conda/bin/deactivate
no change /home/admin/conda/etc/profile.d/conda.sh
no change /home/admin/conda/etc/fish/conf.d/conda.fish
no change /home/admin/conda/shell/condabin/Conda.psm1
no change /home/admin/conda/shell/condabin/conda-hook.ps1
no change /home/admin/conda/lib/python3.12/site-packages/xontrib/conda.xsh
no change /home/admin/conda/etc/profile.d/conda.csh
modified /home/admin/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
Thank you for installing Miniconda3!
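The interactive session above can also be scripted. The following is a minimal sketch, not the procedure used in this article: it picks the installer from `uname -m` and runs it unattended with the installer's `-b` (batch) and `-p` (prefix) options. The function names `miniconda_installer` and `install_miniconda` are made up for illustration, and the default prefix `$HOME/conda` is an assumption that mirrors the location chosen above.

```shell
# Map a machine architecture (as printed by `uname -m`) to an installer file name.
miniconda_installer() {
    case "$1" in
        x86_64)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;
        aarch64) echo "Miniconda3-latest-Linux-aarch64.sh" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Download the installer and run it without prompts.
# $1 is the install prefix (assumed default: $HOME/conda).
install_miniconda() {
    prefix=${1:-"$HOME/conda"}
    installer=$(miniconda_installer "$(uname -m)") || return 1
    curl "https://repo.anaconda.com/miniconda/${installer}" -o "${installer}" || return 1
    # -b: batch mode (the license is accepted without prompting), -p: install prefix
    bash "${installer}" -b -p "${prefix}" || return 1
    # Batch mode leaves the shell profile alone, so initialize it explicitly.
    "${prefix}/bin/conda" init bash
}
```

Nothing runs until `install_miniconda` is called, for example as `install_miniconda /home/admin/conda`. As with the interactive run, the shell has to be reopened afterwards for the profile change to take effect.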
Installing the d2l environment
Use conda to create and activate the d2l-zh environment:
# The -n option specifies the environment name
conda env remove -n d2l-zh
# Create the environment with Python 3.11
# Verified: jupyter and d2l both pin an old numpy version that cannot be installed on Python 3.12
conda create -y -n d2l-zh python=3.11 pip
# Activate the environment
conda activate d2l-zh
When creating or removing an environment, pass -n to specify its name.
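The commands above remove and recreate d2l-zh on every run. A gentler variant, sketched here on the assumption that `conda env list` prints each environment name in its first column, creates the environment only when it is missing. The helper names `env_exists` and `ensure_env` are hypothetical.

```shell
# Succeed if a conda environment with exactly this name is listed.
env_exists() {
    conda env list | awk '{print $1}' | grep -Fqx -- "$1"
}

# Create the environment ($1 = name, $2 = Python version) unless it already exists.
ensure_env() {
    if env_exists "$1"; then
        echo "environment $1 already exists, leaving it untouched"
    else
        conda create -y -n "$1" "python=$2" pip
    fi
}
```

With these defined, `ensure_env d2l-zh 3.11` followed by `conda activate d2l-zh` gives the same result as the commands above on a fresh install, and keeps already installed packages on later runs.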
Creating the environment prints the following:
Channels:
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/admin/conda/envs/d2l-zh
added / updated specs:
- pip
- python=3.11
The following packages will be downloaded:
package | build
---------------------------|-----------------
pip-24.2 | py311h06a4308_0 2.8 MB
python-3.11.11 | he870216_0 32.9 MB
setuptools-75.1.0 | py311h06a4308_0 2.2 MB
wheel-0.44.0 | py311h06a4308_0 145 KB
------------------------------------------------------------
Total: 38.1 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
ca-certificates pkgs/main/linux-64::ca-certificates-2024.11.26-h06a4308_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0
libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl pkgs/main/linux-64::openssl-3.0.15-h5eee18b_0
pip pkgs/main/linux-64::pip-24.2-py311h06a4308_0
python pkgs/main/linux-64::python-3.11.11-he870216_0
readline pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools pkgs/main/linux-64::setuptools-75.1.0-py311h06a4308_0
sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0
tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0
tzdata pkgs/main/noarch::tzdata-2024b-h04d1e81_0
wheel pkgs/main/linux-64::wheel-0.44.0-py311h06a4308_0
xz pkgs/main/linux-64::xz-5.4.6-h5eee18b_1
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate d2l-zh
#
# To deactivate an active environment, use
#
# $ conda deactivate
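After activating the d2l-zh environment, the shell prompt changes from (base) to (d2l-zh). Checking python and pip at this point shows that both live under ~/conda/envs/d2l-zh/bin/ (conda builds each Python runtime environment under its own envs directory, which is a good model for managing Python environments):
(d2l-zh) admin@d2l:~ $ which pip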
/home/admin/conda/envs/d2l-zh/bin/pip
(d2l-zh) admin@d2l:~ $ which python
/home/admin/conda/envs/d2l-zh/bin/python
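The same check can be done in one step. This is a small sketch with made-up helper names (`bin_dir`, `same_bin_dir`); it only compares the directories that `command -v` resolves for two commands and knows nothing about conda itself.

```shell
# Print the directory that a command resolves to on the current PATH.
bin_dir() {
    path=$(command -v "$1") || return 1
    dirname "$path"
}

# Succeed only if both commands are found and live in the same directory.
same_bin_dir() {
    a=$(bin_dir "$1") || return 1
    b=$(bin_dir "$2") || return 1
    [ "$a" = "$b" ]
}
```

Inside the activated environment, `same_bin_dir python pip && echo "python and pip come from the same environment"` should print the message. If it stays silent, pip is probably being picked up from somewhere else on the PATH.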
Install the required packages:
# jupyter depends on an old numpy version that cannot be installed on Python 3.12
pip install jupyter d2l torch torchvision
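# I verified that installing jupyterlab works, but d2l pins numpy==1.23.5, which would also have to be changed and is more trouble
# so in the end I rolled the Python version back to 3.11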
# pip install jupyterlab d2l torch torchvision
With the environment on Python 3.12 (see the paths in the traceback), installing the packages failed with the following error:
...
Collecting jupyter
Using cached jupyter-1.0.0-py2.py3-none-any.whl.metadata (995 bytes)
Collecting numpy==1.23.5 (from d2l)
Using cached numpy-1.23.5.tar.gz (10.7 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [33 lines of output]
Traceback (most recent call last):
File "/home/admin/conda/envs/d2l-zh/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
...
File "/home/admin/conda/envs/d2l-zh/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 999, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/tmp/pip-build-env-wnxch9oh/overlay/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
import setuptools.version
File "/tmp/pip-build-env-wnxch9oh/overlay/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
import pkg_resources
File "/tmp/pip-build-env-wnxch9oh/overlay/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2172, in <module>
register_finder(pkgutil.ImpImporter, find_on_path)
^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
See "AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?": Python 3.12 removed the long-unused pkgutil.ImpImporter class, so pip may not work as expected with Python 3.12. The suggested fix is to reinstall pip by hand under Python 3.12:
python -m ensurepip --upgrade
python -m pip install --upgrade setuptools
cd conda/bin
ln -s pip3 pip
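However, reinstalling pip and setuptools did not solve the problem. The real cause is still the numpy installation: the NumPy issue "NumPy does not install on Python 3.12.0b1 #23808" explains that an upgrade to NumPy 1.26.0 is needed. In practice this means the NumPy version has to be specified by hand. The original installation failed because jupyter is by now a rather dated notebook package that depends on an early numpy version.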
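My workaround was to use jupyterlab (the next-generation Jupyter) in place of jupyter. It pulls in a newer numpy automatically, which removes that obstacle on Python 3.12. d2l itself still pins numpy==1.23.5, though (see "Collecting numpy==1.23.5 (from d2l)" in the log), which is why the environment above ended up on Python 3.11.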
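To avoid running into the same build failure again, the interpreter version can be checked before calling pip. The sketch below encodes only what this article observed: the numpy==1.23.5 pin installs on Python 3.11 and fails on 3.12. Treating every 3.x release up to 3.11 as acceptable is an assumption, and the function name `python_ok_for_d2l_pin` is hypothetical.

```shell
# Succeed if the given Python version (e.g. 3.11 or 3.11.11) is 3.11 or older.
python_ok_for_d2l_pin() {
    major=${1%%.*}     # text before the first dot
    rest=${1#*.}       # text after the first dot
    minor=${rest%%.*}  # minor version, with any patch level removed
    [ "$major" -eq 3 ] && [ "$minor" -le 11 ]
}

# Possible use inside the activated environment:
#   ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
#   python_ok_for_d2l_pin "$ver" && pip install jupyter d2l torch torchvision
```

The check expects a version string with at least a major and a minor part; a bare `3` is not handled.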
Downloading and running the code
Everything is ready; now download the code and run it:
wget https://zh-v2.d2l.ai/d2l-zh.zip
unzip d2l-zh.zip
jupyter notebook
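If all goes well, jupyter starts and listens on port 8888, and the notebooks can be opened in a browser at http://127.0.0.1:8888 (I struggled with this in the FreeBSD Jail environment for a long time before it finally ran with FreeBSD Linux Jail + FreeBSD VNET Jail).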
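Whether the server is actually answering can be checked from a second shell in the same jail without a browser. This is a rough sketch using curl, which is already used above for the download; the function name `jupyter_listening` is made up, and the default URL is the one quoted above.

```shell
# Succeed if anything answers HTTP at the given URL (default: the local Jupyter port).
# Any HTTP response counts, including a redirect to the login page.
jupyter_listening() {
    curl -s -o /dev/null --max-time 3 "${1:-http://127.0.0.1:8888}"
}
```

For example, `jupyter_listening && echo "jupyter is answering on port 8888"`. A failure only says that no HTTP response arrived within three seconds; it does not say why.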
Note
Because I run Jupyter Notebook inside a jail, Jupyter remote access has to be configured.
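One common way to make the server reachable from outside the jail is to bind it to a non-loopback address. The wrapper below is only a sketch and not necessarily the setup described under Jupyter remote access: the function name `start_jupyter_remote` and its defaults are assumptions, while `--ip`, `--port` and `--no-browser` are standard jupyter options.

```shell
# Start Jupyter Notebook listening on the given address and port.
# $1 = listen address (assumed default: all interfaces), $2 = port (default 8888)
start_jupyter_remote() {
    jupyter notebook --ip="${1:-0.0.0.0}" --port="${2:-8888}" --no-browser
}
```

Calling `start_jupyter_remote` listens on every interface of the jail; passing the jail's own address as the first argument narrows that. Token authentication stays in effect unless it has been disabled, so the URL printed at startup (including its token) is the one to open in the browser.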
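Troubleshooting
While working through B站: 动手学深度学习 v2 > 03 安装 (Bilibili, lesson 03: installation) and following Li Mu's course, loading certain .ipynb files makes the page report: Error Starting Kernel! NetworkError when attempting to fetch resource.
The terminal output of the backend jupyter process shows the following when the error occurs:
[W 2025-01-07 18:01:06.820 ServerApp] Notebook pytorch/chapter_linear-networks/linear-regression-scratch.ipynb is not trusted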
[W 2025-01-07 18:01:08.406 ServerApp] Notebook pytorch/chapter_linear-networks/linear-regression-scratch.ipynb is not trusted
Operation not permitted (src/thread.cpp:315)
Aborted
(d2l-zh) admin@d2l:~/docs $ Operation not permitted (src/thread.cpp:315)
At the same time, the system dmesg log gains entries like these (signal 6 is SIGABRT, which matches the Aborted message above):
pid 35897 (python3.11), jid 8, uid 1000: exited on signal 6 (no core dump - too large)
pid 35918 (python3.11), jid 8, uid 1000: exited on signal 6 (no core dump - too large)
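When several jails share one host, it helps to reduce such entries to the fields that matter. The filter below is a sketch that assumes the exact line layout shown above; the function name `summarize_signal_exits` is made up.

```shell
# Read kernel log lines on stdin and print jail id, pid, command and signal
# for every "exited on signal" entry; all other lines are dropped.
summarize_signal_exits() {
    sed -n 's/^pid \([0-9]*\) (\([^)]*\)), jid \([0-9]*\), uid \([0-9]*\): exited on signal \([0-9]*\).*/jid=\3 pid=\1 cmd=\2 signal=\5/p'
}
```

It would be used as `dmesg | summarize_signal_exits`, which turns the first entry above into `jid=8 pid=35897 cmd=python3.11 signal=6`.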
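Google's AI answer suggested that when creating a thread inside a jail hits this error, the jail configuration may need adjusting to allow the necessary system calls. That advice is very generic, though: it addresses the symptom rather than this specific error.
The AI gave one example: to allow the clone() system call, set allow.sysvipc = 1; (this is the setting that running PostgreSQL in a jail needs, although the three separate settings below should be used now; I record them here for reference):
sysvshm = new; sysvsem = new; sysvmsg = new;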