Deep Learning Environment

My Runtime Environment

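Some dependencies may need to be compiled, so install the build-essential package set (see Debian minimal system initialization):

Install vim and the software set needed for server development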
sudo apt install build-essential cmake vim-nox python3-dev -y

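In the tutorial Dive into Deep Learning v2 (introducing deep learning algorithms and their implementation from scratch), Mu Li installs Python 3.8. Here I install python3-dev instead, which also pulls in Python itself.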
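This check confirms that python3-dev really installed the C headers pip needs when it has to compile a package from source, as in the numpy build failure later in these notes. It is a minimal sketch and not part of the original setup. It only looks for Python.h in the include directory reported by the running interpreter. Run it with Debian's system python3, not with a conda python. The helper name python_headers_present is hypothetical.

```python
import os
import sysconfig


def python_headers_present() -> bool:
    """Return True if Python.h exists in this interpreter's include directory."""
    # sysconfig reports where C extension headers are expected for the running interpreter
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print("Python.h found:", python_headers_present())
```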

Install Conda

Install Miniconda
# x86_64 version
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
# ARM 64-bit (aarch64) version
# curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -o Miniconda3-latest-Linux-aarch64.sh

# Make the installer executable and run it (on ARM, use the aarch64 file name instead)
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
  • After accepting the license, follow the prompts to complete the installation
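The two download commands above differ only in the architecture suffix. The minimal sketch below picks the installer URL from platform.machine(). It assumes only the two file names used above (x86_64 and aarch64) matter, and it rejects any other architecture rather than guessing. The helper miniconda_installer_url is hypothetical.

```python
import platform
from typing import Optional

BASE_URL = "https://repo.anaconda.com/miniconda/"
# Only the two installers used in these notes; other architectures are not covered
INSTALLERS = {
    "x86_64": "Miniconda3-latest-Linux-x86_64.sh",
    "aarch64": "Miniconda3-latest-Linux-aarch64.sh",
}


def miniconda_installer_url(machine: Optional[str] = None) -> str:
    """Return the Miniconda installer URL for the given (or current) CPU architecture."""
    machine = (machine or platform.machine()).lower()
    try:
        return BASE_URL + INSTALLERS[machine]
    except KeyError:
        raise ValueError(f"no installer listed for architecture {machine!r}") from None


if __name__ == "__main__":
    print(miniconda_installer_url())
```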

Interactive installation
Do you accept the license terms? [yes|no]
>>> yes

Miniconda3 will now be installed into this location:
/home/admin/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/admin/miniconda3] >>> /home/admin/conda
...
Downloading and Extracting Packages:

Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
   run the following command when conda is activated:

conda config --set auto_activate_base false

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
no change     /home/admin/conda/condabin/conda
no change     /home/admin/conda/bin/conda
no change     /home/admin/conda/bin/conda-env
no change     /home/admin/conda/bin/activate
no change     /home/admin/conda/bin/deactivate
no change     /home/admin/conda/etc/profile.d/conda.sh
no change     /home/admin/conda/etc/fish/conf.d/conda.fish
no change     /home/admin/conda/shell/condabin/Conda.psm1
no change     /home/admin/conda/shell/condabin/conda-hook.ps1
no change     /home/admin/conda/lib/python3.12/site-packages/xontrib/conda.xsh
no change     /home/admin/conda/etc/profile.d/conda.csh
modified      /home/admin/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

Thank you for installing Miniconda3!
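Answering yes made conda init modify ~/.bashrc. You may later want to check whether that block is still present, for example after running conda init --reverse. The small sketch below looks for the `# >>> conda initialize >>>` and `# <<< conda initialize <<<` marker lines that conda init writes. The helper name has_conda_init_block is hypothetical.

```python
from pathlib import Path

# Marker lines that `conda init` writes around the block it adds to ~/.bashrc
BEGIN = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"


def has_conda_init_block(text: str) -> bool:
    """Return True if text contains a begin marker followed later by an end marker."""
    start = text.find(BEGIN)
    return start != -1 and text.find(END, start) != -1


if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    text = bashrc.read_text() if bashrc.exists() else ""
    print("conda init block present:", has_conda_init_block(text))
```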

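Install the d2l Environment

  • Use conda to create and activate the d2l-zh environment:

Create the d2l-zh environment
# -n specifies the environment name
# remove any existing d2l-zh environment first
conda env remove -n d2l-zh
# create the environment with Python 3.11
# verified: jupyter and d2l both depend on older numpy versions that cannot be installed on Python 3.12
conda create -y -n d2l-zh python=3.11 pip
# activate the environment
conda activate d2l-zh

Both creating and removing an environment require the -n option to specify its name.

Creating the environment with the commands above prints the following:

Output while creating the d2l-zh environment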
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/admin/conda/envs/d2l-zh

  added / updated specs:
    - pip
    - python=3.11


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-24.2                   |  py311h06a4308_0         2.8 MB
    python-3.11.11             |       he870216_0        32.9 MB
    setuptools-75.1.0          |  py311h06a4308_0         2.2 MB
    wheel-0.44.0               |  py311h06a4308_0         145 KB
    ------------------------------------------------------------
                                           Total:        38.1 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main 
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 
  ca-certificates    pkgs/main/linux-64::ca-certificates-2024.11.26-h06a4308_0 
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 
  libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 
  libuuid            pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 
  ncurses            pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 
  openssl            pkgs/main/linux-64::openssl-3.0.15-h5eee18b_0 
  pip                pkgs/main/linux-64::pip-24.2-py311h06a4308_0 
  python             pkgs/main/linux-64::python-3.11.11-he870216_0 
  readline           pkgs/main/linux-64::readline-8.2-h5eee18b_0 
  setuptools         pkgs/main/linux-64::setuptools-75.1.0-py311h06a4308_0 
  sqlite             pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 
  tk                 pkgs/main/linux-64::tk-8.6.14-h39e8969_0 
  tzdata             pkgs/main/noarch::tzdata-2024b-h04d1e81_0 
  wheel              pkgs/main/linux-64::wheel-0.44.0-py311h06a4308_0 
  xz                 pkgs/main/linux-64::xz-5.4.6-h5eee18b_1 
  zlib               pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 



Downloading and Extracting Packages:
                                                                                                                                
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate d2l-zh
#
# To deactivate an active environment, use
#
#     $ conda deactivate
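The Python version is the crux of this setup, so a small guard can be run inside the new environment to fail early if it was created with the wrong interpreter. This is a minimal sketch. The expected (3, 11) comes from the comments above, and check_python_version is a hypothetical helper.

```python
import sys

# Assumption taken from the comments above: the d2l-zh environment is meant to
# run Python 3.11, because d2l's numpy==1.23.5 pin does not install on 3.12.
EXPECTED = (3, 11)


def check_python_version(version_info=None):
    """Raise RuntimeError unless the interpreter's major.minor equals EXPECTED."""
    if version_info is None:
        version_info = sys.version_info
    actual = tuple(version_info[:2])
    if actual != EXPECTED:
        raise RuntimeError(
            f"expected Python {EXPECTED[0]}.{EXPECTED[1]}, got {actual[0]}.{actual[1]}"
        )


if __name__ == "__main__":
    check_python_version()
    print("Python version OK")
```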
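  • After activating the d2l-zh environment, the shell prompt changes from (base) to (d2l-zh). Checking python and pip now shows that both live in ~/conda/envs/d2l-zh/bin/. Conda builds a separate Python environment for each env under its own envs directory, which is a good model for managing Python environments:

Check the d2l-zh working environment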
(d2l-zh) admin@d2l:~ $ which pip
/home/admin/conda/envs/d2l-zh/bin/pip
(d2l-zh) admin@d2l:~ $ which python
/home/admin/conda/envs/d2l-zh/bin/python
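The `which pip` / `which python` check can also be done from inside Python. This minimal sketch assumes the standard variables that `conda activate` sets: CONDA_DEFAULT_ENV holds the environment name and CONDA_PREFIX holds its path. The function active_conda_env is a hypothetical helper.

```python
import os
import sys


def active_conda_env(environ=None, prefix=None):
    """Return (env_name, interpreter_matches) for the currently active conda env.

    env_name comes from CONDA_DEFAULT_ENV; interpreter_matches is True when
    sys.prefix equals CONDA_PREFIX, i.e. the running python really lives in
    the activated environment rather than somewhere else on PATH.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    name = environ.get("CONDA_DEFAULT_ENV")
    conda_prefix = environ.get("CONDA_PREFIX")
    matches = conda_prefix is not None and (
        os.path.realpath(conda_prefix) == os.path.realpath(prefix)
    )
    return name, matches


if __name__ == "__main__":
    print(active_conda_env())
```

Inside the activated environment, active_conda_env() should return ('d2l-zh', True).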
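  • Install the required packages:

Install the required packages inside the d2l-zh environment
# the older numpy that jupyter depends on cannot be installed on Python 3.12
pip install jupyter d2l torch torchvision

# I verified that installing jupyterlab works, but d2l pins numpy==1.23.5, which would also need patching and is troublesome
# so in the end I rolled the Python version back to 3.11
# pip install jupyterlab d2l torch torchvision

When I first installed the packages, the environment was still on Python 3.12, as the paths in the traceback show. The following error occurred:

Error while installing packages in the d2l-zh environment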
...
Collecting jupyter
  Using cached jupyter-1.0.0-py2.py3-none-any.whl.metadata (995 bytes)
Collecting numpy==1.23.5 (from d2l)
  Using cached numpy-1.23.5.tar.gz (10.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [33 lines of output]
      Traceback (most recent call last):
        File "/home/admin/conda/envs/d2l-zh/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
...
        File "/home/admin/conda/envs/d2l-zh/lib/python3.12/importlib/__init__.py", line 90, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
        File "<frozen importlib._bootstrap_external>", line 999, in exec_module
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "/tmp/pip-build-env-wnxch9oh/overlay/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
          import setuptools.version
        File "/tmp/pip-build-env-wnxch9oh/overlay/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/tmp/pip-build-env-wnxch9oh/overlay/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2172, in <module>
          register_finder(pkgutil.ImpImporter, find_on_path)
                          ^^^^^^^^^^^^^^^^^^^
      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

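According to AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?, the cause is that Python 3.12 removed the long-deprecated pkgutil.ImpImporter class. As a result, older pip and setuptools versions may not work with Python 3.12. The suggested fix is to reinstall pip manually under Python 3.12:

Reinstall pip manually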
python -m ensurepip --upgrade
python -m pip install --upgrade setuptools
cd conda/bin
ln -s pip3 pip
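The root cause named above is a missing standard-library attribute, not pip itself, so it can be checked directly in any interpreter. This minimal sketch reports whether pkgutil.ImpImporter still exists. That attribute is what the old pkg_resources in the traceback tries to use. The helper name has_imp_importer is hypothetical.

```python
import pkgutil
import sys


def has_imp_importer() -> bool:
    """True if this interpreter still provides pkgutil.ImpImporter (removed in Python 3.12)."""
    return hasattr(pkgutil, "ImpImporter")


if __name__ == "__main__":
    print(sys.version.split()[0], "pkgutil.ImpImporter present:", has_imp_importer())
```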

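However, updating pip as above did not solve the problem, because the real cause was numpy. NumPy does not install on Python 3.12.0b1 #23808 explains that NumPy 1.26.0 or later is required on Python 3.12, so the NumPy version actually has to be specified explicitly. The install failed because an old numpy was being pulled in. The log above shows numpy==1.23.5 required by d2l, and jupyter, the older-generation notebook package, also depends on early numpy versions.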
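The version conflict described above comes down to a simple comparison. This sketch encodes only the data point cited here (Python 3.12 needs NumPy >= 1.26.0) and deliberately covers nothing else. The names MIN_NUMPY, parse_version and numpy_too_old are hypothetical.

```python
# Only the data point cited above is encoded: on Python 3.12, NumPy 1.26.0 is
# the minimum. Other Python versions are intentionally not covered.
MIN_NUMPY = {(3, 12): (1, 26, 0)}


def parse_version(text):
    """Parse a plain numeric release string such as '1.23.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def numpy_too_old(python, numpy_version):
    """Return True if numpy_version is known to be too old for Python (major, minor)."""
    minimum = MIN_NUMPY.get(tuple(python[:2]))
    return minimum is not None and parse_version(numpy_version) < minimum


# The d2l pin from the log vs. the first Python 3.12-capable release
print(numpy_too_old((3, 12), "1.23.5"))  # True
print(numpy_too_old((3, 12), "1.26.0"))  # False
print(numpy_too_old((3, 11), "1.23.5"))  # False (not covered by this table)
```

The pin d2l requires is exactly what makes Python 3.12 a dead end here. Swapping jupyter for jupyterlab does not change the d2l pin.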

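My workaround was to use jupyterlab (the next-generation jupyter) instead of jupyter. This installs a newer numpy and gets jupyter itself past the Python 3.12 problem. However, d2l still pins numpy==1.23.5, so in the end I created the environment with Python 3.11 instead, as shown above.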
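Whichever route is taken, it is worth confirming what actually ended up in the environment, especially the numpy version. This minimal sketch uses the standard library's importlib.metadata (Python 3.8+). The package list is simply the set installed in these notes, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["jupyter", "d2l", "torch", "torchvision", "numpy"]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```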

Download the Code and Run It

With everything ready, download the code and run it:

Download the d2l-zh code and run it
wget https://zh-v2.d2l.ai/d2l-zh.zip
unzip d2l-zh.zip
jupyter notebook

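If everything works, jupyter starts listening on port 8888 and can be reached in a browser at http://127.0.0.1:8888. (I struggled with this for a long time in a FreeBSD Jail environment and finally got it running with FreeBSD Linux Jail + FreeBSD VNET Jail.)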
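Before debugging the browser side, you can confirm from inside the jail that the server really is listening. This minimal sketch simply attempts a TCP connection. The host 127.0.0.1 and port 8888 are the values above, and port_is_listening is a hypothetical helper.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("jupyter port open:", port_is_listening())
```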

Notes

Because I run Jupyter Notebook inside a Jail, Jupyter remote access has to be configured
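For orientation, here is a minimal sketch of the server-side settings involved. It assumes a Jupyter Server based install (Notebook 7 or JupyterLab) and a config file generated with `jupyter server --generate-config`. This is a config fragment that Jupyter loads itself, not a standalone script, and the remote-access article linked above remains the reference. Binding to 0.0.0.0 exposes the server beyond the jail, so keep token or password authentication enabled.

```python
# ~/.jupyter/jupyter_server_config.py (fragment; Jupyter provides get_config when loading it)
c = get_config()  # noqa: F821

c.ServerApp.ip = "0.0.0.0"        # listen on all interfaces, not only 127.0.0.1
c.ServerApp.port = 8888           # the port used above
c.ServerApp.open_browser = False  # there is no browser inside the jail
```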

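Troubleshooting

I was following Bilibili: Dive into Deep Learning v2 > 03 Installation from Mu Li's course. When I loaded a certain .ipynb file, the page reported the error: Error Starting Kernel! NetworkError when attempting to fetch resource.

The backend jupyter terminal output shows the following when the error occurs:

Backend error when loading .ipynb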
[W 2025-01-07 18:01:06.820 ServerApp] Notebook pytorch/chapter_linear-networks/linear-regression-scratch.ipynb is not trusted
[W 2025-01-07 18:01:08.406 ServerApp] Notebook pytorch/chapter_linear-networks/linear-regression-scratch.ipynb is not trusted
Operation not permitted (src/thread.cpp:315)
Aborted
(d2l-zh) admin@d2l:~/docs $ Operation not permitted (src/thread.cpp:315)

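At the same time, entries like the following are added to the system dmesg log:

System log showing the jupyter (python3.11) processes exiting on signal 6 (SIGABRT)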
pid 35897 (python3.11), jid 8, uid 1000: exited on signal 6 (no core dump - too large)
pid 35918 (python3.11), jid 8, uid 1000: exited on signal 6 (no core dump - too large)
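This sketch pulls the relevant fields out of kernel messages like these and translates the signal number into a name. It assumes the exact message format shown above, and parse_exit_line is a hypothetical helper. Signal 6 is SIGABRT, which is consistent with the "Aborted" line in the jupyter output.

```python
import re
import signal

# Pattern for FreeBSD kernel messages of the form shown above:
# "pid 35897 (python3.11), jid 8, uid 1000: exited on signal 6 (...)"
EXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<comm>[^)]+)\), jid (?P<jid>\d+), uid (?P<uid>\d+): "
    r"exited on signal (?P<sig>\d+)"
)


def parse_exit_line(line: str):
    """Return a dict describing a process-exit-on-signal log line, or None if it doesn't match."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    sig = int(m["sig"])
    try:
        sig_name = signal.Signals(sig).name  # e.g. 6 -> "SIGABRT"
    except ValueError:
        sig_name = f"signal {sig}"
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "jid": int(m["jid"]),
        "uid": int(m["uid"]),
        "signal": sig_name,
    }
```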

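Google's AI answer suggests that if this error occurs when a thread is created inside a jail, the jail configuration may need to be adjusted to allow the necessary system calls. However, the advice is very generic: it addresses only the symptom, not this specific error message.

As an example, the AI suggested allowing the clone() system call by setting allow.sysvipc = 1;. That setting is what running PostgreSQL in a Jail needs, although nowadays the three separate per-module settings below should be used instead (I record them here for reference). Note that allow.sysvipc governs access to System V IPC (shared memory, semaphores, message queues) rather than thread creation.

Configuration needed to run PostgreSQL in a Jail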
sysvshm = new;
sysvsem = new;
sysvmsg = new;

References