# v0.1.* Implemented so far, mainly a DEMO of the basic features

## Authentication

- [x] Basic Authentication functionality

* Query the authentication status while not logged in

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X GET http://localhost:8000/api/auth
{"status_code": 200, "status": "OK", "error": "", "output": {"session_lifetime": 0, "username": null, "auth": false, "newt_sessionid": null}}
```

* Log in

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X POST -d "username=test3&password=test3" -D cookie.txt http://localhost:8000/api/auth
{"status_code": 200, "status": "OK", "error": "", "output": {"session_lifetime": 1209600, "username": "test3", "auth": true, "newt_sessionid": "ahmqkj4f6o0r2b11zxm8we8mvn36jkv0"}}
```

* Query the authentication status with the token after logging in

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X GET -b cookie.txt http://localhost:8000/api/auth
{"status_code": 200, "status": "OK", "error": "", "output": {"session_lifetime": 1209600, "username": "test3", "auth": true, "newt_sessionid": "ahmqkj4f6o0r2b11zxm8we8mvn36jkv0"}}
```

* Log out

```
[nscc-gz_jiangli@cn16357 tests]$ curl -b cookie.txt -X DELETE http://localhost:8000/api/auth
{"status_code": 200, "status": "OK", "error": "", "output": {"session_lifetime": 0, "username": null, "auth": false, "newt_sessionid": null}}
```

* Check that the COOKIE is no longer valid after logging out

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X GET -b cookie.txt http://localhost:8000/api/auth
{"status_code": 200, "status": "OK", "error": "", "output": {"session_lifetime": 0, "username": null, "auth": false, "newt_sessionid": null}}
```

### Notes

This version does not perform real user authentication; it is only an example interface implementation. In 2.0 two authentication modes need to be considered: one based on api-server-user-db and one based on ldap.

## Status

Get System Status originally uses ping_adapter, matching the single settings entry `'NAME': 'localhost', 'HOSTNAME': 'localhost'`.

Test:

```
curl -X GET http://localhost:8000/api/status/
```

res:

```
{"status_code": 200, "output": [{"status": "up", "system": "localhost"}], "error": "", "status": "OK"}
```

Modified settings:

```
'SYSTEMS': [
    {'NAME': 'localhost', 'HOSTNAME': 'localhost' },
    {'NAME': 'ln3', 'HOSTNAME': 'ln3-gn0' },
    {'NAME': 'ln4', 'HOSTNAME': 'ln4-gn0' },
    {'NAME': 'ln6', 'HOSTNAME': 'ln6-gn0' },
    {'NAME': 'lnerror', 'HOSTNAME': 'lnerror' },
],
```

res:

```
{"output": [{"status": "up", "system": "localhost"}, {"status": "up", "system": "ln3"}, {"status": "up", "system": "ln4"}, {"status": "up", "system": "ln6"}, {"status": "down", "system": "lnerror"}], "status_code": 200, "status": "OK", "error": ""}
```

However, PING only proves network connectivity; a successful SSH connection is a better check. newt's stock passthrough_adapter.py appears to forward the STATUS request to another API:

```
base_url = settings.STATUS_URL
if machine_name==None:
    url = base_url
else:
    url = '%s?%s=%s' % (base_url, 'system', machine_name)
```

The original settings never define STATUS_URL, but this still serves as an example of calling another REST API.

Switching from ping_adapter to an ssh_adapter is straightforward (a minimal sketch follows at the end of this section); in the response below you can already see that I currently cannot log into ln6:

```
{"status": "OK", "output": [{"status": "up", "system": "localhost"}, {"status": "up", "system": "ln3"}, {"status": "up", "system": "ln4"}, {"status": "down", "system": "ln6"}, {"status": "down", "system": "lnerror"}], "status_code": 200, "error": ""}
```

ssh_adapter can be considered usable for now. Optimizations to consider for 2.0 or later:

* Generate a dedicated check script and invoke it via SSH
* Cache the STATUS information, expiring it periodically and invalidating it on failure

*Status Get MOTD*: **not implemented in newt; non-core feature, not considered for now**
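For illustration, here is a minimal sketch of the kind of probe such an ssh_adapter can perform, assuming an `ssh -o BatchMode=yes ... true` check with a short timeout. `run_command` is a subprocess-based stand-in for NEWT's helper of the same name, and `get_status` takes the SYSTEMS list from the settings above as a parameter; both names are illustrative, not the actual adapter code.

```python
# Sketch of an SSH-based status probe. Assumptions: the
# "ssh -o BatchMode=yes ... true" probe, the 5 s timeout, the function names.
import subprocess


def run_command(command):
    """Stand-in for NEWT's run_command helper: run a shell command."""
    p = subprocess.Popen(command, shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, error = p.communicate()
    return output, error, p.returncode


def get_status(systems, machine_name=None):
    """Probe each {'NAME': ..., 'HOSTNAME': ...} entry in `systems` over SSH.

    `systems` is the SYSTEMS list from the settings shown above.
    """
    result = []
    for system in systems:
        if machine_name and system['NAME'] != machine_name:
            continue
        probe = "ssh -o BatchMode=yes -o ConnectTimeout=5 %s true" % system['HOSTNAME']
        output, error, retcode = run_command(probe)
        # BatchMode disables password prompts, so an unreachable host or a
        # missing key simply yields a non-zero return code.
        result.append({"system": system['NAME'],
                       "status": "up" if retcode == 0 else "down"})
    return result
```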
## File

NEWT provides two implementations, globus_file_adapter.py and localfile_adapter.py. globus_file_adapter implements the file operations by assembling globus FTP commands, while localfile_adapter works on files directly on the local filesystem. Since our API SERVER is attached to the shared storage, localfile_adapter is enough to provide the required functionality.

- [x] Files Directory Listing @TH2

```
urlpatterns = patterns('file.views',
    (r'^/?$', FileRootView.as_view()),
    (r'^/(?P<machine>[^/]+)(?P<path>/.*)$', FileView.as_view()),
    (r'^(?P<path>.+)/$', ExtraFileView.as_view()),
)
```

Thanks to the shared storage, localfile_adapter works as expected:

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X GET http://localhost:8000/api/file/ln3/HOME/nscc-gz_jiangli/virtualenv/eHPC/tests
{"status": "OK", "status_code": 200, "error": "", "output": [{"hardlinks": "2", "size": "4096", "symlink": "", "group": "nscc-gz", "perms": "drwxr-xr-x", "name": ".", "user": "nscc-gz_jiangli", "date": "2016-10-31 01:36"}, {"hardlinks": "6", "size": "4096", "symlink": "", "group": "nscc-gz", "perms": "drwxr-xr-x", "name": "..", "user": "nscc-gz_jiangli", "date": "2016-10-23 21:56"}, {"hardlinks": "1", "size": "628", "symlink": "", "group": "nscc-gz", "perms": "-rw-r--r--", "name": "cookie.txt", "user": "nscc-gz_jiangli", "date": "2016-10-31 01:18"}, {"hardlinks": "1", "size": "127141", "symlink": "", "group": "nscc-gz", "perms": "-rw-r--r--", "name": "log.test", "user": "nscc-gz_jiangli", "date": "2016-10-23 22:05"}, {"hardlinks": "1", "size": "1553", "symlink": "", "group": "nscc-gz", "perms": "-rwxr-xr-x", "name": "test1.sh", "user": "nscc-gz_jiangli", "date": "2016-10-20 22:52"}, {"hardlinks": "1", "size": "519", "symlink": "", "group": "nscc-gz", "perms": "-rwxr-xr-x", "name": "test_auth.sh", "user": "nscc-gz_jiangli", "date": "2016-10-31 01:36"}]}
```

However, because the ls arguments differ, the output differs slightly from the documented format (there is an extra symlink field):

```
Files Directory Listing
Method     GET  [newt_base_url]/file/[machine]/[path]/
Output     JSON Array [{"perms": string, "hardlinks": string, "user": string, "group": string, "size": string, "date": string, "name": string}, ... ]
Semantics  Get directory listing if [path] is a directory; get listing for individual file if [path] is a file.
```

localfile_adapter currently resolves paths from / ; it might be better to resolve them from the user's home directory. Not handled for now, to be revisited later.

- [ ] **== Spawned TASKs: how should workers be spawned on different login nodes to process the job queue, while staying easy to manage?**

0.2.0 needs to add permission checks based on the user's identity.

- [x] Files Download a File @TH2

**curl -X GET "http://localhost:8000/api/file/ln3/HOME/nscc-gz_jiangli/virtualenv/eHPC/README.md?download=true"**

The control parameter actually implemented differs from the documentation: the docs use `?view=read`, while the code checks `if request.GET.get("download", False):` (a sketch of this branch follows at the end of this section).

*0.3.0*: this works as is, but exposing absolute paths directly in the URL is probably not ideal; later it could be optimized with hash-based identifiers.

- [x] Files Create/Update a File @TH2

```
[nscc-gz_jiangli@cn16357 tests]$ curl -T ../README.md "http://localhost:8000/api/file/ln3/HOME/nscc-gz_jiangli/virtualenv/eHPC/tests"
{"status": "OK", "status_code": 200, "error": "", "output": {"location": "/HOME/nscc-gz_jiangli/virtualenv/eHPC/tests"}}
[nscc-gz_jiangli@cn16357 tests]$ ls
cookie.txt  log.test  newt_eqpgql6m  test1.sh  test_auth.sh
[nscc-gz_jiangli@cn16357 tests]$ less newt_eqpgql6m
```

The upload works, but the file seems to have been renamed automatically? It is actually a parameter mistake: the first request targets the directory instead of a full file path.

```
[nscc-gz_jiangli@cn16357 tests]$ curl -T ../README.md "http://localhost:8000/api/file/ln3/HOME/nscc-gz_jiangli/virtualenv/eHPC/tests/t.md"
{"status": "OK", "status_code": 200, "error": "", "output": {"location": "/HOME/nscc-gz_jiangli/virtualenv/eHPC/tests/t.md"}}
[nscc-gz_jiangli@cn16357 tests]$ ls
cookie.txt  log.test  newt_eqpgql6m  test1.sh  test_auth.sh  t.md
[nscc-gz_jiangli@cn16357 tests]$
```

The corresponding error cases still need to be handled.
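As promised above, here is a minimal sketch of the download branch. Only the `request.GET.get("download", False)` check is taken from the actual adapter; the view signature, path handling and MIME guessing are illustrative assumptions.

```python
# Sketch of the download branch: only the request.GET.get("download") check
# comes from the adapter, everything else is illustrative.
import mimetypes
import os

from django.http import Http404, HttpResponse


def get_file(request, abs_path):
    """Return the file at abs_path, as an attachment when ?download=true."""
    if not os.path.isfile(abs_path):
        raise Http404
    with open(abs_path, "rb") as f:
        content = f.read()
    content_type = mimetypes.guess_type(abs_path)[0] or "application/octet-stream"
    response = HttpResponse(content, content_type=content_type)
    if request.GET.get("download", False):
        # Force a download dialog instead of inline display.
        response["Content-Disposition"] = (
            'attachment; filename="%s"' % os.path.basename(abs_path))
    return response
```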
## JOB

- [x] Jobs Run a command @TH2-SLURM

The original approach assembles either a local command or a globus command:

```
Jobs Run a command
Method     POST  [newt_base_url]/command/[machine]
Input      executable="/path/to/exec options"   # full command that needs to be run
           Optional parameters: loginenv=true   # setting to true initializes a full bash login env before running
Output     JSON { "status": ["OK" | "ERROR"], "output": string, "error": string }
Semantics  Run a command (synchronously) on the compute system and get back output.
           Optionally setting "loginenv=true" will initialize a full bash login env before running
           (this could result in a small additional delay). Also note that "status" can only capture
           higher level failures, and not failures within the command. Make sure you check "output"
           and "error" for the results.
```

Submit and execute a command:

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X POST -d "command=hostname" -b cookie.txt http://localhost:8000/api/command/localhost
{"status": "OK", "status_code": 200, "error": "", "output": {"retcode": 0, "error": "", "output": "cn16357\n"}}
```

The original implementation ships three adapters, exec_adapter.py, sudo_adapter.py and globus_command_adapter.py:

```
# exec_adapter.py :
(output, error, retcode) = run_command(command)

# sudo_adapter.py :
user = request.POST.get('sudo_user')
logger.debug("Running command: %s as %s" % (command, user))
command = "sudo -u %s %s " % (user, command)
(output, error, retcode) = run_command(command)

# globus_command_adapter.py :
(output, error, retcode) = run_command(gridutil.GLOBUS_CONF['LOCATION'] + "bin/globus-job-run %s %s" % (machine['hostname'], command), env=env)
```

Two possible modes:

1. `ssh ln1-gn0 "cd workdir && yhbatch job.sh"`
2. a distributed celery work queue

v0.1.0 implements the first mode: similar to sudo_adapter, the command is wrapped as `ssh machine command`.

```
[nscc-gz_jiangli@cn16357 adapters]$ curl -X POST -d "command=hostname" http://localhost:8000/api/command/ln4
{"status_code": 200, "status": "OK", "error": "", "output": {"retcode": 0, "error": "", "output": "ln4\n"}}
```

- [x] Jobs View Queue @TH2-SLURM
- [x] Jobs Submit Job to Queue @TH2-SLURM (jobfile mode only for now, see below)
- ~~Jobs View Job in Queue @TH2-SLURM~~
- ~~Jobs View Job Account Information (including recently completed jobs) @TH2-SLURM~~
- [ ] Jobs View Job
- [ ] Jobs Delete Job from Queue @TH2-SLURM

Submit a job and query it:

```
[nscc-gz_jiangli@cn16357 ~]$ curl -X POST -d "jobscript=date" -b cookie.txt http://localhost:8000/api/job/localhost/
{"status": "OK", "status_code": 200, "error": "", "output": {"jobid": "78997"}}
[nscc-gz_jiangli@cn16357 eHPC]$ curl -X GET http://localhost:8000/api/job/localhost/78997/
{"status": "OK", "status_code": 200, "error": "", "output": {"status": "0", "jobid": "78997", "time_used": "0:00:00", "time_start": "2016-11-01 02:00:03+00:00", "output": "Mon Oct 31 21:00:03 CDT 2016", "time_end": "2016-11-01 02:00:03+00:00", "user": ""}}
```

The JOB module provides two adapters:

* globus_job_adapter.py
* unix_adapter.py

unix_adapter treats system processes as jobs; the globus module is the main reference.

Wrapping the Slurm commands as `ssh host-ln slurm-command` works:

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X GET -b cookie.txt http://localhost:8000/api/job/ln3/
{"output": [{"jobid": "3340804", "time": "13:49", "state": "R", "job_name": "bash", "nodelist": "cn13809", "user": "nscc-gz_jiangli", "partition": "work", "nodes": "1"}], "status_code": 200, "error": "", "status": "OK"}
```

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X POST -b cookie.txt -d "jobfile=/HOME/nscc-gz_jiangli/tmp/test2.sh" http://localhost:8000/api/job/ln3/
{"status": "OK", "status_code": 200, "output": {"jobid": "3342455"}, "error": ""}
```

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X POST -b cookie.txt -d "jobscript=#/bin/bash hostname" http://localhost:8000/api/job/ln3/
{"status": "ERROR", "status_code": 500, "output": "", "error": "qsub failed with error: sbatch: error: Unable to open file /tmp/newt_a8jvlfar\n"}
```

Submitting via jobfile currently works, but the jobscript mode still has a problem creating its temporary file (see the sketch below).
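The sbatch error above suggests the temporary jobscript is written to the API server's local /tmp, which the login node cannot read over SSH. A minimal sketch of one possible fix, assuming the script is written somewhere on the shared filesystem and submitted by path; `submit_jobscript`, the `workdir` argument and the plain `sbatch` call (TH-2 may need `yhbatch` instead) are all assumptions, not the actual adapter code.

```python
# Sketch: write the jobscript to shared storage so the login node can read it,
# then submit that path over SSH. Names and the sbatch call are assumptions.
import os
import subprocess
import tempfile


def submit_jobscript(hostname, jobscript, workdir):
    """Write jobscript under workdir (on the shared FS) and sbatch it via SSH."""
    fd, path = tempfile.mkstemp(prefix="newt_", suffix=".sh", dir=workdir)
    with os.fdopen(fd, "w") as f:
        f.write(jobscript)
    p = subprocess.Popen('ssh %s "sbatch %s"' % (hostname, path), shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, error = p.communicate()
    return output, error, p.returncode
```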
- Jobs View Job in Queue @TH2-SLURM
- Jobs View Job Account Information (including recently completed jobs) @TH2-SLURM

In NEWT's actual implementation these two interfaces are merged into a single JobDetailView get_info, i.e. Jobs View Job. Here it is implemented as a wrapper around sacct: plain sacct prints a summary table of recent submissions, sacct -j jobid prints only the requested job, and sacct --long prints a lot of information (much of it not useful). yhcontrol show job jobid shows even more detail, but neither command can show older jobs. For 0.1.0, for simplicity, only the sacct -j jobid form is used (a minimal parsing sketch is included at the end of this document).

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X GET -b cookie.txt -d "jobfile=/HOME/nscc-gz_jiangli/tmp/test2.sh" http://localhost:8000/api/job/ln3/3345300/
{"status": "OK", "status_code": 200, "error": "", "output": [{"partition": "work", "exitcode": "1:0", "account": "nscc-gz", "alloccpus": "24", "state": "FAILED", "jobname": "test2.sh", "jobid": "3345300"}]}
```

cancel job:

```
[nscc-gz_jiangli@cn16357 tests]$ curl -X DELETE -b cookie.txt http://localhost:8000/api/job/ln3/3358292/
{"output": "", "status_code": 200, "status": "OK", "error": ""}
```

Open questions:

1. If LOCAL-DB-USERS is used, how should job information be managed?
2. JOBIDs are not consistent across different machines.
3. How should files be stored?
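For reference, a minimal sketch of the `sacct -j jobid` wrapper mentioned above; the SSH wrapping, the `--noheader/--parsable2/--format` flags and the field list are assumptions chosen to match the output shown, not the actual adapter code.

```python
# Sketch of the sacct -j wrapper: flags, field list and SSH wrapping are
# assumptions chosen to reproduce the JSON fields returned above.
import subprocess

FIELDS = ["jobid", "jobname", "partition", "account",
          "alloccpus", "state", "exitcode"]


def get_job_info(hostname, jobid):
    """Return one dict per sacct row for the given job id."""
    command = ('ssh %s "sacct -j %s --noheader --parsable2 '
               '--format=JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode"'
               % (hostname, jobid))
    p = subprocess.Popen(command, shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, error = p.communicate()
    rows = []
    for line in output.decode().splitlines():
        values = line.strip().split("|")  # --parsable2 separates fields with |
        if len(values) == len(FIELDS):
            rows.append(dict(zip(FIELDS, values)))
    return rows, error.decode(), p.returncode
```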