orientation

We have a lab Slack workspace for most communications (papers, conferences, etc.).

Additionally, we have a lab mailing list at khatri_lab_main_list@stanford.edu

    The Khatri Lab currently operates two servers: khatrilab-dev1 and khatrilab-db1. The first is a compute server and the second is a database server that runs MySQL and PostgreSQL (another relational database management system). Each has its own file space. Whenever a program running on dev1 needs to access MySQL, it initiates a database connection to db1 to get what it wants. To access the compute server, use a terminal (the Terminal app on macOS, PuTTY on Windows). The address of the compute server is khatrilab-dev1.stanford.edu. Your username and password are your SUNet ID and password, respectively.
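For those logging in from a Unix-style terminal, the login described above might look like the sketch below. It only prints the command rather than connecting. The SUNET_ID value and the helper name dev1_login_command are placeholders made up for illustration; PuTTY users would instead type the same host name and username into the PuTTY dialog.

```shell
# Hypothetical placeholder: replace with your own SUNet ID.
SUNET_ID="${SUNET_ID:-your_sunet_id}"
# Address of the compute server, as given on this page.
DEV1_HOST="khatrilab-dev1.stanford.edu"

# Print the login command instead of running it, so nothing connects by accident.
dev1_login_command() {
    printf 'ssh %s@%s\n' "$SUNET_ID" "$DEV1_HOST"
}

dev1_login_command
```

Copy the printed line into your terminal to connect; you will be prompted for your SUNet password.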
  • The lab door should never be left open or unlocked when no one is in the lab; keys can be obtained from Michele King (mking [at] stanford [dot] edu)
  • It is highly recommended that your laptop be cable-secured at all times

 

  • Box is the only sharing site accepted by the School of Medicine; use no other.
  • Box = good; e-mail = bad.
    • DON'T SEND files by e-mail

 

  • Accessing a local machine behind the firewall (e.g., a desktop machine or printer) requires Stanford VPN.

 

  • If you have a lot of data and/or you want to make sure you can find it and/or work from a single version, it's a darn good idea to put it in a MySQL database
  • Databases hosted on khatrilab-db1 are named according to a specific convention listed at XYZ
  • DBs are created by Alex Skrenchuk (alex [dot] skrenchuk [at] stanford [dot] edu), our (part-time) sys admin.
  • Don't keep databases on your personal machine. They should reside on khatrilab-db1.
  • If your project relies on a database hosted on khatrilab-db1, you should have a DB named "proj_<project name>" created for it.
    • Try to come up with a short yet understandable (to third parties) name, geez…
    • Do not store project tables in user_<your name>.
  • Work faster/better by using a query editor, e.g., SequelPro (Mac), SQL Manager for MySQL (Win) or HeidiSQL (Win).
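Before asking for a new database, it can help to sanity-check the name against the "proj_<project name>" rule above. The sketch below is only an illustration: the function name is_project_db_name, the example names, and the character and length limits (lowercase letters, digits and underscores, 3 to 24 characters after "proj_") are assumptions, not the lab's actual convention, which may be stricter or looser.

```shell
# Return 0 when a database name looks like proj_<project name>.
# Assumed for illustration only: lowercase letters, digits and underscores,
# with 3 to 24 characters after the "proj_" prefix.
is_project_db_name() {
    printf '%s\n' "$1" | grep -Eq '^proj_[a-z0-9_]{3,24}$'
}

# Hypothetical example names.
for name in proj_sepsis_meta user_jdoe proj_x; do
    if is_project_db_name "$name"; then
        echo "$name: follows the proj_ convention"
    else
        echo "$name: does not follow the proj_ convention"
    fi
done
```

A name starting with user_ is rejected on purpose, since project tables should not live in a personal database.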

 

It's a good idea to create a repository for ongoing projects, even unpublished ones, and particularly if collaborators are involved.

 

  • Make sure your machine is backed up … somehow – multiple options are available.
  • You must ask Alex to back up "proj_" databases regularly.

 

  • Take the required HIPAA training classes promptly. We are critically dependent on being in compliance, and it is a legal requirement, so take this seriously. Compliance is monitored, so there is no escaping this.
  • Whole disk encryption: If you deal in protected health information (PHI), i.e., personally identifiable health information, or if you just want to be extra safe, apply whole disk encryption: http://itservices.stanford.edu/service/encryption/wholedisk. Note that, as of this writing, this product has been deemed untrustworthy and is rumored to be replaced by another.
  • Avoid connecting via wireless, as it is fundamentally less secure than the wired network. I.e., if working with sensitive data, you may be broadcasting it... Indeed, while on wireless you are outside the firewall, with lower bandwidth than the wired network and no way to connect directly to, e.g., printers.
  • Create a project under /projects/<yourname>. This is a backed-up area, unlike /scratch…
  • You don't live in a bubble, even your own. Ask yourself: Will you be able to read your code three months from now?
  • If something is missing on the cluster (e.g., a library), get Alex Skrenchuk to fix it. Installing stuff in your local environment is bad practice, as it doesn't fix the underlying problem and makes it hard for others to run your code. Inconsiderate coders will be forced to code in COBOL.
  • Use …
    • comments
    • meaningful variable names
    • subroutines
    • functions
  • Consider diagramming tools to plan and document your code, e.g., OmniGraffle (Mac) or MS Visio (Win). And no, PowerPoint doesn't cut it for decent diagrams…
  • Think failure, so write in error handling, particularly with R.
    • The only way to be sure that results are accurate is to be paranoid about the code executing correctly. E.g., in Perl, use strict, use warnings, etc.
  • Provide a wee bit of header documentation: what does this program do? What does it need? What does it produce? What does it depend on? Who the heck wrote it?
  • Use a code editor, or better still, an integrated development environment.
    • E.g., for R, save your neurons and use RStudio.
  • Keep your code in Bitbucket; it's the best way to …
    • keep it safe from your newborn
    • revert back to previous versions
    • enable others to make modifications in a controlled manner
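Two of the points above, header documentation and paranoid error handling, can be shown together in one small script. This is a minimal sketch in shell, not a lab standard: the program name count_rows.sh and the function count_rows are made up for illustration. The `set -eu` line is loosely comparable in spirit to Perl's use strict and use warnings, in that it makes the script stop instead of carrying on quietly after a problem.

```shell
#!/bin/sh
# Program:    count_rows.sh (hypothetical example name)
# Purpose:    Count the data rows in a delimited text file, skipping the header line.
# Needs:      Path to a readable input file as the first argument.
# Produces:   The number of data rows on standard output.
# Depends on: POSIX sh, tail, wc, tr.
# Author:     <your name>

# The body runs in a subshell "( ... )" so that strict mode stays local to it.
count_rows() (
    set -eu                       # stop on errors and on unset variables
    input="${1:-}"
    if [ -z "$input" ]; then
        echo "usage: count_rows <file>" >&2
        exit 2
    fi
    if [ ! -r "$input" ]; then
        echo "error: cannot read '$input'" >&2
        exit 1
    fi
    # Skip the header line, count what remains, and strip padding from wc.
    tail -n +2 "$input" | wc -l | tr -d ' '
)
```

Calling count_rows with a missing or unreadable file prints a message on standard error and returns a non-zero status, so a calling script can notice the failure rather than work with an empty result.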

 

Our khatrilab-dev1 server uses SLURM for submitting computational jobs. As an example, save the following as test.sub:
#!/bin/sh
#SBATCH --mail-type=ALL
#SBATCH --mail-user=[YOUR USERNAME]@stanford.edu
#SBATCH --job-name=rscript
#SBATCH --ntasks-per-node 48 # Number of cores
#SBATCH --nodes=1 # Ensure that all cores are on one machine
#SBATCH --mem-per-cpu=5000 # Memory per core, in MB
#SBATCH --partition=khatrilab
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
#SBATCH -t 1-00:05 # Runtime in D-HH:MM

# Load R, then run the R script passed as the first argument to sbatch
module load R/3.2.0

Rscript "$1"
Then run the following from khatrilab-dev1 (test.R is the R script to execute):
sbatch test.sub test.R
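
If you have several R scripts to run, the same test.sub can be reused once per script. The helper below is a sketch under stated assumptions: the function name submit_all and the SUBMIT_CMD variable are invented for illustration, and by default it does a dry run that only prints the sbatch commands it would issue. Keep in mind that test.sub as written asks for 48 cores per job.

```shell
# Submit every .R script in a directory through test.sub, one job per script.
# SUBMIT_CMD is a hypothetical switch: unset, the function only prints the
# commands (dry run); set SUBMIT_CMD=sbatch to submit for real.
#
# Dry run:          submit_all /projects/<yourname>/scripts
# Real submission:  SUBMIT_CMD=sbatch submit_all /projects/<yourname>/scripts
submit_all() {
    dir="$1"
    for script in "$dir"/*.R; do
        [ -e "$script" ] || continue    # the directory holds no .R files
        ${SUBMIT_CMD:-echo sbatch} test.sub "$script"
    done
}
```

Run it from the directory that contains test.sub. Once jobs are submitted, `squeue -u <your SUNet ID>` lists them, and each job writes its own job.<job id>.out and job.<job id>.err files.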

  • orientation.txt
  • Last modified: 2017/11/17 16:08
  • by mdonato