Quick Hacks
This is a collection of quick hacks that I find or come up with while working on a multitude of unrelated tasks. It is intended for quick lookup: search this page by keyword to find relevant notes. For more detail on any of the Quick Hacks published here, please visit my blog on Medium, where I explain the meaning behind the commands and the code.
Q: How to count the number of molecules in an .sdf file from the command line?
A: fgrep -c '$$$$' <sdfile>
or fgrep -c "M  END" <sdfile>
(note the two spaces between M and END, as written in the file itself).
Tags: linux, sdf
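If you need the same count inside a Python script or a notebook, here is a minimal sketch using only the standard library. It assumes, like the fgrep command, that every record ends with a `$$$$` delimiter line; the function name `count_sdf_records` is just my own illustration.

```python
def count_sdf_records(path):
    """Count records in an SD file by counting '$$$$' delimiter lines."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # A record ends with a line that is exactly '$$$$'
            # (surrounding whitespace and line endings are ignored).
            if line.strip() == "$$$$":
                count += 1
    return count

# Usage (hypothetical file name):
# print(count_sdf_records("hits.sdf"))
```

Unlike fgrep, which counts any line containing `$$$$`, this only counts lines that consist of the delimiter alone.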
Q: How to count the number of molecules in a .mol2 file from the command line?
A: fgrep -c "@<TRIPOS>ATOM" <mol2file>
Tags: linux, mol2
Q: How to unite (concatenate) several .sdf files into a single file using only the command line?
A: cat *.sdf > new_file.sdf
Tags: linux, sdf
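The cat one-liner works best when every input file ends with a newline and the output file is not itself picked up by the *.sdf pattern on a second run. If you want to guard against both, here is a rough Python sketch; the function name `concatenate_sdf` and its arguments are my own, not part of any library.

```python
from pathlib import Path

def concatenate_sdf(input_dir, output_path):
    """Join all .sdf files in input_dir into output_path, in sorted name order."""
    output_path = Path(output_path).resolve()
    # Collect inputs first, skipping the output file if it already exists there.
    inputs = sorted(
        p for p in Path(input_dir).glob("*.sdf") if p.resolve() != output_path
    )
    with open(output_path, "w", encoding="utf-8") as out:
        for path in inputs:
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(inputs)

# Usage (hypothetical paths):
# concatenate_sdf(".", "new_file.sdf")
```

The function returns the number of files that were merged, which is handy as a quick sanity check.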
Q: How to quickly convert SDF to SMILES using Babel?
A: babel -isdf YOURFILE.sdf -osmi YOURNEWFILE.smi
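Provide the file paths as they are, without any surrounding parentheses or brackets. In Open Babel 3.x, where the babel executable is no longer shipped, the equivalent command is obabel -isdf YOURFILE.sdf -osmi -O YOURNEWFILE.smi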
Tags: babel
Q: How to quickly convert SDF to SMILES using molconvert?
A: molconvert smiles input.sdf -o output.smiles
Tags: molconvert
Q: How to quickly find out how many molecules a .smi file contains?
A: Usually, .smi files contain one SMILES string per line, so you can just count the number of lines in the file:
wc -l input.smi
Tags: linux
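Keep in mind that wc -l counts newline characters, so blank lines are included and a last line without a trailing newline is missed. If that matters, here is a small Python sketch that counts non-blank lines instead. It assumes one molecule per line and no header line; `count_smiles_lines` is a name I made up for this note.

```python
def count_smiles_lines(path):
    """Count non-blank lines in a .smi file (one molecule per line, no header)."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())

# Usage (hypothetical file name):
# print(count_smiles_lines("input.smi"))
```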
Q: How to read molecules from SMILES and render them in Jupyter Notebook?
A:
from rdkit import Chem
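m = Chem.MolFromSmiles("c1ccccc1OC")  # returns None if the SMILES cannot be parsed
m  # as the last expression in a cell, the molecule is rendered as an image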
Tags: rdkit, jupyter
Q: How to make a SMILES string out of an RDKit molecule object?
A:
# m is an RDKit molecule object, e.g. one returned by Chem.MolFromSmiles
smiles_string = Chem.MolToSmiles(m, isomericSmiles=True)
Tags: rdkit, jupyter
Q: How to make molecules draw inline in Jupyter Notebook?
A:
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
IPythonConsole.ipython_useSVG=True
After running this, molecules will draw inline.
Tags: rdkit, jupyter
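Q: How to list all files in a directory so that you can see when each one was last modified?
A: ls -ltr
This shows the last modification time of each file (not its creation time), with the oldest files first and the newest at the bottom.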
Tags: linux
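For the same listing from within Python, a minimal standard-library sketch could look like this. The helper name `list_by_mtime` is hypothetical, and unlike plain ls it also returns hidden files.

```python
import os
import time

def list_by_mtime(directory="."):
    """Return (name, formatted mtime) pairs, oldest first, similar to `ls -ltr`."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            # Use the entry's own timestamp; do not follow symlinks.
            mtime = entry.stat(follow_symlinks=False).st_mtime
            entries.append((mtime, entry.name))
    entries.sort()
    return [
        (name, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)))
        for mtime, name in entries
    ]

# Usage:
# for name, stamp in list_by_mtime("."):
#     print(stamp, name)
```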
Q: How to read an .sdf file in RDKit?
A:
from rdkit import Chem
iterator = Chem.SDMolSupplier("your_sdf_file.sdf")
# entries that RDKit fails to parse come back as None, so skip them
mols = [m for m in iterator if m is not None]
Tags: rdkit, jupyter
Q: How to color only specific chains in a specific color using ChimeraX?
A: You can either use color /<CHAIN LETTER> <color name> to color a chain, or color :<RESIDUE_NAME> <color name> to color residues by name. For example, color /A red colors chain A red. Please consult the list of all available colors here.
Tags: chimerax, graphics
Q: How to download and open a PDB structure from ChimeraX?
A: open pdb:4tv9
Tags: chimerax, graphics, pdb
Q: How to download and open a PDB structure from PyMOL?
A: fetch 4tv8
Tags: pymol, graphics, pdb
Q: How to color a single ligand in ChimeraX in nice-looking colors by element?
A: color :3GT byelement, where 3GT is the residue code.
Tags: chimerax, graphics
Q: How to calculate 3D conformations for a molecule in RDKit and then calculate 3D descriptors for each conformer?
A:
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Descriptors3D
from rdkit.Chem import rdMolDescriptors
m = Chem.MolFromSmiles("CCCOC")
m = Chem.AddHs(m)  # explicit hydrogens are needed for a sensible 3D geometry
AllChem.EmbedMolecule(m)
AllChem.UFFOptimizeMolecule(m)
# generate up to 10 conformers, pruning any that is within 1 Å RMSD of one already kept
cids = AllChem.EmbedMultipleConfs(m, numConfs=10, pruneRmsThresh=1)
for cid in cids:
    AllChem.MMFFOptimizeMolecule(m, confId=cid)
    # you can either do:
    asphericity = Descriptors3D.Asphericity(m, confId=cid)
    # or:
    morse = rdMolDescriptors.CalcMORSE(m, confId=cid)
Please check the list of all available 3D RDKit descriptors and their calculation methods here.
Tags: rdkit, descriptors
Q: How to unzip a .tar.gz archive in Linux?
A: tar -xzf yourfile.tar.gz
Tags: linux
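If you are already in Python, the standard tarfile module does the same job. This is a minimal sketch; `extract_tar_gz` is my own helper name, and the `filter="data"` argument is only passed when the running Python version supports it. As with tar itself, only extract archives you trust.

```python
import tarfile

def extract_tar_gz(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (e.g. absolute paths).
            archive.extractall(path=dest_dir, filter="data")
        else:
            archive.extractall(path=dest_dir)

# Usage (hypothetical file name):
# extract_tar_gz("yourfile.tar.gz", "extracted")
```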
Q: How to unzip a .gz archive in Linux?
A: gunzip yourfile.gz
Tags: linux
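A Python counterpart, as a sketch with the standard gzip module. Note one difference from gunzip: this hypothetical `gunzip_file` helper keeps the original .gz file instead of deleting it.

```python
import gzip
import shutil

def gunzip_file(gz_path):
    """Decompress gz_path to the same name without '.gz'; the original is kept."""
    gz_path = str(gz_path)
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    out_path = gz_path[:-3]
    # Stream the data so that large files do not need to fit in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path

# Usage (hypothetical file name):
# gunzip_file("yourfile.gz")
```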
Q: How to extract field values from an .sdf file to a separate file?
A: Let’s say you have a file called hits.sdf in which each molecule has a bunch of fields, and you specifically want to extract all values of the field called “ChEMBL ID” into a separate file.
To do so, run the following command:
awk '/<ChEMBL ID>/ {getline;print}' ./hits.sdf >> all_chembl_ids.txt
What if you wanted to extract the values of not one, but several fields? E.g., the fields named “Molecular Weight” and “LogP” in addition to “ChEMBL ID” in your hits.sdf file?
Well, for this you can chain the field names with || like this:
awk '/<ChEMBL ID>/ || /<Molecular Weight>/ || /<LogP>/ {getline;print}' ./hits.sdf
Notice that the field names are in <...> just because that’s how the sdf file format marks them. Basically, this awk command finds every line that matches one of the patterns enclosed within the slashes (/.../); getline then reads the next line, which holds the field’s value, and print outputs it.
Tags: linux, awk, sdf
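The same idea can be sketched in plain Python, which is convenient if you want to keep track of which value belongs to which field. Like the awk command, this sketch takes only the single line that follows each matching field header; the function name `extract_sdf_fields` and the regular expression are my own assumptions about a typical sdf layout, not an official parser.

```python
import re

# Matches a data header line such as ">  <ChEMBL ID>" and captures the field name.
_HEADER = re.compile(r"^>.*<([^>]+)>")

def extract_sdf_fields(path, field_names):
    """Return a list of (field_name, value) pairs for the requested fields."""
    wanted = set(field_names)
    results = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        lines = iter(handle)
        for line in lines:
            match = _HEADER.match(line)
            if match and match.group(1) in wanted:
                # Like awk's getline: the value sits on the line after the header.
                value = next(lines, "").rstrip("\r\n")
                results.append((match.group(1), value))
    return results

# Usage (hypothetical file name):
# for name, value in extract_sdf_fields("hits.sdf", ["ChEMBL ID", "LogP"]):
#     print(name, value, sep="\t")
```

Returning pairs rather than bare values makes it easy to tell the fields apart when you ask for several of them at once.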