# Convert A String Categorical Variable With Patsy


Originally from: Data Origami.

### Import modules

```python
import pandas as pd
import patsy
```

### Create dataframe

```python
raw_data = {'patient': [1, 1, 1, 0, 0],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])
df
```
| | patient | obs | treatment | score |
|---|---|---|---|---|
| 0 | 1 | 1 | 0 | strong |
| 1 | 1 | 2 | 1 | weak |
| 2 | 1 | 3 | 0 | normal |
| 3 | 0 | 1 | 1 | weak |
| 4 | 0 | 2 | 0 | strong |

### Convert df['score'] into a categorical variable ready for regression (i.e. set one category as the baseline)

```python
# Convert the 'score' column into dummy variables (with an intercept, so one
# level becomes the baseline) and return the result as a DataFrame
patsy.dmatrix('score', df, return_type='dataframe')
```
| | Intercept | score[T.strong] | score[T.weak] |
|---|---|---|---|
| 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 0 | 1 |
| 4 | 1 | 1 | 0 |
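The treatment coding patsy applied above can be sketched in plain Python to show where those columns come from. This is a minimal illustration, not patsy's implementation: it assumes (as the column names above suggest) that the levels are sorted alphabetically and the first one, `normal`, is held out as the baseline. The helper `treatment_code` is hypothetical.

```python
# Hypothetical helper: a plain-Python sketch of treatment (dummy) coding,
# assuming alphabetically sorted levels with the first level as baseline.
def treatment_code(name, values):
    """Return column names and rows: an intercept plus one 0/1 column
    for every level except the baseline."""
    levels = sorted(set(values))   # e.g. ['normal', 'strong', 'weak']
    non_baseline = levels[1:]      # the first level is dropped as baseline
    names = ["Intercept"] + [f"{name}[T.{lvl}]" for lvl in non_baseline]
    rows = [[1] + [int(v == lvl) for lvl in non_baseline] for v in values]
    return names, rows


scores = ["strong", "weak", "normal", "weak", "strong"]
names, rows = treatment_code("score", scores)
print(names)
for row in rows:
    print(row)
```

A row of all zeros outside the intercept (row 2 here) is an observation at the baseline level, `normal`.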

### Convert df['score'] into a categorical variable without setting one category as baseline

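This is likely what you will want if you need one indicator column for every category. The `- 1` in the formula removes the intercept, so no level is held out as a baseline.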

```python
# '- 1' drops the intercept, so every level of 'score' gets its own
# indicator column; return the result as a DataFrame
patsy.dmatrix('score - 1', df, return_type='dataframe')
```
| | score[normal] | score[strong] | score[weak] |
|---|---|---|---|
| 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 |
| 4 | 0 | 1 | 0 |
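For comparison, here is a plain-Python sketch of this full encoding: one 0/1 column per level and no intercept. The helper `full_code` is hypothetical and assumes the same alphabetical level order patsy used above.

```python
# Hypothetical helper: full indicator (one-hot) coding with no intercept,
# assuming the same alphabetical level order patsy used above.
def full_code(name, values):
    """Return column names and rows with one 0/1 column per level."""
    levels = sorted(set(values))
    names = [f"{name}[{lvl}]" for lvl in levels]
    rows = [[int(v == lvl) for lvl in levels] for v in values]
    return names, rows


scores = ["strong", "weak", "normal", "weak", "strong"]
names, rows = full_code("score", scores)
print(names)
for row in rows:
    print(row, "sum:", sum(row))  # each row has exactly one 1
```

Every row sums to 1, which is exactly what an intercept column contains. Keeping an intercept alongside all of these columns would make the design matrix perfectly collinear, which is why patsy holds out a baseline level whenever an intercept is present.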

### Create a variable that is 1 when both patient and treatment are 1

```python
# patient:treatment is the interaction term; '- 1' drops the intercept
patsy.dmatrix('patient + treatment + patient:treatment - 1', df, return_type='dataframe')
```
| | patient | treatment | patient:treatment |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 |
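Because `patient` and `treatment` are numeric columns, patsy builds `patient:treatment` as their element-wise product, which is 1 only where both inputs are 1. A minimal plain-Python sketch of the same three columns follows; the helper `interaction_columns` is hypothetical.

```python
# Hypothetical helper: rebuild the columns of
# 'patient + treatment + patient:treatment - 1' for numeric 0/1 inputs.
def interaction_columns(patient, treatment):
    """Return the two main effects and their element-wise product."""
    product = [p * t for p, t in zip(patient, treatment)]
    return {
        "patient": list(patient),
        "treatment": list(treatment),
        "patient:treatment": product,  # 1 only where both inputs are 1
    }


cols = interaction_columns([1, 1, 1, 0, 0], [0, 1, 0, 1, 0])
for name, column in cols.items():
    print(name, column)
```

In patsy's formula syntax, `patient*treatment - 1` is shorthand for the same `patient + treatment + patient:treatment - 1`.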