Python - How to make a feature vector from lists of words


I'm new to Python. I have training data in bag-of-words form: each line of the training data is an article. The labels of the training data are in a file, and each label corresponds to one article in the training data. I did stemming on the training data and removed stop words. The output is a list of words for each article (line). I want to extract a feature vector from these lists and use it in a KNN classifier in Python, but I don't know how to do it! I'd appreciate a quick answer. Here's the code for the things I did:

    import nltk
    from nltk.corpus import stopwords
    from nltk import stem

    stemmer = stem.PorterStemmer()
    stop_words = set(stopwords.words('english'))  # build the stop-word set once

    documents = []  # one list of stemmed tokens per article
    with open('data.txt') as file:
        for line in file:
            # drop stop words, then stem the remaining words
            filtered_words = [w for w in line.split() if w not in stop_words]
            documents.append([stemmer.stem(w) for w in filtered_words])

    print(documents)

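Take a look at scikit-learn's CountVectorizer or TfidfVectorizer. These take a list of documents (the lists of tokens, in your example) as input and return a feature matrix. Note that by default each document is expected to be a single string, so either join each token list with spaces first or pass a callable analyzer: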

    from sklearn.feature_extraction.text import CountVectorizer

    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(your_list_of_documents)
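As a minimal sketch of that call on pre-tokenized input: the `documents` below are made-up sample data standing in for the stemmed token lists from the question, and `texts` is a helper name introduced only for illustration. The tokens are joined back into strings because that is what the default analyzer expects.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical pre-processed articles: one list of stemmed tokens per article.
documents = [
    ["cat", "sat", "mat"],
    ["dog", "chase", "cat"],
    ["dog", "bark"],
]

# Join each token list into one string so the default analyzer can split it again.
texts = [" ".join(tokens) for tokens in documents]

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(texts)

# One row per article, one column per distinct token.
print(X_train_counts.shape)
print(sorted(count_vect.vocabulary_))
```

The result is a sparse matrix, which scikit-learn classifiers generally accept directly, so there is usually no need to convert it to a dense array.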

You can find more information in the "Working With Text Data" tutorial.
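For the KNN part of the question, here is a rough sketch of how the feature matrix could feed a classifier. The training articles, labels, and `n_neighbors=1` are assumptions chosen to keep the example tiny; in practice the labels would be read from the label file, one per article. This variant uses TfidfVectorizer with a callable analyzer that returns each token list unchanged, so no joining is needed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training data: token lists (one per article) and their labels.
train_docs = [
    ["goal", "match", "team"],
    ["team", "win", "match"],
    ["stock", "market", "price"],
    ["price", "rise", "market"],
]
train_labels = ["sport", "sport", "finance", "finance"]

# A callable analyzer tells the vectorizer the input is already tokenized.
model = make_pipeline(
    TfidfVectorizer(analyzer=lambda tokens: tokens),
    KNeighborsClassifier(n_neighbors=1),  # assumed value, tune on real data
)
model.fit(train_docs, train_labels)

# New articles must go through the same stop-word removal and stemming first.
predictions = model.predict([["team", "goal"], ["market", "stock"]])
print(predictions)
```

Wrapping both steps in a pipeline keeps the vocabulary learned from the training articles and reuses it when new articles are vectorized for prediction.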

