Cuenca Matanza-Riachuelo (research space): Analysis of the ACUMAR site archive


This page presents tools for analyzing the corpus of public documents about the urban area known as the Cuenca Matanza-Riachuelo. The tools can also be configured to analyze other corpora.

Introductory Readings

Basic Corpus (ACUMAR):

All the HTML, DOC, and small PDF files indexed by Google, converted to TXT and concatenated (there are a few errors with Ñ and accented characters, but not many):
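
The concatenation step described above can be sketched as follows. This is a minimal illustration and not the project's actual script: the helper names `read_text` and `concatenate_corpus` are hypothetical, and it assumes Python 2.7 or later. Because the source files may mix encodings (a likely cause of the Ñ and accent errors), the sketch tries UTF-8 first and falls back to Latin-1.

```python
# -*- coding: utf-8 -*-
import io


def read_text(path, encodings=("utf-8", "latin-1")):
    """Decode a file by trying each encoding in order (hypothetical helper)."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only when every listed encoding failed. With the defaults this
    # cannot happen, because latin-1 accepts any byte sequence.
    return raw.decode("utf-8", "replace")


def concatenate_corpus(paths, out_path):
    """Write every input file into one UTF-8 corpus file, in the given order."""
    with io.open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(read_text(path).strip())
            out.write(u"\n\n")  # blank line between documents
```

Writing the output as UTF-8 regardless of the input encoding keeps Ñ and accented characters consistent for any later analysis.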

Analysis with word clouds (TagClouds)

Examples:

      

TagCloud for actions
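
A word cloud comes down to counting word frequencies and mapping each count to a font size. The sketch below shows one simple way to do that; it is not the project's tool. The names `word_counts` and `font_sizes`, the tiny stopword list, and the linear scaling are all assumptions made for illustration, and the code assumes Python 2.7 or later.

```python
# -*- coding: utf-8 -*-
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run needs a fuller one).
STOPWORDS = {u"de", u"la", u"el", u"en", u"y", u"a", u"los", u"las", u"del", u"que"}


def word_counts(text, top_n=50):
    """Return the top_n most frequent words, skipping stopwords and short words."""
    words = re.findall(r"\w+", text.lower(), re.UNICODE)
    kept = [w for w in words if w not in STOPWORDS and len(w) > 2]
    return Counter(kept).most_common(top_n)


def font_sizes(counts, min_size=10, max_size=48):
    """Map each count linearly onto a font size between min_size and max_size."""
    if not counts:
        return {}
    values = [count for _, count in counts]
    low, high = min(values), max(values)
    span = float(high - low) or 1.0  # avoid dividing by zero when all counts match
    return dict(
        (word, int(round(min_size + (count - low) / span * (max_size - min_size))))
        for word, count in counts
    )
```

Linear scaling is the simplest choice; when a few words dominate the corpus, scaling by the logarithm of the count tends to give a more readable cloud.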

    

    Publications

    Additional Bibliography

    Minimum requirements:

    - Ubuntu Linux (it might run on Windows, but that has not been tested).

    - Python 2.6 (it should also work with Python 2.4 and 2.5).

    - catdoc, used by corpus_normalization.py to convert DOC files (install with sudo apt-get install catdoc).

    - html2text, used by corpus_normalization.py to convert HTML files (install with sudo apt-get install html2text).
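
To show how these two command-line tools can be driven from Python, here is a hedged sketch. It does not reproduce the internals of corpus_normalization.py, which are not shown on this page. The function names and the extension mapping are assumptions. catdoc reads Microsoft Word files and html2text reads HTML, and both print plain text to standard output when given a file path, so the sketch maps extensions that way. It assumes Python 2.7 or later for `subprocess.check_output`.

```python
import os
import subprocess

# Extension -> command-line tool. This mapping is an assumption for
# illustration, not the actual logic of corpus_normalization.py.
CONVERTERS = {
    ".doc": "catdoc",
    ".html": "html2text",
    ".htm": "html2text",
}


def conversion_command(path):
    """Return the argv list that prints `path` as plain text, or None."""
    extension = os.path.splitext(path)[1].lower()
    tool = CONVERTERS.get(extension)
    if tool is None:
        return None
    return [tool, path]


def convert_to_text(path):
    """Run the converter and return its standard output as bytes."""
    command = conversion_command(path)
    if command is None:
        raise ValueError("no converter configured for %s" % path)
    return subprocess.check_output(command)
```

Keeping command construction separate from execution makes the mapping easy to check on a machine where the tools are not installed.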

    Missing features:

    - PDFs that contain no digitized text are not supported. In other words, text inside images is not yet extracted with OCR (Optical Character Recognition).